Several problems encountered while using Carbondata, and how to solve them


This post summarizes several problems I ran into while using Carbondata, together with their solutions. The environment used here is Spark 2.1.0 and Carbondata 1.2.0.

The HDFS nameservice must be specified

When initializing the CarbonSession, if you do not specify the HDFS nameservice, data loading works fine, but queries fail because the data files cannot be found: scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs:///user/iteblog/carb")

scala> carbon.sql("""CREATE TABLE temp.iteblog(id bigint) STORED BY 'carbondata'""")

17/11/09 16:20:58 AUDIT command.CreateTable: [www.iteblog.com][iteblog][Thread-1]Creating Table with Database name [temp] and Table name [iteblog]

17/11/09 16:20:58 WARN hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `temp`.`iteblog` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

17/11/09 16:20:59 AUDIT command.CreateTable: [www.iteblog.com][iteblog][Thread-1]Table created with Database name [temp] and Table name [iteblog]

res2: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("insert overwrite table temp.iteblog select id from temp.mytable limit 10")

17/11/09 16:21:46 AUDIT rdd.CarbonDataRDDFactory$: [www.iteblog.com][iteblog][Thread-1]Data load request has been received for table temp.iteblog

17/11/09 16:21:46 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT

17/11/09 16:23:03 AUDIT rdd.CarbonDataRDDFactory$: [www.iteblog.com][iteblog][Thread-1]Data load is successful for temp.iteblog

res3: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("select * from temp.iteblog limit 10").show(10,100)

17/11/09 16:23:15 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 3.0 (TID 1011, static.iteblog.com, executor 2): java.lang.RuntimeException: java.io.FileNotFoundException: /user/iteblog/carb/temp/iteblog/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510215706696.carbondata (No such file or directory)

at org.apache.carbondata.core.indexstore.blockletindex.IndexWrapper.<init>(IndexWrapper.java:39)

at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:141)

at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:216)

at org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:36)

at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:116)

at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:229)

at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:62)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)

at org.apache.spark.scheduler.Task.run(Task.scala:99)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.io.FileNotFoundException: /user/iteblog/carb/temp/iteblog/Fact/Part0/Segment_0/part-0-0_batchno0-0-1510215706696.carbondata (No such file or directory)

at java.io.FileInputStream.open(Native Method)

at java.io.FileInputStream.<init>(FileInputStream.java:138)

at java.io.FileInputStream.<init>(FileInputStream.java:93)

at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:128)

at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:77)

at org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:46)

at org.apache.carbondata.core.util.DataFileFooterConverterV3.getSchema(DataFileFooterConverterV3.java:90)

at org.apache.carbondata.core.util.CarbonUtil.readMetadatFile(CarbonUtil.java:925)

at org.apache.carbondata.core.indexstore.blockletindex.IndexWrapper.<init>(IndexWrapper.java:37)

... 20 more

As you can see, if the CarbonSession is created without specifying the HDFS nameservice, data loading succeeds but queries fail with file-not-found errors. The most direct fix is to specify the HDFS nameservice when creating the CarbonSession. A possible improvement would be for Carbondata to fill in the HDFS nameservice automatically from the Hadoop configuration it is given.
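
For example, a minimal sketch of creating the session with the nameservice included in the store path (the nameservice name mycluster and the path are illustrative placeholders here; use your own cluster's values):

scala> import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/iteblog/carb")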

The tinyint data type is not supported

Creating a table with a tinyint column fails with the following exception: scala> carbon.sql("""CREATE TABLE temp.iteblog(status tinyint) STORED BY 'carbondata'""")

org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Unsupported data type: StructField(status,ByteType,true).getType

at org.apache.spark.sql.parser.CarbonSpark2SqlParser$$anonfun$getFields$1.apply(CarbonSpark2SqlParser.scala:427)

at org.apache.spark.sql.parser.CarbonSpark2SqlParser$$anonfun$getFields$1.apply(CarbonSpark2SqlParser.scala:417)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)

at scala.collection.immutable.List.foreach(List.scala:381)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)

at scala.collection.immutable.List.map(List.scala:285)

at org.apache.spark.sql.parser.CarbonSpark2SqlParser.getFields(CarbonSpark2SqlParser.scala:417)

at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:135)

at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:72)

at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:578)

at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)

at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)

at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)

at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)

at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65)

at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)

at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)

at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82)

at org.apache.spark.sql.parser.CarbonSparkSqlParser.parse(CarbonSparkSqlParser.scala:68)

at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)

at org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:49)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)

... 50 elided

This is because Carbondata currently does not support the tinyint type; the data types it does support are listed at http://carbondata.apache.org/supported-data-types-in-carbondata.html. Oddly, CARBONDATA-18 appears to have addressed this, so it is unclear why the current version still rejects it.
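
As a workaround, a sketch that declares the column with a wider type from the supported list and casts on insert (assuming smallint is accepted by your Carbondata build, otherwise fall back to int; the source table temp.mytable and its status column are hypothetical here):

scala> carbon.sql("""CREATE TABLE temp.iteblog(status smallint) STORED BY 'carbondata'""")

scala> carbon.sql("insert into table temp.iteblog select cast(status as smallint) from temp.mytable")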

ADD PARTITION throws NoSuchTableException

If you use an ALTER TABLE temp.iteblog ADD PARTITION('2017') statement to add a partition, you will hit the exception below: scala> carbon.sql("ALTER TABLE temp.iteblog ADD PARTITION('2012')")

org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'iteblog' not found in database 'default';

at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)

at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)

at scala.Option.getOrElse(Option.scala:121)

at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:76)

at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)

at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)

at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)

at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)

at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:109)

at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)

at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)

at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)

at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:600)

at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:106)

at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:69)

at org.apache.spark.sql.hive.CarbonSessionCatalog.lookupRelation(CarbonSessionState.scala:83)

at org.apache.spark.sql.internal.CatalogImpl.refreshTable(CatalogImpl.scala:461)

at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.processSchema(carbonTableSchema.scala:283)

at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.run(carbonTableSchema.scala:229)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)

at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)

at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)

at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)

at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)

at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)

at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)

... 50 elided

When the SQL above runs, the partition is in fact added to the Carbondata table, but refreshing the table metadata through Spark then fails. The error shows that although we passed the database the table lives in, Spark's catalyst never receives it: the code does not forward the table's database information to catalyst. The same bug also affects partition SPLIT operations. It has been fixed in CARBONDATA-1593.
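
Before picking up that fix, one workaround worth trying — under the untested assumption that the failing refresh simply falls back to the session's current database — is to switch to the table's database first:

scala> carbon.sql("use temp")

scala> carbon.sql("ALTER TABLE iteblog ADD PARTITION('2012')")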

Running insert overwrite three or more times throws an NPE

If you run insert overwrite three or more times while loading data, then congratulations: you are bound to hit the exception below: scala> carbon.sql("insert overwrite table temp.iteblog select id from co.order_common_p where dt = '2012-10'")

17/10/26 13:00:05 AUDIT rdd.CarbonDataRDDFactory$: [www.iteblog.com][iteblog][Thread-1]Data load request has been received for table temp.iteblog

17/10/26 13:00:05 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT

17/10/26 13:00:08 ERROR filesystem.AbstractDFSCarbonFile: main Exception occurred:File does not exist: hdfs://mycluster/user/iteblog/carb/temp/iteblog/Fact/Part0/Segment_0

17/10/26 13:00:09 ERROR command.LoadTable: main

java.lang.NullPointerException

at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)

at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)

at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)

at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)

at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)

at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:333)

at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)

at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)

at org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)

at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)

at org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)

at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)

at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)

at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)

at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)

at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)

at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)

at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)

at $line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)

at $line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)

at $line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:38)

at $line20.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:40)

at $line20.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:42)

at $line20.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:44)

at $line20.$read$$iw$$iw$$iw$$iw.<init>(<console>:46)

at $line20.$read$$iw$$iw$$iw.<init>(<console>:48)

at $line20.$read$$iw$$iw.<init>(<console>:50)

at $line20.$read$$iw.<init>(<console>:52)

at $line20.$read.<init>(<console>:54)

at $line20.$read$.<init>(<console>:58)

at $line20.$read$.<clinit>(<console>)

at $line20.$eval$.$print$lzycompute(<console>:7)

at $line20.$eval$.$print(<console>:6)

at $line20.$eval.$print(<console>)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)

at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)

at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)

at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)

at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)

at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)

at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)

at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)

at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)

at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)

at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)

at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)

at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)

at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)

at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)

at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)

at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)

at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)

at org.apache.spark.repl.Main$.doMain(Main.scala:68)

at org.apache.spark.repl.Main$.main(Main.scala:51)

at org.apache.spark.repl.Main.main(Main.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)

at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)

at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

17/10/26 13:00:09 AUDIT command.LoadTable: [www.iteblog.com][iteblog][Thread-1]Dataload failure for temp.iteblog. Please check the logs

java.lang.NullPointerException

at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)

at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)

at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)

at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)

at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)

at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:333)

at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)

at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)

at org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)

at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)

at org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)

at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)

at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)

at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)

at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)

at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)

at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)

at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)

... 50 elided

scala>

Although the NPE is thrown, the data has in fact already been loaded into the Carbondata table. The root cause is that each insert overwrite has to delete the previous data (the Segment directory), but the Segment directory ends up being deleted twice, so the second attempt cannot find it and the NPE is raised. This problem was fixed in CARBONDATA-1486.
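
If you hit this NPE, a quick way to confirm that the load nevertheless went through (a sketch using standard Carbondata/Spark SQL statements on the table from the example above) is to check the segments and the row count afterwards:

scala> carbon.sql("SHOW SEGMENTS FOR TABLE temp.iteblog").show(false)

scala> carbon.sql("select count(*) from temp.iteblog").show()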

Columns longer than 32767 characters are not supported

If one of your columns contains values longer than 32767 characters (Short.MaxValue) and enable.unsafe.sort=true, loading data into a Carbondata table throws the following exception: java.lang.NegativeArraySizeException

at org.apache.carbondata.processing.newflow.sort.unsafe.UnsafeCarbonRowPage.getRow(UnsafeCarbonRowPage.java:182)

at org.apache.carbondata.processing.newflow.sort.unsafe.holder.UnsafeInmemoryHolder.readRow(UnsafeInmemoryHolder.java:63)

at org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startSorting(UnsafeSingleThreadFinalSortFilesMerger.java:114)

at org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger.startFinalMerge(UnsafeSingleThreadFinalSortFilesMerger.java:81)

at org.apache.carbondata.processing.newflow.sort.impl.UnsafeParallelReadMergeSorterImpl.sort(UnsafeParallelReadMergeSorterImpl.java:105)

at org.apache.carbondata.processing.newflow.steps.SortProcessorStepImpl.execute(SortProcessorStepImpl.java:62)

at org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:87)

at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:51)

at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:442)

at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.internalCompute(NewCarbonDataLoadRDD.scala:405)

at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:62)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

This is a design limitation of Carbondata, and there is currently no real fix; one possible improvement would be to implement a varchar(size)-like data type.
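
The exception above comes specifically from the unsafe sort code path (enable.unsafe.sort=true). A sketch of turning that path off before loading is below; note this only sidesteps this particular stack trace and does not lift the underlying length limitation, so very long values may still be unusable, and load performance may suffer (the property can also be set in carbon.properties):

scala> import org.apache.carbondata.core.util.CarbonProperties

scala> CarbonProperties.getInstance().addProperty("enable.unsafe.sort", "false")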

Wrong date format leads to data loss

If you load data containing date columns into a Carbondata table, the data may be silently lost: scala> carbon.sql("""CREATE TABLE temp.iteblog(dt DATE) STORED BY 'carbondata'""")

17/11/09 16:44:46 AUDIT command.CreateTable: [www.iteblog.com][iteblog][Thread-1]Creating Table with Database name [temp] and Table name [iteblog]

17/11/09 16:44:47 WARN hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `temp`.`iteblog` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

17/11/09 16:44:47 AUDIT command.CreateTable: [www.iteblog.com][iteblog][Thread-1]Table created with Database name [temp] and Table name [iteblog]

res1: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("select dt from temp.mydate").show(10,100)

17/11/09 16:44:52 ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library

+--------+

| dt|

+--------+

|20170509|

|20170511|

|20170507|

|20170504|

|20170502|

|20170506|

|20170501|

|20170508|

|20170510|

|20170505|

+--------+

only showing top 10 rows

scala> carbon.sql("insert into table temp.iteblog select dt from temp.mydate limit 10")

17/11/09 16:45:14 AUDIT rdd.CarbonDataRDDFactory$: [www.iteblog.com][iteblog][Thread-1]Data load request has been received for table temp.iteblog

17/11/09 16:45:14 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT

17/11/09 16:45:16 AUDIT rdd.CarbonDataRDDFactory$: [www.iteblog.com][iteblog][Thread-1]Data load is successful for temp.iteblog

res3: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("select * from temp.iteblog limit 10").show(10,100)

+----+

| dt|

+----+

|null|

|null|

|null|

|null|

|null|

|null|

|null|

|null|

|null|

|null|

+----+

This is because Carbondata has a default format for the DATE type, controlled by the carbon.date.format parameter, whose default value is yyyy-MM-dd. Parsing a value like 20170505 with the yyyy-MM-dd pattern naturally fails, so the data ends up lost (stored as null). Likewise, the TIMESTAMP type has a default format, controlled by carbon.timestamp.format, with a default of yyyy-MM-dd HH:mm:ss.
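
A sketch of working around this, assuming the source column really holds yyyyMMdd-formatted values as in the sample above: set carbon.date.format to match (it can also be put in carbon.properties) before running the load, then reload the data:

scala> import org.apache.carbondata.core.util.CarbonProperties

scala> CarbonProperties.getInstance().addProperty("carbon.date.format", "yyyyMMdd")

scala> carbon.sql("insert into table temp.iteblog select dt from temp.mydate limit 10")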

Summary

That covers all the problems described in this post; I hope it helps you resolve the issues you run into when using Carbondata.
