On AWS EMR, hashing the keys of HBase table A and writing the data into another table B

Published: 2025/4/16 · 豆豆

The approach: first scan the source table, then bulkload the data into the new table.
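The post does not show the hash function it applied to the rowkeys, so the following is only a minimal Java sketch of one common choice (the class name, the use of MD5, and the 4-hex-character prefix length are illustrative assumptions, not from the source): prefix each rowkey with a short hash of itself so the rewritten keys spread evenly across table B's regions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class KeyHasher {
    // Hypothetical salting scheme: prepend the first 4 hex characters of the
    // key's MD5 digest, so lexicographically close source keys land in
    // different regions of the target table.
    public static String hashKey(String rowKey) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < 2; i++) {          // 2 bytes -> 4 hex chars
            prefix.append(String.format("%02x", digest[i]));
        }
        return prefix + ":" + rowKey;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hashKey("user_12345"));
    }
}
```

In the actual job, the map phase would emit `(hashKey(row), cells)` pairs before the repartition-and-sort step that feeds HFile generation.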

Pitfalls encountered
1. Before the bulkload can generate HFiles, the data must first be repartitioned by hash(key). During the shuffle read stage, the following error occurred:

org.apache.spark.shuffle.FetchFailedException: failed to allocate 16777216 byte(s) of direct memory (used: 3623878656, max: 3635150848)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:523)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:454)
    ...
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 3623878656, max: 3635150848)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
    ...

Cause: during the shuffle read stage, Spark allocates a buffer as large as the block (or partition) being fetched; because each partition was too large, memory ran out.
Related discussion: https://issues.apache.org/jira/browse/SPARK-13510
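Since the fetch buffer scales with block size, the fix implied by that explanation is to repartition into enough pieces that no single shuffle block approaches the direct-memory ceiling. A sketch of the sizing arithmetic (the 500 GB table size and 128 MB per-partition target are illustrative numbers, not from the post):

```java
public class PartitionSizer {
    // Choose a partition count so each shuffle block stays well under the
    // direct-memory limit. Ceiling division: round up so no partition
    // exceeds the target size.
    public static int partitionCount(long totalBytes, long targetBytesPerPartition) {
        return (int) Math.max(1,
                (totalBytes + targetBytesPerPartition - 1) / targetBytesPerPartition);
    }

    public static void main(String[] args) {
        long total = 500L * 1024 * 1024 * 1024;  // e.g. 500 GB of table data
        long target = 128L * 1024 * 1024;        // e.g. 128 MB per partition
        System.out.println(partitionCount(total, target)); // 4000
    }
}
```

The resulting count would be passed to the repartition step (e.g. `repartitionAndSortWithinPartitions` over hash(key)) before writing HFiles.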

The error is raised because Netty uses off-heap memory by default. One option is to add the JVM flag "-Dio.netty.noUnsafe=true" so that off-heap memory is not used.


2. While generating HFiles for the bulkload, the application failed several times because executors were killed. Watching the cluster showed that executors were running out of local disk space while writing files.
Related question: https://stackoverflow.com/questions/29131449/why-does-hadoop-report-unhealthy-node-local-dirs-and-log-dirs-are-bad

So we added machines to the YARN cluster and ran hdfs balancer to even out the data distribution.

============= yarn-nodemanager =============
2019-02-15 10:18:45,562 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection (DiskHealthMonitor-Timer): Directory /mnt/yarn error, used space above threshold of 90.0%, removing from list of valid directories
2019-02-15 10:18:45,563 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection (DiskHealthMonitor-Timer): Directory /var/log/hadoop-yarn/containers error, used space above threshold of 90.0%, removing from list of valid directories
2019-02-15 10:18:45,563 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService (DiskHealthMonitor-Timer): Disk(s) failed: 1/1 local-dirs are bad: /mnt/yarn; 1/1 log-dirs are bad: /var/log/hadoop-yarn/containers
2019-02-15 10:18:45,563 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService (DiskHealthMonitor-Timer): Most of the disks failed. 1/1 local-dirs are bad: /mnt/yarn; 1/1 log-dirs are bad: /var/log/hadoop-yarn/containers
2019-02-15 10:18:45,789 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService (AsyncDispatcher event handler): Cache Size Before Clean: 589300919, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2019-02-15 10:18:46,668 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl (AsyncDispatcher event handler): Container container_1549968021090_0114_01_000006 transitioned from RUNNING to KILLING
2019-02-15 10:18:46,668 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl (AsyncDispatcher event handler): Container container_1549968021090_0114_01_000016 transitioned from RUNNING to KILLING

============= yarn-resourcemanager.log =============
2019-02-15 10:18:45,664 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl (AsyncDispatcher event handler): Node ip-10-6-43-89.ap-south-1.compute.internal:8041 reported UNHEALTHY with details: 1/1 local-dirs are bad: /mnt/yarn; 1/1 log-dirs are bad: /var/log/hadoop-yarn/containers
2019-02-15 10:18:45,664 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl (AsyncDispatcher event handler): ip-10-6-43-89.ap-south-1.compute.internal:8041 Node Transitioned from RUNNING to UNHEALTHY
2019-02-15 10:18:45,664 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1549968021090_0114_01_000006 Container Transitioned from RUNNING to KILLED


3. Shuffle reads rely heavily on Netty's off-heap stack. After disabling off-heap memory, the following errors appeared (even with a much larger heap):

2019-02-17T02:56:12.949+0000: [Full GC (Ergonomics) [PSYoungGen: 465920K->465917K(931840K)] [ParOldGen: 2796146K->2796069K(2796544K)] 3262066K->3261987K(3728384K), [Metaspace: 67840K->67739K(1110016K)], 5.2891526 secs] [Times: user=18.15 sys=0.01, real=5.29 secs]
#
# java.lang.OutOfMemoryError: GC overhead limit exceeded
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 8023"...

Or:

2019-02-17T02:59:43.073+0000: [Full GC (Ergonomics) [PSYoungGen: 123392K->123391K(422912K)] [ParOldGen: 2796365K->2796364K(2796544K)] 2919757K->2919756K(3219456K), [Metaspace: 67051K->67051K(1107968K)], 3.3979517 secs] [Times: user=13.45 sys=0.00, real=3.39 secs]
2019-02-17T02:59:43.073+0000: [Full GC (Ergonomics) ............
2019-02-17T02:59:43.073+0000: [Full GC (Ergonomics) ............
2019-02-17T02:59:43.073+0000: [Full GC (Ergonomics) ............
2019-02-17T02:59:43.073+0000: [Full GC (Ergonomics) ............
ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 125095 ms

  

The shuffle itself does not use much memory, but the sort before HFile generation does, and in Spark's unified memory management model that sort memory comes out of the "other" (user) portion. Our guess is that the unified memory model's accounting went wrong and squeezed the "other" portion, so we added the following parameter:

spark.memory.fraction=0.2
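To see why lowering spark.memory.fraction helps here, the arithmetic below uses a simplified version of Spark's unified memory model (execution + storage get (heap - ~300 MB reserved) * fraction; the remainder is "other"/user memory, where the pre-HFile sort buffers live). The 4096 MB heap matches the job's --executor-memory 4g; the exact 300 MB reserve is Spark's documented default, not something stated in the post.

```java
public class MemoryFraction {
    static final long RESERVED_MB = 300; // Spark's reserved system memory

    // Unified (execution + storage) pool under the simplified model.
    public static long unifiedPoolMb(long heapMb, double fraction) {
        return (long) ((heapMb - RESERVED_MB) * fraction);
    }

    // What remains for "other"/user memory, e.g. the sort before HFiles.
    public static long otherMb(long heapMb, double fraction) {
        return (heapMb - RESERVED_MB) - unifiedPoolMb(heapMb, fraction);
    }

    public static void main(String[] args) {
        long heap = 4096; // --executor-memory 4g
        // Default fraction 0.6 vs the 0.2 used in this job:
        System.out.println(otherMb(heap, 0.6)); // 1519 MB left for "other"
        System.out.println(otherMb(heap, 0.2)); // 3037 MB left for "other"
    }
}
```

Dropping the fraction from the default 0.6 to 0.2 roughly doubles the room available to the sort, at the cost of a smaller shuffle/cache pool.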


4. After the HFiles were generated, they had to be loaded into HBase, where we hit the following problem:

Sat Feb 16 21:48:36 UTC 2019, RpcRetryingCaller{globalStartTime=1550353152797, pause=100, retries=35}, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.ipc.RemoteException): org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/data/ap/users_v2/9a9d8ee1e23d335afb01aced349054d8/.tmp/70ad5fa3d4834fb6a47abee6101594ff could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3372)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)

Our Spark cluster and HBase cluster run on different YARN clusters and do not share HDFS. At first the HFiles were written to the Spark cluster's HDFS, which caused many problems. After switching to writing the HFiles into the same cluster as HBase, this step passed quickly. The exact cause remains unknown.


5. In the EMR configuration, Spark's local block-manager cache path is /mnt/yarn/usercache/hadoop/appcache/application_1549968021090_0135/blockmgr-535fd27a-4b80-4116-b855-17ab7be68f1c, which sits on the same disk as HDFS.


===============================================================================================

The final spark-submit command was:

spark-submit \
  --master yarn \
  --name UserTableFromPrimitiveToV2 \
  --queue default \
  --deploy-mode cluster \
  --driver-cores 2 \
  --driver-memory 5g \
  --num-executors 30 \
  --executor-cores 2 \
  --executor-memory 4g \
  --conf spark.driver.memoryOverhead=1g \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.blacklist.enabled=false \
  --conf spark.memory.fraction=0.2 \
  --conf spark.executor.extraJavaOptions="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseParallelGC -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p' -Dio.netty.noUnsafe=true" \
  --class com.hotstar.ap.ingest.batch.tool.migration.UserTableFromPrimitiveToV2 \
  ./batch/build/libs/batch-all.jar \
  -e dev

  

Reposted from: https://www.cnblogs.com/keepthinking/p/10386811.html
