Impala Queries - HDFS Cached Data
HDFS data-caching commands (see the command sketch after this list):
- View cache pool information
- View information about already-cached data
- Uncache a table's data in Impala
- Create a cache pool
- Show table stats
- Cache a whole table
- Cache a specified partition of a table
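A rough sketch of the commands behind each item above, using the pool name (`article_pool`) and placeholder table name (`tableName`) that appear later in this article; adapt the names and sizes to your own environment:

```bash
# Create a cache pool (-limit is in bytes; omit it for an unlimited pool)
hdfs cacheadmin -addPool article_pool -owner impala -limit 40000000000

# View cache pool information
hdfs cacheadmin -listPools -stats

# View information about cached data (cache directives)
hdfs cacheadmin -listDirectives -stats
```

```sql
-- Show table stats (includes the Size and Bytes Cached columns)
show table stats tableName;

-- Cache a whole table
alter table tableName set cached in 'article_pool';

-- Cache a single partition
alter table tableName partition (pt_created_date='201811') set cached in 'article_pool';

-- Uncache a table's data
alter table tableName set uncached;
```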
Performance test results:

| Cached | Rows | Data size | Time (concurrency 1) | Time (concurrency 3) | Time (concurrency 5) |
| --- | --- | --- | --- | --- | --- |
| Yes | - | 116.5G | 8s | 20s | 29s |
| No | - | 116.5G | 68.17s | 136s | 240s |
| Yes | - | 72.7G | 8.38s | 20.6s | 30.3s |
| No | - | 72.7G | 80s | 165s | 235s |
Troubleshooting

1. The HDFS cache pool runs out of space
```
[fwqzx002.zh:21000] ods_crawler> alter table xxx partition (pt_created_date='201811') set cached in 'article_pool';
Query: alter table xxx partition (pt_created_date='201811') set cached in 'article_pool'
ERROR: ImpalaRuntimeException: Caching path /user/hive/warehouse/xxxx/pt_created_date=201811 of size 24274436556 bytes at replication 1 would exceed pool article_pool's remaining capacity of 20450868109 bytes.
    at org.apache.hadoop.hdfs.server.namenode.CacheManager.checkLimit(CacheManager.java:405)
    at org.apache.hadoop.hdfs.server.namenode.CacheManager.addDirective(CacheManager.java:531)
    at org.apache.hadoop.hdfs.server.namenode.FSNDNCacheOp.addCacheDirective(FSNDNCacheOp.java:45)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheDirective(FSNamesystem.java:6782)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addCacheDirective(NameNodeRpcServer.java:1883)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addCacheDirective(ClientNamenodeProtocolServerSideTranslatorPB.java:1265)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
CAUSED BY: InvalidRequestException: Caching path /user/hive/warehouse/xxxx/pt_created_date=201811 of size 24274436556 bytes at replication 1 would exceed pool article_pool's remaining capacity of 20450868109 bytes.
    (stack trace identical to the one above)
CAUSED BY: RemoteException: Caching path /user/hive/warehouse/xxxx/pt_created_date=201811 of size 24274436556 bytes at replication 1 would exceed pool article_pool's remaining capacity of 20450868109 bytes.
    (stack trace identical to the one above)
```

The cause: the pool was created with `-limit 40000000000`, i.e. 40,000,000,000 bytes ≈ 37.25GB, while the data to cache was 18.21GB (the partition already cached successfully) + 22.61GB (the partition that failed) ≈ 40.82GB, which is greater than the pool limit. As the error shows, the pool's remaining capacity of 20,450,868,109 bytes was smaller than the 24,274,436,556 bytes the 201811 partition needed.
The command originally used to create the pool:

```bash
hdfs cacheadmin -addPool article_pool -owner impala -limit 40000000000
```

Solution: change the pool limit to a value large enough for your data (a pool created without `-limit` is unlimited by default).
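A minimal sketch of enlarging the pool in place with `hdfs cacheadmin -modifyPool` (the 60000000000-byte value here is only an assumed example size):

```bash
# Raise the pool limit (value in bytes; 60000000000 ≈ 55.9GB is an assumed example)
hdfs cacheadmin -modifyPool article_pool -limit 60000000000

# Alternatively, recreate the pool without -limit, leaving it unlimited
hdfs cacheadmin -removePool article_pool
hdfs cacheadmin -addPool article_pool -owner impala
```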
2. HDFS does not cache all of a partition's data

The Size and Bytes Cached values clearly do not match:
```
[fwqzx002.zh:21000] ods_crawler> show table stats tableName;
Query: show table stats tableName
+-----------------+-------+--------+---------+--------------+-------------------+---------+-------------------+--------------------------------------------------------------------------+
| pt_created_date | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                 |
+-----------------+-------+--------+---------+--------------+-------------------+---------+-------------------+--------------------------------------------------------------------------+
| 201810          | -1    | 75     | 18.21GB | 5.36GB       | 1                 | PARQUET | false             | hdfs://nameservice1/user/hive/warehouse/tableName/pt_created_date=201810 |
| 201811          | -1    | 94     | 22.61GB | 55.35MB      | 1                 | PARQUET | false             | hdfs://nameservice1/user/hive/warehouse/tableName/pt_created_date=201811 |
| 201812          | -1    | 141    | 33.70GB | 33.22GB      | 1                 | PARQUET | false             | hdfs://nameservice1/user/hive/warehouse/tableName/pt_created_date=201812 |
| Total           | -1    | 310    | 74.51GB | 38.63GB      |                   |         |                   |                                                                          |
+-----------------+-------+--------+---------+--------------+-------------------+---------+-------------------+--------------------------------------------------------------------------+
```

Checking HDFS confirms that the required cache size (BYTES_NEEDED) and the actually cached size (BYTES_CACHED) also differ:
```
[hdfs@fwqzx002 root]$ hdfs cacheadmin -listDirectives -stats
Found 4 entries
ID  POOL           REPL  EXPIRY  PATH                                                    BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
20  article_pool3  1     never   /user/hive/warehouse/tableName/pt_created_date=201812  36183104282   35666895768   141           139
21  article_pool3  1     never   /user/hive/warehouse/tableName                          0             0             0             0
22  article_pool3  1     never   /user/hive/warehouse/tableName/pt_created_date=201810  19549131891   5751122434    75            22
23  article_pool3  1     never   /user/hive/warehouse/tableName/pt_created_date=201811  24274436556   58042919      94            1
```

The operation itself raised no exception, so I suspected the HDFS cache had hit some ceiling and went to check the HDFS configuration.
The HDFS parameter `dfs.datanode.max.locked.memory` turned out to be 4G. With 10 DataNode nodes in the cluster, total cache capacity is at most 40G, and the ~38G actually cached above is essentially at that ceiling (data is not spread perfectly evenly across nodes, so any node holding more than 4G of the data hits its local cache limit first).
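One quick way to confirm the effective value (a sketch; on a managed cluster you can also check it in the cluster manager's HDFS configuration page):

```bash
# Print the effective DataNode locked-memory limit, in bytes
hdfs getconf -confKey dfs.datanode.max.locked.memory
```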
After raising `dfs.datanode.max.locked.memory` to 50G (tune this to your own servers), all of the data cached successfully.
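A sketch of the corresponding hdfs-site.xml change (the 50G value follows the article; apply it on every DataNode and restart them). HDFS also requires this value to be no larger than the DataNode user's memlock ulimit (`ulimit -l`):

```xml
<!-- hdfs-site.xml on each DataNode -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <!-- 50GB expressed in bytes; must not exceed the memlock ulimit -->
  <value>53687091200</value>
</property>
```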
- Note: the difference between `show table stats bsl_zhongda_weibo_article_hive` and `hdfs cacheadmin -listDirectives -stats` is that the former displays -1 when Bytes Cached is not yet complete, while the latter reports the space that has been allocated (BYTES_NEEDED), i.e. the latter does not mean the data has actually finished caching.