Big Data Error Troubleshooting Notes
20220308
rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@2f312c56.
Cause: the number of operations was too large, possibly over 100 million.
Failed to connect to master k8s04:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
Cause: the Hadoop and Spark services were not started.
20220214
SQL error [1064] [42000]: there is no scanNode Backend
SQL error: Error executing query
Cause: the server is unreachable; either the bastion host is down or the big data components did not start successfully.
If a component's web UI will not open, the most likely cause is that the firewall was not turned off.
https://jingyan.baidu.com/article/ff42efa9fd8c1cc19e2202bb.html
18/11/20 16:44:44 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message. org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
Adding this to the resource configuration may solve the problem: --conf spark.dynamicAllocation.enabled=false (see the sketch below).
https://www.codercto.com/a/39980.html
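A minimal sketch of setting the same flag when building the session from Python; the app name is a placeholder:

from pyspark.sql import SparkSession

# Disable dynamic allocation so executors are not released mid-job;
# losing executors is one common trigger for "Could not find CoarseGrainedScheduler".
spark = (SparkSession.builder
         .appName("demo")  # hypothetical app name
         .config("spark.dynamicAllocation.enabled", "false")
         .getOrCreate())

With spark-submit, the equivalent is the --conf flag quoted above.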
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1091099277-192.168.100.251-
https://blog.csdn.net/weixin_44519124/article/details/107211857
Hive startup error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeExcept
https://blog.csdn.net/qq_34885598/article/details/84935218
20211229
https://blog.csdn.net/u010374412/article/details/103148374
Spark SparkSession.Builder source-code notes:
spark.master can be local, local[*], or local[N] for a positive integer N.
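A quick sketch of the three forms; the app name is arbitrary:

from pyspark.sql import SparkSession

# local     -> a single worker thread
# local[*]  -> one thread per logical CPU core
# local[4]  -> exactly four threads (any positive integer works)
spark = SparkSession.builder.master("local[*]").appName("master-demo").getOrCreate()
print(spark.sparkContext.master)  # prints the effective master URL
spark.stop()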
20211220
Doris does not support UPDATE yet. If the row is identified by its key, you can INSERT new data for that key again, or DELETE first and then INSERT (see the sketch below).
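A minimal sketch of the delete-then-insert workaround over Doris's MySQL protocol, assuming pymysql is installed; the database, table t, and its columns are hypothetical:

import pymysql

# Doris speaks the MySQL wire protocol, so a plain MySQL client works.
conn = pymysql.connect(host='192.168.1.73', port=9030,  # 9030 is the default Doris FE MySQL port
                       user='root', password='', database='demo_db')  # hypothetical database
with conn.cursor() as cur:
    cur.execute("DELETE FROM t WHERE id = %s", (42,))                     # drop the old row
    cur.execute("INSERT INTO t (id, name) VALUES (%s, %s)", (42, 'new'))  # write the replacement
conn.commit()
conn.close()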
20211116
SQL error [1]: Query failed (#20211116_082325_00132_qzuea): line 4:5: backquoted identifiers are not supported; use double quotes to quote identifiers
Fix: remove the backquotes around `level` (see the quoting sketch below).
SQL error [84148226]: Query failed (#20211116_083721_00144_qzuea): Exceeded limit of 100 open writers for partitions
Happens when writing data; caused by reading from and writing to different data sources. Change to a single data source, and put the dt partition column last.
SQL error [47]: Query failed (#20211116_090238_00166_qzuea): line 26:3: Column 'primary_key_id' cannot be resolved
Cause: the column name was written incorrectly.
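For the first error the fix is purely the quoting style; a sketch with a hypothetical table t:

# Presto/Trino only accepts ANSI double quotes around identifiers:
bad_sql = "SELECT `level` FROM t"   # rejected: backquoted identifiers are not supported
good_sql = 'SELECT "level" FROM t'  # accepted: double-quoted identifier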
20210831
Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
https://blog.csdn.net/weixin_45171416/article/details/107525222
Notes on tracking down a Spark shuffle that kept failing with "connection reset by peer"
https://blog.csdn.net/zhuge134/article/details/86556319
"spark" + "java.lang.StackOverflowError"
https://blog.csdn.net/u010936936/article/details/88363449
java.io.IOException: insufficient disk space. Spark: java.io.IOException: No space left on device
https://blog.csdn.net/kwu_ganymede/article/details/49094881
https://blog.csdn.net/weixin_35399893/article/details/118842634
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
A Spark connection problem, probably related to the Spark UI.
20210826
Connecting to Hive from Python via PyHive:

import pandas as pd
from sqlalchemy.engine import create_engine
from pyhive import hive

# Connect to HiveServer2 and iterate over the query results.
conn = hive.connect(host='192.168.1.73', port=10000, database='ztdata_hive', username='root')
cursor = conn.cursor()
sql = "select * from tb_customer_sec_type"
cursor.execute(sql)
for i in cursor:
    print(i)

PyHive error: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4)
https://blog.csdn.net/ssgmoshou/article/details/107767680
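One common workaround for the SASL error (an assumption based on the linked fix, not verified against this cluster) is to run HiveServer2 without SASL and connect accordingly:

from pyhive import hive

# Requires hive.server2.authentication=NOSASL in hive-site.xml on the server side.
conn = hive.connect(host='192.168.1.73', port=10000,
                    database='ztdata_hive', username='root',
                    auth='NOSASL')  # skip the SASL handshake that was failing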
import os
os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_301'  # must be set before touching pyspark
import findspark
findspark.init()
from pyspark.sql import SparkSession

This is the required pyspark import order: set JAVA_HOME and call findspark.init() before importing anything from pyspark (a completion sketch follows below).
Note: pyspark does not support path names containing Chinese characters.
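Continuing the snippet above: once findspark.init() has run, the session can be built as usual; the app name is a placeholder:

spark = (SparkSession.builder
         .master('local[*]')
         .appName('import-order-check')  # hypothetical name
         .getOrCreate())
print(spark.version)  # if this prints, JAVA_HOME and the import order are correct
spark.stop()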
20210823
Possible causes of errors when setting up a pyspark environment:
1. The MySQL driver used must match the server's MySQL version.
2. Python must be the 64-bit build.
3. The driver path in the conf file under Spark's conf directory must be updated.
4. Java 1.8 creates two jre directories.
5. Do not mix up the various environment-variable paths.
6. pyspark must also be installed inside PyCharm.
7. Garbled Chinese output in cmd usually means the Spark path is misconfigured.
pip install locust fails: ERROR: Could not build wheels for gevent which use PEP 517 and cannot be installed
https://blog.csdn.net/ly021499/article/details/103288570
Fix: the version being installed probably needs to be changed.
20210820
[Spark] DataFrame JSON read exception: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed…
When reading from raw files, the schema must be supplied explicitly (sketch below).
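A sketch of supplying the schema up front; the field names and file path are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.master('local[*]').appName('json-demo').getOrCreate()

# Declaring the schema avoids the raw-file inference that Spark 2.3+ rejects.
schema = StructType([
    StructField('id',   LongType(),   True),
    StructField('name', StringType(), True),
])
df = spark.read.schema(schema).json(r'D:\data\demo.json')  # hypothetical path
df.show()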
pyspark program: CreateProcess error=5, access denied.
Cause: there is no such environment variable as PYSPARK_HOME.
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils… does not exist in the JVM
https://www.javatt.com/p/46998
py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
It may also be a database-driver problem.
XML configuration format: dfs.namenode.rpc-address = 172.16.1.102:8080
If the database driver cannot be found, copy the connector jar into the lib directory (see the sketch below).
Do not change the database-driver package path in hive-site.xml.
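A sketch of pointing the session at the connector jar instead of (or in addition to) copying it; the jar path, URL, table, and credentials are placeholders:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.jars", r"D:\jars\mysql-connector-java-8.0.26.jar")  # hypothetical jar path
         .getOrCreate())

df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("dbtable", "t")                        # hypothetical table
      .option("user", "root").option("password", "")
      .option("driver", "com.mysql.cj.jdbc.Driver")  # naming the class avoids "driver not found"
      .load())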
File "D:\dashuju\spark-3.0.0-bin-hadoop2.7\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o43.jdbc.
MySQL error: The server time zone value is unrecognized or represents more than one time zone
https://blog.csdn.net/zjccsg/article/details/69254134
https://blog.csdn.net/hy_coming/article/details/104128024 (the key is editing my.ini)
https://blog.csdn.net/sxeric/article/details/113832302
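Besides editing my.ini, the quick client-side fix is to pin the time zone in the JDBC URL; a sketch where the host, database, table, and credentials are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# serverTimezone tells Connector/J which zone to assume, so the handshake
# no longer fails on an unrecognized server value.
url = "jdbc:mysql://localhost:3306/test?serverTimezone=UTC&useSSL=false"
df = (spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", "t")
      .option("user", "root").option("password", "")
      .load())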
py4j.protocol.Py4JJavaError: An error occurred while calling o43.jdbc.
: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
pyspark: the MySQL service was not started.
Error: JAVA_HOME is incorrectly set. Please update $HADOOP_HOME\etc\hadoop\hadoop-env.cmd
https://blog.csdn.net/weixin_39971186/article/details/88842359
This happens because the JDK install path contains a space; it is best not to install under the Program Files folder.
Windows 10 Java and Hadoop environment variables:
https://www.cnblogs.com/lijins/p/10091485.html
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.ClassNotFoundException: main.s
"python not found" is also a version problem: the Spark, Hadoop, and pyspark versions do not match.
Download the Spark and Hadoop versions that the official pyspark documentation pairs together.
Spark Python error "FileNotFoundError: [WinError 2] The system cannot find the file specified"
Fix: restart PyCharm.
pyspark fix for "py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled"
https://blog.csdn.net/Together_CZ/article/details/90402660
ERROR: Could not import pypandoc - required to package PySpark
This appears while installing pyspark; just pip install pypandoc.
Exception: Java gateway process exited before sending the driver its port number (solved; source attached)
It can also be a Java-version problem; after switching, e.g.
os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_301'
the error disappears.
https://blog.csdn.net/a2099948768/article/details/79580634
Could not reserve enough space for 2097152KB object heap (pyspark)
Fix: switch to 64-bit Java.
pyspark error: java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST (solution)
https://blog.csdn.net/a200822146085/article/details/89467002
pyspark exception handling: java.lang.OutOfMemoryError: Java heap space
https://blog.csdn.net/a5685263/article/details/102265838
20210819
Spark [error]: System memory 259522560 must be at least 471859200
https://blog.csdn.net/qq_30505673/article/details/84992068
https://www.cnblogs.com/drl-blogs/p/11086826.html
https://blog.csdn.net/aubekpan/article/details/85329768
IDEA Maven Scala project build: "Cannot connect to compile server" fix
java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200
https://blog.csdn.net/weixin_43322685/article/details/82961748
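Per the linked posts, the usual fix is to give the driver JVM more than the ~450 MB floor; a sketch (some write-ups raise spark.testing.memory instead):

from pyspark.sql import SparkSession

# 471859200 bytes is the minimum Spark checks; 1g clears it comfortably.
spark = (SparkSession.builder
         .master('local[*]')
         .config('spark.driver.memory', '1g')
         .getOrCreate())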
scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps
https://blog.csdn.net/weixin_42129080/article/details/80961878
Import error in the current directory: press Alt+Enter and choose the first suggestion.
IDEA + Spark error: object apache is not a member of package org
https://blog.csdn.net/xl132598798/article/details/105695593
Importing the jars from the Spark installation directory fixes it.
After installing the Scala plugin, IDEA's "Add Frameworks Support" cannot find the Scala plugin
https://blog.csdn.net/weixin_43520450/article/details/108677784
https://blog.csdn.net/tanhaodi2012/article/details/100182735
Roundup of fixes when IDEA has no option to create a Scala class
https://www.cnblogs.com/libaoquan/p/9004531.html
Configure the Spark dependencies in the pom file.
Summary