Python third-party library installs successfully but import goes wrong: why does calling a Python third-party library inside a pyspark RDD fail?
Problem description
Hi, when running pyspark on our company's production cluster, I call jieba for word segmentation. `import jieba` succeeds, but calling the segmentation function inside an RDD operation raises "No module named jieba". The same code has no such problem in my local virtual machine.
Environment and what I have tried
I tried switching to installing jieba as root, but the error persists.
Relevant code
>>>import jieba
>>>[x for x in jieba.cut('這是一段測試文本')]
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.448 seconds.
Prefix dict has been built succesfully.
[u'\u8fd9\u662f', u'\u4e00\u6bb5', u'\u6d4b\u8bd5', u'\u6587\u672c']
// The above shows that calling jieba directly segments the text successfully.
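(The escaped tokens in the output above are just Python 2's repr of unicode strings; the segmentation itself is correct. A small Python 3 sketch decoding them:)

```python
# The \uXXXX escape sequences from the pyspark shell output are these Chinese tokens:
words = ['\u8fd9\u662f', '\u4e00\u6bb5', '\u6d4b\u8bd5', '\u6587\u672c']
print(words)  # ['这是', '一段', '测试', '文本']
```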
>>> cut = name.map(lambda x: [y for y in jieba.cut(x)])
>>> cut.count()
// The code above runs without errors in the local virtual machine, but raises an error on the production bastion host.
What result did you expect? What is the actual error you saw?
18/07/13 10:16:17 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 1.0 (TID 16, hadoop13, executor 17): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/worker.py", line 98, in main
command = pickleSer._read_with_length(infile)
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/serializers.py", line 164, in _read_with_length
return self.loads(obj)
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/serializers.py", line 422, in loads
return pickle.loads(obj)
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/cloudpickle.py", line 664, in subimport
__import__(name)
ImportError: ('No module named jieba', <function subimport at ...>, ('jieba',))
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
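The traceback shows what is going on: Spark pickles the `lambda` with cloudpickle on the driver and deserializes it on each executor, and deserialization re-runs `__import__('jieba')` in the *executor's* Python. So a successful `import jieba` in the driver shell proves nothing about the worker nodes. The same failure mode can be sketched with only the standard library, by hand-building a pickle that references a module not installed in the current interpreter (`c<module>\n<name>\n` is the GLOBAL opcode, `.` is STOP):

```python
import pickle

# Unpickling re-imports the modules an object depends on. This payload
# references a module that (by assumption) does not exist here, so loading
# it fails with ImportError, just like the executor's worker.py did.
payload = b"cnonexistent_module_for_demo\nsome_function\n."
try:
    pickle.loads(payload)
except ImportError as e:
    print("import failed during unpickling:", e)
```

This is why the stack ends in `__import__(name)` inside cloudpickle's `subimport`: the executors' Python simply cannot find jieba. The usual remedy is to make jieba available to every executor, not just to the driver machine, e.g. install it into the executors' Python on all worker nodes, or ship it with the job via `spark-submit --py-files jieba.zip` / `sc.addPyFile(...)`.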