Third-party Python library installs successfully but fails to import: why does calling a Python third-party library inside a PySpark RDD fail?

Published: 2025/4/5 · python · 豆豆

Problem description

Hi, when I run pyspark on our production cluster and call jieba for word segmentation, the import succeeds, but calling the segmentation function inside an RDD raises "no module named jieba". In my local VM this problem does not occur.

Environment background and what I have tried

I tried reinstalling jieba as root.

Relevant code


>>> import jieba

>>> [x for x in jieba.cut('这是一段测试文本')]

Building prefix dict from the default dictionary ...

Loading model from cache /tmp/jieba.cache

Loading model cost 0.448 seconds.

Prefix dict has been built succesfully.

[u'\u8fd9\u662f', u'\u4e00\u6bb5', u'\u6d4b\u8bd5', u'\u6587\u672c']

# Above: a plain call to jieba segments the text successfully.

>>> cut = name.map(lambda x: [y for y in jieba.cut(x)])

>>> cut.count()

# The code above runs without error in my local VM, but fails on the production gateway host.
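A common workaround for this symptom is to defer the import into the mapper function, so the import executes on the executor when the task runs, rather than being captured by reference when the closure is pickled on the driver. This only helps if jieba is actually importable by the executors' Python interpreter. A minimal sketch (assumptions: `name` is the RDD from the question; the stdlib `re` module stands in for jieba here so the snippet is runnable without a cluster):

```python
def cut_line(line):
    # On the cluster this would be `import jieba` followed by
    # `return list(jieba.cut(line))`. Importing *inside* the mapper
    # means the import runs on the executor at task time, not when
    # the closure is serialized on the driver. The stdlib `re`
    # module stands in for jieba so this runs anywhere.
    import re
    return re.findall(r"\w+", line)

# Local stand-in for `name.map(cut_line)` from the question:
tokens = [cut_line(s) for s in ["spark executor test", "jieba import error"]]
print(tokens)
```

On a real cluster the call would be `cut = name.map(cut_line)` followed by `cut.count()`; if the executors run a different interpreter than the driver (e.g. system Python vs. a virtualenv), jieba must be installed there or shipped with the job.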

What result did you expect? What error did you actually see?

18/07/13 10:16:17 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 1.0 (TID 16, hadoop13, executor 17): org.apache.spark.api.python.PythonException: Traceback (most recent call last):

File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/worker.py", line 98, in main

command = pickleSer._read_with_length(infile)

File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/serializers.py", line 164, in _read_with_length

return self.loads(obj)

File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/serializers.py", line 422, in loads

return pickle.loads(obj)

File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/pyspark/cloudpickle.py", line 664, in subimport

__import__(name)

ImportError: ('No module named jieba', &lt;function subimport at 0x...&gt;, ('jieba',))

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)

at org.apache.spark.api.python.PythonRunner$$anon$1.&lt;init&gt;(PythonRDD.scala:207)

at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)

at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

at org.apache.spark.scheduler.Task.run(Task.scala:89)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)
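The traceback shows what goes wrong: cloudpickle serialized the mapper's dependency on jieba by module name, and the executor's `__import__('jieba')` failed, meaning the module exists for the driver's interpreter but not for the executors'. Two common fixes, sketched below (file names and paths are illustrative, not from the question):

```shell
# Option 1: install jieba for the interpreter the executors actually use,
# on every worker node (it being present on the driver is not enough):
#   pip install jieba
#
# Option 2: ship the package with the job instead of installing it
# cluster-wide. Zip the installed jieba package directory:
zip -r jieba.zip jieba/
# Then submit with the archive added to every executor's Python path:
spark-submit --py-files jieba.zip your_job.py
```

Inside an already-running session, `sc.addPyFile("jieba.zip")` has the same effect as `--py-files`. It is also worth checking that `PYSPARK_PYTHON` points at the same interpreter on the driver and the executors, since a mismatch produces exactly this "imports on the driver, ImportError on the executor" symptom.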
