
Welcome to 生活随笔!


python词性标注 — POS tagging for text classification

Published: 2025/3/21 · python · 15 · 豆豆
生活随笔 collected and organized this article, "python词性标注 — POS tagging for text classification". We found it worth sharing and hope it serves as a useful reference.

I am new to Python and working on a text-classification problem. I put this code together from several online resources, but it does not do POS tagging as expected. Can anyone point out the line where I am actually going wrong? I do POS tagging in the code, but the tags never appear in the results. I also tried POS tagging with nltk directly, and that did not work for me either. Any help would be appreciated. Thanks.

# Imports (added here; the original snippet assumed them)
import pandas as pd
from collections import defaultdict
from nltk import pos_tag, word_tokenize
from nltk.corpus import stopwords, wordnet as wn
from nltk.stem import WordNetLemmatizer

# Add the Data using pandas
Corpus = pd.read_csv(r"U:\FAHAD UL HASSAN\Python Code\projectdatacor.csv", encoding='latin-1')

# Data pre-processing - this helps the classification algorithms later on

# Remove blank rows, if any.
# (dropna on the bare column does not remove the rows from the frame;
# drop on the frame itself, then reset the index so enumerate() below
# lines up with .loc)
Corpus.dropna(subset=['description'], inplace=True)
Corpus.reset_index(drop=True, inplace=True)

# Lower-case all the text: Python treats 'design' and 'DESIGN' as different tokens
Corpus['description'] = [entry.lower() for entry in Corpus['description']]

# Punctuation removal (regex=True is required in recent pandas versions)
Corpus['description'] = Corpus['description'].str.replace(r'[^\w\s]', '', regex=True)

# Tokenization: each entry in the corpus is broken into a list of words
Corpus['description'] = [word_tokenize(entry) for entry in Corpus['description']]

# Remove stop words and non-alphabetic tokens, then lemmatize.
# WordNetLemmatizer requires a POS tag to know whether the word is a noun,
# verb, adjective, etc.; by default it assumes noun.
STOPWORDS = set(stopwords.words('english'))
tag_map = defaultdict(lambda: wn.NOUN)
tag_map['J'] = wn.ADJ
tag_map['V'] = wn.VERB
tag_map['R'] = wn.ADV

# Initializing WordNetLemmatizer() once, outside the loop
word_Lemmatized = WordNetLemmatizer()

for index, entry in enumerate(Corpus['description']):
    # Empty list to store the words that pass the filters for this entry
    Final_words = []
    # pos_tag returns (word, tag) pairs, e.g. ('design', 'NN')
    for word, tag in pos_tag(entry):
        # Skip stop words and keep alphabetic tokens only
        if word not in STOPWORDS and word.isalpha():
            word_Final = word_Lemmatized.lemmatize(word, tag_map[tag[0]])
            Final_words.append(word_Final)
    # Store the final processed set of words for each row in 'description_final'
    Corpus.loc[index, 'description_final'] = str(Final_words)

print(Corpus['description_final'].head())
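The tag_map trick above can be checked in isolation: the defaultdict falls back to the noun constant for any Penn Treebank tag whose first letter is not J, V, or R. A minimal sketch, using the plain string values 'n', 'a', 'v', 'r' (which is what wn.NOUN, wn.ADJ, wn.VERB, wn.ADV evaluate to), so it runs without the NLTK data files:

```python
from collections import defaultdict

# WordNet POS codes; these are the string values behind wn.NOUN etc.
NOUN, ADJ, VERB, ADV = 'n', 'a', 'v', 'r'

tag_map = defaultdict(lambda: NOUN)
tag_map['J'] = ADJ
tag_map['V'] = VERB
tag_map['R'] = ADV

# Penn Treebank tags are reduced to their first letter before lookup;
# anything unrecognized (DT, IN, ...) falls back to noun
for penn in ['NN', 'JJ', 'VBZ', 'RB', 'DT']:
    print(penn, '->', tag_map[penn[0]])
```

The first-letter reduction works because Penn tags group by initial: JJ/JJR/JJS are adjectives, VB/VBD/VBZ verbs, RB/RBR adverbs.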

These are the results I get. The code does a lot of things — tokenization, stop-word removal — but it does not show the POS tags in my results.
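One likely explanation: the loop uses each tag only to pick the lemmatizer's POS argument and then appends the bare lemma, so the tag itself is discarded before anything is stored. A sketch of keeping the tags alongside the words instead — the helper name keep_tagged is hypothetical, and a stub list stands in for the (word, tag) pairs that nltk.pos_tag would return:

```python
from collections import defaultdict

tag_map = defaultdict(lambda: 'n')      # 'n' is the value of wn.NOUN
tag_map.update({'J': 'a', 'V': 'v', 'R': 'r'})

def keep_tagged(tagged_entry, stop_words):
    # Keep the Penn tag and the derived WordNet POS alongside each word,
    # instead of throwing the tag away after lemmatization
    return [(word, tag, tag_map[tag[0]])
            for word, tag in tagged_entry
            if word not in stop_words and word.isalpha()]

# Stub input in the shape nltk.pos_tag returns for a tokenized entry
sample = [('the', 'DT'), ('design', 'NN'), ('is', 'VBZ'),
          ('very', 'RB'), ('good', 'JJ')]
print(keep_tagged(sample, {'the', 'is'}))
# -> [('design', 'NN', 'n'), ('very', 'RB', 'r'), ('good', 'JJ', 'a')]
```

In the original loop you would store these triples (or a separate tags column) in the DataFrame rather than only the lemmas.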

