
[Kaggle Micro-Course] Natural Language Processing - 3. Word Vectors

Published: 2024/7/5

Contents

    • 1. Word Embeddings
    • 2. Classification Models
    • 3. Document Similarity
    • Exercise:
      • 1. Training a Model on Document Vectors
      • 2. Text Similarity

Learned from https://www.kaggle.com/learn/natural-language-processing

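1. Word Embeddings

Reference post: 05. Sequence Models W2. Natural Language Processing and Word Embeddings https://michael.blog.csdn.net/article/details/108886394

Similar words have similar vector representations, and vectors can be subtracted from one another to form analogies.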
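The analogy idea can be sketched with a few hand-made vectors. This is only an illustration: the 3-dimensional `toy_vectors` and the `analogy` helper below are hypothetical and chosen so the arithmetic works out cleanly; real vectors from `en_core_web_lg` have 300 dimensions and give approximate rather than exact matches.

```python
import numpy as np

# Hypothetical 3-d "embeddings", invented for illustration only.
toy_vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.2, 0.8]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    return a.dot(b) / np.sqrt(a.dot(a) * b.dot(b))

def analogy(a, b, c, vectors):
    """Return the word closest to vectors[a] - vectors[b] + vectors[c],
    excluding the three input words themselves."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# "king" - "man" + "woman" lands on "queen" for these toy vectors.
result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

Excluding the input words from the candidates is a common convention, since the nearest vector to the target is otherwise often one of the inputs.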

  • Load the model
import numpy as np
import spacy

# Need to load the large model to get the vectors
nlp = spacy.load('en_core_web_lg')
  • Extract the word vectors
text = "These vectors can be used as features for machine learning models."
with nlp.disable_pipes():
    vectors = np.array([token.vector for token in nlp(text)])
vectors.shape  # (12, 300): 12 tokens (11 words plus the final period), each a 300-dimensional vector
  • Combine the word vectors into a document vector; the simplest approach is to average the vectors of all the words
import pandas as pd

# Loading the spam data
# ham is the label for non-spam messages
spam = pd.read_csv('../input/nlp-course/spam.csv')

with nlp.disable_pipes():
    doc_vectors = np.array([nlp(text).vector for text in spam.text])
doc_vectors.shape  # (5572, 300)
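The averaging step can be sketched without loading any model. The word vectors and the `document_vector` helper below are hypothetical stand-ins; by default spaCy's `doc.vector` is computed the same way, as the average of the token vectors.

```python
import numpy as np

# Hypothetical 4-d word vectors, invented for illustration only.
toy_vectors = {
    "free": np.array([1.0, 0.0, 2.0, 0.0]),
    "tea":  np.array([0.0, 2.0, 0.0, 4.0]),
    "now":  np.array([2.0, 1.0, 1.0, 2.0]),
}

def document_vector(tokens, vectors):
    """Average the vectors of the given tokens into one document vector."""
    return np.mean([vectors[t] for t in tokens], axis=0)

doc_vec = document_vector(["free", "tea", "now"], toy_vectors)
print(doc_vec)  # [1. 1. 1. 2.]
```

Averaging discards word order, so "tea free now" and "now free tea" get the same document vector. That is usually acceptable for a simple baseline classifier.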

2. Classification Models

Once you have document vectors, you can train models on them with sklearn, XGBoost, and so on.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(doc_vectors, spam.label, test_size=0.1, random_state=1)
  • An SVM example
from sklearn.svm import LinearSVC

# Set dual=False to speed up training, and it's not needed
svc = LinearSVC(random_state=1, dual=False, max_iter=10000)
svc.fit(X_train, y_train)
print(f"Accuracy: {svc.score(X_test, y_test) * 100:.3f}%")

Output:

Accuracy: 97.312%

3. Document Similarity

Cosine similarity: $\cos \theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}$

def cosine_similarity(a, b):
    return a.dot(b) / np.sqrt(a.dot(a) * b.dot(b))

a = nlp("REPLY NOW FOR FREE TEA").vector
b = nlp("According to legend, Emperor Shen Nung discovered tea when leaves from a wild tree blew into his pot of boiling water.").vector
cosine_similarity(a, b)

Output:

0.7030031
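To get a feel for the range of the measure, here is a small check on hand-picked vectors (the example vectors are assumptions chosen for illustration): vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: a.b / (|a| |b|)."""
    return a.dot(b) / np.sqrt(a.dot(a) * b.dot(b))

a = np.array([1.0, 2.0, 3.0])
same_direction = 10 * a                    # scaled copy of a
orthogonal = np.array([3.0, 0.0, -1.0])    # a.dot(orthogonal) == 0
opposite = -a

sim_same = cosine_similarity(a, same_direction)
sim_orth = cosine_similarity(a, orthogonal)
sim_opp = cosine_similarity(a, opposite)
print(sim_same, sim_orth, sim_opp)
```

Because the measure ignores vector length, a short message and a long document can still be compared on direction alone.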

Exercise:

Try out the sentiment analysis model you built for the restaurant, then find the review in the dataset that is most similar to a given example text.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import spacy

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.nlp.ex3 import *
print("\nSetup complete")
  • Load the model and the data
# Load the large model to get the vectors
nlp = spacy.load('en_core_web_lg')

review_data = pd.read_csv('../input/nlp-course/yelp_ratings.csv')
review_data.head()

reviews = review_data[:100]
# We just want the vectors so we can turn off other models in the pipeline
with nlp.disable_pipes():
    vectors = np.array([nlp(review.text).vector for idx, review in reviews.iterrows()])
vectors.shape  # (100, 300): vector representations of 100 reviews
  • To save time, load the precomputed document vectors for all the reviews
# Loading all document vectors from file
vectors = np.load('../input/nlp-course/review_vectors.npy')

1. Training a Model on Document Vectors

  • SVM
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(vectors, review_data.sentiment, test_size=0.1, random_state=1)

# Create the LinearSVC model
model = LinearSVC(random_state=1, dual=False)
# Fit the model
model.fit(X_train, y_train)

# run to see model accuracy
print(f'Model test accuracy: {model.score(X_test, y_test)*100:.3f}%')

Output:

Model test accuracy: 93.847%
  • KNN
# Scratch space in case you want to experiment with other models
from sklearn.neighbors import KNeighborsClassifier

second_model = KNeighborsClassifier(5)
second_model.fit(X_train, y_train)
print(f'Model test accuracy: {second_model.score(X_test, y_test)*100:.3f}%')

Output:

Model test accuracy: 86.998%

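2. Text Similarity

  • Centering the Vectors

Sometimes when computing similarity, people calculate the mean vector over all documents and subtract it from each document's vector. Why do you think this helps the similarity measure?

Sometimes your documents are already fairly similar. For example, this dataset consists entirely of reviews of businesses, so the documents are strongly similar to one another compared with news articles, technical manuals, and recipes. You end up with all the similarities between 0.8 and 1 and no anti-similar documents (similarity < 0). When the vectors are centered, you are comparing documents within your dataset rather than against all possible documents.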
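The effect of centering can be sketched with made-up 2-d "documents" that share a large common component. The data here is an assumption for illustration, not taken from the review dataset.

```python
import numpy as np

def cosine_similarity(a, b):
    return a.dot(b) / np.sqrt(a.dot(a) * b.dot(b))

# Hypothetical documents: one large shared component plus a small
# individual offset, mimicking reviews that all "sound like reviews".
shared = np.array([10.0, 10.0])
docs = np.array([
    shared + [1.0, 0.0],
    shared + [-1.0, 0.0],
    shared + [0.0, 1.0],
    shared + [0.0, -1.0],
])

# Before centering, the shared component dominates and the two
# documents look almost identical.
raw_sim = cosine_similarity(docs[0], docs[1])

# After subtracting the mean, only the individual offsets remain,
# and the same two documents point in opposite directions.
centered = docs - docs.mean(axis=0)
centered_sim = cosine_similarity(centered[0], centered[1])

print(f"raw: {raw_sim:.4f}, centered: {centered_sim:.4f}")
```

In this toy case the raw similarity is above 0.99 while the centered similarity is -1, which is the kind of spread that makes ranking documents within one dataset more informative.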

  • Find the most similar review
review = """I absolutely love this place. The 360 degree glass windows with the Yerba buena garden view, tea pots all around and the smell of fresh tea everywhere transports you to what feels like a different zen zone within the city. I know the price is slightly more compared to the normal American size, however the food is very wholesome, the tea selection is incredible and I know service can be hit or miss often but it was on point during our most recent visit. Definitely recommend!I would especially recommend the butternut squash gyoza."""

def cosine_similarity(a, b):
    return np.dot(a, b) / np.sqrt(a.dot(a) * b.dot(b))

review_vec = nlp(review).vector

## Center the document vectors
# Calculate the mean for the document vectors, should have shape (300,)
vec_mean = vectors.mean(axis=0)  # mean vector
# Subtract the mean from the vectors
centered = vectors - vec_mean  # centered vectors

# Calculate similarities for each document in the dataset
# Make sure to subtract the mean from the review vector
sims = [cosine_similarity(centered_vec, review_vec - vec_mean) for centered_vec in centered]

# Get the index for the most similar document
most_similar = np.argmax(sims)
print(review_data.iloc[most_similar].text)

Output:

After purchasing my final christmas gifts at the Urban Tea Merchant in Vancouver, I was surprised to hear about Teopia at the new outdoor mall at Don Mills and Lawrence when I went back home to Toronto for Christmas. Across from the outdoor skating rink and perfect to sit by the ledge to people watch, the location was prime for tea connesieurs... or people who are just freezing cold in need of a drinK! Like any gourmet tea shop, there were large tins of tea leaves on the walls, and although the tea menu seemed interesting enough, you can get any specialty tea as your drink. We didn't know what to get... so the lady suggested the Goji Berries... it smelled so succulent and juicy... instantly SOLD! I got it into a tea latte and watched the tea steep while the milk was steamed, and surprisingly, with the click of a button, all the water from the tea can be instantly drained into the cup (see photo).. very fascinating!The tea was aromatic and tasty, not over powering. The price was also very reasonable and I recommend everyone to get a taste of this place :)
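  • Review 1

  • The review most similar to Review 1

  • Looking at similar reviews

If you look at other similar reviews, you will see a lot of coffee shops. Why do you think reviews of coffee shops are similar to the example review, which mentions only tea?

Reviews of coffee shops will also be similar to our tea house review because coffee and tea are semantically similar. Most cafes serve both coffee and tea, so the two words often appear together.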
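To inspect several nearby reviews rather than only the single best match, the ranking can be extended to the top k. The sketch below uses small made-up vectors, and `top_k_similar` is a hypothetical helper name; with the real data you would pass the review vectors and the vector of the example review instead.

```python
import numpy as np

def top_k_similar(query_vec, doc_vecs, k=3):
    """Return (indices, similarities) of the k documents most similar to
    query_vec, best first, using cosine similarity on centered vectors."""
    mean = doc_vecs.mean(axis=0)
    centered = doc_vecs - mean
    query = query_vec - mean
    # Vectorized cosine similarity between the query and every row.
    sims = centered @ query / (
        np.linalg.norm(centered, axis=1) * np.linalg.norm(query)
    )
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]

# Hypothetical 2-d document vectors, invented for illustration only.
toy_docs = np.array([
    [7.0, 5.0],
    [5.0, 7.0],
    [3.0, 5.0],
    [5.0, 3.0],
])
toy_query = np.array([6.0, 5.5])

indices, scores = top_k_similar(toy_query, toy_docs, k=2)
print(indices, scores)
```

This sketch assumes no document coincides exactly with the mean vector, since that would give a zero norm and a division by zero.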

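Finished the course and earned a certificate of encouragement. Keep it up!


My CSDN blog: https://michael.blog.csdn.net/

Follow my WeChat official account (Michael阿明) so we can keep at it and keep learning together!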
