Python text string comparison: python - Fuzzy String Comparison

Published: 2024/7/19  python  28  豆豆

python - Fuzzy String Comparison

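What I am trying to accomplish is a program that reads in a file and compares each sentence against an original sentence. A sentence that matches the original exactly should receive a score of 1, and a sentence that is the total opposite should receive a 0. All other fuzzy sentences should receive a score between 0 and 1.

I am unsure which operation to use to accomplish this in Python 3.

I have included sample text, in which Text 1 is the original and the texts that follow are the comparisons.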

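Text: Sample

Text 1: It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.

Text 20: It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines // Should score high, but not 1

Text 21: It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines // Should score lower than Text 20

Text 22: I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night. // Should score lower than Text 21, but not 0

Text 24: It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats. // Should score a 0!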
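A minimal sketch of the scoring the question asks for, using only the standard library's difflib.SequenceMatcher. It assumes the sentences are already in memory (reading them from a file is left out), and the names ORIGINAL, CANDIDATES and score are illustrative. ratio() gives exactly 1.0 for identical strings, but it measures character overlap, not meaning, so a sentence with the opposite meaning will not necessarily score 0.

```python
from difflib import SequenceMatcher

# Text 1 from the question: the reference sentence.
ORIGINAL = ("It was a dark and stormy night. I was all alone sitting on a red chair. "
            "I was not completely alone as I had three cats.")

# Comparison sentences from the question (Texts 20, 21, 22 and 24).
CANDIDATES = {
    20: "It was a murky and stormy night. I was all alone sitting on a crimson chair. "
        "I was not completely alone as I had three felines",
    21: "It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. "
        "I was not completely alone as I had three felines",
    22: "I was all alone sitting on a crimson cathedra. I was not completely alone as I had "
        "three felines. It was a murky and tempestuous night.",
    24: "It was a dark and stormy night. I was not alone. I was not sitting on a red chair. "
        "I had three cats.",
}


def score(original: str, candidate: str) -> float:
    """Return a similarity score in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, original, candidate).ratio()


if __name__ == "__main__":
    # Print one score per comparison text.
    for number, text in CANDIDATES.items():
        print(f"Text {number}: {score(ORIGINAL, text):.3f}")
```

Running it prints one score per text; whether the ordering matches the one the question wants is exactly the limitation of surface similarity.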

4 Answers

96 votes

There is a package called fuzzywuzzy. Install via pip:

pip install fuzzywuzzy

Simple usage:

>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("this is a test", "this is a test!")
96

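The package is built on top of difflib. Why not just use that, you ask? Apart from being a bit simpler, it has a number of different matching methods (such as token-order insensitivity and partial string matching) that make it more powerful in practice. The process.extract functions are especially useful: they find the best-matching strings, along with their ratios, from a set of choices. From their readme: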
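To make "built on top of difflib" concrete, here is a sketch that reimplements the idea of ratio and partial_ratio with difflib.SequenceMatcher. It is not fuzzywuzzy's code: partial_ratio here uses a brute-force sliding window (the library chooses candidate windows more cleverly), and because this version rounds, a score can differ by a point from the one a given fuzzywuzzy version reports.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity on a 0-100 scale, computed with difflib."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a: str, b: str) -> int:
    """Best ratio of the shorter string against every same-length window of the longer one."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    width = len(shorter)
    best = 0
    for start in range(len(longer) - width + 1):
        best = max(best, ratio(shorter, longer[start:start + width]))
        if best == 100:
            break  # a perfect window cannot be beaten
    return best


print(ratio("this is a test", "this is a test!"))          # 97 (28 matching chars out of 29 * 2)
print(partial_ratio("this is a test", "this is a test!"))  # 100
```

The partial score reaches 100 because the shorter string occurs verbatim inside the longer one, which is what "partial string matching" means here.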

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
100

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
90
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100
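The two token-based scores can be sketched the same way on top of a difflib ratio. This is an illustration of the idea, not fuzzywuzzy's implementation. Preprocessing is assumed to be lowercasing plus whitespace splitting (fuzzywuzzy also strips punctuation by default). The token-set version compares the sorted shared words against each side's shared-plus-extra words and keeps the best score, so repeated words do not count against a match.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity on a 0-100 scale, computed with difflib."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a: str, b: str) -> int:
    """Compare after sorting the words, so word order no longer matters."""
    def sort_tokens(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return ratio(sort_tokens(a), sort_tokens(b))


def token_set_ratio(a: str, b: str) -> int:
    """Compare shared words against each side's extra words; duplicate words are ignored."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    with_a = (common + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    with_b = (common + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    return max(ratio(common, with_a), ratio(common, with_b), ratio(with_a, with_b))


print(token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
print(token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))        # 84
print(token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))         # 100
```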

Process

>>> from fuzzywuzzy import process
>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
[('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
('Dallas Cowboys', 90)
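The "best matches from a set" behaviour of process.extract boils down to scoring every choice and keeping the top ones. Below is a standard-library sketch of that idea. The function names extract and extract_one are hypothetical stand-ins, and the scorer is a plain case-insensitive difflib ratio rather than fuzzywuzzy's default scorer, so the exact numbers will differ from the readme's.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # Case-insensitive similarity on a 0-100 scale.
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def extract(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best match first."""
    scored = [(choice, ratio(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def extract_one(query, choices):
    """Return the single best (choice, score) pair, or None if there are no choices."""
    best = extract(query, choices, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract("new york jets", choices, limit=2))
print(extract_one("cowboys", choices))
```

Sorting the whole list is fine for a handful of choices; for large collections heapq.nlargest would avoid the full sort.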

congusbongus answered 2019-10-25T04:21:53Z

79 votes

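There is a module in the standard library (called difflib) that can compare strings and return a score based on their similarity. The SequenceMatcher class should do what you are after.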

Edit: a small example from the Python prompt:

>>> from difflib import SequenceMatcher as SM
>>> s1 = ' It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.'
>>> s2 = ' It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines.'
>>> SM(None, s1, s2).ratio()
0.9112903225806451
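SequenceMatcher accepts any sequences of hashable items, not only strings, so it can also compare sentences word by word. This sketch assumes simple lowercasing and whitespace tokenization (punctuation stays attached to words); word_ratio is an illustrative name. A word-level score counts whole words that line up in order, so "murky" versus "dark" costs one word rather than a few characters.

```python
from difflib import SequenceMatcher


def word_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] computed over lists of lowercase words instead of characters."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


s1 = "It was a dark and stormy night. I was all alone sitting on a red chair."
s2 = "It was a murky and stormy night. I was all alone sitting on a crimson chair."
print(word_ratio(s1, s2))  # 0.875: 14 of 16 words match on each side, 2 * 14 / 32
```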

HTH!

mac answered 2019-10-25T04:22:25Z

15 votes

fuzzyset is much faster than fuzzywuzzy (difflib) for both indexing and searching.

from fuzzyset import FuzzySet

# One comparison sentence per line.
corpus = """It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines
It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines
I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night.
It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats."""
corpus = [line.lstrip() for line in corpus.split("\n")]
fs = FuzzySet(corpus)

query = "It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats."
fs.get(query)
# [(0.873015873015873, 'It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines')]

Warning: be careful not to mix unicode and bytes in your fuzzyset.
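Roughly, fuzzyset (a port of fuzzyset.js) indexes strings by character n-grams and scores candidates by the cosine similarity of their n-gram counts. That vector comparison is far cheaper than aligning every pair of strings, which is where the speed comes from. The sketch below illustrates the idea with trigrams only. It is not fuzzyset's implementation: the trigram size, the "-" padding, and the helper names ngrams, ngram_similarity and best_match are assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased string padded with '-' at both ends."""
    padded = f"-{text.lower()}-"
    if len(padded) < n:
        padded += "-" * (n - len(padded))
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity in [0, 1] between the n-gram count vectors of a and b."""
    va, vb = ngrams(a, n), ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())  # missing grams count as 0
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_match(query: str, corpus):
    """Return (score, text) for the corpus entry most similar to query (corpus must be non-empty)."""
    return max((ngram_similarity(query, text), text) for text in corpus)


print(best_match("dark and stormy", ["a red chair", "dark and stormy night"]))
```

A real index would store the n-gram vectors once and look candidates up by shared n-grams instead of scanning the whole corpus for every query, as best_match does here.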

hobs answered 2019-10-25T04:22:59Z

1 votes

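This task is called paraphrase identification, and it is an active area of research in natural language processing. I have linked several state-of-the-art papers, many of which have open-source code available on GitHub.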

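Note that all the other answers assume there is some string (surface) similarity between the two sentences, whereas in reality two sentences with little string similarity can still be semantically similar.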
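The reverse problem also holds: surface similarity can be high when the meaning differs. As an illustration, this sketch scores Text 24 from the question, which the asker wants scored 0 because its meaning is reversed, against Text 1 using character-level difflib. Most of the characters line up, so the score comes out well above 0.7.

```python
from difflib import SequenceMatcher

# Text 1 from the question.
original = ("It was a dark and stormy night. I was all alone sitting on a red chair. "
            "I was not completely alone as I had three cats.")
# Text 24 from the question: mostly the same words, but the meaning is reversed.
text_24 = ("It was a dark and stormy night. I was not alone. "
           "I was not sitting on a red chair. I had three cats.")

surface_score = SequenceMatcher(None, original, text_24).ratio()
print(f"{surface_score:.2f}")
```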

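If you are interested in that kind of similarity, you can use Skip-Thoughts. Install the software according to the GitHub guide, then go to the paraphrase detection section of the readme: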

import skipthoughts

model = skipthoughts.load_model()
# X_sentences: your list of sentences to encode
vectors = skipthoughts.encode(model, X_sentences)

This converts your sentences (X_sentences) into vectors. You can then compute the similarity of two vectors with:

import scipy.spatial.distance
similarity = 1 - scipy.spatial.distance.cosine(vectors[0], vectors[1])

Here vectors[0] and vectors[1] are assumed to be the vectors for X_sentences[0] and X_sentences[1], the two sentences whose score you want.

There are other models that convert sentences to vectors as well.

Once the sentences are converted to vectors, similarity is just a matter of computing the cosine similarity between those vectors.
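The cosine step does not depend on Skip-Thoughts or any particular encoder. A dependency-free sketch, assuming the sentence vectors are plain sequences of floats, computes the same quantity as 1 - scipy.spatial.distance.cosine(u, v); cosine_similarity is an illustrative name.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors (equals 1 - cosine distance)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors, about 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors, 0.0
```

Values run from -1 (opposite directions) through 0 (unrelated) to 1 (same direction), so the score ignores vector length and depends only on direction.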

Ash answered 2019-10-25T04:24:05Z
