
Best way to strip punctuation from a string in Python

Published: 2023/12/4 · python · 32 · 豆豆

This article, collected and compiled by 生活随笔, covers the best way to strip punctuation from a string in Python. The editor found it quite good and is sharing it here for reference.

It seems like there should be a simpler way than:

# Python 2
import string
s = "string. With. Punctuation?"  # Sample string
out = s.translate(string.maketrans("", ""), string.punctuation)

Is there?

Seems pretty straightforward to me. Why do you want to change it? If you want it easier, just wrap what you wrote in a function.

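Well, it just seemed kind of hackish to use a side effect of str.translate to do the work. I was thinking there might be something more like str.strip(chars) that worked on the whole string instead of just the ends, and that I had missed it.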

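It depends on the data, too. Using this on data where server names contain underscores (pretty common in some places) could be bad. Just make sure you know the data and what it contains, or you could end up with a subset of the clbuttic problem.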

It also depends on what you call punctuation. "The temperature in the O'Reilly & Arbuthnot-Smythe server's main rack is 40.5 degrees." contains exactly ONE punctuation character, the second ".".
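If "punctuation" should mean only marks that are not inside a word or number, one possible sketch is a regex that deletes a run of non-word, non-space characters unless it has word characters on both sides. This rule and the function name strip_edge_punct are assumptions for illustration, not something proposed in the thread.

```python
import re

# Hypothetical rule: drop a punctuation run unless it sits *between* two word
# characters, so O'Reilly, Arbuthnot-Smythe, server's and 40.5 survive.
_EDGE_PUNCT = re.compile(r"(?<!\w)[^\w\s]+|[^\w\s]+(?!\w)")

def strip_edge_punct(text: str) -> str:
    """Remove punctuation runs that are not enclosed by word characters on both sides."""
    return _EDGE_PUNCT.sub("", text)

sample = ("The temperature in the O'Reilly & Arbuthnot-Smythe "
          "server's main rack is 40.5 degrees.")
print(strip_edge_punct(sample))
```

Removing the "&" leaves two adjacent spaces; ' '.join(result.split()) collapses them if that matters.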

I'm surprised nobody has mentioned that string.punctuation doesn't include non-English punctuation at all. I'm thinking of 。，！？：×“” and so on.

Doesn't this work with Unicode strings?

@JohnMachin, you forgot that ' ' is punctuation.

From an efficiency perspective, you're not going to beat

s.translate(None, string.punctuation)

For Python 3, use the following instead:

s.translate(str.maketrans('', '', string.punctuation))

It performs raw string operations in C with a lookup table; there's not much that will beat that short of writing your own C code.
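To make the lookup table concrete in Python 3: str.maketrans('', '', chars) returns an ordinary dict whose keys are code points (integers) and whose values are None, meaning "delete". A minimal sketch that builds the table once at module level so repeated calls reuse it (PUNCT_TABLE and remove_punct are illustrative names):

```python
import string

# Built once: maps each ASCII punctuation code point to None ("delete").
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punct(text: str) -> str:
    """Delete every character listed in string.punctuation."""
    return text.translate(PUNCT_TABLE)

print(PUNCT_TABLE[ord('?')])                       # None
print(len(PUNCT_TABLE))                            # 32
print(remove_punct("string. With. Punctuation?"))  # string With Punctuation
```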

If speed isn't a worry, another option is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

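This is faster than calling s.replace for each character, but it won't perform as well as non-pure-Python approaches such as regexes or string.translate, as you can see from the timings below. For this type of problem, doing it at as low a level as possible pays off.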

Timing code:

# Python 2
import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("", "")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

print "sets :", timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex :", timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :", timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace :", timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

The results:

sets : 19.8566138744
regex : 6.86155414581
translate : 2.12455511093
replace : 28.4436721802

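Thanks for the timing info. I was thinking about doing something like that myself, but yours is better written than anything I would have done, and now I can use it as a template for any timing code I write in the future :).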

Great answer. You can simplify it by dropping the table. The docs say: "set the table argument to None for translations that only delete characters" (docs.python.org/library/stdtypes.html#str.translate).

Using a list comprehension inside ''.join() would make it a bit faster, but not enough to beat the regex or translate. See "List comprehension without [ ], Python" for why.

Can you explain the "from __main__ import ..." syntax?

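Also worth noting: translate() behaves differently for str and unicode objects, so you need to be sure you're always working with the same data type. The approach in this answer works equally well for both, which is handy.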

In Python 3, should table = string.maketrans("","") be replaced with table = str.maketrans({key: None for key in string.punctuation})?

What is the point of doing set(string.punctuation)? It only contains unique values anyway.

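@mlissner: efficiency. If it's a list or string, you need to do a linear scan to find out whether the letter is in it. With a set or dictionary, though, it will usually be faster (except for very small strings) because it doesn't have to check every value.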

@SparkAndShine: yes. If you pass a dict to translate directly, without str.maketrans, the keys have to be ordinals, so in Python 3 it would be s.translate({ord(c): None for c in string.punctuation}).
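The distinction matters because str.translate looks each character up by its code point. A dict keyed by one-character strings is silently ignored when handed straight to translate, while str.maketrans converts such keys to ordinals for you. A quick check:

```python
import string

text = "Hi! Bye?"

# Keys are str, but translate looks up ord(ch): nothing matches, nothing is removed.
wrong = text.translate({c: None for c in string.punctuation})

# Keys converted to ordinals by hand...
by_ord = text.translate({ord(c): None for c in string.punctuation})

# ...or by str.maketrans, which accepts one-character string keys.
by_maketrans = text.translate(str.maketrans({c: None for c in string.punctuation}))

print(repr(wrong), repr(by_ord), repr(by_maketrans))
# 'Hi! Bye?' 'Hi Bye' 'Hi Bye'
```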

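To update the discussion: as of Python 3.6, regex is now the most efficient method! It's almost 2x faster than translate. Also, sets and replace are no longer so bad; both improved by more than a factor of 4. :)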

In Python 3, the translation table can also be created with table = str.maketrans('', '', string.punctuation) (docs.python.org/3/library/stdtypes.html#str.maketrans).

I get this error: TypeError: translate() takes exactly one argument (2 given). Any idea how to fix it?

Read the comments above: you're using Python 3, not Python 2.

Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]', '', s)

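In the code above, we substitute (re.sub) every character that is NOT [an alphanumeric character (\w) or whitespace (\s)] with an empty string. Hence the . and ? punctuation won't be present in the variable s after it has been run through the regex.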

Great. Can you explain it?

Does this work with Unicode?

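Explanation: it replaces anything that is not (^) a word character or whitespace with the empty string. Be careful, though: \w matches the underscore too, for example.

@SIslam: I think it will work with Unicode if the Unicode flag is set, i.e. s = re.sub(r'[^\w\s]', '', s, flags=re.UNICODE). The flag has to be passed by keyword, because the fourth positional argument of re.sub is count. Testing it with Python 3 on Linux, it works even without the flag (str patterns are Unicode-aware by default in Python 3), using Tamil letters.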

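The underscore caveat is easy to demonstrate, since \w includes _. If underscores should go too, one option (an assumption about what you want) is to add _ as an alternative in the pattern:

```python
import re

s = "snake_case, server_01. Done?"

keep_underscore = re.sub(r'[^\w\s]', '', s)    # \w keeps letters, digits and _
drop_underscore = re.sub(r'[^\w\s]|_', '', s)  # also delete _ explicitly

print(keep_underscore)  # snake_case server_01 Done
print(drop_underscore)  # snakecase server01 Done
```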
This is one place where you should use regexes.

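For convenience, here is a summary of how to strip punctuation from a string in Python 2 and Python 3. Please refer to the other answers for detailed explanations.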

Python 2

import string
s = "string. With. Punctuation?"
table = string.maketrans("", "")
new_s = s.translate(table, string.punctuation)  # Output: string without punctuation

Python 3

import string
s = "string. With. Punctuation?"
table = str.maketrans({key: None for key in string.punctuation})
new_s = s.translate(table)  # Output: string without punctuation

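Small note: you don't need a dict comprehension to make a dict that maps every key to None; {key: None for key in string.punctuation} can be replaced by dict.fromkeys(string.punctuation), which does all the work at the C layer in a single call.

Thanks @ShadowRanger, updated.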

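A quick way to confirm that the dict.fromkeys form builds the same table as the other Python 3 variants in this thread:

```python
import string

via_comprehension = str.maketrans({key: None for key in string.punctuation})
via_fromkeys = str.maketrans(dict.fromkeys(string.punctuation))
via_third_arg = str.maketrans('', '', string.punctuation)

# All three produce the same {code point: None} mapping.
print(via_comprehension == via_fromkeys == via_third_arg)   # True
print("string. With. Punctuation?".translate(via_fromkeys))  # string With Punctuation
```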
myString.translate(None, string.punctuation)

Ah, I tried this, but it doesn't work in all cases. myString.translate(string.maketrans("",""), string.punctuation) works fine.

When doesn't it work?

Note that for str in Python 3, and unicode in Python 2, the deletechars argument is not supported.

@agf: you can still use .translate() to remove punctuation even in the unicode and Python 3 cases, by passing a dictionary argument.

myString.translate(string.maketrans("",""), string.punctuation) does NOT work with Unicode strings (found that out the hard way).

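@MarcMaxson: myString.translate(str.maketrans("", "", string.punctuation)) does work for Unicode strings in Python 3, although string.punctuation includes only ASCII punctuation. Follow the link in my previous comment; it shows how to remove all punctuation, including Unicode punctuation.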

TypeError: translate() takes exactly one argument (2 given) :(

@BrianTingle: look at the Python 3 code in my comment (it passes one argument). Follow the link to see Python 2 code that works with Unicode, and its Python 3 adaptation.

I usually use something like this:

>>> s = "string. With. Punctuation?"  # Sample string
>>> import string
>>> for c in string.punctuation:
...     s = s.replace(c, "")
...
>>> s
'string With Punctuation'

An ugly one-liner: reduce(lambda s,c: s.replace(c, ''), string.punctuation, s).
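In Python 3, reduce is no longer a builtin; it lives in functools. The same one-liner then reads:

```python
import string
from functools import reduce

s = "string. With. Punctuation?"
# Fold over every punctuation character, replacing it with '' each time.
out = reduce(lambda acc, c: acc.replace(c, ''), string.punctuation, s)
print(out)  # string With Punctuation
```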

Nice, but it doesn't remove some punctuation, such as long dashes.

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
# Python 2
from unicodedata import category
s = u'String — with -  «punctation »...'
# Keep only characters whose general category does not start with 'P'
s = u''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s
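A Python 3 version of the same idea, wrapped in a function (strip_unicode_punct is just an illustrative name). Only characters whose general category starts with P are dropped, so currency signs such as $ (category Sc) are kept:

```python
import string
import unicodedata

def strip_unicode_punct(text: str) -> str:
    """Drop every character whose general category starts with 'P' (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))

s = "Hello, world! «quoted» ¡done!"
print(strip_unicode_punct(s))  # Hello world quoted done
# string.punctuation alone leaves the non-ASCII marks in place:
print(s.translate(str.maketrans('', '', string.punctuation)))  # Hello world «quoted» ¡done
print(strip_unicode_punct("$5!"))  # $5
```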

You could also do regex.sub(ur"\p{P}+", "", text) with the regex module.

Not necessarily simpler, but a different way, if you are more familiar with the re family.

import re, string
s = "string. With. Punctuation?"  # Sample string
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)

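It works because string.punctuation has the sequence ,-. in proper, ascending, gapless ASCII order. While Python has this right, when you try to use a subset of string.punctuation, it can be a show-stopper because of the surprise "-".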

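Actually, this is still wrong. The sequence "\]" gets treated as an escape (coincidentally not closing the ], which bypasses another failure), but leaves \ unescaped. You should use re.escape(string.punctuation) to prevent this.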

Yes, I left it out because it worked for the example and kept things simple, but you're right that it should be incorporated.
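The "-" problem appears as soon as a character class is built from a subset of string.punctuation: an unescaped - between two characters becomes a range. For example '[+-.]' means '+' through '.', which also matches ','. The subset below is hypothetical, chosen to show the effect:

```python
import re

subset = "+-."  # hypothetical subset of punctuation to remove

unescaped = re.compile('[%s]' % subset)           # read as the range '+'..'.' (includes ',')
escaped = re.compile('[%s]' % re.escape(subset))  # '\-' is a literal hyphen

text = "a+b,c-d.e"
print(unescaped.sub('', text))  # abcde  (the comma was removed too)
print(escaped.sub('', text))    # ab,cde
```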

For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary: code points (integers) are looked up in that mapping, and anything mapped to None is removed.

To remove (some?) punctuation, then, use:

import string
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.
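Values in the mapping don't have to be None: a code point can map to None (delete it), to another code point, or to a replacement string of any length. A small illustration; the particular mapping is made up for the example:

```python
table = {
    ord('!'): None,      # delete
    ord('&'): ' and ',   # replace with a longer string
    ord('_'): ord(' '),  # replace with another code point
}
print("Tom&Jerry_show!".translate(table))  # Tom and Jerry show
```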

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian's answer (Python 3 version):

import unicodedata
import sys
remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))
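Building that table walks every code point up to sys.maxunicode, which takes a noticeable moment. A possible alternative (a sketch, not taken from the linked answer; LazyPunctTable is a hypothetical name) is a dict subclass that fills itself lazily: str.translate looks keys up with ordinary indexing, so __missing__ only runs for code points that actually occur in your text.

```python
import unicodedata

class LazyPunctTable(dict):
    """Translation table that decides, on first lookup, whether a code point is punctuation."""

    def __missing__(self, codepoint):
        # None deletes the character; returning the code point keeps it unchanged.
        if unicodedata.category(chr(codepoint)).startswith('P'):
            value = None
        else:
            value = codepoint
        self[codepoint] = value  # cache the decision for later lookups
        return value

lazy_punct_map = LazyPunctTable()
print("«Hi», there! ¿Qué?".translate(lazy_punct_map))  # Hi there Qué
```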

To support Unicode, string.punctuation is not enough. See my answer.

@J.F.Sebastian: indeed, my answer was just using the same characters as the top-voted one. I've added a Python 3 version of your table.

The top-voted answer works only for ASCII strings. Your answer explicitly claims Unicode support.

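@J.F.Sebastian: it does work for Unicode strings; it strips ASCII punctuation. I never claimed it strips all punctuation. :-) The point was to provide the correct technique for unicode objects versus Python 2 str objects.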

string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

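import regex  # third-party package (pip install regex)
s = u"string. With. Some?Really Weird、Non？ASCII。「(Punctuation)」?"
# Character class listing the Unicode property classes to remove; runs become one space
remove = regex.compile(r'[\p{C}\p{M}\p{P}\p{S}\p{Z}]+', regex.UNICODE)
remove.sub(" ", s).strip()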

Personally, I believe this is the best way to remove punctuation from a string in Python, because:

It removes all Unicode punctuation.

It's easily modifiable; for example, you can take out the \p{S} if you want to remove punctuation but keep symbols like $.

You can get really specific about what to keep and what to remove; for example, \p{Pd} removes only dashes.

This regex also normalizes whitespace: it maps tabs, carriage returns, and other oddities to nice, single spaces.

This uses Unicode character properties, which you can read more about on Wikipedia.
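The regex module is a third-party package. If you'd rather stay with the standard library, roughly the same behaviour (every run of characters in categories C, M, P, S or Z becomes one space, then the ends are stripped) can be sketched with unicodedata. Treat this as an approximation of the pattern above, not a drop-in replacement; normalize_punct is an illustrative name.

```python
import unicodedata

# Major categories: C (control/format), M (marks), P (punctuation), S (symbols), Z (separators)
REMOVE_MAJOR = {'C', 'M', 'P', 'S', 'Z'}

def normalize_punct(text: str) -> str:
    """Collapse every run of C/M/P/S/Z characters into a single space, then trim the ends."""
    out = []
    in_run = False
    for ch in text:
        if unicodedata.category(ch)[0] in REMOVE_MAJOR:
            if not in_run:
                out.append(' ')
                in_run = True
        else:
            out.append(ch)
            in_run = False
    return ''.join(out).strip()

print(normalize_punct("string. With.\tSome\r\nWeird、Non？ASCII。「(Punctuation)」?"))
# string With Some Weird Non ASCII Punctuation
```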

Here's a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a: None for a in string.punctuation}))

This may not be the best solution, but this is how I did it.

import string
f = lambda x: ''.join([i for i in x if i not in string.punctuation])

Here's a function I wrote. It's not very efficient, but it is simple, and you can add or remove any punctuation you like:

def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".", ";", ":", "!", "?", "/", "\\", ",", "#", "@", "$", "&", ")", "(", "\""]
    for punc in puncList:
        # Rebuild the list once per mark, removing that mark from every word
        wordList = [word.replace(punc, '') for word in wordList]
    return wordList
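The loop above rebuilds the whole list once per punctuation mark. A sketch of the same customizable idea that turns the list into a single translation table and applies it in one pass per word (strip_punc_words is a hypothetical name):

```python
PUNC_LIST = [".", ";", ":", "!", "?", "/", "\\", ",", "#", "@", "$", "&", ")", "(", "\""]

def strip_punc_words(word_list, punc_list=PUNC_LIST):
    """Return a new list with every character in punc_list removed from each word."""
    table = str.maketrans('', '', ''.join(punc_list))
    return [word.translate(table) for word in word_list]

print(strip_punc_words(["Hello,", "(world)!", "it's", "\"quoted\""]))
# ['Hello', 'world', "it's", 'quoted']
```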

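I haven't seen this answer yet. Just use a regex; it removes every character except word characters (\w), digit characters (\d) and whitespace characters (\s):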

import re
s = "string. With. Punctuation?"  # Sample string
out = re.sub(r'[^\w\d\s]+', '', s)

\d is redundant, since it's a subset of \w.

Are digit characters considered a subset of word characters? I thought a word character was any character that could make up a real word, e.g. a-zA-Z?

Yes, a "word" character in regex includes letters, digits and the underscore. See the description of \w in the documentation: docs.python.org/3/library/re.html
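In Python 3, str patterns are Unicode-aware by default, so \w covers letters from any script plus digits and the underscore; passing re.ASCII narrows it to [a-zA-Z0-9_]. A quick check:

```python
import re

print(bool(re.fullmatch(r'\w+', 'café_42')))            # True
print(bool(re.fullmatch(r'\w+', 'café_42', re.ASCII)))  # False: é is not ASCII
print(bool(re.fullmatch(r'\w+', 'cafe_42', re.ASCII)))  # True

# \d adds nothing inside [^\w\d\s], because every \d character is also \w.
s = "string. With. Punctuation? 123"
print(re.sub(r'[^\w\d\s]+', '', s) == re.sub(r'[^\w\s]+', '', s))  # True
```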

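Just as an update, I rewrote @Brian's example in Python 3 and moved the regex compile step inside the function. My thinking was to time every single step needed to make the function work: maybe you're using distributed computing, can't share the regex object between your workers, and need a re.compile step at each worker. I was also curious to time the two different implementations of maketrans for Python 3: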

table = str.maketrans({key: None for key in string.punctuation})

table = str.maketrans('', '', string.punctuation)

I also added another method that uses a set, taking advantage of the intersection function to reduce the number of iterations.

Here is the complete code:

import re, string, timeit

s = "string. With. Punctuation"

def test_set(s):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in s if ch not in exclude)

def test_set2(s):
    _punctuation = set(string.punctuation)
    for punct in set(s).intersection(_punctuation):
        s = s.replace(punct, ' ')
    return ' '.join(s.split())

def test_re(s):  # From Vinko's solution, with fix.
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    return regex.sub('', s)

def test_trans(s):
    table = str.maketrans({key: None for key in string.punctuation})
    return s.translate(table)

def test_trans2(s):
    table = str.maketrans('', '', string.punctuation)
    return s.translate(table)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

print("sets :", timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
print("sets2 :", timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
print("regex :", timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
print("translate :", timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
print("translate2 :", timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
print("replace :", timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

Here are my results:

sets : 3.1830138750374317
sets2 : 2.189873124472797
regex : 7.142953420989215
translate : 4.243278483860195
translate2 : 2.427158243022859
replace : 4.579746678471565

Here's a solution without regex.

# Python 2
import string
input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

Output>> where and or then

Replace the punctuation with spaces

Replace multiple spaces between words with a single space

Remove trailing spaces, if any, with strip()
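A Python 3 port of the same three steps; string.maketrans is gone, and str.maketrans takes the two equal-length strings instead:

```python
import string

input_text = "!where??and!!or$$then:)"

# Step 1: map every punctuation character to a space.
punctuation_replacer = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
spaced = input_text.translate(punctuation_replacer)

# Steps 2 and 3: split() drops leading/trailing whitespace and collapses runs,
# so joining with one space normalizes the gaps.
print(' '.join(spaced.split()))  # where and or then
```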

>>> import re
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]', '', s)
>>> re.split(r'\s+', s)
['string', 'With', 'Punctuation']

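Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content and don't explain why someone should "try this".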

import re
s = "string. With. Punctuation?"  # Sample string
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)

In less strict cases, a one-liner can be helpful:

''.join([c for c in s if c.isalnum() or c.isspace()])

# Python 2
# FIRST METHOD
# Store all the punctuation in a variable
punctuation = '!?,.:;"\')(_-'
newstring = ''  # Create an empty string
word = raw_input("Enter string:")
for i in word:
    if i not in punctuation:
        newstring += i
print "The string without punctuation is:", newstring

# SECOND METHOD
word = raw_input("Enter string:")
punctuation = '!?,.:;"\')(_-'
newstring = word.translate(None, punctuation)
print "The string without punctuation is:", newstring

# Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage

with open('one.txt', 'r') as myFile:
    str1 = myFile.read()
    print(str1)
    punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
    for i in punctuation:
        str1 = str1.replace(i, "")
    myList = []
    myList.extend(str1.split())  # split on whitespace
    print(str1)
    for i in myList:
        print(i)
        print("____________")

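Do the search and replace with the regex functions. If you have to perform the operation repeatedly, you can keep a compiled copy of the regex pattern (your punctuation) around, which will speed things up a bit.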
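A minimal sketch of that advice, compiling the punctuation pattern once at import time and reusing it on every call (PUNCT_RE and strip_punct are illustrative names). The re module also caches recently used patterns internally, so the saving is mostly the cache lookup on each call.

```python
import re
import string

# Compile once; re.escape keeps characters such as ] \ ^ - literal inside the class.
PUNCT_RE = re.compile('[%s]' % re.escape(string.punctuation))

def strip_punct(text: str) -> str:
    return PUNCT_RE.sub('', text)

lines = ["string. With. Punctuation?", "Hello, world!"]
print([strip_punct(line) for line in lines])  # ['string With Punctuation', 'Hello world']
```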

Is string.punctuation locale-aware? If so, this might not be the best solution.

I'm not sure; I haven't used it. I assume the poster or reader will know which punctuation they want to replace.

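Hmm... I'm not sure either. I'd hope string.punctuation is locale-aware, but I wouldn't count on it. If the user has a specific set of characters, then a compiled regex is a good way to go.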
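For what it's worth, in Python 3 the string module defines punctuation as a fixed string of 32 ASCII characters, so it does not vary with the locale. You can inspect it directly:

```python
import string

print(string.punctuation)       # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(len(string.punctuation))  # 32
print(all(ord(c) < 128 for c in string.punctuation))  # True
```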

Removing stop words from a text file with Python

print('====THIS IS HOW TO REMOVE STOP WORDS====')
with open('one.txt', 'r') as myFile:
    str1 = myFile.read()
    stop_words = "not","is","it","By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"
    myList = []
    myList.extend(str1.split())  # split on whitespace
    for i in myList:
        if i not in stop_words:
            print("____________")
            print(i)

This is how to convert the file to upper case or lower case.

print('@@@@This is lower case@@@@')
with open('students.txt', 'r') as myFile:
    str1 = myFile.read()
    # lower() and upper() return new strings; str1 itself is unchanged
    print(str1.lower())

print('*****This is upper case****')
with open('students.txt', 'r') as myFile:
    str1 = myFile.read()
    print(str1.upper())

I like to use a function like this:

import string

def scrub(abc):
    # Trim punctuation from the end, then from the start (stops on an empty string)
    while abc and abc[-1] in string.punctuation:
        abc = abc[:-1]
    while abc and abc[0] in string.punctuation:
        abc = abc[1:]
    return abc

This only strips characters from the start and end; use abc.strip(string.punctuation) instead for that. It won't remove such characters in the middle.
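The difference between the two is easy to see: strip only trims the ends, while a translate table removes the characters everywhere:

```python
import string

word = '"...hello, world!!!"'
print(word.strip(string.punctuation))                             # hello, world
print(word.translate(str.maketrans('', '', string.punctuation)))  # hello world
```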
