Python: how to replace all characters except letters, digits, forward slashes and backslashes

Published: 2024/7/19

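I want to parse some text and keep only letters, digits, forward slashes and backslashes, replacing everything else with ''.

Is it possible to do this with a single regex pattern, rather than several patterns that then have to be applied in a loop? I can't get the pattern below to leave the forward slashes and backslashes alone.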

import re

line1 ="1/R~e`p!l@@a#c$e%% ^A&l*l( S)-p_e+c=ial C{har}act[er]s ;E ?xce|pt Forw:ard" $An>d B,?a..ck Sl'as
line2 = line1

# Two patterns applied one after the other: non-word characters, then underscores.
RGX_PATTERN = r"[^\w]", r"_"

for pattern in RGX_PATTERN:
    line1 = re.sub(pattern, '', line1)

print("replace1:" + line1)
# Prints: 1ReplaceAllSpecialCharactersExceptForwardAndBackSlashes2

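The following code from SO has been tested and found to be faster than the regex, but it strips every special character, including the / and \ that I want to keep. Is there a way to adapt it to my use case while keeping its speed advantage over the regex?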

line2 = ''.join(e for e in line2 if e.isalnum())
print("replace2:" + line2)
# Prints: 1ReplaceAllSpecialCharactersExceptForwardAndBackSlashes2

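As an extra hurdle, the text to be parsed should be ASCII, so if possible, characters from other encodings should also be replaced with ''.

This is a bit faster and also works with Unicode input: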

import re

# Raw string: inside the class, \\ is a literal backslash and / a literal slash.
full_pattern = re.compile(r'[^a-zA-Z0-9\\/]|_')

def re_replace(string):
    return re.sub(full_pattern, '', string)
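As a quick check of the single-pattern idea, here is a minimal sketch. The sample string and the names `ALLOWED_ONLY` and `keep_alnum_and_slashes` are made up for illustration and are not from the question. The negated class already excludes the underscore and every non-ASCII character, so no second pattern or loop should be needed.

```python
import re

# Anything that is NOT an ASCII letter, a digit, "/" or "\" gets removed.
# Hypothetical names; the pattern is a raw string so "\\" is one literal backslash.
ALLOWED_ONLY = re.compile(r'[^a-zA-Z0-9/\\]')

def keep_alnum_and_slashes(text):
    """Return text with every character outside the allowed set removed."""
    return ALLOWED_ONLY.sub('', text)

sample = 'a/b\\c_d é!1 2'   # made-up input: slash, backslash, underscore, non-ASCII
print(keep_alnum_and_slashes(sample))   # prints: a/b\cd12
```

Calling the compiled pattern's own `sub` method is equivalent to passing it to `re.sub`.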

If you want it really fast, this is by far the best method (though it is slightly obscure):

def wanted(character):
    return character.isalnum() or character in '\\/'

ascii_characters = [chr(ordinal) for ordinal in range(128)]
# Translation table indexed by code point: None means "delete this character".
ascii_code_point_filter = [c if wanted(c) else None for c in ascii_characters]

def fast_replace(string):
    # Remove all non-ASCII characters. Heavily optimised.
    string = string.encode('ascii', errors='ignore').decode('ascii')
    # Remove unwanted ASCII characters
    return string.translate(ascii_code_point_filter)
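The list works as a translation table because `str.translate` only needs an object it can index by code point. If the list form feels too obscure, a dict that maps each unwanted code point to `None` should behave the same way. The sketch below is one possible variant, not the answer's own code; `ALLOWED`, `DELETE_TABLE` and `translate_replace` are hypothetical names.

```python
import string

# Characters to keep: ASCII letters, digits, forward slash and backslash.
ALLOWED = set(string.ascii_letters + string.digits + '/\\')

# Map every unwanted ASCII code point to None; str.translate deletes those.
DELETE_TABLE = {code: None for code in range(128) if chr(code) not in ALLOWED}

def translate_replace(text):
    # Drop non-ASCII characters first, so only code points below 128 remain.
    ascii_only = text.encode('ascii', errors='ignore').decode('ascii')
    # Delete the unwanted ASCII characters in a single pass.
    return ascii_only.translate(DELETE_TABLE)

print(translate_replace('a/b\\c_d é!1 2'))   # prints: a/b\cd12
```

Code points missing from the dict are left unchanged, which is why only the characters to delete need an entry.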

Timings:

SETUP=$(cat <<'EOF'
busy = ''.join(chr(i) for i in range(512))

import re
full_pattern = re.compile(r'[^a-zA-Z0-9\\/]|_')

def in_whitelist(character):
    return character.isalnum() or character in '\\/'

def re_replace(string):
    return re.sub(full_pattern, '', string)

def wanted(character):
    return character.isalnum() or character in '\\/'

ascii_characters = [chr(ordinal) for ordinal in range(128)]
ascii_code_point_filter = [c if wanted(c) else None for c in ascii_characters]

def fast_replace(string):
    string = string.encode('ascii', errors='ignore').decode('ascii')
    return string.translate(ascii_code_point_filter)
EOF
)

python -m timeit -s "$SETUP" "re_replace(busy)"
python -m timeit -s "$SETUP" "''.join(e for e in busy if in_whitelist(e))"
python -m timeit -s "$SETUP" "fast_replace(busy)"

Results:

10000 loops, best of 3: 63 usec per loop

10000 loops, best of 3: 135 usec per loop

100000 loops, best of 3: 4.98 usec per loop
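To rerun the comparison without a shell, the `timeit` module can be called from Python directly. This is a rough sketch rather than the benchmark used above: the function name `join_replace`, the loop count of 200 and the output format are arbitrary choices, and the numbers will vary by machine and Python version.

```python
import re
import timeit

busy = ''.join(chr(i) for i in range(512))
pattern = re.compile(r'[^a-zA-Z0-9/\\]')

def wanted(character):
    return character.isalnum() or character in '\\/'

# Translation table indexed by code point; None deletes the character.
table = [c if wanted(c) else None for c in map(chr, range(128))]

def re_replace(text):
    return pattern.sub('', text)

def join_replace(text):
    # Note: isalnum() is also True for non-ASCII letters, so those are kept here.
    return ''.join(c for c in text if wanted(c))

def fast_replace(text):
    ascii_only = text.encode('ascii', errors='ignore').decode('ascii')
    return ascii_only.translate(table)

LOOPS = 200  # arbitrary; raise it for steadier numbers
for func in (re_replace, join_replace, fast_replace):
    best = min(timeit.repeat(lambda: func(busy), number=LOOPS, repeat=3))
    print(f'{func.__name__}: {best / LOOPS * 1e6:.2f} usec per call')
```

The generator version is not a like-for-like comparison on this input, because it keeps non-ASCII letters and digits that the other two remove.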

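All of these produce exactly the same output for me:

@Master_Yoda: You are probably using Python 2. The OP is using Python 3.

Good call, I hadn't noticed that.

Tested, and it does handle non-ASCII text.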

Why can't you do something like this:

def in_whitelist(character):
    # The backslash has to be escaped inside the string literal.
    return character.isalnum() or character in ['\\', '/']

line2 = ''.join(e for e in line2 if in_whitelist(e))
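One caveat with this approach: `str.isalnum()` is also true for non-ASCII letters and digits such as 'é', so on its own it does not meet the ASCII-only requirement in the question. A possible tweak, assuming Python 3.7 or later for `str.isascii()`, is sketched below; `EXTRA_ALLOWED`, `in_ascii_whitelist` and `join_replace` are illustrative names only.

```python
# Extra characters to keep besides ASCII letters and digits.
EXTRA_ALLOWED = {'\\', '/'}

def in_ascii_whitelist(character):
    # str.isascii() needs Python 3.7+; it rejects letters such as 'é'
    # that str.isalnum() alone would accept.
    return (character.isascii() and character.isalnum()) or character in EXTRA_ALLOWED

def join_replace(text):
    return ''.join(c for c in text if in_ascii_whitelist(c))

print(join_replace('a/b\\c_d é!1 2'))   # prints: a/b\cd12
```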

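Edited per the suggestion to condense the function.

For brevity, I would personally change the last part to character in ['\', '/'].

Good. This worked. I just had to escape the backslash: ['\\', '/']

Worked for me after escaping the string literal... oh, agreed, @Khaelid.

It can't handle all of these: (etc.)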
