Python: Young man, can you really sleep at night without knowing regular expressions? Show a little ambition!

Published: 2024/4/17 python 45 豆豆


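By now
you already know how to use Python to imitate a browser
and send some HTTP requests
Once a request is done
the server hands us back a pile of source code
We are not the type to take just anything
We have principles

When it comes to what we want
how could we stuff everything into our pockets wholesale?

No way
No way
So
from the source the server returns
we need to filter
keep only what we want
and toss the rest aside
Which means
we need to learn how to use
regular expressions
Only with them
can we filter out the content we want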
...
Coming up next
the right way to learn Python

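Fair warning: you will end up loving it
This article is not for the impatient, or you might smash your phone! But if you read it to the end, how hard could regular expressions possibly be for you?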
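In fact
regular expressions are not just for Python
Many programming languages
and many tools use them
Think about it
how would you quickly pull all the numbers out of the string below?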
zui12shu234ai45der6en7sh88ixia7898os0huaib

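Simply put
a regular expression defines some special symbols
that match different kinds of characters
For example
\d
stands for
a single digit, equivalent to any one of 0-9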
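As a quick sketch of that idea (assuming Python's built-in re module, which is introduced further down), \d+ pulls every run of digits out of the sample string:

```python
import re

text = "zui12shu234ai45der6en7sh88ixia7898os0huaib"

# \d matches one digit; + means "one or more", so \d+ matches a whole run of digits
numbers = re.findall(r"\d+", text)
print(numbers)
```

The result is a list of strings, one per run of digits, in the order they appear.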
So you surely want to know
what the other special symbols mean, right?
Well
I'm not telling you

End of article
Bye

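Here is what the various symbols mean

If you made it this far
that says something too

Has your head started spinning yet?
I, for one, don't feel like reading it anymore
What comes next
is the real substance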

Xiaoshuaib will boil it down for you
and explain the most commonly used patterns in plain terms
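The sketch below tries a handful of common tokens against one sample string. The selection is an assumption for illustration, not necessarily the author's own list, and it relies on the re module that is introduced right after this.

```python
import re

sample = "Xiaoshuaib has 100 bananas"

# Each entry: pattern -> what it is meant to match
patterns = {
    r"\d": "one digit",
    r"\w+": "a run of letters, digits or underscores",
    r"\s": "one whitespace character",
    r"^Xi": "'Xi' at the very start of the string",
    r"s$": "'s' at the very end of the string",
    r"\d+": "one or more digits",
    r"ba.": "'ba' followed by any one character except a newline",
    r"bananas?": "'banana' with an optional trailing 's'",
}

first_hits = {}
for pattern, meaning in patterns.items():
    found = re.search(pattern, sample)  # first place where the pattern matches
    first_hits[pattern] = found.group() if found else None
    print(f"{pattern!r:12} {meaning}: {first_hits[pattern]!r}")
```

Each printed line shows the pattern, what it stands for, and the first piece of the sample it matched.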

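ok
Now that you know these
how do we actually use them from Python?
For that we need a Python library
namely
re
Next we will use the re module
and its most common methods
to put regular expressions to work
re.match
This method
mainly takes two arguments
The first is our matching rule (the pattern)
The second is the content to be filtered
Note that it tries to match from the beginning of the string
For example
from this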
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Xiaoshuaib has 100 bananas
</pre>
we want to grab a number
so we can do this
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re
content = 'Xiaoshuaib has 100 bananas'
res = re.match(r'^Xi.*(\d+)\s.*s$', content)
print(res.group(1))
</pre>
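Using the matching symbols we just covered
we can define the corresponding rule
Here we wrap the target content we want in ()
The result we get at this point is
0
So what if we want the number 100?
We can do this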
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re
content = 'Xiaoshuaib has 100 bananas'
res = re.match(r'^Xi.*?(\d+)\s.*s$', content)
print(res.group(1))
</pre>
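See the difference?
The second snippet has one extra ? symbol
This is where
two concepts come in
One is
greedy matching
the other is
non-greedy matching
Greedy matching
is what the first snippet does
.* grabs as many characters as it can, leaving only the last digit, 0, for (\d+)
Non-greedy matching, written .*?
grabs as few characters as it can, so (\d+) captures the whole 100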
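To make the difference visible, here is a small sketch that wraps the .* part in an extra pair of parentheses (an addition for illustration only) so we can see how much text it swallows in each case:

```python
import re

content = "Xiaoshuaib has 100 bananas"

# Group 1 is the text taken by .* or .*?, group 2 is the digits left for (\d+)
greedy = re.match(r"^(Xi.*)(\d+)\s.*s$", content)
lazy = re.match(r"^(Xi.*?)(\d+)\s.*s$", content)

print(greedy.groups())  # the greedy prefix runs right up to the last digit
print(lazy.groups())    # the lazy prefix stops before the first digit
```

In the greedy case the prefix ends with "10" and only "0" is left over; in the lazy case the prefix stops at the space and the digits group gets "100".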

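The
.*?
we just used
is what we reach for most often when matching
It means: match any characters, as few as possible
But
the . in .*? stands for any single character except a newline (\n)
So if our string contains a line break
what then?
Like this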
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">content = """Xiaoshuaib has 100
bananas"""
</pre>
Then we need one of re's matching modes
It's simple
just use re.S, which lets . match line breaks too
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re
content = """Xiaoshuaib has 100
bananas"""
res = re.match(r'^Xi.*?(\d+)\s.*s$', content, re.S)
print(res.group(1))
</pre>
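A small sketch of what re.S changes, using a hypothetical variant of the string in which the line break comes before the number, that is, inside the stretch that . has to cover:

```python
import re

# Hypothetical variant: the line break comes before the number
content = """Xiaoshuaib has
100 bananas"""

pattern = r"^Xi.*?(\d+)\s.*s$"

without_flag = re.match(pattern, content)      # . cannot cross the line break
with_flag = re.match(pattern, content, re.S)   # . may now match the line break

print(without_flag)
print(with_flag.group(1))
```

Without the flag there is no match at all, so re.match returns None; with re.S the same pattern finds the number.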
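Some of you may feel
that spelling out the beginning and the end just to match one thing
is a bit of a hassle
In that case you can use another re method
re.search
It scans through the string
and returns the first successful match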
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re
content = """Xiaoshuaib has 100
bananas"""
res = re.search(r'Xi.*?(\d+)\s.*s', content, re.S)
print(res.group(1))
</pre>
This also gets us 100
But what if our content looks like this
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">content = """Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;"""
</pre>
and we want all the 100s?
Then it's time for yet another re method
re.findall
With it we can easily get every match
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re
content = """Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;"""
res = re.findall(r'Xi.*?(\d+)\s.*?s;', content, re.S)
print(res)
</pre>
The result here is
['100', '100', '100', '100']
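To put the three methods side by side, here is a minimal sketch on a made-up sentence (the text and variable names are assumptions for illustration):

```python
import re

text = "I have 3 apples and 100 bananas"  # made-up sentence for illustration

at_start = re.match(r"\d+", text)        # only tries at the start of the string
first = re.search(r"\d+", text)          # first hit anywhere in the string
everything = re.findall(r"\d+", text)    # all hits, as a list of strings

print(at_start)
print(first.group())
print(everything)
```

re.match comes back empty-handed because the string does not start with a digit, re.search stops at the first number, and re.findall collects them all.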

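Some of you may also wonder
what if we want to replace the matched content directly?
Take the string from just now
can we replace 100 with 250 outright?

For that there is another re method
re.sub
Like this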
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re
content = """Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;
Xiaoshuaib has 100 bananas;"""
content = re.sub(r'\d+', '250', content)
print(content)
</pre>
and the result becomes
Xiaoshuaib has 250 bananas;
Xiaoshuaib has 250 bananas;
Xiaoshuaib has 250 bananas;
Xiaoshuaib has 250 bananas;
250 bananas
Can you... even finish them??
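If replacing every number is too much, re.sub can be reined in a little. The sketch below shows two options that go slightly beyond the example above: the count argument, and a group reference in the replacement. The "Shelf 7" string is made up for illustration.

```python
import re

content = "Xiaoshuaib has 100 bananas;\nXiaoshuaib has 100 bananas;"

# count=1 stops after the first replacement
only_first = re.sub(r"\d+", "250", content, count=1)

# \g<1> re-inserts whatever group 1 captured, so only a number after "has " changes
labelled = "Shelf 7: Xiaoshuaib has 100 bananas;"  # hypothetical input
after_has = re.sub(r"(has )\d+", r"\g<1>250", labelled)

print(only_first)
print(after_has)
```

In the first case only the first line changes; in the second the shelf number stays as it is and only the banana count becomes 250.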

Let's cover one more commonly used re method
re.compile
It mainly wraps our pattern up into a reusable object
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import re
content = "Xiaoshuaib has 100 bananas"
pattern = re.compile(r'Xi.*?(\d+)\s.*s', re.S)
res = re.match(pattern,content)
print(res.group(1))
</pre>
It's really the same as what we wrote earlier
<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">res = re.match(r'^Xi.*?(\d+)\s.*s$', content, re.S)
</pre>
except that compiling it once
makes it easy to reuse later
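A minimal sketch of that reuse, assuming a few made-up input lines: the compiled pattern object has its own match method (as well as search, findall and sub), so the pattern string is written only once.

```python
import re

# Compile once...
pattern = re.compile(r"Xi.*?(\d+)\s.*s", re.S)

# ...then reuse it on several strings (made-up inputs) via the pattern's own methods
lines = [
    "Xiaoshuaib has 100 bananas",
    "Xiaoshuaib has 250 bananas",
    "Xiaoshuaib ate 3 bananas",
]
counts = [pattern.match(line).group(1) for line in lines]
print(counts)
```

Each line is matched against the same compiled object, and the list ends up holding the number captured from each one.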
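Alright
that wraps up the re module and regular expressions
Now you know how to request data
and how to filter the returned data with regular expressions
So
is web scraping still hard for us?

This time the article really is over
Bye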

Reposted from: https://blog.51cto.com/14186420/2346650
