Getting Started with Asynchronous Programming in Python
Lately, while reading code, I keep running into asynchronous programming. Until now I only cared about making features work and never thought about how fast the code ran, so I decided to study the topic.
From 0 to 1: how Python's asynchronous programming evolved
1. Crawling with urllib and requests
requests optimizes how requests are made, so it is a bit faster than urllib.
Requests is an HTTP client library for Python that makes network requests more intuitive and convenient. Its biggest difference from urllib when scraping is how connections are handled: urllib closes the connection as soon as the data has been fetched, while requests keeps the socket open (HTTP keep-alive) so it can be reused for subsequent requests.
In Python 2.7 the urllib functionality was split across two modules, urllib and urllib2. Python 3 merged them into a single urllib package.
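As a quick reference, the merge shows up directly in the import paths. This sketch maps the common Python 3 names back to their Python 2 homes in comments; the urlencode call at the end is just a network-free demonstration:

```python
# Python 2 split this API across urllib and urllib2;
# Python 3 gathers everything under the urllib package:
from urllib.request import urlopen, Request   # was urllib2.urlopen / urllib2.Request
from urllib.parse import urlencode, urlparse  # was urllib.urlencode / urlparse.urlparse
from urllib.error import URLError, HTTPError  # was urllib2.URLError / urllib2.HTTPError

# urlencode builds a query string without touching the network:
print(urlencode({'start': 25}))  # start=25
```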
urllib:
#-*- coding:utf-8 -*-
import urllib.request
import ssl
from lxml import etree

url = 'https://movie.douban.com/top250'
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_1)

def fetch_page(url):
    response = urllib.request.urlopen(url, context=context)
    return response

def parse(url):
    response = fetch_page(url)
    page = response.read()
    html = etree.HTML(page)
    xpath_movie = '//*[@id="content"]/div/div[1]/ol/li'
    xpath_title = './/span[@class="title"]'
    xpath_pages = '//*[@id="content"]/div/div[1]/div[2]/a'
    pages = html.xpath(xpath_pages)
    fetch_list = []
    result = []
    for element_movie in html.xpath(xpath_movie):
        result.append(element_movie)
    for p in pages:
        fetch_list.append(url + p.get('href'))
    for url in fetch_list:
        response = fetch_page(url)
        page = response.read()
        html = etree.HTML(page)
        for element_movie in html.xpath(xpath_movie):
            result.append(element_movie)
    for i, movie in enumerate(result, 1):
        title = movie.find(xpath_title).text
        print(i, title)

def main():
    parse(url)

if __name__ == '__main__':
    main()
Replacing the standard library urllib with requests:
import requests
from lxml import etree
from time import time

url = 'https://movie.douban.com/top250'

def fetch_page(url):
    response = requests.get(url)
    return response

def parse(url):
    response = fetch_page(url)
    page = response.content
    html = etree.HTML(page)
    xpath_movie = '//*[@id="content"]/div/div[1]/ol/li'
    xpath_title = './/span[@class="title"]'
    xpath_pages = '//*[@id="content"]/div/div[1]/div[2]/a'
    pages = html.xpath(xpath_pages)
    fetch_list = []
    result = []
    for element_movie in html.xpath(xpath_movie):
        result.append(element_movie)
    for p in pages:
        fetch_list.append(url + p.get('href'))
    for url in fetch_list:
        response = fetch_page(url)
        page = response.content
        html = etree.HTML(page)
        for element_movie in html.xpath(xpath_movie):
            result.append(element_movie)
    for i, movie in enumerate(result, 1):
        title = movie.find(xpath_title).text
        # print(i, title)
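The connection reuse mentioned earlier can also be made explicit with requests.Session, which maintains a pool of keep-alive connections per host. This is a minimal sketch; fetch_all is a hypothetical helper, not part of the article's code:

```python
import requests

def fetch_all(urls):
    # One Session shares a pool of keep-alive connections, so repeated
    # requests to the same host can reuse the underlying socket instead
    # of performing a fresh TCP (and TLS) handshake each time.
    with requests.Session() as session:
        return [session.get(u) for u in urls]

# A Session exposes the same verbs as the module-level helpers:
session = requests.Session()
print(callable(session.get), callable(session.post))  # True True
```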
2、lxml庫(kù)與正則表達(dá)式進(jìn)行解析
lxml庫(kù)進(jìn)行解析需要一定時(shí)間,但依賴(lài)正則表達(dá)式的程序會(huì)更加難以維護(hù),擴(kuò)展性不高。
常見(jiàn)的組合是Requests+BeautifulSoup(解析網(wǎng)絡(luò)文本的工具庫(kù)),解析工具常見(jiàn)的還有正則,xpath。
將lxml庫(kù)換成標(biāo)準(zhǔn)的re庫(kù):
#-*- coding:utf-8 -*-
import requests
from time import time
import re

url = 'https://movie.douban.com/top250'

def fetch_page(url):
    response = requests.get(url)
    return response

def parse(url):
    response = fetch_page(url)
    page = response.content
    fetch_list = set()
    result = []
    # The original regex patterns were mangled in publication; these are
    # plausible reconstructions for the Douban Top 250 markup.
    for title in re.findall(rb'<span class="title">(.*?)</span>', page):
        result.append(title)
    for postfix in re.findall(rb'<a href="(\?start=.*?)"', page):
        fetch_list.add(url + postfix.decode())
    for url in fetch_list:
        response = fetch_page(url)
        page = response.content
        for title in re.findall(rb'<span class="title">(.*?)</span>', page):
            result.append(title)
    for i, title in enumerate(result, 1):
        title = title.decode()
        # print(i, title)
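The maintainability trade-off can be seen offline with a hard-coded snippet: a regex works but is tied to the exact markup, while a real parser (the stdlib html.parser stands in for lxml here) tolerates variations in the HTML. TitleParser and the snippet are illustrative only:

```python
import re
from html.parser import HTMLParser

snippet = '<ol><li><span class="title">The Shawshank Redemption</span></li></ol>'

# Regex: short, but it breaks if the markup changes at all.
titles_re = re.findall(r'<span class="title">(.*?)</span>', snippet)

# Parser: survives attribute reordering, extra whitespace, etc.
class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.titles, self._in_title = [], False
    def handle_starttag(self, tag, attrs):
        self._in_title = tag == 'span' and ('class', 'title') in attrs
    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data)
    def handle_endtag(self, tag):
        self._in_title = False

p = TitleParser()
p.feed(snippet)
print(titles_re[0], '|', p.titles[0])  # The Shawshank Redemption | The Shawshank Redemption
```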
3、進(jìn)階:多進(jìn)程和多線程
網(wǎng)絡(luò)應(yīng)用方面的編程(如上例中的爬蟲(chóng)),通常瓶頸都在IO層面,解決等待讀寫(xiě)的問(wèn)題比提高文本解析速度來(lái)的更有性?xún)r(jià)比。
程序切換—CPU時(shí)間的分配:操作系統(tǒng)自動(dòng)為每個(gè)程序分配一些 CPU/內(nèi)存/磁盤(pán)/鍵盤(pán)/顯示器 等資源的使用時(shí)間,過(guò)期后自動(dòng)切換到下一個(gè)程序。當(dāng)然,被切換的程序,如果沒(méi)有執(zhí)行完,它的狀態(tài)會(huì)被保存起來(lái),方便下次輪詢(xún)到的時(shí)候繼續(xù)執(zhí)行。
1)進(jìn)程:進(jìn)程就是“程序切換”的第一種方式。進(jìn)程,是執(zhí)行中的計(jì)算機(jī)程序。也就是說(shuō),每個(gè)代碼在執(zhí)行的時(shí)候,首先本身即是一個(gè)進(jìn)程。一個(gè)進(jìn)程具有:就緒,運(yùn)行,中斷,僵死,結(jié)束等狀態(tài)(不同操作系統(tǒng)不一樣)。每個(gè)程序,本身首先是一個(gè)進(jìn)程。
2)線程:線程,也是“程序切換”的一種方式。線程,是在進(jìn)程中執(zhí)行的代碼。一個(gè)進(jìn)程下可以運(yùn)行多個(gè)線程,這些線程之間共享主進(jìn)程內(nèi)申請(qǐng)的操作系統(tǒng)資源。在一個(gè)進(jìn)程中啟動(dòng)多個(gè)線程的時(shí)候,每個(gè)線程按照順序執(zhí)行。現(xiàn)在的操作系統(tǒng)中,也支持線程搶占,也就是說(shuō)其它等待運(yùn)行的線程,可以通過(guò)優(yōu)先級(jí),信號(hào)等方式,將運(yùn)行的線程掛起,自己先運(yùn)行。線程,必須在一個(gè)存在的進(jìn)程中啟動(dòng)運(yùn)行。線程使用進(jìn)程獲得的系統(tǒng)資源,不會(huì)像進(jìn)程那樣需要申請(qǐng)CPU等資源。
3)線程與進(jìn)程的區(qū)別:線程一般以并發(fā)執(zhí)行,正是由于這種并發(fā)和數(shù)據(jù)共享機(jī)制,使多任務(wù)間的協(xié)作成為可能。進(jìn)程一般以并行執(zhí)行,這種并行能使得程序能同時(shí)在多個(gè)CPU上運(yùn)行。
4)協(xié)程:協(xié)程,也是”程序切換“的一種。簡(jiǎn)單說(shuō),協(xié)程也是線程,只是協(xié)程的調(diào)度并不是由操作系統(tǒng)調(diào)度,而是自己”協(xié)同調(diào)度“。也就是”協(xié)程是不通過(guò)操作系統(tǒng)調(diào)度的線程“。協(xié)程,又稱(chēng)微線程。協(xié)程間是協(xié)同調(diào)度的,這使得并發(fā)量數(shù)萬(wàn)以上的時(shí)候,協(xié)程的性能是遠(yuǎn)遠(yuǎn)高于線程。注意這里也是“并發(fā)”,不是“并行”。
Multithreading effectively solves the problem of blocking waits.
#-*- coding:utf-8 -*-
import requests
from lxml import etree
from time import time
from threading import Thread

url = 'https://movie.douban.com/top250'

def fetch_page(url):
    response = requests.get(url)
    return response

def parse(url):
    response = fetch_page(url)
    page = response.content
    html = etree.HTML(page)
    xpath_movie = '//*[@id="content"]/div/div[1]/ol/li'
    xpath_title = './/span[@class="title"]'
    xpath_pages = '//*[@id="content"]/div/div[1]/div[2]/a'
    pages = html.xpath(xpath_pages)
    fetch_list = []
    result = []
    for element_movie in html.xpath(xpath_movie):
        result.append(element_movie)
    for p in pages:
        fetch_list.append(url + p.get('href'))

    def fetch_content(url):
        response = fetch_page(url)
        page = response.content
        html = etree.HTML(page)
        for element_movie in html.xpath(xpath_movie):
            result.append(element_movie)

    threads = []
    for url in fetch_list:
        t = Thread(target=fetch_content, args=[url])
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    for i, movie in enumerate(result, 1):
        title = movie.find(xpath_title).text
        # print(i, title)
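The hand-rolled Thread list above can also be expressed with a thread pool from concurrent.futures, which bounds concurrency and collects results in input order. square here is just a network-free stand-in for fetch_content:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):  # network-free stand-in for fetch_content
    return x * x

# map() farms the calls out to at most 4 worker threads and yields
# the results in input order; the with-block joins all workers.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(5)))
print(results)  # [0, 1, 4, 9, 16]
```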
多進(jìn)程,用4個(gè)進(jìn)程的進(jìn)程池來(lái)并行處理網(wǎng)絡(luò)數(shù)據(jù)。
#-*- coding:utf-8 -*-
import requests
from lxml import etree
from time import time
from concurrent.futures import ProcessPoolExecutor

url = 'https://movie.douban.com/top250'

def fetch_page(url):
    response = requests.get(url)
    return response

def fetch_content(url):
    response = fetch_page(url)
    page = response.content
    return page

def parse(url):
    page = fetch_content(url)
    html = etree.HTML(page)
    xpath_movie = '//*[@id="content"]/div/div[1]/ol/li'
    xpath_title = './/span[@class="title"]'
    xpath_pages = '//*[@id="content"]/div/div[1]/div[2]/a'
    pages = html.xpath(xpath_pages)
    fetch_list = []
    result = []
    for element_movie in html.xpath(xpath_movie):
        result.append(element_movie)
    for p in pages:
        fetch_list.append(url + p.get('href'))
    with ProcessPoolExecutor(max_workers=4) as executor:
        for page in executor.map(fetch_content, fetch_list):
            html = etree.HTML(page)
            for element_movie in html.xpath(xpath_movie):
                result.append(element_movie)
    for i, movie in enumerate(result, 1):
        title = movie.find(xpath_title).text
        # print(i, title)
這里多進(jìn)程帶來(lái)的優(yōu)點(diǎn)(cpu處理)并沒(méi)有得到體現(xiàn),反而創(chuàng)建和調(diào)度進(jìn)程帶來(lái)的開(kāi)銷(xiāo)要遠(yuǎn)超出它的正面效應(yīng),拖了一把后腿。即便如此,多進(jìn)程帶來(lái)的效益相比于之前單進(jìn)程單線程的模型要好得多。
多進(jìn)程和多線程除了創(chuàng)建的開(kāi)銷(xiāo)大之外還有一個(gè)難以根治的缺陷,就是處理進(jìn)程之間或線程之間的協(xié)作問(wèn)題,因?yàn)槭且蕾?lài)多進(jìn)程和多線程的程序在不加鎖的情況下通常是不可控的,而協(xié)程則可以完美地解決協(xié)作問(wèn)題,由用戶(hù)來(lái)決定協(xié)程之間的調(diào)度。
基于gevent的異步程序:
#-*- coding:utf-8 -*-
import requests
from lxml import etree
from time import time
import gevent
from gevent import monkey
monkey.patch_all()

url = 'https://movie.douban.com/top250'

def fetch_page(url):
    response = requests.get(url)
    return response

def fetch_content(url):
    response = fetch_page(url)
    page = response.content
    return page

def parse(url):
    page = fetch_content(url)
    html = etree.HTML(page)
    xpath_movie = '//*[@id="content"]/div/div[1]/ol/li'
    xpath_title = './/span[@class="title"]'
    xpath_pages = '//*[@id="content"]/div/div[1]/div[2]/a'
    pages = html.xpath(xpath_pages)
    fetch_list = []
    result = []
    for element_movie in html.xpath(xpath_movie):
        result.append(element_movie)
    for p in pages:
        fetch_list.append(url + p.get('href'))
    jobs = [gevent.spawn(fetch_content, url) for url in fetch_list]
    gevent.joinall(jobs)
    for page in [job.value for job in jobs]:
        html = etree.HTML(page)
        for element_movie in html.xpath(xpath_movie):
            result.append(element_movie)
    for i, movie in enumerate(result, 1):
        title = movie.find(xpath_title).text
        # print(i, title)
gevent gives us the ability to write asynchronous programs with synchronous-looking logic. The monkey.patch_all() line is the trick that makes the whole program asynchronous: once the monkey patch is applied, Python dynamically replaces some blocking standard-library modules (such as socket and thread) with asynchronous versions at runtime, so every network operation works asynchronously and efficiency naturally improves.
4. Python Async/Await
Python needed a dedicated standard library to support coroutines, which eventually became asyncio.
The example below swaps the synchronous requests library for aiohttp, which supports asyncio, and uses the async/await syntax introduced in Python 3.5 to write the coroutine version:
#-*- coding:utf-8 -*-
from lxml import etree
from time import time
import asyncio
import aiohttp

url = 'https://movie.douban.com/top250'

async def fetch_content(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def parse(url):
    page = await fetch_content(url)
    html = etree.HTML(page)
    xpath_movie = '//*[@id="content"]/div/div[1]/ol/li'
    xpath_title = './/span[@class="title"]'
    xpath_pages = '//*[@id="content"]/div/div[1]/div[2]/a'
    pages = html.xpath(xpath_pages)
    fetch_list = []
    result = []
    for element_movie in html.xpath(xpath_movie):
        result.append(element_movie)
    for p in pages:
        fetch_list.append(url + p.get('href'))
    tasks = [fetch_content(url) for url in fetch_list]
    pages = await asyncio.gather(*tasks)
    for page in pages:
        html = etree.HTML(page)
        for element_movie in html.xpath(xpath_movie):
            result.append(element_movie)
    for i, movie in enumerate(result, 1):
        title = movie.find(xpath_title).text
        # print(i, title)

def main():
    loop = asyncio.get_event_loop()
    start = time()
    for i in range(5):
        loop.run_until_complete(parse(url))
    end = time()
    print('Cost {} seconds'.format((end - start) / 5))
    loop.close()

if __name__ == '__main__':
    main()
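The same gather pattern can be demonstrated without the network: asyncio.gather runs its awaitables concurrently and returns their results in call order, not completion order (work, demo and the delays are illustrative):

```python
import asyncio

async def work(name, delay):
    await asyncio.sleep(delay)  # stands in for a network wait
    return name

async def demo():
    # gather schedules both coroutines concurrently; results come back
    # in the order the awaitables were passed, not completion order.
    return await asyncio.gather(work('slow', 0.02), work('fast', 0.01))

print(asyncio.run(demo()))  # ['slow', 'fast']
```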
It is fast, and it also improves the program's readability.
A beginner's guide to Python Async/Await: to be continued...