

(Part 1) Python 3 + Selenium 3: learning Selenium from the framework's source code gets you twice the result for half the effort

Published: 2023/12/4 · Category: python · 豆豆

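This article draws on the following documents, with thanks:
Selenium-Python Chinese documentation
Selenium Documentation
WebDriver reference

If you spot an error, please point it out in the comments and the author will correct it promptly.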

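Environment

  • Operating system: Windows 7 SP1, 64-bit
  • Python version: 3.7.7
  • Browser: Google Chrome
  • Browser version: 80.0.3987 (64-bit)
  • Chrome driver: the driver version must match the browser version; each browser version needs its own driver version, so download the one that matches
  • If you use Firefox, check your Firefox version and download the matching driver from the Firefox driver page on GitHub (every release states which browser versions it supports, so read that before downloading; if English is a barrier, the browser's right-click translation is enough)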

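Introduction

Selenium is an umbrella project covering a range of tools and libraries that support the automation of web browsers. When automation runs, the operations are performed the way a real user would perform them.

At the time of writing, Selenium has gone through three major versions: Selenium 1.0, Selenium 2.0 and Selenium 3.0.

Selenium 1.0 mainly worked by injecting JS into the browser. Selenium's original author, Jason Huggins, first developed JavaScriptTestRunner as a testing tool and demonstrated it to several colleagues (he is quite an interesting character, too). As the name suggests, the tool ran its tests through JavaScript. This tool is the "predecessor" of Selenium.

Selenium 2.0 operates on browser elements through the API provided by WebDriver. WebDriver is a testing framework, or, put another way, an integrated library of API interfaces.

Selenium 3.0 extends Selenium 2.0 and differs little from it. This article uses Selenium 3.0 for all technical explanations.

The official introduction describes browser support as follows: "Through WebDriver, Selenium supports all major browsers on the market such as Chrom(ium), Firefox, Internet Explorer, Opera, and Safari."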

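Getting started

Once the environment is installed, use selenium to make the browser open the CSDN home page.
One thing to watch when configuring the environment: the driver must either be added to the system PATH or be placed in your Python root directory.

First, import webdriver: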

from selenium.webdriver import Chrome

Alternatively:

from selenium import webdriver

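Which import style you use is a matter of preference; the instance is then created accordingly.

from selenium.webdriver import Chrome

driver = Chrome()

or

from selenium import webdriver

driver = webdriver.Chrome()

General Python syntax will not be explained further below.
As mentioned earlier, the driver needs to be on the system PATH. Sometimes, for one reason or another, the driver path cannot be added to it. Here is a workaround:

from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'F:\python\dr\chromedriver_win32\chromedriver.exe')

Here executable_path specifies the location of the driver, which is where my driver happens to be stored. You can choose this location to suit your own needs and build the path more flexibly; an absolute path is used here only to keep the explanation clear.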
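As a rough illustration of the two options above (driver on PATH versus an explicit executable_path), the sketch below uses the standard library's shutil.which to check whether a driver can be found on PATH and otherwise falls back to an explicit location. The helper name resolve_driver_path and its fallback parameter are hypothetical and not part of selenium.

```python
import os
import shutil


def resolve_driver_path(name="chromedriver", fallback=None):
    """Return a usable driver path, or None if nothing is found.

    Hypothetical helper for illustration; not part of selenium.
    """
    found = shutil.which(name)  # searches the directories listed in PATH
    if found:
        return found
    if fallback and os.path.isfile(fallback):
        return fallback
    return None


if __name__ == "__main__":
    path = resolve_driver_path(
        fallback=r"F:\python\dr\chromedriver_win32\chromedriver.exe")
    print(path or "chromedriver not found; pass executable_path explicitly")
```

The returned path, when not None, is the kind of value that could be passed as executable_path.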

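Firefox:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://www.csdn.net")

Chrome:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.csdn.net")

Firefox and Chrome differ only in how the instance is created; every other operation is the same.

Import webdriver at the top of the code, instantiate the browser object, and then call the get method with the address you want to open.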

Implementation analysis

Look at the implementation in webdriver.py (the Chrome module reached through from selenium import webdriver):

import warnings

from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
from .remote_connection import ChromeRemoteConnection
from .service import Service
from .options import Options


class WebDriver(RemoteWebDriver):
    """
    Controls the ChromeDriver and allows you to drive the browser.

    You will need to download the ChromeDriver executable from
    http://chromedriver.storage.googleapis.com/index.html
    """

    def __init__(self, executable_path="chromedriver", port=0,
                 options=None, service_args=None,
                 desired_capabilities=None, service_log_path=None,
                 chrome_options=None, keep_alive=True):
        """
        Creates a new instance of the chrome driver.

        Starts the service and then creates new instance of chrome driver.

        :Args:
         - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
         - port - port you would like the service to run, if left as 0, a free port will be found.
         - options - this takes an instance of ChromeOptions
         - service_args - List of args to pass to the driver service
         - desired_capabilities - Dictionary object with non-browser specific
           capabilities only, such as "proxy" or "loggingPref".
         - service_log_path - Where to log information from the driver.
         - chrome_options - Deprecated argument for options
         - keep_alive - Whether to configure ChromeRemoteConnection to use HTTP keep-alive.
        """
        if chrome_options:
            warnings.warn('use options instead of chrome_options',
                          DeprecationWarning, stacklevel=2)
            options = chrome_options

        if options is None:
            # desired_capabilities stays as passed in
            if desired_capabilities is None:
                desired_capabilities = self.create_options().to_capabilities()
        else:
            if desired_capabilities is None:
                desired_capabilities = options.to_capabilities()
            else:
                desired_capabilities.update(options.to_capabilities())

        self.service = Service(
            executable_path,
            port=port,
            service_args=service_args,
            log_path=service_log_path)
        self.service.start()

        try:
            RemoteWebDriver.__init__(
                self,
                command_executor=ChromeRemoteConnection(
                    remote_server_addr=self.service.service_url,
                    keep_alive=keep_alive),
                desired_capabilities=desired_capabilities)
        except Exception:
            self.quit()
            raise
        self._is_remote = False

    def launch_app(self, id):
        """Launches Chrome app specified by id."""
        return self.execute("launchApp", {'id': id})

    def get_network_conditions(self):
        return self.execute("getNetworkConditions")['value']

    def set_network_conditions(self, **network_conditions):
        self.execute("setNetworkConditions", {'network_conditions': network_conditions})

    def execute_cdp_cmd(self, cmd, cmd_args):
        return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']

    def quit(self):
        try:
            RemoteWebDriver.quit(self)
        except Exception:
            # We don't care about the message because something probably has gone wrong
            pass
        finally:
            self.service.stop()

    def create_options(self):
        return Options()

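The docstring states that __init__ "creates a new instance of the chrome driver": it starts the service and then creates the driver instance.

Only the parameter used in this article is listed here:

  • executable_path: the path to the executable. If the default is used, the executable is assumed to be on PATH, where PATH is the system environment variable listing the directories searched for executables.

An essential step in selenium automation is starting the service. The __init__ method contains the following code:

self.service = Service(
    executable_path,
    port=port,
    service_args=service_args,
    log_path=service_log_path)
self.service.start()

This code instantiates the Service class with the relevant arguments and then starts the service. The most important argument here is executable_path, the driver to launch. Look at the Service class (selenium.webdriver.chrome.service):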

from selenium.webdriver.common import service


class Service(service.Service):
    """
    Object that manages the starting and stopping of the ChromeDriver
    """

    def __init__(self, executable_path, port=0, service_args=None,
                 log_path=None, env=None):
        """
        Creates a new instance of the Service

        :Args:
         - executable_path : Path to the ChromeDriver
         - port : Port the service is running on
         - service_args : List of args to pass to the chromedriver service
         - log_path : Path for the chromedriver service to log to
        """
        self.service_args = service_args or []
        if log_path:
            self.service_args.append('--log-path=%s' % log_path)

        service.Service.__init__(self, executable_path, port=port, env=env,
                                 start_error_message="Please see https://sites.google.com/a/chromium.org/chromedriver/home")

    def command_line_args(self):
        return ["--port=%d" % self.port] + self.service_args
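To make the role of service_args, log_path and command_line_args more concrete, here is a minimal standalone sketch of how the command line for the driver process could be assembled from those values. The function build_command is a hypothetical illustration that imitates the logic of the class above; it is not selenium code.

```python
def build_command(executable_path, port, service_args=None, log_path=None):
    """Imitate how the Service class above assembles the driver command line."""
    args = list(service_args or [])
    if log_path:
        # Same flag format as in Service.__init__
        args.append("--log-path=%s" % log_path)
    # Same ordering as [self.path] + command_line_args()
    return [executable_path, "--port=%d" % port] + args


print(build_command("chromedriver", 9515, ["--verbose"], "driver.log"))
# ['chromedriver', '--port=9515', '--verbose', '--log-path=driver.log']
```

The resulting list has the shape that subprocess.Popen expects for its first argument.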

Look at the start method of the base class (the base class is too long to show in full; it lives in selenium.webdriver.common.service):

def start(self):
    """
    Starts the Service.

    :Exceptions:
     - WebDriverException : Raised either when it can't start the service
       or when it can't connect to the service
    """
    try:
        cmd = [self.path]
        cmd.extend(self.command_line_args())
        self.process = subprocess.Popen(cmd, env=self.env,
                                        close_fds=platform.system() != 'Windows',
                                        stdout=self.log_file,
                                        stderr=self.log_file,
                                        stdin=PIPE)
    except TypeError:
        raise
    except OSError as err:
        if err.errno == errno.ENOENT:
            raise WebDriverException(
                "'%s' executable needs to be in PATH. %s" % (
                    os.path.basename(self.path), self.start_error_message))
        elif err.errno == errno.EACCES:
            raise WebDriverException(
                "'%s' executable may have wrong permissions. %s" % (
                    os.path.basename(self.path), self.start_error_message))
        else:
            raise
    except Exception as e:
        raise WebDriverException(
            "The executable %s needs to be available in the path. %s\n%s" %
            (os.path.basename(self.path), self.start_error_message, str(e)))
    count = 0
    while True:
        self.assert_process_still_running()
        if self.is_connectable():
            break
        count += 1
        time.sleep(1)
        if count == 30:
            raise WebDriverException("Can not connect to the Service %s" % self.path)

The part of start that matters here is:

try:
    cmd = [self.path]
    cmd.extend(self.command_line_args())
    self.process = subprocess.Popen(cmd, env=self.env,
                                    close_fds=platform.system() != 'Windows',
                                    stdout=self.log_file,
                                    stderr=self.log_file,
                                    stdin=PIPE)
except TypeError:
    raise
except OSError as err:
    if err.errno == errno.ENOENT:
        raise WebDriverException(
            "'%s' executable needs to be in PATH. %s" % (
                os.path.basename(self.path), self.start_error_message))
    elif err.errno == errno.EACCES:
        raise WebDriverException(
            "'%s' executable may have wrong permissions. %s" % (
                os.path.basename(self.path), self.start_error_message))
    else:
        raise
except Exception as e:
    raise WebDriverException(
        "The executable %s needs to be available in the path. %s\n%s" %
        (os.path.basename(self.path), self.start_error_message, str(e)))
count = 0
while True:
    self.assert_process_still_running()
    if self.is_connectable():
        break
    count += 1
    time.sleep(1)
    if count == 30:
        raise WebDriverException("Can not connect to the Service %s" % self.path)

A subprocess is started to launch the driver. If launching fails, the exception is caught and re-raised as a WebDriverException with an explanatory message. The loop then checks once per second whether the service accepts connections and gives up after 30 attempts. With the driver running, the browser can be opened.
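The start-then-poll pattern can be tried without a real driver. The sketch below is a simplified stand-in, assuming a plain TCP connection attempt is a good enough readiness check; the names is_connectable and wait_until_connectable are chosen for this illustration, and a listening socket in the same process plays the role of the driver.

```python
import socket
import time


def is_connectable(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False


def wait_until_connectable(port, attempts=30, delay=1.0):
    """Poll the port like the loop in start(); raise if it never opens."""
    for _ in range(attempts):
        if is_connectable(port):
            return
        time.sleep(delay)
    raise RuntimeError("Can not connect to the service on port %d" % port)


if __name__ == "__main__":
    # Stand-in for the driver process: a listening socket in this process.
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0 asks the OS for a free port
    server.listen(5)
    port = server.getsockname()[1]
    wait_until_connectable(port, attempts=3, delay=0.1)
    print("service reachable on port", port)
    server.close()
```

Unlike the original loop, this sketch does not check that the child process is still alive between attempts.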

Having followed the code through its exception handling, we now know how selenium starts the service. Next, look at how get requests a URL.
In the webdriver base class (selenium.webdriver.remote.webdriver), find the get method:

def get(self, url):
    """
    Loads a web page in the current browser session.
    """
    self.execute(Command.GET, {'url': url})

def execute(self, driver_command, params=None):
    """
    Sends a command to be executed by a command.CommandExecutor.

    :Args:
     - driver_command: The name of the command to execute as a string.
     - params: A dictionary of named parameters to send with the command.

    :Returns:
      The command's JSON response loaded into a dictionary object.
    """
    if self.session_id is not None:
        if not params:
            params = {'sessionId': self.session_id}
        elif 'sessionId' not in params:
            params['sessionId'] = self.session_id

    params = self._wrap_value(params)
    response = self.command_executor.execute(driver_command, params)
    if response:
        self.error_handler.check_response(response)
        response['value'] = self._unwrap_value(
            response.get('value', None))
        return response
    # If the server doesn't send a response, assume the command was
    # a success
    return {'success': 0, 'value': None, 'sessionId': self.session_id}

The get method calls execute, passing Command.GET and the url.
Looking up the Command class (selenium.webdriver.remote.command) shows that Command holds the constants for the standard WebDriver commands. The GET constant is:

GET = "get"

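Judging from the file, this class simply defines the names of the commands that can be executed.
To summarize the flow so far:

  • Start the service → call the get method

The get method itself works as follows:

  • get calls execute with Command.GET and the url; Command.GET is one of the standard command constants, and execute forwards the command to the command executor.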
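The delegation described above can be imitated with two toy classes. This is only a sketch: RecordingExecutor and MiniDriver are hypothetical names, the executor records calls instead of talking to a browser, and only the session-id handling of execute is reproduced.

```python
class RecordingExecutor:
    """Hypothetical stand-in for the command executor: records each call."""

    def __init__(self):
        self.calls = []

    def execute(self, command, params):
        self.calls.append((command, params))
        return {"status": 0, "value": None}


class MiniDriver:
    """Toy driver that only mirrors the get -> execute delegation."""

    GET = "get"  # same value as the GET constant shown above

    def __init__(self, command_executor, session_id):
        self.command_executor = command_executor
        self.session_id = session_id

    def get(self, url):
        self.execute(self.GET, {"url": url})

    def execute(self, driver_command, params=None):
        params = dict(params or {})
        # Attach the session id when the caller did not supply one
        params.setdefault("sessionId", self.session_id)
        return self.command_executor.execute(driver_command, params)


executor = RecordingExecutor()
driver = MiniDriver(executor, session_id="abc123")
driver.get("http://www.csdn.net")
print(executor.calls)
# [('get', {'url': 'http://www.csdn.net', 'sessionId': 'abc123'})]
```

The point of the toy is that get carries no logic of its own: it names a command and hands it, with its parameters, to whatever executor the driver was constructed with.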

The full implementation of execute was shown above together with get. Its core is:

params = self._wrap_value(params)
response = self.command_executor.execute(driver_command, params)
if response:
    self.error_handler.check_response(response)
    response['value'] = self._unwrap_value(
        response.get('value', None))
    return response

The line to focus on is:

self.command_executor.execute(driver_command, params)

Here command_executor is an instance created during initialization. In the derived WebDriver class (selenium.webdriver.chrome.webdriver), command_executor is instantiated as follows:

RemoteWebDriver.__init__(
    self,
    command_executor=ChromeRemoteConnection(
        remote_server_addr=self.service.service_url,
        keep_alive=keep_alive),
    desired_capabilities=desired_capabilities)

Look at the ChromeRemoteConnection class (selenium.webdriver.chrome.remote_connection):

from selenium.webdriver.remote.remote_connection import RemoteConnection


class ChromeRemoteConnection(RemoteConnection):

    def __init__(self, remote_server_addr, keep_alive=True):
        RemoteConnection.__init__(self, remote_server_addr, keep_alive)
        self._commands["launchApp"] = ('POST', '/session/$sessionId/chromium/launch_app')
        self._commands["setNetworkConditions"] = ('POST', '/session/$sessionId/chromium/network_conditions')
        self._commands["getNetworkConditions"] = ('GET', '/session/$sessionId/chromium/network_conditions')
        self._commands['executeCdpCommand'] = ('POST', '/session/$sessionId/goog/cdp/execute')

It calls the initializer of its base class, RemoteConnection. In that base class, the execute method is implemented as:

def execute(self, command, params):
    """
    Send a command to the remote server.

    Any path substitutions required for the URL mapped to the command should be
    included in the command parameters.

    :Args:
     - command - A string specifying the command to execute.
     - params - A dictionary of named parameters to send with the command as
       its JSON payload.
    """
    command_info = self._commands[command]
    assert command_info is not None, 'Unrecognised command %s' % command
    path = string.Template(command_info[1]).substitute(params)
    if hasattr(self, 'w3c') and self.w3c and isinstance(params, dict) and 'sessionId' in params:
        del params['sessionId']
    data = utils.dump_json(params)
    url = '%s%s' % (self._url, path)
    return self._request(command_info[0], url, body=data)

def _request(self, method, url, body=None):
    """
    Send an HTTP request to the remote server.

    :Args:
     - method - A string for the HTTP method to send the request with.
     - url - A string for the URL to send the request to.
     - body - A string for request body. Ignored unless method is POST or PUT.

    :Returns:
      A dictionary with the server's parsed JSON response.
    """
    LOGGER.debug('%s %s %s' % (method, url, body))

    parsed_url = parse.urlparse(url)
    headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)
    resp = None
    if body and method != 'POST' and method != 'PUT':
        body = None

    if self.keep_alive:
        resp = self._conn.request(method, url, body=body, headers=headers)

        statuscode = resp.status
    else:
        http = urllib3.PoolManager(timeout=self._timeout)
        resp = http.request(method, url, body=body, headers=headers)

        statuscode = resp.status
        if not hasattr(resp, 'getheader'):
            if hasattr(resp.headers, 'getheader'):
                resp.getheader = lambda x: resp.headers.getheader(x)
            elif hasattr(resp.headers, 'get'):
                resp.getheader = lambda x: resp.headers.get(x)

    data = resp.data.decode('UTF-8')
    try:
        if 300 <= statuscode < 304:
            return self._request('GET', resp.getheader('location'))
        if 399 < statuscode <= 500:
            return {'status': statuscode, 'value': data}
        content_type = []
        if resp.getheader('Content-Type') is not None:
            content_type = resp.getheader('Content-Type').split(';')
        if not any([x.startswith('image/png') for x in content_type]):

            try:
                data = utils.load_json(data.strip())
            except ValueError:
                if 199 < statuscode < 300:
                    status = ErrorCode.SUCCESS
                else:
                    status = ErrorCode.UNKNOWN_ERROR
                return {'status': status, 'value': data.strip()}

            # Some of the drivers incorrectly return a response
            # with no 'value' field when they should return null.
            if 'value' not in data:
                data['value'] = None
            return data
        else:
            data = {'status': 0, 'value': data}
            return data
    finally:
        LOGGER.debug("Finished Request")
        resp.close()

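From this implementation we can see that execute sends a request to the remote server: it looks up the HTTP method and URL template for the command, fills in the template from params, and serializes params as JSON. The _request method it calls sends the HTTP request and returns the parsed result; the browser acts on the request and supplies the response.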
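The lookup-and-substitute step can be reproduced with the standard library alone. In the sketch below, build_request is a hypothetical helper and COMMANDS is a two-entry table written in the same (method, path template) style as _commands; unlike the original, it always drops sessionId from the body rather than only in W3C mode, and it sends nothing over the network.

```python
import json
import string

# Two entries in the style of the command table: "get" navigates to a URL,
# "getTitle" reads the page title.
COMMANDS = {
    "get": ("POST", "/session/$sessionId/url"),
    "getTitle": ("GET", "/session/$sessionId/title"),
}


def build_request(base_url, command, params):
    """Return (HTTP method, full URL, JSON body) for a command."""
    method, path_template = COMMANDS[command]
    # $sessionId in the template is filled in from params
    path = string.Template(path_template).substitute(params)
    body = {k: v for k, v in params.items() if k != "sessionId"}
    return method, base_url + path, json.dumps(body)


print(build_request(
    "http://127.0.0.1:9515", "get",
    {"sessionId": "abc123", "url": "http://www.csdn.net"}))
# ('POST', 'http://127.0.0.1:9515/session/abc123/url', '{"url": "http://www.csdn.net"}')
```

The base URL and session id here are made-up values; in a real run they come from the started service and the created session.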

The official documentation explains how this communication works:

At its minimum, WebDriver talks to a browser through a driver.
Communication is two way: WebDriver passes commands to the browser through the driver, and receives information back via the same route.

The driver is specific to the browser, such as ChromeDriver for Google’s Chrome/Chromium, GeckoDriver for Mozilla’s Firefox, etc. The driver runs on the same system as the browser. This may, or may not be, the same system where the tests themselves are executing.
This simple example above is direct communication. Communication to the browser may also be remote communication through Selenium Server or RemoteWebDriver. RemoteWebDriver runs on the same system as the driver and the browser.

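In short, we talk to the browser through webdriver, and the browser responds.

From the examples above we can see that sending a request to the remote server with execute goes through webdriver to interact with the browser, and that sending one of the predefined command constants returns the corresponding information.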
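The request/response exchange can be observed end to end with a stand-in server. The sketch below is not a real driver: FakeDriverHandler is a hypothetical HTTP handler that answers a title request with fixed JSON, and fetch_title plays the client side. The session id and the page title are made-up values.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a driver: answers one endpoint with JSON."""

    def do_GET(self):
        if self.path.endswith("/title"):
            status, payload = 200, {"value": "Example page"}
        else:
            status, payload = 404, {"value": {"error": "unknown command"}}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_title(base_url, session_id):
    """Send the kind of request a client sends to read the page title."""
    url = "%s/session/%s/title" % (base_url, session_id)
    # An empty ProxyHandler avoids routing the local request through a proxy
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))["value"]


def demo():
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base_url = "http://127.0.0.1:%d" % server.server_port
        return fetch_title(base_url, "abc123")
    finally:
        server.shutdown()
        server.server_close()


print(demo())  # Example page
```

A real driver additionally translates each request into browser actions; the stand-in only shows the HTTP and JSON layer that the client sees.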

Since the object we create in our code is a webdriver instance, look in the webdriver base class (selenium.webdriver.remote.webdriver) for functions that return such information. The following turn up:

@property
def title(self):
    """Returns the title of the current page.

    :Usage:
        title = driver.title
    """
    resp = self.execute(Command.GET_TITLE)
    return resp['value'] if resp['value'] is not None else ""

@property
def current_url(self):
    """
    Gets the URL of the current page.

    :Usage:
        driver.current_url
    """
    return self.execute(Command.GET_CURRENT_URL)['value']

@property
def page_source(self):
    """
    Gets the source of the current page.

    :Usage:
        driver.page_source
    """
    return self.execute(Command.GET_PAGE_SOURCE)['value']

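This list is not exhaustive. Let's try these functions out; their usage is described in their docstrings. First get title (the page title) and current_url (the current URL), leaving page_source (the page source) commented out:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print('Author blog: https://blog.csdn.net/A757291228')
# Support original work; when reposting, please link to the original article
# print(driver.page_source)

The page title and the current URL are retrieved successfully.

Now try page_source:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print('Author blog: https://blog.csdn.net/A757291228')
# Support original work; when reposting, please link to the original article
print(driver.page_source)

The page source is retrieved successfully.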
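These attributes are read without parentheses because they are properties that call execute behind the scenes. The toy class below imitates that pattern; MiniPage and its canned responses are hypothetical, and the command names are plain strings standing in for the Command constants.

```python
class MiniPage:
    """Toy class imitating the @property pattern of title and current_url."""

    def __init__(self, responses):
        self._responses = responses  # canned replies keyed by command name

    def execute(self, command):
        # A real driver would send the command to the browser here
        return {"value": self._responses.get(command)}

    @property
    def title(self):
        resp = self.execute("getTitle")
        # A missing title becomes an empty string, as in the original
        return resp["value"] if resp["value"] is not None else ""

    @property
    def current_url(self):
        return self.execute("getCurrentUrl")["value"]


page = MiniPage({"getCurrentUrl": "http://www.csdn.net"})
print(repr(page.title))   # ''
print(page.current_url)   # http://www.csdn.net
```

Each attribute access triggers a fresh command, so reading driver.title twice asks the browser twice.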
