How to beat Python’s pip: Inspecting the quality of machine learning software
Following the previous article on resolving Python dependencies, we will now look at the quality of software. This article covers “inspections” of software stacks and links to a free dataset available on Kaggle. Even though the title mentions the quality of “machine learning software”, the principles and ideas can be reused to inspect the quality of any software.
Application (Software & Hardware) Stack
Let’s consider a Python machine learning application. This application can use a machine learning library, such as TensorFlow. In that case TensorFlow is a direct dependency of the application: by installing it, the application uses TensorFlow directly and the dependencies of TensorFlow indirectly. Examples of such indirect dependencies are NumPy and absl-py, both of which are used by TensorFlow.
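The distinction between direct and indirect dependencies can be sketched with a small graph walk. The graph below is a hypothetical, heavily simplified illustration (the application name "my-ml-app" and the edges are assumptions, and real packages declare many more requirements); it only shows how indirect dependencies follow from direct ones.

```python
from collections import deque

# Hypothetical, simplified dependency graph used only for illustration.
DEPENDENCY_GRAPH = {
    "my-ml-app": ["tensorflow"],
    "tensorflow": ["numpy", "absl-py"],
    "numpy": [],
    "absl-py": ["six"],
    "six": [],
}


def direct_dependencies(package):
    """Packages that `package` declares itself."""
    return set(DEPENDENCY_GRAPH.get(package, []))


def all_dependencies(package):
    """Every package reachable from `package`, excluding `package` itself."""
    seen = set()
    queue = deque(DEPENDENCY_GRAPH.get(package, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(DEPENDENCY_GRAPH.get(current, []))
    return seen


direct = direct_dependencies("my-ml-app")
indirect = all_dependencies("my-ml-app") - direct
print("direct:", sorted(direct))
print("indirect:", sorted(indirect))
```

A breadth-first walk is used here only because it is short; any graph traversal that tracks visited packages would give the same sets.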
Our machine learning Python application and all the Python libraries run on top of a Python interpreter in a specific version. Moreover, they can use additional native dependencies (provided by the operating system) such as glibc or CUDA (if running computations on a GPU). To visualize this, let’s draw a stack of all the items that make up the application stack running on top of some hardware.
Abstract layers of an application stack.
Note that an issue in any of the described layers can cause our Python application to misbehave, produce wrong output, raise runtime errors, or simply not start at all.
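Because a problem can originate in any layer, it helps to record what each layer looks like at run time. The following is a minimal sketch using only the Python standard library; the function name and the choice of fields are assumptions made for illustration, and layers such as CUDA or the exact hardware are not covered.

```python
import platform


def describe_stack():
    """Collect a few facts about the layers below the Python application."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        # libc_ver() may return empty strings on some platforms.
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
        "operating_system": platform.system(),
        "kernel_release": platform.release(),
        "machine": platform.machine(),
    }


for layer, value in describe_stack().items():
    print(f"{layer}: {value}")
```

Storing such a record next to test or benchmark results makes it easier to tell later which layer changed between two runs.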
Let’s try to identify possible issues in the described stack by building the software and running it on our hardware. By doing so we can spot issues before pushing the application to a production environment, or fine-tune the software so that we get the best possible performance out of the application on the available hardware.
On-demand software stack creation
If our application depends on a TensorFlow release starting with version 2.0.0 (e.g. it requires the API offered by tensorflow>=2.0.0), we can test the application with different versions of TensorFlow up to 2.3.0, the current release available on PyPI at the time of writing. The same applies to transitive dependencies of TensorFlow, e.g. absl-py, NumPy, or any other: the version of any transitive dependency can be changed in the same way as that of any other dependency in our software stack.
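Selecting the versions to test against a range such as tensorflow>=2.0.0 can be sketched as a simple filter. The release list below is an illustrative subset rather than the full PyPI listing, and the naive parser is an assumption that only handles plain "X.Y.Z" versions.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a comparable tuple of integers."""
    return tuple(int(part) for part in text.split("."))


# Illustrative subset of releases, not the complete list published on PyPI.
AVAILABLE_RELEASES = ["1.15.0", "2.0.0", "2.1.0", "2.2.0", "2.3.0"]


def candidates(available, lower_bound, upper_bound):
    """Releases within the inclusive range [lower_bound, upper_bound]."""
    low, high = parse_version(lower_bound), parse_version(upper_bound)
    return [v for v in available if low <= parse_version(v) <= high]


to_test = candidates(AVAILABLE_RELEASES, "2.0.0", "2.3.0")
print(to_test)
```

For real version strings (pre-releases, post-releases, epochs), a PEP 440 aware library such as `packaging` is a safer choice than the tuple comparison used in this sketch.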
Dependency Monkey
Note that a single version change can completely change (or even invalidate) which dependencies, and in which versions, will be present in the application stack, given the dependency graph and the version range specifications of the libraries in the software stack. To create a pinned-down list of packages in specific versions to be installed, a resolver needs to be run to resolve the packages and their version range requirements.
Do you remember the state space described in the first article of the “How to beat Python’s pip” series? Dependency Monkey can in fact create the state space of all the possible software stacks that can be resolved while respecting version range specifications. If the state space is too large to resolve in a reasonable time, it can be sampled.
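The idea of a state space and its sampling can be illustrated with a toy example. This is not how Dependency Monkey is implemented; the package versions and the compatibility rule below are hypothetical, and the sketch enumerates every combination before sampling, which a real tool would avoid for large spaces.

```python
import itertools
import random

# Hypothetical candidate versions for two packages.
CANDIDATES = {
    "simplelib": ["1.0.0", "1.1.0", "2.0.0"],
    "anotherlib": ["0.9.0", "1.0.0", "1.1.0"],
}


def is_valid(stack):
    """Hypothetical rule: simplelib 2.x cannot be installed with anotherlib 0.x."""
    return not (
        stack["simplelib"].startswith("2.")
        and stack["anotherlib"].startswith("0.")
    )


def state_space(candidates):
    """Yield every valid combination of pinned versions."""
    names = sorted(candidates)
    for versions in itertools.product(*(candidates[name] for name in names)):
        stack = dict(zip(names, versions))
        if is_valid(stack):
            yield stack


def sample_stacks(candidates, count, seed=0):
    """Pick up to `count` valid stacks at random (seeded for repeatability)."""
    stacks = list(state_space(candidates))
    return random.Random(seed).sample(stacks, min(count, len(stacks)))


all_stacks = list(state_space(CANDIDATES))
sampled = sample_stacks(CANDIDATES, count=3)
print(len(all_stacks), "valid stacks;", len(sampled), "sampled")
```

With three versions of each package there are nine combinations, one of which the hypothetical rule rejects; the number of combinations grows multiplicatively with every additional package, which is why sampling becomes necessary.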
An interpolated score function for a state space created by installing two dependencies, “simplelib” and “anotherlib”, in different versions (valid combinations of different versions installed together).
A component called “Dependency Monkey” is capable of creating different software stacks while considering the dependency graph and the version specifications of the packages in it. This is all done offline, based on pre-computed results from Thoth’s solver runs (see the previous article from the “How to beat Python’s pip” series). The results of solver runs are synced into Thoth’s database so that they are available in a queryable form. Doing so enables Dependency Monkey to resolve software stacks at a fast pace (see a YouTube video on optimizing Thoth’s resolver). Moreover, the underlying algorithm can consider Python packages published on different Python indices (besides PyPI, it can also use custom TensorFlow builds from an index such as the AICoE one). We will explain Dependency Monkey in more depth in one of the upcoming articles. If you are eager to learn more, feel free to browse its online documentation.
Amun API
Now, let’s utilize a service called “Amun”. This service was designed to accept a specification of the software stack and hardware and execute an application given the specification.
Amun is an OpenShift cluster-native application that utilizes OpenShift features (such as builds, the container image registry, …) and Argo Workflows to run the desired software on specific hardware using a specific software environment. The specification is accepted in JSON format and is subsequently translated into the respective steps needed to test building and running the given stack.
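To give a feel for what a JSON specification of a software stack and hardware might contain, here is a small sketch. Every field name, the base image, and the script URL are assumptions made for illustration; they are not Amun’s actual schema, which is described in its own documentation.

```python
import json


def build_inspection_spec(base_image, python_packages, script_url, cpu, memory):
    """Assemble a hypothetical inspection specification as a plain dict."""
    return {
        "base_image": base_image,                 # hypothetical field name
        "python_packages": dict(python_packages), # pinned package versions
        "script": script_url,                     # application to execute
        "hardware": {"cpu": cpu, "memory": memory},
    }


spec = build_inspection_spec(
    base_image="fedora:32",
    python_packages={"tensorflow": "2.3.0"},
    script_url="https://example.com/benchmark.py",  # placeholder URL
    cpu="2",
    memory="4Gi",
)
payload = json.dumps(spec, indent=2, sort_keys=True)
print(payload)
```

Keeping the software part (image, packages, script) separate from the hardware part mirrors the description above, where both the software stack and the hardware are given in the specification.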
A walkthrough on running Amun inspections to check the quality of software.
The video linked above shows how Amun inspections are run and how the knowledge created is aggregated using OpenShift, Argo Workflows, and Ceph. You can see inspections of different TensorFlow builds: tensorflow, tensorflow-cpu, intel-tensorflow, and a community build of TensorFlow with AVX2 instruction set support available on the AICoE index.
Thoth’s inspection dataset on Kaggle
We (Red Hat) have produced multiple inspections as part of project Thoth, in which we tested different TensorFlow releases and different TensorFlow builds.
One resulting dataset is Thoth’s performance dataset, version 1, on Kaggle. It consists of nearly 4000 files capturing information about inspection runs of TensorFlow stacks. A notebook published together with the dataset can help with exploring it.
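A dataset made of thousands of small files is usually loaded in one pass before any analysis. The sketch below assumes the files are JSON documents stored under a local directory; the directory name is a placeholder, and the actual layout and file contents of the Kaggle dataset should be checked in the accompanying notebook.

```python
import json
from pathlib import Path


def load_inspections(directory):
    """Load every JSON document found under `directory`, skipping unreadable files."""
    documents = []
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                documents.append(json.load(handle))
        except (OSError, json.JSONDecodeError):
            # A single corrupted file should not abort the whole load.
            continue
    return documents


if __name__ == "__main__":
    dataset_dir = Path("thoth-performance-dataset")  # placeholder path
    if dataset_dir.is_dir():
        print(len(load_inspections(dataset_dir)), "documents loaded")
```

Skipping files that fail to parse is a design choice that favors exploration; for a strict analysis it may be preferable to collect and report the failing paths instead.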
An introduction to Thoth’s datasets available on Kaggle.
Project Thoth
Project Thoth is an application that aims to help Python developers. If you wish to be updated on the improvements and progress we make in project Thoth, feel free to subscribe to our YouTube channel, where we post updates as well as recordings from scrum demos.
Stay tuned for any updates!
Original article: https://towardsdatascience.com/how-to-beat-pythons-pip-inspecting-the-quality-of-machine-learning-software-f1a028f0c42a