Airflow vs. Luigi vs. Argo vs. MLFlow vs. KubeFlow
Task orchestration tools and workflows
Recently there’s been an explosion of new tools for orchestrating task and data workflows (sometimes referred to as “MLOps”). The sheer number of these tools can make it hard to choose which ones to use and to understand how they overlap, so we decided to compare some of the most popular ones head to head.
Airflow is the most popular solution, followed by Luigi. There are newer contenders too, and they’re all growing fast. (Source: Author)
Overall, Apache Airflow is both the most popular tool and the one with the broadest range of features, but Luigi is a similar tool that’s simpler to get started with. Argo is the one teams often turn to when they’re already using Kubernetes, and Kubeflow and MLFlow serve more niche requirements related to deploying machine learning models and tracking experiments.
Before we dive into a detailed comparison, it’s useful to understand some broader concepts related to task orchestration.
What is task orchestration and why is it useful?
Smaller teams usually start out by managing tasks manually, such as cleaning data, training machine learning models, tracking results, and deploying the models to a production server. As the team and the solution grow, so does the number of repetitive steps. It also becomes more important that these tasks are executed reliably.
The complexity of how these tasks depend on each other also increases. When you start out, you might have a pipeline of tasks that needs to be run once a week or once a month, in a specific order. As you grow, this pipeline becomes a network with dynamic branches. In certain cases, some tasks set off other tasks, and these might depend on several other tasks running first.
This network can be modelled as a DAG — a Directed Acyclic Graph, which models each task and the dependencies between them.
A pipeline is a limited DAG where each task has at most one upstream and one downstream dependency. (Source: Author)
Workflow orchestration tools allow you to define DAGs by specifying all of your tasks and how they depend on each other. The tool then executes these tasks on schedule and in the correct order, retrying any that fail before running the next ones. It also monitors progress and notifies your team when failures happen.
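To make the idea concrete, here is a minimal sketch of what an orchestrator does with a DAG, using only Python's standard library (`graphlib`, Python 3.9+). The task names, the `run_dag` helper, and the retry policy are our own illustrative assumptions, not the API of any tool compared here; real orchestrators add scheduling, monitoring, and notifications on top.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each key maps to the set of tasks it depends on.
# The branch after "train_model" is what makes this a DAG rather than a pipeline.
DAG = {
    "clean_data": set(),
    "train_model": {"clean_data"},
    "evaluate_model": {"train_model"},
    "track_results": {"train_model"},
    "deploy_model": {"evaluate_model", "track_results"},
}


def run_dag(dag, actions, max_retries=2):
    """Run each task after its dependencies, retrying a failing task
    up to max_retries times before aborting the whole run."""
    completed = []
    for task in TopologicalSorter(dag).static_order():
        for attempt in range(max_retries + 1):
            try:
                actions[task]()
                break  # the task succeeded, stop retrying
            except Exception as exc:
                if attempt == max_retries:
                    raise RuntimeError(
                        f"{task} failed after {attempt + 1} attempts"
                    ) from exc
        completed.append(task)
    return completed


# Placeholder actions that do nothing; a real task would clean data, train, etc.
order = run_dag(DAG, {name: (lambda: None) for name in DAG})
print(order)
```

The two middle tasks may run in either order because neither depends on the other, which is exactly the freedom a DAG expresses and a strict pipeline does not.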
CI/CD tools such as Jenkins are commonly used to automatically test and deploy code, and there is a strong parallel between these tools and task orchestration tools — but there are important distinctions too. Even though in theory you can use these CI/CD tools to orchestrate dynamic, interlinked tasks, at a certain level of complexity you’ll find it easier to use more general tools like Apache Airflow instead.
Overall, the focus of any orchestration tool is ensuring centralized, repeatable, reproducible, and efficient workflows: a virtual command center for all of your automated tasks. With that context in mind, let’s see how some of the most popular workflow tools stack up.
Just tell me which one to use
You should probably use:
Apache Airflow if you want the most full-featured, mature tool and you can dedicate time to learning how it works, setting it up, and maintaining it.
Luigi if you need something with an easier learning curve than Airflow. It has fewer features, but it’s easier to get off the ground.
Argo if you’re already deeply invested in the Kubernetes ecosystem and want to manage all of your tasks as pods, defining them in YAML instead of Python.
KubeFlow if you want to use Kubernetes but still define your tasks with Python instead of YAML.
MLFlow if you care more about tracking experiments or tracking and deploying models using MLFlow’s predefined patterns than about finding a tool that can adapt to your existing custom workflows.
Comparison table
For a quick overview, we’ve compared the libraries when it comes to:
Maturity: based on the age of the project and the number of fixes and commits;
Popularity: based on adoption and GitHub stars;
Simplicity: based on ease of onboarding and adoption;
Breadth: based on how specialized vs. how adaptable each project is;
Language: based on the primary way you interact with the tool.
These are not rigorous or scientific benchmarks, but they’re intended to give you a quick overview of how the tools overlap and how they differ from each other. For more details, see the head-to-head comparison below.
Luigi vs. Airflow
Luigi and Airflow solve similar problems, but Luigi is far simpler. It’s contained in a single component, while Airflow has multiple modules which can be configured in different ways. Airflow has a larger community and some extra features, but a much steeper learning curve. Specifically, Airflow is far more powerful when it comes to scheduling, and it provides a calendar UI to help you set up when your tasks should run. With Luigi, you need to write more custom code to run tasks on a schedule.
Both tools use Python and DAGs to define tasks and dependencies. Use Luigi if you have a small team and need to get started quickly. Use Airflow if you have a larger team and can take an initial productivity hit in exchange for more power once you’ve gotten over the learning curve.
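Although both tools describe a DAG in Python, they tend to express dependencies from opposite ends: in Luigi a task declares what it requires, while in Airflow tasks are created first and then wired together, commonly with the `>>` operator. The sketch below imitates the two styles with toy classes so it runs without either library installed. `UpstreamTask`, `WiredTask`, and the helper functions are hypothetical names of our own, not classes from Luigi or Airflow.

```python
class UpstreamTask:
    """Luigi-like style: each task names the tasks it requires."""

    def __init__(self, name, requires=()):
        self.name = name
        self.requires = list(requires)


class WiredTask:
    """Airflow-like style: tasks are created first, then wired with >>."""

    def __init__(self, name):
        self.name = name
        self.downstream = []

    def __rshift__(self, other):
        self.downstream.append(other)
        return other  # returning the right-hand task allows a >> b >> c


def edges_from_requires(tasks):
    """Collect (upstream, downstream) pairs from 'requires' declarations."""
    return {(dep.name, task.name) for task in tasks for dep in task.requires}


def edges_from_wiring(tasks):
    """Collect (upstream, downstream) pairs from >> wiring."""
    return {(task.name, child.name) for task in tasks for child in task.downstream}


# Style 1: dependencies are declared on the downstream task.
clean = UpstreamTask("clean_data")
train = UpstreamTask("train_model", requires=[clean])
deploy = UpstreamTask("deploy_model", requires=[train])
luigi_style = edges_from_requires([clean, train, deploy])

# Style 2: dependencies are wired between already-created tasks.
clean_w = WiredTask("clean_data")
train_w = WiredTask("train_model")
deploy_w = WiredTask("deploy_model")
clean_w >> train_w >> deploy_w
airflow_style = edges_from_wiring([clean_w, train_w, deploy_w])

# Both styles describe the same graph.
print(luigi_style == airflow_style)  # True
```

Either way the result is the same set of edges, so the choice between the two tools is less about what you can express and more about scheduling features and setup effort.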
Luigi vs. Argo
Argo is built on top of Kubernetes, and each task is run as a separate Kubernetes pod. This can be convenient if you’re already using Kubernetes for most of your infrastructure, but it will add complexity if you’re not. Luigi is a Python library and can be installed with Python package management tools, such as pip and conda. Argo is a Kubernetes extension and is installed using Kubernetes. While both tools let you define your tasks as DAGs, with Luigi you’ll use Python to write these definitions, and with Argo you’ll use YAML.
Use Argo if you’re already invested in Kubernetes and know that all of your tasks will be pods. You should also consider it if the developers who’ll be writing the DAG definitions are more comfortable with YAML than Python. Use Luigi if you’re not running on Kubernetes and have Python expertise on the team.
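To give a feel for the declarative style Argo uses, the sketch below builds the general shape of a workflow manifest containing one DAG as a plain Python dictionary and prints it as JSON, which YAML parsers also accept. The field names follow the structure of Argo's published DAG examples, but treat them as an assumption to verify against the Argo Workflows documentation. The `argo_dag_workflow` helper, the task names, and the container image are hypothetical.

```python
import json


def argo_dag_workflow(name, image, tasks):
    """Build a manifest with one DAG template.

    `tasks` maps each task name to the list of task names it depends on.
    Every task reuses a single container template that echoes its own name.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    "dag": {
                        "tasks": [
                            {
                                "name": task,
                                "template": "step",
                                "dependencies": deps,
                                "arguments": {
                                    "parameters": [{"name": "task", "value": task}]
                                },
                            }
                            for task, deps in tasks.items()
                        ]
                    },
                },
                {
                    # Each DAG task runs this container as its own pod.
                    "name": "step",
                    "inputs": {"parameters": [{"name": "task"}]},
                    "container": {
                        "image": image,
                        "command": ["echo", "{{inputs.parameters.task}}"],
                    },
                },
            ],
        },
    }


manifest = argo_dag_workflow(
    "ml-pipeline",
    "alpine:3.19",  # hypothetical image
    {
        "clean-data": [],
        "train-model": ["clean-data"],
        "deploy-model": ["train-model"],
    },
)
print(json.dumps(manifest, indent=2))
```

Compared with a Luigi task class, nothing here is executable logic: the manifest only names containers and their ordering, which is why Argo suits teams whose tasks already live in container images.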
Luigi vs. Kubeflow
Luigi is a Python-based library for general task orchestration, while Kubeflow is a Kubernetes-based tool built specifically for machine learning workflows, with prebuilt patterns for experiment tracking, hyperparameter optimization, and serving Jupyter notebooks. Kubeflow consists of two distinct components: Kubeflow and Kubeflow Pipelines. The latter is focused on model deployment and CI/CD, and it can be used independently of the main Kubeflow features.
Use Luigi if you need to orchestrate a variety of different tasks, from data cleaning through model deployment. Use Kubeflow if you already use Kubernetes and want to orchestrate common machine learning tasks such as experiment tracking and model training.
Luigi vs. MLFlow
Luigi is a general task orchestration system, while MLFlow is a more specialized tool to help manage and track your machine learning lifecycle and experiments. With Luigi you define general tasks and dependencies (such as training and deploying a model), whereas you import MLFlow directly into your machine learning code and use its helper functions to log information (such as the parameters you’re using) and artifacts (such as the trained models). You can also use MLFlow as a command-line tool to serve models built with common tools (such as scikit-learn) or deploy them to common platforms (such as AzureML or Amazon SageMaker).
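The pattern of logging parameters, metrics, and artifacts from inside training code can be sketched with a toy tracker built on the standard library. `RunLog` and `train_baseline` are hypothetical stand-ins we wrote for illustration and are not MLFlow's API; in MLFlow itself, helpers such as `mlflow.log_param`, `mlflow.log_metric`, and `mlflow.log_artifact` play this role.

```python
import json
import tempfile
from pathlib import Path


class RunLog:
    """Toy experiment tracker: records one run to a directory on disk."""

    def __init__(self, root):
        self.root = Path(root)
        (self.root / "artifacts").mkdir(parents=True, exist_ok=True)
        self.params = {}
        self.metrics = {}

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def log_artifact(self, name, text):
        """Store a text artifact (for example a serialized model)."""
        (self.root / "artifacts" / name).write_text(text)

    def save(self):
        """Write params and metrics to run.json and return its path."""
        path = self.root / "run.json"
        path.write_text(json.dumps({"params": self.params, "metrics": self.metrics}))
        return path


def train_baseline(data, log):
    """'Train' a trivial model (the mean of the data) and record the run."""
    log.log_param("model_type", "mean_baseline")
    log.log_param("n_samples", len(data))
    prediction = sum(data) / len(data)
    mae = sum(abs(x - prediction) for x in data) / len(data)
    log.log_metric("mae", mae)
    log.log_artifact("model.json", json.dumps({"prediction": prediction}))
    return log.save()


with tempfile.TemporaryDirectory() as tmp:
    run_file = train_baseline([1.0, 2.0, 3.0, 4.0], RunLog(tmp))
    print(run_file.read_text())
```

Note that the tracker knows nothing about task ordering: it only records what happened inside one step, which is why a tool like this complements an orchestrator rather than replacing it.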
Airflow vs. Argo
Argo and Airflow both allow you to define your tasks as DAGs, but in Airflow you do this with Python, while in Argo you use YAML. Argo runs each task as a Kubernetes pod, while Airflow lives within the Python ecosystem. Canva evaluated both options before settling on Argo, and you can watch this talk to get their detailed comparison and evaluation.
Use Airflow if you want a more mature tool and don’t care about Kubernetes. Use Argo if you’re already invested in Kubernetes and want to run a wide variety of tasks written in different stacks.
Airflow vs. Kubeflow
Airflow is a generic task orchestration platform, while Kubeflow focuses specifically on machine learning tasks, such as experiment tracking. Both tools allow you to define tasks using Python, but Kubeflow runs tasks on Kubernetes. Kubeflow is split into Kubeflow and Kubeflow Pipelines: the latter component allows you to specify DAGs, but it’s more focused on deployment and model serving than on general tasks.
Use Airflow if you need a mature, broad ecosystem that can run a variety of different tasks. Use Kubeflow if you already use Kubernetes and want more out-of-the-box patterns for machine learning solutions.
Airflow vs. MLFlow
Airflow is a generic task orchestration platform, while MLFlow is specifically built to optimize the machine learning lifecycle. This means that MLFlow has the functionality to run and track experiments, and to train and deploy machine learning models, while Airflow has a broader range of use cases, and you could use it to run any set of tasks. Airflow is a set of components and plugins for managing and scheduling tasks. MLFlow is a Python library you can import into your existing machine learning code and a command-line tool you can use to train and deploy machine learning models written in scikit-learn to Amazon SageMaker or AzureML.
Use MLFlow if you want an opinionated, out-of-the-box way of managing your machine learning experiments and deployments. Use Airflow if you have more complicated requirements and want more control over how you manage your machine learning lifecycle.
Argo vs. Kubeflow
Parts of Kubeflow (like Kubeflow Pipelines) are built on top of Argo, but Argo is built to orchestrate any task, while Kubeflow focuses on those specific to machine learning — such as experiment tracking, hyperparameter tuning, and model deployment. Kubeflow Pipelines is a separate component of Kubeflow which focuses on model deployment and CI/CD, and can be used independently of Kubeflow’s other features. Both tools rely on Kubernetes and are likely to be more interesting to you if you’ve already adopted that. With Argo, you define your tasks using YAML, while Kubeflow allows you to use a Python interface instead.
Use Argo if you need to manage a DAG of general tasks running as Kubernetes pods. Use Kubeflow if you want a more opinionated tool focused on machine learning solutions.
Argo vs. MLFlow
Argo is a task orchestration tool that allows you to define your tasks as Kubernetes pods and run them as a DAG, defined with YAML. MLFlow is a more specialized tool that doesn’t allow you to define arbitrary tasks or the dependencies between them. Instead, you can import MLFlow into your existing (Python) machine learning code base as a Python library and use its helper functions to log artifacts and parameters to help with analysis and experiment tracking. You can also use MLFlow’s command-line tool to train scikit-learn models and deploy them to Amazon SageMaker or Azure ML, as well as to manage your Jupyter notebooks.
Use Argo if you need to manage generic tasks and want to run them on Kubernetes. Use MLFlow if you want an opinionated way to manage your machine learning lifecycle with managed cloud platforms.
Kubeflow vs. MLFlow
Kubeflow and MLFlow are both smaller, more specialized tools than general task orchestration platforms such as Airflow or Luigi. Kubeflow relies on Kubernetes, while MLFlow is a Python library that helps you add experiment tracking to your existing machine learning code. Kubeflow lets you build a full DAG where each step is a Kubernetes pod, while MLFlow has built-in functionality to deploy your scikit-learn models to Amazon SageMaker or Azure ML.
Use Kubeflow if you want to track your machine learning experiments and deploy your solutions in a more customized way, backed by Kubernetes. Use MLFlow if you want a simpler approach to experiment tracking and want to deploy to managed platforms such as Amazon SageMaker.
No silver bullet
While all of these tools have different focus points and different strengths, no tool is going to give you a headache-free process straight out of the box. Before sweating over which tool to choose, it’s usually important to ensure you have good processes, including a good team culture, blame-free retrospectives, and long-term goals. If you’re struggling with any machine learning problems, get in touch. We love talking shop, and you can schedule a free call with our CEO.
Translated from: https://towardsdatascience.com/airflow-vs-luigi-vs-argo-vs-mlflow-vs-kubeflow-b3785dd1ed0c