Paper: "Hidden Technical Debt in Machine Learning Systems" (Translation and Commentary)
Overview: How much technical debt hides inside a machine learning system? Walking through the end-to-end data-science workflow as its case study, this paper dissects the long-term costs of such systems and how to think long-term so that maintenance costs do not keep climbing. It also stresses one point: the model itself, although it is the core module, occupies only a small part of the overall product pipeline.
Contents
"Hidden Technical Debt in Machine Learning Systems": Translation and Commentary
Abstract
1 Introduction
2 Complex Models Erode Boundaries
3 Data Dependencies Cost More than Code Dependencies
4 Feedback Loops
5 ML-System Anti-Patterns
6 Configuration Debt
7 Dealing with Changes in the External World
8 Other Areas of ML-related Debt
9 Conclusions: Measuring Debt and Paying it Off
Acknowledgments
"Hidden Technical Debt in Machine Learning Systems": Translation and Commentary
| Link | https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf |
| Authors | D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips {dsculley,gholt,dgg,edavydov,toddphillips}@google.com, Google, Inc.; Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison {ebner,vchaudhary,mwyoung,jfcrespo,dennison}@google.com |
| Published | NIPS 2015 |
Abstract
Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
1 Introduction
As the machine learning (ML) community continues to accumulate years of experience with live systems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive.

This dichotomy can be understood through the lens of technical debt, a metaphor introduced by Ward Cunningham in 1992 to help reason about the long term costs incurred by moving quickly in software engineering. As with fiscal debt, there are often sound strategic reasons to take on technical debt. Not all debt is bad, but all debt needs to be serviced. Technical debt may be paid down by refactoring code, improving unit tests, deleting dead code, reducing dependencies, tightening APIs, and improving documentation [8]. The goal is not to add new functionality, but to enable future improvements, reduce errors, and improve maintainability. Deferring such payments results in compounding costs. Hidden debt is dangerous because it compounds silently.
In this paper, we argue that ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues. This debt may be difficult to detect because it exists at the system level rather than the code level. Traditional abstractions and boundaries may be subtly corrupted or invalidated by the fact that data influences ML system behavior. Typical methods for paying down code-level technical debt are not sufficient to address ML-specific technical debt at the system level.

This paper does not offer novel ML algorithms, but instead seeks to increase the community's awareness of the difficult tradeoffs that must be considered in practice over the long term. We focus on system-level interactions and interfaces as an area where ML technical debt may rapidly accumulate. At the system level, an ML model may silently erode abstraction boundaries. The tempting re-use or chaining of input signals may unintentionally couple otherwise disjoint systems. ML packages may be treated as black boxes, resulting in large masses of "glue code" or calibration layers that can lock in assumptions. Changes in the external world may influence system behavior in unintended ways. Even monitoring ML system behavior may prove difficult without careful design.
2 Complex Models Erode Boundaries
Traditional software engineering practice has shown that strong abstraction boundaries using encapsulation and modular design help create maintainable code in which it is easy to make isolated changes and improvements. Strict abstraction boundaries help express the invariants and logical consistency of the information inputs and outputs from a given component [8].

Unfortunately, it is difficult to enforce strict abstraction boundaries for machine learning systems by prescribing specific intended behavior. Indeed, ML is required in exactly those cases when the desired behavior cannot be effectively expressed in software logic without dependency on external data. The real world does not fit into tidy encapsulation. Here we examine several ways that the resulting erosion of boundaries may significantly increase technical debt in ML systems.
Entanglement. Machine learning systems mix signals together, entangling them and making isolation of improvements impossible. For instance, consider a system that uses features x1, ..., xn in a model. If we change the input distribution of values in x1, the importance, weights, or use of the remaining n-1 features may all change. This is true whether the model is retrained fully in a batch style or allowed to adapt in an online fashion. Adding a new feature xn+1 can cause similar changes, as can removing any feature xj. No inputs are ever really independent. We refer to this here as the CACE principle: Changing Anything Changes Everything. CACE applies not only to input signals, but also to hyper-parameters, learning settings, sampling methods, convergence thresholds, data selection, and essentially every other possible tweak.

One possible mitigation strategy is to isolate models and serve ensembles. This approach is useful in situations in which sub-problems decompose naturally, such as in disjoint multi-class settings like [14]. However, in many cases ensembles work well because the errors in the component models are uncorrelated. Relying on the combination creates a strong entanglement: improving an individual component model may actually make the system accuracy worse if the remaining errors are more strongly correlated with the other components.

A second possible strategy is to focus on detecting changes in prediction behavior as they occur. One such method was proposed in [12], in which a high-dimensional visualization tool was used to allow researchers to quickly see effects across many dimensions and slicings. Metrics that operate on a slice-by-slice basis may also be extremely useful.
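To make CACE concrete, here is a minimal sketch (not from the paper; it assumes NumPy and scikit-learn, with synthetic data and illustrative feature names): shifting only x1's input distribution and retraining moves the learned weights of the other features as well.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # x2 is correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = (x1 - x2 + 0.5 * x3 + rng.normal(size=n) > 0).astype(int)

def weights(X, y):
    return LogisticRegression().fit(X, y).coef_.ravel()

print("before shift:", np.round(weights(X, y), 3))

X_shifted = X.copy()
X_shifted[:, 0] = 3.0 * X_shifted[:, 0] + 1.0   # change only x1's distribution
print("after shift: ", np.round(weights(X_shifted, y), 3))  # other weights move too
```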
Correction Cascades. There are often situations in which a model m_a for problem A exists, but a solution for a slightly different problem A' is required. In this case, it can be tempting to learn a model m'_a that takes m_a as input and learns a small correction as a fast way to solve the problem. However, this correction model has created a new system dependency on m_a, making it significantly more expensive to analyze improvements to that model in the future. The cost increases when correction models are cascaded, with a model for problem A'' learned on top of m'_a, and so on, for several slightly different test distributions. Once in place, a correction cascade can create an improvement deadlock, as improving the accuracy of any individual component actually leads to system-level detriments. Mitigation strategies are to augment m_a to learn the corrections directly within the same model by adding features to distinguish among the cases, or to accept the cost of creating a separate model for A'.
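As an illustration of how the dependency arises, this hypothetical sketch (all names invented for this example) shows a correction model consuming m_a's score; any future change to model_a now silently changes the correction model's input distribution.

```python
def model_a(x: dict) -> float:
    """Stand-in for the existing trained model for problem A."""
    return 0.8 * x["signal"]

def model_a_prime(x: dict) -> float:
    """Correction model for problem A': consumes model_a's output.

    The call below is the new system dependency: "improving" model_a
    now silently shifts this model's inputs.
    """
    return model_a(x) + 0.1 * x["domain_shift_feature"]

print(model_a_prime({"signal": 1.0, "domain_shift_feature": 2.0}))  # 1.0
```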
Undeclared Consumers. Oftentimes, a prediction from a machine learning model m_a is made widely accessible, either at runtime or by writing to files or logs that may later be consumed by other systems. Without access controls, some of these consumers may be undeclared, silently using the output of a given model as an input to another system. In more classical software engineering, these issues are referred to as visibility debt [13].

Undeclared consumers are expensive at best and dangerous at worst, because they create a hidden tight coupling of model m_a to other parts of the stack. Changes to m_a will very likely impact these other parts, potentially in ways that are unintended, poorly understood, and detrimental. In practice, this tight coupling can radically increase the cost and difficulty of making any changes to m_a at all, even if they are improvements. Furthermore, undeclared consumers may create hidden feedback loops, which are described in more detail in section 4.

Undeclared consumers may be difficult to detect unless the system is specifically designed to guard against this case, for example with access restrictions or strict service-level agreements (SLAs). In the absence of barriers, engineers will naturally use the most convenient signal at hand, especially when working against deadline pressures.
3 Data Dependencies Cost More than Code Dependencies
In [13], dependency debt is noted as a key contributor to code complexity and technical debt in classical software engineering settings. We have found that data dependencies in ML systems carry a similar capacity for building debt, but may be more difficult to detect. Code dependencies can be identified via static analysis by compilers and linkers. Without similar tooling for data dependencies, it can be inappropriately easy to build large data dependency chains that can be difficult to untangle.
Unstable Data Dependencies. To move quickly, it is often convenient to consume signals as input features that are produced by other systems. However, some input signals are unstable, meaning that they qualitatively or quantitatively change behavior over time. This can happen implicitly, when the input signal comes from another machine learning model itself that updates over time, or a data-dependent lookup table, such as for computing TF/IDF scores or semantic mappings. It can also happen explicitly, when the engineering ownership of the input signal is separate from the engineering ownership of the model that consumes it. In such cases, updates to the input signal may be made at any time. This is dangerous because even "improvements" to input signals may have arbitrary detrimental effects in the consuming system that are costly to diagnose and address. For example, consider the case in which an input signal was previously mis-calibrated. The model consuming it likely fit to these mis-calibrations, and a silent update that corrects the signal will have sudden ramifications for the model.

One common mitigation strategy for unstable data dependencies is to create a versioned copy of a given signal. For example, rather than allowing a semantic mapping of words to topic clusters to change over time, it might be reasonable to create a frozen version of this mapping and use it until such a time as an updated version has been fully vetted. Versioning carries its own costs, however, such as potential staleness and the cost to maintain multiple versions of the same signal over time.
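A minimal sketch of the versioning mitigation, assuming the semantic mapping is served from explicit frozen snapshots; the mapping contents and version labels are illustrative:

```python
TOPIC_CLUSTERS = {
    "v1": {"jaguar": "animals", "python": "animals"},
    "v2": {"jaguar": "cars", "python": "programming"},  # update, not yet vetted
}

class TopicMapper:
    def __init__(self, version: str):
        # Consumers pin an explicit version; upgrading is a reviewable change,
        # not a silent update to a shared "latest" mapping.
        self._mapping = TOPIC_CLUSTERS[version]

    def topic(self, word: str) -> str:
        return self._mapping.get(word, "unknown")

mapper = TopicMapper(version="v1")   # stays on v1 until v2 is fully vetted
print(mapper.topic("python"))        # -> "animals"
```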
Underutilized Data Dependencies. In code, underutilized dependencies are packages that are mostly unneeded [13]. Similarly, underutilized data dependencies are input signals that provide little incremental modeling benefit. These can make an ML system unnecessarily vulnerable to change, sometimes catastrophically so, even though they could be removed with no detriment.

As an example, suppose that to ease the transition from an old product numbering scheme to new product numbers, both schemes are left in the system as features. New products get only a new number, but old products may have both, and the model continues to rely on the old numbers for some products. A year later, the code that stops populating the database with the old numbers is deleted. This will not be a good day for the maintainers of the ML system.

Underutilized data dependencies can creep into a model in several ways.
(1) Legacy Features. The most common case is that a feature F is included in a model early in its development. Over time, F is made redundant by new features, but this goes undetected.
(2) Bundled Features. Sometimes, a group of features is evaluated and found to be beneficial. Because of deadline pressures or similar effects, all the features in the bundle are added to the model together, possibly including features that add little or no value.
(3) ε-Features. As machine learning researchers, it is tempting to improve model accuracy even when the accuracy gain is very small or when the complexity overhead might be high.
(4) Correlated Features. Often two features are strongly correlated, but one is more directly causal. Many ML methods have difficulty detecting this and credit the two features equally, or may even pick the non-causal one. This results in brittleness if world behavior later changes the correlations.

Underutilized dependencies can be detected via exhaustive leave-one-feature-out evaluations, as sketched below. These should be run regularly to identify and remove unnecessary features.
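A sketch of such a leave-one-feature-out evaluation, assuming scikit-learn and synthetic data; features whose removal does not hurt held-out accuracy become candidates for deletion. The 0.002 tolerance is illustrative, not a recommendation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))        # features 2-4 are pure noise
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=2000) > 0).astype(int)

baseline = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
for j in range(X.shape[1]):
    X_minus_j = np.delete(X, j, axis=1)
    score = cross_val_score(LogisticRegression(), X_minus_j, y, cv=5).mean()
    note = "  <- removal candidate" if score >= baseline - 0.002 else ""
    print(f"without feature {j}: {score:.3f} (baseline {baseline:.3f}){note}")
```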
Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
Static Analysis of Data Dependencies. In traditional code, compilers and build systems perform static analysis of dependency graphs. Tools for static analysis of data dependencies are far less common, but are essential for error checking, tracking down consumers, and enforcing migration and updates. One such tool is the automated feature management system described in [12], which enables data sources and features to be annotated. Automated checks can then be run to ensure that all dependencies have the appropriate annotations, and dependency trees can be fully resolved. This kind of tooling can make migration and deletion much safer in practice.
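The system in [12] is far richer than what can be shown here, but a toy registry conveys the idea: each signal declares its owner and upstream dependencies, so an automated check can refuse unannotated inputs. All names are illustrative.

```python
FEATURES = {
    "raw_clicks":  {"owner": "logs-team", "deps": []},
    "click_rate":  {"owner": "ranking",   "deps": ["raw_clicks"]},
    "score_input": {"owner": "ranking",   "deps": ["click_rate", "raw_views"]},
}

def check_annotations(features: dict) -> list[str]:
    """Flag any dependency that is not itself an annotated feature."""
    errors = []
    for name, meta in features.items():
        for dep in meta["deps"]:
            if dep not in features:
                errors.append(f"{name} depends on unannotated signal {dep!r}")
    return errors

print(check_annotations(FEATURES))  # flags the undeclared 'raw_views'
```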
4 Feedback Loops
One of the key features of live ML systems is that they often end up influencing their own behavior if they update over time. This leads to a form of analysis debt, in which it is difficult to predict the behavior of a given model before it is released. These feedback loops can take different forms, but they are all more difficult to detect and address if they occur gradually over time, as may be the case when models are updated infrequently.
Direct Feedback Loops. A model may directly influence the selection of its own future training data. It is common practice to use standard supervised algorithms, although the theoretically correct solution would be to use bandit algorithms. The problem here is that bandit algorithms (such as contextual bandits [9]) do not necessarily scale well to the size of action spaces typically required for real-world problems. It is possible to mitigate these effects by using some amount of randomization [3], or by isolating certain parts of data from being influenced by a given model.
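One hedged sketch of the randomization mitigation is an epsilon-greedy selection rule: with small probability the system ignores the model's ranking, so a slice of the resulting data stays uninfluenced by the model's own choices. Names and the epsilon value are illustrative.

```python
import random

def choose_item(scores: dict[str, float], epsilon: float = 0.05) -> tuple[str, bool]:
    """Return (item, explored); explored traffic is untouched by the model."""
    if random.random() < epsilon:
        return random.choice(list(scores)), True   # uniform exploration slice
    return max(scores, key=scores.get), False      # normal model-driven choice

item, explored = choose_item({"ad_a": 0.9, "ad_b": 0.4, "ad_c": 0.1})
print(item, "(explore)" if explored else "(exploit)")
```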
Hidden Feedback Loops. Direct feedback loops are costly to analyze, but at least they pose a statistical challenge that ML researchers may find natural to investigate [3]. A more difficult case is hidden feedback loops, in which two systems influence each other indirectly through the world.

One example of this may be if two systems independently determine facets of a web page, such as one selecting products to show and another selecting related reviews. Improving one system may lead to changes in behavior in the other, as users begin clicking more or less on the other components in reaction to the changes. Note that these hidden loops may exist between completely disjoint systems. Consider the case of two stock-market prediction models from two different investment companies. Improvements (or, more scarily, bugs) in one may influence the bidding and buying behavior of the other.
5 ML-System Anti-Patterns
It may be surprising to the academic community to know that only a tiny fraction of the code in many ML systems is actually devoted to learning or prediction (see Figure 1). In the language of Lin and Ryaboy, much of the remainder may be described as "plumbing" [11]. It is unfortunately common for systems that incorporate machine learning methods to end up with high-debt design patterns. In this section, we examine several system-design anti-patterns [4] that can surface in machine learning systems and which should be avoided or refactored where possible.
Glue Code. ML researchers tend to develop general purpose solutions as self-contained packages. A wide variety of these are available as open-source packages at places like mloss.org, or from in-house code, proprietary packages, and cloud-based platforms.

Using generic packages often results in a glue code system design pattern, in which a massive amount of supporting code is written to get data into and out of general-purpose packages. Glue code is costly in the long term because it tends to freeze a system to the peculiarities of a specific package; testing alternatives may become prohibitively expensive. In this way, using a generic package can inhibit improvements, because it makes it harder to take advantage of domain-specific properties or to tweak the objective function to achieve a domain-specific goal. Because a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code, it may be less costly to create a clean native solution rather than re-use a generic package.

An important strategy for combating glue code is to wrap black-box packages into common APIs. This allows supporting infrastructure to be more reusable and reduces the cost of changing packages.
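A minimal sketch of the common-API strategy, assuming Python; the adapter below stands in for any package-specific wrapper, and its names are invented for this example.

```python
from abc import ABC, abstractmethod

class Model(ABC):
    """The common API that pipelines and serving code depend on."""
    @abstractmethod
    def fit(self, X, y) -> "Model": ...
    @abstractmethod
    def predict(self, X) -> list: ...

class MeanBaselineAdapter(Model):
    """Adapter around a (here deliberately trivial) package-specific model."""
    def fit(self, X, y):
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self._mean] * len(X)

# Swapping packages now means writing one new adapter, not rewriting every
# pipeline that moves data into and out of the old package.
model: Model = MeanBaselineAdapter().fit([[1], [2], [3]], [0, 1, 1])
print(model.predict([[4], [5]]))
```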
Pipeline Jungles. As a special case of glue code, pipeline jungles often appear in data preparation. These can evolve organically, as new signals are identified and new information sources added incrementally. Without care, the resulting system for preparing data in an ML-friendly format may become a jungle of scrapes, joins, and sampling steps, often with intermediate files output. Managing these pipelines, detecting errors, and recovering from failures are all difficult and costly [1]. Testing such pipelines often requires expensive end-to-end integration tests. All of this adds to the technical debt of a system and makes further innovation more costly.

Pipeline jungles can only be avoided by thinking holistically about data collection and feature extraction. The clean-slate approach of scrapping a pipeline jungle and redesigning from the ground up is indeed a major investment of engineering effort, but one that can dramatically reduce ongoing costs and speed further innovation.
Glue code and pipeline jungles are symptomatic of integration issues that may have a root cause in overly separated "research" and "engineering" roles. When ML packages are developed in an ivory-tower setting, the result may appear like black boxes to the teams that employ them in practice. A hybrid research approach, where engineers and researchers are embedded together on the same teams (and indeed, are often the same people), can help reduce this source of friction significantly [16].
Dead Experimental Codepaths. A common consequence of glue code or pipeline jungles is that it becomes increasingly attractive in the short term to perform experiments with alternative methods by implementing experimental codepaths as conditional branches within the main production code. For any individual change, the cost of experimenting in this manner is relatively low: none of the surrounding infrastructure needs to be reworked. However, over time, these accumulated codepaths can create a growing debt due to the increasing difficulties of maintaining backward compatibility and an exponential increase in cyclomatic complexity. Testing all possible interactions between codepaths becomes difficult or impossible. A famous example of the dangers here was Knight Capital's system losing $465 million in 45 minutes, apparently because of unexpected behavior from obsolete experimental codepaths [15].

As with the case of dead flags in traditional software [13], it is often beneficial to periodically re-examine each experimental branch to see what can be ripped out. Often only a small subset of the possible branches is actually used; many others may have been tested once and abandoned.
Abstraction Debt. The above issues highlight the fact that there is a distinct lack of strong abstractions to support ML systems. Zheng recently made a compelling comparison of the state of ML abstractions to the state of database technology [17], making the point that nothing in the machine learning literature comes close to the success of the relational database as a basic abstraction. What is the right interface to describe a stream of data, or a model, or a prediction?

For distributed learning in particular, there remains a lack of widely accepted abstractions. It could be argued that the widespread use of Map-Reduce in machine learning was driven by the void of strong distributed learning abstractions. Indeed, one of the few areas of broad agreement in recent years appears to be that Map-Reduce is a poor abstraction for iterative ML algorithms. The parameter-server abstraction seems much more robust, but there are multiple competing specifications of this basic idea [5, 10]. The lack of standard abstractions makes it all too easy to blur the lines between components.
Common Smells. In software engineering, a design smell may indicate an underlying problem in a component or system [7]. We identify a few ML system smells, not as hard-and-fast rules, but as subjective indicators.
(1) Plain-Old-Data Type Smell. The rich information used and produced by ML systems is all too often encoded with plain data types like raw floats and integers. In a robust system, a model parameter should know if it is a log-odds multiplier or a decision threshold, and a prediction should know various pieces of information about the model that produced it and how it should be consumed.
(2) Multiple-Language Smell. It is often tempting to write a particular piece of a system in a given language, especially when that language has a convenient library or syntax for the task at hand. However, using multiple languages often increases the cost of effective testing and can increase the difficulty of transferring ownership to other individuals.
(3) Prototype Smell. It is convenient to test new ideas in small scale via prototypes. However, regularly relying on a prototyping environment may be an indicator that the full-scale system is brittle, difficult to change, or could benefit from improved abstractions and interfaces. Maintaining a prototyping environment carries its own cost, and there is a significant danger that time pressures may encourage a prototyping system to be used as a production solution. Additionally, results found at small scale rarely reflect the reality at full scale.
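For the plain-old-data smell in particular, one possible remedy is a prediction type that carries provenance instead of traveling as a bare float. A minimal sketch, with illustrative field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    value: float          # the score itself
    model_id: str         # which model produced it
    model_version: str    # which training run or snapshot
    kind: str             # e.g. "probability", "log_odds", "raw_score"

p = Prediction(value=0.93, model_id="spam-filter",
               model_version="2015-09-14", kind="probability")
assert p.kind == "probability"  # consumers can check how to interpret the value
```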
6 Configuration Debt
Another potentially surprising area where debt can accumulate is in the configuration of machine learning systems. Any large system has a wide range of configurable options, including which features are used, how data is selected, a wide variety of algorithm-specific learning settings, potential pre- or post-processing, verification methods, etc. We have observed that both researchers and engineers may treat configuration (and extension of configuration) as an afterthought. Indeed, verification or testing of configurations may not even be seen as important. In a mature system which is being actively developed, the number of lines of configuration can far exceed the number of lines of the traditional code. Each configuration line has a potential for mistakes.

Consider the following examples.
- Feature A was incorrectly logged from 9/14 to 9/17.
- Feature B is not available on data before 10/7.
- The code used to compute feature C has to change for data before and after 11/1 because of changes to the logging format.
- Feature D is not available in production, so substitute features D' and D'' must be used when querying the model in a live setting.
- If feature Z is used, then jobs for training must be given extra memory due to lookup tables or they will train inefficiently.
- Feature Q precludes the use of feature R because of latency constraints.
All this messiness makes configuration hard to modify correctly, and hard to reason about. However, mistakes in configuration can be costly, leading to serious loss of time, waste of computing resources, or production issues. This leads us to articulate the following principles of good configuration systems:

- It should be easy to specify a configuration as a small change from a previous configuration.
- It should be hard to make manual errors, omissions, or oversights.
- It should be easy to see, visually, the difference in configuration between two models.
- It should be easy to automatically assert and verify basic facts about the configuration: number of features used, transitive closure of data dependencies, etc.
- It should be possible to detect unused or redundant settings.
- Configurations should undergo a full code review and be checked into a repository.
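A small sketch of automatically asserting basic facts about a configuration, in the spirit of these principles; the availability windows echo the hypothetical feature examples above (e.g. feature B unavailable before 10/7), and all names are illustrative.

```python
from datetime import date

FEATURE_AVAILABLE_FROM = {
    "feature_a": date(2015, 9, 18),   # mis-logged before 9/18
    "feature_b": date(2015, 10, 7),   # no data before 10/7
}

def validate_config(config: dict) -> None:
    for feature in config["features"]:
        if feature not in FEATURE_AVAILABLE_FROM:
            raise ValueError(f"unknown feature {feature!r}")
        if config["train_start"] < FEATURE_AVAILABLE_FROM[feature]:
            raise ValueError(f"{feature!r} is unavailable before "
                             f"{FEATURE_AVAILABLE_FROM[feature]}")

validate_config({"train_start": date(2015, 10, 7),
                 "features": ["feature_a", "feature_b"]})   # passes
```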
7 Dealing with Changes in the External World
One of the things that makes ML systems so fascinating is that they often interact directly with the external world. Experience has shown that the external world is rarely stable. This background rate of change creates ongoing maintenance cost.

Fixed Thresholds in Dynamic Systems. It is often necessary to pick a decision threshold for a given model to perform some action: to predict true or false, to mark an email as spam or not spam, to show or not show a given ad. One classic approach in machine learning is to choose a threshold from a set of possible thresholds, in order to get good tradeoffs on certain metrics, such as precision and recall. However, such thresholds are often manually set. Thus if a model updates on new data, the old manually set threshold may be invalid. Manually updating many thresholds across many models is time-consuming and brittle. One mitigation strategy for this kind of problem appears in [14], in which thresholds are learned via simple evaluation on heldout validation data.

Monitoring and Testing. Unit testing of individual components and end-to-end tests of running systems are valuable, but in the face of a changing world such tests are not sufficient to provide evidence that a system is working as intended. Comprehensive live monitoring of system behavior in real time, combined with automated response, is critical for long-term system reliability.
The key question is: what to monitor? Testable invariants are not always obvious, given that many ML systems are intended to adapt over time. We offer the following starting points.

Prediction Bias. In a system that is working as intended, it should usually be the case that the distribution of predicted labels is equal to the distribution of observed labels. This is by no means a comprehensive test, as it can be met by a null model that simply predicts average values of label occurrences without regard to the input features. However, it is a surprisingly useful diagnostic, and changes in metrics such as this are often indicative of an issue that requires attention. For example, this method can help to detect cases in which the world behavior suddenly changes, making training distributions drawn from historical data no longer reflective of current reality. Slicing prediction bias by various dimensions isolates issues quickly, and can also be used for automated alerting.

Action Limits. In systems that are used to take actions in the real world, such as bidding on items or marking messages as spam, it can be useful to set and enforce action limits as a sanity check. These limits should be broad enough not to trigger spuriously. If the system hits a limit for a given action, automated alerts should fire and trigger manual intervention or investigation.

Up-Stream Producers. Data is often fed through to a learning system from various up-stream producers. These up-stream processes should be thoroughly monitored, tested, and routinely meet a service level objective that takes the downstream ML system's needs into account. Further, any up-stream alerts must be propagated to the control plane of an ML system to ensure its accuracy. Similarly, any failure of the ML system to meet established service level objectives should also be propagated down-stream to all consumers, and directly to their control planes if at all possible.
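Two of these monitors are easy to sketch in a few lines; the thresholds and data below are illustrative, not recommendations.

```python
def prediction_bias_alert(predicted: list[float], observed: list[int],
                          tolerance: float = 0.02) -> bool:
    """True when predicted and observed positive rates drift apart."""
    mean_pred = sum(predicted) / len(predicted)
    mean_obs = sum(observed) / len(observed)
    return abs(mean_pred - mean_obs) > tolerance

def within_action_limit(actions_this_hour: int, limit: int = 10_000) -> bool:
    """A broad sanity bound; hitting it should page a human, not silently clip."""
    return actions_this_hour <= limit

print(prediction_bias_alert([0.2, 0.8, 1.0], [1, 0, 1]))  # False: rates agree
print(within_action_limit(12_345))                        # False: investigate
```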
Because external changes occur in real-time, response must also occur in real-time. Relying on human intervention in response to alert pages is one strategy, but can be brittle for time-sensitive issues. Creating systems that allow automated response without direct human intervention is often well worth the investment.
8 Other Areas of ML-related Debt
We now briefly highlight some additional areas where ML-related technical debt may accrue.
Data Testing Debt. If data replaces code in ML systems, and code should be tested, then it seems clear that some amount of testing of input data is critical to a well-functioning system. Basic sanity checks are useful, as are more sophisticated tests that monitor changes in input distributions (a minimal sketch of such a check appears at the end of this section).

Reproducibility Debt. As scientists, it is important that we can re-run experiments and get similar results, but designing real-world systems to allow for strict reproducibility is a task made difficult by randomized algorithms, non-determinism inherent in parallel learning, reliance on initial conditions, and interactions with the external world.

Process Management Debt. Most of the use cases described in this paper have talked about the cost of maintaining a single model, but mature systems may have dozens or hundreds of models running simultaneously [14, 6]. This raises a wide range of important problems, including the problem of updating many configurations for many similar models safely and automatically, how to manage and assign resources among models with different business priorities, and how to visualize and detect blockages in the flow of data in a production pipeline. Developing tooling to aid recovery from production incidents is also critical. An important system-level smell to avoid is common processes with many manual steps.

Cultural Debt. There is sometimes a hard line between ML research and engineering, but this can be counter-productive for long-term system health. It is important to create team cultures that reward deletion of features, reduction of complexity, improvements in reproducibility, stability, and monitoring to the same degree that improvements in accuracy are valued. In our experience, this is most likely to occur within heterogeneous teams with strengths in both ML research and engineering.
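The data-testing sketch referenced above, assuming a simple dict-of-columns batch format; real systems would monitor full input distributions rather than these minimal bounds.

```python
def sanity_check_batch(batch: dict[str, list[float]]) -> list[str]:
    problems = []
    for column, values in batch.items():
        if not values:
            problems.append(f"{column}: empty column")
            continue
        if any(v is None or v != v for v in values):   # catches None and NaN
            problems.append(f"{column}: missing values")
        if len(set(values)) == 1:
            problems.append(f"{column}: constant column (suspicious)")
    return problems

print(sanity_check_batch({"ctr": [0.1, 0.1], "age": []}))
```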
9 Conclusions: Measuring Debt and Paying it Off
Technical debt is a useful metaphor, but it unfortunately does not provide a strict metric that can be tracked over time. How are we to measure technical debt in a system, or to assess the full cost of this debt? Simply noting that a team is still able to move quickly is not in itself evidence of low debt or good practices, since the full cost of debt becomes apparent only over time. Indeed, moving quickly often introduces technical debt. A few useful questions to consider are:

- How easily can an entirely new algorithmic approach be tested at full scale?
- What is the transitive closure of all data dependencies?
- How precisely can the impact of a new change to the system be measured?
- Does improving one model or signal degrade others?
- How quickly can new members of the team be brought up to speed?
We hope that this paper may serve to encourage additional development in the areas of maintainable ML, including better abstractions, testing methodologies, and design patterns. Perhaps the most important insight to be gained is that technical debt is an issue that engineers and researchers both need to be aware of. Research solutions that provide a tiny accuracy benefit at the cost of massive increases in system complexity are rarely wise practice. Even the addition of one or two seemingly innocuous data dependencies can slow further progress.

Paying down ML-related technical debt requires a specific commitment, which can often only be achieved by a shift in team culture. Recognizing, prioritizing, and rewarding this effort is important for the long term health of successful ML teams.
Acknowledgments
This paper owes much to the important lessons learned day to day in a culture that values both innovative ML research and strong engineering practice. Many colleagues have helped shape our thoughts here, and the benefit of accumulated folk wisdom cannot be overstated. We would like to specifically recognize the following: Roberto Bayardo, Luis Cobo, Sharat Chikkerur, Jeff Dean, Philip Henderson, Arnar Mar Hrafnkelsson, Ankur Jain, Joe Kovac, Jeremy Kubica, H. Brendan McMahan, Satyaki Mahalanabis, Lan Nie, Michael Pohl, Abdul Salem, Sajid Siddiqi, Ricky Shan, Alan Skelly, Cory Williams, and Andrew Young. A short version of this paper was presented at the SE4ML workshop in 2014 in Montreal, Canada.