Paper:《A Unified Approach to Interpreting Model Predictions—解释模型预测的统一方法》论文解读与翻译


Paper:《A Unified Approach to Interpreting Model Predictions—解釋模型預測的統一方法》論文解讀與翻譯

導讀:2017年11月25日，來自華盛頓大學的Scott M. Lundberg和Su-In Lee在《解釋模型預測的統一方法》論文中，提出了SHAP值作為特征重要性的統一度量。SHAP可以為特定預測的每個特征分配一個重要性值。它的意義在于解釋現代機器學習中大多數的黑盒模型，為效果好的ML模型量化各個特征的貢獻度。

目錄

《A Unified Approach to Interpreting Model Predictions》論文解讀與翻譯

Abstract

1 Introduction

2 Additive Feature Attribution Methods

2.1 LIME

2.2 DeepLIFT

2.3 Layer-Wise Relevance Propagation

2.4 Classic Shapley Value Estimation

3 Simple Properties Uniquely Determine Additive Feature Attributions

4 SHAP (SHapley Additive exPlanation) Values

4.1 Model-Agnostic Approximations

Kernel SHAP (Linear LIME + Shapley values)

4.2 Model-Specific Approximations

5 Computational and User Study Experiments

5.1 Computational Efficiency

5.2 Consistency with Human Intuition

5.3 Explaining Class Differences

6 Conclusion

Acknowledgements


《A Unified Approach to Interpreting Model Predictions》論文解讀與翻譯


論文地址:https://arxiv.org/pdf/1705.07874.pdf

Abstract

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

在許多應用中,理解一個模型為什么要進行某種預測與預測的準確性同樣重要。然而,現代大型數據集的最高精度往往是通過復雜的模型來實現的,即使是專家也很難解釋,比如集成或深度學習模型,這就造成了準確性和可解釋性之間的緊張關系。因此,最近提出了各種方法來幫助用戶解釋復雜模型的預測,但這些方法之間的關系以及一種方法什么時候比另一種方法更好往往是不清楚的。為了解決這個問題,我們提出了一個統一的框架來解釋預測,SHAP (SHapley Additive explanation)。SHAP為每個特征分配一個特定預測的重要性值。它的新穎之處包括:

(1)確定了一類新的可加性特征重要性測度,

(2)理論結果表明,在這類測度中存在一個具有一組理想性質的唯一解。這個新類統一了6個現有的方法,值得注意的是,這個類中最近出現的幾個方法都缺乏所建議的所需屬性。

基于這種統一的見解,我們提出了比以前的方法更好的計算性能和/或與人類直覺的一致性的新方法

1 Introduction

The ability to correctly interpret a prediction model's output is extremely important. It engenders appropriate user trust, provides insight into how a model may be improved, and supports understanding of the process being modeled. In some applications, simple models (e.g., linear models) are often preferred for their ease of interpretation, even if they may be less accurate than complex ones. However, the growing availability of big data has increased the benefits of using complex models, so bringing to the forefront the trade-off between accuracy and interpretability of a model's output. A wide variety of different methods have been recently proposed to address this issue [5, 8, 9, 3, 4, 1]. But an understanding of how these methods relate and when one method is preferable to another is still lacking.

正確解釋預測模型輸出的能力是極其重要的。它能夠建立使用者對模型的恰當信任，提供關于模型如何改進的見解，并幫助理解被建模的過程。在某些應用中，簡單模型(例如線性模型)往往因其易于解釋而受到青睞，即使它們可能不如復雜模型準確。然而，隨著大數據可用性的不斷增加，使用復雜模型的好處也越來越多，因此模型輸出的準確性和可解釋性之間的權衡被推到了臺前。最近人們提出了各種不同的方法來解決這個問題[5,8,9,3,4,1]。但是，對于這些方法之間的關系以及何時一種方法優于另一種方法，目前仍缺乏理解。

Here, we present a novel unified approach to interpreting model predictions. Our approach leads to three potentially surprising results that bring clarity to the growing space of methods:

1. We introduce the perspective of viewing any explanation of a model's prediction as a model itself, which we term the explanation model. This lets us define the class of additive feature attribution methods (Section 2), which unifies six current methods.

2. We then show that game theory results guaranteeing a unique solution apply to the entire class of additive feature attribution methods (Section 3) and propose SHAP values as a unified measure of feature importance that various methods approximate (Section 4).

3. We propose new SHAP value estimation methods and demonstrate that they are better aligned with human intuition as measured by user studies and more effectually discriminate among model output classes than several existing methods (Section 5).

在這里，我們提出了一種新的、統一的解釋模型預測的方法。我們的方法帶來了三個可能令人驚訝的結果，為日益增多的解釋方法帶來了清晰的認識：

1. 我們引入了這樣一種觀點，即把對模型預測的任何解釋本身看作一個模型，我們稱之為解釋模型。這讓我們可以定義加性特征歸因方法這一類別(第2節)，它統一了6種現有方法。

2. 然后，我們證明保證唯一解的博弈論結果適用于整個加性特征歸因方法類別(第3節)，并提出SHAP值作為特征重要性的統一度量，各種方法都在近似它(第4節)。

3. 我們提出了新的SHAP值估計方法，并證明它們比現有的幾種方法更符合用戶研究所衡量的人類直覺，也能更有效地區分模型的輸出類別(第5節)。

2 Additive Feature Attribution Methods

加性特征歸因方法

The best explanation of a simple model is the model itself; it perfectly represents itself and is easy to understand. For complex models, such as ensemble methods or deep networks, we cannot use the original model as its own best explanation because it is not easy to understand. Instead, we must use a simpler explanation model, which we define as any interpretable approximation of the original model. We show below that six current explanation methods from the literature all use the same explanation model. This previously unappreciated unity has interesting implications, which we describe in later sections.

Let f be the original prediction model to be explained and g the explanation model. Here, we focus on local methods designed to explain a prediction f(x) based on a single input x, as proposed in LIME [5]. Explanation models often use simplified inputs x′ that map to the original inputs through a mapping function x = h_x(x′). Local methods try to ensure g(z′) ≈ f(h_x(z′)) whenever z′ ≈ x′. (Note that h_x(x′) = x even though x′ may contain less information than x, because h_x is specific to the current input x.)

Definition 1 Additive feature attribution methods have an explanation model that is a linear function of binary variables:

g(z′) = φ_0 + Σ_{i=1..M} φ_i z′_i    (1)

where z′ ∈ {0, 1}^M, M is the number of simplified input features, and φ_i ∈ R.

Methods with explanation models matching Definition 1 attribute an effect φ_i to each feature, and summing the effects of all feature attributions approximates the output f(x) of the original model. Many current methods match Definition 1, several of which are discussed below.

對簡單模型最好的解釋是模型本身;它完美地表現了自己,很容易理解。對于復雜模型,如集成方法或深度網絡,我們不能使用原始模型作為自己的最佳解釋,因為它不容易理解。相反,我們必須使用一個更簡單的解釋模型,我們將其定義為原始模型的任何可解釋的近似。下面我們可以看出,文獻中目前的六種解釋方法都使用相同的解釋模型。這種以前未被重視的統一具有有趣的含義,我們將在后面的章節中描述。

設f為待解釋的原始預測模型，g為解釋模型。這里我們關注局部方法，即如LIME[5]所提出的、用于解釋基于單個輸入x的預測f(x)的方法。解釋模型通常使用簡化輸入x′，并通過映射函數x = h_x(x′)映射到原始輸入。局部方法力求在z′ ≈ x′時滿足g(z′) ≈ f(h_x(z′))。（注意h_x(x′) = x；盡管x′包含的信息可能比x少，但h_x是特定于當前輸入x的。）

定義1：加性特征歸因方法的解釋模型是二元變量的線性函數：

g(z′) = φ_0 + Σ_{i=1..M} φ_i z′_i    (1)

其中z′ ∈ {0, 1}^M，M為簡化輸入特征的數量，φ_i ∈ R。

解釋模型符合定義1的方法為每個特征賦予一個效應φ_i，將所有特征歸因的效應相加即可逼近原始模型的輸出f(x)。許多現有方法符合定義1，下文將討論其中幾種。
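To make Definition 1 concrete, the following sketch (an editor's illustration, not code from the paper) evaluates an additive explanation model g(z′) = φ_0 + Σ φ_i z′_i for a hypothetical prediction; all numeric values are made up.

```python
import numpy as np

def explanation_model(z_prime, phi0, phi):
    """Additive explanation model of Definition 1:
    g(z') = phi_0 + sum_i phi_i * z'_i, with z' a binary vector."""
    z_prime = np.asarray(z_prime, dtype=float)
    return phi0 + float(np.dot(phi, z_prime))

# Hypothetical attributions for a prediction f(x) = 0.85 with base value 0.30:
phi0 = 0.30                           # E[f(z)] when no features are known
phi = np.array([0.40, 0.10, 0.05])    # per-feature contributions (made up)

print(explanation_model([1, 1, 1], phi0, phi))  # ≈ f(x) = 0.85 (local accuracy)
print(explanation_model([1, 0, 0], phi0, phi))  # effect of feature 1 alone
```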

2.1 LIME

The LIME method interprets individual model predictions based on locally approximating the model around a given prediction [5]. The local linear explanation model that LIME uses adheres to Equation 1 exactly and is thus an additive feature attribution method. LIME refers to simplified inputs x′ as "interpretable inputs," and the mapping x = h_x(x′) converts a binary vector of interpretable inputs into the original input space. Different types of h_x mappings are used for different input spaces. For bag of words text features, h_x converts a vector of 1's or 0's (present or not) into the original word count if the simplified input is one, or zero if the simplified input is zero. For images, h_x treats the image as a set of super pixels; it then maps 1 to leaving the super pixel as its original value and 0 to replacing the super pixel with an average of neighboring pixels (this is meant to represent being missing). To find φ, LIME minimizes the following objective function:

ξ = argmin_{g ∈ G} L(f, g, π_{x′}) + Ω(g).    (2)

Faithfulness of the explanation model g(z′) to the original model f(h_x(z′)) is enforced through the loss L over a set of samples in the simplified input space weighted by the local kernel π_{x′}. Ω penalizes the complexity of g. Since in LIME g follows Equation 1 and L is a squared loss, Equation 2 can be solved using penalized linear regression.

LIME方法通過在給定預測附近對模型進行局部逼近，來解釋單個模型預測[5]。LIME使用的局部線性解釋模型完全符合方程1，因此是一種加性特征歸因方法。LIME將簡化輸入x′稱為"可解釋輸入"，映射x = h_x(x′)將可解釋輸入的二元向量轉換回原始輸入空間。不同類型的h_x映射用于不同的輸入空間。對于詞袋(bag of words)文本特征，如果簡化輸入為1，h_x將其轉換為原始詞頻，如果簡化輸入為0，則轉換為零。對于圖像，h_x將圖像視為超像素的集合；然后，它將1映射為保留超像素的原始值，將0映射為用相鄰像素的平均值替換該超像素(用來表示"缺失")。為了求得φ，LIME將下列目標函數最小化：

解釋模型g(z′)對原始模型f(h_x(z′))的忠實程度，是通過在簡化輸入空間中一組樣本上、由局部核π_{x′}加權的損失L來保證的。Ω懲罰g的復雜度。由于在LIME中g符合方程1且L是平方損失，因此方程2可以用帶懲罰的線性回歸來求解。
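The objective in Equation 2 is, in practice, a weighted and penalized linear regression over perturbed binary samples. The sketch below illustrates this with a generic exponential proximity kernel and an L2 penalty; these particular choices are assumptions made for illustration and differ from LIME's actual heuristic choices.

```python
import numpy as np

def fit_local_linear_explanation(f, x, n_samples=500, kernel_width=0.75, ridge=1e-3, rng=None):
    """Sketch of Equation 2: weighted, penalized linear regression of f(h_x(z'))
    on binary z' samples. h_x here keeps a feature's original value for z'_i = 1
    and replaces it with 0 (a stand-in for 'missing') for z'_i = 0."""
    rng = np.random.default_rng(rng)
    M = len(x)
    Z = rng.integers(0, 2, size=(n_samples, M))           # binary simplified inputs z'
    X_mapped = Z * x                                       # h_x(z'): mask features to 0
    y = np.array([f(row) for row in X_mapped])             # f(h_x(z'))
    # pi_x'(z'): proximity of z' to x' = all-ones, via an exponential kernel
    dist = (M - Z.sum(axis=1)) / M
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted ridge regression with an intercept (Omega(g) here is an L2 penalty)
    A = np.hstack([np.ones((n_samples, 1)), Z])
    W = np.diag(w)
    coef = np.linalg.solve(A.T @ W @ A + ridge * np.eye(M + 1), A.T @ W @ y)
    return coef[0], coef[1:]                               # phi_0, phi

# Example: explain a toy nonlinear model around x = [1.0, 2.0, 3.0]
f = lambda v: v[0] * v[1] + np.sin(v[2])
phi0, phi = fit_local_linear_explanation(f, np.array([1.0, 2.0, 3.0]), rng=0)
print(phi0, phi)
```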

2.2 DeepLIFT ?

DeepLIFT was recently proposed as a recursive prediction explanation method for deep learning [8, 7]. It attributes to each input x_i a value C_{Δx_i Δy} that represents the effect of that input being set to a reference value as opposed to its original value. This means that for DeepLIFT, the mapping x = h_x(x′) converts binary values into the original inputs, where 1 indicates that an input takes its original value, and 0 indicates that it takes the reference value. The reference value, though chosen by the user, represents a typical uninformative background value for the feature. DeepLIFT uses a "summation-to-delta" property that states:

Σ_{i=1..n} C_{Δx_i Δo} = Δo    (3)

where o = f(x) is the model output, Δo = f(x) − f(r), Δx_i = x_i − r_i, and r is the reference input. If we let φ_i = C_{Δx_i Δo} and φ_0 = f(r), then DeepLIFT's explanation model matches Equation 1 and is thus another additive feature attribution method.

DeepLIFT是最近提出的一種用于深度學習的遞歸預測解釋方法[8,7]。它為每個輸入x_i賦予一個值C_{Δx_i Δy}，表示該輸入被設置為參考值(而非其原始值)所產生的效果。這意味著對于DeepLIFT，映射x = h_x(x′)將二元值轉換為原始輸入，其中1表示輸入取其原始值，0表示它取參考值。參考值雖然由用戶選擇，但代表了該特征的典型的、無信息量的背景值。DeepLIFT使用了一個"求和到增量"(summation-to-delta)性質，即：

其中o = f(x)為模型輸出，Δo = f(x) − f(r)，Δx_i = x_i − r_i，r為參考輸入。如果令φ_i = C_{Δx_i Δo}、φ_0 = f(r)，則DeepLIFT的解釋模型符合方程1，因而是另一種加性特征歸因方法。
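For a purely linear model the DeepLIFT contributions reduce to C_{Δx_i Δy} = w_i (x_i − r_i), which makes the summation-to-delta property easy to verify numerically. A minimal check (editor's sketch; the real DeepLIFT backpropagation rules for nonlinear layers are more involved):

```python
import numpy as np

def deeplift_contributions_linear(w, x, r):
    """For a linear model f(x) = w.x + b, DeepLIFT's contribution of input i is
    C_{dx_i, dy} = w_i * (x_i - r_i), where r is the reference input. The
    summation-to-delta property sum_i C_i = f(x) - f(r) then holds exactly."""
    return w * (x - r)

w, b = np.array([0.5, -2.0, 1.5]), 0.2
x = np.array([1.0, 0.3, 2.0])
r = np.zeros(3)                        # reference ("uninformative background") input
f = lambda v: float(w @ v + b)

C = deeplift_contributions_linear(w, x, r)
print(C.sum(), f(x) - f(r))            # the two numbers agree: summation-to-delta
# With phi_i = C_i and phi_0 = f(r), this matches the additive form of Equation 1.
```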

2.3 Layer-Wise Relevance Propagation ?

The layer-wise relevance propagation method interprets the predictions of deep networks [1]. As noted by Shrikumar et al., this method is equivalent to DeepLIFT with the reference activations of all neurons fixed to zero. Thus, x = h_x(x′) converts binary values into the original input space, where 1 means that an input takes its original value, and 0 means an input takes the 0 value. Layer-wise relevance propagation's explanation model, like DeepLIFT's, matches Equation 1.

分層相關性傳播方法解釋深度網絡[1]的預測。正如Shrikumar等人所指出的，這種方法等價于將所有神經元的參考激活固定為零的DeepLIFT。因此，x = h_x(x′)將二元值轉換到原始輸入空間，其中1表示輸入取其原始值，0表示輸入取0值。與DeepLIFT一樣，分層相關性傳播的解釋模型符合方程1。

2.4 Classic Shapley Value Estimation

經典Shapley值估計

Three previous methods use classic equations from cooperative game theory to compute explanations ?of model predictions: Shapley regression values [4], Shapley sampling values [9], and Quantitative ?Input Influence [3]. ?

Shapley regression values are feature importances for linear models in the presence of multicollinearity. This method requires retraining the model on all feature subsets S ⊆ F, where F is the set of all features. It assigns an importance value to each feature that represents the effect on the model prediction of including that feature. To compute this effect, a model f_{S∪{i}} is trained with that feature present, and another model f_S is trained with the feature withheld. Then, predictions from the two models are compared on the current input, f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S), where x_S represents the values of the input features in the set S. Since the effect of withholding a feature depends on other features in the model, the preceding differences are computed for all possible subsets S ⊆ F \ {i}. The Shapley values are then computed and used as feature attributions. They are a weighted average of all possible differences:

φ_i = Σ_{S ⊆ F\{i}} [ |S|! (|F| − |S| − 1)! / |F|! ] · [ f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S) ]    (4)

For Shapley regression values, h_x maps 1 or 0 to the original input space, where 1 indicates the input is included in the model, and 0 indicates exclusion from the model. If we let φ_0 = f_∅(∅), then the Shapley regression values match Equation 1 and are hence an additive feature attribution method.

此前的三種方法使用合作博弈論中的經典公式來計算模型預測的解釋：Shapley回歸值[4]、Shapley采樣值[9]和定量輸入影響(Quantitative Input Influence)[3]。

Shapley回歸值是存在多重共線性時線性模型的特征重要性。該方法需要在所有特征子集S ⊆ F上重新訓練模型，其中F是全部特征的集合。它為每個特征賦予一個重要性值，表示把該特征納入模型對預測的影響。為了計算這種影響，用包含該特征的數據訓練一個模型f_{S∪{i}}，再用不包含該特征的數據訓練另一個模型f_S。然后在當前輸入上比較兩個模型的預測，即f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S)，其中x_S表示集合S中各輸入特征的取值。由于去掉一個特征的影響取決于模型中的其他特征，上述差值需要對所有可能的子集S ⊆ F \ {i}計算。然后計算Shapley值并將其用作特征歸因。它們是所有可能差值的加權平均：

對于Shapley回歸值，h_x將1或0映射到原始輸入空間，其中1表示該輸入包含在模型中，0表示從模型中排除。如果令φ_0 = f_∅(∅)，那么Shapley回歸值符合方程1，因此也是一種加性特征歸因方法。
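Equation 4 can be written down directly by enumerating subsets. The sketch below assumes a value function v(S) is available that returns the model's prediction with only the features in S present (standing in for the retrained models f_S); the toy value function is invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(v, features):
    """Exact Shapley values for a value function v over a set of features (Equation 4).
    v takes a frozenset S and returns the prediction with only the features in S present.
    Cost is O(2^|F|), so this is only feasible for a handful of features."""
    F = list(features)
    n = len(F)
    phi = {i: 0.0 for i in F}
    for i in F:
        rest = [j for j in F if j != i]
        for k in range(n):
            for S in combinations(rest, k):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi

# Toy value function: two features contribute additively plus an interaction term.
def v(S):
    return 2.0 * ('a' in S) + 1.0 * ('b' in S) + 0.5 * ('a' in S and 'b' in S)

print(shapley_values(v, ['a', 'b']))   # {'a': 2.25, 'b': 1.25}; they sum to v({a,b}) - v({})
```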

Shapley sampling values are meant to explain any model by: (1) applying sampling approximations to Equation 4, and (2) approximating the effect of removing a variable from the model by integrating over samples from the training dataset. This eliminates the need to retrain the model and allows fewer than 2^|F| differences to be computed. Since the explanation model form of Shapley sampling values is the same as that for Shapley regression values, it is also an additive feature attribution method.

Quantitative input influence is a broader framework that addresses more than feature attributions. ?However, as part of its method it independently proposes a sampling approximation to Shapley values ?that is nearly identical to Shapley sampling values. It is thus another additive feature attribution ?method.

Shapley采樣值旨在解釋任何模型，其做法是：(1)對公式4應用采樣近似，(2)通過在訓練數據集的樣本上求平均來近似從模型中移除一個變量的效果。這就消除了重新訓練模型的需要，并且只需計算少于2^|F|個差值。由于Shapley采樣值的解釋模型形式與Shapley回歸值相同，它也是一種加性特征歸因方法。

定量輸入影響是一個更廣泛的框架,解決的不僅僅是特征屬性。然而,作為其方法的一部分,它獨立地提出了一個近似于Shapley值的采樣方法,該方法幾乎與Shapley采樣值相同。因此,它是另一種加性特征歸因方法。
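Both Shapley sampling values and Quantitative Input Influence approximate the classic Shapley equations by sampling random feature orderings. A minimal Monte-Carlo version, assuming feature independence and using a background dataset to stand in for "absent" features (the toy model and background are illustrative):

```python
import numpy as np

def sampling_shap(f, x, background, n_permutations=200, rng=None):
    """Monte-Carlo Shapley estimate: for random orderings, start from a background
    sample, switch features to their values in x one at a time, and average the
    resulting change in f for each feature."""
    rng = np.random.default_rng(rng)
    M = len(x)
    phi = np.zeros(M)
    for _ in range(n_permutations):
        order = rng.permutation(M)
        z = background[rng.integers(len(background))].copy()  # start from a background point
        prev = f(z)
        for i in order:                                       # add features one at a time
            z[i] = x[i]
            cur = f(z)
            phi[i] += cur - prev
            prev = cur
    return phi / n_permutations

f = lambda v: v[0] * v[1] + 3.0 * v[2]
background = np.zeros((50, 3))                                # assumed background distribution
print(sampling_shap(f, np.array([1.0, 2.0, 0.5]), background, rng=0))
```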

3 Simple Properties Uniquely Determine Additive Feature Attributions ?

簡單屬性唯一地決定了可加性特征屬性

A surprising attribute of the class of additive feature attribution methods is the presence of a single unique solution in this class with three desirable properties (described below). While these properties are familiar to the classical Shapley value estimation methods, they were previously unknown for other additive feature attribution methods.

The first desirable property is local accuracy. When approximating the original model f for a specific input x, local accuracy requires the explanation model to at least match the output of f for the simplified input x′ (which corresponds to the original input x).

Property 1 (Local accuracy) ?

f(x) = g(x′) = φ_0 + Σ_{i=1..M} φ_i x′_i    (5)

The explanation model g(x′) matches the original model f(x) when x = h_x(x′).

The second property is missingness. If the simplified inputs represent feature presence, then missingness requires features missing in the original input to have no impact. All of the methods described in Section 2 obey the missingness property.

Property 2 (Missingness)

x′_i = 0  ⟹  φ_i = 0    (6)

Missingness constrains features where x′_i = 0 to have no attributed impact.

The third property is consistency. Consistency states that if a model changes so that some simplified input's contribution increases or stays the same regardless of the other inputs, that input's attribution should not decrease.

Property 3 (Consistency)

Let f_x(z′) = f(h_x(z′)) and z′ \ i denote setting z′_i = 0. For any two models f and f′, if

f′_x(z′) − f′_x(z′ \ i) ≥ f_x(z′) − f_x(z′ \ i)    (7)

for all inputs z′ ∈ {0, 1}^M, then φ_i(f′, x) ≥ φ_i(f, x).

加性特征歸因方法這一類別的一個令人驚訝的性質是：該類別中存在唯一的一個解，它同時具有三條理想性質(如下所述)。雖然這些性質對于經典的Shapley值估計方法是熟知的，但對于其他加性特征歸因方法，此前并不知道它們是否成立。

第一個理想性質是局部精度。當針對特定輸入x逼近原始模型f時，局部精度要求解釋模型在簡化輸入x′(對應于原始輸入x)上至少與f的輸出相匹配。

屬性1(局部精度)

f(x) = g(x′) = φ_0 + Σ_{i=1..M} φ_i x′_i    (5)

當x = h_x(x′)時，解釋模型g(x′)與原始模型f(x)匹配。

第二個性質是缺失性。如果簡化輸入表示特征是否存在，那么缺失性要求在原始輸入中缺失的特征不產生任何影響。第2節中描述的所有方法都滿足缺失性。

屬性2 (Missingness)

x′_i = 0  ⟹  φ_i = 0    (6)

缺失性約束：x′_i = 0的特征沒有歸因影響。

第三個屬性是一致性。一致性指的是,如果一個模型發生了變化,使得某些簡化輸入的貢獻增加或保持不變,而不受其他輸入的影響,那么該輸入的屬性就不應該減少。

屬性3(一致性)

設f_x(z′) = f(h_x(z′))，并用z′ \ i表示令z′_i = 0。對于任意兩個模型f和f′，如果對所有輸入z′ ∈ {0, 1}^M都有 f′_x(z′) − f′_x(z′ \ i) ≥ f_x(z′) − f_x(z′ \ i)　(7)，則φ_i(f′, x) ≥ φ_i(f, x)。

定理1　只有一種可能的解釋模型g遵循定義1并同時滿足性質1、2和3：

φ_i(f, x) = Σ_{z′ ⊆ x′} [ |z′|! (M − |z′| − 1)! / M! ] · [ f_x(z′) − f_x(z′ \ i) ]    (8)

其中|z′|為z′中非零項的數量，z′ ⊆ x′表示所有非零項是x′非零項子集的z′向量。

Theorem 1 Only one possible explanation model g follows Definition 1 and satisfies Properties 1, 2, and 3:

φ_i(f, x) = Σ_{z′ ⊆ x′} [ |z′|! (M − |z′| − 1)! / M! ] · [ f_x(z′) − f_x(z′ \ i) ]    (8)

where |z′| is the number of non-zero entries in z′, and z′ ⊆ x′ represents all z′ vectors whose non-zero entries are a subset of the non-zero entries in x′.

Theorem 1 follows from combined cooperative game theory results, where the values φi are known ?as Shapley values [6]. Young (1985) demonstrated that Shapley values are the only set of values ?that satisfy three axioms similar to Property 1, Property 3, and a final property that we show to be ?redundant in this setting (see Supplementary Material). Property 2 is required to adapt the Shapley ?proofs to the class of additive feature attribution methods. ?

Under Properties 1-3, for a given simplified input mapping hx, Theorem 1 shows that there is only one ?possible additive feature attribution method. This result implies that methods not based on Shapley ?values violate local accuracy and/or consistency (methods in Section 2 already respect missingness). ?The following section proposes a unified approach that improves previous methods, preventing them ?from unintentionally violating Properties 1 and 3.

定理1來源于合作博弈論的綜合結果，其中的值φ_i被稱為Shapley值[6]。Young(1985)證明，Shapley值是唯一滿足三條公理的值集，這三條公理與性質1、性質3以及最后一條我們證明在此設定下是冗余的性質相似(見補充材料)。需要性質2才能使Shapley值的證明適用于加性特征歸因方法這一類別。

在性質1-3下，對于給定的簡化輸入映射h_x，定理1表明只存在一種可能的加性特征歸因方法。這一結果意味著不基于Shapley值的方法會違反局部精度和/或一致性(第2節中的方法都已滿足缺失性)。下一節提出一種統一的方法，它改進了以前的方法，防止它們無意中違反性質1和性質3。

4 SHAP (SHapley Additive exPlanation) Values ?

We propose SHAP values as a unified measure of feature importance. These are the Shapley values of a conditional expectation function of the original model; thus, they are the solution to Equation 8, where f_x(z′) = f(h_x(z′)) = E[f(z) | z_S], and S is the set of non-zero indexes in z′ (Figure 1). Based on Sections 2 and 3, SHAP values provide the unique additive feature importance measure that adheres to Properties 1-3 and uses conditional expectations to define simplified inputs. Implicit in this definition of SHAP values is a simplified input mapping, h_x(z′) = z_S, where z_S has missing values for features not in the set S. Since most models cannot handle arbitrary patterns of missing input values, we approximate f(z_S) with E[f(z) | z_S]. This definition of SHAP values is designed to closely align with the Shapley regression, Shapley sampling, and quantitative input influence feature attributions, while also allowing for connections with LIME, DeepLIFT, and layer-wise relevance propagation.

The exact computation of SHAP values is challenging. However, by combining insights from current additive feature attribution methods, we can approximate them. We describe two model-agnostic approximation methods, one that is already known (Shapley sampling values) and another that is novel (Kernel SHAP). We also describe four model-type-specific approximation methods, two of which are novel (Max SHAP, Deep SHAP). When using these methods, feature independence and model linearity are two optional assumptions simplifying the computation of the expected values (note that S̄ is the set of features not in S):

我們提出SHAP值作為特征重要性的統一度量。它們是原始模型的條件期望函數的Shapley值；因此，它們是方程8在f_x(z′) = f(h_x(z′)) = E[f(z) | z_S]時的解，其中S是z′中非零索引的集合(圖1)。基于第2節和第3節，SHAP值提供了唯一既滿足性質1-3、又使用條件期望來定義簡化輸入的加性特征重要性度量。SHAP值的這一定義隱含了一個簡化輸入映射h_x(z′) = z_S，其中z_S對不在集合S中的特征取缺失值。由于大多數模型無法處理任意模式的缺失輸入值，我們用E[f(z) | z_S]來近似f(z_S)。SHAP值的這一定義旨在與Shapley回歸、Shapley采樣和定量輸入影響的特征歸因緊密一致，同時也允許與LIME、DeepLIFT和分層相關性傳播建立聯系。

SHAP值的精確計算具有挑戰性。然而，通過結合現有加性特征歸因方法的洞見，我們可以對它們進行近似。我們描述了兩種與模型無關的近似方法：一種是已知的(Shapley采樣值)，另一種是新提出的(Kernel SHAP)。我們還描述了四種針對特定模型類型的近似方法，其中兩種是新穎的(Max SHAP、Deep SHAP)。使用這些方法時，特征獨立性和模型線性是兩個可選假設，用于簡化期望值的計算(注意S̄是不在S中的特征集)：
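Under the feature-independence assumption (Equation 11), the conditional expectation E[f(z) | z_S] can be approximated by fixing the features in S to their values in x and averaging f over a background dataset. A small sketch of this approximation (function names and data are illustrative):

```python
import numpy as np

def expected_value_given_subset(f, x, S, background):
    """Approximate E[f(z) | z_S = x_S] under feature independence (Equation 11):
    hold the features in S at their values from x and average f over rows of a
    background dataset for the remaining features."""
    Z = background.copy()
    cols = sorted(S)
    if cols:
        Z[:, cols] = x[cols]                 # condition on the known features
    return float(np.mean([f(row) for row in Z]))

f = lambda v: v[0] * v[1] + 3.0 * v[2]
x = np.array([1.0, 2.0, 0.5])
background = np.random.default_rng(0).normal(size=(100, 3))

print(expected_value_given_subset(f, x, set(), background))    # base value E[f(z)]
print(expected_value_given_subset(f, x, {0, 1}, background))   # after conditioning on x_0, x_1
```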

Figure 1: SHAP (SHapley Additive exPlanation) values attribute to each feature the change in the expected model prediction when conditioning on that feature. They explain how to get from the base value E[f(z)] that would be predicted if we did not know any features to the current output f(x). This diagram shows a single ordering. When the model is non-linear or the input features are not independent, however, the order in which features are added to the expectation matters, and the SHAP values arise from averaging the φ_i values across all possible orderings.

圖1：SHAP (SHapley Additive exPlanation)值把以某個特征為條件時期望模型預測的變化歸因于該特征。它們解釋了如何從在不知道任何特征時所預測的基值E[f(z)]，一步步得到當前輸出f(x)。此圖只展示了一種順序。然而，當模型是非線性的或輸入特征不獨立時，特征加入期望的順序就很重要，此時SHAP值由對所有可能順序上的φ_i值取平均得到。

4.1 Model-Agnostic Approximations ?

Model-Agnostic近似

If we assume feature independence when approximating conditional expectations (Equation 11), as in [9, 5, 7, 3], then SHAP values can be estimated directly using the Shapley sampling values method [9] or equivalently the Quantitative Input Influence method [3]. These methods use a sampling approximation of a permutation version of the classic Shapley value equations (Equation 8). Separate sampling estimates are performed for each feature attribution. While reasonable to compute for a small number of inputs, the Kernel SHAP method described next requires fewer evaluations of the original model to obtain similar approximation accuracy (Section 5).

如果像[9,5,7,3]那樣，在近似條件期望(式11)時假設特征獨立，那么可以直接使用Shapley采樣值方法[9]或等價的定量輸入影響方法[3]來估計SHAP值。這些方法對經典Shapley值方程(方程8)的置換形式使用采樣近似，并對每個特征歸因分別進行采樣估計。雖然在輸入較少時這樣計算是合理的，但接下來介紹的Kernel SHAP方法只需對原始模型進行更少的評估，就能獲得類似的近似精度(第5節)。

Kernel SHAP (Linear LIME + Shapley values)

Kernel SHAP(線性LIME + Shapley值)

Linear LIME uses a linear explanation model to locally approximate f, where local is measured in the simplified binary input space. At first glance, the regression formulation of LIME in Equation 2 seems very different from the classical Shapley value formulation of Equation 8. However, since linear LIME is an additive feature attribution method, we know the Shapley values are the only possible solution to Equation 2 that satisfies Properties 1-3 – local accuracy, missingness and consistency. A natural question to pose is whether the solution to Equation 2 recovers these values. The answer depends on the choice of loss function L, weighting kernel π_{x′} and regularization term Ω. The LIME choices for these parameters are made heuristically; using these choices, Equation 2 does not recover the Shapley values. One consequence is that local accuracy and/or consistency are violated, which in turn leads to unintuitive behavior in certain circumstances (see Section 5).

Linear LIME使用一個線性解釋模型來局部逼近f，其中"局部"是在簡化的二元輸入空間中度量的。乍一看，方程2中LIME的回歸形式與方程8的經典Shapley值公式大不相同。然而，由于Linear LIME是一種加性特征歸因方法，我們知道Shapley值是方程2中唯一可能同時滿足性質1-3(局部精度、缺失性和一致性)的解。一個自然的問題是：方程2的解是否恢復了這些值？答案取決于損失函數L、加權核π_{x′}和正則化項Ω的選擇。LIME對這些參數的選擇是啟發式的；在這些選擇下，方程2并不能恢復Shapley值。其后果之一是局部精度和/或一致性被違反，這進而會在某些情況下導致違反直覺的行為(見第5節)。

Below we show how to avoid heuristically choosing the parameters in Equation 2 and how to find the loss function L, weighting kernel π_{x′}, and regularization term Ω that recover the Shapley values.

Theorem 2 (Shapley kernel) Under Definition 1, the specific forms of π_{x′}, L, and Ω that make solutions of Equation 2 consistent with Properties 1 through 3 are:

Ω(g) = 0,
π_{x′}(z′) = (M − 1) / [ (M choose |z′|) · |z′| · (M − |z′|) ],
L(f, g, π_{x′}) = Σ_{z′ ∈ Z} [ f(h_x(z′)) − g(z′) ]² π_{x′}(z′),

where |z′| is the number of non-zero elements in z′.

The proof of Theorem 2 is shown in the Supplementary Material.

It is important to note that π_{x′}(z′) = ∞ when |z′| ∈ {0, M}, which enforces φ_0 = f_x(∅) and f(x) = Σ_{i=0..M} φ_i. In practice, these infinite weights can be avoided during optimization by analytically eliminating two variables using these constraints.

Since g(z′) in Theorem 2 is assumed to follow a linear form, and L is a squared loss, Equation 2 can still be solved using linear regression. As a consequence, the Shapley values from game theory can be computed using weighted linear regression. Since LIME uses a simplified input mapping that is equivalent to the approximation of the SHAP mapping given in Equation 12, this enables regression-based, model-agnostic estimation of SHAP values. Jointly estimating all SHAP values using regression provides better sample efficiency than the direct use of classical Shapley equations (see Section 5).

下面我們展示如何避免啟發式地選擇方程2中的參數，以及如何找到能夠恢復Shapley值的損失函數L、加權核π_{x′}和正則化項Ω。

定理2 (Shapley核)　在定義1下，使方程2的解滿足性質1至3的π_{x′}、L和Ω的具體形式為：

Ω(g) = 0，
π_{x′}(z′) = (M − 1) / [ (M choose |z′|) · |z′| · (M − |z′|) ]，
L(f, g, π_{x′}) = Σ_{z′ ∈ Z} [ f(h_x(z′)) − g(z′) ]² π_{x′}(z′)，

其中|z′|為z′中非零元素的個數。

定理2的證明在補充材料中給出。

需要注意的是，當|z′| ∈ {0, M}時π_{x′}(z′) = ∞，這就強制了φ_0 = f_x(∅)和f(x) = Σ_{i=0..M} φ_i。在實踐中，可以利用這些約束解析地消去兩個變量，從而在優化時避免這些無窮大的權重。

由于定理2中的g(z′)被假設為線性形式，且L是平方損失，方程2仍然可以用線性回歸求解。因此，博弈論中的Shapley值可以用加權線性回歸計算。由于LIME使用的簡化輸入映射等價于式12給出的SHAP映射的近似，這使得基于回歸的、與模型無關的SHAP值估計成為可能。使用回歸聯合估計所有SHAP值，比直接使用經典Shapley公式有更好的樣本效率(見第5節)。

The intuitive connection between linear regression and Shapley values is that Equation 8 is a difference of means. Since the mean is also the best least squares point estimate for a set of data points, it is natural to search for a weighting kernel that causes linear least squares regression to recapitulate the Shapley values. This leads to a kernel that distinctly differs from previous heuristically chosen kernels (Figure 2A).

線性回歸和Shapley值之間的直觀聯系在于，方程8是均值之差。由于均值也是一組數據點的最佳最小二乘點估計，因此很自然地去尋找一個能使線性最小二乘回歸重現Shapley值的加權核。這就得到了一個與之前啟發式選擇的核明顯不同的核(圖2A)。
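Putting Theorem 2 together, Kernel SHAP amounts to a weighted linear regression with the Shapley kernel. The sketch below enumerates all coalitions for a small M and approximates the two infinite-weight constraints with one large finite weight; it is an editor's illustration, not the reference implementation in the shap package.

```python
import numpy as np
from itertools import combinations
from math import comb

def kernel_shap(f_masked, M, big_weight=1e6):
    """Kernel SHAP sketch following Theorem 2. f_masked(z) evaluates f_x(z') = f(h_x(z'))
    for a binary coalition vector z'. All 2^M coalitions are enumerated, so this is only
    suitable for small M."""
    rows, weights, targets = [], [], []
    for k in range(M + 1):
        for S in combinations(range(M), k):
            z = np.zeros(M)
            z[list(S)] = 1.0
            if k == 0 or k == M:
                w = big_weight                              # stands in for the infinite weights
            else:
                w = (M - 1) / (comb(M, k) * k * (M - k))    # Shapley kernel pi_x'(z')
            rows.append(np.concatenate(([1.0], z)))         # leading 1 models the intercept phi_0
            weights.append(w)
            targets.append(f_masked(z))
    A, W, y = np.array(rows), np.diag(weights), np.array(targets)
    coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)        # weighted least squares
    return coef[0], coef[1:]                                # phi_0 and per-feature SHAP values

# Toy f_x: with an all-zero background and independent features, masking a feature zeroes it.
x = np.array([1.0, 2.0, 0.5])
f = lambda v: v[0] * v[1] + 3.0 * v[2]
f_masked = lambda z: f(z * x)
phi0, phi = kernel_shap(f_masked, M=3)
print(phi0, phi, phi0 + phi.sum(), f(x))                    # the attributions sum to f(x)
```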

4.2 Model-Specific Approximations ?

模型相關的近似

While Kernel SHAP improves the sample efficiency of model-agnostic estimations of SHAP values, by restricting our attention to specific model types, we can develop faster model-specific approximation methods.

雖然Kernel SHAP提高了與模型無關的SHAP值估計的樣本效率，但如果把注意力限制在特定的模型類型上，我們可以開發出更快的、針對特定模型的近似方法。

Linear SHAP ?

For linear models, if we assume input feature independence (Equation 11), SHAP values can be approximated directly from the model's weight coefficients.

This follows from Theorem 2 and Equation 11, and it has been previously noted by Štrumbelj and Kononenko [9].

Low-Order SHAP

Since linear regression using Theorem 2 has complexity O(2^M + M³), it is efficient for small values of M if we choose an approximation of the conditional expectations (Equation 11 or 12).

線性SHAP

對于線性模型,如果我們假設輸入特征獨立性(方程11),SHAP值可以直接從模型的權重系數近似得到。

推論1 (Linear SHAP)　給定線性模型：

這由定理2和方程11得出；Štrumbelj和Kononenko[9]此前已經指出這一點。
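Corollary 1 (Linear SHAP) referenced above gives closed-form attributions for a linear model with independent features: each feature contributes its weight times its deviation from the feature's mean, and the base value is the expected prediction. A one-screen check (editor's sketch):

```python
import numpy as np

def linear_shap(w, b, x, X_background):
    """Linear SHAP sketch: for f(x) = w.x + b with independent features,
    phi_i = w_i * (x_i - E[x_i]) and the base value is the expected prediction."""
    mu = X_background.mean(axis=0)
    phi0 = float(w @ mu + b)           # expected prediction over the background data
    phi = w * (x - mu)
    return phi0, phi

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w, b = np.array([0.5, -2.0, 1.5]), 0.2
x = X[0]
phi0, phi = linear_shap(w, b, x, X)
print(phi0 + phi.sum(), float(w @ x + b))   # local accuracy: attributions sum to f(x)
```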

低階SHAP

由于使用定理2的線性回歸的復雜度為O(2^M + M³)，如果我們選擇條件期望的某種近似(式11或式12)，那么當M較小時它是高效的。

Figure 2: (A) The Shapley kernel weighting is symmetric when all possible z′ vectors are ordered by cardinality; there are 2^15 vectors in this example. This is distinctly different from previous heuristically chosen kernels. (B) Compositional models such as deep neural networks are comprised of many simple components. Given analytic solutions for the Shapley values of the components, fast approximations for the full model can be made using DeepLIFT's style of back-propagation.

圖2：(A) 當所有可能的z′向量按基數(非零元素個數)排序時，Shapley核加權是對稱的；此例中共有2^15個向量。這與以前啟發式選擇的核明顯不同。(B) 像深度神經網絡這樣的組合式模型由許多簡單組件構成。給定各組件Shapley值的解析解，就可以用DeepLIFT風格的反向傳播對整個模型做快速近似。

Max SHAP

Using a permutation formulation of Shapley values, we can calculate the probability that each input will increase the maximum value over every other input. Doing this on a sorted order of input values lets us compute the Shapley values of a max function with M inputs in O(M²) time instead of O(M 2^M). See Supplementary Material for the full algorithm.

最大SHAP (Max SHAP)

使用Shapley值的置換形式，我們可以計算每個輸入使最大值超過其他所有輸入的概率。在對輸入值排序后進行這一計算，可以讓我們在O(M²)時間內(而不是O(M 2^M))計算出具有M個輸入的max函數的Shapley值。完整算法見補充材料。
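The O(M²) Max SHAP algorithm itself is given in the paper's supplementary material. As a reference point, the Shapley values of a max function can be checked by brute force over all M! orderings, which the sketch below does (only feasible for very small M):

```python
import numpy as np
from itertools import permutations
from math import factorial

def max_shapley_bruteforce(values, reference=0.0):
    """Exact Shapley values of f(S) = max over the inputs present in S, with absent
    inputs replaced by the reference value. Averages marginal contributions over all
    M! orderings -- the slow baseline that Max SHAP reduces to O(M^2)."""
    values = np.asarray(values, dtype=float)
    M = len(values)
    phi = np.zeros(M)

    def f(mask):
        return float(np.where(mask, values, reference).max())

    for order in permutations(range(M)):
        mask = np.zeros(M, dtype=bool)
        prev = f(mask)
        for i in order:
            mask[i] = True
            cur = f(mask)
            phi[i] += cur - prev
            prev = cur
    return phi / factorial(M)

# Mirrors the Figure 4B story: scores 5, 4 and 0, profit equal to the maximum score.
print(max_shapley_bruteforce([5.0, 4.0, 0.0]))   # most credit goes to the highest score
```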

Deep SHAP (DeepLIFT + Shapley values) ?

While Kernel SHAP can be used on any model, including deep models, it is natural to ask whether ?there is a way to leverage extra knowledge about the compositional nature of deep networks to improve ?computational performance. We find an answer to this question through a previously unappreciated ?connection between Shapley values and DeepLIFT [8]. If we interpret the reference value in Equation ?3 as representing E[x] in Equation 12, then DeepLIFT approximates SHAP values assuming that ?the input features are independent of one another and the deep model is linear. DeepLIFT uses a ?linear composition rule, which is equivalent to linearizing the non-linear components of a neural ?network. Its back-propagation rules defining how each component is linearized are intuitive but were ?heuristically chosen. Since DeepLIFT is an additive feature attribution method that satisfies local ?accuracy and missingness, we know that Shapley values represent the only attribution values that ?satisfy consistency. This motivates our adapting DeepLIFT to become a compositional approximation ?of SHAP values, leading to Deep SHAP. ?

Deep SHAP combines SHAP values computed for smaller components of the network into SHAP ?values for the whole network. It does so by recursively passing DeepLIFT’s multipliers, now defined ?in terms of SHAP values, backwards through the network as in Figure 2B:

Deep SHAP (DeepLIFT + Shapley值)

雖然Kernel SHAP可以用于任何模型(包括深度模型)，但我們很自然地會問：是否有辦法利用深度網絡的組合式結構這一額外知識來提高計算性能？我們通過Shapley值和DeepLIFT[8]之間此前未被重視的聯系找到了答案。如果把式3中的參考值解釋為式12中的E[x]，那么在假設輸入特征相互獨立且深度模型為線性的前提下，DeepLIFT就是在近似SHAP值。DeepLIFT使用線性組合規則，這相當于把神經網絡的非線性組件線性化。它用來定義每個組件如何線性化的反向傳播規則是直觀的，但卻是啟發式選擇的。由于DeepLIFT是一種滿足局部精度和缺失性的加性特征歸因方法，我們知道Shapley值是唯一同時滿足一致性的歸因值。這促使我們改造DeepLIFT，使其成為SHAP值的組合式近似，從而得到Deep SHAP。

Deep SHAP把為網絡中較小組件計算的SHAP值組合成整個網絡的SHAP值。它通過在網絡中反向遞歸傳遞DeepLIFT的乘數(現在用SHAP值來定義)來實現這一點，如圖2B所示：

Since the SHAP values for the simple network components can be efficiently solved analytically if they are linear, max pooling, or an activation function with just one input, this composition rule enables a fast approximation of values for the whole model. Deep SHAP avoids the need to heuristically choose ways to linearize components. Instead, it derives an effective linearization from the SHAP values computed for each component. The max function offers one example where this leads to improved attributions (see Section 5).

由于簡單網絡組件(線性層、最大池化層或只有單個輸入的激活函數)的SHAP值可以高效地解析求解，這一組合規則使我們能夠快速逼近整個模型的SHAP值。Deep SHAP避免了啟發式地選擇組件線性化方式的需要；相反，它從為每個組件計算的SHAP值中導出有效的線性化。max函數就是一個因此獲得更好歸因的例子(見第5節)。
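The composition rule can be illustrated on the smallest possible "deep" model: a linear layer followed by a single-input ReLU. The sketch below converts each component's SHAP values into DeepLIFT-style multipliers and chains them backwards as in Figure 2B; it is an editor's illustration of the idea, not the production Deep SHAP implementation.

```python
import numpy as np

def deep_shap_linear_relu(W, b, x, x_ref):
    """Deep SHAP sketch for f(x) = relu(W x + b) with a single output unit.
    SHAP values of each simple component are turned into DeepLIFT-style
    multipliers (phi / delta of the component's input) and chained backwards."""
    y = W @ x + b                                  # pre-activation for the input
    y_ref = W @ x_ref + b                          # pre-activation for the reference
    # Linear layer: SHAP values are w_i * (x_i - x_ref_i), so its multipliers are just W.
    m_linear = W                                   # shape (1, M)
    # ReLU with a single input: phi = relu(y) - relu(y_ref), multiplier = phi / (y - y_ref).
    dy = y - y_ref
    phi_relu = np.maximum(y, 0) - np.maximum(y_ref, 0)
    safe_dy = np.where(np.abs(dy) > 1e-12, dy, 1.0)
    m_relu = np.where(np.abs(dy) > 1e-12, phi_relu / safe_dy, 0.0)
    # Chain rule (Figure 2B): multipliers compose multiplicatively back to the inputs.
    m_input = (m_relu[:, None] * m_linear).ravel()
    phi = m_input * (x - x_ref)                    # attributions on the original inputs
    return phi, float(phi_relu[0])

W = np.array([[1.0, -2.0, 0.5]])
b = np.array([0.1])
x, x_ref = np.array([1.0, 0.2, 2.0]), np.zeros(3)
phi, delta_out = deep_shap_linear_relu(W, b, x, x_ref)
print(phi, phi.sum(), delta_out)                   # summation-to-delta holds by construction
```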

Figure 3: Comparison of three additive feature attribution methods: Kernel SHAP (using a debiased lasso), Shapley sampling values, and LIME (using the open source implementation). Feature importance estimates are shown for one feature in two models as the number of evaluations of the original model function increases. The 10th and 90th percentiles are shown for 200 replicate estimates at each sample size. (A) A decision tree model using all 10 input features is explained for a single input. (B) A decision tree using only 3 of 100 input features is explained for a single input.

圖3：三種加性特征歸因方法的比較：Kernel SHAP(使用去偏lasso)、Shapley采樣值和LIME(使用開源實現)。隨著對原始模型函數評估次數的增加，圖中展示了兩個模型中某一個特征的特征重要性估計。在每個樣本量下，展示了200次重復估計的第10和第90百分位數。(A) 對單個輸入解釋一個使用全部10個輸入特征的決策樹模型。(B) 對單個輸入解釋一個在100個輸入特征中僅使用3個的決策樹。

5 Computational and User Study Experiments ?

計算和用戶研究實驗

We evaluated the benefits of SHAP values using the Kernel SHAP and Deep SHAP approximation ?methods. First, we compared the computational efficiency and accuracy of Kernel SHAP vs. LIME ?and Shapley sampling values. Second, we designed user studies to compare SHAP values with ?alternative feature importance allocations represented by DeepLIFT and LIME. As might be expected, ?SHAP values prove more consistent with human intuition than other methods that fail to meet ?Properties 1-3 (Section 2). Finally, we use MNIST digit image classification to compare SHAP with ?DeepLIFT and LIME. ?

我們評估了利用Kernel SHAP和Deep SHAP近似方法的SHAP值的好處。首先,我們比較了Kernel SHAP與LIME和Shapley采樣值的計算效率和精度。其次,我們設計了用戶研究來比較SHAP值和其他特征重要性分配,如DeepLIFT和LIME。正如預期的那樣,SHAP值被證明比其他無法滿足屬性1-3的方法更符合人類直覺(第2節)。最后,我們使用MNIST數字圖像分類將SHAP與DeepLIFT和LIME進行比較。

5.1 Computational Efficiency ?

計算效率

Theorem 2 connects Shapley values from game theory with weighted linear regression. Kernel SHAP uses this connection to compute feature importance. This leads to more accurate estimates with fewer evaluations of the original model than previous sampling-based estimates of Equation 8, particularly when regularization is added to the linear model (Figure 3). Comparing Shapley sampling, SHAP, and LIME on both dense and sparse decision tree models illustrates both the improved sample efficiency of Kernel SHAP and that values from LIME can differ significantly from SHAP values that satisfy local accuracy and consistency.

定理2把博弈論中的Shapley值與加權線性回歸聯系起來，Kernel SHAP利用這一聯系來計算特征重要性。與此前基于采樣的方程8估計相比，這可以在更少的原始模型評估次數下得到更精確的估計，尤其是在對線性模型加入正則化時(圖3)。在稠密和稀疏的決策樹模型上比較Shapley采樣、SHAP和LIME，既展示了Kernel SHAP樣本效率的提升，也表明LIME給出的值可能與滿足局部精度和一致性的SHAP值存在顯著差異。

5.2 Consistency with Human Intuition ?

符合人類直覺

Theorem 1 provides a strong incentive for all additive feature attribution methods to use SHAP ?values. Both LIME and DeepLIFT, as originally demonstrated, compute different feature importance ?values. To validate the importance of Theorem 1, we compared explanations from LIME, DeepLIFT, ?and SHAP with user explanations of simple models (using Amazon Mechanical Turk). Our testing ?assumes that good model explanations should be consistent with explanations from humans who ?understand that model. ?

We compared LIME, DeepLIFT, and SHAP with human explanations for two settings. The first ?setting used a sickness score that was higher when only one of two symptoms was present (Figure 4A). ?The second used a max allocation problem to which DeepLIFT can be applied. Participants were told ?a short story about how three men made money based on the maximum score any of them achieved ?(Figure 4B). In both cases, participants were asked to assign credit for the output (the sickness score ?or money won) among the inputs (i.e., symptoms or players). We found a much stronger agreement ?between human explanations and SHAP than with other methods. SHAP’s improved performance for ?max functions addresses the open problem of max pooling functions in DeepLIFT [7]. ?

定理1強烈鼓勵所有加性特征屬性方法使用SHAP值。正如最初演示的那樣,LIME和DeepLIFT都可以計算不同的特征重要度值。為了驗證定理1的重要性,我們將來自LIME、DeepLIFT和SHAP的解釋與用戶對簡單模型的解釋(使用Amazon Mechanical Turk)進行了比較。我們的測試假設好的模型解釋應該與理解該模型的人的解釋一致。

我們比較了LIME、DeepLIFT和SHAP與人類對兩種設置的解釋。第一組使用的疾病評分在兩種癥狀中只有一種出現時更高(圖4A)。第二種方法使用了一個可以應用DeepLIFT的最大分配問題。參與者被告知一個關于三個男人如何根據他們中的任何一個人獲得的最高分賺錢的小故事(圖4B)。在這兩種情況下,參與者被要求為輸入(即癥狀或玩家)中的輸出(疾病分數或贏得的錢)分配分數。我們發現人類的解釋與SHAP之間的一致性比其他方法要強得多。SHAP對max函數性能的改進解決了DeepLIFT[7]中max池函數的開放問題。

5.3 Explaining Class Differences ?

解釋分類差異

As discussed in Section 4.2, DeepLIFT’s compositional approach suggests a compositional approximation ?of SHAP values (Deep SHAP). These insights, in turn, improve DeepLIFT, and a new version includes updates to better match Shapley values [7]. Figure 5 extends DeepLIFT’s convolutional ?network example to highlight the increased performance of estimates that are closer to SHAP values. ?The pre-trained model and Figure 5 example are the same as those used in [7], with inputs normalized ?between 0 and 1. Two convolution layers and 2 dense layers are followed by a 10-way softmax ?output layer. Both DeepLIFT versions explain a normalized version of the linear layer, while SHAP ?(computed using Kernel SHAP) and LIME explain the model’s output. SHAP and LIME were both ?run with 50k samples (Supplementary Figure 1); to improve performance, LIME was modified to use ?single pixel segmentation over the digit pixels. To match [7], we masked 20% of the pixels chosen to ?switch the predicted class from 8 to 3 according to the feature attribution given by each method.

如4.2節所討論的，DeepLIFT的組合式方法啟發了對SHAP值的組合式近似(Deep SHAP)。這些見解反過來也改進了DeepLIFT，其新版本加入了能更好匹配Shapley值的更新[7]。圖5擴展了DeepLIFT的卷積網絡示例，以突出越接近SHAP值的估計所帶來的性能提升。預訓練模型和圖5的示例與[7]中使用的相同，輸入歸一化到0和1之間。兩個卷積層和2個全連接層之后是一個10路softmax輸出層。兩個DeepLIFT版本解釋的都是線性層的歸一化版本，而SHAP(用Kernel SHAP計算)和LIME解釋的是模型的輸出。SHAP和LIME都使用5萬個樣本運行(補充圖1)；為提高性能，LIME被修改為在數字像素上使用單像素分割。為了與[7]保持一致，我們按照每種方法給出的特征歸因，掩蔽20%被選中的像素，使預測類別從8變為3。

Figure 4: Human feature impact estimates are shown as the most common explanation given among ?30 (A) and 52 (B) random individuals, respectively.

(A) Feature attributions for a model output value ?(sickness score) of 2. The model output is 2 when fever and cough are both present, 5 when only ?one of fever or cough is present, and 0 otherwise.

?(B) Attributions of profit among three men, given ?according to the maximum number of questions any man got right. The first man got 5 questions ?right, the second 4 questions, and the third got none right, so the profit is $5.

圖4：人類的特征影響估計分別取自30人(A)和52人(B)的隨機受試者中最常見的解釋。

(A) 模型輸出值(疾病得分)為2時的特征歸因。當發燒和咳嗽同時出現時模型輸出為2，只出現發燒或咳嗽之一時輸出為5，其余情況輸出為0。

(B) 三個人之間的利潤分配，利潤按他們之中任何人答對題目的最大數量給出。第一個人答對了5題，第二個人答對了4題，第三個人一題未對，因此利潤為5美元。

Figure 5: Explaining the output of a convolutional network trained on the MNIST digit dataset. Orig. ?DeepLIFT has no explicit Shapley approximations, while New DeepLIFT seeks to better approximate ?Shapley values.

(A) Red areas increase the probability of that class, and blue areas decrease the ?probability. Masked removes pixels in order to go from 8 to 3.

(B) The change in log odds when ?masking over 20 random images supports the use of better estimates of SHAP values.

圖5：解釋在MNIST數字數據集上訓練的卷積網絡的輸出。原始DeepLIFT(Orig. DeepLIFT)沒有顯式的Shapley近似，而New DeepLIFT力圖更好地逼近Shapley值。

(A) 紅色區域增加該類別的概率，藍色區域降低該類別的概率。"Masked"按照使預測從8變為3的目標去除像素。

(B) 在20幅隨機圖像上進行掩蔽時對數幾率(log odds)的變化，支持使用更好的SHAP值估計。

6 Conclusion

The growing tension between the accuracy and interpretability of model predictions has motivated ?the development of methods that help users interpret predictions. The SHAP framework identifies ?the class of additive feature importance methods (which includes six previous methods) and shows ?there is a unique solution in this class that adheres to desirable properties.

The thread of unity that ?SHAP weaves through the literature is an encouraging sign that common principles about model ?interpretation can inform the development of future methods. ?

We presented several different estimation methods for SHAP values, along with proofs and experiments ?showing that these values are desirable. Promising next steps involve developing faster ?model-type-specific estimation methods that make fewer assumptions, integrating work on estimating ?interaction effects from game theory, and defining new explanation model classes.

模型預測的準確性和可解釋性之間的緊張關系推動了幫助用戶解釋預測的方法的發展。SHAP框架確定了附加特征重要性方法的類別(包括以前的6個方法),并表明在該類中有一個唯一的解決方案,它符合需要的屬性。

SHAP在文獻中統一的線索是一個令人鼓舞的信號,表明有關模型解釋的通用原理可以為將來方法的發展提供信息。

我們提出了幾種不同的SHAP值估計方法，并通過證明和實驗表明這些值是理想的。有前景的下一步工作包括：開發假設更少、更快的針對特定模型類型的估計方法；結合博弈論中估計交互效應的工作；以及定義新的解釋模型類別。

Acknowledgements ?

This work was supported by a National Science Foundation (NSF) DBI-135589, NSF CAREER DBI-155230, American Cancer Society 127332-RSG-15-097-01-TBG, National Institute of Health (NIH) AG049196, and NSF Graduate Research Fellowship. We would like to thank Marco Ribeiro, Erik Štrumbelj, Avanti Shrikumar, Yair Zick, the Lee Lab, and the NIPS reviewers for feedback that has significantly improved this work.

這項工作得到了美國國家科學基金會(NSF) DBI-135589、NSF CAREER DBI-155230、美國癌癥學會127332-RSG-15-097-01-TBG、美國國家衛生研究院(NIH) AG049196以及NSF研究生研究獎學金的支持。我們要感謝Marco Ribeiro、Erik Štrumbelj、Avanti Shrikumar、Yair Zick、Lee實驗室以及NIPS審稿人的反饋，這些反饋極大地改進了這項工作。
