

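The distinction between these two words has always been fuzzy. In fact, in English, Interpretable and Explainable are themselves synonyms, both meaning "capable of being understood" [1]. Different papers define the two words differently, and some do not distinguish between them at all and use them interchangeably, for example Molnar's well-known tutorial Interpretable Machine Learning. Incidentally, the Chinese translation of the passage quoted by @最大的梦想家 [2] is actually a mistranslation. The translation says: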

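Like Miller (2017), it makes sense to distinguish between the terms Interpretable and Explainable. We will use Explainable to describe explanations of individual instance predictions.

The original English [3] reads: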


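Like Miller (2017), I think it makes sense to distinguish between the terms interpretability/explainability and explanation. I will use "explanation" for explanations of individual predictions.

In other words, the original contrasts interpretability/explainability with explanation; it does not contrast interpretable with explainable, which is what the translation says.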





  1. See the Thesaurus section: https://www.thefreedictionary.com/interpretable
  2. https://www.zhihu.com/question/382107473/answer/1477720260
  3. https://christophm.github.io/interpretable-ml-book/interpretability.html
  4. https://www.nature.com/articles/s42256-019-0048-x.pdf



"Interpretability means that the cause and effect can be determined."[1]

"Interpretability is about the extent to which a cause and effect can be observed within a system."[2]

"interpretable ML uses models that are no black boxes."[3]

"Interpretability has to do with how accurate a machine learning model can associate a cause to an effect."[1]

"Interpretability is the degree to which a human can understand the cause of a decision."[4]

"An interpretable model is able to output humanly understandable summaries of its calculation that allow us understand how it came to specific conclusions. Due to that a human would be able to actually create a specific desired outcome by selecting specific inputs."
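To make the last quote concrete, here is a minimal Python sketch assuming a hypothetical, hand-specified linear scoring model. The feature names, the weights, and the helpers `predict`, `contributions`, and `input_for_target` are all made up for illustration. Because every weight is visible, a human can recompute any prediction by hand and can also work backwards to pick an input that yields a desired output.

```python
# Hypothetical, hand-specified linear model (illustrative weights only).
# Every parameter is visible, so the model's "calculation" is just a
# weighted sum that a human can redo on paper.
WEIGHTS = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}
BIAS = 1.0


def contributions(x: dict[str, float]) -> dict[str, float]:
    """Per-feature contribution weight * value: a human-readable summary."""
    return {name: w * x[name] for name, w in WEIGHTS.items()}


def predict(x: dict[str, float]) -> float:
    """Prediction = bias + sum of the per-feature contributions."""
    return BIAS + sum(contributions(x).values())


def input_for_target(x: dict[str, float], feature: str, target: float) -> float:
    """Value of `feature` that makes predict() equal `target`, others held fixed.

    This inversion is possible only because the model is a known linear function.
    """
    rest = predict(x) - WEIGHTS[feature] * x[feature]
    return (target - rest) / WEIGHTS[feature]


if __name__ == "__main__":
    applicant = {"income": 4.0, "debt": 2.0, "years_employed": 5.0}
    print(contributions(applicant))
    print(predict(applicant))
    needed = input_for_target(applicant, "income", 4.0)
    print(needed, predict({**applicant, "income": needed}))
```

Here the `contributions` dictionary is the model's own explanation: no second model is needed to say why a prediction came out the way it did.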



"Explainability has to do with the ability of the parameters, often hidden in Deep Nets, to justify the results."[1]

"Explainable ML uses a black box model and explains it afterwards."[3]

"Explainability, meanwhile, is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms."[2]

"Rather than trying to create models that are inherently interpretable, there has been a recent explosion of work on 'explainable ML', where a second (post hoc) model is created to explain the first black box model. This is problematic. Explanations are often not reliable, and can be misleading, as we discuss below. If we instead use models that are inherently interpretable, they provide their own explanations, which are faithful to what the model actually computes."[5]

"A 'merely' explainable model however does not deliver this input and we need a second model or mode of inspection to create a 'Hypothesis about its mechanism' that will help explain the results but not allows to rebuild results by hand deterministically."[6]
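The post hoc setup described in the last two quotes can be sketched as follows, assuming a toy nonlinear function as a stand-in for the black box (`black_box`, `fit_line`, and `r_squared` are hypothetical names chosen for this sketch). A second, simpler model, here an ordinary-least-squares line, is fitted to the black box's outputs, and its agreement with the black box (its "fidelity", measured as R²) shows how much of the original behaviour the explanation actually reproduces.

```python
import math
import random


def black_box(x: float) -> float:
    """Stand-in for an opaque model: we only assume we can query it."""
    return math.sin(3 * x) + 0.5 * x


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y ~ a*x + b (the post hoc surrogate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(ys: list[float], preds: list[float]) -> float:
    """Fraction of the black box's output variance the surrogate reproduces."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(200)]
    ys = [black_box(x) for x in xs]          # query the black box
    a, b = fit_line(xs, ys)                  # fit the second (post hoc) model
    fidelity = r_squared(ys, [a * x + b for x in xs])
    print(f"surrogate: y ~ {a:.3f}*x + {b:.3f}, fidelity R^2 = {fidelity:.3f}")
```

A fidelity well below 1 is one concrete way a post hoc explanation can mislead: the line is a hypothesis about the black box, not a faithful copy of what it computes, and it cannot be used to rebuild the black box's outputs exactly.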



[1] Interpretability vs Explainability: The Black Box of Machine Learning

[2] Machine Learning Explainability vs Interpretability: Two concepts that could help restore trust in AI

[3] Explainable ML versus Interpretable ML

[4] Interpretable machine learning

[5] Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

[6] What is the difference between explainable and interpretable machine learning?


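If one model's decisions are easier for a human to understand than another model's, then it is more interpretable than the other model. In what follows we will use both terms, Interpretable and Explainable, to describe interpretability. Like Miller (2017), it makes sense to distinguish between the terms Interpretable and Explainable. We will use Explainable to describe explanations of individual instance predictions.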



Interpretability refers to a passive characteristic of a model referring to the level at which a given model makes sense for a human observer. This feature is also expressed as transparency. By contrast, explainability can be viewed as an active characteristic of a model, denoting any action or procedure taken by a model with the intent of clarifying or detailing its internal functions.

Interpretability: it is defined as the ability to explain or to provide the meaning in understandable terms to a human.

Explainability: explainability is associated with the notion of explanation as an interface between humans and a decision maker that is, at the same time, both an accurate proxy of the decision maker and comprehensible to humans. Given a certain audience, explainability refers to the details and reasons a model gives to make its functioning clear or easy to understand.

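My own understanding: Interpretability means that, to an Observer, the Model is sensible and understandable; that is, if you study the Model, you find that it is interpretable, sensible, and workable. It is mainly about the relationship between the Model and the Observer (the Model is passive). Explainability, by contrast, is mainly about the relationship between the Model and the Audience (the Model is active): different Audiences need different Explanations, and the Model has to provide them.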

[1] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, F. Herrera, Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion. 58 (2020) 82–115. https://doi.org/10.1016/j.inffus.2019.12.012.



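interpretability: gives some information about a black box, which researchers can use to scrutinize a model and which can also be used to convince clients (hence the relatively high citation counts). Examples are LIME and, later, SHAP.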
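LIME's core idea (perturb the input around one instance, weight the perturbed samples by proximity, and fit a weighted linear model to the black box's outputs) can be sketched in plain Python as below. This is a simplified illustration, not the `lime` package's API; the example black-box function, the Gaussian sampling, the exponential kernel, and the parameter names `scale`, `kernel_width`, and `n_samples` are all assumptions.

```python
import math
import random
from typing import Callable, Sequence


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def local_linear_explanation(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    n_samples: int = 500,
    scale: float = 0.3,
    kernel_width: float = 0.5,
    seed: int = 0,
) -> list[float]:
    """LIME-style sketch: per-feature weights of a proximity-weighted linear fit around x0."""
    rng = random.Random(seed)
    p = len(x0) + 1                      # intercept + one coefficient per feature
    A = [[0.0] * p for _ in range(p)]    # accumulates sum of w * z z^T
    b = [0.0] * p                        # accumulates sum of w * z * f(x)
    for _ in range(n_samples):
        x = [xi + rng.gauss(0.0, scale) for xi in x0]      # perturb around x0
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, x0))
        w = math.exp(-dist2 / kernel_width ** 2)           # closer samples count more
        z = [1.0] + x
        y = f(x)                                           # query the black box
        for i in range(p):
            b[i] += w * z[i] * y
            for j in range(p):
                A[i][j] += w * z[i] * z[j]
    _intercept, *coefs = solve(A, b)
    return coefs


if __name__ == "__main__":
    def opaque_model(x):
        # Hypothetical black box; its true local slopes at (1, 2) are (2, 3).
        return x[0] ** 2 + 3 * x[1]

    print(local_linear_explanation(opaque_model, [1.0, 2.0]))
```

The returned coefficients approximate the black box's local behaviour around one instance only; a different instance can get a different explanation.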

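explainability: answers questions such as what properties a trained RNN model has that make it perform well. Last year at NeurIPS there was a paper on line attractors in RNNs trained for sentiment analysis. Overall, hardly anyone is paying attention to this yet...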
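The line-attractor idea can be illustrated with a hand-built toy (not the trained network from that paper): a two-unit linear RNN whose recurrent matrix has eigenvalue 1 along one axis and 0.5 along the other. All weights here are assumptions chosen for illustration. With no input, every point on the first axis is a fixed point, so that coordinate integrates per-word sentiment scores into a running "net sentiment" memory, while the second coordinate forgets old inputs.

```python
# Hand-built toy linear RNN: h_{t+1} = W h_t + U * s_t (no nonlinearity).
# W has eigenvalue 1 along axis 0 and 0.5 along axis 1: with zero input every
# point (c, 0) is a fixed point, i.e. the first axis is a line attractor.
W = [[1.0, 0.0],
     [0.0, 0.5]]
U = [1.0, 1.0]


def run_rnn(scores: list[float], h0: tuple[float, float] = (0.0, 0.0)) -> list[float]:
    """Feed per-word sentiment scores (+1 positive, -1 negative, 0 neutral)."""
    h = list(h0)
    for s in scores:
        h = [W[0][0] * h[0] + W[0][1] * h[1] + U[0] * s,
             W[1][0] * h[0] + W[1][1] * h[1] + U[1] * s]
    return h


if __name__ == "__main__":
    review = [1, 1, -1, 1, 0, 0, 0]
    print(run_rnn(review))                 # [2.0, 0.109375]
    print(run_rnn([0] * 10, (3.0, 0.0)))   # stays at [3.0, 0.0]
```

The first coordinate ends at the sum of the scores (2 here), while the second has mostly decayed; finding such fixed points and slow directions is the kind of mechanism-level question this answer files under explainability.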





