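The distinction between these two words has always been blurry. In fact, in English "interpretable" and "explainable" are themselves synonyms, both meaning "capable of being understood" [1]. Different papers define the two words differently, and some do not distinguish them at all and use them interchangeably, for example Molnar's well-known tutorial Interpretable Machine Learning. Incidentally, the passage from the Chinese translation that @最大的梦想家 quotes [2] is actually a mistranslation. The translation reads:

Like Miller (2017), it makes sense to distinguish between the terms Interpretable and Explainable. We will use Explainable to describe explanations of individual instance predictions.

Based on this sentence, that user concluded that "Interpretability refers to the interpretability of the model as a whole, while Explainability refers specifically to the interpretability of a single prediction." That reading was misled by the Chinese translation. The original text actually says [3]: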

Like Miller (2017), I think it makes sense to distinguish between the terms interpretability/explainability and explanation. I will use "explanation" for explanations of individual predictions.

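Here the author holds that one should distinguish "interpretability/explainability" from "explanation": the former denotes the property that a model's decisions can be understood, while the latter denotes an explanation of one particular instance. The tutorial itself does not actually distinguish interpretability from explainability.

If one insists on distinguishing the two words, I personally lean toward the following distinction [4]:

  • Interpretability: an intrinsic property of the model, meaning that the model's own decision process is understandable to humans, for example (generalized) linear models, decision trees, and so on;
  • Explainability: the ability to reconstruct the model's decision process after the fact (post hoc), where the model's own decisions may be black-box and opaque, and the reconstructed explanation is not necessarily consistent with the model's actual decisions.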
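
The distinction in the two bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `linear_model`, `black_box`, and `fit_stump` are hypothetical names made up here, the weights are arbitrary, and a one-feature threshold rule stands in for whatever post hoc explainer one might actually use. The linear model can be read directly from its weights, whereas the threshold rule only imitates the black box, and its fidelity (rate of agreement with the black box) falls short of 1.

```python
import random

# Inherently interpretable model: the weights themselves are the explanation.
# (Weights and bias are arbitrary illustrative values.)
WEIGHTS = {"x1": 2.0, "x2": -1.0}
BIAS = 0.5

def linear_model(x1, x2):
    return WEIGHTS["x1"] * x1 + WEIGHTS["x2"] * x2 + BIAS

# Hypothetical black box: we may only query it, not read its internals.
def black_box(x1, x2):
    return 1 if x1 * x2 > 0.25 else 0

def fit_stump(points, labels):
    """Post hoc surrogate: the single rule 'feature > threshold' that
    agrees most often with the black-box labels."""
    best = None
    for feature in (0, 1):
        for threshold in sorted({p[feature] for p in points}):
            preds = [1 if p[feature] > threshold else 0 for p in points]
            agreement = sum(a == b for a, b in zip(preds, labels)) / len(labels)
            if best is None or agreement > best[2]:
                best = (feature, threshold, agreement)
    return best

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(500)]
labels = [black_box(*p) for p in points]

feature, threshold, fidelity = fit_stump(points, labels)
print(f"surrogate rule: x{feature + 1} > {threshold:.2f}, fidelity = {fidelity:.2f}")
```

The surrogate rule is easy to state, but because the black box depends on the product of both inputs, no single-feature rule can reproduce it exactly. That gap is what "not necessarily consistent with the model's actual decisions" refers to.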

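There is really no need to over-interpret this, because as a field develops its terminology inevitably shifts: synonyms can become antonyms and antonyms can become synonyms.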

References

  1. ^See the Thesaurus section: https://www.thefreedictionary.com/interpretable
  2. ^https://www.zhihu.com/question/382107473/answer/1477720260
  3. ^https://christophm.github.io/interpretable-ml-book/interpretability.html
  4. ^https://www.nature.com/articles/s42256-019-0048-x.pdf


I read several of the answers and feel that none of them quite gets to the point.

Drawing on various sources, Interpretability means:

"Interpretability means that the cause and effect can be determined." [1]

"Interpretability is about the extent to which a cause and effect can be observed within a system." [2]

"interpretable ML uses models that are no black boxes." [3]

"Interpretability has to do with how accurate a machine learning model can associate a cause to an effect." [1]

"Interpretability is the degree to which a human can understand the cause of a decision."[4]

"An interpretable model is able to output humanly understandable summaries of its calculation that allow us understand how it came to specific conclusions. Due to that a human would be able to actually create a specific desired outcome by selecting specific inputs."

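In summary: Interpretability indicates whether a model can account for cause and effect. It is a more abstract, grander a priori notion (that is, I know before anything happens). As the saying goes: "Every cause has its effect, and I am your comeuppance."

Explainability means: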

"Explainability has to do with the ability of the parameters, often hidden in Deep Nets, to justify the results." [1]

"Explainable ML uses a black box model and explains it afterwards." [3]

"Explainability, meanwhile, is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms." [2]

"Rather than trying to create models that are inherently interpretable, there has been a recent explosion of work on 'explainable ML', where a second (post hoc) model is created to explain the first black box model. This is problematic. Explanations are often not reliable, and can be misleading, as we discuss below. If we instead use models that are inherently interpretable, they provide their own explanations, which are faithful to what the model actually computes." [5]

"A "merely" explainable model however does not deliver this input and we need a second model or mode of inspection to create a "Hypothesis about its mechanism" that will help explain the results but not allows to rebuild results by hand deterministically."[6]

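Explainability can be understood as how, within one specific model, the model structure, model parameters, and input data produce the output. It is a more concrete, tangible a posteriori notion (that is, after something has happened, I work out how to explain it so that others understand me). As the saying goes: "Wise as Zhuge Liang after the fact, clueless as a pig beforehand."

Clearly, Interpretability is preferable to Explainability. Of course, this is only how some people understand it; discussion is welcome.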


[1] Interpretability vs Explainability: The Black Box of Machine Learning

[2] Machine Learning Explainability vs Interpretability: Two concepts that could help restore trust in AI

[3] Explainable ML versus Interpretable ML

[4] Interpretable machine learning

[5] Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

[6] What is the difference between explainable and interpretable machine learning?


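I tend to agree with a passage from the Chinese translation of Interpretable Machine Learning:

If one model's decisions are easier for a person to understand than another model's, then it is more interpretable than the other model. In what follows we will use both terms, Interpretable and Explainable, to describe interpretability. Like Miller (2017), it makes sense to distinguish between the terms Interpretable and Explainable. We will use Explainable to describe explanations of individual instance predictions.

Interpretability refers to the interpretability of the model as a whole, while Explainability refers specifically to the interpretability of a single prediction (for example, the reason a particular loan application was rejected).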
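
The contrast between a model-level view and an explanation of one prediction can be illustrated with a minimal sketch. Everything here is an assumption made up for illustration: the feature names, weights, threshold, and the applicant are hypothetical, and the features are assumed to be on comparable scales so that weight magnitudes can be compared at all.

```python
# Hypothetical linear credit-scoring model (all numbers are invented).
WEIGHTS = {"income": 0.8, "debt_ratio": -1.5, "late_payments": -0.6}
BIAS = 0.2
THRESHOLD = 0.0  # approve when score >= THRESHOLD

def score(applicant):
    return BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)

def global_view():
    """Model-level view: features ranked by absolute weight."""
    return sorted(WEIGHTS, key=lambda name: abs(WEIGHTS[name]), reverse=True)

def local_explanation(applicant):
    """Instance-level view: each feature's contribution to this one score."""
    return {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}

applicant = {"income": 0.5, "debt_ratio": 0.2, "late_payments": 2.0}
contributions = local_explanation(applicant)
main_reason = min(contributions, key=contributions.get)  # most negative term

print("approved:", score(applicant) >= THRESHOLD)
print("most influential feature overall:", global_view()[0])
print("main reason for this applicant:", main_reason)
```

In this made-up case the model as a whole weights `debt_ratio` most heavily, yet this particular applicant is rejected mainly because of `late_payments`. The two views answer different questions about the same model.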


For beginners, I recommend reading [1], which contains the following passage:

Interpretability refers to a passive characteristic of a model referring to the level at which a given model makes sense for a human observer. This feature is also expressed as transparency. By contrast, explainability can be viewed as an active characteristic of a model, denoting any action or procedure taken by a model with the intent of clarifying or detailing its internal functions.

Interpretability: it is defined as the ability to explain or to provide the meaning in understandable terms to a human.

Explainability: explainability is associated with the notion of explanation as an interface between humans and a decision maker that is, at the same time, both an accurate proxy of the decision maker and comprehensible to humans. Given a certain audience, explainability refers to the details and reasons a model gives to make its functioning clear or easy to understand.

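My own understanding is this. Interpretability means that the Model is reasonable and understandable to an Observer: if you study the Model, you find that it is explainable, sensible, and workable. The relationship is mainly between the Model and the Observer (Model is passive). Explainability is more about the relationship between the Model and its Audience (Model is active): different Audiences need different Explanations, and the Model has to provide them.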

[1] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, F. Herrera, Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion. 58 (2020) 82–115. https://doi.org/10.1016/j.inffus.2019.12.012.


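When writing or reviewing papers, I don't feel the two are really distinguished.

My impression is as follows (offered only to get the discussion going):

interpretability: gives some information about the black box, which can be used both by researchers to inspect the model and to convince customers (hence the relatively high citation counts). Examples are LIME and, later, SHAP.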
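
To make "some information about the black box" concrete, here is a minimal sketch of Shapley-value attribution, the idea that SHAP approximates. It is not the API of the `shap` library: `black_box` and `shapley_values` are hypothetical names, the model is a toy, and representing an "absent" feature by a baseline value is one common convention assumed here. The brute-force loop over all feature orderings is only feasible for a handful of features.

```python
from itertools import permutations

def black_box(x):
    # Hypothetical opaque model with an interaction between x[1] and x[2].
    return 3.0 * x[0] + 2.0 * x[1] * x[2]

def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features are switched from baseline to instance."""
    n = len(instance)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = instance[i]
            value = model(current)
            totals[i] += value - previous
            previous = value
    return [total / len(orders) for total in totals]

instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(black_box, instance, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features involved. The result describes this one prediction relative to the chosen baseline; it does not reveal the model's internal mechanism.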

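explainability: answers questions such as what properties a trained RNN has that give it good performance. There was a paper at NIPS last year on line attractors in RNN sentiment models. Overall, nobody is paying attention to this for now...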


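These are a pair of synonyms. Insisting that they differ is an irresponsible misuse of language by academics. For a model to be interpretable, it should have formulas that humans can understand, variables that can be understood, and model behavior that can be anticipated. Swapping in a synonym cannot cover up a half-black-box model that falls short in some respect.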


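Interpretability means being understandable; explainability means being explainable.

As an analogy, suppose you run a chemistry experiment with the chemical equation as a reference. You can understand the experiment and know what output the current input will produce; that is being understandable. Being explainable means you know the intermediate reaction process and can express it in speech or writing.

An explainable model is necessarily understandable, but an understandable model is not necessarily explainable.

Something that can be grasped intuitively but not put into words can be regarded as understandable but not explainable.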

