Is there any difference between Interpretability and Explainability in the field of artificial intelligence?

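The distinction between these two terms has always been fuzzy. In fact, "interpretable" and "explainable" are themselves synonyms in English, both meaning "capable of being understood" ^[1]. Different papers define the two terms differently, and some do not distinguish them at all and use them interchangeably, for example Molnar's well-known tutorial Interpretable Machine Learning. Incidentally, the Chinese translation that @最大的夢想家 quoted ^[2] is actually a mistranslation. The translated passage reads: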

Like Miller (2017), it makes sense to distinguish between the terms Interpretable and Explainable. We will use Explainable to describe explanations of individual instance predictions.

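Based on this sentence, that user concluded that "Interpretability refers to the interpretability of the model as a whole, while Explainability refers specifically to the interpretability of a single prediction." That conclusion was misled by the Chinese translation. The original text actually reads ^[3]: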

Like Miller (2017), I think it makes sense to distinguish between the terms interpretability/explainability and explanation. I will use "explanation" for explanations of individual predictions.

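Here the author argues for distinguishing "interpretability/explainability" from "explanation": the former denotes the property that a model's decisions can be understood, while the latter denotes the explanation of one particular instance. The tutorial itself does not distinguish interpretability from explainability.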

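If I had to distinguish the two terms, I would personally lean toward the following distinction ^[4]:

Interpretability: an inherent property of the model, meaning the model's own decision process is understandable to humans as it stands, for example (generalized) linear models, decision trees, and so on;
Explainability: the ability to reconstruct the model's decision process after the fact (post hoc), where the model's own decisions may be black-box and opaque; such a reconstructed explanation is not necessarily consistent with how the model actually decides.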
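
To make the distinction above concrete, here is a minimal sketch in plain Python. The functions `interpretable_model` and `black_box` and the helper `fit_line` are hypothetical stand-ins invented for illustration; they do not come from any of the cited sources. The linear model can be read directly from its coefficients, while the opaque function is explained after the fact by a fitted line that only approximates it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def interpretable_model(x):
    # Inherently interpretable: the coefficients 3.0 and 1.0 ARE the explanation.
    return 3.0 * x + 1.0


def black_box(x):
    # Hypothetical opaque model; assume we can only query it, not inspect it.
    return x ** 2


# Post hoc explanation: fit a simple surrogate to the black box's outputs.
xs = [i / 10 for i in range(21)]  # inputs 0.0 .. 2.0
surrogate_slope, surrogate_intercept = fit_line(xs, [black_box(x) for x in xs])


def surrogate(x):
    return surrogate_slope * x + surrogate_intercept


# Fidelity gap: how far the reconstructed explanation is from the real model.
max_gap = max(abs(black_box(x) - surrogate(x)) for x in xs)
print("surrogate slope:", surrogate_slope)
print("largest disagreement with the black box:", max_gap)
```

The surrogate summarizes the black box on average, but `max_gap` is non-zero: the reconstructed explanation does not match what the opaque model actually computes, which is exactly the caveat in the Explainability definition above.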

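There is really no need to over-interpret this. The development of a discipline is inevitably accompanied by shifts in terminology, and synonyms may become antonyms or antonyms synonyms.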

References

^[1] See the Thesaurus section: https://www.thefreedictionary.com/interpretable
^[2] https://www.zhihu.com/question/382107473/answer/1477720260
^[3] https://christophm.github.io/interpretable-ml-book/interpretability.html
^[4] https://www.nature.com/articles/s42256-019-0048-x.pdf

I read several of the answers and felt that none of them quite hit the mark.

Pulling together various sources, Interpretability means:

"Interpretability means that the cause and effect can be determined." [1]

"Interpretability is about the extent to which a cause and effect can be observed within a system." [2]

"interpretable ML uses models that are no black boxes."[3]

"Interpretability has to do with how accurate a machine learning model can associate a cause to an effect." [1]

"Interpretability is the degree to which a human can understand the cause of a decision."[4]

"An interpretable model is able to output humanly understandable summaries of its calculation that allow us understand how it came to specific conclusions. Due to that a human would be able to actually create a specific desired outcome by selecting specific inputs."

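In summary: Interpretability indicates whether a model can explain cause and effect. It is a more abstract, grander a priori concept (that is, I know it before anything happens). "Every cause has its effect, and I am your comeuppance."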

Explainability means:

"Explainability has to do with the ability of the parameters, often hidden in Deep Nets, to justify the results." [1]

"Explainable ML uses a black box model and explains it afterwards."[3]

"Explainability, meanwhile, is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms." [2]

"Rather than trying to create models that are inherently interpretable, there has been a recent explosion of work on 'explainable ML', where a second (post hoc) model is created to explain the first black box model. This is problematic. Explanations are often not reliable, and can be misleading, as we discuss below. If we instead use models that are inherently interpretable, they provide their own explanations, which are faithful to what the model actually computes." [5]

"A "merely" explainable model however does not deliver this input and we need a second model or mode of inspection to create a "Hypothesis about its mechanism" that will help explain the results but not allows to rebuild results by hand deterministically."[6]

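Explainability can be understood as how, within one specific model, the model structure, model parameters, and input data produce the output. It is a more concrete, tangible a posteriori concept (that is, after something has happened, I work out how to explain it so that others understand me). "Wise as Zhuge Liang after the fact, clueless as a pig beforehand."

Clearly, Interpretability is preferable to Explainability. Of course, this is only how some people understand it; discussion is welcome.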

[1] Interpretability vs Explainability: The Black Box of Machine Learning

[2] Machine Learning Explainability vs Interpretability: Two concepts that could help restore trust in AI

[3] Explainable ML versus Interpretable ML

[4] Interpretable machine learning

[5] Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

[6] What is the difference between explainable and interpretable machine learning?

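I largely agree with a passage from the Chinese translation of Interpretable Machine Learning:

If one model's decisions are easier for a person to understand than another model's, then it is more interpretable than the other. In what follows we will use both terms, Interpretable and Explainable, to describe interpretability. Like Miller (2017), it makes sense to distinguish between the terms Interpretable and Explainable. We will use Explainable to describe explanations of individual instance predictions.

Interpretability refers to the interpretability of the model as a whole, while Explainability refers specifically to the interpretability of a single prediction (for example, the reason a particular loan application was rejected).

For beginners I recommend reading reference [1], which contains the following passage: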

Interpretability refers to a passive characteristic of a model referring to the level at which a given model makes sense for a human observer. This feature is also expressed as transparency. By contrast, explainability can be viewed as an active characteristic of a model, denoting any action or procedure taken by a model with the intent of clarifying or detailing its internal functions.

Interpretability: it is defined as the ability to explain or to provide the meaning in understandable terms to a human.

Explainability: explainability is associated with the notion of explanation as an interface between humans and a decision maker that is, at the same time, both an accurate proxy of the decision maker and comprehensible to humans. Given a certain audience, explainability refers to the details and reasons a model gives to make its functioning clear or easy to understand.

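My own understanding is this: Interpretability means that, to an Observer, the Model is reasonable and understandable. That is, if you study the Model, you find that it is explainable, reasonable, and workable, so the relationship is mostly between the Model and the Observer (Model is passive). Explainability, by contrast, leans toward the relationship between the Model and the Audience (Model is active): different Audiences need different Explanations, and the Model has to provide them.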

[1] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, F. Herrera, Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion. 58 (2020) 82–115. https://doi.org/10.1016/j.inffus.2019.12.012.

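When writing or reviewing papers, I find that people do not really distinguish the two.

My impression is as follows (just to get the discussion started):

interpretability: being able to provide some information about a black box, which researchers can use to inspect a model and which can also be used to persuade customers (hence the relatively high citation counts). Examples are LIME and the later SHAP.

explainability: answering questions such as what properties of a trained RNN give it good performance. There was a paper at NIPS last year on line attractors in RNN sentiment models. Overall, nobody is paying attention to this for now...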
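
As a rough illustration of the LIME-style idea mentioned above, the sketch below fits a distance-weighted line around a single input of an opaque model. It is a simplified stand-in written in plain Python, not the API of the `lime` package: the real method samples perturbations randomly and works on interpretable feature representations, whereas this sketch uses a fixed grid of offsets on one numeric feature. The names `black_box`, `local_linear_explanation`, and `kernel_width` are assumptions made for this example.

```python
import math


def black_box(x):
    # Hypothetical opaque model; stands in for anything we can only query.
    return x ** 3


def local_linear_explanation(model, x0, kernel_width=0.1):
    """Fit a weighted line around x0; perturbations near x0 get higher weight."""
    offsets = [(i - 50) / 100 for i in range(101)]  # -0.5 .. 0.5
    xs = [x0 + d for d in offsets]
    ys = [model(x) for x in xs]
    # Gaussian proximity kernel on the distance from x0.
    ws = [math.exp(-(d ** 2) / (2 * kernel_width ** 2)) for d in offsets]

    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, mean_y - slope * mean_x


# The same black box gets a different local explanation at each input.
slope_at_1, _ = local_linear_explanation(black_box, 1.0)
slope_at_2, _ = local_linear_explanation(black_box, 2.0)
print("local slope near x=1:", slope_at_1)
print("local slope near x=2:", slope_at_2)
```

The two slopes differ because each one describes the model only in the neighborhood of its own input. That is why this kind of output is an explanation of an individual prediction and not a description of the model as a whole.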

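These two words are synonyms, and insisting that they differ is an irresponsible abuse of language by academics. For a model to be explainable, it should have formulas that humans can understand, variables that can be understood, and behavior that can be anticipated. Swapping in a synonym cannot cover up a half-black-box model that falls short in some respect.

Interpretability means "understandable"; explainability means "explainable".

As an analogy: when you run a chemistry experiment with the chemical equation as a reference, you can understand the experiment and know what output the current input will produce. That is being understandable. Being explainable means you know the intermediate reaction process and can express it in speech or in writing.

An explainable model is necessarily understandable, but an understandable model is not necessarily explainable.

Something that "can be grasped intuitively but not put into words" is understandable but not explainable.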