BM3D 圖像去噪演算法||原論文翻譯 1/N

論文原文：https://www.cs.tut.fi/~foi/GCF-BM3D/BM3D_TIP_2007.pdf

Image denoising by sparse 3D transform-domain collaborative filtering

Abstract

總述

We propose a novel image denoising strategy based on an enhanced sparse representation in transform domain. The enhancement of the sparsity is achieved by grouping similar 2D image fragments (e.g. blocks) into 3D data arrays which we call "groups". Collaborative filtering is a special procedure developed to deal with these 3D groups. We realize it using the three successive steps: 3D transformation of a group, shrinkage of the transform spectrum, and inverse 3D transformation. The result is a 3D estimate that consists of the jointly filtered grouped image blocks. By attenuating the noise, the collaborative filtering reveals even the finest details shared by grouped blocks and at the same time it preserves the essential unique features of each individual block. The filtered blocks are then returned to their original positions. Because these blocks are overlapping, for each pixel we obtain many different estimates which need to be combined. Aggregation is a particular averaging procedure which is exploited to take advantage of this redundancy. A significant improvement is obtained by a specially developed collaborative Wiener filtering. An algorithm based on this novel denoising strategy and its efficient implementation are presented in full detail; an extension to color-image denoising is also developed. The experimental results demonstrate that this computationally scalable algorithm achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.

我們提出了一種基於變換域中增強稀疏表示的新型圖像去噪策略。通過將類似的2D圖像片段（例如塊）分組成3D數據陣列（我們稱之為「組」）來實現稀疏度的增強。協同過濾是為處理這些3D組而開發的一種特殊程序。我們使用三個連續步驟來實現它：組的3D變換，變換譜的收縮和逆3D變換。結果是由聯合濾波的分組圖像塊組成的3D估計。通過衰減雜訊，協同過濾甚至可以顯示分組塊共享的最精細的細節，同時保留每個塊的基本獨特特徵。然後將經過濾的塊返回到其原始位置。因為這些塊是重疊的，所以對於每個像素，我們獲得了許多需要組合起來的不同估計。聚合是一種特殊的平均過程，被用來利用這種冗餘。通過專門開發的協同維納濾波獲得了顯著的改進。本文詳細介紹了基於這種新型去噪策略的演算法及其高效實現，並進一步將其擴展到彩色圖像去噪。實驗結果表明，這種計算可擴展的演算法在峰值信噪比和主觀視覺質量方面都達到了最先進的去噪性能。

Index Terms: image denoising, sparsity, adaptive grouping, block-matching, 3D transform shrinkage.

索引術語：圖像去噪，稀疏性，自適應分組，塊匹配，3D變換收縮。

I. INTRODUCTION

介紹

PLENTY of denoising methods exist, originating from various disciplines such as probability theory, statistics, partial differential equations, linear and nonlinear filtering, spectral and multiresolution analysis. All these methods rely on some explicit or implicit assumptions about the true (noise-free) signal in order to separate it properly from the random noise.

目前已存在大量的去噪方法，它們源自各種學科，如概率論，統計學，偏微分方程，線性和非線性濾波，譜分析和多解析度分析。所有這些方法都依賴於關於真實（無雜訊）信號的一些明確或隱含的假設，以便將其與隨機雜訊正確地分離。

In particular, the transform-domain denoising methods typically assume that the true signal can be well approximated by a linear combination of few basis elements. That is, the signal is sparsely represented in the transform domain. Hence, by preserving the few high-magnitude transform coefficients that convey mostly the true-signal energy and discarding the rest which are mainly due to noise, the true signal can be effectively estimated. The sparsity of the representation depends on both the transform and the true signal's properties.

特別的，變換域去噪方法通常假設真實信號可以通過少數基元素的線性組合很好地近似。也就是說，信號在變換域中是稀疏表示的。因此，通過保留少數主要承載真實信號能量的高幅度變換係數，並丟棄其餘主要由雜訊引起的係數，就可以有效地估計真實信號。表示的稀疏性取決於變換和真實信號的屬性。
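The keep-the-large-coefficients idea can be tried on a toy 1D signal. The following is only a minimal sketch, not the paper's method: it assumes an orthonormal DCT as the transform, a signal that is exactly one DCT basis element, and a threshold of three noise standard deviations. The helper names (`dct`, `idct`, `hard_threshold_denoise`, `mse`) are hypothetical.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coeffs.append(scale * sum(
            x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return coeffs

def idct(c):
    """Inverse of dct() above (the transform is orthonormal)."""
    n = len(c)
    signal = []
    for i in range(n):
        signal.append(sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
            * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n)))
    return signal

def hard_threshold_denoise(noisy, threshold):
    """Keep coefficients whose magnitude exceeds threshold, zero the rest."""
    kept = [c if abs(c) > threshold else 0.0 for c in dct(noisy)]
    return idct(kept)

def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy data (assumption): one DCT basis element plus white Gaussian noise.
n, sigma = 32, 0.3
rng = random.Random(0)
clean = [4.0 * math.cos(math.pi * (i + 0.5) * 2 / n) for i in range(n)]
noisy = [v + rng.gauss(0.0, sigma) for v in clean]
denoised = hard_threshold_denoise(noisy, threshold=3.0 * sigma)
print("MSE noisy:", mse(noisy, clean), "MSE denoised:", mse(denoised, clean))
```

Because the clean signal occupies a single coefficient, almost all of the noise energy sits in coefficients that the threshold removes. A signal that is not sparse in the chosen transform would lose part of its own energy as well.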

The multiresolution transforms can achieve good sparsity for spatially localized details, such as edges and singularities. Because such details are typically abundant in natural images and convey a significant portion of the information embedded therein, these transforms have found a significant application for image denoising. Recently, a number of advanced denoising methods based on multiresolution transforms have been developed, relying on elaborate statistical dependencies between coefficients of typically overcomplete (e.g. translation-invariant and multiply-oriented) transforms. Examples of such image denoising methods can be seen in [1], [2], [3], [4].

多解析度變換可以對空間局部細節（例如邊緣和奇點）實現良好的稀疏性。因為這些細節通常在自然圖像中很豐富，並且傳達了嵌入其中的信息的重要部分，所以這些變換在圖像去噪中得到了重要應用。最近，許多基於多解析度變換的先進去噪方法已經被開發了出來，這些方法依賴於通常過完備（例如，平移不變和多方向）變換的係數之間的精細統計依賴性。在[1]，[2]，[3]，[4]中可以看到這種圖像去噪方法的例子。

Not limited to the wavelet techniques, the overcomplete representations have traditionally played an important role in improving the restoration abilities of even the most basic transform-based methods. This is manifested by the sliding-window transform-domain image denoising methods [5], [6] where the basic idea is to apply shrinkage in local (windowed) transform domain. There, the overlap between successive windows accounts for the overcompleteness, while the transform itself is typically orthogonal, e.g. the 2D DCT.

不僅限於小波技術，過完備表示傳統上在提高即使是最基本的基於變換的方法的恢復能力方面也起著重要作用。這體現在滑動窗口變換域圖像去噪方法[5]，[6]中，其基本思想是在局部（窗口化）變換域中應用收縮。在那裡，連續窗口之間的重疊帶來了過完備性，而變換本身通常是正交的，例如2D DCT。

However, the overcompleteness by itself is not enough to compensate for the ineffective shrinkage if the adopted transform cannot attain a sparse representation of certain image details. For example, the 2D DCT is not effective in representing sharp transitions and singularities, whereas wavelets would typically perform poorly for textures and smooth transitions. The great variety in natural images makes it impossible for any fixed 2D transform to achieve good sparsity for all cases. Thus, the commonly used orthogonal transforms can achieve sparse representations only for particular image patterns.

然而，如果所採用的變換不能獲得某些圖像細節的稀疏表示，則過完備性本身不足以補償收縮的低效。例如，2D DCT在表示銳利過渡和奇點方面無效，而小波通常對紋理和平滑過渡表現不佳。自然圖像的多樣性使得任何固定的2D變換都不可能在所有情況下實現良好的稀疏性。因此，常用的正交變換只能針對特定的圖像模式實現稀疏表示。

The adaptive principal components of local image patches were proposed by Muresan and Parks [7] as a tool to overcome the mentioned drawbacks of standard orthogonal transforms. This approach produces good results for highly-structured image patterns. However, the computation of the correct PCA basis is essentially deteriorated by the presence of noise. With similar intentions, the K-SVD algorithm [8] by Elad and Aharon utilizes highly overcomplete dictionaries obtained via a preliminary training procedure. A shortcoming of these techniques is that both the PCA and learned dictionaries impose a very high computational burden.

Muresan和Parks [7]提出了局部圖像塊的自適應主成分作為克服標準正交變換的上述缺點的工具。這種方法為高度結構化的圖像模式產生了良好的結果。然而，雜訊的存在會嚴重損害正確PCA基的計算。出於類似的意圖，Elad和Aharon的K-SVD演算法[8]利用通過初步訓練程序獲得的高度過完備的字典。這些技術的缺點是PCA和學習得到的字典都會帶來非常高的計算負擔。

Another approach [9] is to exploit a shape-adaptive transform on neighborhoods whose shapes are adaptive to salient image details and thus contain mostly homogeneous signal. The shape-adaptive transform can achieve a very sparse representation of the true signal in these adaptive neighborhoods.

另一種方法[9]是在其形狀適應於顯著的圖像細節，因此主要包含同質信號的鄰域上利用形狀自適應變換。形狀自適應變換可以在這些自適應鄰域中實現真實信號的非常稀疏的表示。

Recently, an elaborate adaptive spatial estimation strategy, the non-local means, was introduced [10]. This approach is different from the transform domain ones. Its basic idea is to build a pointwise estimate of the image where each pixel is obtained as a weighted average of pixels centered at regions that are similar to the region centered at the estimated pixel. The estimates are non-local as in principle the averages can be calculated over all pixels of the image. A significant extension of this approach is the exemplar-based estimator [11], which exploits pairwise hypothesis testing to define adaptive non-local estimation neighborhoods and achieves results competitive to the ones produced by the best transform-based techniques.

最近，一種精細的自適應空間估計策略，即非局部均值（NL-means）方法[10]被提出來了。這種方法與變換域方法不同。其基本思想是建立圖像的逐點估計，其中每個像素由一組像素的加權平均得到，這些像素所在的區域與以被估計像素為中心的區域相似。估計是非局部的，因為原則上可以在圖像的所有像素上計算平均值。這種方法的一個重要擴展是基於樣本的估計器[11]，它利用成對假設檢驗來定義自適應非局部估計鄰域，並取得了可與最好的基於變換的技術相媲美的結果。
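The weighted-average idea behind non-local means can be illustrated on a 1D signal. This is a simplified sketch and not the estimator of [10]: it assumes uniformly weighted patches, an exponential weight `exp(-d2 / h**2)`, and clamped borders. The function name `nl_means_1d` and the parameters `patch_radius` and `h` are hypothetical.

```python
import math

def nl_means_1d(signal, patch_radius=1, h=0.5):
    """Estimate each sample as a weighted average of all samples,
    weighted by the similarity of the patches centered at them."""
    n = len(signal)

    def patch(i):
        # Patch centered at i; indices are clamped at the borders.
        return [signal[min(max(i + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    patches = [patch(i) for i in range(n)]
    estimate = []
    for i in range(n):
        weights = []
        for j in range(n):
            # Mean squared difference between the two patches.
            d2 = sum((a - b) ** 2
                     for a, b in zip(patches[i], patches[j])) / len(patches[i])
            weights.append(math.exp(-d2 / (h * h)))
        total = sum(weights)
        estimate.append(sum(w * s for w, s in zip(weights, signal)) / total)
    return estimate

# A clean step edge: samples on one side barely influence the other side.
step = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
print(nl_means_1d(step))
```

The average runs over every sample of the signal, which is what makes the estimate non-local, while dissimilar patches receive weights close to zero so the edge is not blurred.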

In this paper, we propose a novel image denoising strategy based on an enhanced sparse representation in transform-domain. The enhancement of the sparsity is achieved by grouping similar 2D fragments of the image into 3D data arrays which we call groups. Collaborative filtering is a special procedure developed to deal with these 3D groups. It includes three successive steps: 3D transformation of a group, shrinkage of transform spectrum, and inverse 3D transformation. Thus, we obtain the 3D estimate of the group which consists of an array of jointly filtered 2D fragments. Due to the similarity between the grouped blocks, the transform can achieve a highly sparse representation of the true signal so that the noise can be well separated by shrinkage. In this way, the collaborative filtering reveals even the finest details shared by grouped fragments and at the same time it preserves the essential unique features of each individual fragment.

在本文中，我們提出了一種基於變換域中增強稀疏表示的新型圖像去噪策略。通過將圖像的類似2D片段分組為3D數據陣列（我們稱之為「組」）來實現稀疏性的增強。協同過濾是為處理這些3D組而開發的一種特殊程序。它包括三個連續的步驟：組的3D變換，變換譜的收縮和逆3D變換。藉此，我們獲得由一組聯合濾波的2D片段組成的組的3D估計。由於分組塊之間的相似性，變換可以實現真實信號的高度稀疏表示，從而可以通過收縮很好地分離雜訊。通過這種方式，協同過濾甚至可以顯示分組片段共享的最精細的細節，同時保留每個片段的基本獨特特徵。
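The three steps (transform of the group, shrinkage, inverse transform) can be sketched in a lower-dimensional analogue. To keep the example short it assumes 1D fragments stacked into a 2D group rather than 2D blocks in a 3D group, a separable orthonormal DCT as the transform, and hard thresholding as the shrinkage. These choices and all names are illustrative assumptions, not the algorithm specified later in the paper.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def collaborative_filter(group, threshold):
    """group: list of n equally long fragments. Returns n filtered fragments."""
    # Step 1: transform within each fragment, then across the fragments.
    spectrum = [dct(fragment) for fragment in group]
    spectrum = transpose([dct(col) for col in transpose(spectrum)])
    # Step 2: shrinkage (here: hard thresholding) of the group spectrum.
    spectrum = [[c if abs(c) > threshold else 0.0 for c in row]
                for row in spectrum]
    # Step 3: inverse transform, undoing step 1 in reverse order.
    estimate = transpose([idct(col) for col in transpose(spectrum)])
    return [idct(row) for row in estimate]

def total_sq_error(fragments, clean):
    return sum((v - c) ** 2 for f in fragments for v, c in zip(f, clean))

# Toy group (assumption): four noisy copies of the same smooth fragment.
rng = random.Random(1)
m, n_frag, sigma = 8, 4, 0.5
clean = [5.0 + 3.0 * math.cos(math.pi * (i + 0.5) / m) for i in range(m)]
group = [[v + rng.gauss(0.0, sigma) for v in clean] for _ in range(n_frag)]
estimates = collaborative_filter(group, threshold=3.0 * sigma)
print(total_sq_error(group, clean), total_sq_error(estimates, clean))
```

Since the fragments are similar, the transform across fragments concentrates the shared signal into very few coefficients, and the function still returns one estimate per grouped fragment.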

An image denoising algorithm based on this novel strategy is developed and described in detail. It generalizes and improves our preliminary algorithm introduced in [12]. A very efficient algorithm implementation offering effective complexity/performance trade-off is developed. Experimental results demonstrate that it achieves outstanding denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality, superior to the current state-of-the-art. Extension to color-image denoising based on [13] is also presented.

基於該新策略的圖像去噪演算法已被開發並詳細描述。它推廣並改進了我們在[12]中介紹的初步演算法。我們開發了一種非常高效的演算法實現，提供了有效的複雜度/性能權衡。實驗結果表明，它在峰值信噪比和主觀視覺質量方面都達到了出色的去噪性能，優於目前的最新技術水平。還給出了基於[13]的彩色圖像去噪擴展。

The paper is organized as follows. We introduce the grouping and collaborative filtering concepts in Section II. The developed image denoising algorithm is described in Section III. An efficient and scalable realization of this algorithm can be found in Section IV and its extension to color-image denoising is given in Section V. Experimental results are presented in Section VI. Section VII gives an overall discussion of the developed approach and Section VIII contains relevant conclusions.

本文的結構如下。我們在第II節介紹了分組和協同過濾概念。開發的圖像去噪演算法在第III節中描述。該演算法的高效且可擴展的實現可以在第IV節中找到，其對彩色圖像去噪的擴展在第V節中給出。實驗結果在第VI節中給出。第VII節對已提出的方法進行了全面討論，第VIII節包含了相關結論。

II. GROUPING AND COLLABORATIVE FILTERING

分組和協同過濾

We denominate grouping the concept of collecting similar d-dimensional fragments of a given signal into a d+1-dimensional data structure that we term group. In the case of images for example, the signal fragments can be arbitrary 2D neighborhoods (e.g. image patches or blocks). There, a group is a 3D array formed by stacking together similar image neighborhoods. If the neighborhoods have the same shape and size, the formed 3D array is a generalized cylinder. The importance of grouping is to enable the use of a higher-dimensional filtering of each group, which exploits the potential similarity (correlation, affinity, etc.) between grouped fragments in order to estimate the true signal in each of them. This approach we denominate collaborative filtering.

我們把這樣一種概念稱為分組：將給定信號中相似的d維片段收集到一個d+1維的數據結構中，我們稱該數據結構為「組」。例如，在圖像的情況下，信號片段可以是任意2D鄰域（例如圖像補丁或塊）。在那裡，組是通過將類似的圖像鄰域堆疊在一起而形成的3D陣列。如果鄰域具有相同的形狀和大小，則形成的3D陣列是廣義圓柱體。分組的重要性在於能夠對每個組使用更高維度的過濾，這種過濾利用分組片段之間潛在的相似性（相關性、親和性等）來估計每個片段中的真實信號。這種方法我們稱之為協同過濾。

A. Grouping

分組

Grouping can be realized by various techniques; e.g., K-means clustering [14], self-organizing maps [15], fuzzy clustering [16], vector quantization [17], and others. There exists a vast literature on the topic; we refer the reader to [18] for a detailed and systematic overview of these approaches.

分組可以通過各種技術實現；例如，K-means聚類[14]，自組織映射[15]，模糊聚類[16]，矢量量化[17]等。關於這個主題存在大量文獻；讀者可參閱[18]，其中對這些方法作了詳細而系統的概述。

Similarity between signal fragments is typically computed as the inverse of some distance measure. Hence, a smaller distance implies higher similarity. Various distance measures can be employed, such as the p-norm of the difference between two signal fragments. Other examples are the weighted Euclidean distance (p = 2) used in the non-local means estimator [10], and also the normalized distance used in the exemplar-based estimator [11]. When processing complex or uncertain (e.g. noisy) data it might be necessary to first extract some features from the signal and then to measure the distance for these features only [18].

通常將信號片段之間的相似度計算為某個距離測量的倒數。因此，較小的距離意味著較高的相似性。可以採用各種距離測量，例如兩個信號片段之間差異的p範數。其他例子是在非局部均值估計器[10]中使用的加權歐幾里德距離（p = 2），以及在基於範例的估計器[11]中使用的歸一化距離。當處理複雜或不確定（例如雜訊）數據時，可能需要首先從信號中提取一些特徵，然後僅測量這些特徵的距離[18]。
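Two of the distance measures mentioned above can be written in a few lines. This is a minimal sketch that assumes fragments are flattened into equally long lists of numbers; the function names and the weights are hypothetical and not taken from [10] or [11].

```python
def p_norm_distance(a, b, p=2):
    """p-norm of the difference between two equally long fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def weighted_euclidean_distance(a, b, weights):
    """Euclidean distance (p = 2) with a non-negative weight per element."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("fragments and weights must have the same length")
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5

print(p_norm_distance([0, 0], [3, 4]))        # Euclidean distance
print(p_norm_distance([0, 0], [3, 4], p=1))   # sum of absolute differences
print(weighted_euclidean_distance([0, 0], [3, 4], [1.0, 0.0]))
```

A smaller value means higher similarity, so either function can serve directly as the dissimilarity used for grouping.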

B. Grouping by matching

按匹配分組

Grouping techniques such as vector quantization or K-means clustering are essentially based on the idea of partitioning. It means that they build groups or clusters (classes) which are disjoint, in such a way that each fragment belongs to one and only one group. Constructing disjoint groups whose elements enjoy high mutual similarity typically requires recursive procedures and can be computationally demanding [18]. Furthermore, the partitioning causes unequal treatment of the different fragments because the ones that are close to the centroid of the group are better represented than those far from it. This happens always, even in the special case where all fragments of the signal are equidistantly distributed.

諸如矢量量化或K均值聚類之類的分組技術基本上基於分區的思想。這意味著它們構建不相交的組或集群（類），使得每個片段屬於一個且僅屬於一個組。構造其元素具有高度相互相似性的不相交群通常需要遞歸過程並且可能在計算上要求很高[18]。此外，分割導致不同片段的不均等處理，因為接近該組質心的那些比遠離它的那些更好地表示。這種情況總是發生，即使在信號的所有片段等距分布的特殊情況下也是如此。

A much simpler and effective grouping of mutually similar signal fragments can be realized by matching, where in contrast to the above partitioning methods, the formed groups are not necessarily disjoint. Matching is a method for finding signal fragments similar to a given reference one. That is achieved by pairwise testing the similarity between the reference fragment and candidate fragments located at different spatial locations. The fragments whose distance (i.e. dissimilarity) from the reference one is smaller than a given threshold are considered mutually similar and are subsequently grouped. The similarity plays the role of the membership function for the considered group and the reference fragment can be considered as some sort of centroid for the group. Any signal fragment can be used as a reference one and thus a group can be constructed for it.

通過匹配可以實現更簡單、更有效的相互類似的信號片段的分組，與上述分割方法相反，所形成的組不一定是不相交的。匹配是一種用於查找與給定參考信號片段類似的信號片段的方法。這是通過成對測試參考片段和位於不同空間位置的候選片段之間的相似性來實現的。與參考片段的距離（即，不相似性）小於給定閾值的片段被認為是彼此相似的並且隨後被分組。相似性起到所關注組的隸屬函數的作用，並且參考片段可以被認為是該組的某種質心。任何信號片段都可以用作參考信號片段，因此可以為其構建組。
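Grouping by matching reduces to a thresholded comparison against the reference. The sketch below assumes the mean squared difference as the distance and a strict `<` test against the threshold; `group_by_matching`, `ref_index` and `tau` are hypothetical names.

```python
def mean_squared_distance(a, b):
    """Mean squared difference between two equally long fragments."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def group_by_matching(fragments, ref_index, tau):
    """Return indices of all fragments whose distance to the reference
    fragment is smaller than tau (the reference matches itself if tau > 0)."""
    ref = fragments[ref_index]
    return [i for i, frag in enumerate(fragments)
            if mean_squared_distance(ref, frag) < tau]

# Three fragments; each one is used as the reference in turn.
fragments = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
groups = [group_by_matching(fragments, r, tau=1.5) for r in range(3)]
print(groups)
```

In this toy case the fragment at index 1 belongs to all three groups, which shows that groups built by matching overlap instead of forming a partition.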

We remark that for most distance measures, establishing a bound on the distance between the reference fragment and all of the matched ones means that the distance between any two fragments in that group is also bounded. Roughly speaking, this bound is the diameter of the group. While for an arbitrary distance measure such a statement may not hold precisely, for the case of metrics (e.g., ℓp-norms) it is just a direct consequence of the triangle inequality.

我們注意到，對於大多數距離測量，為參考片段與所有匹配片段之間的距離建立界限，意味著該組中任意兩個片段之間的距離也是有界的。粗略地說，這個界限是該組的直徑。雖然對於任意距離測量，這樣的陳述可能並不嚴格成立，但對於度量（例如ℓp範數）的情況，它只是三角不等式的直接結果。
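For a metric, the remark can be written out in one line. The notation is assumed here for illustration only: $Z_R$ is the reference fragment, $Z_i$ and $Z_j$ are any two matched fragments, $d$ is the metric, and $\tau$ is the matching threshold, so that $d(Z_R, Z_i) < \tau$ and $d(Z_R, Z_j) < \tau$.

```latex
d(Z_i, Z_j) \;\le\; d(Z_i, Z_R) + d(Z_R, Z_j) \;<\; \tau + \tau \;=\; 2\tau
```

Under these assumptions the diameter of the group is at most $2\tau$.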

Block-matching (BM) is a particular matching approach that has been extensively used for motion estimation in video compression (MPEG 1, 2, and 4, and H.26x). As a particular way of grouping, it is used to find similar blocks, which are then stacked together in a 3D array (i.e. a group). An illustrative example of grouping by block-matching for images is given in Figure 1, where we show a few reference blocks and the ones matched as similar to them.

塊匹配（BM）是一種特殊的匹配方法，已廣泛用於視頻壓縮（MPEG 1、2和4以及H.26x）中的運動估計。作為分組的特定方式，它用於找到類似的塊，然後將它們堆疊在一起形成3D陣列（即組）。圖1給出了通過塊匹配對圖像進行分組的說明性示例，其中我們示出了一些參考塊以及與它們匹配的相似塊。
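Block-matching on an image can be sketched as an exhaustive search that stacks the matched blocks into a 3D array. The following is a minimal illustration under several assumptions that are not stated in this section: the image is a list of rows, the search covers the whole image, the distance is the mean squared difference, and at most `max_blocks` closest blocks are kept. All names are hypothetical.

```python
def extract_block(image, top, left, size):
    """Square block of the given size with its top-left corner at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def block_distance(a, b):
    """Mean squared difference between two equally sized blocks."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2
               for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def block_match(image, ref_pos, size, tau, max_blocks=16):
    """Find blocks similar to the reference block and stack them.

    Returns (positions, group), where group[k] is the 2D block located at
    positions[k], so group is a 3D array stored as nested lists."""
    height, width = len(image), len(image[0])
    ref = extract_block(image, ref_pos[0], ref_pos[1], size)
    candidates = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            d = block_distance(ref, extract_block(image, top, left, size))
            if d < tau:
                candidates.append((d, (top, left)))
    candidates.sort()  # closest blocks first
    positions = [pos for _, pos in candidates[:max_blocks]]
    group = [extract_block(image, t, l, size) for t, l in positions]
    return positions, group

# Toy image (assumption): vertical stripes, so many blocks repeat exactly.
image = [[10.0 * (c % 2) for c in range(6)] for r in range(6)]
positions, group = block_match(image, ref_pos=(0, 0), size=2, tau=1.0)
print(len(positions), "blocks matched; group shape:",
      (len(group), len(group[0]), len(group[0][0])))
```

The returned positions matter as much as the blocks themselves, because the filtered blocks are later returned to their original locations.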

Fig. 1. Illustration of grouping blocks from noisy natural images corrupted by white Gaussian noise with standard deviation 15 and zero mean. Each fragment shows a reference block marked with "R" and a few of the blocks matched to it.

圖1.對含雜訊自然圖像中的塊進行分組的圖示，其中雜訊為標準偏差15、零均值的高斯白雜訊。每個片段顯示標有「R」的參考塊和與其匹配的一些塊。

C. Collaborative filtering

協同過濾

Given a group of n fragments, the collaborative filtering of the group produces n estimates, one for each of the grouped fragments. In general, these estimates can be different. The term collaborative is taken literally, in the sense that each grouped fragment collaborates for the filtering of all others, and vice versa.

給定一組n個片段，該組的協同過濾產生n個估計，每個分好組的片段對應一個。通常，這些估計可能不同。術語「協同」是從字面上理解的，即每個分好組的片段都協同參與對所有其他片段的過濾，反之亦然。

Let us consider an illustrative example of collaborative filtering for the estimation of the image in Figure 2 from an observation (not shown) corrupted by additive zero-mean independent noise. In particular, let us focus on the already grouped blocks shown in the same figure. These blocks exhibit perfect mutual similarity, which makes the elementwise averaging (i.e. averaging between pixels at the same relative positions) a suitable estimator. Hence, for each group, this collaborative averaging produces estimates of all grouped blocks. Because the corresponding noise-free blocks are assumed to be identical, the estimates are unbiased. Therefore, the final estimation error is due only to the residual variance which is inversely proportional to the number of blocks in the group. Regardless of how complex the signal fragments are, we can obtain very good estimates provided that the groups contain a large number of fragments.

讓我們考慮一個協同過濾的說明性示例：從被加性零均值獨立雜訊破壞的觀測（未示出）中估計圖2中的圖像。特別的，讓我們關注同一圖中所示的已經分組的塊。這些塊表現出完美的相互相似性，這使得逐元素平均（即，在相同相對位置的像素之間的平均）成為合適的估計器。因此，對於每個組，這種協同平均產生所有分組塊的估計。因為假設相應的無雜訊塊是相同的，所以估計是無偏的。因此，最終估計誤差僅歸因於殘差方差，而殘差方差與組中的塊數成反比。無論信號片段有多複雜，只要這些組包含大量片段，我們都可以獲得非常好的估計。
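The claim that the residual variance is inversely proportional to the number of blocks can be checked with a small simulation. This sketch assumes flattened blocks, identical noise-free blocks, and independent Gaussian noise of standard deviation `sigma`; the function names and the toy block values are hypothetical.

```python
import random

def collaborative_average(group):
    """Elementwise average of the grouped blocks; every block in the
    group receives the same averaged estimate."""
    n = len(group)
    mean_block = [sum(pixels) / n for pixels in zip(*group)]
    return [list(mean_block) for _ in range(n)]

def empirical_error_variance(n_blocks, sigma, trials=2000, seed=0):
    """Average squared error of the estimate over many noise realizations."""
    rng = random.Random(seed)
    clean = [1.0, 4.0, 2.0, 8.0]  # one flattened noise-free block
    sq_err, count = 0.0, 0
    for _ in range(trials):
        group = [[v + rng.gauss(0.0, sigma) for v in clean]
                 for _ in range(n_blocks)]
        estimate = collaborative_average(group)[0]
        sq_err += sum((e - v) ** 2 for e, v in zip(estimate, clean))
        count += len(clean)
    return sq_err / count

sigma = 2.0
for n_blocks in (1, 4, 16):
    print(n_blocks, empirical_error_variance(n_blocks, sigma),
          sigma ** 2 / n_blocks)
```

Each printed row pairs the measured error variance with the predicted value `sigma**2 / n_blocks`; the two should agree up to sampling fluctuation.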

However, perfectly identical blocks are unlikely in natural images. If non-identical fragments are allowed within the same group, the estimates obtained by elementwise averaging become biased. The bias error can account for the largest share of the overall final error in the estimates, unless one uses an estimator that allows for producing a different estimate of each grouped fragment. Therefore, a more effective collaborative filtering strategy than averaging should be employed.

但是，在自然圖像中不太可能存在完全相同的塊。如果在同一組內允許不相同的片段，則通過逐元素平均獲得的估計會變得有偏。除非使用允許對每個分組片段產生不同估計的估計器，否則偏差誤差可能占估計中總體最終誤差的最大份額。因此，應採用比平均更有效的協同過濾策略。
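The bias of elementwise averaging can be made explicit for a single pixel position. The notation is assumed here for illustration: $y_j$ is the noise-free value in block $j$, $z_j = y_j + \eta_j$ is its observation with independent zero-mean noise of variance $\sigma^2$, and $\hat{y}_i$ is the averaged estimate assigned to block $i$.

```latex
\hat{y}_i = \frac{1}{n}\sum_{j=1}^{n} z_j, \qquad
\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j

\mathbb{E}\!\left[\hat{y}_i\right] - y_i = \bar{y} - y_i

\mathbb{E}\!\left[(\hat{y}_i - y_i)^2\right] = (\bar{y} - y_i)^2 + \frac{\sigma^2}{n}
```

Increasing $n$ shrinks only the variance term $\sigma^2/n$. The squared bias $(\bar{y} - y_i)^2$ vanishes only when the blocks are identical, which is why it can dominate the error for large groups of merely similar blocks.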