Project: Video Denoising and Enhancement via Dynamic Sparse + Low-Rank Matrix Decomposition

Main idea

Video denoising refers to the problem of removing “noise” from a video sequence. Here the term “noise” is used in a broad sense to refer to any corruption, outlier, or interference that is not the quantity of interest. In this work, we develop a novel approach to video denoising based on the idea that many noisy or corrupted videos can be split into three parts: the “low-rank layer”, the “sparse layer”, and everything else (which is small and bounded).

Our proposed algorithm consists of two parts. At each time instant, it first separates the video frame into “noisy” versions of the two layers, $\ell_t$ and $s_t$. It then applies an existing state-of-the-art denoising algorithm, VBM3D [1], to each layer. Applied in this way, VBM3D exploits the specific characteristics of each layer and is therefore able to find more matched blocks to filter over, which results in better denoising performance.

Example applications

Why Low-Rank + Sparse? Here are some examples:

  • In very low-light videos of moving targets/objects (where the moving target is barely visible), the denoising goal is to “see” the barely visible moving targets (sparse). These are hard to see because they are corrupted by slowly changing background images (well modeled as forming the low-rank layer plus the residual). See the following dark video example:

  • In a traditional denoising scenario, consider slowly changing videos that are corrupted by salt-and-pepper noise (or other impulsive noise). For this type of video, the large-magnitude part of the noise forms the “sparse layer”, while the video of interest (slowly changing in many applications, e.g., a waterfall) forms the approximate “low-rank layer”. The error in the low-rank approximation forms the “small bounded residual”. See the following waterfall-salt-pepper video for an example of this:

  • More generally, consider slowly changing videos corrupted by white Gaussian noise with very large variance. As we explain in the paper, large Gaussian noise can, with high probability, be split into a very sparse noise component plus bounded noise. See the following waterfall-large-Gaussian video for an example of this:
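The last point can be checked numerically. The sketch below is only an illustration, not the construction used in the paper: the threshold `b` (set here to 2.5 times the noise standard deviation) and the helper name `split_gaussian_noise` are assumptions made for this example. Entries larger than `b` in magnitude go to the sparse part, and everything else stays in the bounded part.

```python
import numpy as np

def split_gaussian_noise(noise, b):
    """Split noise into a sparse part (|value| > b) and a bounded part (|value| <= b)."""
    large = np.abs(noise) > b
    sparse_part = np.where(large, noise, 0.0)
    bounded_part = noise - sparse_part
    return sparse_part, bounded_part

rng = np.random.default_rng(0)
sigma = 70.0                      # large noise level, as in the experiments
noise = rng.normal(0.0, sigma, size=100_000)
b = 2.5 * sigma                   # assumed threshold, chosen for illustration

sparse_part, bounded_part = split_gaussian_noise(noise, b)
fraction_large = np.count_nonzero(sparse_part) / noise.size

print(f"fraction of entries in the sparse part: {fraction_large:.4f}")
print(f"largest magnitude in the bounded part: {np.abs(bounded_part).max():.2f} (b = {b})")
```

With this threshold only about 1% of the entries are expected to land in the sparse part, since a Gaussian variable exceeds 2.5 standard deviations in magnitude with probability of roughly 0.012. A larger `b` makes the sparse part sparser at the cost of a looser bound on the rest.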

Problem formulation

Let $m_t$ denote the image at time $t$ arranged as a 1D vector. We consider denoising for videos in which each image can be split as $m_t = \ell_t + s_t + w_t$, where $s_t$ is a sparse vector, the $\ell_t$'s lie in a fixed or slowly changing low-dimensional subspace so that the matrix $L := [\ell_1, \ell_2, \ldots, \ell_{t_{\max}}]$ is low-rank, and $w_t$ is the residual noise, which satisfies $\|w_t\|_{\infty} \leq b_w$.
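To make the model concrete, the following sketch generates a small synthetic sequence that satisfies the three assumptions. All sizes and magnitudes (`n`, `t_max`, `r`, `b_w`, the number of sparse entries per frame) are hypothetical values picked for illustration and do not come from the experiments on this page.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t_max, r = 200, 60, 3     # pixels per frame, number of frames, subspace dimension (assumed)
b_w = 0.05                   # bound on the residual noise (assumed)
k = 5                        # nonzero entries of s_t per frame (assumed)

# Low-rank layer: every ell_t lies in a fixed r-dimensional subspace spanned by P.
P, _ = np.linalg.qr(rng.standard_normal((n, r)))
L = P @ rng.standard_normal((r, t_max))

# Sparse layer: a few large entries per frame, at random locations.
S = np.zeros((n, t_max))
for t in range(t_max):
    support = rng.choice(n, size=k, replace=False)
    S[support, t] = rng.uniform(5.0, 10.0, size=k)

# Residual: small and bounded in the infinity norm by b_w.
W = rng.uniform(-b_w, b_w, size=(n, t_max))

M = L + S + W                # column t is the observed frame m_t

print("rank of L:", np.linalg.matrix_rank(L))
print("nonzeros in s_t for the first frame:", np.count_nonzero(S[:, 0]))
print("largest |w_t| entry:", np.abs(W).max())
```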

Overall algorithm – ReProCS-based Layering Denoising (ReLD)

  • For $t < t_0$, initialize using PCP [2].

  • For all $t > t_0$, run an appropriately modified ReProCS algorithm [3]:

    • Split the video frame $m_t$ into layers $\hat{\ell}_t$ and $\hat{s}_t$.

    • Every $\alpha$ frames, perform a subspace update, i.e., update $\hat{P}_t$.

  • Denoise each layer using VBM3D [1].
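The layering steps above can be sketched in a heavily simplified form. This is not the modified ReProCS algorithm of [3]; it is a toy version under several assumptions that are ours: a plain SVD of the first frames stands in for the PCP initialization, the support of $s_t$ is found by thresholding the projected frame instead of solving an $\ell_1$ minimization, and the subspace update is a plain SVD of the most recent low-rank estimates. The function names and the parameter `thresh` are hypothetical. The final VBM3D step is omitted because it relies on an external implementation.

```python
import numpy as np

def split_frame(m, P_hat, thresh):
    """Split one frame into (ell_hat, s_hat) given a subspace estimate P_hat."""
    n = m.size
    Phi = np.eye(n) - P_hat @ P_hat.T        # projector onto the complement of span(P_hat)
    y = Phi @ m                              # nearly cancels the low-rank layer
    support = np.flatnonzero(np.abs(y) > thresh)
    s_hat = np.zeros(n)
    if support.size:
        # Least squares restricted to the estimated support of s_t.
        s_hat[support] = np.linalg.lstsq(Phi[:, support], y, rcond=None)[0]
    return m - s_hat, s_hat

def layer_video(M, t0, r, alpha, thresh):
    """Layer the columns of M (one frame per column) for t >= t0."""
    n, t_max = M.shape
    # Assumed stand-in for PCP: SVD of the first t0 frames, treated as purely low-rank.
    U, _, _ = np.linalg.svd(M[:, :t0], full_matrices=False)
    P_hat = U[:, :r]
    L_hat, S_hat = np.zeros((n, t_max)), np.zeros((n, t_max))
    L_hat[:, :t0] = M[:, :t0]
    for t in range(t0, t_max):
        L_hat[:, t], S_hat[:, t] = split_frame(M[:, t], P_hat, thresh)
        if (t - t0 + 1) % alpha == 0:
            # Subspace update from the last alpha low-rank estimates.
            recent = L_hat[:, t - alpha + 1 : t + 1]
            U, _, _ = np.linalg.svd(recent, full_matrices=False)
            P_hat = U[:, :r]
    return L_hat, S_hat

# Toy data: rank-2 background, three large sparse entries per frame after t0.
rng = np.random.default_rng(2)
n, t_max, t0, r, alpha = 100, 80, 20, 2, 20
P, _ = np.linalg.qr(rng.standard_normal((n, r)))
L = P @ rng.standard_normal((r, t_max))
S = np.zeros((n, t_max))
for t in range(t0, t_max):
    S[rng.choice(n, size=3, replace=False), t] = 10.0
M = L + S + rng.uniform(-0.01, 0.01, size=(n, t_max))

L_hat, S_hat = layer_video(M, t0, r, alpha, thresh=5.0)
print("largest error in the sparse layer:", np.abs(S_hat[:, t0:] - S[:, t0:]).max())
```

In this toy setting the sparse entries are much larger than the projected background, so simple thresholding finds the support. Real video would need the more careful sparse recovery step described in [3].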

Experiments

We compare the denoising performance of ReLD with that of VBM3D [1], SLMA [4], and MLP [5] on several datasets. The results are summarized in the following table; each entry gives the PSNR followed, in parentheses, by the running time in seconds:

Dataset    $\sigma$  ReLD            VBM3D           MLP               SLMA
Waterfall  25        35.00 (73.54)   32.02 (24.83)   28.26 (477.22)    *
Waterfall  30        34.51 (73.33)   30.96 (23.96)   26.96 (474.26)    *
Waterfall  50        33.08 (73.14)   27.99 (24.14)   18.87 (477.60)    *
Waterfall  70        29.25 (69.77)   24.42 (21.01)   15.03 (478.73)    *
Escalator  25        31.01 (16.64)   30.32 (5.34)    25.53 (107.51)    21.17 ($3.09 \times 10^4$)
Escalator  30        30.27 (16.45)   29.29 (5.38)    24.54 (108.65)    20.49 ($3.15 \times 10^4$)
Escalator  50        27.84 (16.03)   25.10 (5.27)    18.83 (109.40)    17.98 ($3.21 \times 10^4$)
Escalator  70        25.15 (15.28)   20.20 (4.72)    15.20 (108.78)    15.90 ($3.18 \times 10^4$)
Fountain   25        32.67 (16.70)   31.18 (5.44)    26.86 (105.64)    22.93 ($3.05 \times 10^4$)
Fountain   30        32.25 (15.84)   30.26 (5.17)    25.67 (107.41)    21.85 ($3.06 \times 10^4$)
Fountain   50        30.53 (15.82)   26.55 (5.24)    18.53 (109.79)    18.55 ($3.13 \times 10^4$)
Fountain   70        27.53 (15.03)   22.08 (4.69)    14.85 (107.52)    16.25 ($3.19 \times 10^4$)
Curtain    25        35.47 (16.78)   34.60 (4.15)    31.14 (189.14)    23.28 ($7.75 \times 10^4$)
Curtain    30        34.58 (17.35)   33.59 (4.37)    28.90 (191.14)    22.74 ($9.05 \times 10^4$)
Curtain    50        31.91 (17.17)   30.29 (4.42)    18.58 (188.30)    19.12 ($7.86 \times 10^4$)
Curtain    70        28.10 (16.50)   26.15 (3.85)    14.73 (192.00)    16.68 ($8.30 \times 10^4$)
Lobby      25        39.78 (57.96)   35.00 (19.57)   29.22 (384.11)    23.43 ($3.75 \times 10^5$)
Lobby      30        38.76 (57.99)   33.64 (19.09)   27.72 (395.67)    21.15 ($3.82 \times 10^5$)
Lobby      50        35.15 (58.41)   29.23 (19.35)   18.66 (403.59)    18.21 ($3.99 \times 10^5$)
Lobby      70        29.68 (56.51)   24.90 (17.00)   14.85 (401.29)    16.82 ($4.09 \times 10^5$)

*: The Waterfall dataset is a long sequence; with the code provided by the authors of SLMA, we were unable to obtain any results because it ran extremely slowly.

As can be seen from the table, ReLD achieves the highest PSNR in every case. In running time, it is roughly three to four times slower than VBM3D, but considerably faster than MLP and SLMA.
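For reference, PSNR is conventionally computed as $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$ and reported in dB. The sketch below shows that standard definition; the peak value of 255 assumes 8-bit intensities, which is our assumption and is not stated in the experimental setup, and the random clip is a hypothetical stand-in for real video.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit intensities."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")      # identical inputs
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(3)
clean = rng.uniform(0, 255, size=(10, 64, 64))           # hypothetical 10-frame clip
noisy = clean + rng.normal(0, 25.0, size=clean.shape)    # sigma = 25, values not clipped

print(f"PSNR of the noisy clip: {psnr(clean, noisy):.2f} dB")
```

For unclipped Gaussian noise with standard deviation 25, the result should be close to $10 \log_{10}(255^2 / 25^2) \approx 20.2$ dB, which gives a rough baseline for reading the $\sigma = 25$ rows of the table.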

Next, we visually compare the denoising performance. Since ReLD and VBM3D are the two best of the above algorithms, we show results only for these two for ease of display. The added noise is i.i.d. Gaussian with $\sigma = 70$.

  • Comparison on Waterfall Dataset:


  • Comparison on Escalator Dataset:


  • Comparison on Fountain Dataset:


  • Comparison on Curtain Dataset:


  • Comparison on Lobby Dataset:


Note that denoising at noise level $\sigma = 70$ is a difficult task, and both ReLD and VBM3D inevitably introduce some blurring. However, as can be seen from the above videos, ReLD preserves more details than VBM3D, e.g., the white board in the Curtain dataset and the bookshelf in the Lobby dataset.

Demo code

Click here.

References

[1] Kostadin Dabov, Alessandro Foi, and Karen Egiazarian, “Video denoising by sparse 3D transform-domain collaborative filtering,” 2007.
[2] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright, “Robust Principal Component Analysis?,” Journal of the ACM, 2011.
[3] Han Guo, Chenlu Qiu, and Namrata Vaswani, “An Online Algorithm for Separating Sparse and Low-dimensional Signal Sequences from Their Sum,” IEEE Trans. on Sig. Proc., 2014.
[4] Hui Ji, Sibin Huang, Zuowei Shen, and Yuhong Xu, “Robust Video Restoration by Joint Sparse and Low Rank Matrix Approximation,” SIAM Journal on Imaging Sciences, 2011.
[5] Harold C Burger, Christian J Schuler, and Stefan Harmeling, “Image Denoising: Can Plain Neural Networks Compete with BM3D?,” CVPR 2012.