Total variation (TV) denoising, also known as TV regularization or TV filtering, is a powerful technique widely used in many fields, including medical imaging and computer vision. It removes noise while preserving the important structural features. The first image of a black hole, captured by the Event Horizon Telescope (EHT), was processed and revealed with this kind of technique in 2019. The concept was proposed in 1992 by Rudin, Osher and Fatemi and is known as the ROF model, which is a continuous formulation of the TV denoising problem.

What is Total Variation

The term total variation refers to a mathematical concept that is a little hard for me to understand rigorously. Nonrigorously, the TV of a function $u(x)$ is the integral of its derivative magnitude over its bounded domain $\Omega$:

\begin{equation} \begin{split} \|u\|_{TV} = \int_{\Omega} |\nabla u(x)| dx \end{split} \end{equation}

For practical use, a discretized version is more favorable. There are two common discretized TVs in the literature: the isotropic and the anisotropic TV. Suppose that $u(\mathcal{X})$ is a function of an order-$N$ tensor grid $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$; the isotropic TV is defined as:

\begin{equation} \begin{split} \|u\|_{TV} = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} \sqrt{\left(\nabla_{I_1} u \right)_{i_1 i_2 \cdots i_N}^2 + \cdots + \left(\nabla_{I_N} u \right)_{i_1 i_2 \cdots i_N}^2} \end{split} \end{equation}

where $\nabla_{I_k} u$ denotes the derivative along the $I_k$ dimension. The isotropic TV is invariant to rotation of the domain; that is, if you rotate the image arbitrarily, its isotropic TV does not change.

The anisotropic TV is defined as:

\begin{equation} \begin{split} \|u\|_{TV} = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} \left|\left(\nabla_{I_1} u \right)_{i_1 i_2 \cdots i_N}\right| + \cdots + \left|\left(\nabla_{I_N} u \right)_{i_1 i_2 \cdots i_N}\right| \end{split} \end{equation}

which is not rotation invariant. There is no difference between the isotropic and anisotropic TVs for 1D signals.
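To make the two definitions concrete, here is a small numpy sketch (an illustrative helper of my own, not from any library) that evaluates both discrete TVs of a 2D array, using forward differences with replicated boundary samples so that the last difference along each dimension is zero:

```python
import numpy as np

def tv_2d(U, isotropic=True):
    """Discrete total variation of a 2D array U (forward differences)."""
    dI = np.diff(U, axis=0, append=U[-1:, :])   # derivative along the first dimension
    dJ = np.diff(U, axis=1, append=U[:, -1:])   # derivative along the second dimension
    if isotropic:
        return np.sum(np.sqrt(dI**2 + dJ**2))   # Equation (2)
    return np.sum(np.abs(dI) + np.abs(dJ))      # Equation (3)
```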

Discrete Derivatives

There are many ways to define derivatives for discretized signals. For illustration, I will use a 1D signal $u(x)$ (and its discrete form $u_i,\ i=0,1,2,\cdots,N-1$) as an example. Let $\nabla_{x} u$ denote the derivative of $u(x)$ in the x direction, $(\nabla_{x}^+ u)_i = u_{i+1} - u_i$ denote the forward difference, and $(\nabla_{x}^- u)_i = u_{i} - u_{i-1}$ denote the backward difference. A few definitions of the discrete derivative are:

  • one-sided difference: $(\nabla_{x}u)^2_{i} = (\nabla_{x}^+ u)^2_i$
  • central difference: $(\nabla_{x}u)^2_{i} = \left(\left((\nabla_{x}^+ u)_i + (\nabla_{x}^- u)_i\right)/2\right)^2$
  • geometric average: $(\nabla_{x}u)^2_{i} = \left((\nabla_{x}^+ u)^2_i + (\nabla_{x}^- u)^2_i\right)/2$
  • minmod difference: $(\nabla_{x}u)^2_{i} = m\left((\nabla_{x}^+ u)_i, (\nabla_{x}^- u)_i\right)^2$, where $m(a,b)=\left(\frac{\operatorname{sign}(a) + \operatorname{sign}(b)}{2}\right)\min(|a|, |b|)$
  • upwind discretization: $(\nabla_{x}u)^2_{i} = \left(\max((\nabla_{x}^+ u)_i, 0)^2+\max((\nabla_{x}^- u)_i, 0)^2\right)/2$

The one-sided (forward) difference may be the most common way to discretize the derivative, but it is not symmetric. Although the central difference is symmetric, it cannot reveal thin and small structures (consider a signal that is one at a single point and zero elsewhere: the central difference at that point is zero, which is counter-intuitive). The geometric average, minmod difference and upwind discretization are all symmetric and able to reveal thin structures, at the cost of nonlinearity. I will use the one-sided difference in the following content.
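As a rough illustration, here is a numpy sketch of a few of these definitions on a 1D signal (my own helper; boundary samples are simply replicated, which is one of several reasonable choices):

```python
import numpy as np

def squared_derivatives(u):
    """Several discretizations of (grad_x u)_i^2 for a 1D signal u."""
    fwd = np.diff(u, append=u[-1])    # forward difference  (grad^+ u)_i = u_{i+1} - u_i
    bwd = np.diff(u, prepend=u[0])    # backward difference (grad^- u)_i = u_i - u_{i-1}
    one_sided = fwd**2
    central   = ((fwd + bwd) / 2.0) ** 2
    geometric = (fwd**2 + bwd**2) / 2.0
    minmod    = ((np.sign(fwd) + np.sign(bwd)) / 2.0
                 * np.minimum(np.abs(fwd), np.abs(bwd))) ** 2
    return one_sided, central, geometric, minmod
```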

Since we are dealing with finite signals, special handling is needed at the boundaries. I have seen two common ways to do this in the literature: reflective extension and circulant extension. The reflective extension assumes that the signal extends in a mirrored way, and the circulant extension assumes that the signal extends periodically.

For the reflective extension, the matricized differential operator and its adjoint operator are:

\begin{equation} \begin{split} \mathbf{D} &= \begin{bmatrix} -1&1&&&\\ &-1&1&&\\ & &\ddots&\ddots&\\ & & &-1&1\\ &&&&0 \end{bmatrix}\\ \mathbf{D}^{\ast} &= - \begin{bmatrix} 1&&&&\\ -1&1&&&\\ &\ddots&\ddots&&\\ & &-1&1&\\ &&&-1&0 \end{bmatrix} = \mathbf{D}^T \end{split} \end{equation}

For the circulant extension, the matricized differential operator and its adjoint operator are:

\begin{equation} \begin{split} \mathbf{D} &= \begin{bmatrix} -1&1&&&\\ &-1&1&&\\ & &\ddots&\ddots&\\ & & &-1&1\\ 1&&&&-1 \end{bmatrix}\\ \mathbf{D}^{\ast} &= - \begin{bmatrix} 1&&&&-1\\ -1&1&&&\\ &\ddots&\ddots&&\\ & &-1&1&\\ &&&-1&1 \end{bmatrix} = \mathbf{D}^T \end{split} \end{equation}

As we've seen, the adjoint of the forward-difference operator is the negative backward difference. In practice, we don't need to construct explicit matrices for these operators as long as we know how to perform the corresponding operations.
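For example, here is a matrix-free numpy sketch of $\mathbf{D}$ and $\mathbf{D}^T$ for both extensions (my own helpers; the always-zero last row of the reflective $\mathbf{D}$ is dropped, so it maps $\mathbb{R}^N$ to $\mathbb{R}^{N-1}$):

```python
import numpy as np

def D_reflective(u):                 # forward difference
    return np.diff(u)

def Dt_reflective(z):                # adjoint: negative backward difference, zero boundaries
    return -np.diff(z, prepend=0.0, append=0.0)

def D_circulant(u):                  # forward difference with wrap-around
    return np.roll(u, -1) - u

def Dt_circulant(z):                 # adjoint: negative backward difference with wrap-around
    return np.roll(z, 1) - z

# adjointness check: <D u, z> should equal <u, D^T z>
u, z = np.random.randn(8), np.random.randn(7)
assert np.isclose(D_reflective(u) @ z, u @ Dt_reflective(z))
```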

Total Variation Denoising

A general TV denoising problem minimizes the following objective function:

\begin{equation} \begin{split} \argmin_{u} \|u\|_{TV} + \frac{\lambda}{2} \|A(u) - v\|^2 \end{split} \end{equation}

where $u, v$ are functions on a given tensor grid $\mathcal{X}$, $A$ is a linear operator, and $\lambda$ balances data fidelity against the TV term (a smaller $\lambda$ yields stronger smoothing).

1D TV

Let’s make Equation (6) more concise for the 1D case:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^N} \|\mathbf{D} \mathbf{u} \|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

where $\mathbf{A} \in \mathbb{R}^{M \times N}$ and $\mathbf{D}$ is the differential operator defined above.

Iterative Clipping

The iterative clipping method solves the 1D TV denoising problem by solving its dual form.

A useful fact is that the absolute value $|x|$ can be written as an optimization problem, and likewise the $l_1$ norm:

\begin{equation} \begin{split} |x| &= \max_{|z| \le 1} zx \\ \|\mathbf{x}\|_1 &= \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{x} \end{split} \end{equation}

where $|\mathbf{z}| \le 1$ means that the absolute value of each element of $\mathbf{z}$ is less than or equal to 1.

Let $F(\mathbf{u}) = \|\mathbf{D} \mathbf{u} \|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2$; the objective function can then be written as:

\begin{equation} \begin{split} F(\mathbf{u}) &= \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

thus Equation (7) can be written as:

\begin{equation} \begin{split} \arg \min_{\mathbf{u}} \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2}\|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

The minimax theorem states that if the function $f(x, y)$ is concave in $x$ and convex in $y$, then the following equality holds:

\begin{equation} \begin{split} \max_{x} \min_{y} f(x, y) = \min_{y} \max_{x} f(x, y) \end{split} \end{equation}

so we can swap the order of the min and max in Equation (10):

\begin{equation} \begin{split} \arg \max_{|\mathbf{z}| \le 1} \min_{\mathbf{u}} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2}\|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

which is a dual form of the original problem.

The solution of the inner minimization problem is:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\left(\mathbf{A}^T\mathbf{v} - \frac{1}{\lambda}\mathbf{D}^T\mathbf{z}_k\right) \end{split} \end{equation}

Substituting Equation (13) back into Equation (12) gives:

\begin{equation} \begin{split} \arg \min_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\mathbf{z} - 2\lambda \left(\mathbf{D}\mathbf{A}^{\dag}\mathbf{v}\right)^T\mathbf{z} \end{split} \end{equation}

There are many ways to solve this quadratic problem; one is the majorization-minimization (MM) method. Given a function $F(x)$, the MM method chooses an auxiliary function $G_k(x)$ such that $G_k(x) \ge F(x)$ and $G_k(x_k) = F(x_k)$, then solves $x_{k+1}=\argmin_{x} G_k(x)$. The sequence $x_k$ converges to the minimizer of $F(x)$ when $F(x)$ is convex.

We construct such a function $G_k(\mathbf{z})$ by adding $\left(\mathbf{z} - \mathbf{z}_k\right)^T\left(\alpha\mathbf{I} - \mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)(\mathbf{z} - \mathbf{z}_k)$ to Equation (14):

\begin{equation} \begin{split} G_k(\mathbf{z}) &= \mathbf{z}^T\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\mathbf{z} - 2\lambda \left(\mathbf{D}\mathbf{A}^{\dag}\mathbf{v}\right)^T\mathbf{z}\\ &+\left(\mathbf{z} - \mathbf{z}_k\right)^T\left(\alpha\mathbf{I} - \mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)(\mathbf{z} - \mathbf{z}_k) \end{split} \end{equation}

where $\alpha \ge \lambda_0\!\left(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)$, with $\lambda_0(\cdot)$ denoting the maximum eigenvalue, so that the added term is always positive semidefinite.

Simplifying Equation (15) gives:

\begin{equation} \begin{split} \arg \min_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{z} - 2\left(\mathbf{z}_k + \frac{\lambda}{\alpha} \mathbf{D} \mathbf{u}_{k+1}\right)^T\mathbf{z} \end{split} \end{equation}

which is a separable quadratic optimization problem in each element of $\mathbf{z}$. Its solution is:

\begin{equation} \begin{split} \mathbf{z}_{k+1} &= \operatorname{clip}\left(\mathbf{z}_k + \frac{\lambda}{\alpha} \mathbf{D}\mathbf{u}_{k+1},\ 1\right) \end{split} \end{equation}

where $\operatorname{clip}(x, T)$ is a function clipping the input $x$:

\begin{equation} \begin{split} \operatorname{clip}(x, T) = \begin{cases} x & |x| \le T\\ \operatorname{sign}(x)\, T & \text{otherwise} \end{cases} \end{split} \end{equation}

We can scale $\mathbf{z}$ by $\frac{1}{\lambda}$; the iterative clipping method then iterates as follows:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\left(\mathbf{A}^T\mathbf{v} - \mathbf{D}^T\mathbf{z}_k\right)\\ \mathbf{z}_{k+1} &= \operatorname{clip}\left(\mathbf{z}_k + \frac{1}{\alpha} \mathbf{D}\mathbf{u}_{k+1},\ \frac{1}{\lambda}\right) \end{split} \end{equation}

where $\alpha \ge \lambda_0\!\left(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)$.

It turns out that the method can be accelerated with a smaller $\alpha$ value by the contraction mapping principle (roughly, a contraction has a unique fixed point $f(x)=x$ to which the iteration converges). To make $\mathbf{z}_k + \frac{1}{\alpha} \mathbf{D}\mathbf{u}_{k+1}$ a contraction, we need $\mathbf{I} - \frac{1}{\alpha}\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T$ to be a contraction, which suggests $\alpha \gt \lambda_0\!\left(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)/2$, half of the original value.

If $\mathbf{A} = \mathbf{I}$, then Equation (7) becomes the plain TV denoising problem. It turns out that $\lambda_0(\mathbf{D}\mathbf{D}^T)$ is less than 4 regardless of $N$ for both definitions in Equations (4) and (5), thus $\alpha = 2.3$ is an appropriate choice for $\mathbf{A} = \mathbf{I}$.
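As a concrete illustration, here is a minimal numpy sketch of the iteration in Equation (19) for $\mathbf{A} = \mathbf{I}$, using the matrix-free reflective-extension operators from above (function and variable names are my own):

```python
import numpy as np

def tv_denoise_1d_clip(v, lam, n_iter=100, alpha=2.3):
    """1D TV denoising with A = I by iterative clipping (Equation (19))."""
    D  = lambda u: np.diff(u)                             # forward difference
    Dt = lambda z: -np.diff(z, prepend=0.0, append=0.0)   # its adjoint
    u = v.copy()
    z = np.zeros(len(v) - 1)
    for _ in range(n_iter):
        u = v - Dt(z)                                     # u_{k+1} = v - D^T z_k  (A = I)
        z = np.clip(z + D(u) / alpha, -1.0 / lam, 1.0 / lam)  # clip to [-1/lambda, 1/lambda]
    return u
```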

Majorization Minimization

We could also derive an algorithm with the MM method directly. Given $F(x) = |x|$ for scalar $x$, a majorizer is $G(x) = \frac{1}{2|x_k|}x^2 + \frac{1}{2}|x_k|$, which satisfies $G(x) \ge F(x)$ and $G(x_k) = F(x_k)$. For the $l_1$ norm, $G(\mathbf{x})$ is:

\begin{equation} \begin{split} G(\mathbf{x}) &= \frac{1}{2} \mathbf{x}^T\mathbf{\Lambda}_k^{-1}\mathbf{x} + \frac{1}{2}\|\mathbf{x}_k\|_1 \end{split} \end{equation}

where $\mathbf{\Lambda}_k=\operatorname{diag}(|\mathbf{x}_k|)$.

Majorizing the TV term of Equation (7) this way, with $\mathbf{\Lambda}_k=\operatorname{diag}(|\mathbf{D}\mathbf{u}_k|)$, gives the subproblem:

\begin{equation} \begin{split} \argmin_{\mathbf{u}} \frac{1}{2} (\mathbf{D}\mathbf{u})^T\mathbf{\Lambda}_k^{-1}(\mathbf{D}\mathbf{u}) + \frac{1}{2}\|\mathbf{D}\mathbf{u}_k\|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

Since the $l_1$ term above is now a constant, it's easier to derive an explicit solution:

\begin{equation} \begin{split} \mathbf{u}_{k+1} = \left(\mathbf{A}^T\mathbf{A} + \frac{1}{\lambda} \mathbf{D}^T\mathbf{\Lambda}_k^{-1}\mathbf{D}\right)^{\dag} \mathbf{A}^T \mathbf{v} \end{split} \end{equation}

A problem with this iteration form is that, as the iterations progress, some entries of $\mathbf{\Lambda}_k$ go to zero, causing division-by-zero errors. This issue can be avoided with the matrix inverse lemma:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\left(\mathbf{A}^T\mathbf{A}\right)^{\dag} - \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \right)^{\dag} \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \right) \mathbf{A}^T \mathbf{v}\\ &= \mathbf{A}^{\dag}\mathbf{v} - \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T \left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \right)^{\dag} \mathbf{D} \mathbf{A}^{\dag} \mathbf{v} \end{split} \end{equation}

The complexity of the algorithm depends on how quickly we can solve the linear system $\left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T\right) \mathbf{x} = \mathbf{D} \mathbf{A}^{\dag} \mathbf{v}$. If $\mathbf{A} = \mathbf{I}$, then $\lambda \mathbf{\Lambda}_k + \mathbf{D}\mathbf{D}^T$ is a banded (tridiagonal) matrix, so the system can be solved quickly.

The matrix inverse lemma is:

\begin{equation} \begin{split} (\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}(\mathbf{C}^{-1} + \mathbf{D}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{D}\mathbf{A}^{-1} \end{split} \end{equation}
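For $\mathbf{A} = \mathbf{I}$, a minimal sketch of the MM iteration in Equation (23) might look like the following, where the tridiagonal system $\lambda\mathbf{\Lambda}_k + \mathbf{D}\mathbf{D}^T$ is solved with scipy's banded solver (again, function names are my own):

```python
import numpy as np
from scipy.linalg import solve_banded

def tv_denoise_1d_mm(v, lam, n_iter=50):
    """1D TV denoising with A = I by majorization-minimization (Equation (23))."""
    N = len(v)
    D  = lambda u: np.diff(u)
    Dt = lambda z: -np.diff(z, prepend=0.0, append=0.0)
    Dv = D(v)
    u = v.copy()
    for _ in range(n_iter):
        w = lam * np.abs(D(u))            # diagonal of lambda * Lambda_k
        ab = np.zeros((3, N - 1))         # banded storage of lam*Lambda_k + D D^T
        ab[0, 1:]  = -1.0                 # superdiagonal of D D^T
        ab[1, :]   = w + 2.0              # main diagonal
        ab[2, :-1] = -1.0                 # subdiagonal
        x = solve_banded((1, 1), ab, Dv)
        u = v - Dt(x)                     # Equation (23) with A = I
    return u
```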

Split Bregman

I wrote more about the split Bregman method in another post; here I only briefly describe it. It solves problems of the following form:

\begin{equation} \begin{split} \argmin_{\mathbf{x}}\ \|F(\mathbf{x})\|_1 + \lambda G(\mathbf{x}) \end{split} \end{equation}

where $F(\mathbf{x})$ and $G(\mathbf{x})$ are both convex and differentiable, and $F(\mathbf{x})$ is also linear.

By setting $\mathbf{d} = F(\mathbf{x})$, the split Bregman method has the following iterations:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ \lambda G(\mathbf{x}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|F(\mathbf{x}_{k+1}) - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - F(\mathbf{x}_{k+1}) + \mathbf{d}_{k+1} \end{split} \end{equation}

Now back to our 1D TV denoising problem in Equation (7): letting $\mathbf{d}=\mathbf{D}\mathbf{u}$ and applying the split Bregman method above, we have:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}}\ \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 + \frac{\mu}{2}\|\mathbf{D}\mathbf{u} - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|\mathbf{D}\mathbf{u}_{k+1} - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - \mathbf{D}\mathbf{u}_{k+1} + \mathbf{d}_{k+1} \end{split} \end{equation}

The first subproblem can be solved by setting the derivative to zero and solving the resulting linear system:

\begin{equation} \begin{split} \left(\lambda \mathbf{A}^T\mathbf{A} + \mu \mathbf{D}^T\mathbf{D}\right)\mathbf{u}_{k+1} = \lambda \mathbf{A}^T\mathbf{v} + \mu \mathbf{D}^T(\mathbf{d}_k + \mathbf{b}_k) \end{split} \end{equation}

and the second subproblem has an explicit solution:

\begin{equation} \begin{split} \mathbf{d}_{k+1} = S_{1/\mu}\!\left[\mathbf{D}\mathbf{u}_{k+1} - \mathbf{b}_k\right] \end{split} \end{equation}

where $S_t(\cdot)$ is the soft-thresholding operator, $S_t[x] = \operatorname{sign}(x)\max(|x|-t,\ 0)$, introduced in this post.
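Putting the pieces together for $\mathbf{A} = \mathbf{I}$, a sketch of the split Bregman iteration in Equations (27)-(29) could look like this (the linear system in Equation (28) is tridiagonal, so I use scipy's banded solver; names are my own):

```python
import numpy as np
from scipy.linalg import solve_banded

def soft(x, t):
    """Soft-thresholding S_t(x) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def tv_denoise_1d_sb(v, lam, mu=1.0, n_iter=50):
    """1D TV denoising with A = I via split Bregman (Equations (27)-(29))."""
    N = len(v)
    D  = lambda u: np.diff(u)
    Dt = lambda z: -np.diff(z, prepend=0.0, append=0.0)
    ab = np.zeros((3, N))                 # banded storage of lam*I + mu*D^T D
    ab[0, 1:]  = -mu
    ab[2, :-1] = -mu
    ab[1, :]   = lam + 2.0 * mu
    ab[1, 0]  -= mu                       # boundary rows of D^T D have diagonal 1
    ab[1, -1] -= mu
    u = v.copy()
    d = np.zeros(N - 1)
    b = np.zeros(N - 1)
    for _ in range(n_iter):
        u = solve_banded((1, 1), ab, lam * v + mu * Dt(d + b))   # Equation (28)
        d = soft(D(u) - b, 1.0 / mu)                             # Equation (29)
        b = b - D(u) + d                                         # Bregman update
    return u
```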

ADMM

The Alternating Direction Method of Multipliers (ADMM) is arguably the most widely used algorithm for solving L1-regularized problems, especially in MRI. Consider a separable problem with an equality constraint:

\begin{equation} \begin{split} \argmin_{\mathbf{x}, \mathbf{y}}&\ F(\mathbf{x}) + G(\mathbf{y})\\ &\text{s.t.}\ \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} = \mathbf{c} \end{split} \end{equation}

With its augmented Lagrangian $L_{\rho}(\mathbf{x}, \mathbf{y}, \mathbf{v})= F(\mathbf{x}) + G(\mathbf{y}) + \mathbf{v}^T(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} - \mathbf{c}) + \frac{\rho}{2} \|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} - \mathbf{c}\|_2^2$, the ADMM iteration is:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}} L_{\rho}(\mathbf{x}, \mathbf{y}_k, \mathbf{v}_k)\\ \mathbf{y}_{k+1} &= \argmin_{\mathbf{y}} L_{\rho}(\mathbf{x}_{k+1}, \mathbf{y}, \mathbf{v}_k)\\ \mathbf{v}_{k+1} &= \mathbf{v}_k + \rho (\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{y}_{k+1} - \mathbf{c}) \end{split} \end{equation}

Letting $\mathbf{d}=\mathbf{D}\mathbf{u}$, our 1D TV denoising problem falls into the ADMM form, and we have the following iteration:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} L_{\rho}(\mathbf{u}, \mathbf{d}_k, \mathbf{y}_k)\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}} L_{\rho}(\mathbf{u}_{k+1}, \mathbf{d}, \mathbf{y}_k)\\ \mathbf{y}_{k+1} &= \mathbf{y}_k + \rho (\mathbf{D}\mathbf{u}_{k+1} -\mathbf{d}_{k+1}) \end{split} \end{equation}

where $L_{\rho}(\mathbf{u}, \mathbf{d}, \mathbf{y})$ is:

\begin{equation} \begin{split} L_{\rho}(\mathbf{u}, \mathbf{d}, \mathbf{y}) &= \|\mathbf{d}\|_1 + \frac{\lambda}{2} \|\mathbf{A}\mathbf{u} - \mathbf{v}\|_2^2 \\&+ \mathbf{y}^T(\mathbf{D}\mathbf{u} - \mathbf{d}) + \frac{\rho}{2} \|\mathbf{D}\mathbf{u} - \mathbf{d} \|_2^2 \end{split} \end{equation}

The first subproblem can be solved by setting the derivative to zero:

\begin{equation} \begin{split} (\lambda\mathbf{A}^T\mathbf{A} + \rho \mathbf{D}^T\mathbf{D})\mathbf{u}_{k+1} &= \lambda\mathbf{A}^T\mathbf{v} + \mathbf{D}^T(\rho \mathbf{d}_k - \mathbf{y}_k) \end{split} \end{equation}

and the solution of the second subproblem is:

\begin{equation} \begin{split} \mathbf{d}_{k+1} = S_{1/\rho}\!\left[\mathbf{D}\mathbf{u}_{k+1} + \frac{1}{\rho} \mathbf{y}_k\right] \end{split} \end{equation}
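For completeness, here is a matching sketch of the 1D ADMM iteration in Equations (32), (34) and (35) for $\mathbf{A} = \mathbf{I}$; it differs from the split Bregman sketch mainly in how the dual variable is updated:

```python
import numpy as np
from scipy.linalg import solve_banded

def tv_denoise_1d_admm(v, lam, rho=1.0, n_iter=50):
    """1D TV denoising with A = I via ADMM (Equations (32), (34), (35))."""
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
    N = len(v)
    D  = lambda u: np.diff(u)
    Dt = lambda z: -np.diff(z, prepend=0.0, append=0.0)
    ab = np.zeros((3, N))                 # banded storage of lam*I + rho*D^T D
    ab[0, 1:]  = -rho
    ab[2, :-1] = -rho
    ab[1, :]   = lam + 2.0 * rho
    ab[1, 0]  -= rho
    ab[1, -1] -= rho
    u = v.copy()
    d = np.zeros(N - 1)
    y = np.zeros(N - 1)
    for _ in range(n_iter):
        u = solve_banded((1, 1), ab, lam * v + Dt(rho * d - y))  # Equation (34)
        d = soft(D(u) + y / rho, 1.0 / rho)                      # Equation (35)
        y = y + rho * (D(u) - d)                                 # dual update
    return u
```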

2D TV

As we saw in the previous section, there are two types of TV for high-dimensional data. Equation (6) for the 2D case with the isotropic TV reads:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} \sum_{i} \sum_{j} \sqrt{\left(\nabla_{I} \mathbf{u} \right)_{i,j}^2 + \left(\nabla_{J} \mathbf{u} \right)_{i,j}^2} + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

where $A(\cdot)\colon \mathbb{R}^{MN} \longmapsto \mathbb{R}^{K}$ is a linear operator acting on the vectorized image $\mathbf{u}$ and $\mathbf{v} \in \mathbb{R}^{K}$ is the measured signal. With the anisotropic TV, Equation (6) becomes:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} \sum_{i} \sum_{j} \left(|\nabla_{I} \mathbf{u} |_{i,j} + |\nabla_{J} \mathbf{u} |_{i,j}\right) + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

Note that the differential operator $\nabla_{I}$ acts on vectorized matrices and returns two-dimensional matrices; it is easier to derive gradients with vectorized matrices than with the matrices themselves. Keep in mind that we don't actually construct explicit differential matrices for $\nabla_{I}$ and its adjoint. For example, we can reshape vectors into matrices and perform forward differences along the corresponding dimension. For $\nabla_{I}^*$, which acts on matrices and returns vectorized matrices, we can perform backward differences along the corresponding dimension, negate all elements, and finally reshape the matrices into vectors.
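Here is a matrix-free numpy sketch of $\nabla_I$, $\nabla_J$ and their adjoints on a vectorized $M \times N$ image (my own helpers, using forward differences with the reflective extension, so the last row/column of each gradient is zero):

```python
import numpy as np

def grad_I(u_vec, M, N):                 # derivative along the first dimension
    U = u_vec.reshape(M, N)
    return np.diff(U, axis=0, append=U[-1:, :])

def grad_I_adj(Z):                       # negative backward difference, then vectorize
    pad = np.zeros((1, Z.shape[1]))
    return -np.diff(Z[:-1, :], axis=0, prepend=pad, append=pad).ravel()

def grad_J(u_vec, M, N):                 # derivative along the second dimension
    U = u_vec.reshape(M, N)
    return np.diff(U, axis=1, append=U[:, -1:])

def grad_J_adj(Z):
    pad = np.zeros((Z.shape[0], 1))
    return -np.diff(Z[:, :-1], axis=1, prepend=pad, append=pad).ravel()

# adjointness check: <grad_I u, Z> should equal <u, grad_I^* Z>
M, N = 5, 7
u, Z = np.random.randn(M * N), np.random.randn(M, N)
assert np.isclose(np.sum(grad_I(u, M, N) * Z), u @ grad_I_adj(Z))
```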

Split Bregman

To apply the split Bregman method to Equation (36), we simplify the TV term by setting $\mathbf{D}_I = \nabla_{I} \mathbf{u}$ and $\mathbf{D}_J = \nabla_{J} \mathbf{u}$:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} &\ \sum_{i} \sum_{j} \sqrt{\left(\mathbf{D}_{I} \right)_{i,j}^2 + \left(\mathbf{D}_{J} \right)_{i,j}^2} + \frac{\lambda}{2} \|\mathbf{A}\mathbf{u} - \mathbf{v}\|_2^2\\ \text{s.t.} &\ \mathbf{D}_I = \nabla_{I} \mathbf{u}\\ &\ \mathbf{D}_J = \nabla_{J} \mathbf{u} \end{split} \end{equation}

The iteration is:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 + \frac{\mu}{2} \|\begin{bmatrix}\mathbf{D}_{I,k}\\ \mathbf{D}_{J,k}\end{bmatrix} - \begin{bmatrix}\nabla_I \mathbf{u} \\ \nabla_J \mathbf{u}\end{bmatrix} - \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix}\|_F^2\\ \mathbf{D}_{I, k+1}, \mathbf{D}_{J, k+1} &= \argmin_{\mathbf{D}_{I}, \mathbf{D}_{J}} \sum_{i} \sum_{j} \sqrt{\left(\mathbf{D}_{I} \right)_{i,j}^2 + \left(\mathbf{D}_{J} \right)_{i,j}^2} + \frac{\mu}{2} \|\begin{bmatrix}\mathbf{D}_{I}\\ \mathbf{D}_{J}\end{bmatrix} - \begin{bmatrix}\nabla_I \mathbf{u}_{k+1} \\ \nabla_J \mathbf{u}_{k+1}\end{bmatrix} - \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix}\|_F^2\\ \begin{bmatrix}\mathbf{B}_{I,k+1} \\ \mathbf{B}_{J,k+1} \end{bmatrix} &= \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix} + \begin{bmatrix}\nabla_I \mathbf{u}_{k+1} \\ \nabla_J \mathbf{u}_{k+1}\end{bmatrix} - \begin{bmatrix}\mathbf{D}_{I,k+1}\\ \mathbf{D}_{J,k+1}\end{bmatrix} \end{split} \end{equation}

The first subproblem can be solved by setting its derivative to zero:

\begin{equation} \begin{split} \left(\lambda \mathbf{A}^*\mathbf{A} + \mu\left( \nabla_I^*\nabla_I + \nabla_J^*\nabla_J\right)\right)\mathbf{u} &= \lambda\mathbf{A}^*\mathbf{v} + \mu \left(\nabla_I^*\mathbf{Z}_{I,k} + \nabla_J^*\mathbf{Z}_{J,k}\right)\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{B}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{B}_{J,k} \end{split} \end{equation}
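With $A = \mathbf{I}$ and the matrix-free gradients defined earlier, this linear system (Equation (40)) can be solved iteratively, for example with conjugate gradient; a sketch reusing my grad_I/grad_J helpers from above:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_u_subproblem(v, Z_I, Z_J, lam, mu, M, N):
    """Solve (lam*I + mu*(grad_I^* grad_I + grad_J^* grad_J)) u = rhs by CG (Equation (40), A = I)."""
    def matvec(u):
        return (lam * u
                + mu * grad_I_adj(grad_I(u, M, N))
                + mu * grad_J_adj(grad_J(u, M, N)))
    A_op = LinearOperator((M * N, M * N), matvec=matvec, dtype=np.float64)
    rhs = lam * v + mu * (grad_I_adj(Z_I) + grad_J_adj(Z_J))
    u, _ = cg(A_op, rhs)
    return u
```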

The second subproblem can be solved by letting $\mathbf{w}_{i,j} = \begin{bmatrix}(\mathbf{D}_{I})_{i,j}\\ (\mathbf{D}_{J})_{i,j}\end{bmatrix}$, which transforms it into:

\begin{equation} \begin{split} \argmin_{\text{all}\ \mathbf{w}_{i,j}} &\ \sum_{i} \sum_{j} \|\mathbf{w}_{i,j}\|_2 + \frac{\mu}{2} \sum_{i} \sum_{j} \|\mathbf{w}_{i,j} - \mathbf{y}_{i,j}\|_2^2\\ &\ \mathbf{y}_{i,j} = \begin{bmatrix} (\nabla_I \mathbf{u}_{k+1}+\mathbf{B}_{I,k})_{i,j} \\ (\nabla_J \mathbf{u}_{k+1}+\mathbf{B}_{J,k})_{i,j}\end{bmatrix} \end{split} \end{equation}

Recall that for the $\ell_2$-norm proximal problem:

\begin{equation} \begin{split} \argmin_{\mathbf{x}} &\ \|\mathbf{x}\|_2 + \frac{1}{2t} \|\mathbf{x} - \mathbf{y}\|_2^2 \end{split} \end{equation}

there is an explicit solution known as vectorial shrinkage:

\begin{equation} \begin{split} \mathbf{x}^* = S_t[\|\mathbf{y}\|_2] \frac{\mathbf{y}}{\|\mathbf{y}\|_2} \end{split} \end{equation}

Equation (41) is clearly separable over the $\mathbf{w}_{i,j}$, and each subproblem has the solution listed above:

\begin{equation} \begin{split} \mathbf{w}_{i,j}^* &= \begin{bmatrix}(\mathbf{D}_{I,k+1})_{i,j}\\ (\mathbf{D}_{J,k+1})_{i,j}\end{bmatrix}\\ &= S_{1/\mu}[\|\mathbf{y}_{i,j}\|_2] \frac{\mathbf{y}_{i,j}}{\|\mathbf{y}_{i,j}\|_2} \end{split} \end{equation}
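In code, the vectorial shrinkage of Equation (44) applied to every gradient pair at once is just a few lines (a sketch; the small eps guards against division by zero when a magnitude is exactly zero):

```python
import numpy as np

def vectorial_shrink(Y_I, Y_J, t, eps=1e-12):
    """Shrink each pair (Y_I[i,j], Y_J[i,j]) by its magnitude (Equation (44))."""
    mag = np.sqrt(Y_I**2 + Y_J**2)
    scale = np.maximum(mag - t, 0.0) / (mag + eps)
    return scale * Y_I, scale * Y_J
```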

To apply the split Bregman method to Equation (37), again letting $\mathbf{D}_I = \nabla_{I} \mathbf{u}$ and $\mathbf{D}_J = \nabla_{J} \mathbf{u}$, the only difference is the second subproblem:

\begin{equation} \begin{split} \mathbf{D}_{I, k+1}, \mathbf{D}_{J, k+1} = \argmin_{\mathbf{D}_{I}, \mathbf{D}_{J}} &\|vec(\mathbf{D}_I)\|_1 + \|vec(\mathbf{D}_J)\|_1\\ &+ \frac{\mu}{2} \|vec(\mathbf{D}_I -\nabla_I\mathbf{u}_{k+1} - \mathbf{B}_{I,k})\|_2^2\\ &+ \frac{\mu}{2} \|vec(\mathbf{D}_J -\nabla_J\mathbf{u}_{k+1} - \mathbf{B}_{J,k})\|_2^2 \end{split} \end{equation}

which can be solved with soft-thresholding:

\begin{equation} \begin{split} \mathbf{D}_{I, k+1} &= S_{1/\mu}[\nabla_I \mathbf{u}_{k+1} + \mathbf{B}_{I, k}]\\ \mathbf{D}_{J, k+1} &= S_{1/\mu}[\nabla_J \mathbf{u}_{k+1} + \mathbf{B}_{J, k}] \end{split} \end{equation}

ADMM

To apply ADMM to Equation (36), letting $\mathbf{D}_I = \nabla_{I} \mathbf{u}$ and $\mathbf{D}_J = \nabla_{J} \mathbf{u}$, the augmented Lagrangian (in its scaled form, with $\mathbf{Y}$ the scaled dual variable) is:

\begin{equation} \begin{split} L_{\rho}(\mathbf{u}, \mathbf{D}_I,\mathbf{D}_J, \mathbf{Y}_I, \mathbf{Y}_J) = &\sum_i\sum_j \sqrt{(\mathbf{D}_I)_{i,j}^2 + (\mathbf{D}_J)_{i,j}^2} + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2\\ &+ \frac{\rho}{2} \|\nabla_I\mathbf{u} - \mathbf{D}_{I} + \mathbf{Y}_{I}\|_F^2\\ &+ \frac{\rho}{2} \|\nabla_J\mathbf{u} - \mathbf{D}_{J} + \mathbf{Y}_{J}\|_F^2 \end{split} \end{equation}

The ADMM iteration is:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 + \frac{\rho}{2} \|\nabla_I\mathbf{u} - \mathbf{D}_{I,k} + \mathbf{Y}_{I,k}\|_F^2 + \frac{\rho}{2} \|\nabla_J\mathbf{u} - \mathbf{D}_{J,k} + \mathbf{Y}_{J,k}\|_F^2\\ \mathbf{D}_{I,k+1}, \mathbf{D}_{J,k+1} &= \argmin_{\mathbf{D}_I, \mathbf{D}_J} \sum_i\sum_j \sqrt{(\mathbf{D}_I)_{i,j}^2 + (\mathbf{D}_J)_{i,j}^2} + \frac{\rho}{2} \|\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I} + \mathbf{Y}_{I,k}\|_F^2 + \frac{\rho}{2} \|\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J} + \mathbf{Y}_{J,k}\|_F^2\\ \mathbf{Y}_{I,k+1} &= \mathbf{Y}_{I,k} + (\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I, k+1})\\ \mathbf{Y}_{J,k+1} &= \mathbf{Y}_{J,k} + (\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J, k+1}) \end{split} \end{equation}

The solution of the first subproblem is:

\begin{equation} \begin{split} (\lambda \mathbf{A}^*\mathbf{A} + \rho (\nabla_I^*\nabla_I + \nabla_J^*\nabla_J))\mathbf{u} &= \lambda \mathbf{A}^*\mathbf{v} + \rho (\nabla_I^* \mathbf{Z}_{I,k} + \nabla_J^*\mathbf{Z}_{J,k})\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{Y}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{Y}_{J,k} \end{split} \end{equation}

The second subproblem can be solved by the same process as in split Bregman, letting $\mathbf{w}_{i,j} = \begin{bmatrix}(\mathbf{D}_{I})_{i,j}\\ (\mathbf{D}_{J})_{i,j}\end{bmatrix}$:

\begin{equation} \begin{split} \mathbf{w}_{i,j}^* &= \begin{bmatrix}(\mathbf{D}_{I,k+1})_{i,j}\\ (\mathbf{D}_{J,k+1})_{i,j}\end{bmatrix}\\ &= S_{1/\rho}[\|\mathbf{y}_{i,j}\|_2] \frac{\mathbf{y}_{i,j}}{\|\mathbf{y}_{i,j}\|_2}\\ \mathbf{y}_{i,j} &= \begin{bmatrix} (\nabla_I \mathbf{u}_{k+1}+\mathbf{Y}_{I,k})_{i,j} \\ (\nabla_J \mathbf{u}_{k+1}+\mathbf{Y}_{J,k})_{i,j}\end{bmatrix} \end{split} \end{equation}

The ADMM iteration for Equation (37) is:

\begin{equation} \begin{split} (\lambda \mathbf{A}^*\mathbf{A} + \rho (\nabla_I^*\nabla_I + \nabla_J^*\nabla_J))\mathbf{u} &= \lambda \mathbf{A}^*\mathbf{v} + \rho (\nabla_I^* \mathbf{Z}_{I,k} + \nabla_J^*\mathbf{Z}_{J,k})\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{Y}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{Y}_{J,k}\\ \mathbf{D}_{I,k+1} &= S_{1/\rho}[\nabla_I\mathbf{u}_{k+1} + \mathbf{Y}_{I,k}]\\ \mathbf{D}_{J,k+1} &= S_{1/\rho}[\nabla_J\mathbf{u}_{k+1} + \mathbf{Y}_{J,k}]\\ \mathbf{Y}_{I,k+1} &= \mathbf{Y}_{I,k} + (\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I, k+1})\\ \mathbf{Y}_{J,k+1} &= \mathbf{Y}_{J,k} + (\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J, k+1}) \end{split} \end{equation}