Dive into ESPIRiT, from Theory to Practice
This post summarizes the ESPIRiT algorithm for estimating coil sensitivities in MRI. It took me months to fully grasp the theory behind it.

A short introduction

MRI is a common imaging technique for medical diagnosis nowadays. In contrast to imaging techniques based on ionizing radiation, it is nearly harmless to humans. In MRI, when a patient is exposed to a magnetic field, hydrogen atoms align with the field. This alignment can be disrupted by a radiofrequency pulse. As the atoms return to their original alignment, they emit signals that are captured by sensors and used to create medical images. The signal captured by a sensor (or coil) $c$ can be approximately formulated as:

\begin{equation} \begin{split} y_c(\vec{k}(t)) = \int x(\vec{r}) B_{c}(\vec{r}) e^{-i 2 \pi \vec{k}(t)\cdot\vec{r}}d\vec{r} \end{split} \end{equation}

where $x(\vec{r})$ is the unknown tissue value at location $\vec{r}$, $B_{c}(\vec{r})$ is a value related to the receiver coil $c$, often called the coil sensitivity, and $\vec{k}(t)$ is the k-space coordinate at time $t$. It's common to have dozens of coils in clinical settings. This formula indicates that the signals captured by coil $c$ are the Fourier transform of the tissue values of interest multiplied by the sensitivity map of coil $c$. A concise way to express the above formula is with operators:

\begin{equation} \begin{split} y_c = \mathcal{F}\mathcal{S}_cx \end{split} \end{equation}

where $\mathcal{F}$ denotes the Fourier transform, $\mathcal{S}_c$ denotes the sensitivity map of coil $c$, and $y_c$ and $x$ are both vectorized. Most of the time, coil sensitivities are not available beforehand; we have to estimate them from the acquired signals. ESPIRiT is such a method for estimating coil sensitivities, supported by solid theoretical analysis.

About GRAPPA

Before diving into ESPIRiT, it's important to first understand GRAPPA, a well-established parallel imaging method used in MRI. The core of GRAPPA is to interpolate missing k-space points by linearly weighting neighboring acquired points, with the weighting coefficients determined from the calibration region. Again, a concise way to describe GRAPPA is using operators:

\begin{equation} \begin{split} x_c(r) = (\mathcal{P}_{k}\mathcal{R}_ry)^T \mathbf{g}_{c} \end{split} \end{equation}

where $y$ represents the vectorized multicoil k-space data with unacquired entries set to zero, $x_c(r)$ is the estimated value for coil $c$ at location $r$ (for simplicity, the vector notation is omitted, but keep in mind that $r$ is a discrete spatial vector index), $\mathbf{g}_c$ denotes the weighting coefficients for coil $c$, $\mathcal{R}_r$ is a patch operator that selects a local block of data around location $r$, and $\mathcal{P}_k$ is a sampling operator that selects only the acquired points within the patch (the subscript $k$ implies that there may be many sampling patterns). We can easily construct linear equations using the data from the calibration region, with the same undersampling pattern $\mathcal{P}_k$:

\begin{equation} \begin{split} \mathbf{y}_c^{AC} = \mathbf{A}\mathcal{P}_k^T\mathbf{g}_c \end{split} \end{equation}

where $\mathbf{y}_c^{AC}$ denotes the vectorized data in the calibration region and $\mathbf{A}$ is the so-called calibration matrix, in which each row is a local patch around an acquired point.
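To make the construction concrete, here is a minimal NumPy sketch of how the calibration matrix and the weights for one coil could be computed. The function names, the (coils, ky, kx) array layout, and the center_idx/source_mask arguments are illustrative assumptions of mine, not a reference implementation.

```python
import numpy as np

def calibration_matrix(acs, kernel=(5, 5)):
    """Build the calibration matrix A from an ACS block of shape (nc, ny, nx).

    Each row is a vectorized multicoil patch of size `kernel`, taken by sliding
    a window over every fully contained position in the calibration region.
    """
    nc, ny, nx = acs.shape
    ky, kx = kernel
    rows = [acs[:, y:y + ky, x:x + kx].ravel()
            for y in range(ny - ky + 1)
            for x in range(nx - kx + 1)]
    return np.array(rows)                     # shape: (#patches, nc*ky*kx)

def grappa_weights(acs, center_idx, source_mask, kernel=(5, 5)):
    """Least-squares fit of the GRAPPA weights g_c for one target coil.

    center_idx selects the column of A holding the target point y_c^{AC};
    source_mask is a boolean vector picking the acquired (source) entries of
    each patch, i.e. it plays the role of P_k.
    """
    A = calibration_matrix(acs, kernel)
    target = A[:, center_idx]                 # y_c^{AC}
    sources = A[:, source_mask]               # A P_k^T
    g, *_ = np.linalg.lstsq(sources, target, rcond=None)
    return g
```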
In this case, $\mathbf{g}_c$ can be efficiently solved by many numerical methods. By construction, one of the columns of $\mathbf{A}$ is actually $\mathbf{y}_c^{AC}$ itself. Write this as $\mathbf{A}\mathbf{e}_c = \mathbf{y}_c^{AC}$, where $\mathbf{e}_c$ is a vector with a 1 in the appropriate location to select the corresponding column of $\mathbf{A}$. We note that:

\begin{equation} \begin{split} \mathbf{A}\mathcal{P}_k^T\mathbf{g}_c - \mathbf{y}_c^{AC} &= 0\\ \mathbf{A}\mathcal{P}_k^T\mathbf{g}_c - \mathbf{A}\mathbf{e}_c &= 0\\ \mathbf{A}(\mathcal{P}_k^T\mathbf{g}_c - \mathbf{e}_c) &= 0 \end{split} \end{equation}

The existence of a null-space implies redundancy in the row-space of $\mathbf{A}$. The $\mathbf{g}_c$ in GRAPPA are very specific null-space vectors. Instead of using this specific form, ESPIRiT analyzes the calibration matrix $\mathbf{A}$ directly.

Key assumptions in ESPIRiT

Since the matrix $\mathbf{A}$ is constructed from local patches in the calibration region, ESPIRiT assumes that all patches, whether in the calibration region or not, lie in the row-space of $\mathbf{A}$:

\begin{equation} \begin{split} \mathbf{V}_{\perp}^H\mathcal{R}_r \mathbf{x} &= \mathbf{0}, \quad \forall r \end{split} \end{equation}

where $\mathbf{V}_{\perp}$ is the basis that spans the null-space and $\mathbf{x}$ is the vectorized multicoil k-space data. Note that "lying in the row-space" is equivalent to being orthogonal to the null-space. $\mathbf{V}_{\perp}$ is computed by the singular value decomposition of $\mathbf{A}$:

\begin{equation} \begin{split} \mathbf{A} &= \mathbf{U} \mathbf{\Sigma} \mathbf{V}^H\\ &= \mathbf{U} \mathbf{\Sigma} \begin{bmatrix} \mathbf{V}_{\parallel} & \mathbf{V}_{\perp} \end{bmatrix}^H \end{split} \end{equation}

where $\mathbf{V}_{\parallel}$ corresponds to the non-zero singular values and $\mathbf{V}_{\perp}$ corresponds to the zero singular values. Solving the null-space constraint in the least-squares framework

\begin{equation} \DeclareMathOperator*{\argmin}{arg\,min} \begin{split} \argmin_{\mathbf{x}} \quad \sum_{r} \frac{1}{2}\|\mathbf{V}_{\perp}^H\mathcal{R}_r \mathbf{x}\|_{F}^2 \end{split} \end{equation}

leads to the following equations by zeroing out the gradient:

\begin{equation} \begin{split} \sum_r \mathcal{R}_r^H\mathbf{V}_{\perp}\mathbf{V}_{\perp}^H\mathcal{R}_r\mathbf{x} = \mathbf{0} \end{split} \end{equation}

Since $\mathbf{V}$ is unitary, we have $\mathbf{I} = \mathbf{V}_{\perp}\mathbf{V}_{\perp}^H + \mathbf{V}_{\parallel}\mathbf{V}_{\parallel}^H$. Substituting this into the above equations, we get:

\begin{equation} \begin{split} \sum_r \mathcal{R}_r^H\mathbf{V}_{\parallel}\mathbf{V}_{\parallel}^H\mathcal{R}_r\mathbf{x} = \sum_r \mathcal{R}_r^H\mathcal{R}_r \mathbf{x} \end{split} \end{equation}

The effect of $\mathcal{R}_r^H\mathcal{R}_r$ is to restore the patch to its original location while zeroing out the other elements. The overall effect of $\sum_r \mathcal{R}_r^H\mathcal{R}_r$ is a diagonal matrix, where each diagonal element equals $M$, the number of times the corresponding point appears in neighboring patches.
With $M$, the equation can be transformed to:

\begin{equation} \begin{split} M^{-1}\sum_r \mathcal{R}_r^H\mathbf{V}_{\parallel}\mathbf{V}_{\parallel}^H\mathcal{R}_r\mathbf{x} = \mathbf{x} \end{split} \end{equation}

which is equation (9) in the paper.

"Obvious" convolution

I believe the most challenging part is deriving equation (16), $\mathcal{G}_q \vec{s}_q = \vec{s}_q$, from the paper. The authors provide little detail about this step, only mentioning that the operator $\mathcal{W} = M^{-1}\sum_r \mathcal{R}_r^H\mathbf{V}_{\parallel}\mathbf{V}_{\parallel}^H\mathcal{R}_r$ is a matrix-valued convolution, which can be decoupled into point-wise matrix operations in the image domain. The sensitivity maps are obtained by the eigenvalue decomposition of all the $\mathcal{G}_q$'s, choosing only the eigenvectors corresponding to the eigenvalue "=1". After that, they jump straight to showing the computation of $\mathcal{G}_q$ in Figure 3 without any mathematical explanation. The intuition is that we could use the Fourier convolution theorem to get what we want, which is straightforward in the 1D case. However, since $\mathcal{W}$ is a high-dimensional operator (over k-space locations and channels), applying the theorem directly is not straightforward. A better approach is to transform $\mathcal{W}$ into a form that resembles something familiar in 1D. To simplify the derivation, we first introduce some notation:

$\mathbf{P} =\mathbf{V}_{\parallel}\mathbf{V}_{\parallel}^H$: $\mathbf{P}$ operates on patches

$\mathbf{P}_{(c,\alpha), (d, \beta)}$: a scalar value, where $(c,\alpha)$ represents the $c$-th channel and location $\alpha$ in the output (row index), and $(d, \beta)$ represents the $d$-th channel and location $\beta$ in the input (column index)

$\mathbf{x}_c(r)$: a scalar value, where $c$ represents the $c$-th channel and $r$ represents location $r$ in k-space

$(\mathcal{R}_r\mathbf{x})_{(d, \beta)}$: a scalar value, the $d$-th coil value at location $\beta$ in the patch centered at location $r$

$(\mathbf{P}\mathcal{R}_r\mathbf{x})_{(c,\alpha)}$: a scalar value, the $c$-th coil value at location $\alpha$ of the output

$(\mathcal{R}_r^H\mathbf{P}\mathcal{R}_r\mathbf{x})_c(m)$: a scalar value, the $c$-th coil value at location $m$ in k-space

As discussed above, $\mathcal{R}_r$ extracts a patch centered at location $r$:

\begin{equation} \begin{split} (\mathcal{R}_r\mathbf{x})_{(d,\beta)}=\mathbf{x}_d(r+\beta) \end{split} \end{equation}

where $\beta$ represents the relative shift with respect to the central location $r$.
The effect of $\mathbf{P}$ is to perform a weighted summation across the channels and locations of the input:

\begin{equation} \begin{split} (\mathbf{P}\mathcal{R}_r\mathbf{x})_{(c,\alpha)} &= \sum_{d,\beta}\mathbf{P}_{(c,\alpha), (d, \beta)} (\mathcal{R}_r\mathbf{x})_{(d,\beta)}\\ &= \sum_{d,\beta}\mathbf{P}_{(c,\alpha), (d, \beta)}\mathbf{x}_d(r+\beta) \end{split} \end{equation}

The effect of $\mathcal{R}_r^H$ is to restore the patch to its original location while zeroing out the other elements:

\begin{equation} \begin{split} (\mathcal{R}_r^H\mathbf{P}\mathcal{R}_r\mathbf{x})_c(m) &= \sum_{\alpha} \delta(m, r+\alpha)(\mathbf{P}\mathcal{R}_r\mathbf{x})_{(c,\alpha)} \end{split} \end{equation}

where $\delta(\cdot, \cdot)$ is the Kronecker delta function, which plays an important role in eliminating the inner $\alpha$. With these notations, we have:

\begin{equation} \begin{split} (\mathcal{W}\mathbf{x})_c(m) &= M^{-1}\sum_r (\mathcal{R}_r^H\mathbf{P}\mathcal{R}_r\mathbf{x})_c(m)\\ &= M^{-1}\sum_r\sum_{\alpha} \delta(m, r+\alpha)(\mathbf{P}\mathcal{R}_r\mathbf{x})_{(c,\alpha)}\\ &= M^{-1}\sum_r\sum_{\alpha} \delta(m, r+\alpha)\sum_{d,\beta}\mathbf{P}_{(c,\alpha), (d, \beta)}\mathbf{x}_d(r+\beta)\\ &=M^{-1}\sum_{\alpha,d,\beta}\sum_r\delta(m, r+\alpha)\mathbf{P}_{(c,\alpha), (d, \beta)}\mathbf{x}_d(r+\beta) \end{split} \end{equation}

Since $\delta(m,r+\alpha) = 1$ if and only if $m=r+\alpha$, we further have:

\begin{equation} \begin{split} (\mathcal{W}\mathbf{x})_c(m) &= M^{-1}\sum_{\alpha,d,\beta}\mathbf{P}_{(c,\alpha), (d, \beta)}\mathbf{x}_d(m - (\alpha-\beta)) \end{split} \end{equation}

Let $u=\alpha - \beta$ represent the difference between the relative shifts; then:

\begin{equation} \begin{split} (\mathcal{W}\mathbf{x})_c(m) &= M^{-1}\sum_{d} \sum_u (\sum_{\alpha,\beta,\alpha - \beta= u}\mathbf{P}_{(c,\alpha), (d, \beta)})\mathbf{x}_d(m - u)\\ &=M^{-1}\sum_{d} \sum_u \mathbf{K}_{c,d}(u)\mathbf{x}_d(m - u) \end{split} \end{equation}

where $\mathbf{K}_{c,d}(u) = \sum_{\alpha,\beta,\alpha - \beta= u}\mathbf{P}_{(c,\alpha), (d, \beta)}$. Recall that $u$ is determined by the relative shifts $\alpha$ and $\beta$. What we are doing here is re-grouping the pairs $(\alpha, \beta)$ according to their common difference $u$. We don't need to worry about how to compute $\mathbf{K}_{c,d}(u)$ right now; we only need to understand that it is a scalar value associated with $c$, $d$ and the difference $u$. Finally, we have something that looks like a convolution: $\sum_u \mathbf{K}_{c,d}(u)\mathbf{x}_d(m - u)$ is a convolution of $\mathbf{K}_{c,d}(m)$ and $\mathbf{x}_d(m)$.
It's worth noting that $u$ cannot exceed twice the size of the patch, as it represents the difference between the relative shifts. $\mathbf{K}_{c,d}(m)$ is only meaningful in this range.

Calculate coil sensitivities

We know that $\mathbf{x}_c(m)=\mathcal{F}[\mathbf{s}_c(n)\hat{\mathbf{x}}(n)](m)$, as discussed in the introduction. Let's use this relation for the right-hand side of the fixed-point equation:

\begin{equation} \begin{split} (\mathcal{W}\mathbf{x})_c(m) &= \mathbf{x}_c(m)\\ M^{-1}\sum_{d} \mathbf{K}_{c,d}(m) \ast\mathbf{x}_d(m) &= \mathbf{x}_c(m)\\ \end{split} \end{equation}

Apply the inverse Fourier transform on both sides:

\begin{equation} \begin{split} M^{-1}\sum_{d} \hat{\mathbf{K}}_{c,d}(n)\mathbf{s}_d(n)\hat{\mathbf{x}}(n) &= \mathbf{s}_c(n)\hat{\mathbf{x}}(n)\\ \end{split} \end{equation}

where $\hat{\mathbf{x}}(n)$ is in image space and $\mathbf{s}_d(n)$ is the $d$-th coil sensitivity value at location $n$. Collecting all coils at location $n$, we have:

\begin{equation} \begin{split} M^{-1} \begin{bmatrix} \hat{\mathbf{K}}_{1,1}(n) & \cdots & \hat{\mathbf{K}}_{1,N_c}(n)\\ \vdots & \ddots & \vdots\\ \hat{\mathbf{K}}_{N_c,1}(n) & \cdots & \hat{\mathbf{K}}_{N_c,N_c}(n)\\ \end{bmatrix} \begin{bmatrix} \mathbf{s}_1(n)\\ \vdots\\ \mathbf{s}_{N_c}(n)\\ \end{bmatrix} \hat{\mathbf{x}}(n) &= \begin{bmatrix} \mathbf{s}_1(n)\\ \vdots\\ \mathbf{s}_{N_c}(n)\\ \end{bmatrix} \hat{\mathbf{x}}(n) \end{split} \end{equation}

where $\hat{\mathbf{x}}(n)$ can be eliminated if it is non-zero. We got it! Now it's clear that the coil sensitivities at location $n$ are the eigenvectors corresponding to the eigenvalues equal to 1. This is exactly equation (16) in the paper (different notation but essentially the same). The remaining question is what the inverse Fourier transform of $\mathbf{K}_{c,d}(m)$ is:

\begin{equation} \begin{split} \hat{\mathbf{K}}_{c,d}(n) &= \frac{1}{N}\sum_m \mathbf{K}_{c,d}(m) e^{i\pi\frac{nm}{N}}\\ &= \frac{1}{N} \sum_u \sum_{\alpha,\beta,\alpha - \beta= u}\mathbf{P}_{(c,\alpha), (d, \beta)} e^{i\pi\frac{nu}{N}}\\ &= \frac{1}{N} \sum_{\alpha,\beta}\mathbf{P}_{(c,\alpha), (d, \beta)} e^{i\pi\frac{n(\alpha-\beta)}{N}}\\ \end{split} \end{equation}

where $N$ is the number of spatial locations. Here we eliminate $u$ by re-grouping elements according to $\alpha$ and $\beta$. Recall that $\mathbf{P} =\mathbf{V}_{\parallel}\mathbf{V}_{\parallel}^H$:

\begin{equation} \begin{split} \mathbf{P}_{(c,\alpha), (d, \beta)} &= \sum_k \mathbf{v}_k(c,\alpha)\mathbf{v}_k^{\ast}(d,\beta) \end{split} \end{equation}

where $\mathbf{v}_k(c,\alpha)$ is a scalar value at coil $c$ and location $\alpha$ of the $k$-th column of $\mathbf{V}_{\parallel}$.
Then we have:

\begin{equation} \begin{split} \hat{\mathbf{K}}_{c,d}(n) &= \frac{1}{N} \sum_{\alpha,\beta}\mathbf{P}_{(c,\alpha), (d, \beta)} e^{i\pi\frac{n(\alpha-\beta)}{N}}\\ &= N \sum_k \left(\frac{1}{N} \sum_{\alpha} \mathbf{v}_k(c, \alpha)e^{i\pi\frac{n\alpha}{N}}\right)\left(\frac{1}{N} \sum_{\beta} \mathbf{v}_k^{\ast}(d, \beta)e^{-i\pi\frac{n\beta}{N}}\right)\\ &= N \sum_k \hat{\mathbf{v}}_k(c, n)\hat{\mathbf{v}}_k^{\ast}(d, n) \end{split} \end{equation}

where $\hat{\mathbf{v}}_k(c, n)$ is the inverse Fourier transform of $\mathbf{v}_k(c, m)$ at location $n$. Let's summarize the steps to compute the coil sensitivities:

1. Reshape $\mathbf{V}_{\parallel}$ to the shape $P \times N_c \times N_k$
2. Center it and zero-pad it to the full-grid size $N \times N_c \times N_k$
3. Apply the inverse FFT along the k-space dimensions
4. Extract $\mathbf{G}_n$ of shape $N_c \times N_k$ for each pixel $n$
5. Compute the eigenvectors of $\mathbf{G}_n\mathbf{G}_n^H$
6. Select the eigenvectors corresponding to eigenvalue 1, which represent the coil sensitivities at pixel $n$
7. Repeat steps 4 to 6 until all pixels have been processed
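Below is a rough NumPy sketch of steps 1 to 7. The array layout (rows of $\mathbf{V}_{\parallel}$ ordered patch-location-major, then coil), the FFT scaling, and the eigenvalue tolerance are assumptions of mine; this is an illustration of the procedure, not the paper's reference implementation.

```python
import numpy as np

def espirit_maps(V_par, kernel, nc, grid_shape, tol=1e-2):
    """Per-pixel eigen-decomposition following steps 1-7 above.

    V_par:      matrix whose Nk columns are the kept right-singular vectors,
                with rows ordered (patch location, coil) as in step 1.
    kernel:     calibration kernel shape, e.g. (6, 6).
    nc:         number of coils.
    grid_shape: full image grid, e.g. (128, 128).
    """
    Nk = V_par.shape[1]
    # 1. reshape to (kernel..., nc, Nk)
    kernels = V_par.reshape(*kernel, nc, Nk)
    # 2. center the kernels and zero-pad to the full grid
    padded = np.zeros((*grid_shape, nc, Nk), dtype=complex)
    sl = tuple(slice((g - k) // 2, (g - k) // 2 + k)
               for g, k in zip(grid_shape, kernel))
    padded[sl] = kernels
    # 3. inverse FFT over the k-space (spatial) axes
    axes = tuple(range(len(grid_shape)))
    imgs = np.fft.fftshift(
        np.fft.ifftn(np.fft.ifftshift(padded, axes=axes), axes=axes), axes=axes)
    imgs *= np.sqrt(np.prod(grid_shape))      # one possible scaling convention
    # 4-7. eigen-decomposition of G_n G_n^H at every pixel
    maps = np.zeros((*grid_shape, nc), dtype=complex)
    for idx in np.ndindex(*grid_shape):
        G = imgs[idx]                          # shape (nc, Nk)
        w, U = np.linalg.eigh(G @ G.conj().T)  # eigenvalues in ascending order
        if w[-1] > 1.0 - tol:                  # keep only eigenvalue ~ 1
            maps[idx] = U[:, -1]
    return maps
```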
Powerful Bregman Methods
The Bregman method is an iterative method for solving convex optimization problems with equality constraints. The linearized Bregman method approximates the subproblems in the Bregman iterations with a linearized version. The split Bregman method decouples the variables in the original problem so that the subproblems in the Bregman iterations become much easier to solve, combining the advantages of alternating minimization and the Bregman method. In this post, I'll use the denominator layout for gradients.

What is Bregman Divergence

The Bregman divergence (or Bregman distance) of a convex function $F(\mathbf{x})$ at point $\mathbf{y}$ is defined as the difference between the function and its first-order Taylor expansion:

\begin{equation} \begin{split} D_F^{\mathbf{p}}(\mathbf{x}, \mathbf{y}) = F(\mathbf{x}) - F(\mathbf{y}) - \mathbf{p}^T(\mathbf{x} - \mathbf{y}) \end{split} \end{equation}

where $\mathbf{p} \in \partial F(\mathbf{y})$ is a subgradient. Here I list some of its properties; more can be found on Wikipedia:

$D_F^{\mathbf{p}}(\mathbf{x}, \mathbf{y}) \ge 0$

$D_F^{\mathbf{p}}(\mathbf{y}, \mathbf{y}) = 0$

$D_F^{\mathbf{p}}(\mathbf{x}, \mathbf{y}) + D_F^{\mathbf{q}}(\mathbf{y}, \mathbf{z}) - D_F^{\mathbf{q}}(\mathbf{x}, \mathbf{z}) = (\mathbf{p}-\mathbf{q})^T(\mathbf{y}-\mathbf{x})$

Bregman Method

Consider the following constrained problem:

\begin{equation} \begin{split} \argmin_{\mathbf{x}}\ &F(\mathbf{x}) \\ &s.t.\ G(\mathbf{x}) = 0 \end{split} \end{equation}

where $F(\mathbf{x})$ and $G(\mathbf{x})$ are convex, $G(\mathbf{x})$ is differentiable, and $\min_{\mathbf{x}} G(\mathbf{x}) = 0$. An intuitive way to solve the above problem is to transform it into an unconstrained form:

\begin{equation} \begin{split} \argmin_{\mathbf{x}}\ F(\mathbf{x}) + \lambda G(\mathbf{x}) \end{split} \end{equation}

where $\lambda \rightarrow \infty$ enforces $G(\mathbf{x}) \approx 0$. However, for many problems a large $\lambda$ makes the problem numerically unstable, and it's also difficult to determine how "large" is large enough. The Bregman method turns Equation (2) into a sequence of unconstrained problems. Rather than choosing a large $\lambda$, the Bregman method approaches the equality constraint with arbitrary precision by solving subproblems iteratively, with a fixed and smaller $\lambda$.
Instead of solving Equation (3), the Bregman method solves the following problem:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ D_F^{\mathbf{p}_k}(\mathbf{x}, \mathbf{x}_k) + \lambda G(\mathbf{x})\\ &= \argmin_{\mathbf{x}}\ F(\mathbf{x}) - \mathbf{p}_k^T\mathbf{x} + \lambda G(\mathbf{x}) \end{split} \end{equation}

By the subgradient optimality condition, we know that:

\begin{equation} \begin{split} 0 \in \partial F(\mathbf{x}_{k+1}) - \mathbf{p}_k + \lambda \nabla G(\mathbf{x}_{k+1})\\ \end{split} \end{equation}

Let $\mathbf{p}_{k+1} \in \partial F(\mathbf{x}_{k+1})$; the Bregman iteration then looks like:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ F(\mathbf{x}) - \mathbf{p}_k^T\mathbf{x} + \lambda G(\mathbf{x})\\ \mathbf{p}_{k+1} &= \mathbf{p}_k - \lambda \nabla G(\mathbf{x}_{k+1}) \end{split} \end{equation}

where $\mathbf{x}_0 = \argmin_{\mathbf{x}} G(\mathbf{x})$ and $\mathbf{p}_0 = 0$. It can be proved that as $k \rightarrow \infty$, $G(\mathbf{x}_k) \rightarrow 0$, so the minimization problem in Equation (6) approximates the original problem in Equation (2). If $G(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x} -\mathbf{b} \|_2^2$, then $\nabla G(\mathbf{x}_{k+1}) = \mathbf{A}^T\left(\mathbf{A}\mathbf{x}_{k+1} - \mathbf{b}\right)$, and Equation (6) can be transformed into a simpler form:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ F(\mathbf{x}) + \frac{\lambda}{2} \|\mathbf{A}\mathbf{x} -\mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k + \mathbf{b} - \mathbf{A}\mathbf{x}_{k+1} \end{split} \end{equation}

which simply adds the error in the constraint back to the right-hand side. Equation (7) can be proved by merging $\mathbf{p}_k^T\mathbf{x}$ into $G(\mathbf{x})$ and substituting $\mathbf{p}_k$ with its explicit form $\mathbf{p}_k = \mathbf{p}_0 - \lambda \sum_{i=1}^{k} \nabla G(\mathbf{x}_i)$.
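As a quick illustration of Equation (7), here is a tiny sketch of the outer Bregman loop. The inner subproblem solver is left to the caller, since it depends on the choice of $F$; the function and parameter names are mine.

```python
import numpy as np

def bregman_iteration(A, b, solve_subproblem, n_outer=20):
    """Outer Bregman loop of Equation (7): add the constraint error back.

    solve_subproblem(bk) must return argmin_x F(x) + (lam/2)||A x - bk||_2^2;
    the choice of F and of the inner solver is up to the caller.
    """
    x = np.zeros(A.shape[1])
    bk = b.astype(float).copy()
    for _ in range(n_outer):
        x = solve_subproblem(bk)      # x_{k+1}
        bk = bk + b - A @ x           # b_{k+1} = b_k + (b - A x_{k+1})
    return x
```

For example, with the quadratic choice $F(\mathbf{x}) = \frac{\mu}{2}\|\mathbf{x}\|_2^2$ the subproblem has a closed form, and the outer loop progressively drives $\mathbf{A}\mathbf{x}_k$ toward $\mathbf{b}$, reducing the bias introduced by the penalty.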
Linearized Bregman Method

The Bregman method doesn't reduce the complexity of solving Equation (3), especially when $F(\mathbf{x})$ is not differentiable (e.g. the $l_1$ norm) and not separable (its elements are coupled with each other). Suppose $F(\mathbf{x})$ is separable; the problem would be easier to solve if we could also separate the elements in $G(\mathbf{x})$. That is what the linearized Bregman method does. It linearizes $G(\mathbf{x})$ with its first-order Taylor expansion at $\mathbf{x}_k$:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ D_F^{\mathbf{p}_k}(\mathbf{x}, \mathbf{x}_k)\\ &+ \lambda G(\mathbf{x}_k) + \lambda\nabla G(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + \frac{\lambda\mu}{2} \|\mathbf{x} -\mathbf{x}_k\|_2^2\\ &= \argmin_{\mathbf{x}}\ D_F^{\mathbf{p}_k}(\mathbf{x}, \mathbf{x}_k) \\&+ \frac{\lambda\mu}{2} \|\mathbf{x} -\left(\mathbf{x}_k - \frac{1}{\mu} \nabla G(\mathbf{x}_k) \right)\|_2^2\\ \end{split} \end{equation}

where $\frac{\lambda\mu}{2} \|\mathbf{x} -\mathbf{x}_k\|_2^2$ is a penalty term, since this approximation only works when $\mathbf{x}$ is near $\mathbf{x}_k$. Note that $\nabla G(\mathbf{x}_k)$ is merged into the $l_2$ norm in the same way as Equation (6) in the RPCA post. Treating the $l_2$ norm term as a new function, we can derive $\mathbf{p}_{k+1}$ from Equation (6):

\begin{equation} \begin{split} \mathbf{p}_{k+1} = \mathbf{p}_{k} -\lambda\mu (\mathbf{x}_{k+1} - \mathbf{x}_k + \frac{1}{\mu} \nabla G(\mathbf{x}_k)) \end{split} \end{equation}

Equation (8) is much easier to solve since all the elements of $\mathbf{x}$ are separable.

Split Bregman Method

The split Bregman method is used when $F(\mathbf{x})$ is not obviously separable. The key idea of the split Bregman method is to decouple the $l_1$ and $l_2$ terms through an equality constraint. Consider the following optimization problem:

\begin{equation} \begin{split} \argmin_{\mathbf{x}}\ \|F(\mathbf{x})\|_1 + \lambda G(\mathbf{x}) \end{split} \end{equation}

where $F(\mathbf{x})$ and $G(\mathbf{x})$ are both convex and differentiable. Let $\mathbf{d} = F(\mathbf{x})$; we then transform the original problem into a constrained problem:

\begin{equation} \begin{split} \argmin_{\mathbf{x}, \mathbf{d}}\ &\|\mathbf{d}\|_1 + \lambda G(\mathbf{x})\\ &s.t.\ F(\mathbf{x}) - \mathbf{d} = 0 \end{split} \end{equation}

which is a joint optimization over $\mathbf{x}$ and $\mathbf{d}$.
This can be further transformed into an unconstrained problem with a penalty, as in the Bregman method:

\begin{equation} \begin{split} \argmin_{\mathbf{x}, \mathbf{d}}\ &\|\mathbf{d}\|_1 + \lambda G(\mathbf{x}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}\|_2^2\\ \end{split} \end{equation}

Let $H(\mathbf{x}, \mathbf{d}) = \|\mathbf{d}\|_1 + \lambda G(\mathbf{x})$ and $A(\mathbf{x}, \mathbf{d}) = F(\mathbf{x}) - \mathbf{d}$ (both separable in $\mathbf{x}$ and $\mathbf{d}$); by Equation (6), we have:

\begin{equation} \begin{split} \mathbf{x}_{k+1},\mathbf{d}_{k+1} &= \argmin_{\mathbf{x}, \mathbf{d}}\ H(\mathbf{x}, \mathbf{d}) - \mathbf{p}_{\mathbf{x},k}^T\mathbf{x} - \mathbf{p}_{\mathbf{d},k}^T\mathbf{d} \\&+ \frac{\mu}{2}\|A(\mathbf{x}, \mathbf{d})\|_2^2\\ &= \argmin_{\mathbf{x}, \mathbf{d}}\ H(\mathbf{x}, \mathbf{d}) - \mathbf{p}_{\mathbf{x},k}^T\mathbf{x} - \mathbf{p}_{\mathbf{d},k}^T\mathbf{d} \\&+ \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}\|_2^2\\ \mathbf{p}_{\mathbf{x},k+1} &= \mathbf{p}_{\mathbf{x},k} - \mu \left(\nabla_{\mathbf{x}} A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\right)^T A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\\ &= \mathbf{p}_{\mathbf{x},k} - \mu \left(\nabla F(\mathbf{x}_{k+1})\right)^T(F(\mathbf{x}_{k+1}) - \mathbf{d}_{k+1})\\ \mathbf{p}_{\mathbf{d},k+1} &= \mathbf{p}_{\mathbf{d},k} - \mu \left(\nabla_{\mathbf{d}} A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\right)^T A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\\ &= \mathbf{p}_{\mathbf{d},k} - \mu (\mathbf{d}_{k+1}- F(\mathbf{x}_{k+1}))\\ \end{split} \end{equation}

If $F(\mathbf{x})$ is linear, the above iteration can be simplified as in Equation (7):

\begin{equation} \begin{split} \mathbf{x}_{k+1},\mathbf{d}_{k+1} &= \argmin_{\mathbf{x}, \mathbf{d}}\ H(\mathbf{x}, \mathbf{d}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - F(\mathbf{x}_{k+1}) + \mathbf{d}_{k+1} \end{split} \end{equation}

Equation (14) is still a joint minimization problem, which can be solved by minimizing with respect to $\mathbf{x}$ and $\mathbf{d}$ separately (alternating minimization or coordinate descent):

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ \lambda G(\mathbf{x}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|F(\mathbf{x}_{k+1}) - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - F(\mathbf{x}_{k+1}) + \mathbf{d}_{k+1} \end{split} \end{equation}

The first subproblem can be solved by setting the gradient to zero.
The second subproblem has an explicit solution via the soft-thresholding operator introduced in this post. Note that these two subproblems form just one iteration of the alternating minimization method. To achieve the same convergence rate as Equation (14), more iterations are required (these are called inner loops, while the Bregman iterations are the outer loops). For most applications, only one iteration is needed.
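For completeness, here is a minimal sketch of that soft-thresholding operator and of the closed-form $\mathbf{d}$-update it yields; the helper name is my own.

```python
import numpy as np

def soft_threshold(x, t):
    """S_t(x): the proximal operator of t*||.||_1, applied element-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# The d-subproblem of Equation (15),
#   argmin_d ||d||_1 + (mu/2) ||F(x_{k+1}) - d - b_k||_2^2,
# has the closed-form solution
#   d_{k+1} = soft_threshold(F(x_{k+1}) - b_k, 1.0 / mu)
```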
Total Variation Denoising Problem
Total variation (TV) denoising, also known as TV regularization or TV filtering, is a powerful technique widely used in various fields, including medical imaging and computer vision. It removes noise while preserving the most important structural features. The first image of a black hole, captured by the Event Horizon Telescope (EHT), was processed and revealed with this technique in 2019. The concept was proposed in 1992 by Rudin, Osher and Fatemi, and is known as the ROF model, which is a continuous form of the TV denoising problem.

What is Total Variation

The term total variation refers to a mathematical concept that is a little hard for me to understand rigorously. Non-rigorously, the TV of a function $u(x)$ is the integral of its derivative magnitude over its bounded domain $\Omega$:

\begin{equation} \begin{split} \|u\|_{TV} = \int_{\Omega} |\nabla u(x)| dx \end{split} \end{equation}

For practical use, a discretized version is more favorable. There are two common discretized TVs in the literature, the isotropic and anisotropic TVs. Suppose that $u(\mathcal{X})$ is a function of an order-$N$ tensor grid $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots I_N}$; the isotropic TV is defined as:

\begin{equation} \begin{split} \|u\|_{TV} = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} \sqrt{\left(\nabla_{I_1} u \right)_{i_1 i_2 \cdots i_N}^2 + \cdots + \left(\nabla_{I_N} u \right)_{i_1 i_2 \cdots i_N}^2} \end{split} \end{equation}

where $\nabla_{I_k} u$ is the derivative along the $I_k$ dimension. The isotropic TV is invariant to rotation of the domain; that is, if you rotate the image arbitrarily, the isotropic TV does not change. The anisotropic TV is defined as:

\begin{equation} \begin{split} \|u\|_{TV} = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} |\left(\nabla_{I_1} u \right)_{i_1 i_2 \cdots i_N}| + \cdots + |\left(\nabla_{I_N} u \right)_{i_1 i_2 \cdots i_N}| \end{split} \end{equation}

which is not rotation invariant. There is no difference between the isotropic and anisotropic TVs for 1D signals.

Discrete Derivatives

There are many ways to define derivatives for discretized signals. For better understanding, I will use a 1D signal $u(x)$ (and its discrete form $u_i, i=0,1,2,\cdots,N-1$) as an example to illustrate them. Let $\nabla_{x} u$ denote the derivative of $u(x)$ in the x direction, $(\nabla_{x}^+ u)_i = u_{i+1} - u_i$ denote the forward difference, and $(\nabla_{x}^- u)_i = u_{i} - u_{i-1}$ denote the backward difference.
A few definitions of the derivative are:

one-sided difference, $(\nabla_{x}u)^2_{i} = (\nabla_{x}^+ u)^2_i$

central difference, $(\nabla_{x}u)^2_{i} = (((\nabla_{x}^+ u)_i + (\nabla_{x}^- u)_i)/2)^2$

geometric average, $(\nabla_{x}u)^2_{i} = ((\nabla_{x}^+ u)^2_i + (\nabla_{x}^- u)^2_i)/2$

minmod difference, $(\nabla_{x}u)^2_{i} = m((\nabla_{x}^+ u)_i, (\nabla_{x}^- u)_i)$, where $m(a,b)=\left(\frac{sign(a) + sign(b)}{2}\right)\min(|a|, |b|)$

upwind discretization, $(\nabla_{x}u)^2_{i} = (\max((\nabla_{x}^+ u)_i, 0)^2+\max((\nabla_{x}^- u)_i, 0)^2)/2$

The one-sided difference (or forward difference) may be the most common way to discretize the derivative, but it's not symmetric. Although the central difference is symmetric, it is not able to reveal thin and small structures (consider a signal with a single non-zero point; the central difference at this point would be zero, which is counter-intuitive). The geometric average, minmod difference and upwind discretization are all symmetric and able to reveal thin structures, at the cost of nonlinearity. I will use the one-sided difference in the following content. Since we are dealing with finite signals, special handling is needed at the boundaries. I have seen two common ways to do that in the literature: one is reflective extension and the other is circulant extension. The reflective extension assumes that the signal extends in a reflective way, and the circulant extension assumes that it extends in a circulant way. For the reflective extension, the matricized differential operator and its adjoint are:

\begin{equation} \begin{split} \mathbf{D} &= \begin{bmatrix} -1&1&&&\\ &-1&1&&\\ & &\ddots&\ddots&\\ & & &-1&1\\ &&&&0 \end{bmatrix}\\ \mathbf{D}^{\ast} &= - \begin{bmatrix} 1&&&&\\ -1&1&&&\\ &\ddots&\ddots&&\\ & &-1&1&\\ &&&-1&0 \end{bmatrix} = \mathbf{D}^T\\ \end{split} \end{equation}

For the circulant extension, the matricized differential operator and its adjoint are:

\begin{equation} \begin{split} \mathbf{D} &= \begin{bmatrix} -1&1&&&\\ &-1&1&&\\ & &\ddots&\ddots&\\ & & &-1&1\\ 1&&&&-1 \end{bmatrix}\\ \mathbf{D}^{\ast} &= - \begin{bmatrix} 1&&&&-1\\ -1&1&&&\\ &\ddots&\ddots&&\\ & &-1&1&\\ &&&-1&1 \end{bmatrix} = \mathbf{D}^T\\ \end{split} \end{equation}

As we can see, the adjoints of the differential operators are negative backward differences. In practice, we don't need to construct explicit matrices for these operators as long as we know how to perform the corresponding operations.
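As an illustration, here is a minimal matrix-free NumPy sketch of the 1D forward difference and its adjoint for both extensions; the function names are mine.

```python
import numpy as np

def grad_1d(u, boundary="reflective"):
    """Forward difference D u for a 1D signal, matrix-free."""
    d = np.roll(u, -1) - u        # d_i = u_{i+1} - u_i, wrapping at the end
    if boundary == "reflective":
        d[-1] = 0.0               # reflective extension: last row of D in Eq. (4) is zero
    return d                      # circulant extension keeps the wrapped last entry, Eq. (5)

def grad_adjoint_1d(d, boundary="reflective"):
    """Adjoint D^T d, i.e. the negative backward difference, matrix-free."""
    if boundary == "reflective":
        d = d.copy()
        d[-1] = 0.0               # the last entry never contributes for Eq. (4)
    return -(d - np.roll(d, 1))   # -(d_i - d_{i-1}) with wrap-around
```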
Total Variation Denoising

A general TV denoising problem minimizes the following objective function:

\begin{equation} \begin{split} \argmin_{u} \|u\|_{TV} + \frac{\lambda}{2} \|A(u) - v\|^2 \end{split} \end{equation}

where $u, v$ are functions of a given tensor grid $\mathcal{X}$, $A$ is a linear operator, and $\lambda$ controls how much smoothing is performed.

1D TV

Let's make Equation (6) more concise for the 1D case:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^N} \|\mathbf{D} \mathbf{u} \|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

where $\mathbf{A} \in \mathbb{R}^{M \times N}$ and $\mathbf{D}$ is the differential operator defined above.

Iterative Clipping

The iterative clipping method solves the 1D TV denoising problem through its dual form. A useful fact is that the absolute value $|x|$ can be written as an optimization problem, and likewise for the $l_1$ norm:

\begin{equation} \begin{split} |x| &= \max_{|z| \le 1} zx \\ \|\mathbf{x}\|_1 &= \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{x} \end{split} \end{equation}

where $|\mathbf{z}| \le 1$ means that each element of $\mathbf{z}$ has absolute value at most 1. Let $F(\mathbf{u}) = \|\mathbf{D} \mathbf{u} \|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2$; the objective function can be written as:

\begin{equation} \begin{split} F(\mathbf{u}) &= \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2\\ \end{split} \end{equation}

and thus Equation (7) can be written as:

\begin{equation} \begin{split} \arg &\min_{\mathbf{u}} \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2}\|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2\\ \end{split} \end{equation}

The minimax theorem states that if the function $f(x, y)$ is concave in $x$ and convex in $y$, then the following equality holds:

\begin{equation} \begin{split} \max_{x} \min_{y} f(x, y) = \min_{y} \max_{x} f(x, y) \end{split} \end{equation}

so we can change the order of min and max in Equation (10):

\begin{equation} \begin{split} \arg \max_{|\mathbf{z}| \le 1} \min_{\mathbf{u}} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2}\|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2\\ \end{split} \end{equation}

which is a dual form of the original problem. The solution of the inner minimization problem is:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\left(\mathbf{A}^T\mathbf{v} - \frac{1}{\lambda}\mathbf{D}^T\mathbf{z}_k\right) \end{split} \end{equation}

Substituting Equation (13) back into Equation (12) gives:

\begin{equation} \begin{split} \arg \min_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\mathbf{z} - 2\lambda \left(\mathbf{D}\mathbf{A}^{\dag}\mathbf{v}\right)^T\mathbf{z} \end{split} \end{equation}

There are many ways to solve this quadratic form; one is the majorization-minimization (MM) method. Given a function $F(x)$, the MM method chooses an auxiliary function $G_k(x)$ such that $G_k(x) \ge F(x)$ and $G_k(x_k) = F(x_k)$, then solves $x_{k+1}=\argmin_{x} G_k(x)$. The sequence $x_k$ converges to the minimizer of $F(x)$ when $F(x)$ is convex.
We construct such a function $G_k(\mathbf{z})$ by adding $\left(\mathbf{z} - \mathbf{z}_k\right)^T\left(\alpha\mathbf{I} - \mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)(\mathbf{z} - \mathbf{z}_k)$ to Equation (14):

\begin{equation} \begin{split} G_k(\mathbf{z}) &= \mathbf{z}^T\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\mathbf{z} - 2\lambda \left(\mathbf{D}\mathbf{A}^{\dag}\mathbf{v}\right)^T\mathbf{z}\\ &+\left(\mathbf{z} - \mathbf{z}_k\right)^T\left(\alpha\mathbf{I} - \mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)(\mathbf{z} - \mathbf{z}_k) \end{split} \end{equation}

where $\alpha \ge \lambda_0(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T)$ (the largest eigenvalue) so that the added term is always positive-semidefinite. Simplifying Equation (15) gives:

\begin{equation} \begin{split} \arg \min_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{z} - 2\left(\mathbf{z}_k + \frac{\lambda}{\alpha} \mathbf{D} \mathbf{u}_{k+1}\right)^T\mathbf{z} \end{split} \end{equation}

which is a separable quadratic optimization problem in each element of $\mathbf{z}$. The solution of the above problem is:

\begin{equation} \begin{split} \mathbf{z}_{k+1} &= clip(\mathbf{z}_k + \frac{\lambda}{\alpha} \mathbf{D}\mathbf{u}_{k+1}, 1) \end{split} \end{equation}

where $clip(x, T)$ clips the input $x$:

\begin{equation} \begin{split} clip(x, T) = \begin{cases} x & |x| \le T\\ sign(x)T & otherwise\\ \end{cases} \end{split} \end{equation}

We can scale $\mathbf{z}$ by $\frac{1}{\lambda}$; the iterative clipping method then iterates as follows:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\left(\mathbf{A}^T\mathbf{v} - \mathbf{D}^T\mathbf{z}_k\right)\\ \mathbf{z}_{k+1} &= clip(\mathbf{z}_k + \frac{1}{\alpha} \mathbf{D}\mathbf{u}_{k+1}, \frac{1}{\lambda})\\ \end{split} \end{equation}

where $\alpha \ge \lambda_0(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T)$. It turns out that the method can be accelerated with a smaller $\alpha$ by the contraction mapping principle (basically, a contraction mapping has a fixed point such that $f(x)=x$). To make the $\mathbf{z}$-update a contraction, we need $\mathbf{I} - \frac{1}{\alpha}\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T$ to be a contraction, which suggests $\alpha \gt \lambda_0(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T)/2$, half the original value. If $\mathbf{A} = \mathbf{I}$, Equation (7) becomes the plain TV denoising problem. It turns out that $\lambda_0(\mathbf{D}\mathbf{D}^T)$ is less than 4 regardless of $N$ for both definitions in Equations (4) and (5), so $\alpha = 2.3$ is an appropriate choice for $\mathbf{A} = \mathbf{I}$.
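Here is a small NumPy sketch of the iteration in Equation (19) for the simple case $\mathbf{A} = \mathbf{I}$. The default $\alpha = 2.3$ follows the discussion above; the explicit difference matrix and the iteration count are illustrative choices of mine.

```python
import numpy as np

def tv_iterative_clipping(v, lam, alpha=2.3, n_iter=200):
    """Iterative clipping for 1D TV denoising with A = I (a sketch of Eq. (19)).

    Solves min_u ||D u||_1 + (lam/2)||u - v||_2^2 with D the forward difference.
    alpha = 2.3 follows the text, since the largest eigenvalue of D D^T is < 4.
    """
    N = len(v)
    D = np.diff(np.eye(N), axis=0)      # (N-1) x N forward-difference matrix
    z = np.zeros(N - 1)
    u = v.copy()
    for _ in range(n_iter):
        u = v - D.T @ z                 # u-update: (A^T A)^+ (A^T v - D^T z) with A = I
        z = np.clip(z + (D @ u) / alpha, -1.0 / lam, 1.0 / lam)  # clip to [-1/lam, 1/lam]
    return u
```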
Majorization Minimization

We can also derive an algorithm with the MM method directly. Given $F(x) = |x|$ for a scalar $x$, the function $G(x) = \frac{1}{2|x_k|}x^2 + \frac{1}{2}|x_k|$ satisfies $G(x) \ge F(x)$ and $G(x_k) = F(x_k)$. For the $l_1$ norm, $G(\mathbf{x})$ is:

\begin{equation} \begin{split} G(\mathbf{x}) &= \frac{1}{2} \mathbf{x}^T\mathbf{\Lambda}_k^{-1}\mathbf{x} + \frac{1}{2}\|\mathbf{x}_k\|_1 \end{split} \end{equation}

where $\mathbf{\Lambda}_k=diag(|\mathbf{x}_k|)$. Applying this to the term $\|\mathbf{D}\mathbf{u}\|_1$, a majorizer of the TV cost function in Equation (7) is:

\begin{equation} \begin{split} \argmin_{\mathbf{u}} \frac{1}{2} (\mathbf{D}\mathbf{u})^T\mathbf{\Lambda}_k^{-1}(\mathbf{D}\mathbf{u}) + \frac{1}{2}\|\mathbf{D}\mathbf{u}_k\|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

where now $\mathbf{\Lambda}_k=diag(|\mathbf{D}\mathbf{u}_k|)$. Since the $l_1$ term above is now a constant, it's easy to derive an explicit solution:

\begin{equation} \begin{split} \mathbf{u}_{k+1} = \left(\mathbf{A}^T\mathbf{A} + \frac{1}{\lambda} \mathbf{D}^T\mathbf{\Lambda}_k^{-1}\mathbf{D}\right)^{\dag} \mathbf{A}^T \mathbf{v} \end{split} \end{equation}

A problem with this iteration is that as the iterations progress, some values of $\mathbf{\Lambda}_k$ go to zero, causing division-by-zero errors. This issue can be solved with the matrix inverse lemma:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\left(\mathbf{A}^T\mathbf{A}\right)^{\dag} - \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \right)^{\dag} \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \right) \mathbf{A}^T \mathbf{v}\\ &= \mathbf{A}^{\dag}\mathbf{v} - \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T \left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \right)^{\dag} \mathbf{D} \mathbf{A}^{\dag} \mathbf{v} \end{split} \end{equation}

The complexity of the algorithm depends on how quickly we can solve the linear equations $\left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T\right) \mathbf{x} = \mathbf{D} \mathbf{A}^{\dag} \mathbf{v}$. If $\mathbf{A} = \mathbf{I}$, then $\lambda \mathbf{\Lambda}_k + \mathbf{D}\mathbf{D}^T$ is a banded matrix, which can be solved quickly. The matrix inverse lemma is:

\begin{equation} \begin{split} (\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}(\mathbf{C}^{-1} + \mathbf{D}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{D}\mathbf{A}^{-1} \end{split} \end{equation}
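A minimal sketch of this MM iteration for $\mathbf{A} = \mathbf{I}$, using the matrix-inverse-lemma form so that zeros in $\mathbf{\Lambda}_k$ are harmless; a dense solve is used only to keep the demo short, whereas a banded solver would exploit the structure mentioned above.

```python
import numpy as np

def tv_mm_denoise(v, lam, n_iter=50):
    """MM iteration for 1D TV denoising with A = I (a sketch of Eq. (24)).

    Each step solves u = v - D^T (lam*Lambda_k + D D^T)^{-1} D v with
    Lambda_k = diag(|D u_k|); thanks to the matrix inverse lemma, zeros in
    Lambda_k cause no division-by-zero.
    """
    N = len(v)
    D = np.diff(np.eye(N), axis=0)      # (N-1) x N first-difference matrix
    DDt = D @ D.T                       # tridiagonal (banded) matrix
    u = v.copy()
    for _ in range(n_iter):
        Lam = np.diag(np.abs(D @ u))
        u = v - D.T @ np.linalg.solve(lam * Lam + DDt, D @ v)
    return u
```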
Split Bregman

I wrote more about the split Bregman method in another post; here I briefly give the method. To solve the following problem:

\begin{equation} \begin{split} \argmin_{\mathbf{x}}\ \|F(\mathbf{x})\|_1 + \lambda G(\mathbf{x}) \end{split} \end{equation}

where $F(\mathbf{x})$ and $G(\mathbf{x})$ are both convex and differentiable, and $F(\mathbf{x})$ is also linear, we set $\mathbf{d} = F(\mathbf{x})$; the split Bregman method then has the following iterations:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ \lambda G(\mathbf{x}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|F(\mathbf{x}_{k+1}) - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - F(\mathbf{x}_{k+1}) + \mathbf{d}_{k+1} \end{split} \end{equation}

Now back to our 1D TV denoising problem in Equation (7): let $\mathbf{d}=\mathbf{D}\mathbf{u}$ and apply the above split Bregman method; we have:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}}\ \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 + \frac{\mu}{2}\|\mathbf{D}\mathbf{u} - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|\mathbf{D}\mathbf{u}_{k+1} - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - \mathbf{D}\mathbf{u}_{k+1} + \mathbf{d}_{k+1} \end{split} \end{equation}

The first subproblem can be solved by setting the derivative to zero and solving the linear equations:

\begin{equation} \begin{split} \left(\lambda \mathbf{A}^T\mathbf{A} + \mu \mathbf{D}^T\mathbf{D}\right)\mathbf{u}_{k+1} = \lambda \mathbf{A}^T\mathbf{v} + \mu \mathbf{D}^T(\mathbf{d}_k + \mathbf{b}_k) \end{split} \end{equation}

and the second subproblem has an explicit solution:

\begin{equation} \begin{split} \mathbf{d}_{k+1} = S_{1/\mu}[\mathbf{D}\mathbf{u}_{k+1} - \mathbf{b}_k] \end{split} \end{equation}

where $S_t(\cdot)$ is the soft-thresholding operator introduced in this post.
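And a small sketch of this split Bregman iteration for the denoising case $\mathbf{A} = \mathbf{I}$; the explicit difference matrix, the default $\mu$, and the iteration count are illustrative choices of mine.

```python
import numpy as np

def soft(x, t):
    """Element-wise soft-thresholding S_t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def tv_split_bregman(v, lam, mu=1.0, n_iter=100):
    """Split Bregman for 1D TV denoising with A = I (a sketch of Eq. (28)).

    Solves min_u ||D u||_1 + (lam/2)||u - v||_2^2, D = forward difference.
    """
    N = len(v)
    D = np.diff(np.eye(N), axis=0)           # (N-1) x N difference matrix
    lhs = lam * np.eye(N) + mu * D.T @ D     # system matrix of the u-update
    d = np.zeros(N - 1)
    b = np.zeros(N - 1)
    u = v.copy()
    for _ in range(n_iter):
        u = np.linalg.solve(lhs, lam * v + mu * D.T @ (d + b))
        d = soft(D @ u - b, 1.0 / mu)        # shrinkage for the d-subproblem
        b = b - D @ u + d                    # Bregman update
    return u
```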
ADMM

The Alternating Direction Method of Multipliers (ADMM) is probably the most widely used algorithm for solving l1-regularized problems, especially in MRI. For a separable problem with equality constraints:

\begin{equation} \begin{split} \argmin_{\mathbf{x}, \mathbf{y}}&\ F(\mathbf{x}) + G(\mathbf{y})\\ &s.t.\ \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} = \mathbf{c} \end{split} \end{equation}

with its augmented Lagrangian form $L_{\rho}(\mathbf{x}, \mathbf{y}, \mathbf{v})= F(\mathbf{x}) + G(\mathbf{y}) + \mathbf{v}^T(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} - \mathbf{c}) + \frac{\rho}{2} \|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} - \mathbf{c}\|_2^2$, the ADMM iteration looks like:

\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}} L_{\rho}(\mathbf{x}, \mathbf{y}_k, \mathbf{v}_k)\\ \mathbf{y}_{k+1} &= \argmin_{\mathbf{y}} L_{\rho}(\mathbf{x}_{k+1}, \mathbf{y}, \mathbf{v}_k)\\ \mathbf{v}_{k+1} &= \mathbf{v}_k + \rho (\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{y}_{k+1} - \mathbf{c}) \end{split} \end{equation}

Let $\mathbf{d}=\mathbf{D}\mathbf{u}$; our 1D TV denoising problem then falls within the ADMM form, so we have the following iteration:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} L_{\rho}(\mathbf{u}, \mathbf{d}_k, \mathbf{y}_k)\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}} L_{\rho}(\mathbf{u}_{k+1}, \mathbf{d}, \mathbf{y}_k)\\ \mathbf{y}_{k+1} &= \mathbf{y}_k + \rho (\mathbf{D}\mathbf{u}_{k+1} -\mathbf{d}_{k+1}) \end{split} \end{equation}

where $L_{\rho}(\mathbf{u}, \mathbf{d}, \mathbf{y})$ is:

\begin{equation} \begin{split} L_{\rho}(\mathbf{u}, \mathbf{d}, \mathbf{y}) &= \|\mathbf{d}\|_1 + \frac{\lambda}{2} \|\mathbf{A}\mathbf{u} - \mathbf{v}\|_2^2 \\&+ \mathbf{y}^T(\mathbf{D}\mathbf{u} - \mathbf{d}) + \frac{\rho}{2} \|\mathbf{D}\mathbf{u} - \mathbf{d} \|_2^2 \end{split} \end{equation}

The first subproblem can be solved by setting the derivative to zero:

\begin{equation} \begin{split} (\lambda\mathbf{A}^T\mathbf{A} + \rho \mathbf{D}^T\mathbf{D})\mathbf{u}_{k+1} &= \lambda\mathbf{A}^T\mathbf{v} + \mathbf{D}^T(\rho \mathbf{d}_k - \mathbf{y}_k) \end{split} \end{equation}

and the solution of the second subproblem is:

\begin{equation} \begin{split} \mathbf{d}_{k+1} = S_{1/\rho}[\mathbf{D}\mathbf{u}_{k+1} + \frac{1}{\rho} \mathbf{y}_k] \end{split} \end{equation}

2D TV

As we saw in a previous section, there are two types of TV for high-dimensional data. Equation (6) for the 2D case with the isotropic TV looks like:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} \sum_{i} \sum_{j} \sqrt{\left(\nabla_{I} \mathbf{u} \right)_{i,j}^2 + \left(\nabla_{J} \mathbf{u} \right)_{i,j}^2} + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

where $A(\cdot)\colon \mathbb{R}^{MN} \longmapsto \mathbb{R}^{K}$ is a linear operator acting on the vectorized image $\mathbf{u}$ and $\mathbf{v} \in \mathbb{R}^{K}$ is the measured signal.
With the anisotropic TV, Equation (6) looks like:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} \sum_{i} \sum_{j} \left(|\nabla_{I} \mathbf{u} |_{i,j} + |\nabla_{J} \mathbf{u} |_{i,j}\right) + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation}

Note that the differential operator $\nabla_{I}$ acts on vectorized matrices and returns two-dimensional matrices. It's easier to derive gradients with vectorized matrices rather than the matrices themselves. Keep in mind that we don't actually construct explicit differential matrices for $\nabla_{I}$ and its adjoint. For example, we can reshape vectors into matrices and perform forward difference operations along the corresponding dimension. For $\nabla_{I}^*$, which acts on matrices and returns vectorized matrices, we can perform backward difference operations along the corresponding dimension, then negate all elements, and finally reshape the matrices into vectors.

Split Bregman

To apply the split Bregman method to Equation (36), we simplify the TV term by setting $\mathbf{D}_I = \nabla_{I} \mathbf{u}$ and $\mathbf{D}_J = \nabla_{J} \mathbf{u}$:

\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} &\ \sum_{i} \sum_{j} \sqrt{\left(\mathbf{D}_{I} \right)_{i,j}^2 + \left(\mathbf{D}_{J} \right)_{i,j}^2} + \frac{\lambda}{2} \|\mathbf{A}\mathbf{u} - \mathbf{v}\|_2^2\\ s.t &\ \mathbf{D}_I = \nabla_{I} \mathbf{u}\\ &\ \mathbf{D}_J = \nabla_{J} \mathbf{u} \end{split} \end{equation}

The iteration looks like:

\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 + \frac{\mu}{2} \|\begin{bmatrix}\mathbf{D}_{I,k}\\ \mathbf{D}_{J,k}\end{bmatrix} - \begin{bmatrix}\nabla_I \mathbf{u} \\ \nabla_J \mathbf{u}\end{bmatrix} - \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix}\|_F^2\\ \mathbf{D}_{I, k+1}, \mathbf{D}_{J, k+1} &= \argmin_{\mathbf{D}_{I}, \mathbf{D}_{J}} \sum_{i} \sum_{j} \sqrt{\left(\mathbf{D}_{I} \right)_{i,j}^2 + \left(\mathbf{D}_{J} \right)_{i,j}^2} + \frac{\mu}{2} \|\begin{bmatrix}\mathbf{D}_{I}\\ \mathbf{D}_{J}\end{bmatrix} - \begin{bmatrix}\nabla_I \mathbf{u}_{k+1} \\ \nabla_J \mathbf{u}_{k+1}\end{bmatrix} - \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix}\|_F^2\\ \begin{bmatrix}\mathbf{B}_{I,k+1} \\ \mathbf{B}_{J,k+1} \end{bmatrix} &= \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix} + \begin{bmatrix}\nabla_I \mathbf{u}_{k+1} \\ \nabla_J \mathbf{u}_{k+1}\end{bmatrix} - \begin{bmatrix}\mathbf{D}_{I,k+1}\\ \mathbf{D}_{J,k+1}\end{bmatrix} \end{split} \end{equation}

The first subproblem can be solved by setting its derivative to zero:

\begin{equation} \begin{split} \left(\lambda \mathbf{A}^*\mathbf{A} + \mu\left( \nabla_I^*\nabla_I + \nabla_J^*\nabla_J\right)\right)\mathbf{u} &= \lambda\mathbf{A}^*\mathbf{v} + \mu \left(\nabla_I^*\mathbf{Z}_{I,k} + \nabla_J^*\mathbf{Z}_{J,k}\right)\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{B}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{B}_{J,k}\\ \end{split} \end{equation}
The second subproblem can be solved by letting $\mathbf{w}_{i,j} = \begin{bmatrix}(\mathbf{D}_{I})_{i,j}\\ (\mathbf{D}_{J})_{i,j}\end{bmatrix}$; the subproblem then transforms into:

\begin{equation} \begin{split} \argmin_{all\ \mathbf{w}_{i,j}s} &\ \sum_{i} \sum_{j} \|\mathbf{w}_{i,j}\|_2 + \frac{\mu}{2} \sum_{i} \sum_{j} \|\mathbf{w}_{i,j} - \mathbf{y}_{i,j}\|_2^2\\ &\ \mathbf{y}_{i,j} = \begin{bmatrix} (\nabla_I \mathbf{u}_{k+1}+\mathbf{B}_{I,k})_{i,j} \\ (\nabla_J \mathbf{u}_{k+1}+\mathbf{B}_{J,k})_{i,j}\end{bmatrix} \end{split} \end{equation}

We should know that the $l_2$-norm proximal problem:

\begin{equation} \begin{split} \argmin_{\mathbf{x}} &\ \|\mathbf{x}\|_2 + \frac{1}{2t} \|\mathbf{x} - \mathbf{y}\|_2^2 \end{split} \end{equation}

has an explicit solution known as vectorial shrinkage:

\begin{equation} \begin{split} \mathbf{x}^* = S_t[\|\mathbf{y}\|_2] \frac{\mathbf{y}}{\|\mathbf{y}\|_2} \end{split} \end{equation}

Equation (41) is obviously separable over all the $\mathbf{w}_{i,j}$s, and each separable subproblem has the solution listed above:

\begin{equation} \begin{split} \mathbf{w}_{i,j}^* &= \begin{bmatrix}(\mathbf{D}_{I,k+1})_{i,j}\\ (\mathbf{D}_{J,k+1})_{i,j}\end{bmatrix}\\ &= S_{1/\mu}[\|\mathbf{y}_{i,j}\|_2] \frac{\mathbf{y}_{i,j}}{\|\mathbf{y}_{i,j}\|_2} \end{split} \end{equation}

To apply the split Bregman method to Equation (37), again letting $\mathbf{D}_I = \nabla_{I} \mathbf{u}$ and $\mathbf{D}_J = \nabla_{J} \mathbf{u}$, the only difference is the second subproblem:

\begin{equation} \begin{split} \mathbf{D}_{I, k+1}, \mathbf{D}_{J, k+1} = \argmin_{\mathbf{D}_{I}, \mathbf{D}_{J}} &\|vec(\mathbf{D}_I)\|_1 + \|vec(\mathbf{D}_J)\|_1\\ &+ \frac{\mu}{2} \|vec(\mathbf{D}_I -\nabla_I\mathbf{u}_{k+1} - \mathbf{B}_{I,k})\|_2^2\\ &+ \frac{\mu}{2} \|vec(\mathbf{D}_J -\nabla_J\mathbf{u}_{k+1} - \mathbf{B}_{J,k})\|_2^2 \end{split} \end{equation}

which can be solved with soft-thresholding:

\begin{equation} \begin{split} \mathbf{D}_{I, k+1} &= S_{1/\mu}[\nabla_I \mathbf{u}_{k+1} + \mathbf{B}_{I, k}]\\ \mathbf{D}_{J, k+1} &= S_{1/\mu}[\nabla_J \mathbf{u}_{k+1} + \mathbf{B}_{J, k}]\\ \end{split} \end{equation}
\begin{split} L_{\rho}(\mathbf{u}, \mathbf{D}_I,\mathbf{D}_J, \mathbf{Y}_I, \mathbf{Y}_J) = &\sum_i\sum_j \sqrt{(\mathbf{D}_I)_{i,j}^2 + (\mathbf{D}_J)_{i,j}^2} + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2\\ &+ \frac{\rho}{2} \|\nabla_I\mathbf{u} - \mathbf{D}_{I,k} + \mathbf{Y}_{I,k}\|_F^2\\ &+ \frac{\rho}{2} \|\nabla_J\mathbf{u} - \mathbf{D}_{J,k} + \mathbf{Y}_{J,k}\|_F^2 \end{split} \end{equation} Lρ(u,DI,DJ,YI,YJ)=i∑j∑(DI)i,j2+(DJ)i,j2+2λ∥Au−v∥22+2ρ∥∇Iu−DI,k+YI,k∥F2+2ρ∥∇Ju−DJ,k+YJ,k∥F2 The ADMM iteration is like: uk+1=arg minuλ2∥Au−v∥22+ρ2∥∇Iu−DI,k+YI,k∥F2+ρ2∥∇Ju−DJ,k+YJ,k∥F2DI,k+1,DJ,k+1=arg minDI,DJ∑i∑j(DI)i,j2+(DJ)i,j2+ρ2∥∇Iuk+1−DI+YI,k∥F2+ρ2∥∇Juk+1−DJ+YJ,k∥F2YI,k+1=YI,k+(∇Iuk+1−DI,k+1)YJ,k+1=YJ,k+(∇Juk+1−DJ,k+1)\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 + \frac{\rho}{2} \|\nabla_I\mathbf{u} - \mathbf{D}_{I,k} + \mathbf{Y}_{I,k}\|_F^2 + \frac{\rho}{2} \|\nabla_J\mathbf{u} - \mathbf{D}_{J,k} + \mathbf{Y}_{J,k}\|_F^2\\ \mathbf{D}_{I,k+1}, \mathbf{D}_{J,k+1} &= \argmin_{\mathbf{D}_I, \mathbf{D}_J} \sum_i\sum_j \sqrt{(\mathbf{D}_I)_{i,j}^2 + (\mathbf{D}_J)_{i,j}^2} + \frac{\rho}{2} \|\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I} + \mathbf{Y}_{I,k}\|_F^2 + \frac{\rho}{2} \|\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J} + \mathbf{Y}_{J,k}\|_F^2\\ \mathbf{Y}_{I,k+1} &= \mathbf{Y}_{I,k} + (\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I, k+1})\\ \mathbf{Y}_{J,k+1} &= \mathbf{Y}_{J,k} + (\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J, k+1}) \end{split} \end{equation} uk+1DI,k+1,DJ,k+1YI,k+1YJ,k+1=uargmin2λ∥Au−v∥22+2ρ∥∇Iu−DI,k+YI,k∥F2+2ρ∥∇Ju−DJ,k+YJ,k∥F2=DI,DJargmini∑j∑(DI)i,j2+(DJ)i,j2+2ρ∥∇Iuk+1−DI+YI,k∥F2+2ρ∥∇Juk+1−DJ+YJ,k∥F2=YI,k+(∇Iuk+1−DI,k+1)=YJ,k+(∇Juk+1−DJ,k+1) The solution of the first subproblem is: (λA∗A+ρ(∇I∗∇I+∇J∗∇J))u=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)ZI,k=DI,k−YI,kZJ,k=DJ,k−YJ,k\begin{equation} \begin{split} (\lambda \mathbf{A}^*\mathbf{A} + \rho (\nabla_I^*\nabla_I + \nabla_J^*\nabla_J))\mathbf{u} &= \lambda \mathbf{A}^*\mathbf{v} + \rho (\nabla_I^* \mathbf{Z}_{I,k} + \nabla_J^*\mathbf{Z}_{J,k})\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{Y}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{Y}_{J,k} \end{split} \end{equation} (λA∗A+ρ(∇I∗∇I+∇J∗∇J))uZI,kZJ,k=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)=DI,k−YI,k=DJ,k−YJ,k The second subproblem can be solved as the same process in split Bregman by letting wi,j=[(DI)i,j(DJ)i,j]\mathbf{w}_{i,j} = \begin{bmatrix}(\mathbf{D}_{I})_{i,j}\\ (\mathbf{D}_{J})_{i,j}\end{bmatrix}wi,j=[(DI)i,j(DJ)i,j]: wi,j∗=[(DI,k+1)i,j(DJ,k+1)i,j]=S1/ρ[∥yi,j∥2]yi,j∥yi,j∥2yi,j=[(∇Iuk+1+YI,k)i,j(∇Juk+1+YJ,k)i,j]\begin{equation} \begin{split} \mathbf{w}_{i,j}^* &= \begin{bmatrix}(\mathbf{D}_{I,k+1})_{i,j}\\ (\mathbf{D}_{J,k+1})_{i,j}\end{bmatrix}\\ &= S_{1/\rho}[\|\mathbf{y}_{i,j}\|_2] \frac{\mathbf{y}_{i,j}}{\|\mathbf{y}_{i,j}\|_2}\\ \mathbf{y}_{i,j} &= \begin{bmatrix} (\nabla_I \mathbf{u}_{k+1}+\mathbf{Y}_{I,k})_{i,j} \\ (\nabla_J \mathbf{u}_{k+1}+\mathbf{Y}_{J,k})_{i,j}\end{bmatrix} \end{split} \end{equation} wi,j∗yi,j=[(DI,k+1)i,j(DJ,k+1)i,j]=S1/ρ[∥yi,j∥2]∥yi,j∥2yi,j=[(∇Iuk+1+YI,k)i,j(∇Juk+1+YJ,k)i,j] The ADMM iteration to equation (37) is like: (λA∗A+ρ(∇I∗∇I+∇J∗∇J))u=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)ZI,k=DI,k−YI,kZJ,k=DJ,k−YJ,kDI,k+1=S1/ρ[∇Iuk+1+YI,k]DJ,k+1=S1/ρ[∇Juk+1+YJ,k]YI,k+1=YI,k+(∇Iuk+1−DI,k+1)YJ,k+1=YJ,k+(∇Juk+1−DJ,k+1)\begin{equation} \begin{split} (\lambda \mathbf{A}^*\mathbf{A} + \rho (\nabla_I^*\nabla_I + \nabla_J^*\nabla_J))\mathbf{u} &= \lambda \mathbf{A}^*\mathbf{v} + \rho (\nabla_I^* \mathbf{Z}_{I,k} + 
\nabla_J^*\mathbf{Z}_{J,k})\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{Y}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{Y}_{J,k}\\ \mathbf{D}_{I,k+1} &= S_{1/\rho}[\nabla_I\mathbf{u}_{k+1} + \mathbf{Y}_{I,k}]\\ \mathbf{D}_{J,k+1} &= S_{1/\rho}[\nabla_J\mathbf{u}_{k+1} + \mathbf{Y}_{J,k}]\\ \mathbf{Y}_{I,k+1} &= \mathbf{Y}_{I,k} + (\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I, k+1})\\ \mathbf{Y}_{J,k+1} &= \mathbf{Y}_{J,k} + (\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J, k+1}) \end{split} \end{equation} (λA∗A+ρ(∇I∗∇I+∇J∗∇J))uZI,kZJ,kDI,k+1DJ,k+1YI,k+1YJ,k+1=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)=DI,k−YI,k=DJ,k−YJ,k=S1/ρ[∇Iuk+1+YI,k]=S1/ρ[∇Juk+1+YJ,k]=YI,k+(∇Iuk+1−DI,k+1)=YJ,k+(∇Juk+1−DJ,k+1)
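For reference, the two D-subproblem updates above amount to just a few lines of code. Below is a minimal NumPy sketch (assuming real-valued arrays; the helper and variable names are my own) of the scalar soft-thresholding used in the anisotropic case and the vectorial shrinkage used in the isotropic case.

```python
import numpy as np

def soft_threshold(y, t):
    """Elementwise soft-thresholding S_t[y]: the D-update of the anisotropic problem."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def vectorial_shrinkage(y_i, y_j, t):
    """Pixelwise vectorial shrinkage: the D-update of the isotropic problem."""
    norm = np.sqrt(y_i**2 + y_j**2)
    scale = np.maximum(norm - t, 0.0) / np.maximum(norm, 1e-12)  # guard against division by zero
    return scale * y_i, scale * y_j

# e.g. (hypothetical names) D_I, D_J = vectorial_shrinkage(grad_I_u + Y_I, grad_J_u + Y_J, 1.0 / rho)
```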
Tensor Decompositions
This post contains some userful tensor notation and tricks which I have seen and collected. The majority of the content is from Tamara Kolda and Brett Bader’s review, Tensor Decompositions and Applications. What is A Tensor The mathematical definition of tensor is hard to understand for me. I would prefer viewing a tensor as a multi-dimensional array. An order-NNN tensor X∈RI1×I2×⋯IN\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots I_N}X∈RI1×I2×⋯IN is a NNN dimensional array and IkI_kIk is the degree of the kkk-th dimension. So matrix are order-2 tensors and vectors are order-1 tensors. Elements in tensor X\mathcal{X}X are denoted as xi1,i2,⋯ ,iNx_{i_1,i_2,\cdots,i_N}xi1,i2,⋯,iN, where iki_kik is the index running from 1 to IkI_kIk. Norm, Inner Product, and Rank-One The inner product of two same-sized tensors X,Y∈RI1×I2×⋯IN\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots I_N}X,Y∈RI1×I2×⋯IN is the sum of the product of their elements: ⟨X,Y⟩=∑i1∑i2⋯∑iNxi1,i2,⋯ ,iNyi1,i2,⋯ ,iN\begin{equation} \begin{split} \langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} x_{i_1,i_2,\cdots,i_N} y_{i_1,i_2,\cdots,i_N} \end{split} \end{equation} ⟨X,Y⟩=i1∑i2∑⋯iN∑xi1,i2,⋯,iNyi1,i2,⋯,iN By this definition, the norm of a tensor X\mathcal{X}X is the square root of the inner product of X\mathcal{X}X and itself: ∥X∥=⟨X,X⟩\begin{equation} \begin{split} \|\mathcal{X}\| = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle} \end{split} \end{equation} ∥X∥=⟨X,X⟩ An order-NNN tensor X\mathcal{X}X is rank-one if it can be written as the outer product of NNN vectors: ∥X∥=a1∘⋯∘aN\begin{equation} \begin{split} \|\mathcal{X}\| = \mathbf{a}^1 \circ \cdots \circ \mathbf{a}^N \end{split} \end{equation} ∥X∥=a1∘⋯∘aN where xi1,i2,⋯ ,iN=ai11⋯aiNNx_{i_1,i_2,\cdots,i_N} = a_{i_1}^1 \cdots a_{i_N}^Nxi1,i2,⋯,iN=ai11⋯aiNN. K-mode Product As matrix product, the kkk-mode product between a tensor X\mathcal{X}X and a matrix A∈RJ×Ik\mathbf{A} \in \mathbb{R}^{J \times I_k}A∈RJ×Ik is defined as: Y=X×kA\begin{equation} \begin{split} \mathcal{Y} &= \mathcal{X} \times_k \mathbf{A} \end{split} \end{equation} Y=X×kA where the elements of tensor Y∈RI1×I2×J×⋯IN\mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times J \times \cdots I_N}Y∈RI1×I2×J×⋯IN can be computed as: yi1,i2,⋯ ,j,⋯ ,iN=∑ik=1Ikxi1,i2,⋯ ,ik,⋯ ,iNaj,ik\begin{equation} \begin{split} y_{i_1,i_2,\cdots,j,\cdots,i_N} &= \sum_{i_k=1}^{I_k} x_{i_1,i_2,\cdots,i_k,\cdots,i_N} a_{j, i_k} \end{split} \end{equation} yi1,i2,⋯,j,⋯,iN=ik=1∑Ikxi1,i2,⋯,ik,⋯,iNaj,ik Continuous kkk-mode products with matrices A1∈RJ1×I1,⋯ ,AN∈RJN×IN\mathbf{A}^1 \in \mathbb{R}^{J_1 \times I_1},\cdots,\mathbf{A}^N \in \mathbb{R}^{J_N \times I_N}A1∈RJ1×I1,⋯,AN∈RJN×IN are denoted as: Y=X×1A1×2A2⋯×NAN\begin{equation} \begin{split} \mathcal{Y} &= \mathcal{X} \times_1 \mathbf{A}^1 \times_2 \mathbf{A}^2 \cdots \times_N \mathbf{A}^N \end{split} \end{equation} Y=X×1A1×2A2⋯×NAN Intuitively, if no two matrices act on the same mode, then the order of product between them is interchangeable. 
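As a quick illustration of the definitions so far, the NumPy snippet below (a sketch with arbitrary sizes; `mode_dot` is my own helper built on `np.tensordot`) computes the inner product, the norm, and k-mode products, and checks that products on different modes commute.

```python
import numpy as np

X = np.random.rand(3, 4, 5)             # an order-3 tensor
Y = np.random.rand(3, 4, 5)

inner = np.sum(X * Y)                   # <X, Y>: elementwise product summed over all indices
norm = np.sqrt(np.sum(X * X))           # ||X|| = sqrt(<X, X>)

def mode_dot(X, A, k):
    """k-mode product X x_k A for a matrix A with shape (J, I_k)."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, k)), 0, k)

A = np.random.rand(6, 4)                # acts on mode 1 (0-based), whose size is 4
B = np.random.rand(2, 5)                # acts on mode 2, whose size is 5
Z = mode_dot(X, A, 1)                   # resulting shape (3, 6, 5)

# products on different modes can be applied in either order
assert np.allclose(mode_dot(mode_dot(X, A, 1), B, 2),
                   mode_dot(mode_dot(X, B, 2), A, 1))
```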
If two matrices share the same mode, then X×nA×nB=X×n(BA)\begin{equation} \begin{split} \mathcal{X} \times_n \mathbf{A} \times_n \mathbf{B} = \mathcal{X} \times_n \left(\mathbf{B} \mathbf{A}\right) \end{split} \end{equation} X×nA×nB=X×n(BA) Mode-K Unfolding The mode-kkk unfolding of a tensor is a matricization process, that is: X(k)=[⋯v⋯]\begin{equation} \begin{split} \mathcal{X}_{(k)} = \begin{bmatrix}\cdots & \mathbf{v} & \cdots \end{bmatrix} \end{split} \end{equation} X(k)=[⋯v⋯] where X(k)∈RIk×I1⋯Ik−1Ik+1⋯IN\mathcal{X}_{(k)} \in \mathbb{R}^{I_k \times I_1 \cdots I_{k-1}I_{k+1} \cdots I_N}X(k)∈RIk×I1⋯Ik−1Ik+1⋯IN. Each column v\mathbf{v}v is called a fiber, which is just elements along the kkk-th dimension given other indices. For simplicity, let’s assume that this reshape operation follows the row-major layout or C-like order, with the index of the last axis changing the fastest. And the kkk-rank of X\mathcal{X}X is defined as the rank of the mode-k unfolding of X\mathcal{X}X. With the mode-kkk unfolding, the kkk-mode product can be expressed as a normal matrix product: Y(k)=AX(k)\begin{equation} \begin{split} \mathcal{Y}_{(k)} &= \mathbf{A} \mathcal{X}_{(k)} \end{split} \end{equation} Y(k)=AX(k) or for continuous kkk-mode products: Y(k)=AkX(k)(A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN)T\begin{equation} \begin{split} \mathcal{Y}_{(k)} &= \mathbf{A}^k \mathcal{X}_{(k)} \left(\mathbf{A}^1 \otimes \cdots \otimes \mathbf{A}^{k-1} \otimes \mathbf{A}^{k+1} \otimes \cdots \otimes \mathbf{A}^{N} \right)^T \end{split} \end{equation} Y(k)=AkX(k)(A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN)T where ⊗\otimes⊗ is the Kronecker product and A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN∈RJ1J2⋯JN×I1I2⋯IN\mathbf{A}^1 \otimes \cdots \otimes \mathbf{A}^{k-1} \otimes \mathbf{A}^{k+1} \otimes \cdots \otimes \mathbf{A}^{N} \in \mathbb{R}^{J_1J_2\cdots J_N \times I_1I_2\cdots I_N }A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN∈RJ1J2⋯JN×I1I2⋯IN is a super big matrix (you would find that Kolda & Bader matricized the tensor with the column-major layout and the order of Kronecker products was reversed). The Kronecker product of matrices A∈RI×J\mathbf{A} \in \mathbb{R}^{I \times J}A∈RI×J and B∈RK×L\mathbf{B} \in \mathbb{R}^{K \times L}B∈RK×L is defined by: A⊗B=[a1,1B⋯a1,JB⋮⋱⋮aI,1B⋯aI,JB]=[a1⊗b1⋯a1⊗bLa2⊗b1⋯aJ⊗bL]\begin{equation} \begin{split} \mathbf{A} \otimes \mathbf{B} &= \begin{bmatrix} a_{1,1} \mathbf{B} & \cdots & a_{1, J} \mathbf{B}\\ \vdots & \ddots & \vdots\\ a_{I,1} \mathbf{B}& \cdots & a_{I,J} \mathbf{B} \end{bmatrix}\\ &= \begin{bmatrix} \mathbf{a}_1 \otimes \mathbf{b}_1 & \cdots & \mathbf{a}_1\otimes \mathbf{b}_L & \mathbf{a}_2\otimes \mathbf{b}_1 & \cdots & \mathbf{a}_J \otimes \mathbf{b}_L \end{bmatrix} \end{split} \end{equation} A⊗B=a1,1B⋮aI,1B⋯⋱⋯a1,JB⋮aI,JB=[a1⊗b1⋯a1⊗bLa2⊗b1⋯aJ⊗bL] Tensor Decompositions Two most common tensor decompositions are CP and Tucker decompositions, considered to be higher-order generalizations of the matrix SVD and PCA, separately. CP Decomposition CP decomposition (cnanonical decomposition/parallel factors, CANDECOMP/PARAFAC) is to express a tensor as the sum of a finite number of rank-one tensors. For example, given a 3-order tensor X∈RI×J×K\mathcal{X} \in \mathbb{R}^{I \times J \times K}X∈RI×J×K, CP decomposes it as: X≈∑i=1Rai∘bi∘ci\begin{equation} \begin{split} \mathcal{X} &\approx \sum_{i=1}^{R} \mathbf{a}_i \circ \mathbf{b}_i \circ \mathbf{c}_i \end{split} \end{equation} X≈i=1∑Rai∘bi∘ci where the combination of the column vectors are called factor matrices, i.e., A=[a1⋯aR]\mathbf{A} = \begin{bmatrix}\mathbf{a}_1 & \cdots & \mathbf{a}_R \end{bmatrix}A=[a1⋯aR]. 
Let’s assume that each vector is normalized to value one with a weight vector λ∈RR\mathbf{\lambda} \in \mathbb{R}^Rλ∈RR so that the decomposition is: X≈∑i=1Rλiai∘bi∘ci\begin{equation} \begin{split} \mathcal{X} &\approx \sum_{i=1}^{R} \lambda_i \mathbf{a}_i \circ \mathbf{b}_i \circ \mathbf{c}_i \end{split} \end{equation} X≈i=1∑Rλiai∘bi∘ci For a general NNN-order tensor X∈RI1×I2×⋯×IN\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}X∈RI1×I2×⋯×IN, with the weight vector λ\mathbf{\lambda}λ and factor matrices Ak∈RIk×R\mathbf{A}^k \in \mathbb{R}^{I_k \times R}Ak∈RIk×R, the CP decomposition is: X≈∑i=1Rλiai1∘ai2∘⋯∘aiN\begin{equation} \begin{split} \mathcal{X} &\approx \sum_{i=1}^{R} \lambda_i \mathbf{a}_i^1 \circ \mathbf{a}_i^2 \circ \cdots \circ \mathbf{a}_i^N \end{split} \end{equation} X≈i=1∑Rλiai1∘ai2∘⋯∘aiN and the mode-kkk matricized version is: X(k)≈AkΛ(A1⊙⋯Ak−1⊙Ak+1⊙⋯AN)T\begin{equation} \begin{split} \mathcal{X}_{(k)} &\approx \mathbf{A}^k \mathbf{\Lambda} \left( \mathbf{A}^1 \odot \cdots \mathbf{A}^{k-1} \odot \mathbf{A}^{k+1} \odot \cdots \mathbf{A}^{N} \right)^T \end{split} \end{equation} X(k)≈AkΛ(A1⊙⋯Ak−1⊙Ak+1⊙⋯AN)T where Λ∈RR×R\mathbf{\Lambda} \in \mathbb{R}^{R \times R}Λ∈RR×R is a diagonal matrix of the weight vector λ\mathbf{\lambda}λ and ⊙\odot⊙ is the Khatri-Rao product. The Khatri-Rao product of two matrices A∈RI×K\mathbf{A} \in \mathbb{R}^{I \times K}A∈RI×K and B∈RJ×K\mathbf{B} \in \mathbb{R}^{J \times K}B∈RJ×K is defined by: A⊙B=[a1⊗b1a2⊗b2⋯aK⊗bK]\begin{equation} \begin{split} \mathbf{A} \odot \mathbf{B} &= \begin{bmatrix} \mathbf{a}_1 \otimes \mathbf{b}_1 & \mathbf{a}_2 \otimes \mathbf{b}_2 & \cdots & \mathbf{a}_K \otimes \mathbf{b}_K \end{bmatrix} \end{split} \end{equation} A⊙B=[a1⊗b1a2⊗b2⋯aK⊗bK] where the resulting matrix is in RIJ×K\mathbb{R}^{IJ \times K}RIJ×K. The rank of a tensor is defined as the smallest number RRR to achieve an exact CP decomposition in Equation (14). The tensor rank is an analogue to the matrix rank, but it has many different properites: the tensor rank may be different over R\mathbb{R}R and C\mathbf{C}C; there is no straightforward algorithm to determine the rank of a given tensor yet; the rank decomposition is generally unique with the exception of permuation and scaling operations. It’s well-known that the best rank-kkk approximation of a rank-RRR matrix A\mathbf{A}A is the leading kkk components of its SVD: A≈∑i=1kλiui∘vi, with λ1≥λ2≥⋯≥λk\begin{equation} \begin{split} \mathbf{A} &\approx \sum_{i=1}^{k} \lambda_i \mathbf{u}_i \circ \mathbf{v}_i,\ with\ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \end{split} \end{equation} A≈i=1∑kλiui∘vi, with λ1≥λ2≥⋯≥λk but this does not hold true for high-order tensors. In fact, the best rank-kkk approximation of a tensor may not exist. Since there is no easy way to determine the rank of a tensor, most CP decomposition procedures would fit multiple models until finding one that is good enough to approximate the given tensor. 
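The Khatri-Rao product is straightforward to write with broadcasting. The helper below is a small sketch (the name and the check are my own) that matches the columnwise-Kronecker definition above and is the building block of the ALS update that follows.

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Kronecker (Khatri-Rao) product of A (I x K) and B (J x K), giving an (IJ x K) matrix."""
    I, K = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, K)

A = np.random.rand(4, 3)
B = np.random.rand(5, 3)
C = khatri_rao(A, B)
assert np.allclose(C[:, 0], np.kron(A[:, 0], B[:, 0]))  # each column is a Kronecker product
```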
The Alternating least squares (ALS) method is one of them to compute a CP decomposition by solving one factor matrix each time with others fixed: Ak=X(k)(A1⊙⋯Ak−1⊙Ak+1⋯⊙AN)((A1)TA1∗⋯∗(AN)TAN)†\begin{equation} \begin{split} \mathbf{A}^k = &\mathcal{X}_{(k)} \left(\mathbf{A}^1 \odot \cdots \mathbf{A}^{k-1} \odot \mathbf{A}^{k+1} \cdots \odot \mathbf{A}^N \right)\\ &\left( (\mathbf{A}^1)^T\mathbf{A}^1 \ast \cdots \ast (\mathbf{A}^N)^T\mathbf{A}^N\right)^\dag \end{split} \end{equation} Ak=X(k)(A1⊙⋯Ak−1⊙Ak+1⋯⊙AN)((A1)TA1∗⋯∗(AN)TAN)† where ∗\ast∗ is the elementwise product (Hadamard product) and †\dag† is the pseudo-inverse. λ\mathbf{\lambda}λ can be computed by normalizing the columns of Ak\mathbf{A}^kAk. The ALS method iterates until the stopping criteria satisfied. However, the ALS method is not guaranteed to converge to a global minimum or even a stationary point. Tucker Decomposition The Tucker decomposition decomposes a tensor into a core tensor multiplied by a factor matrix along each mode: X=G×1A1×2A2⋯×NAN\begin{equation} \begin{split} \mathcal{X} &= \mathcal{G} \times_1 \mathbf{A}^1 \times_2 \mathbf{A}^2 \cdots \times_N \mathbf{A}^N \end{split} \end{equation} X=G×1A1×2A2⋯×NAN where G∈RR1×⋯×RN\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N}G∈RR1×⋯×RN, Ak∈RIk×Rk\mathbf{A}^k \in \mathbb{R}^{I_k \times R_k}Ak∈RIk×Rk and Ak\mathbf{A}^kAk is orthogonal. It can be seen that the CP decomposition is a special case of the Tucker decomposition where the core tensor is superdiagonal and R1=R2=⋯=RNR_1 = R_2 = \cdots = R_NR1=R2=⋯=RN. An important concept of Tucker Decomposition is the nnn-rank of X\mathcal{X}X, denoted rankn(X)rank_n(\mathcal{X})rankn(X), which is the rank of X(n)\mathcal{X}_{(n)}X(n). The Tucker decomposition requires a rank group (R1,R2,⋯ ,RN)(R_1,R_2,\cdots, R_N)(R1,R2,⋯,RN), which is also the size of the core tensor G\mathcal{G}G. Higher-order orthogonal iteration (HOOI) solves Ak\mathbf{A}^kAk with others fixed, as the ALS method: Y=X×1(A1)T⋯×k−1(Ak−1)T×k+1(Ak+1)T⋯×N(AN)TY(k)=UΣVTAk=[u1u2⋯uRn]\begin{equation} \begin{split} \mathcal{Y} &= \mathcal{X} \times_1 (\mathbf{A}^1)^T \cdots \times_{k-1}\\ &(\mathbf{A}^{k-1})^T \times_{k+1} (\mathbf{A}^{k+1})^T \cdots \times_{N} (\mathbf{A}^{N})^T\\ \mathcal{Y}_{(k)} &= \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\\ \mathbf{A}^k &= \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_{R_n}\end{bmatrix} \end{split} \end{equation} YY(k)Ak=X×1(A1)T⋯×k−1(Ak−1)T×k+1(Ak+1)T⋯×N(AN)T=UΣVT=[u1u2⋯uRn] and the core tensor G=X×1(A1)T×⋯×(AN)T\mathcal{G} = \mathcal{X} \times_1 (\mathbf{A}^{1})^T \times \cdots \times (\mathbf{A}^{N})^TG=X×1(A1)T×⋯×(AN)T. Note that the Tucker decomposition is not unique.
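To make the HOOI sweeps concrete, here is a bare-bones NumPy sketch (one consistent row-major unfolding convention as assumed above, a fixed number of sweeps, no convergence check; the function names are mine, and the factors are initialized HOSVD-style with the leading singular vectors of each unfolding).

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding with row-major (C-order) fibers: an I_k x (prod of the other sizes) matrix."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def mode_dot(X, A, k):
    """k-mode product X x_k A for a matrix A with shape (J, I_k)."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, k)), 0, k)

def hooi(X, ranks, n_sweeps=20):
    """Higher-order orthogonal iteration for a Tucker decomposition with the given ranks."""
    N = X.ndim
    # initialize each factor with the leading left singular vectors of the unfolding (HOSVD-style start)
    factors = [np.linalg.svd(unfold(X, k), full_matrices=False)[0][:, :ranks[k]] for k in range(N)]
    for _ in range(n_sweeps):
        for k in range(N):
            Y = X
            for m in range(N):
                if m != k:
                    Y = mode_dot(Y, factors[m].T, m)   # project every mode except k
            U, _, _ = np.linalg.svd(unfold(Y, k), full_matrices=False)
            factors[k] = U[:, :ranks[k]]
    core = X
    for m in range(N):
        core = mode_dot(core, factors[m].T, m)          # G = X x_1 (A^1)^T ... x_N (A^N)^T
    return core, factors

core, factors = hooi(np.random.rand(6, 7, 8), ranks=(2, 3, 4))
```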
Preconditioned Conjugate Gradient Method
This weekend I learned Preconditioned Conjugate Gradient method with Jonathan Richard Schewchuk’s lecture note “An Introduction to the Conjugate Gradient Method Without the Agonizing Path”. Here I document what I have learned from the note. TL;DR In short, all methods mentioned in this post are proposed to iteratively solve linear equations Ax=b\mathbf{A} \mathbf{x} = \mathbf{b}Ax=b, assuming A\mathbf{A}A is symmetric and positive-definite. Steepest Gradient Method (SG), which moves along the direciton of residual or negative gradient each step, converges slowly and requires more matrix-vector multiply operations. SG can be improved by moving along the conjugate direction instead of residual direction each step, which converges faster and reduces matrix-vector multiply operations efficiently. However, it requires construct conjugate directions in advance, which sometimes costs as much as with SG. Conjugate Gradient Method (CG) is an efficient algorithm to compute conjugate directions on the fly by constructing conjugate directions with residuals, making it a practical method to solve equations. Finally, Preconditioned Conjugate Method (PCG) is a further improvement to convergence rate by reducing the condition number of A\mathbf{A}A. Steepest Gradient Method Considering the following minimization problem: arg minx∈Rn 12xTAx−bTx+c\begin{equation} \begin{split} \argmin_{\mathbf{x} \in \mathbb{R}^{n}} &\ \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} - \mathbf{b}^T\mathbf{x} + c \end{split} \end{equation} x∈Rnargmin 21xTAx−bTx+c where A∈Rn×n\mathbf{A} \in \mathbb{R}^{n \times n}A∈Rn×n is symmetric and positive-definite. Since this is a convex problem, the global minimizer exists and can be derived with setting the gradient f′(x)=xTA−bT∈R1×nf'(\mathbf{x}) = \mathbf{x}^T\mathbf{A} - \mathbf{b}^T \in \mathbb{R}^{1 \times n}f′(x)=xTA−bT∈R1×n to zero. Solving the minimization problem above is equivalent to solving the linear equations Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}Ax=b. Recall that the residual rk\mathbf{r}_krk, error ek\mathbf{e}_kek are defined as follows: rk=b−Axkek=xk−x⋆\begin{equation} \begin{split} \mathbf{r}_k &= \mathbf{b} - \mathbf{A} \mathbf{x}_k \\ \mathbf{e}_k &= \mathbf{x}_k - \mathbf{x}^{\star}\\ \end{split} \end{equation} rkek=b−Axk=xk−x⋆ where x⋆\mathbf{x}^{\star}x⋆ is the solution of Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}Ax=b. The residual and error item are related by: Aek=A(xk−x⋆)=Axk−b+b−Ax⋆=−rk\begin{equation} \begin{split} \mathbf{A} \mathbf{e}_k &= \mathbf{A} \left( \mathbf{x}_k - \mathbf{x}^{\star} \right)\\ &= \mathbf{A} \mathbf{x}_k - \mathbf{b} + \mathbf{b} - \mathbf{A} \mathbf{x}^{\star}\\ &= -\mathbf{r}_{k} \end{split} \end{equation} Aek=A(xk−x⋆)=Axk−b+b−Ax⋆=−rk With these notations, Gradient Descent Method updates each point as follows: xk+1=xk+αkrk\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{r}_k \\ \end{split} \end{equation} xk+1=xk+αkrk where αk\alpha_kαk is the k-th step and rk\mathbf{r}_krk is equal to the negative gradient. SG uses line search to determine how big a step is taken. 
By the chain rule ddαkf(xk+1)=f′(xk+1)ddαkxk+1\frac{d}{d\alpha_k} f(\mathbf{x}_{k+1})= f'(\mathbf{x}_{k+1}) \frac{d}{d\alpha_k}\mathbf{x}_{k+1}dαkdf(xk+1)=f′(xk+1)dαkdxk+1 and setting the gradient to zero, we have the following αk\alpha_kαk: αk=rkTrkrkTArk\begin{equation} \begin{split} \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{r}_k^T \mathbf{A} \mathbf{r}_k} \\ \end{split} \end{equation} αk=rkTArkrkTrk The SG is: r0=b−Ax0αk=rkTrkrkTArkxk+1=xk+αkrkrk+1=b−Axk+1\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{r}_k^T \mathbf{A} \mathbf{r}_k} \\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{r}_k \\ \mathbf{r}_{k+1} &= \mathbf{b} - \mathbf{A} \mathbf{x}_{k+1} \end{split} \end{equation} r0αkxk+1rk+1=b−Ax0=rkTArkrkTrk=xk+αkrk=b−Axk+1 or eliminates one matrix-vector multiply (Axk+1\mathbf{A} \mathbf{x}_{k+1}Axk+1) by iterating residuals as follows: r0=b−Ax0αk=rkTrkrkTArkxk+1=xk+αkrkrk+1=rk−αkArk\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{r}_k^T \mathbf{A} \mathbf{r}_k} \\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{r}_k \\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{r}_k \\ \end{split} \end{equation} r0αkxk+1rk+1=b−Ax0=rkTArkrkTrk=xk+αkrk=rk−αkArk Conjugate Directions SG often finds itself taking similar directions. It would be better if we took a step and got it right the first time. An analogous example is that a person stands at the top left corner of a 5 by 5 grid, aiming to reach to the bottom right corner by moving either right or down a few grids in each step. We could move one grid at a time following a zig-zag path, or we could move to the bottom first and then move to the end with just two steps. In short, we’ll take exactly one step for each direction (totally n steps). Given n directions {dk}\{\mathbf{d}_k\}{dk}, the update formula is as follows: xk+1=xk+αkdk\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k \\ \end{split} \end{equation} xk+1=xk+αkdk which is exactly the update formula of GD with replacing rk\mathbf{r}_krk by dk\mathbf{d}_kdk. These directions {dk}\{\mathbf{d}_k\}{dk} should be orthogonal to each other, and ek+1\mathbf{e}_{k+1}ek+1 should be orthogonal to dk\mathbf{d}_{k}dk, otherwise you would always find a direction aligned with previous directions in next updates. Notice that by subtracting x⋆\mathbf{x}^{\star}x⋆ on both sides, Equation (8) becomes: xk+1−x⋆=xk−x⋆+αkdkek+1=ek+αkdk\begin{equation} \begin{split} \mathbf{x}_{k+1} - \mathbf{x}^{\star} &= \mathbf{x}_k - \mathbf{x}^{\star} + \alpha_k \mathbf{d}_k \\ \mathbf{e}_{k+1} &= \mathbf{e}_{k} + \alpha_k \mathbf{d}_k \end{split} \end{equation} xk+1−x⋆ek+1=xk−x⋆+αkdk=ek+αkdk With orthogonal conditions and Equation (9), we have: dkTek+1=0dkT(ek+αkdk)=0αk=−dkTekdkTdk\begin{equation} \begin{split} \mathbf{d}_{k}^T \mathbf{e}_{k+1} &= 0\\ \mathbf{d}_{k}^T \left(\mathbf{e}_{k} + \alpha_k \mathbf{d}_k\right) &= 0\\ \alpha_k &= -\frac{\mathbf{d}_{k}^T \mathbf{e}_{k}}{\mathbf{d}_{k}^T\mathbf{d}_{k}} \end{split} \end{equation} dkTek+1dkT(ek+αkdk)αk=0=0=−dkTdkdkTek We haven’t known ek\mathbf{e}_kek yet, thus Equation (10) can’t be used to calculate αk\alpha_kαk. The solution is to use A-orthogonal instead of orthogonal. 
Two vectors di\mathbf{d}_idi and dj\mathbf{d}_jdj are A-orthogonal or conjugate, if: diTAdj=0\begin{equation} \begin{split} \mathbf{d}_{i}^T \mathbf{A} \mathbf{d}_{j} &= 0\\ \end{split} \end{equation} diTAdj=0 Once again, without steping into previous directions, the new requirement is that ek+1\mathbf{e}_{k+1}ek+1 be A-orthogonal to dk\mathbf{d}_kdk. This equation is equivalent to finding the minimum point along the search direction dk\mathbf{d}_{k}dk with the line search method: ddαkf(xk+1)=0f′(xk+1)ddαkxk+1=0−rk+1Tdk=0ek+1TAdk=0\begin{equation} \begin{split} \frac{d}{d\alpha_k} f(\mathbf{x}_{k+1}) &= 0\\ f'(\mathbf{x}_{k+1}) \frac{d}{d\alpha_k}\mathbf{x}_{k+1} &= 0\\ -\mathbf{r}_{k+1}^T \mathbf{d}_k &= 0\\ \mathbf{e}_{k+1}^T \mathbf{A} \mathbf{d}_k &= 0 \end{split} \end{equation} dαkdf(xk+1)f′(xk+1)dαkdxk+1−rk+1Tdkek+1TAdk=0=0=0=0 By Equation (9), αk\alpha_kαk is computed as: ek+1TAdk=0(ek+αkdk)TAdk=0αk=−ekTAdkdkTAdk=rkTdkdkTAdk\begin{equation} \begin{split} \mathbf{e}_{k+1}^T \mathbf{A} \mathbf{d}_k &= 0\\ \left( \mathbf{e}_k + \alpha_k \mathbf{d}_k \right)^T \mathbf{A} \mathbf{d}_k &= 0\\ \alpha_k &= - \frac{\mathbf{e}_k^T \mathbf{A} \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ &= \frac{\mathbf{r}_k^T \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k} \end{split} \end{equation} ek+1TAdk(ek+αkdk)TAdkαk=0=0=−dkTAdkekTAdk=dkTAdkrkTdk Similar to the SG method, the iterative formulas are as follows: r0=b−Ax0αk=rkTdkdkTAdkxk+1=xk+αkdkrk+1=rk−αkAdk\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k\\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{d}_k\\ \end{split} \end{equation} r0αkxk+1rk+1=b−Ax0=dkTAdkrkTdk=xk+αkdk=rk−αkAdk We can prove that the error item e0\mathbf{e}_0e0 exactly converges to zero after n steps with these formulas. By Equation (9), we can have that ek=e0+Σi=0k−1αidi\mathbf{e}_k = \mathbf{e}_0 + \Sigma_{i=0}^{k-1} \alpha_i \mathbf{d}_iek=e0+Σi=0k−1αidi. Suppose e0=Σi=0nδidi\mathbf{e}_0 = \Sigma_{i=0}^{n} \delta_i \mathbf{d}_ie0=Σi=0nδidi ({dk}\{\mathbf{d}_k\}{dk} span the whole linear space), all the δi\delta_iδi values can be found by multiplying the expression by dkTA\mathbf{d}_k^T\mathbf{A}dkTA: dkTAe0=Σi=0nδidkTAdi=δkdkTAdkδk=dkTAe0dkTAdk=dkTA(e0+Σi=0k−1αidi)dkTAdk=dkTAekdkTAdk=−dkTrkdkTAdk=−αk\begin{equation} \begin{split} \mathbf{d}_k^T \mathbf{A} \mathbf{e}_0 &= \Sigma_{i=0}^{n} \delta_i \mathbf{d}_k^T\mathbf{A}\mathbf{d}_i\\ &= \delta_k \mathbf{d}_k^T\mathbf{A}\mathbf{d}_k\\ \delta_k &= \frac{\mathbf{d}_k^T \mathbf{A} \mathbf{e}_0}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= \frac{\mathbf{d}_k^T \mathbf{A} \left(\mathbf{e}_0 + \Sigma_{i=0}^{k-1}\alpha_i \mathbf{d}_i \right)}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= \frac{\mathbf{d}_k^T \mathbf{A} \mathbf{e}_k}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= - \frac{\mathbf{d}_k^T \mathbf{r}_k}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= - \alpha_k \end{split} \end{equation} dkTAe0δk=Σi=0nδidkTAdi=δkdkTAdk=dkTAdkdkTAe0=dkTAdkdkTA(e0+Σi=0k−1αidi)=dkTAdkdkTAek=−dkTAdkdkTrk=−αk where δk\delta_kδk is exactly equal to the −αk-\alpha_k−αk value. This fact demonstrates that we can eliminate one component of e0\mathbf{e}_0e0 each step. 
Consequently, the remaining error item ek\mathbf{e}_kek is: ek=e0+Σi=0k−1αidi=Σi=knδidi\begin{equation} \begin{split} \mathbf{e}_k &= \mathbf{e}_0 + \Sigma_{i=0}^{k-1} \alpha_i \mathbf{d}_i\\ &= \Sigma_{i=k}^{n} \delta_i \mathbf{d}_i \end{split} \end{equation} ek=e0+Σi=0k−1αidi=Σi=knδidi Conjugate Gradient Method The remaining question is that how to compute a set of A-orthogonal directions {dk}\{\mathbf{d}_k\}{dk}. Here we employ the Gram-Schmidt process. Suppose we have a set of n linearly independent vectors {uk}\{ \mathbf{u}_k \}{uk} and let d0=u0\mathbf{d}_0 = \mathbf{u}_0d0=u0, dk\mathbf{d}_kdk is constructed as: dk=uk+Σi=0k−1βk,idi\begin{equation} \begin{split} \mathbf{d}_k = \mathbf{u}_k + \Sigma_{i=0}^{k-1} \beta_{k,i} \mathbf{d}_{i} \end{split} \end{equation} dk=uk+Σi=0k−1βk,idi where βk,i\beta_{k,i}βk,i is computed as: diTAdk=diTA(uk+Σj=0k−1βk,jdj)0=diTAuk+βk,idiTAdiβk,i=−diTAukdiTAdi\begin{equation} \begin{split} \mathbf{d}_i^T\mathbf{A}\mathbf{d}_k &= \mathbf{d}_i^T\mathbf{A}\left(\mathbf{u}_k + \Sigma_{j=0}^{k-1} \beta_{k,j} \mathbf{d}_{j}\right)\\ 0 &= \mathbf{d}_i^T\mathbf{A}\mathbf{u}_k + \beta_{k,i} \mathbf{d}_i^T\mathbf{A}\mathbf{d}_i\\ \beta_{k,i} &= - \frac{\mathbf{d}_i^T\mathbf{A}\mathbf{u}_k}{\mathbf{d}_i^T\mathbf{A}\mathbf{d}_i} \end{split} \end{equation} diTAdk0βk,i=diTA(uk+Σj=0k−1βk,jdj)=diTAuk+βk,idiTAdi=−diTAdidiTAuk The drawback of the contruction process is that it keeps all older directions in memory to create a new direction. CG employs a more efficient way to constructing these directions by initializing the vectors with residuals uk=rk\mathbf{u}_k = \mathbf{r}_kuk=rk. With these starting points, we will see that most computations can be eliminated. By Equation (17) and (18), dk\mathbf{d}_kdk is constructed as: dk=rk+Σi=0k−1βk,idiβk,i=−diTArkdiTAdi\begin{equation} \begin{split} \mathbf{d}_k &= \mathbf{r}_k + \Sigma_{i=0}^{k-1} \beta_{k,i} \mathbf{d}_{i}\\ \beta_{k,i} &= - \frac{\mathbf{d}_i^T\mathbf{A}\mathbf{r}_k}{\mathbf{d}_i^T\mathbf{A}\mathbf{d}_i} \end{split} \end{equation} dkβk,i=rk+Σi=0k−1βk,idi=−diTAdidiTArk Let’s consider the numerator of βk,i\beta_{k,i}βk,i with Equation (14): diTArk=1αi(ri−ri+1)Trk=1αi(riTrk−ri+1Trk),i=0,1,...,k−1\begin{equation} \begin{split} \mathbf{d}_i^T\mathbf{A}\mathbf{r}_k &= \frac{1}{\alpha_i}(\mathbf{r}_i - \mathbf{r}_{i+1})^T \mathbf{r}_k\\ &= \frac{1}{\alpha_i}(\mathbf{r}_i^T\mathbf{r}_k - \mathbf{r}_{i+1}^T\mathbf{r}_k), i=0,1,...,k-1 \end{split} \end{equation} diTArk=αi1(ri−ri+1)Trk=αi1(riTrk−ri+1Trk),i=0,1,...,k−1 Luckily, we shall see that rk\mathbf{r}_krk is orthogonal to previous residuals. Refering to Equation (16), the error item ek=Σi=knδidi\mathbf{e}_k = \Sigma_{i=k}^{n} \delta_i \mathbf{d}_iek=Σi=knδidi, we have the following equation by multiplying −djTA-\mathbf{d}_j^T \mathbf{A}−djTA on both sides: −djTAek=−Σi=knδidjTAdi,j<kdjTrk=0,j<k\begin{equation} \begin{split} -\mathbf{d}_j^T \mathbf{A} \mathbf{e}_k &= -\Sigma_{i=k}^{n} \delta_i \mathbf{d}_j^T \mathbf{A} \mathbf{d}_i, j<k\\ \mathbf{d}_j^T \mathbf{r}_k &= 0, j<k \end{split} \end{equation} −djTAekdjTrk=−Σi=knδidjTAdi,j<k=0,j<k Thus the residuals are orthogonal to previous conjugate directions if we choose the residuals as starting vectors to construct these directions. 
According to Equation (19), we further conclude that the residuals are orthogonal to previous residuals: djTrk=0,j<k(rj+Σi=0j−1βj,idi)Trk=0rjTrk=0,j<k\begin{equation} \begin{split} \mathbf{d}_j^T \mathbf{r}_k &= 0, j<k\\ \left(\mathbf{r}_j + \Sigma_{i=0}^{j-1} \beta_{j,i} \mathbf{d}_{i}\right)^T\mathbf{r}_k &= 0\\ \mathbf{r}_j^T \mathbf{r}_k &= 0,j<k \end{split} \end{equation} djTrk(rj+Σi=0j−1βj,idi)TrkrjTrk=0,j<k=0=0,j<k By Equation (21) and (22), βk,i\beta_{k,i}βk,i is simplified as: βk,i={1αk−1rkTrkdk−1TAdk−1i=k−10otherwise\begin{equation} \begin{split} \beta_{k,i} = \begin{cases} \frac{1}{\alpha_{k-1}} \frac{\mathbf{r}_k^T\mathbf{r}_k}{\mathbf{d}_{k-1}^T\mathbf{A}\mathbf{d}_{k-1}} & i = k-1\\ 0 & otherwise\\ \end{cases} \end{split} \end{equation} βk,i={αk−11dk−1TAdk−1rkTrk0i=k−1otherwise where we only need to keep one previous direction instead of all directions. The method of CG is: d0=r0=b−Ax0αk=rkTdkdkTAdk=rkTrkdkTAdkxk+1=xk+αkdkrk+1=rk−αkAdkβk+1=1αkrk+1Trk+1dkTAdk=dkTAdkrkTrkrk+1Trk+1dkTAdk=rk+1Trk+1rkTrkdk+1=rk+1+βk+1dk\begin{equation} \begin{split} \mathbf{d}_0 = \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k} = \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k\\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{d}_k\\ \beta_{k+1} &= \frac{1}{\alpha_{k}} \frac{\mathbf{r}_{k+1}^T\mathbf{r}_{k+1}}{\mathbf{d}_{k}^T\mathbf{A}\mathbf{d}_{k}}\\ &= \frac{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}{\mathbf{r}_k^T \mathbf{r}_k}\frac{\mathbf{r}_{k+1}^T\mathbf{r}_{k+1}}{\mathbf{d}_{k}^T\mathbf{A}\mathbf{d}_{k}}\\ &= \frac{\mathbf{r}_{k+1}^T\mathbf{r}_{k+1}}{\mathbf{r}_k^T \mathbf{r}_k}\\ \mathbf{d}_{k+1} &= \mathbf{r}_{k+1} + \beta_{k+1} \mathbf{d}_{k}\\ \end{split} \end{equation} d0=r0αkxk+1rk+1βk+1dk+1=b−Ax0=dkTAdkrkTdk=dkTAdkrkTrk=xk+αkdk=rk−αkAdk=αk1dkTAdkrk+1Trk+1=rkTrkdkTAdkdkTAdkrk+1Trk+1=rkTrkrk+1Trk+1=rk+1+βk+1dk Preconditioned Conjugate Gradient Method The analysis of complexity of CG shows that a small condition number κ(A)\kappa(\mathbf{A})κ(A) would improve the convergence rate. Preconditioning is one such technique for reducing the condition number of a matrix. Let’s consider a symmetric, positive-definite matrix M\mathbf{M}M that approximates A\mathbf{A}A but is easier to invert. Instead of solving Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}Ax=b, we solve: M−1Ax=M−1b\begin{equation} \begin{split} \mathbf{M}^{-1}\mathbf{A}\mathbf{x} = \mathbf{M}^{-1}\mathbf{b} \end{split} \end{equation} M−1Ax=M−1b by ensuring that κ(M−1A)≪κ(A)\kappa(\mathbf{M}^{-1}\mathbf{A}) \ll \kappa(\mathbf{A})κ(M−1A)≪κ(A). The problem is that M−1A\mathbf{M}^{-1}\mathbf{A}M−1A is generally neither symmetric nor positive-definite, even thought M\mathbf{M}M and A\mathbf{A}A are. The solution is to decompose M\mathbf{M}M with Cholesky decompostion. The Cholesky decompostion is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, denoted as M=LLT\mathbf{M}=\mathbf{L}\mathbf{L}^TM=LLT. 
We then solve the following equation: L−1A(L−1)TLTx=L−1bL−1A(L−1)Tx^=b^\begin{equation} \begin{split} \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \mathbf{L}^T \mathbf{x} &= \mathbf{L}^{-1}\mathbf{b}\\ \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \hat{\mathbf{x}} &= \hat{\mathbf{b}}\\ \end{split} \end{equation} L−1A(L−1)TLTxL−1A(L−1)Tx^=L−1b=b^ where x^=LTx\hat{\mathbf{x}} = \mathbf{L}^T \mathbf{x}x^=LTx and b^=L−1b\hat{\mathbf{b}} = \mathbf{L}^{-1}\mathbf{b}b^=L−1b. Notice that L−1A(L−1)T\mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^TL−1A(L−1)T is symmetric and positive-definite, allowing us to apply CG directly. Now the method of PCG is: d^0=r^0=b−L−1A(L−1)Tx^0αk=r^kTr^kd^kTL−1A(L−1)Td^kx^k+1=x^k+αkd^kr^k+1=r^k−αkL−1A(L−1)Td^kβk+1=r^k+1Tr^k+1r^kTr^kd^k+1=r^k+1+βk+1d^k\begin{equation} \begin{split} \hat{\mathbf{d}}_0 = \hat{\mathbf{r}}_0 &= \mathbf{b} - \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T\hat{\mathbf{x}}_0\\ \alpha_k &= \frac{\hat{\mathbf{r}}_k^T \hat{\mathbf{r}}_k}{\hat{\mathbf{d}}_k^T \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \hat{\mathbf{d}}_k}\\ \hat{\mathbf{x}}_{k+1} &= \hat{\mathbf{x}}_k + \alpha_k \hat{\mathbf{d}}_k\\ \hat{\mathbf{r}}_{k+1} &= \hat{\mathbf{r}}_k - \alpha_k \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \hat{\mathbf{d}}_k\\ \beta_{k+1} &= \frac{\hat{\mathbf{r}}_{k+1}^T\hat{\mathbf{r}}_{k+1}}{\hat{\mathbf{r}}_k^T \hat{\mathbf{r}}_k}\\ \hat{\mathbf{d}}_{k+1} &= \hat{\mathbf{r}}_{k+1} + \beta_{k+1} \hat{\mathbf{d}}_{k}\\ \end{split} \end{equation} d^0=r^0αkx^k+1r^k+1βk+1d^k+1=b−L−1A(L−1)Tx^0=d^kTL−1A(L−1)Td^kr^kTr^k=x^k+αkd^k=r^k−αkL−1A(L−1)Td^k=r^kTr^kr^k+1Tr^k+1=r^k+1+βk+1d^k With some substitutions, we can further eliminate L−1\mathbf{L}^{-1}L−1: r0=b−Ax0d0=M−1r0αk=rkTM−1rkdkTAdkxk+1=xk+αkdkrk+1=rk−αkAdkβk+1=rk+1TM−1rk+1rkTM−1rkdk+1=M−1rk+1+βk+1dk\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A} \mathbf{x}_0\\ \mathbf{d}_0 &= \mathbf{M}^{-1} \mathbf{r}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{M}^{-1} \mathbf{r}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k\\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{d}_k\\ \beta_{k+1} &= \frac{\mathbf{r}_{k+1}^T \mathbf{M}^{-1} \mathbf{r}_{k+1}}{\mathbf{r}_k^T \mathbf{M}^{-1} \mathbf{r}_k}\\ \mathbf{d}_{k+1} &= \mathbf{M}^{-1} \mathbf{r}_{k+1} + \beta_{k+1} \mathbf{d}_{k}\\ \end{split} \end{equation} r0d0αkxk+1rk+1βk+1dk+1=b−Ax0=M−1r0=dkTAdkrkTM−1rk=xk+αkdk=rk−αkAdk=rkTM−1rkrk+1TM−1rk+1=M−1rk+1+βk+1dk M\mathbf{M}M is called a preconditioner and the effectiveness of PCG depends on finding a preconditioner that approximates A\mathbf{A}A well enough without too much cost of computing M−1\mathbf{M}^{-1}M−1. In fact, it’s unnecessary to know the explicit form of M−1\mathbf{M}^{-1}M−1, Instead, by directly computing M−1rk\mathbf{M}^{-1}\mathbf{r}_kM−1rk, which can be viewed as an operator with input rk\mathbf{r}_krk achieving the same effect of matrix-vector product. The implementation of PCG benefits from this operator concept, extending it to more general cases. Implementation Considerations Starting point If you have an estimate of x\mathbf{x}x, use it as the starting point x0\mathbf{x}_0x0; otherwise use the zero vector. Stopping criterion Theoretically, CG/PCG finds the solution when the residual becomes zero after n steps. Therefore, the maximum number of iterations would never exceed n; otherwise a division-by-zero error would occur. 
We may choose to stop the algorithm immediately when the residual falls below a specified threshold. A common criterion is ∥rk∥2<ϵ∥r0∥2\|\mathbf{r}_k\|_2 \lt \epsilon \|\mathbf{r}_0\|_2∥rk∥2<ϵ∥r0∥2. The accumulated roundoff error in the residual iteration formula may yield a false zero residual. To address this issue, once the residual falls below the stopping value, we recalculate it from the definition (Equation (2)) and double-check that it still falls below the stopping threshold. If not, we restart the iteration. Matlab’s implementation does a little more than that. It checks whether the current update is small enough in each iteration with the criterion ∥αkdk∥2<ϵ∥xk∥2\|\alpha_k \mathbf{d}_k\|_2 \lt \epsilon \|\mathbf{x}_k\|_2∥αkdk∥2<ϵ∥xk∥2. If this holds three times, it concludes that the current solution is stuck and terminates the iteration with warning messages. Choice of Preconditioner M=A\mathbf{M} = \mathbf{A}M=A is an ideal but impractical preconditioner: applying M−1\mathbf{M}^{-1}M−1 is equivalent to solving Mx=b\mathbf{M}\mathbf{x}=\mathbf{b}Mx=b. One simple preconditioner is a diagonal matrix whose diagonal elements are identical to those of A\mathbf{A}A. Another is the incomplete Cholesky factorization, a sparse approximation of the Cholesky decomposition obtained by setting entries to zero wherever the corresponding entries of A\mathbf{A}A are zero. This preserves the sparsity of A\mathbf{A}A, making it suitable for problems with sparse matrices. Yet I haven’t found such a function in the numpy and scipy packages. I’ll look into it when I have time, referring to the wiki and Julia’s code.
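To tie the pieces together, here is a small NumPy sketch of PCG in the operator form of Equation (29): `apply_Minv` is any routine returning M−1r, so the preconditioner never has to be formed explicitly, and plain CG is recovered with the identity. The example at the end uses the simple diagonal (Jacobi) preconditioner mentioned above; this is an illustration with my own names, not Matlab's implementation.

```python
import numpy as np

def pcg(A, b, x0=None, apply_Minv=lambda r: r, eps=1e-8, max_iter=None):
    """Preconditioned CG for A x = b (A symmetric positive-definite), in operator form."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    r0_norm = np.linalg.norm(r)
    z = apply_Minv(r)
    d = z.copy()
    rz = r @ z
    max_iter = len(b) if max_iter is None else max_iter
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rz / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        if np.linalg.norm(r) < eps * r0_norm:      # stopping criterion ||r_k|| < eps ||r_0||
            break
        z = apply_Minv(r)
        rz_new = r @ z
        d = z + (rz_new / rz) * d                  # beta_{k+1} = (r^T M^-1 r)_{k+1} / (r^T M^-1 r)_k
        rz = rz_new
    return x

# Jacobi (diagonal) preconditioner: M = diag(A), so M^-1 r is an elementwise division
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b, apply_Minv=lambda r: r / np.diag(A))
assert np.allclose(A @ x, b)
```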
Fourier Transform
The Fourier transform decomposes a function into different frequency components, which can be summed to represent the original function. Fourier Series TL;DR Any continuous periodic function can be represented by Fourier series, where Fourier coefficients are the weighted integrals of the periodic function over one period. Definition A periodic function fT(x)f_{T}(x)fT(x) with a period TTT can be decomposed into the following exponential form: fT(x)=∑n=−∞∞cnei2πnxT\begin{equation} f_{T}(x) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n \frac{x}{T}} \end{equation} fT(x)=n=−∞∑∞cnei2πnTx where cnc_ncn are Fourier coefficients: cn=1T∫TfT(x)e−i2πnxTdx\begin{equation} c_n = \frac{1}{T} \int_{T} f_{T}(x) e^{-i 2\pi n \frac{x}{T}} dx \end{equation} cn=T1∫TfT(x)e−i2πnTxdx Continuous Fourier Transform TL;DR The Fourier transform works for both periodic and non-periodic continuous functions. The Fourier transform is a special case of the Fourier series when the period goes to infinity. The Fourier transform of a periodic function is a summation of weighted delta functions at specific frequencies (harmonics of 1/T1/T1/T), where the weights are the Fourier coefficients. Definition For any integrable real-valued function f:R→Cf: \mathbb{R} \rightarrow \mathbb{C}f:R→C, the Fourier transform is defined as: f^(k)=∫−∞∞f(x)e−i2πkxdx\begin{equation} \hat{f}(k) = \int_{-\infty}^{\infty} f(x) e^{-i 2\pi k x} dx \end{equation} f^(k)=∫−∞∞f(x)e−i2πkxdx where k∈Rk \in \mathbb{R}k∈R represents continuous frequencies. For example, if xxx is measured in seconds, then frequency is in hertz. The Fourier transform is able to represent periodic and non-periodic functions, whereas the Fourier series only works for periodic functions. The inverse Fourier transform is defined as: f(x)=∫−∞∞f^(k)ei2πxkdk\begin{equation} f(x) = \int_{-\infty}^{\infty} \hat{f}(k) e^{i 2\pi x k} dk \end{equation} f(x)=∫−∞∞f^(k)ei2πxkdk f(x)f(x)f(x) and f^(k)\hat{f}(k)f^(k) are often referred to as a Fourier transform pair. Here, we use F\mathcal{F}F and F−1\mathcal{F}^{-1}F−1 to denote the Fourier transform (FT) and inverse Fourier transform (iFT), respectively. Sign of Fourier transform Remember the sign used in Fourier transform and inverse Fourier transform is just a convention. Mathematicians usually choose a negative sign for the inverse Fourier transform while engineers stuck to a positive sign for it. It is not that one is better than the other. The consistency is the key, otherwise errors and confusions may arise. Connection to Fourier Series Only periodic signals could be decomposed with Fourier series, what about non-periodic signals? We would see that Fourier transform is actually Fourier series when TTT goes to the infinity, meaning there is non-periodic signals. 
f(x)=limT→∞fT(x)=limT→∞∑n=−∞∞cnei2πnxT=limT→∞∑n=−∞∞[1T∫TfT(x)e−i2πnxTdx]ei2πnxT=limΔfn→0∑n=−∞∞[∫−T/2T/2fT(x)e−i2πknxdx]ei2πxknΔkn, kn=nT Δkn=1T=∫−∞∞[∫−∞∞f(x)e−i2πkxdx]ei2πxkdk=∫−∞∞f^(k)ei2πxkdk\begin{equation} \begin{split} f(x) &= \lim_{T \rightarrow \infty} f_T(x)\\ &= \lim_{T \rightarrow \infty} \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n \frac{x}{T}}\\ &= \lim_{T \rightarrow \infty} \sum_{n=-\infty}^{\infty} \left[ \frac{1}{T} \int_{T} f_{T}(x) e^{-i 2\pi n \frac{x}{T}} dx \right] e^{i 2\pi n \frac{x}{T}}\\ &= \lim_{\Delta{f_n} \rightarrow 0} \sum_{n=-\infty}^{\infty} \left[ \int_{-T/2}^{T/2} f_{T}(x) e^{-i 2\pi k_n x} dx \right] e^{i 2\pi x k_n} \Delta{k_n},\ k_n = \frac{n}{T}\ \Delta{k_n}=\frac{1}{T}\\ &= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f(x) e^{-i 2\pi k x} dx \right] e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} \hat{f}(k) e^{i 2\pi x k} dk \end{split} \end{equation} f(x)=T→∞limfT(x)=T→∞limn=−∞∑∞cnei2πnTx=T→∞limn=−∞∑∞[T1∫TfT(x)e−i2πnTxdx]ei2πnTx=Δfn→0limn=−∞∑∞[∫−T/2T/2fT(x)e−i2πknxdx]ei2πxknΔkn, kn=Tn Δkn=T1=∫−∞∞[∫−∞∞f(x)e−i2πkxdx]ei2πxkdk=∫−∞∞f^(k)ei2πxkdk and that is exactly Fourier transform (note the definition of integral above). Properties Linearity For any complex numbers aaa and bbb, if h(x)=af(x)+bg(x)h(x)=af(x)+bg(x)h(x)=af(x)+bg(x), then h^(k)=f^(k)+g^(k)\hat{h}(k) = \hat{f}(k) + \hat{g}(k)h^(k)=f^(k)+g^(k). Time Shifting For any real number x0x_0x0, if h(x)=f(x−x0)h(x)=f(x-x_0)h(x)=f(x−x0), then h^(k)=e−i2πx0kf^(k)\hat{h}(k)=e^{-i 2\pi x_0 k} \hat{f}(k)h^(k)=e−i2πx0kf^(k). F[h(x)]=∫−∞∞f(x−x0)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^+x0)d(x^+x0)=e−i2πx0k∫−∞∞f(x^)e−i2πkx^dx^=e−i2πx0kf^(k)\begin{equation} \begin{split} \mathcal{F}\left[h(x)\right] &= \int_{-\infty}^{\infty} f(x-x_0) e^{-i 2\pi k x} dx\\ &= \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi k (\hat{x}+x_0)} d(\hat{x}+x_0)\\ &= e^{-i 2\pi x_0 k} \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi k \hat{x}} d\hat{x}\\ &= e^{-i 2\pi x_0 k} \hat{f}(k) \end{split} \end{equation} F[h(x)]=∫−∞∞f(x−x0)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^+x0)d(x^+x0)=e−i2πx0k∫−∞∞f(x^)e−i2πkx^dx^=e−i2πx0kf^(k) Frequency Shifting For any real number k0k_0k0, if h^(k)=f^(k−k0)\hat{h}(k) = \hat{f}(k-k_0)h^(k)=f^(k−k0), then h(x)=ei2πk0xf(x)h(x)=e^{i 2 \pi k_0 x} f(x)h(x)=ei2πk0xf(x). F−1[h^(k)]=∫−∞∞f^(k−k0)ei2πxkdk=∫−∞∞f^(k^)ei2πx(k^+k0)d(k^+k0)=ei2πk0x∫−∞∞f^(k^)ei2πxk^dk^=ei2πk0xf(x)\begin{equation} \begin{split} \mathcal{F}^{-1}\left[\hat{h}(k)\right] &= \int_{-\infty}^{\infty} \hat{f}(k-k_0) e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} \hat{f}(\hat{k}) e^{i 2\pi x(\hat{k}+k_0)} d(\hat{k}+k_0)\\ &= e^{i 2\pi k_0 x} \int_{-\infty}^{\infty} \hat{f}(\hat{k}) e^{i 2\pi x\hat{k}} d\hat{k}\\ &= e^{i 2\pi k_0 x} f(x) \end{split} \end{equation} F−1[h^(k)]=∫−∞∞f^(k−k0)ei2πxkdk=∫−∞∞f^(k^)ei2πx(k^+k0)d(k^+k0)=ei2πk0x∫−∞∞f^(k^)ei2πxk^dk^=ei2πk0xf(x) Scale Property For any real number aaa, if h(x)=f(ax)h(x)=f(ax)h(x)=f(ax), then h^(k)=1∣a∣f^(ka)\hat{h}(k)=\frac{1}{|a|}\hat{f}(\frac{k}{a})h^(k)=∣a∣1f^(ak). 
Let’s assuming a>0a \gt 0a>0: F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^/a)d(x^/a)=1a∫−∞∞f(x^)e−i2π(k/a)x^dx^=1af^(ka)\begin{equation} \begin{split} \mathcal{F}\left[h(x)\right] &= \int_{-\infty}^{\infty} f(ax) e^{-i 2\pi k x} dx\\ &= \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi k (\hat{x}/a)} d(\hat{x}/a)\\ &= \frac{1}{a} \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi (k / a) \hat{x}} d\hat{x}\\ &= \frac{1}{a} \hat{f}(\frac{k}{a}) \end{split} \end{equation} F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^/a)d(x^/a)=a1∫−∞∞f(x^)e−i2π(k/a)x^dx^=a1f^(ak) and if a<0a \lt 0a<0: F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫∞−∞f(x^)e−i2πk(x^/a)d(x^/a)=1−a∫−∞∞f(x^)e−i2π(k/a)x^dx^=1−aF(ka)\begin{equation} \begin{split} \mathcal{F}\left[h(x)\right] &= \int_{-\infty}^{\infty} f(ax) e^{-i 2\pi k x} dx\\ &= \int_{\infty}^{-\infty} f(\hat{x}) e^{-i 2\pi k (\hat{x}/a)} d(\hat{x}/a)\\ &= \frac{1}{-a} \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi (k/ a) \hat{x}} d\hat{x}\\ &= \frac{1}{-a} F(\frac{k}{a}) \end{split} \end{equation} F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫∞−∞f(x^)e−i2πk(x^/a)d(x^/a)=−a1∫−∞∞f(x^)e−i2π(k/a)x^dx^=−a1F(ak) Time Convolution Theorem For Fourier transform pairs f(x)↔f^(k)f(x) \leftrightarrow \hat{f}(k)f(x)↔f^(k) and g(x)↔g^(k)g(x) \leftrightarrow \hat{g}(k)g(x)↔g^(k), we have: F[f(x)∗g(x)]=∫−∞∞∫−∞∞f(τ)g(x−τ)dτe−i2πkxdx=∫−∞∞∫−∞∞g(x−τ)e−i2πkxdxf(τ)dτ=∫−∞∞∫−∞∞g(x^)e−i2πk(x^+τ)d(x^+τ)f(τ)dτ=∫−∞∞(∫−∞∞g(x^)e−i2πkx^dx^)f(τ)e−i2πkτdτ=g^(k)∫−∞∞f(τ)e−i2πkτdτ=g^(k)f^(k)\begin{equation} \begin{split} \mathcal{F}\left[f(x) \ast g(x)\right] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau)g(x-\tau) d\tau e^{-i 2\pi k x} dx \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x-\tau) e^{-i 2\pi k x} dx f(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(\hat{x}) e^{-i 2\pi k (\hat{x}+\tau)} d(\hat{x}+\tau) f(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} g(\hat{x}) e^{-i 2\pi k \hat{x}} d\hat{x}\right) f(\tau) e^{-i 2\pi k \tau} d\tau\\ &= \hat{g}(k) \int_{-\infty}^{\infty} f(\tau) e^{-i 2\pi k \tau} d\tau\\ &= \hat{g}(k) \hat{f}(k) \end{split} \end{equation} F[f(x)∗g(x)]=∫−∞∞∫−∞∞f(τ)g(x−τ)dτe−i2πkxdx=∫−∞∞∫−∞∞g(x−τ)e−i2πkxdxf(τ)dτ=∫−∞∞∫−∞∞g(x^)e−i2πk(x^+τ)d(x^+τ)f(τ)dτ=∫−∞∞(∫−∞∞g(x^)e−i2πkx^dx^)f(τ)e−i2πkτdτ=g^(k)∫−∞∞f(τ)e−i2πkτdτ=g^(k)f^(k) Frequency Convolution Theorem For Fourier transform pairs f(x)↔f^(k)f(x) \leftrightarrow \hat{f}(k)f(x)↔f^(k) and g(x)↔g^(k)g(x) \leftrightarrow \hat{g}(k)g(x)↔g^(k), we have: F−1[f^(k)∗g^(k)]=∫−∞∞∫−∞∞f^(τ)g^(k−τ)dτei2πxkdk=∫−∞∞∫−∞∞g^(k−τ)ei2πxkdkf^(τ)dτ=∫−∞∞∫−∞∞g^(k^)ei2πx(k^+τ)d(k^+τ)f^(τ)dτ=∫−∞∞(∫−∞∞g^(k^)ei2πxk^dk^)f^(τ)ei2πxτdτ=g(x)∫−∞∞f^(τ)ei2πxτdτ=g(x)f(x)\begin{equation} \begin{split} \mathcal{F}^{-1}\left[\hat{f}(k) \ast \hat{g}(k)\right] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \hat{g}(k-\tau) d\tau e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{g}(k-\tau) e^{i 2\pi x k} dk \hat{f}(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{g}(\hat{k}) e^{i 2\pi x (\hat{k}+\tau)} d(\hat{k}+\tau) \hat{f}(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} \hat{g}(\hat{k}) e^{i 2\pi x \hat{k}} d\hat{k}\right) \hat{f}(\tau) e^{i 2\pi x \tau} d\tau\\ &= g(x) \int_{-\infty}^{\infty} \hat{f}(\tau) e^{i 2\pi x \tau} d\tau\\ &= g(x) f(x) \end{split} \end{equation} F−1[f^(k)∗g^(k)]=∫−∞∞∫−∞∞f^(τ)g^(k−τ)dτei2πxkdk=∫−∞∞∫−∞∞g^(k−τ)ei2πxkdkf^(τ)dτ=∫−∞∞∫−∞∞g^(k^)ei2πx(k^+τ)d(k^+τ)f^(τ)dτ=∫−∞∞(∫−∞∞g^(k^)ei2πxk^dk^)f^(τ)ei2πxτdτ=g(x)∫−∞∞f^(τ)ei2πxτdτ=g(x)f(x) 
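As a sanity check on the time convolution theorem derived above, the short sketch below discretizes two Gaussians, approximates the convolution integral with `np.convolve`, and compares Riemann-sum approximations of the transforms (the grids and tolerance are my own choices; this is an illustration, not a proof).

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)
f = np.exp(-np.pi * x**2)                       # Gaussians, effectively zero at the window edges
g = np.exp(-np.pi * (x - 1.0)**2)

def ft(values, grid, k):
    """Riemann-sum approximation of the continuous Fourier transform at frequencies k."""
    return np.sum(values[None, :] * np.exp(-2j * np.pi * np.outer(k, grid)), axis=1) * dx

conv = np.convolve(f, g, mode="full") * dx      # discretized convolution integral (f * g)(x)
x_conv = 2 * x[0] + dx * np.arange(conv.size)   # grid on which the convolution samples live

k = np.linspace(-2, 2, 101)
lhs = ft(conv, x_conv, k)                       # F[f * g]
rhs = ft(f, x, k) * ft(g, x, k)                 # F[f] . F[g]
assert np.allclose(lhs, rhs, atol=1e-6)
```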
Conguation F[f(x)‾]=∫−∞∞f(x)‾e−i2πkxdx=∫−∞∞f(x)ei2πkxdx‾=∫−∞∞f(x)e−i2π(−k)xdx‾=f^(−k)‾\begin{equation} \begin{split} \mathcal{F}\left[\overline{f(x)}\right] &= \int_{-\infty}^{\infty} \overline{f(x)} e^{-i 2\pi k x} dx \\ &= \overline{ \int_{-\infty}^{\infty} f(x) e^{i 2\pi k x} dx}\\ &= \overline{ \int_{-\infty}^{\infty} f(x) e^{-i 2\pi (-k) x} dx}\\ &= \overline{\hat{f}(-k)} \end{split} \end{equation} F[f(x)]=∫−∞∞f(x)e−i2πkxdx=∫−∞∞f(x)ei2πkxdx=∫−∞∞f(x)e−i2π(−k)xdx=f^(−k) If f(x)f(x)f(x) is a real-valued function, then f^(−k)=f^(k)‾\hat{f}(-k)=\overline{\hat{f}(k)}f^(−k)=f^(k), which is referred to as conjugate symmetric property. If f(x)f(x)f(x) is an imaginary-valued function, then f^(−k)=−f^(k)‾\hat{f}(-k)=- \overline{\hat{f}(k)}f^(−k)=−f^(k), which is referred to as conjugate anti-symmetric property. Same properties occur in the inverse Fourier transform. Common FT Pairs Time Domain Frequency Domain Description 111 δ(k)\delta(k)δ(k) δ(k)=∫−∞∞e−i2πkxdx\delta(k) = \int_{-\infty}^{\infty} e^{-i 2\pi k x} dxδ(k)=∫−∞∞e−i2πkxdx δ(x)\delta(x)δ(x) 111 1=∫−∞∞δ(x)e−i2πkxdx1 = \int_{-\infty}^{\infty} \delta(x) e^{-i 2\pi k x} dx1=∫−∞∞δ(x)e−i2πkxdx sgn(x)={1,x>00,x=0−1,x<0\mathrm{sgn}(x) = \left\{\begin{aligned} 1,x \gt 0\\ 0,x = 0\\-1,x \lt 0\\ \end{aligned}\right.sgn(x)=⎩⎨⎧1,x>00,x=0−1,x<0 1/(iπk)1/(i\pi k)1/(iπk) sgn(x)\mathrm{sgn}(x)sgn(x) is the sign function u(x)={1,x>01/2,x=00,x<0u(x) = \left\{\begin{aligned} 1,x \gt 0\\ 1/2,x = 0\\0,x \lt 0\\ \end{aligned}\right.u(x)=⎩⎨⎧1,x>01/2,x=00,x<0 12(δ(k)+1/(iπk))\frac{1}{2}\left(\delta(k) + 1/(i\pi k)\right)21(δ(k)+1/(iπk)) u(x)u(x)u(x) is the unit step function.u(x)=12(1+sgn(x))u(x)=\frac{1}{2}(1+\mathrm{sgn}(x))u(x)=21(1+sgn(x)) eiaxe^{i a x}eiax δ(k−a/(2π))\delta(k-a/(2\pi))δ(k−a/(2π)) Frequency Shifting cos(ax)\mathrm{cos}(ax)cos(ax) 12(δ(k−a/(2π))+δ(k+a/(2π)))\frac{1}{2}\left(\delta(k-a/(2\pi)) + \delta(k+a/(2\pi))\right)21(δ(k−a/(2π))+δ(k+a/(2π))) cos(ax)=12(eiax+e−iax)\mathrm{cos}(ax) =\\ \frac{1}{2}(e^{i a x} + e^{-i a x})cos(ax)=21(eiax+e−iax) sin(ax)\mathrm{sin}(ax)sin(ax) 12(δ(k−a/(2π))−δ(k+a/(2π)))\frac{1}{2}\left(\delta(k-a/(2\pi)) - \delta(k+a/(2\pi))\right)21(δ(k−a/(2π))−δ(k+a/(2π))) sin(ax)=12(eiax−e−iax)\mathrm{sin}(ax) =\\ \frac{1}{2}(e^{i a x} - e^{-i a x})sin(ax)=21(eiax−e−iax) rect(x)={1,∣x∣<1/21/2,∣x∣=1/20,∣x∣>1/2\mathrm{rect}(x)=\left\{\begin{aligned}1,\lvert x \rvert \lt 1/2\\ 1/2,\lvert x \rvert = 1/2\\ 0,\lvert x \rvert \gt 1/2\end{aligned}\right.rect(x)=⎩⎨⎧1,∣x∣<1/21/2,∣x∣=1/20,∣x∣>1/2 sinc(k)=sin(πk)/(πk)\mathrm{sinc}(k) = \mathrm{sin}(\pi k)/(\pi k)sinc(k)=sin(πk)/(πk) rect(x)=u(x+1/2)−u(x−1/2)\mathrm{rect}(x) =\\ u(x+1/2)-u(x-1/2)rect(x)=u(x+1/2)−u(x−1/2) sinc(x)\mathrm{sinc}(x)sinc(x) rect(k)={1,∣k∣<1/21/2,∣k∣=1/20,∣k∣>1/2\mathrm{rect}(k)=\left\{\begin{aligned}1,\lvert k \rvert \lt 1/2\\ 1/2,\lvert k \rvert = 1/2\\ 0,\lvert k \rvert \gt 1/2\end{aligned}\right.rect(k)=⎩⎨⎧1,∣k∣<1/21/2,∣k∣=1/20,∣k∣>1/2 ∑n=−∞∞δ(x−nΔx)\sum_{n=-\infty}^{\infty} \delta(x-n \Delta x)∑n=−∞∞δ(x−nΔx) 1Δx∑m=−∞∞δ(k−m/Δx)\frac{1}{\Delta x} \sum_{m=-\infty}^{\infty} \delta(k-m/\Delta x)Δx1∑m=−∞∞δ(k−m/Δx) Comb function Comb Function A comb function (a.k.a sampling function, sometimes referred to as impulse sampling) is a periodic function with the formula: SΔx(x)=∑n=−∞∞δ(x−nΔx)S_{\Delta{x}}(x) = \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x}) SΔx(x)=n=−∞∑∞δ(x−nΔx) where Δx\Delta{x}Δx is the given period. 
Fourier transform could be extended to [[generalized functions]] like δ(x)\delta(x)δ(x), which makes it possible to bypass the limitation of absolute integrable property on f(x)f(x)f(x). The key is to decompose periodic functions into Fourier series and use additive property of Fourier transform . The Fourier transform of SΔx(x)S_{\Delta{x}}(x)SΔx(x) is: F(SΔx(x))=∫−∞∞∑n=−∞∞δ(x−nΔx)e−i2πkxdx=∑n=−∞∞∫−∞∞δ(x−nΔx)e−i2πkxdx=∑n=−∞∞e−i2πknΔx\begin{equation} \begin{split} \mathcal{F}\left(S_{\Delta{x}}(x)\right) &= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x}) e^{-i 2\pi k x} dx\\ &= \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \delta(x-n\Delta{x}) e^{-i 2\pi k x} dx\\ &= \sum_{n=-\infty}^{\infty} e^{-i 2\pi k n \Delta{x}}\\ \end{split} \end{equation} F(SΔx(x))=∫−∞∞n=−∞∑∞δ(x−nΔx)e−i2πkxdx=n=−∞∑∞∫−∞∞δ(x−nΔx)e−i2πkxdx=n=−∞∑∞e−i2πknΔx It is not obvious to see what the transform is, but we can prove it with Fourier series. Note that SΔx(x)S_{\Delta{x}}(x)SΔx(x) is a periodic function, its Fourier series is represented as: cn=1Δx∫ΔxSΔx(x)e−i2πnx/Δxdx=1Δx∫−Δx2Δx2∑m=−∞∞δ(x−mΔx)e−i2πnx/Δxdx=1Δx∫−Δx2Δx2δ(x)e−i2πnx/Δxdx=1Δx∫−∞∞δ(x)e−i2πnx/Δxdx=1ΔxSΔx(x)=∑n=−∞∞cnei2πnx/Δx=1Δx∑n=−∞∞ei2πnx/Δx\begin{equation} \begin{split} c_n &= \frac{1}{\Delta{x}} \int_{\Delta{x}} S_{\Delta{x}}(x) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}} \int_{-\frac{\Delta{x}}{2}}^{\frac{\Delta{x}}{2}} \sum_{m=-\infty}^{\infty} \delta(x-m\Delta{x}) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}} \int_{-\frac{\Delta{x}}{2}}^{\frac{\Delta{x}}{2}} \delta(x) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}} \int_{-\infty}^{\infty} \delta(x) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}}\\ S_{\Delta{x}}(x) &= \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n x / \Delta{x}}\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} e^{i 2\pi n x / \Delta{x}} \end{split} \end{equation} cnSΔx(x)=Δx1∫ΔxSΔx(x)e−i2πnx/Δxdx=Δx1∫−2Δx2Δxm=−∞∑∞δ(x−mΔx)e−i2πnx/Δxdx=Δx1∫−2Δx2Δxδ(x)e−i2πnx/Δxdx=Δx1∫−∞∞δ(x)e−i2πnx/Δxdx=Δx1=n=−∞∑∞cnei2πnx/Δx=Δx1n=−∞∑∞ei2πnx/Δx and Dirac delta function is expressed as: δ(x)=∫−∞∞ei2πxkdk=∫−∞∞e−i2πx(−k)dk=∫∞−∞e−i2πxk^d(−k^)=∫−∞∞e−i2πxk^d(k^)\begin{equation} \begin{split} \delta(x) &= \int_{-\infty}^{\infty} e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} e^{-i 2\pi x (-k)} dk\\ &= \int_{\infty}^{-\infty} e^{-i 2\pi x \hat{k}} d(-\hat{k})\\ &= \int_{-\infty}^{\infty} e^{-i 2\pi x \hat{k}} d(\hat{k}) \end{split} \end{equation} δ(x)=∫−∞∞ei2πxkdk=∫−∞∞e−i2πx(−k)dk=∫∞−∞e−i2πxk^d(−k^)=∫−∞∞e−i2πxk^d(k^) Now we can apply Fourier transform to SΔx(x)S_{\Delta{x}}(x)SΔx(x): F(SΔx(x))=∫−∞∞∑n=−∞∞cnei2πnx/Δxe−i2πkxdx=1Δx∑n=−∞∞∫−∞∞ei2π(n/Δx−k)xdx=1Δx∑n=−∞∞δ(k−n/Δx)\begin{equation} \begin{split} \mathcal{F}\left(S_{\Delta{x}}(x)\right) &= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n x / \Delta{x}} e^{-i 2\pi k x} dx\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} e^{i 2\pi (n/\Delta{x} - k) x} dx\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \delta({k - n/\Delta{x}}) \end{split} \end{equation} F(SΔx(x))=∫−∞∞n=−∞∑∞cnei2πnx/Δxe−i2πkxdx=Δx1n=−∞∑∞∫−∞∞ei2π(n/Δx−k)xdx=Δx1n=−∞∑∞δ(k−n/Δx) so F(SΔx(x))\mathcal{F}\left(S_{\Delta{x}}(x)\right)F(SΔx(x)) is also a periodic function with the period as 1/Δx1/\Delta{x}1/Δx. 
Hence, we again apply Fourier series: cn=Δx∫1ΔxF(SΔx(x))e−i2πnkΔxdk=∫−12Δx12Δx∑m=−∞∞δ(k−m/Δx)e−i2πnkΔxdk=∫−12Δx12Δxδ(k)e−i2πnkΔxdk=∫−∞∞δ(k)e−i2πnkΔxdk=1F(SΔx(x))=∑n=−∞∞cnei2πnkΔx=∑n=−∞∞ei2πnkΔx=∑n=−∞∞e−i2πnkΔx\begin{equation} \begin{split} c_n &= \Delta{x} \int_{\frac{1}{\Delta{x}}} \mathcal{F}\left(S_{\Delta{x}}(x)\right) e^{-i 2\pi n k \Delta{x}} dk\\ &= \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \sum_{m=-\infty}^{\infty} \delta(k-m/\Delta{x}) e^{-i 2\pi n k \Delta{x}} dk\\ &= \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \delta(k) e^{-i 2\pi n k \Delta{x}} dk\\ &= \int_{-\infty}^{\infty} \delta(k) e^{-i 2\pi n k \Delta{x}} dk\\ &= 1\\ \mathcal{F}\left(S_{\Delta{x}}(x)\right) &= \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n k \Delta{x}}\\ &= \sum_{n=-\infty}^{\infty} e^{i 2\pi n k \Delta{x}}\\ &= \sum_{n=-\infty}^{\infty} e^{-i 2\pi n k \Delta{x}} \end{split} \end{equation} cnF(SΔx(x))=Δx∫Δx1F(SΔx(x))e−i2πnkΔxdk=∫−2Δx12Δx1m=−∞∑∞δ(k−m/Δx)e−i2πnkΔxdk=∫−2Δx12Δx1δ(k)e−i2πnkΔxdk=∫−∞∞δ(k)e−i2πnkΔxdk=1=n=−∞∑∞cnei2πnkΔx=n=−∞∑∞ei2πnkΔx=n=−∞∑∞e−i2πnkΔx and we can see that why Fourier transform of a comb function is still a comb function. FT of Periodic Functions With the help of Dirac delta function, Fourier transform could also be used on periodic functions. Considering a periodic function fT(x)f_{T}(x)fT(x) with period TTT, we can write it as a Fourier series: fT(x)=∑n=−∞∞cnei2πnx/T\begin{equation} f_{T}(x) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n x/T} \end{equation} fT(x)=n=−∞∑∞cnei2πnx/T Let’s compute the Fourier transform: F(fT(x))=∫−∞∞fT(x)e−i2πkxdx=∑n=−∞∞cn∫−∞∞ei2π(n/T−k)xdx=∑n=−∞∞cnδ(n/T−k)=∑n=−∞∞cnδ(k−n/T)\begin{equation} \begin{split} \mathcal{F}(f_T(x)) &= \int_{-\infty}^{\infty} f_T(x) e^{-i 2\pi k x} dx\\ &= \sum_{n=-\infty}^{\infty} c_n \int_{-\infty}^{\infty} e^{i 2\pi (n/T-k) x} dx\\ &= \sum_{n=-\infty}^{\infty} c_n \delta(n/T - k)\\ &= \sum_{n=-\infty}^{\infty} c_n \delta(k - n/T) \end{split} \end{equation} F(fT(x))=∫−∞∞fT(x)e−i2πkxdx=n=−∞∑∞cn∫−∞∞ei2π(n/T−k)xdx=n=−∞∑∞cnδ(n/T−k)=n=−∞∑∞cnδ(k−n/T) so the Fourier transform of a periodic function is a sum of delta functions at the Fourier series frequencies and the weight of each delta function is the Fourier series coefficient, as we proved in the Fourier transform of the comb function. The inverse Fourier transform is: F(∑n=−∞∞cnδ(k−n/T))=∫−∞∞∑n=−∞∞cnδ(k−n/T)ei2πxkdk=∑n=−∞∞cn∫−∞∞δ(k−n/T)ei2πxkdk=∑n=−∞∞cnei2πxn/T=fT(x)\begin{equation} \begin{split} \mathcal{F}(\sum_{n=-\infty}^{\infty} c_n \delta(k - n/T)) &= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_n \delta(k - n/T) e^{i 2\pi x k} dk\\ &= \sum_{n=-\infty}^{\infty} c_n \int_{-\infty}^{\infty} \delta(k - n/T) e^{i 2\pi x k} dk\\ &= \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi x n/T}\\ &= f_T(x) \end{split} \end{equation} F(n=−∞∑∞cnδ(k−n/T))=∫−∞∞n=−∞∑∞cnδ(k−n/T)ei2πxkdk=n=−∞∑∞cn∫−∞∞δ(k−n/T)ei2πxkdk=n=−∞∑∞cnei2πxn/T=fT(x) Discrete-Time Fourier Transform TL;DR The discrete-time Fourier transform (DTFT) is a special case of the Fourier transform when the original function is sampled. The frequency domain of DTFT is continuous but periodic, with a period of 1/Δx1/\Delta{x}1/Δx, where Δx\Delta{x}Δx is the sampling interval. The DTFT of periodic sequence is a summation of weighted delta functions. 
Definition

For a discrete sequence of real or complex values $f[n]$, defined for all integers $n$, the discrete-time Fourier transform is defined as:

\begin{equation}
\hat{f}_{dtft}(k) = \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi n k \Delta{x}}
\end{equation}

where $\frac{1}{\Delta{x}}$ is the sampling frequency in the time domain. This formula can be viewed as a Fourier series in $k$ (whether the exponent carries a $-$ or a $+$ sign is only a convention), so $\hat{f}_{dtft}(k)$ is a periodic function with period $1/\Delta{x}$. The inverse discrete-time Fourier transform is defined as:

\begin{equation}
f[n] = \Delta{x} \int_{1/\Delta{x}} \hat{f}_{dtft}(k) e^{i 2\pi n k \Delta{x}} dk
\end{equation}

where the integral is evaluated over a single period. We use $\mathcal{F}_{dtft}$ to denote the discrete-time Fourier transform (DTFT) and $\mathcal{F}_{dtft}^{-1}$ to denote the inverse discrete-time Fourier transform (iDTFT).

Connection to the FT

Modern computers can only handle discrete values rather than continuous signals, and the most basic discretization technique is sampling. Consider a continuous function $f(x)$ and a uniform sampling pattern $S_{\Delta{x}}(x) = \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x})$, the [[#Comb Function]] described above. The sampling process can be modeled as:

\begin{equation}
\begin{split}
f_S(x) &= f(x) S_{\Delta{x}}(x)\\
&= \sum_{n=-\infty}^{\infty} f(x)\delta(x-n\Delta{x})
\end{split}
\end{equation}

The Fourier transform (using the definition) of this function is:

\begin{equation}
\begin{split}
\hat{f}_S(k) &= \int_{-\infty}^{\infty} f_S(x) e^{-i 2\pi k x} dx\\
&= \int_{-\infty}^{\infty} \left( \sum_{n=-\infty}^{\infty} f(x)\delta(x-n\Delta{x}) \right) e^{-i 2\pi k x} dx\\
&= \sum_{n=-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x) e^{-i 2\pi k x} \delta(x-n\Delta{x}) dx\right)\\
&= \sum_{n=-\infty}^{\infty} f(n\Delta{x})e^{-i 2\pi k n\Delta{x}},\quad f[n]=f(n\Delta{x})\\
&=\sum_{n=-\infty}^{\infty} f[n] e^{-i 2\pi n k \Delta{x}}\\
&= \hat{f}_{dtft}(k)
\end{split}
\end{equation}

which is exactly the definition of the discrete-time Fourier transform.
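As a quick sanity check of the definition, the DTFT of a finite-length sequence can be evaluated by direct summation. Below is a minimal NumPy sketch (not from the original post; the sequence values and sampling interval are arbitrary choices) showing that the resulting spectrum repeats with period $1/\Delta{x}$:

```python
import numpy as np

# An arbitrary short sequence f[n] (all other samples are zero) and a sampling interval.
f = np.array([1.0, 2.0, 0.5, -1.0])
dx = 0.1  # sampling interval Delta x

def dtft(f, dx, k):
    """Evaluate f_hat(k) = sum_n f[n] exp(-i 2 pi n k dx) by direct summation."""
    n = np.arange(len(f))
    return np.sum(f * np.exp(-1j * 2 * np.pi * n * k * dx))

k = 3.7                         # an arbitrary frequency
print(dtft(f, dx, k))           # some complex value
print(dtft(f, dx, k + 1 / dx))  # the same value: the DTFT has period 1/dx
```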
The next step is to verify the correctness of the inverse discrete-time Fourier transform:

\begin{equation}
\begin{split}
\Delta{x} \int_{1/\Delta{x}} \hat{f}_{dtft}(k) e^{i 2\pi n k \Delta{x}} dk &= \Delta{x} \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \hat{f}_{dtft}(k) e^{i 2\pi n k \Delta{x}} dk\\
&= \Delta{x} \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \hat{f}_{S}(k) e^{i 2\pi n k \Delta{x}} dk\\
&=\Delta{x} \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \left[\sum_{m=-\infty}^{\infty} f[m] e^{-i 2\pi m k \Delta{x}}\right] e^{i 2\pi n k \Delta{x}} dk\\
&=\Delta{x} \sum_{m=-\infty}^{\infty} f[m] \left[ \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} e^{i 2\pi (n-m) k \Delta{x}} dk \right]\\
&= \Delta{x} \sum_{m=-\infty}^{\infty} f[m] \left[ \left. \frac{1}{i 2\pi (n-m) \Delta{x}} e^{i 2\pi (n-m) k \Delta{x}} \right|_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \right]\\
&= \Delta{x} \sum_{m=-\infty}^{\infty} f[m] \left[\frac{1}{\Delta{x}} \frac{\sin(\pi(n-m))}{\pi (n-m)}\right]\\
&= f[n]
\end{split}
\end{equation}

where the last step follows because $\mathrm{sinc}(x)=\frac{\sin(\pi x)}{\pi x}$ equals 1 at $x=0$ and 0 at every other integer, so only the $m=n$ term survives.

It is also clear that $\hat{f}_S(k)$ is a periodic function with period $1/\Delta{x}$, which follows from the convolution theorem of the Fourier transform:

\begin{equation}
\begin{split}
\hat{f}_S(k) &= \mathcal{F}\left[f(x)S_{\Delta{x}}(x)\right]\\
&= \mathcal{F}\left[f(x)\right] \ast \mathcal{F}\left[S_{\Delta{x}}(x)\right]\\
&= \hat{f}(k) \ast \hat{S}_{\Delta{x}}(k)\\
&= \hat{f}(k) \ast \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \delta(k - n/\Delta{x})\\
&= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}(k) \ast \delta(k - n/\Delta{x})\\
&= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \delta(k - \tau - n/\Delta{x}) d\tau\\
&= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \delta(-(\tau - k + n/\Delta{x})) d\tau\\
&= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \delta(\tau - (k - n/\Delta{x})) d\tau\\
&= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}(k-n/\Delta{x})
\end{split}
\end{equation}

so the discrete-time Fourier transform of $f[n]$ is a sum of replicates of $\hat{f}(k)$ shifted by integer multiples of the frequency period $1/\Delta{x}$.
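This replication of the spectrum is exactly what causes aliasing when the replicates overlap. As a small illustration (a sketch with arbitrarily chosen frequencies, not taken from the post): a 7 Hz cosine sampled at 10 Hz produces the same samples as a 3 Hz cosine, because shifting its spectrum by the period $1/\Delta{x} = 10$ Hz maps 7 Hz onto $-3$ Hz:

```python
import numpy as np

fs = 10.0                      # sampling rate 1/dx, chosen for illustration
n = np.arange(20)              # sample indices
t = n / fs                     # sample times

# A 7 Hz cosine sampled below its Nyquist rate (2 * 7 = 14 Hz > fs) ...
x_fast = np.cos(2 * np.pi * 7.0 * t)
# ... yields exactly the same samples as a 3 Hz cosine (7 = 10 - 3),
# because the spectral replicates spaced fs apart coincide.
x_alias = np.cos(2 * np.pi * 3.0 * t)

print(np.allclose(x_fast, x_alias))   # True
```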
Properties

Common DTFT Pairs

DTFT of Periodic Sequences

Suppose $f[n]$ is an $N$-periodic sequence. We can write it as samples of $f_S(x) = f_T(x) S_{\Delta{x}}(x)$, where $N\Delta{x}=T$. Then $f_S(x)$ is also a periodic function with period $T$:

\begin{equation}
\begin{split}
f_{S}(x+T) &= f_{T}(x+T) S_{\Delta{x}}(x+T)\\
&= f_T(x) S_{\Delta{x}}(x+N\Delta{x})\\
&= f_T(x) S_{\Delta{x}}(x)\\
&= f_S(x)\\
f[n+N] &= f_S((n+N)\Delta{x})\\
&= f_S(n\Delta{x}+T)\\
&= f_S(n\Delta{x})\\
&= f[n]
\end{split}
\end{equation}

For the delta function, we also know that:

\begin{equation}
\begin{split}
\int_{a}^{b} f(x) \delta(x-x_0) dx &= \int_{-\infty}^{\infty} f(x) \left(u(x-a) - u(x-b)\right) \delta(x-x_0) dx\\
&= f(x_0)\left(u(x_0-a) - u(x_0-b)\right)\\
&= \begin{cases} f(x_0) & x_0 \in (a, b)\\ f(x_0)/2 & x_0=a,b\\ 0 & \text{else} \end{cases}
\end{split}
\end{equation}

where $u(x)$ is the unit step function. Recall that the Fourier transform of a periodic function is a summation of delta functions weighted by the Fourier coefficients:

\begin{equation}
\begin{split}
c_m &= \frac{1}{T} \int_{T} f_{S}(x) e^{-i 2\pi m x / T} dx\\
&= \frac{1}{T} \int_{-\frac{\Delta{x}}{2}}^{T-\frac{\Delta{x}}{2}} f_T(x) S_{\Delta{x}}(x) e^{-i 2\pi m x / T} dx\\
&= \frac{1}{T} \int_{-\frac{\Delta{x}}{2}}^{T-\frac{\Delta{x}}{2}} f_T(x) \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x}) e^{-i 2\pi m x / T} dx\\
&= \frac{1}{T} \int_{-\frac{\Delta{x}}{2}}^{T-\frac{\Delta{x}}{2}} f_T(x) \sum_{n=0}^{N-1} \delta(x-n\Delta{x}) e^{-i 2\pi m x / T} dx\\
&= \frac{1}{T} \sum_{n=0}^{N-1} \int_{-\frac{\Delta{x}}{2}}^{T-\frac{\Delta{x}}{2}} f_T(x) e^{-i 2\pi m x / T} \delta(x-n\Delta{x}) dx\\
&= \frac{1}{T} \sum_{n=0}^{N-1} f_T(n\Delta{x}) e^{-i 2\pi n m \Delta{x} / T}\\
&= \frac{1}{T} \sum_{n=0}^{N-1} f_T(n\Delta{x}) e^{-i 2\pi n m \Delta{x} / (N\Delta{x})}\\
&= \frac{1}{T} \sum_{n=0}^{N-1} f_T(n\Delta{x}) e^{-i 2\pi n m / N}\\
&= \frac{1}{T} \sum_{n=0}^{N-1} f[n] e^{-i 2\pi n m / N}\\
\hat{f}_{dtft}(k) &= \mathcal{F}(f_S(x))\\
&= \sum_{m=-\infty}^{\infty} c_m \delta(k - m/T)\\
&= \frac{1}{T} \sum_{m=-\infty}^{\infty} \left(\sum_{n=0}^{N-1} f[n] e^{-i 2\pi n m / N}\right) \delta(k - m/T)\\
&= \frac{1}{T} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k - m/T)\\
&= \frac{1}{N\Delta{x}} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta\left(k - \frac{m}{N\Delta{x}}\right)
\end{split}
\end{equation}

Note that we integrate over the shifted period from $-\frac{\Delta{x}}{2}$ to $T-\frac{\Delta{x}}{2}$ so that exactly $N$ of the deltas $\delta(x-n\Delta{x})$, $n=0,\dots,N-1$, fall inside the integration range, and we introduce a new symbol $\hat{f}[m] = \sum_{n=0}^{N-1} f[n] e^{-i 2\pi n m / N}$, which is exactly the discrete Fourier transform (explained later). So the discrete-time Fourier transform of a periodic sequence can be seen as the Fourier transform of a periodic function sampled with exactly $\Delta x=\frac{T}{N}$. The result is a summation of delta functions whose weights are the discrete Fourier transform values; $\hat{f}_{dtft}(k)$ vanishes everywhere except at integer multiples of $\frac{1}{T}$, known as the harmonic frequencies. The period $1/\Delta{x}$ still holds for $\hat{f}_{dtft}(k)$:

\begin{equation}
\begin{split}
\hat{f}_{dtft}(k + 1/\Delta{x}) &= \frac{1}{N\Delta{x}} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k + 1/\Delta{x} - m/(N\Delta{x}))\\
&= \frac{1}{N\Delta{x}} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k - (m-N)/(N\Delta{x})),\quad n=m-N\\
&= \frac{1}{N\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}[n+N] \delta(k - n/(N\Delta{x}))\\
&= \frac{1}{N\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}[n] \delta(k - n/(N\Delta{x}))\\
&= \hat{f}_{dtft}(k)
\end{split}
\end{equation}

where we used the fact that $\hat{f}[m]$ is itself $N$-periodic in $m$. Substituting $\hat{f}_{dtft}(k)$ into the inverse discrete-time Fourier transform formula (with $k_s=\frac{1}{\Delta{x}}$ the sampling frequency), we can verify that:

\begin{equation}
\begin{split}
\frac{1}{k_s} \int_{k_s} \hat{f}_{dtft}(k) e^{i 2\pi n \frac{k}{k_s}} dk &= \frac{1}{k_s} \int_{-\frac{k_s}{2N}}^{k_s - \frac{k_s}{2N}} \frac{k_s}{N} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k - m k_s / N) e^{i 2\pi n \frac{k}{k_s}} dk\\
&= \frac{1}{N} \int_{-\frac{k_s}{2N}}^{k_s - \frac{k_s}{2N}} \sum_{m=0}^{N-1} \hat{f}[m] \delta(k - m k_s / N) e^{i 2\pi n \frac{k}{k_s}} dk\\
&= \frac{1}{N} \sum_{m=0}^{N-1} \int_{-\frac{k_s}{2N}}^{k_s - \frac{k_s}{2N}} \hat{f}[m] e^{i 2\pi n \frac{k}{k_s}} \delta(k - m k_s / N) dk\\
&= \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}[m] e^{i 2\pi n \frac{m}{N}} \\
&= \frac{1}{N} \sum_{m=0}^{N-1} \sum_{l=0}^{N-1} f[l] e^{-i 2\pi m \frac{l}{N}} e^{i 2\pi n \frac{m}{N}}\\
&= \frac{1}{N} \sum_{l=0}^{N-1} f[l] \sum_{m=0}^{N-1} e^{i 2\pi (n-l) \frac{m}{N}}\\
&= \frac{1}{N} \sum_{l=0}^{N-1} f[l] \left[e^{i \pi \frac{N-1}{N} (n-l)} \frac{\sin(\pi (n-l))}{\sin(\pi \frac{n-l}{N})}\right]\\
&= \frac{1}{N} \sum_{l=0}^{N-1} f[l]\, g[l]\\
&= f[n]
\end{split}
\end{equation}

where $g[l]=e^{i \pi \frac{N-1}{N} (n-l)} \frac{\sin(\pi (n-l))}{\sin(\pi \frac{n-l}{N})}$ equals $N$ when $l=n$ and $0$ otherwise.
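The orthogonality of the discrete exponentials used in this last step is easy to check numerically; here is a minimal sketch (with an arbitrarily chosen $N$, not from the original post):

```python
import numpy as np

def exp_sum(n, l, N):
    """Sum_{m=0}^{N-1} exp(i 2 pi (n-l) m / N): equals N when n == l (mod N), else 0."""
    m = np.arange(N)
    return np.sum(np.exp(1j * 2 * np.pi * (n - l) * m / N))

N = 8
print(np.round(exp_sum(3, 3, N).real))       # 8.0 (n == l)
print(np.round(abs(exp_sum(3, 5, N)), 6))    # 0.0 (n != l)
```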
Discrete-Time Fourier Series

Definition

For an $N$-periodic sequence $f[n]$, we have the following series representation:

\begin{equation}
\begin{split}
c_m &= \sum_{n=0}^{N-1} f[n] e^{-i2\pi\frac{mn}{N}}\\
f[n] &= \frac{1}{N} \sum_{m=0}^{N-1} c_m e^{i2\pi \frac{nm}{N}}
\end{split}
\end{equation}

Here the $c_m$'s are the DTFS coefficients, and they are periodic with period $N$. The series representation of $f[n]$ is called the inverse DTFS.

Connection to the DTFT

As shown in DTFT of Periodic Sequences, the DTFS is essentially a discretized version of the DTFT of a periodic sequence. Given a signal $f(x)$ with period $T$ and its $N$-periodic sample sequence $f[n]$, we first compute the DTFS coefficients $c_m$; the DTFT $\hat{f}_{dtft}(k)$ of $f[n]$ is then:

\begin{equation}
\begin{split}
\hat{f}_{dtft}(k) &= \frac{1}{T} \sum_{m=-\infty}^{\infty} c_m \delta(k - m/T)
\end{split}
\end{equation}

The original sequence $f[n]$ can be recovered from the inverse DTFT, which is also the inverse DTFS.

Discrete Fourier Transform

TL;DR

Professor Jeffrey Fessler's notes give a clear picture of the relations between the different transforms. In short, the DFT is a simplified way to transform sampled time-domain sequences into sampled frequency-domain sequences.

Definition

For a sequence of $N$ complex numbers ${\{f[n]\}}_{n=0}^{N-1}$, the discrete Fourier transform produces another sequence of $N$ complex numbers ${\{\hat{f}[m]\}}_{m=0}^{N-1}$:

\begin{equation}
\hat{f}[m] = \sum_{n=0}^{N-1} f[n] e^{-i 2\pi m \frac{n}{N}}
\end{equation}

The inverse discrete Fourier transform is given by:

\begin{equation}
f[n] = \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}[m] e^{i 2\pi n \frac{m}{N}}
\end{equation}

Connection to the DTFT and DTFS

The DFT can be motivated by sampling the DTFT. Consider sampling the DTFT with sampling interval $\Delta{k}$ such that one period is covered by exactly $N$ points ($N \Delta{k} = 1/\Delta{x}$):

\begin{equation}
\begin{split}
\hat{f}_{dtft}[m] &= \hat{f}_{dtft}(m\Delta{k})\\
&= \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi n\Delta{x} m\Delta{k}}\\
&= \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi \frac{nm}{N}}
\end{split}
\end{equation}

Let's break $f[n]$ into $N$-length segments such as $f[0],\dots,f[N-1]$ and $f[N],\dots,f[2N-1]$, and write $n=l-jN$ where $l \in [0, N-1]$ and $j$ ranges over all integers. Then $\hat{f}_{dtft}[m]$ can be rewritten in terms of the $N$-point periodic superposition $f_{ps}[l] = \sum_{j=-\infty}^{\infty} f[l-jN]$:

\begin{equation}
\begin{split}
\hat{f}_{dtft}[m] &= \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi \frac{nm}{N}}\\
&= \sum_{j=-\infty}^{\infty} \sum_{l=0}^{N-1} f[l-jN] e^{-i2\pi (l-jN) \frac{m}{N}}\\
&= \sum_{j=-\infty}^{\infty} \sum_{l=0}^{N-1} f[l-jN] e^{-i2\pi l \frac{m}{N}}\\
&= \sum_{l=0}^{N-1} \left(\sum_{j=-\infty}^{\infty} f[l-jN]\right) e^{-i2\pi l \frac{m}{N}}\\
&= \sum_{l=0}^{N-1} f_{ps}[l] e^{-i2\pi l \frac{m}{N}}
\end{split}
\end{equation}

Clearly, $f_{ps}[l]$ is an $N$-periodic sequence, so it can be represented with the DTFS:

\begin{equation}
\begin{split}
c_m &= \sum_{l=0}^{N-1} f_{ps}[l] e^{-i 2\pi \frac{lm}{N}}\\
f_{ps}[l] &= \frac{1}{N} \sum_{m=0}^{N-1} c_m e^{i 2\pi \frac{ml}{N}}
\end{split}
\end{equation}

Comparing the DTFS coefficients with the DTFT samples above, we see that:

\begin{equation}
\begin{split}
c_m &= \hat{f}_{dtft}[m]
\end{split}
\end{equation}

Thus, we can recover the periodic sequence $f_{ps}[l]$ from the DTFT samples with the inverse DTFS:

\begin{equation}
\begin{split}
f_{ps}[l] &= \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}_{dtft}[m] e^{i 2\pi \frac{ml}{N}}
\end{split}
\end{equation}

However, this does not guarantee that we can recover the original sequence $f[n]$ from those DTFT samples: $f_{ps}[l]$ is a sum of shifted replicates of $f[n]$, so there is no perfect reconstruction for sequences that are not time-limited, since the time-domain replicates overlap and aliasing occurs. There is a special case where the time-domain replicates do not overlap. Consider a time-limited sequence $f[n]$ of duration $L$, i.e., one with nonzero values only in the interval $0,\dots,L-1$. If $N \ge L$, then there is no overlap between the replicates, as the sketch below illustrates numerically.
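Here is a minimal NumPy sketch of this behavior (not from the original post; the sequence and lengths are arbitrary choices). Sampling the DTFT at $N$ points and applying the inverse DFT yields the periodic superposition $f_{ps}$; when $N \ge L$ the superposition is simply the zero-padded original sequence:

```python
import numpy as np

# A time-limited sequence of duration L = 6 (values chosen for illustration).
f = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

def dtft_samples(f, N):
    """Sample the DTFT of f at the N frequencies m/(N*dx), m = 0..N-1."""
    n = np.arange(len(f))
    m = np.arange(N)
    return np.exp(-1j * 2 * np.pi * np.outer(m, n) / N) @ f

def periodic_superposition(f, N):
    """f_ps[l] = sum_j f[l - jN]: fold f onto N bins."""
    fps = np.zeros(N)
    for idx, val in enumerate(f):
        fps[idx % N] += val
    return fps

# N = 4 < L: the inverse DFT of the DTFT samples gives the aliased superposition.
N = 4
print(np.fft.ifft(dtft_samples(f, N)).real)   # [6. 8. 3. 4.]
print(periodic_superposition(f, N))           # [6. 8. 3. 4.]

# N = 8 >= L: no overlap, so the zero-padded original sequence is recovered.
N = 8
print(np.fft.ifft(dtft_samples(f, N)).real)   # [1. 2. 3. 4. 5. 6. 0. 0.]
```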
The original sequence $f[n]$ can then be recovered from $f_{ps}[l]$:

\begin{equation}
f[n] = \begin{cases} f_{ps}[n] & n \in [0, L-1]\\ 0 & \text{otherwise} \end{cases}
\end{equation}

In fact, if $f[n]$ is time-limited, the DTFT samples simplify to a finite summation:

\begin{equation}
\begin{split}
\hat{f}[m] &= \hat{f}_{dtft}[m]\\
&= \sum_{n=0}^{L-1} f[n] e^{-i2\pi n \frac{m}{N}}\\
&= \sum_{n=0}^{N-1} f[n] e^{-i2\pi n \frac{m}{N}}
\end{split}
\end{equation}

where $f[n]=0$ for $n \ge L$. This finite summation is exactly the DFT, and $f[n]$ can be recovered from the inverse DFT formula:

\begin{equation}
\begin{split}
f[n] = \begin{cases} \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}[m] e^{i 2\pi \frac{mn}{N}} & n=0, ..., N-1\\ 0 & \text{otherwise} \end{cases}
\end{split}
\end{equation}

Properties

TODO

Common DFT Pairs

TODO

Non-uniform Fast Fourier Transform

TODO
Psychopy Event Handling
Psychopy provides many kinds of I/O interaction, but the most fundamental are still the keyboard and the mouse. This section introduces programming techniques for keyboard and mouse handling in Psychopy.

Global Key Responses

Building a stimulus screen inevitably involves repeated debugging: you check whether the font color is right or the shapes are the right size, and as soon as the screen needs improvement you have to quit the program and edit the source code. With ordinary key detection, the key state must be polled inside a loop, which can easily go wrong (for example, if the program enters an endless function before the key check, it can never exit). This is where global key responses become very useful.

Psychopy uses psychopy.event.globalKeys to set up global keys. The official documentation does not explain how to use them, but the Coder demo global_event_keys.py shows how. The demo registers three bindings: pressing "b" calls Python's setattr to set the rect object's fill color to blue, pressing "ctrl" + "r" calls setattr to set the fill color to red, and pressing "q" calls core.quit to terminate the program.

```python
# -*- coding: utf-8 -*-
from psychopy import core, event, visual, monitors

if __name__ == '__main__':
    mon = monitors.Monitor(
        name='my_monitor',
        width=53.704,   # monitor width in cm
        distance=45,    # distance from participant to monitor in cm
        gamma=None,     # gamma value
        verbose=False)  # whether to print verbose information
    mon.setSizePix((1920, 1080))  # monitor resolution
    mon.save()                    # save the monitor settings

    win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0,
                        winType='pyglet', units='norm', allowGUI=False)

    rect = visual.Rect(win, fillColor='blue', pos=(0, -0.2))
    text = visual.TextStim(
        win, pos=(0, 0.5),
        text=('Press\n\n'
              'B for blue rectangle,\n'
              'CTRL + R for red rectangle,\n'
              'Q or ESC to quit.'))

    # Add an event key.
    event.globalKeys.add(key='b', func=setattr,
                         func_args=(rect, 'fillColor', 'blue'),
                         name='blue rect')

    # Add an event key with a "modifier" (CTRL).
    event.globalKeys.add(key='r', modifiers=['ctrl'], func=setattr,
                         func_args=(rect, 'fillColor', 'red'),
                         name='red rect')

    # Add multiple shutdown keys "at once".
    for key in ['q', 'escape']:
        event.globalKeys.add(key, func=core.quit)

    # Print all currently defined global event keys.
    print(event.globalKeys)
    print(repr(event.globalKeys))

    while True:
        text.draw()
        rect.draw()
        win.flip()
```

The parameters of event.globalKeys.add() are:

event.globalKeys.add(key, func, func_args=(), func_kwargs=None, modifiers=(), name=None)

| parameter | type | description |
| --- | --- | --- |
| key | string | the key name |
| func | function | the function to call when the key is pressed |
| func_args | iterable | positional arguments passed to func |
| func_kwargs | dict | keyword arguments passed to func |
| modifiers | iterable | list of modifier key names, e.g. 'shift', 'ctrl', 'alt', 'capslock', 'scrollock' |
| name | string | name of the key event |

There is also event.globalKeys.remove() for removing a global key binding:

event.globalKeys.remove(key, modifiers=())

| parameter | type | description |
| --- | --- | --- |
| key | string | the key name |
| modifiers | iterable | list of modifier key names, e.g. 'shift', 'ctrl', 'alt', 'capslock', 'scrollock' |

Waiting for and Checking Key Presses

Besides global key responses, Psychopy provides two further ways of handling the keyboard: waiting for a key press and checking for key presses. The following program demonstrates the waiting function event.waitKeys(); it exits when 'esc' is pressed or after five other key presses.

```python
# -*- coding: utf-8 -*-
from psychopy import core, event, visual, monitors

if __name__ == '__main__':
    mon = monitors.Monitor(
        name='my_monitor',
        width=53.704,   # monitor width in cm
        distance=45,    # distance from participant to monitor in cm
        gamma=None,     # gamma value
        verbose=False)  # whether to print verbose information
    mon.setSizePix((1920, 1080))  # monitor resolution
    mon.save()                    # save the monitor settings

    win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0,
                        winType='pyglet', units='norm', allowGUI=False)

    msg = visual.TextStim(win, text='press a key\n < esc > to quit')
    msg.draw()
    win.flip()

    k = ['']
    count = 0
    while k[0] not in ['escape', 'esc'] and count < 5:
        k = event.waitKeys()
        print(k)
        count += 1

    win.close()
    core.quit()
```

event.waitKeys() blocks the process until the participant presses a key. Its parameters are:

event.waitKeys(maxWait=inf, keyList=None, modifiers=False, timeStamped=False, clearEvents=True)

| parameter | type | description |
| --- | --- | --- |
| maxWait | numeric value | maximum waiting time, default inf |
| keyList | iterable | key names to detect; the function only returns when one of these keys is pressed |
| modifiers | bool | if True, return (keyname, modifiers) tuples |
| timeStamped | bool | if True, return (keyname, time) |
| clearEvents | bool | if True, clear the event buffer before detecting new key presses |

| return | type | description |
| --- | --- | --- |
| keys | iterable | list of pressed keys; None on timeout |

Since waiting for keys blocks the process, Psychopy also provides a non-blocking alternative, event.getKeys(). The following code keeps polling for key presses and printing them until 'escape' is pressed.

```python
# -*- coding: utf-8 -*-
from psychopy import core, event, visual, monitors

if __name__ == '__main__':
    mon = monitors.Monitor(
        name='my_monitor',
        width=53.704,   # monitor width in cm
        distance=45,    # distance from participant to monitor in cm
        gamma=None,     # gamma value
        verbose=False)  # whether to print verbose information
    mon.setSizePix((1920, 1080))  # monitor resolution
    mon.save()                    # save the monitor settings

    win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0,
                        winType='pyglet', units='norm', allowGUI=False)

    msg = visual.TextStim(win, text='press a key\n < esc > to quit')
    msg.draw()
    win.flip()

    while True:
        k = event.getKeys()
        if k:
            if 'escape' in k:
                break
            print(k)

    win.close()
    core.quit()
```

The parameters of event.getKeys() are:

event.getKeys(keyList=None, modifiers=False, timeStamped=False)

| parameter | type | description |
| --- | --- | --- |
| keyList | iterable | key names to detect; only these keys are returned |
| modifiers | bool | if True, return (keyname, modifiers) tuples |
| timeStamped | bool | if True, return (keyname, time) |

| return | type | description |
| --- | --- | --- |
| keys | iterable | list of keys pressed since the last check |

Mouse Events

Psychopy provides the event.Mouse class to handle mouse-related events, and the official documentation covers it in detail. The following code shows a window containing a rectangle: left- or right-clicking inside the rectangle changes its color, and pressing the middle (scroll-wheel) button quits the program.

```python
# -*- coding: utf-8 -*-
from psychopy import core, event, visual, monitors

if __name__ == '__main__':
    mon = monitors.Monitor(
        name='my_monitor',
        width=53.704,   # monitor width in cm
        distance=45,    # distance from participant to monitor in cm
        gamma=None,     # gamma value
        verbose=False)  # whether to print verbose information
    mon.setSizePix((1920, 1080))  # monitor resolution
    mon.save()                    # save the monitor settings

    win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0,
                        winType='pyglet', units='norm', allowGUI=True)

    rect = visual.Rect(win, fillColor='blue', pos=(0, 0))

    # Create the Mouse object
    mouse = event.Mouse(visible=True, newPos=(0, 0), win=win)

    while True:
        # Reset the click state
        mouse.clickReset()

        # Was the left button pressed inside the rectangle?
        if mouse.isPressedIn(rect, buttons=[0]):
            rect.fillColor = 'red'
        # Was the right button pressed inside the rectangle?
        if mouse.isPressedIn(rect, buttons=[2]):
            rect.fillColor = 'blue'

        # Was the middle (scroll-wheel) button pressed?
        # button1: left click, button2: middle click, button3: right click
        button1, button2, button3 = mouse.getPressed(getTime=False)
        if button2:
            break

        rect.draw()
        win.flip()

    core.quit()
```
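Beyond button state, event.Mouse can also report the cursor position via getPos(). The following is a minimal sketch (not from the original post; the window settings are arbitrary) that polls the position each frame and displays it:

```python
# -*- coding: utf-8 -*-
# Minimal sketch: poll the mouse position each frame with mouse.getPos()
# and show it as a text stimulus.
from psychopy import core, event, visual

win = visual.Window(size=(800, 600), fullscr=False, units='norm', allowGUI=True)
mouse = event.Mouse(visible=True, win=win)
pos_text = visual.TextStim(win, pos=(0, 0.8), text='')

for _ in range(300):            # roughly 5 seconds at 60 Hz
    x, y = mouse.getPos()       # cursor position in the window's units ('norm' here)
    pos_text.text = 'mouse at ({:.2f}, {:.2f})'.format(x, y)
    pos_text.draw()
    win.flip()

win.close()
core.quit()
```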