This post summarizes the ESPIRiT algorithm for estimating coil sensitivities in MRI. It took me months to grasp the theory behind it.
A short introduction
MRI is a common imaging technique for medical diagnosis nowadays. Unlike imaging techniques that rely on ionizing radiation, it is nearly harmless to humans. In MRI, when a patient is exposed to a magnetic field, hydrogen atoms align with the field. This alignment can be disrupted by a radiofrequency pulse. As the atoms return to their original alignment, they emit signals that are captured by sensors and used to create medical images.
The signals captured by a sensor $c$ (or coil $c$) can be approximately formulated as:
$$y_c(k(t)) = \int x(r)\, B_c(r)\, e^{-i 2\pi k(t) \cdot r}\, dr$$
where $x(r)$ is the unknown tissue value at location $r$, $B_c(r)$ is a value specific to the receiver coil $c$, often called the coil sensitivity, and $k(t)$ is the k-space coordinate at time $t$. It's common to have dozens of coils in clinical settings.
This formula indicates that the signals captured by coil $c$ are the Fourier transform of the tissue values of interest, multiplied by the sensitivity map of coil $c$. A concise way to express the above formula is with operators:
$$y_c = \mathcal{F} S_c x$$
where $\mathcal{F}$ denotes the Fourier transform, $S_c$ denotes the sensitivity map of coil $c$, and $y_c$ and $x$ are both vectorized.
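To make the operator notation concrete, here is a minimal numpy sketch of this forward model. The 2-D array shapes, the coil count, and the centered-FFT convention are my own assumptions, not from the paper:

```python
import numpy as np

def forward_model(x, sens):
    """Simulate multicoil k-space data y_c = F S_c x.

    x:    (H, W) complex image (the tissue values).
    sens: (C, H, W) complex coil sensitivity maps B_c.
    Returns (C, H, W) k-space data, one slice per coil.
    """
    # S_c x: weight the image by each coil's sensitivity map,
    # then F: apply a centered 2-D FFT per coil.
    coil_images = sens * x[None, ...]
    return np.fft.fftshift(
        np.fft.fft2(np.fft.ifftshift(coil_images, axes=(-2, -1)), axes=(-2, -1)),
        axes=(-2, -1),
    )

# Toy usage with random stand-ins for a real scan:
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
sens = rng.standard_normal((8, 64, 64)) + 1j * rng.standard_normal((8, 64, 64))
y = forward_model(x, sens)  # (8, 64, 64) multicoil k-space
```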
Most of the time, coil sensitivities are not available beforehand; we have to estimate them from the acquired signals. ESPIRiT is such a method for estimating coil sensitivities, supported by solid theoretical analysis.
About GRAPPA
Before diving into ESPIRiT, it’s important to first understand GRAPPA, a well-established parallel imaging method used in MRI. The core of GRAPPA is to interpolate missing k-space points by linearly weighting neighboring acquired points, with the weighting coefficients determined from the calibration region.
Again, a concise way to describe GRAPPA is using operators:
$$x_c(r) = (P_k R_r y)^T g_c$$
where $y$ represents the vectorized multicoil k-space data with unacquired entries set to zero, $x_c(r)$ is the estimated value for coil $c$ at location $r$ (for simplicity, the vector notation is omitted, but keep in mind that $r$ is a discrete spatial vector index), $g_c$ denotes the weighting coefficients for coil $c$, $R_r$ is a patch operator that selects a local block of data around location $r$, and $P_k$ is a sampling operator that selects only the acquired points within the patch (the subscript $k$ indicates that there may be many sampling patterns).
We can easily construct linear equations using the data from the calibration region, with the same undersampling pattern $P_k$:
$$y_c^{AC} = A P_k^T g_c$$
where $y_c^{AC}$ denotes the vectorized data in the calibration region and $A$ is the so-called calibration matrix, in which each row is a local patch around an acquired point. In this case, $g_c$ can be solved efficiently by many numerical methods.
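As a sketch of how these pieces fit together, the following builds a calibration matrix from a fully sampled calibration region and solves for $g_c$ by least squares. The channel-major flattening and the boolean-mask form of $P_k$ are my assumptions:

```python
import numpy as np

def calibration_matrix(calib, kh, kw):
    """Build A: each row is a flattened multicoil patch of the calibration region.

    calib: (C, H, W) fully sampled calibration k-space.
    kh, kw: patch (kernel) size.
    Returns A with shape (num_patches, C * kh * kw).
    """
    C, H, W = calib.shape
    rows = []
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            rows.append(calib[:, i:i + kh, j:j + kw].ravel())
    return np.array(rows)

def grappa_weights(A, mask, target_idx):
    """Solve y_AC = A P_k^T g_c in the least-squares sense.

    mask:       boolean (C * kh * kw,) pattern P_k selecting acquired entries;
                it should exclude target_idx itself, mirroring P_k.
    target_idx: column of A holding the target point (the e_c location).
    """
    y_ac = A[:, target_idx]            # y_AC: the target values
    A_acq = A[:, mask]                 # A P_k^T: acquired neighbors only
    g, *_ = np.linalg.lstsq(A_acq, y_ac, rcond=None)
    return g
```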
By construction, one of the columns of $A$ is actually $y_c^{AC}$ itself. Write this as $A e_c = y_c^{AC}$, where $e_c$ is a vector with a 1 in the appropriate location to select the corresponding column of $A$. Combining this with $y_c^{AC} = A P_k^T g_c$, we note that:

$$A \left( P_k^T g_c - e_c \right) = 0$$

i.e., $P_k^T g_c - e_c$ lies in the null-space of $A$. The existence of a null-space implies redundancy in the row-space of $A$; the $g_c$ in GRAPPA are very specific null-space vectors. Instead of using this specific form, ESPIRiT analyzes the calibration matrix $A$ directly.
Key assumptions in ESPIRiT
Since the matrix $A$ is constructed from local patches in the calibration region, ESPIRiT assumes that all patches, whether in the calibration region or not, lie in the row-space of $A$:
$$V_\perp^T R_r x = 0, \quad \forall r$$
where $V_\perp$ is the basis that spans the null-space and $x$ is the vectorized multicoil k-space data. Note that "lying in the row-space" is equivalent to being orthogonal to the null-space.
I wrote the formula as $V_\perp^H R_r x$, following the same notation used in the paper. However, this ultimately leads to an algorithm that conflicts with most correct implementations, which require flipping the kernels before applying the Fourier transform. Neither the paper nor subsequent research discusses this operation in detail. This confused me for months, and it was truly frustrating.
I think the mismatch arises from how we handle the inner product of complex vectors. The rows of the matrix $A$ are flattened row vectors extracted from local patches, while $R_r x$ is a flattened column vector.
Since the standard Hermitian inner product is defined for column vectors, when we want to compute the inner product of two row vectors $a, b \in \mathbb{C}^{1 \times N}$, we first convert them into column vectors:
$$\langle a, b \rangle = \langle a^T, b^T \rangle = (a^T)^H b^T = a^* b^T$$
Thus the inner product of $V_\perp^H$ and $R_r x$ is $V_\perp^T R_r x$, instead of the $V_\perp^H R_r x$ from the original paper.
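A quick numpy check of this convention; note that `np.vdot` conjugates its first argument, matching the Hermitian inner product:

```python
import numpy as np

# Two row vectors a, b in C^{1 x N}
rng = np.random.default_rng(1)
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# <a, b> = (a^T)^H b^T = a^* b^T: conjugate a, but do NOT conjugate b
lhs = np.conj(a) @ b   # a^* b^T
rhs = np.vdot(a, b)    # Hermitian inner product (conjugates the first argument)
assert np.allclose(lhs, rhs)
```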
$V_\perp$ is computed by the singular value decomposition of $A$:
$$A = U \Sigma V^H = U \Sigma \begin{bmatrix} V_\parallel & V_\perp \end{bmatrix}^H$$
where $V_\parallel$ corresponds to the non-zero singular values and $V_\perp$ corresponds to the zero singular values.
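In practice, the smallest singular values are never exactly zero, so implementations threshold them. A minimal sketch, where the relative tolerance is an arbitrary choice of mine:

```python
import numpy as np

def split_row_space(A, tol=1e-3):
    """Split V from the SVD A = U S V^H into V_par (row space) and V_perp (null space).

    tol is a relative singular-value cutoff; real data has no exactly
    zero singular values, so the threshold is a practical choice.
    """
    _, svals, Vh = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(svals > tol * svals[0]))
    V = Vh.conj().T                      # columns of V
    V_par, V_perp = V[:, :rank], V[:, rank:]
    return V_par, V_perp
```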
Solving the null-space constraint in a least-squares framework
$$\arg\min_x \; \sum_r \frac{1}{2} \left\| V_\perp^T R_r x \right\|_F^2$$
which leads to the following equations by setting the gradient to zero:
$$\sum_r R_r^H (V_\perp V_\perp^H)^* R_r x = 0$$
Since $V$ is unitary, we have $I = V_\perp V_\perp^H + V_\parallel V_\parallel^H$. Substituting this into the above equations, we get:
$$\sum_r R_r^H (V_\parallel V_\parallel^H)^* R_r x = \sum_r R_r^H R_r x$$
The effect of $R_r^H R_r$ is to restore the patch to its original location while zeroing out the other elements. The overall effect of $\sum_r R_r^H R_r$ is a diagonal matrix, where each diagonal element equals $M$, the number of patches in which the corresponding location appears. With $M$, the equation can be transformed to:
$$M^{-1} \sum_r R_r^H (V_\parallel V_\parallel^H)^* R_r x = x$$
which is equation (9) in the paper except for the conjugation.
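The claim about $\sum_r R_r^H R_r$ is easy to check numerically. The sketch below counts, for a toy 2-D grid, how many patches cover each location; note that without circular patch extraction, locations near the boundary are covered fewer than $k_h \cdot k_w$ times:

```python
import numpy as np

# Count how many patches cover each k-space location: sum_r R_r^H R_r.
H, W, kh, kw = 8, 8, 3, 3
count = np.zeros((H, W))
for i in range(H - kh + 1):
    for j in range(W - kw + 1):
        count[i:i + kh, j:j + kw] += 1   # R_r^H R_r adds 1 on the patch support
print(count)  # interior entries equal kh * kw = 9; edges are covered fewer times
```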
“Obvious” convolution
The most challenging part for me was deriving equation (16), $G_q s_q = s_q$, from the paper. The authors mention that the operator $W = M^{-1} \sum_r R_r^H V_\parallel V_\parallel^H R_r$ (in our case, $W = M^{-1} \sum_r R_r^H (V_\parallel V_\parallel^H)^* R_r$) is a matrix-valued convolution, which can be decoupled into point-wise matrix operations in the image domain. The sensitivity maps are obtained by the eigenvalue decomposition of all the $G_q$'s, choosing only the eigenvectors corresponding to the eigenvalue "$=1$".
The intuition is that we can use the Fourier convolution theorem to get what we want, which is straightforward in the scalar case. However, since $W$ is a high-dimensional operator (over k-space locations and channels), applying the theorem directly is not straightforward. A better approach is to transform $W$ into a form that resembles something familiar in 1D.
To simplify the derivation, we first introduce some notation:
$P = (V_\parallel V_\parallel^H)^*$: $P$ operates on patches
$P_{(c,\alpha),(d,\beta)}$: a scalar value, where $(c,\alpha)$ represents the $c$-th channel and location $\alpha$ in the output (row index), and $(d,\beta)$ represents the $d$-th channel and location $\beta$ in the input (column index)
$x_c(r)$: a scalar value, where $c$ represents the $c$-th channel and $r$ represents location $r$ in k-space
$(R_r x)_{(d,\beta)}$: a scalar value, the $d$-th coil value at location $\beta$ in the patch centered at location $r$
$(P R_r x)_{(c,\alpha)}$: a scalar value, the $c$-th coil value at location $\alpha$ of the output
$(R_r^H P R_r x)_c(m)$: a scalar value, the $c$-th coil value at location $m$ in k-space
As we have discussed above, $R_r$ extracts a patch centered at location $r$:
$$(R_r x)_{(d,\beta)} = x_d(r + \beta)$$
where $\beta$ represents the relative shift with respect to the central location $r$.
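In code, $R_r$ is just fancy indexing. A 1-D sketch with circular boundary handling; the centering convention is my own choice:

```python
import numpy as np

def extract_patch(x, r, p):
    """R_r in 1-D: the multicoil patch of length p centered at location r.

    x: (C, N) k-space; circular indexing so every r has a full patch.
    """
    N = x.shape[1]
    beta = np.arange(p) - (p - 1) // 2    # relative shifts around the center
    return x[:, (r + beta) % N]           # (R_r x)_{(d, beta)} = x_d(r + beta)
```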
The effect of $P$ is to perform a weighted summation across the channels and locations of the input, $(P R_r x)_{(c,\alpha)} = \sum_{d,\beta} P_{(c,\alpha),(d,\beta)}\, x_d(r + \beta)$. Applying $R_r^H$ then puts the patch back, so the output location is $m = r + \alpha$, and summing over all $r$ gives:

$$\Big( \sum_r R_r^H P R_r x \Big)_c(m) = \sum_r \sum_{d,\beta} P_{(c,\,m-r),(d,\beta)}\, x_d(r + \beta) = \sum_d \sum_u K_{c,d}(u)\, x_d(m - u)$$

where $K_{c,d}(u) = \sum_{\alpha,\beta,\,\alpha-\beta=u} P_{(c,\alpha),(d,\beta)}$. Recall that $u$ is determined by the relative shifts $\alpha$ and $\beta$: with $\alpha = m - r$, the input location is $r + \beta = m - (\alpha - \beta)$. What we are doing here is re-grouping the pairs $(\alpha, \beta)$ according to their common difference $u$. We don't need to worry about how to compute $K_{c,d}(u)$ right now; we only need to understand that it is a scalar value associated with $c$, $d$, and the difference $u$.
Finally, we get something that looks like a convolution: $\sum_u K_{c,d}(u)\, x_d(m - u)$ is a convolution of $K_{c,d}(m)$ and $x_d(m)$. It's worth noting that $u$ cannot exceed twice the size of the patch, since it is the difference between two relative shifts; $K_{c,d}(m)$ is only meaningful in this range.
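To make the re-grouping concrete, here is a 1-D sketch that builds $K_{c,d}(u)$ from $P$ by summing entries with a common difference $u = \alpha - \beta$. The channel-major layout of $P$ is my assumption:

```python
import numpy as np

def kernel_from_P(P, C, p):
    """Re-group P_{(c,alpha),(d,beta)} by u = alpha - beta to get K_{c,d}(u).

    P: (C * p, C * p) matrix indexed by (c, alpha) rows and (d, beta) columns,
       with 1-D patch locations alpha, beta in range(p), flattened channel-major.
    Returns K of shape (C, C, 2 * p - 1): u in [-(p - 1), p - 1] is stored
    at index u + p - 1, matching the support noted above.
    """
    P4 = P.reshape(C, p, C, p)                    # P4[c, alpha, d, beta]
    K = np.zeros((C, C, 2 * p - 1), dtype=P.dtype)
    for alpha in range(p):
        for beta in range(p):
            K[:, :, alpha - beta + p - 1] += P4[:, alpha, :, beta]
    return K
```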
Calculate coil sensitivities
We should know that $x_c(m) = \mathcal{F}[s_c(n)\,\hat{x}(n)](m)$, as we have discussed in the introduction. Let's substitute it into the right-hand side of the equation:
$$(Wx)_c(m) = M^{-1} \sum_d K_{c,d}(m) * x_d(m) = x_c(m)$$

where $*$ here denotes convolution.
Apply the inverse Fourier transform on both sides:
$$M^{-1} \sum_d \hat{K}_{c,d}(n)\, s_d(n)\, \hat{x}(n) = s_c(n)\, \hat{x}(n)$$
where $\hat{x}(n)$ is in image space and $s_d(n)$ is the $d$-th coil sensitivity value at location $n$. Collecting all coils at location $n$ into $s(n) = [s_1(n), \dots, s_C(n)]^T$ and stacking the scalars $M^{-1}\hat{K}_{c,d}(n)$ into a matrix $G_n$, we have:

$$G_n s(n) = s(n)$$

which is equation (16) in the paper. Computing $\hat{K}_{c,d}(n)$ means taking the inverse Fourier transform of $K_{c,d}$, which is built from correlations of the calibration kernels; this is where the identity

$$\mathcal{F}^{-1}[x^*_{rev}](n) = \left( \mathcal{F}^{-1}[x](n) \right)^*$$

comes in, where $x^*_{rev}$ is the conjugate of the reversed signal $x$. If $N$ is odd, our implementation is equivalent to the kernel-flipping operation. However, if $N$ is even, there exists a linear phase term $e^{-i 2\pi m / N}$ between the two methods.
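Putting the pieces together, here is a hedged 1-D sketch of the final step: embed $K$ on the image grid, take it to image space, and eigendecompose each $G_n$, keeping only eigenvectors whose eigenvalue is close to 1. The shift placement, FFT scaling, and threshold are my own conventions and may differ from BART's:

```python
import numpy as np

def sensitivity_maps_1d(K, N, M, tol=0.05):
    """Pointwise eigendecomposition of G_n, a 1-D analogue of equation (16).

    K: (C, C, 2p - 1) kernels K_{c,d}(u), with u = l - (p - 1) at index l.
    N: image grid length; M: the patch-count normalizer from above.
    Returns maps (C, N): at each n, the eigenvector of G_n whose eigenvalue
    is closest to 1, zeroed where no eigenvalue comes near 1.
    """
    C, _, L = K.shape
    p = (L + 1) // 2
    # Embed K on the full grid so that u = 0 lands at index 0 (circular shift),
    # then go to image space; the scaling matches ifft's 1/N convention.
    Kg = np.zeros((C, C, N), dtype=complex)
    for l in range(L):
        Kg[:, :, (l - (p - 1)) % N] = K[:, :, l]
    K_img = np.fft.ifft(Kg, axis=-1) * N / M

    maps = np.zeros((C, N), dtype=complex)
    for n in range(N):
        w, V = np.linalg.eig(K_img[:, :, n])   # eigendecompose G_n
        i = np.argmin(np.abs(w - 1.0))
        if abs(w[i] - 1.0) < tol:              # keep only eigenvalue "= 1"
            maps[:, n] = V[:, i]
    return maps
```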
I am still investigating the differences between the two methods. However, based on the assumptions in the original paper, I ended up with algorithmic steps that, for reasons unknown, do not match those in official implementations such as BART.
I believe Professor Haldar's linear-predictability framework may help explain the discrepancy here. In that work, the authors present a connection between ESPIRiT and the null-space algorithm directly, but without providing a proof. Surprisingly, one of their acceleration techniques (Section IV, Part C) is quite similar to my relative-shift trick in equation (18). I plan to explore the underlying mathematical relationships in my next blog post.