This post contains some useful tensor notation and tricks that I have seen and collected. The majority of the content is from Tamara Kolda and Brett Bader’s review, Tensor Decompositions and Applications.
What Is a Tensor
The mathematical definition of a tensor is hard for me to understand, so I prefer to view a tensor as a multi-dimensional array. An order-N tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is an N-dimensional array, and $I_k$ is the size of the k-th dimension. So matrices are order-2 tensors and vectors are order-1 tensors. The elements of $\mathcal{X}$ are denoted $x_{i_1, i_2, \cdots, i_N}$, where the index $i_k$ runs from 1 to $I_k$.
Norm, Inner Product, and Rank-One
The inner product of two same-sized tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the sum of the products of their elements:

$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N} \, y_{i_1 i_2 \cdots i_N}$$
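As a quick sanity check, here is a minimal NumPy sketch that treats tensors as plain multi-dimensional arrays and computes the inner product and the induced (Frobenius-like) norm; the shapes and variable names are purely illustrative:

```python
import numpy as np

# An order-3 tensor is just a 3-dimensional array; the shape is arbitrary.
X = np.random.rand(3, 4, 5)
Y = np.random.rand(3, 4, 5)

# Inner product: sum of the elementwise products over all indices.
inner = np.sum(X * Y)

# The norm of a tensor is the square root of its inner product with itself,
# the direct analogue of the matrix Frobenius norm.
norm_X = np.sqrt(np.sum(X * X))
assert np.isclose(norm_X, np.linalg.norm(X.ravel()))
```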
The k-mode product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $A \in \mathbb{R}^{J \times I_k}$, written $\mathcal{X} \times_k A$, multiplies every mode-k fiber of $\mathcal{X}$ by $A$. Successive k-mode products with matrices $A_1 \in \mathbb{R}^{J_1 \times I_1}, \cdots, A_N \in \mathbb{R}^{J_N \times I_N}$ are denoted as:

$$\mathcal{Y} = \mathcal{X} \times_1 A_1 \times_2 A_2 \cdots \times_N A_N$$
Intuitively, if no two matrices act on the same mode, the order of the products does not matter:

$$\mathcal{X} \times_m A \times_n B = \mathcal{X} \times_n B \times_m A \quad (m \neq n)$$
If two matrices share the same mode, then
$$\mathcal{X} \times_n A \times_n B = \mathcal{X} \times_n (BA)$$
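A minimal NumPy sketch of the k-mode product that checks both properties above; note that the post uses 1-based modes while NumPy axes are 0-based, and all names and shapes here are illustrative:

```python
import numpy as np

def mode_product(X, A, k):
    """Mode-k product X x_k A: multiply every mode-k fiber of X by the matrix A."""
    # Contract A's second axis with X's k-th axis, then move the new mode back to position k.
    return np.moveaxis(np.tensordot(A, X, axes=(1, k)), 0, k)

X = np.random.rand(3, 4, 5)
A = np.random.rand(6, 4)   # acts on mode 1 (size 4)
B = np.random.rand(7, 3)   # acts on mode 0 (size 3)
C = np.random.rand(2, 6)   # acts on mode 1 again, after A

# Products on different modes commute:
assert np.allclose(mode_product(mode_product(X, A, 1), B, 0),
                   mode_product(mode_product(X, B, 0), A, 1))

# Products on the same mode collapse into a single matrix product:
assert np.allclose(mode_product(mode_product(X, A, 1), C, 1),
                   mode_product(X, C @ A, 1))
```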
Mode-K Unfolding
The mode-k unfolding of a tensor is a matricization operation:

$$X_{(k)} = \begin{bmatrix} \cdots & v & \cdots \end{bmatrix}$$

where $X_{(k)} \in \mathbb{R}^{I_k \times I_1 \cdots I_{k-1} I_{k+1} \cdots I_N}$. Each column $v$ is called a fiber: the elements along the k-th dimension with all other indices fixed. For simplicity, let's assume that this reshape operation follows the row-major (C-like) layout, with the index of the last axis changing the fastest. The k-rank of $\mathcal{X}$ is then defined as the rank of its mode-k unfolding.
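A small sketch of the mode-k unfolding under the row-major convention just described; the helper name `unfold` is mine:

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: the mode-k fibers become columns, using C (row-major) order."""
    return np.reshape(np.moveaxis(X, k, 0), (X.shape[k], -1))

X = np.arange(24).reshape(2, 3, 4)     # a small order-3 tensor
X1 = unfold(X, 1)                      # shape (3, 2 * 4)

# The k-rank of X is just the matrix rank of its mode-k unfolding.
k_rank = np.linalg.matrix_rank(X1)
```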
With the mode-k unfolding, the k-mode product can be expressed as a normal matrix product:
$$Y_{(k)} = A X_{(k)}$$
or, for successive k-mode products:
$$Y_{(k)} = A_k X_{(k)} \left(A_1 \otimes \cdots \otimes A_{k-1} \otimes A_{k+1} \otimes \cdots \otimes A_N\right)^{\mathsf{T}}$$
where $\otimes$ is the Kronecker product and $A_1 \otimes \cdots \otimes A_{k-1} \otimes A_{k+1} \otimes \cdots \otimes A_N \in \mathbb{R}^{J_1 \cdots J_{k-1} J_{k+1} \cdots J_N \times I_1 \cdots I_{k-1} I_{k+1} \cdots I_N}$ is a very large matrix. (Note that Kolda & Bader matricize the tensor with the column-major layout, so the order of the Kronecker products there is reversed.)
The Kronecker product of matrices $A \in \mathbb{R}^{I \times J}$ and $B \in \mathbb{R}^{K \times L}$ is defined by:

$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots & a_{1J} B \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} B & a_{I2} B & \cdots & a_{IJ} B \end{bmatrix} \in \mathbb{R}^{IK \times JL}$$
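Under the row-major conventions used here, the unfolded identity above can be checked numerically. This is a small sketch that reuses the `mode_product` and `unfold` helpers from the earlier snippets; sizes are arbitrary:

```python
import numpy as np
from functools import reduce

X = np.random.rand(3, 4, 5)
As = [np.random.rand(6, 3), np.random.rand(7, 4), np.random.rand(8, 5)]

# Apply all the mode products: Y = X x_0 A_0 x_1 A_1 x_2 A_2 (0-based modes).
Y = X
for k, A in enumerate(As):
    Y = mode_product(Y, A, k)

# Check Y_(k) = A_k X_(k) (A_0 ⊗ ... skip k ... ⊗ A_{N-1})^T for one mode.
k = 1
kron_rest = reduce(np.kron, [A for i, A in enumerate(As) if i != k])
assert np.allclose(unfold(Y, k), As[k] @ unfold(X, k) @ kron_rest.T)
```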
The two most common tensor decompositions are the CP and Tucker decompositions, which can be considered higher-order generalizations of the matrix SVD and PCA, respectively.
CP Decomposition
The CP decomposition (canonical decomposition / parallel factors, CANDECOMP/PARAFAC) expresses a tensor as the sum of a finite number of rank-one tensors, where a rank-one order-N tensor is the outer product of N vectors. For example, given an order-3 tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, CP decomposes it as:

$$\mathcal{X} \approx \sum_{i=1}^{R} a_i \circ b_i \circ c_i$$
where $\circ$ is the vector outer product and the matrices formed by the column vectors are called factor matrices, e.g., $A = [a_1 \; \cdots \; a_R]$. Let's assume that each vector is normalized to unit length, with the norms absorbed into a weight vector $\lambda \in \mathbb{R}^R$, so that the decomposition becomes:

$$\mathcal{X} \approx \sum_{i=1}^{R} \lambda_i \, a_i \circ b_i \circ c_i$$
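Concretely, a weighted rank-R CP model for an order-3 tensor can be materialized as a sum of outer products. A small NumPy sketch with made-up sizes and names:

```python
import numpy as np

R, (I, J, K) = 4, (3, 5, 6)
lam = np.random.rand(R)                                  # weights
A, B, C = (np.random.rand(I, R), np.random.rand(J, R),
           np.random.rand(K, R))                         # factor matrices

# Sum of R weighted rank-one tensors: sum_r lambda_r * a_r ∘ b_r ∘ c_r.
X = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)

# The same thing written as an explicit loop over rank-one components.
X_loop = sum(lam[r] * np.multiply.outer(np.outer(A[:, r], B[:, r]), C[:, r])
             for r in range(R))
assert np.allclose(X, X_loop)
```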
For a general order-N tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, with the weight vector $\lambda$ and factor matrices $A_k \in \mathbb{R}^{I_k \times R}$, the CP decomposition is:

$$\mathcal{X} \approx \sum_{i=1}^{R} \lambda_i \, a^{(1)}_i \circ a^{(2)}_i \circ \cdots \circ a^{(N)}_i$$
and the mode-k matricized version is:
$$X_{(k)} \approx A_k \Lambda \left(A_1 \odot \cdots \odot A_{k-1} \odot A_{k+1} \odot \cdots \odot A_N\right)^{\mathsf{T}}$$
where $\Lambda \in \mathbb{R}^{R \times R}$ is the diagonal matrix formed from the weight vector $\lambda$, and $\odot$ is the Khatri-Rao product.
The Khatri-Rao product of two matrices $A \in \mathbb{R}^{I \times K}$ and $B \in \mathbb{R}^{J \times K}$ is their column-wise Kronecker product:

$$A \odot B = \begin{bmatrix} a_1 \otimes b_1 & a_2 \otimes b_2 & \cdots & a_K \otimes b_K \end{bmatrix}$$

where the resulting matrix is in $\mathbb{R}^{IJ \times K}$.
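A small sketch of the Khatri-Rao product, used to check the matricized CP form above against the `X`, `lam`, and factor matrices from the previous snippet (and the `unfold` helper from earlier); `khatri_rao` is an illustrative helper name:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I x K) and (J x K) -> (I*J x K)."""
    I, K = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, K)

# X_(0) = A diag(lam) (B ⊙ C)^T under the row-major unfolding convention.
assert np.allclose(unfold(X, 0),
                   A @ np.diag(lam) @ khatri_rao(B, C).T)
```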
The rank of a tensor is defined as the smallest number $R$ for which the CP decomposition above is exact. The tensor rank is an analogue of the matrix rank, but it has many different properties:
the tensor rank may be different over $\mathbb{R}$ and $\mathbb{C}$;
there is no straightforward algorithm to determine the rank of a given tensor yet;
the rank decomposition is generally unique, up to permutation and scaling.
It’s well known that the best rank-k approximation of a rank-R matrix $A$ is given by the leading k components of its SVD:

$$A \approx \sum_{i=1}^{k} \lambda_i \, u_i \circ v_i, \quad \text{with} \quad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k$$
but this does not hold true for high-order tensors. In fact, the best rank-k approximation of a tensor may not exist.
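For contrast, the matrix case is easy to write down: truncating the SVD gives the best rank-k approximation (Eckart–Young). A minimal sketch with arbitrary sizes:

```python
import numpy as np

M = np.random.rand(8, 6)
k = 2

# Keep only the k leading singular triplets.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of M
```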
Since there is no easy way to determine the rank of a tensor, most CP decomposition procedures fit multiple models until one approximates the given tensor well enough. The alternating least squares (ALS) method computes a CP decomposition by solving for one factor matrix at a time while keeping the others fixed:

$$A_k \leftarrow X_{(k)} \left(A_1 \odot \cdots \odot A_{k-1} \odot A_{k+1} \odot \cdots \odot A_N\right) \left(A_1^{\mathsf{T}} A_1 * \cdots * A_{k-1}^{\mathsf{T}} A_{k-1} * A_{k+1}^{\mathsf{T}} A_{k+1} * \cdots * A_N^{\mathsf{T}} A_N\right)^{\dagger}$$

where $*$ is the elementwise (Hadamard) product and $\dagger$ is the pseudo-inverse. $\lambda$ can be computed by normalizing the columns of $A_k$ after each update. The ALS method iterates until a stopping criterion is satisfied; however, it is not guaranteed to converge to a global minimum, or even to a stationary point.
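A minimal CP-ALS sketch along these lines, reusing the `unfold` and `khatri_rao` helpers from above; it runs a fixed number of sweeps instead of a real stopping criterion, and `cp_als` and its parameters are illustrative names:

```python
import numpy as np
from functools import reduce

def cp_als(X, R, n_sweeps=100):
    """Fit a rank-R CP model by alternating least squares (no convergence check)."""
    N = X.ndim
    factors = [np.random.rand(X.shape[k], R) for k in range(N)]
    lam = np.ones(R)
    for _ in range(n_sweeps):
        for k in range(N):
            others = [factors[n] for n in range(N) if n != k]
            KR = reduce(khatri_rao, others)                       # Khatri-Rao of the others
            G = reduce(np.multiply, [F.T @ F for F in others])    # Hadamard of their Gram matrices
            Ak = unfold(X, k) @ KR @ np.linalg.pinv(G)
            lam = np.linalg.norm(Ak, axis=0)                      # absorb column norms into the weights
            factors[k] = Ak / lam
    return lam, factors

lam, factors = cp_als(np.random.rand(3, 5, 6), R=4, n_sweeps=50)
```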
Tucker Decomposition
The Tucker decomposition decomposes a tensor into a core tensor multiplied by a factor matrix along each mode:
$$\mathcal{X} = \mathcal{G} \times_1 A_1 \times_2 A_2 \cdots \times_N A_N$$
where $\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ and each $A_k \in \mathbb{R}^{I_k \times R_k}$ has orthonormal columns. The CP decomposition can be seen as a special case of the Tucker decomposition in which the core tensor is superdiagonal and $R_1 = R_2 = \cdots = R_N$.
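Reconstructing a tensor from its Tucker form is just a chain of mode products. A small sketch reusing the `mode_product` helper from above, with arbitrary core and factor sizes:

```python
import numpy as np

G = np.random.rand(2, 3, 2)                              # core tensor
factors = [np.random.rand(4, 2), np.random.rand(5, 3),
           np.random.rand(6, 2)]                         # one factor matrix per mode

# X = G x_1 A_1 x_2 A_2 x_3 A_3 (0-based modes in code).
X = G
for k, A in enumerate(factors):
    X = mode_product(X, A, k)

print(X.shape)   # (4, 5, 6)
```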
An important concept for the Tucker decomposition is the n-rank of $\mathcal{X}$, denoted $\operatorname{rank}_n(\mathcal{X})$, which is the rank of $X_{(n)}$. The Tucker decomposition requires a set of ranks $(R_1, R_2, \cdots, R_N)$, which is also the size of the core tensor $\mathcal{G}$.
Higher-order orthogonal iteration (HOOI) solves for $A_k$ with the other factor matrices fixed, in the same spirit as the ALS method: form $\mathcal{Y} = \mathcal{X} \times_1 A_1^{\mathsf{T}} \cdots \times_{k-1} A_{k-1}^{\mathsf{T}} \times_{k+1} A_{k+1}^{\mathsf{T}} \cdots \times_N A_N^{\mathsf{T}}$ and set $A_k$ to the $R_k$ leading left singular vectors of $Y_{(k)}$, repeating until a stopping criterion is satisfied.
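A minimal HOOI sketch along these lines, reusing `mode_product` and `unfold` from above; it initializes with a truncated HOSVD and runs a fixed number of sweeps rather than checking convergence, and `hooi` and its parameters are illustrative names:

```python
import numpy as np

def hooi(X, ranks, n_sweeps=50):
    """Fit a Tucker model (core + orthonormal factors) by higher-order orthogonal iteration."""
    N = X.ndim
    # HOSVD initialization: leading left singular vectors of each unfolding.
    factors = [np.linalg.svd(unfold(X, k), full_matrices=False)[0][:, :ranks[k]]
               for k in range(N)]
    for _ in range(n_sweeps):
        for k in range(N):
            # Project X onto all the other factors, then refresh A_k by a truncated SVD.
            Y = X
            for n in range(N):
                if n != k:
                    Y = mode_product(Y, factors[n].T, n)
            U = np.linalg.svd(unfold(Y, k), full_matrices=False)[0]
            factors[k] = U[:, :ranks[k]]
    # The core tensor is X projected onto all the factors.
    G = X
    for n in range(N):
        G = mode_product(G, factors[n].T, n)
    return G, factors

G, factors = hooi(np.random.rand(4, 5, 6), ranks=(2, 3, 2))
```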