category : cuda
Total 2 articles
The previous blog post implemented a basic CUDA program; this one records how to implement matrix multiplication and speed it up. The content mainly follows and reproduces this blog, and the goal is to approach, step by step, the single-precision matrix multiplication implemented by cuBLAS: $\mathbf{C} = \alpha \mathbf{A}\mathbf{B} + \beta \mathbf{C}$, where $\mathbf{A}\in \mathbb{R}^{M \times K}$, $\mathbf{B}\in \mathbb{R}^{K \times N}$, $\mathbf{C}\in \mathbb{R}^{M \times N}$. The source code is here.

Naive vs cuBLAS
Since $\mathbf{C}_{i,j} = \alpha \sum_{k=0}^{K-1} \mathbf{A}_{i,k} \mathbf{B}_{k,j} + \beta \mathbf{C}_{i,j}$, the obvious idea is to assign one thread to each element of $\mathbf{C}$ to compute and write its value:

__global__ void sgemm_naive(int M, int N, int K, const float alpha, const float *A, const float *B, const float beta, float *C)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int j = blockIdx.y * blockDim.y + threadIdx.y;
    // avoids memory access error if threads are more than elements
    if (i < M && j < N)
    {
        float sum = 0.0f;
        for (int k = 0; k < K; ++k)
        {
            sum += A[i * K + k] * B[k * N + j];
        }
        C[i * N + j] = alpha * sum + beta * C[i * N + j];
    }
}

dim3 gridDim((M - 1) / 32 + 1, (N - 1) / 32 + 1);
dim3 blockDim(32, 32, 1);
sgemm_naive<<<gridDim, blockDim>>>(M, N, K, alpha, d_A, d_B, beta, d_C);

The number of blocks must be enough to cover every element of $\mathbf{C}$. When $M, N$ are not divisible by 32 (here I simply use the same number of threads in the x and y directions; different values could of course be chosen), some threads at the edges always run idle, which is unavoidable. The metrics of sgemm_naive are:
Total floating-point operations: $MN(2K+3)$
Total global memory accesses: $MN(2K+1)$
Total memory footprint: $(MN+MK+KN) \times 4$ B

With $M=N=K=4096$, the total work is 137.5 GFLOP. The FP32 throughput of the GTX 1080 is 8.873 TFLOPS, so in the best case the sgemm could finish in about 15.5 ms. Storing the matrices and finishing the computation ideally takes 201.3 MB of memory; the GTX 1080 has 8 GB, so capacity is not a problem. However, the memory traffic implied by the access count above is 549.8 GB, while the peak bandwidth of the GTX 1080 is 320 GB/s, so these accesses alone take 1.718 s, far more than the compute time. The runtime of sgemm_naive is completely dominated by memory accesses, and the GPU cores spend most of their time waiting for data. In practice it is even slower: on my machine, sgemm_naive takes roughly 2900 ms. And cublasSgemm? It needs only about 20 ms, close to the 15 ms theoretical limit. Astonishingly fast!

Memory Coalescing
In the basics post we mentioned that the GPU groups every 32 threads of a block into a warp and hands them to the hardware for scheduling. Considering a 32x32 block, is a warp formed by the 32 threads along a row, or by the 32 threads along a column? In fact, for the multi-dimensional case the thread indices are first linearized, and 32 consecutive threads form a warp. For the triple (x, y, z), CUDA's rule is x first; the linear threadId within a block is determined by:

threadId = threadIdx.x + blockDim.x * threadIdx.y + blockDim.x * blockDim.y * threadIdx.z

One property of a warp is that if its threads read the same data, a within-warp broadcast mechanism is triggered, so not every thread actually has to fetch the data again, which reduces the cost significantly. Another property is that when these threads read global memory, a large chunk of consecutive data (e.g. 32B, 64B or 128B) can be fetched at once into faster memory (e.g. the L2 cache). For example, the 32 threads of a warp could each issue their own 4B float load, which takes 32 transactions; but if they read a contiguous block of memory, this is equivalent to a single 128B load instruction, one transaction, a direct 32x speedup. A program that satisfies memory coalescing tries to exploit this mechanism by letting the threads of a warp work on a nearby region of memory.

So, is sgemm_naive memory coalesced? First, note that in the current example C++ uses row-major storage, i.e. each row of a matrix is contiguous in memory. Our block size is 32x32, so exactly every 32 threads form a warp, and within such a warp threadIdx.y stays fixed while threadIdx.x increments. With our mapping from threads to elements,

const int i = blockIdx.x * blockDim.x + threadIdx.x;
const int j = blockIdx.y * blockDim.y + threadIdx.y;

i increments while j stays fixed, so the program keeps reading column j of $\mathbf{B}$ and different rows i of $\mathbf{A}$. By the warp properties above, reading the column of $\mathbf{B}$ triggers the within-warp broadcast, so even though the column elements are not stored contiguously, the cost of reading them is small. For $\mathbf{A}$, however, although each row is contiguous in memory, each thread uses data from a different row, so the elements read at the same moment (32 elements of one column) are not contiguous, and memory coalescing cannot be satisfied.

Therefore, the variation of threadIdx.x should follow the memory order as closely as possible, and we only need to swap the x and y directions:

const int j = blockIdx.x * blockDim.x + threadIdx.x;
const int i = blockIdx.y * blockDim.y + threadIdx.y;

Now, within a warp, j increments while i stays fixed, and the program repeatedly reads row i of $\mathbf{A}$ and different columns of $\mathbf{B}$. Reading the row of $\mathbf{A}$ triggers the within-warp broadcast, while for $\mathbf{B}$ the elements read by the threads at the same moment are contiguous in memory, satisfying the memory coalescing requirement.
__global__ void sgemm_coalesce(int M, int N, int K, const float alpha, const float *A, const float *B, const float beta, float *C)
{
    const int j = blockIdx.x * blockDim.x + threadIdx.x;
    const int i = blockIdx.y * blockDim.y + threadIdx.y;
    // avoids memory access error if threads are more than elements
    if (i < M && j < N)
    {
        float sum = 0.0f;
        for (int k = 0; k < K; ++k)
        {
            sum += A[i * K + k] * B[k * N + j];
        }
        C[i * N + j] = alpha * sum + beta * C[i * N + j];
    }
}

sgemm_coalesce needs only 392 ms.

Tiled Matrix Multiply with Shared Memory
The cost of sgemm_coalesce is still determined by the number of reads from global memory. One way to improve it is a tiled matrix multiply using shared memory. A tiled matrix multiply partitions the data into tiles and loads each tile at once into faster memory (whose access time we treat as negligible), reducing the overall cost of global memory accesses. If the cache tile is B x B, the total number of global memory accesses becomes $MN(2K/B+1)$, so the larger B is, the less time is spent accessing global memory.

#define BLOCKSIZE 32
__global__ void sgemm_shared(int M, int N, int K, const float alpha, const float *A, const float *B, const float beta, float *C)
{
    __shared__ float As[BLOCKSIZE * BLOCKSIZE];
    __shared__ float Bs[BLOCKSIZE * BLOCKSIZE];
    A += blockIdx.y * BLOCKSIZE * K;
    B += blockIdx.x * BLOCKSIZE;
    C += blockIdx.y * BLOCKSIZE * N + blockIdx.x * BLOCKSIZE;
    const int j = blockIdx.x * BLOCKSIZE + threadIdx.x;
    const int i = blockIdx.y * BLOCKSIZE + threadIdx.y;
    // avoids memory access error if threads are more than elements
    if (i < M && j < N)
    {
        float fSum = 0.0f; // stores result of (threadIdx.y, threadIdx.x) on each block
        for (int iBlkIdx = 0; iBlkIdx < K; iBlkIdx += BLOCKSIZE)
        {
            if (iBlkIdx + threadIdx.x < K)
            {
                As[threadIdx.y * BLOCKSIZE + threadIdx.x] = A[threadIdx.y * K + threadIdx.x];
            }
            if (iBlkIdx + threadIdx.y < K)
            {
                Bs[threadIdx.y * BLOCKSIZE + threadIdx.x] = B[threadIdx.y * N + threadIdx.x];
            }
            __syncthreads(); // synchronize until all caches are filled
            // advance to the next chunk
            A += BLOCKSIZE;
            B += BLOCKSIZE * N;
            // dot product on caches
            for (int iInnerLoop = 0; iInnerLoop < BLOCKSIZE; ++iInnerLoop)
            {
                if (iBlkIdx + iInnerLoop < K)
                {
                    fSum += As[threadIdx.y * BLOCKSIZE + iInnerLoop] * Bs[iInnerLoop * BLOCKSIZE + threadIdx.x];
                }
            }
            __syncthreads();
        }
        C[threadIdx.y * N + threadIdx.x] = alpha * fSum + beta * C[threadIdx.y * N + threadIdx.x];
    }
}

The kernel above allocates a 32 x 32 shared-memory tile for each of $\mathbf{A}$ and $\mathbf{B}$, accessible to all threads in the same block. Here we assume the block size is exactly (32, 32, 1), so every thread in the block corresponds, via (threadIdx.x, threadIdx.y), to one element of the shared-memory tiles and to one output element of $\mathbf{C}$.
Each thread in the block has three jobs: 1. load the corresponding tile elements of $\mathbf{A}$ and $\mathbf{B}$ into shared memory, then wait until all threads have finished loading; 2. perform the tile matrix multiplication on shared memory and accumulate the partial result; 3. move the tile positions within $\mathbf{A}$ and $\mathbf{B}$ and repeat steps 1 and 2 until the rows of $\mathbf{A}$ and the columns of $\mathbf{B}$ assigned to this block have been fully traversed, and finally write the value to $\mathbf{C}$.
In addition, the case where the matrix sizes are not divisible by the block size must also be handled; the boundary checks control which computations each thread performs. The execution time of sgemm_shared is 147 ms.
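To reproduce the timings above, a simple cudaEvent-based harness can be used. The sketch below is my own addition, not code from the original post: it assumes the kernels shown above are compiled in the same file and that d_A, d_B and d_C are already allocated and initialized on the device, and it returns the elapsed time of one launch in milliseconds. Note that for sgemm_coalesce the grid dimensions are swapped relative to the sgemm_naive launch, since x now walks along columns.

// Minimal timing sketch (not from the original post).
float timeSgemmCoalesce(int M, int N, int K, float alpha, float beta,
                        const float *d_A, const float *d_B, float *d_C)
{
    dim3 blockDim(32, 32, 1);
    dim3 gridDim((N - 1) / 32 + 1, (M - 1) / 32 + 1); // x covers columns of C, y covers rows

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    sgemm_coalesce<<<gridDim, blockDim>>>(M, N, K, alpha, d_A, d_B, beta, d_C);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop); // wait for the kernel to finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

The same harness, with the grid swapped back, times sgemm_naive, and cublasSgemm can be timed the same way for comparison.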
CMake is a popular cross-platform build system that allows developers to use a common interface to define a set of rules to build the source code with different compilers, such as GCC, Clang and Visual Studio. In fact, CMake supports not only C/C++, but also other languages such as C# and Fortran. CMake is an essential tool for building software projects and makes the whole process easier for anyone. I put my code here.

Basic

How to start
After you install CMake, the starting point is to create a CMakeLists.txt file. Three basic commands are required for the minimum build process. Below is a basic CMakeLists.txt template:

cmake_minimum_required(VERSION 3.12)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
add_executable(helloworld main.cpp)

Don't forget to create a source file named main.cpp alongside the CMakeLists.txt file:

#include <iostream>
int main(int argc, char *argv[])
{
    std::cout << "HELLO CMAKE!" << std::endl;
    return 0;
}

To build the project, open a terminal where you created your CMakeLists.txt file, create a new folder with mkdir build && cd build, and just type cmake .. there. CMake outputs a bunch of build system files; the last step is to actually compile and link the project. Depending on the underlying generator, we can build the project with make or ninja, or with the more general command cmake --build . --config Release. The default behavior is to generate an executable object with release configs in the current folder. Type ./helloworld to run the program. CMake uses the term target to refer to the final executable objects or libraries, e.g. helloworld in this demo. LANGUAGES C CXX specifies the programming languages needed to build the project. CMAKE_CXX_STANDARD specifies the C++ standard used to build the project.

How to find headers
So far we have only compiled one source file, but what if we have hundreds of source files and header files scattered across multiple directories? I created a folder structure that looks like this:

includes/
    algo.h
algo.cpp
main.cpp
CMakeLists.txt

To compile this project, we tell CMake that we also need to compile algo.cpp and where to find algo.h:

cmake_minimum_required(VERSION 3.12)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
add_executable(helloworld main.cpp algo.cpp)
target_include_directories(helloworld PRIVATE includes)

target_include_directories specifies where to search for additional headers, the includes directory in this case. PRIVATE means that these header paths are only visible to the target helloworld, not to any target linking against the target defined here (relevant if helloworld were a library). include_directories serves the same purpose but applies to all targets in the current CMakeLists.

How to collect all sources
Instead of listing sources manually, an easy way to get all cpp files is to use the file command:

cmake_minimum_required(VERSION 3.12)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
file(GLOB SOURCES *.cpp utils/*.cpp)
add_executable(helloworld ${SOURCES})
target_include_directories(helloworld PRIVATE includes)

which collects files with the suffix .cpp under the current and utils directories and stores them in the variable SOURCES. Another way to collect all source files is to use the command aux_source_directory like this:

cmake_minimum_required(VERSION 3.12)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
aux_source_directory(. SOURCES)
aux_source_directory(utils SOURCES)
add_executable(helloworld ${SOURCES})
target_include_directories(helloworld PRIVATE includes)

How to add definitions
It's normal to add preprocessor definitions to control the compiling process. Before version 3.12, CMake used the command add_definitions, which affects all targets in the current directory and subdirectories, even if you don't want it to. In version 3.12, CMake introduced a new command add_compile_definitions, which is more specific to targets in the current CMakeLists. It's generally recommended to move towards the new command. The following code defines EXPORT_DLL:

add_definitions(-DEXPORT_DLL)
add_compile_definitions(EXPORT_DLL)

It's also possible to add definitions to a specific target:

target_compile_definitions(helloworld PRIVATE EXPORT_DLL)

How to pass compiler flags
Sometimes we want to control the compiler behavior more precisely. CMake has add_compile_options for all targets in the current directory and subdirectories, and target_compile_options for a specific target. The following code tells gcc/g++ to use the O3 optimization level and to generate debug information to be used by GDB:

add_compile_options(-O3)
target_compile_options(helloworld PRIVATE -g)

How to designate output folders
There are some predefined variables to specify the output directories:
CMAKE_RUNTIME_OUTPUT_DIRECTORY refers to where to put executable files (.exe) created by add_executable, or .dll files created by add_library with the SHARED option on Windows
CMAKE_LIBRARY_OUTPUT_DIRECTORY refers to where to put shared libraries (.so) created by add_library with the SHARED option on Linux
CMAKE_ARCHIVE_OUTPUT_DIRECTORY refers to where to put static libraries (.a, .lib) created by add_library with the STATIC option, or the .lib import libraries created by add_library with the SHARED option on Windows
CMake also has a few useful variables to specify directories:
CMAKE_SOURCE_DIR contains the directory of the top-level CMakeLists
CMAKE_BINARY_DIR contains the directory where the build files are generated
For our helloworld project, the executable helloworld would be put under the lib folder:

cmake_minimum_required(VERSION 3.12)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
file(GLOB SOURCES "*.cpp" "utils/*.cpp")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/lib)
add_executable(helloworld ${SOURCES})
target_include_directories(helloworld PRIVATE includes)

How to compile static libraries
Just use add_library instead of add_executable:

cmake_minimum_required(VERSION 3.12)
project(stalib VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/lib)
aux_source_directory(. SOURCES)
add_library(stalib STATIC ${SOURCES})

This CMakeLists builds a static library libstalib.a.

How to compile shared libraries
I also made a new folder dynlib under my helloworld project. CMake compiles shared libraries with the add_library command and the SHARED option:

cmake_minimum_required(VERSION 3.12)
project(dynlib VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/lib)
aux_source_directory(. SOURCES)
add_library(dynlib SHARED ${SOURCES})
target_compile_definitions(dynlib PRIVATE EXPORT_DLL)
target_compile_options(dynlib PRIVATE -fvisibility=hidden -fvisibility-inlines-hidden)

The above config generates libdynlib.so, which can be linked into a larger program. By default, Linux exports all symbols when compiling a shared library. However, Windows does the opposite.
To control the visibility of symbols on Linux, we can pass -fvisibility=hidden -fvisibility-inlines-hidden to the compiler to make all symbols invisible by default, until we explicitly export them by adding __attribute__((visibility("default"))) before the symbols (Windows uses __declspec(dllexport) and __declspec(dllimport)). This greatly reduces the chance of symbol collisions. For portable code, we can use macro definitions:

#if defined _WIN32 || defined __CYGWIN__
    #define HELPER_DLL_IMPORT __declspec(dllimport)
    #define HELPER_DLL_EXPORT __declspec(dllexport)
    #define HELPER_DLL_LOCAL
#else
    #if __GNUC__ >= 4
        #define HELPER_DLL_IMPORT __attribute__ ((visibility ("default")))
        #define HELPER_DLL_EXPORT __attribute__ ((visibility ("default")))
        #define HELPER_DLL_LOCAL __attribute__ ((visibility ("hidden")))
    #else
        #define HELPER_DLL_IMPORT
        #define HELPER_DLL_EXPORT
        #define HELPER_DLL_LOCAL
    #endif
#endif

#ifdef EXPORT_DLL
    #define DLL_API HELPER_DLL_EXPORT
#else
    #define DLL_API HELPER_DLL_IMPORT
#endif

DLL_API void hello();
void hello(int i);

Use the command nm -C -D libdynlib.so to list all exported symbols:

                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000001199 T hello()
                 U std::ostream::operator<<(std::ostream& (*)(std::ostream&))@GLIBCXX_3.4
                 U std::ostream::operator<<(int)@GLIBCXX_3.4

We can see that void hello(int i) is not exported.

How to link libraries
Use target_link_libraries to link static and shared libraries for a single target:

cmake_minimum_required(VERSION 3.12)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
file(GLOB SOURCES "*.cpp" "utils/*.cpp")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/lib)
add_executable(helloworld ${SOURCES})
target_include_directories(helloworld PRIVATE includes)
target_link_libraries(helloworld PRIVATE stalib dynlib)

or link_libraries, which affects all targets created later in the current directory and subdirectories.

How to add library searching paths
It's also common to add library searching paths. CMake uses link_directories and target_link_directories (available since version 3.13) to add library paths, for targets created later and for a single target, respectively.

cmake_minimum_required(VERSION 3.12)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
set(CMAKE_CXX_STANDARD 11)
file(GLOB SOURCES "*.cpp" "utils/*.cpp")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/lib)
add_executable(helloworld ${SOURCES})
target_include_directories(helloworld PRIVATE includes)
target_link_directories(helloworld PRIVATE ${CMAKE_SOURCE_DIR}/lib)
target_link_libraries(helloworld PRIVATE stalib dynlib)

These library searching paths can be found in the preset variable CMAKE_LIBRARY_PATH.

How to compile multiple targets in a single CMakeLists
add_subdirectory adds a subdirectory with its own CMakeLists so that it gets built from the top-level CMakeLists:

cmake_minimum_required(VERSION 3.13)
project(HelloWorld VERSION 1.0 LANGUAGES C CXX)
add_subdirectory(dynlib)
add_subdirectory(stalib)
set(CMAKE_CXX_STANDARD 11)
file(GLOB SOURCES "*.cpp" "utils/*.cpp")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/lib)
add_executable(helloworld ${SOURCES})
target_include_directories(helloworld PRIVATE includes)
target_link_directories(helloworld PRIVATE ${CMAKE_SOURCE_DIR}/lib)
target_link_libraries(helloworld PRIVATE stalib dynlib)

This CMakeLists first builds the two libraries and then builds the executable helloworld, which links against them.
How to compile a CUDA program
CMake automatically triggers nvcc to compile files with the .cu extension. I will compile a CUDA kernel function into dynlib and call it from the main program. To do that, I created a new dyncuda.cu file:

#include "dynlib.h"
#include <iostream>

#define CHECK_CUDA_ERROR(val) check((val), #val, __FILE__, __LINE__)
void check(cudaError_t err, const char *const func, const char *const file, const int line)
{
    if (err != cudaSuccess)
    {
        std::cerr << "CUDA Runtime Error at: " << file << ":" << line << std::endl;
        std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
        // We don't exit when we encounter CUDA errors in this example.
        // std::exit(EXIT_FAILURE);
    }
}

#define CHECK_LAST_CUDA_ERROR() checkLast(__FILE__, __LINE__)
void checkLast(const char *const file, const int line)
{
    cudaError_t const err{cudaGetLastError()};
    if (err != cudaSuccess)
    {
        std::cerr << "CUDA Runtime Error at: " << file << ":" << line << std::endl;
        std::cerr << cudaGetErrorString(err) << std::endl;
        // We don't exit when we encounter CUDA errors in this example.
        // std::exit(EXIT_FAILURE);
    }
}

__global__ void cudahello()
{
    printf("Hello cuda\n");
}

void cudaHelloLaunch()
{
    cudahello<<<1, 1>>>();
    CHECK_CUDA_ERROR(cudaDeviceSynchronize());
    CHECK_LAST_CUDA_ERROR();
}

and its header looks like:

__global__ void cudahello();
DLL_API void cudaHelloLaunch();

In this way, the CUDA kernel is launched by a C++ wrapper function that is exported to other programs. The CMakeLists compiling the CUDA program looks like this:

cmake_minimum_required(VERSION 3.13)
project(dynlib VERSION 1.0 LANGUAGES C CXX CUDA)
set(CMAKE_CXX_STANDARD 11)
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    set(CMAKE_CUDA_ARCHITECTURES 61;70)
endif()
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/lib)
aux_source_directory(. SOURCES)
add_library(dynlib SHARED ${SOURCES})
target_compile_definitions(dynlib PRIVATE EXPORT_DLL)
target_include_directories(dynlib PUBLIC /usr/local/cuda/include)
target_compile_options(dynlib PRIVATE -fvisibility=hidden -fvisibility-inlines-hidden)
target_link_directories(dynlib PUBLIC /usr/local/cuda/lib64)
target_link_libraries(dynlib PRIVATE cudart)

which specifies the CUDA architectures with CMAKE_CUDA_ARCHITECTURES. CMake will automatically find the CUDA headers and libraries if they are installed in system paths. An older way to enable CUDA compilation is find_package(CUDA).
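For completeness, a minimal caller might look like the sketch below. This is my own illustration rather than code from the original repository; it simply forward-declares the exported wrapper and assumes the executable links against dynlib (e.g. target_link_libraries(helloworld PRIVATE dynlib)):

#include <iostream>

// Declaration of the wrapper exported by dynlib (see dynlib.h).
void cudaHelloLaunch();

int main()
{
    // The CUDA kernel stays inside the shared library; the host program
    // only calls the exported C++ wrapper.
    cudaHelloLaunch();
    std::cout << "kernel launched" << std::endl;
    return 0;
}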
Scientific computing keeps getting more complex and models keep getting bigger; the days of classifying with an SVM are long gone, and the importance of accelerating applications with GPUs is self-evident. In my years working in MRI, I have often run into highly complex MRI applications whose Matlab computation time ranges from a few hours to a few days; even highly optimized C++ programs need around 10 to 30 minutes. These applications already push CPU utilization close to 100%, so further optimizing them with a GPU is the most feasible option.

The GPU market is still dominated by NVIDIA, and AMD will need time to catch up, especially on the ecosystem side: NVIDIA's CUDA-based moat is so deep that most scientific computing runs on NVIDIA GPUs. If you already own an NVIDIA GPU, writing CUDA kernels is probably still the best way to accelerate your applications.

Before we start, here is my basic hardware and driver setup:
CPU: AMD Ryzen5 1600
RAM: 8G x 4
GPU: NVIDIA GTX 1080 x 2 (Driver: 550.67 CUDA: 12.4)
System: Manjaro Linux

Basic Programming Model
CUDA abstracts the GPU hardware in a way similar to how we think about CPUs. Take my CPU as an example: in terms of hardware, the Ryzen5 1600 has 8 cores with 2 threads per core, so it can run 16 threads simultaneously. In terms of software, we can create hundreds of logical threads, at most 16 of which run on the CPU at any moment; the operating system schedules them and guarantees that every thread gets a chance to execute.

CUDA's GPU programming model is similar, with the thread as the smallest execution unit. On the software side, CUDA defines three levels: grid, block and thread. A block contains a fixed number of threads and a grid contains a fixed number of blocks. For example, with 64 threads per block and 10 blocks per grid, the grid actually has 10*64=640 threads. Every thread has a unique index within its block (starting from 0), and every block has a unique index within its grid (also starting from 0), so the global index of every thread within the grid is uniquely determined. Blocks and grids can have up to three dimensions, x, y and z (think of them as cuboids).

This grid/block/thread design mainly matches the structure of scientific computing, whose essence is operating on vectors, matrices and other high-dimensional arrays. For instance, to add two $32 \times 32$ matrices $\mathbf{A}, \mathbf{B}$, we can use one 2-D block with 32 threads in each of the x and y directions, where each thread performs only one simple scalar addition $\mathbf{C}_{i,j} = \mathbf{A}_{i,j}+\mathbf{B}_{i,j}$. Ideally, the GPU executes these $32 \times 32$ threads simultaneously and finishes the matrix addition within one round of computation (compared with the CPU model where each thread performs scalar additions, which needs 32*32/16=64 rounds to cover all elements; so more really is faster, more really is better).
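To make that mapping concrete, here is a minimal sketch of such a kernel (my own illustration, not code from the original post). It assumes the matrices are stored row-major as flat float arrays and that the kernel is launched with a single 32x32 block, e.g. matadd32<<<1, dim3(32, 32)>>>(dA, dB, dC):

__global__ void matadd32(const float *A, const float *B, float *C)
{
    // one thread per element of a 32x32 matrix
    const int i = threadIdx.y; // row index
    const int j = threadIdx.x; // column index
    C[i * 32 + j] = A[i * 32 + j] + B[i * 32 + j];
}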
The hardware abstraction of the GPU is less intuitive than that of the CPU, and a few NVIDIA concepts need clarifying. A streaming multiprocessor (SM) is analogous to a CPU core: SMs are mutually independent components. A cuda core is the compute unit that physically runs one thread, analogous to a CPU hardware thread; one SM contains many cuda cores (just as a CPU core has two threads). The difference from the CPU is that the GPU further partitions the cuda cores within an SM: every 32 cuda cores are treated as a warp, the smallest unit for launching thread computation. Need 16 threads? The GPU launches one warp (32 threads). Need 48 threads? The GPU launches two warps (64 threads). The extra threads execute the same computation, but their results are discarded.

At run time, the GPU schedules one or more blocks onto an SM, and the hardware splits the threads of a block into warps of 32 and executes them. How many blocks an SM can execute concurrently depends on how the program is designed and how many resources it occupies; this is one of the main targets of CUDA optimization, i.e. keeping SM utilization high. For example, the GTX 1080 has 20 SMs in total, each containing 128 cuda cores, so each SM can run at most 4 warps at the same time. If my program needs 40 blocks of 48 threads each, and every thread uses few resources, then in theory the GTX 1080 can run 2 blocks per SM at the same time and all SMs are used effectively, although 32 threads in each SM are doing useless work. By comparison, an RTX 4080 has 76 SMs and can therefore process more blocks at the same time.

The CPU and GPU have separate memory spaces, so data must first be moved from the CPU to the GPU and moved back after processing. Analogous to the familiar DRAM, L3 cache, L2 cache and L1 cache, GPU memory is divided into global memory, local memory, L2 cache, shared memory, L1 cache and registers, and the main difference between levels is of course the order-of-magnitude jump in access speed. Global memory is the slowest but accessible to all threads; memory allocated with cudaMalloc lives in global memory, and the host (CPU) can access it as well. The L2 cache bridges global memory and the SMs and is accessible to all SMs. Shared memory is only accessible to threads in the same block and is roughly 100x or more faster than global memory; the L1 cache behaves similarly to shared memory. Registers are private to each thread and are the fastest. Local memory is a bit special: it is the spill area used when registers are not enough, it is essentially global memory, and each thread can only access its own portion.

NVCC Compilation
CUDA kernels are written in NVIDIA's own C++ extension syntax, which is not part of the C++ standard, so common compilers cannot compile CUDA code and NVIDIA's nvcc compiler must be used. What nvcc does is essentially extend a regular compiler:
separate the CUDA kernels from the regular C++ code
compile the kernels into assembly (PTX) or binary (cubin)
replace the kernel-call sites in the original code with CUDA runtime functions
invoke a regular C++ compiler for the remaining compilation work

The most basic nvcc command looks like this, compiling vecadd.cu into the executable vecadd:

nvcc vecadd.cu -gencode arch=compute_50,code=sm_50 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=\"compute_70,sm_70\" -o vecadd

Here arch controls which virtual architecture the assembly is generated for, while code controls how binaries for the target architectures are generated and optimized from that assembly. Virtual architectures are written as compute_xy and real architectures as sm_xy; we mostly only need to pay attention to the code part. The command above generates a binary targeting sm_50, then a binary targeting sm_61, and finally a binary targeting sm_70 plus PTX for the virtual architecture compute_70. All of these binaries and the PTX are packed into the same file, and at execution time the CUDA runtime picks the closest matching architecture.

All these architecture combinations exist mainly for compatibility. NVIDIA's rules are:
A binary generated with code=sm_xy can only run on devices of compute capability x.y; running binaries across versions is not allowed.
PTX generated with arch=compute_ab can be compiled into binaries of a higher version, code=sm_xy.
PTX generated with arch=compute_xy cannot be compiled into binaries of a lower version, code=sm_ab.
If code=compute_xy, the program can rely on JIT compilation to run on devices of compute capability x.z, where z>=y.
If code=compute_ab, the program can rely on JIT compilation to run on a device of compute capability x.y, where x.y is higher than a.b.
PTX generated with arch=compute_ab cannot be compiled into a higher-version code=compute_xy, because there is no way to attach the higher-version PTX into the final file; likewise, a higher arch=compute_xy cannot be compiled into a lower-version code=compute_ab.

The compute capability of the GTX 1080 only goes up to 6.1 and supports only the basic hardware features. The executable compiled above can therefore run on my GPU (a binary matching my GPU architecture is present). According to the rules, the file compiled with the command below can also run on my GPU via JIT, even though its binary targets sm_70:

nvcc vecadd.cu -gencode arch=compute_61,code=\"compute_61,sm_70\" -o vecadd

Before implementing the simplest demo, let's first look at the basic parameters of the GTX 1080:
SMs: 20
Warps (CUDA Cores)/SM: 4 (128)
Global Memory (GB): 8
L2 Cache (KB): 2048
Shared Memory (KB)/SM: 96
L1 Cache (KB)/SM: 48

Vector Addition Demo
A minimal vector addition demo looks like this:

#include <iostream>
#include <cuda_runtime.h>

#define N 10000

__global__ void addvec(int *a, int *b, int *c)
{
    int iThreadIdx = threadIdx.x + blockIdx.x * blockDim.x;
    int iStride = blockDim.x * gridDim.x;
    for (int i = iThreadIdx; i < N; i += iStride)
    {
        c[i] = a[i] + b[i];
    }
}

void initArray(int *p, int value)
{
    for (int i = 0; i < N; ++i)
    {
        p[i] = value;
    }
}

int main()
{
    int *a, *b, *c;
    a = (int *)malloc(N * sizeof(int));
    b = (int *)malloc(N * sizeof(int));
    c = (int *)malloc(N * sizeof(int));
    initArray(a, 1);
    initArray(b, 2);
    initArray(c, 0);

    int *da, *db, *dc;
    cudaMalloc(&da, N * sizeof(int));
    cudaMalloc(&db, N * sizeof(int));
    cudaMalloc(&dc, N * sizeof(int));
    cudaMemcpy(da, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, N * sizeof(int), cudaMemcpyHostToDevice);

    int iThreads = 64;
    int iBlocks = (N - 1) / iThreads + 1;
    std::cout << "Threads: " << iThreads << " Blocks: " << iBlocks << std::endl;
    addvec<<<iBlocks, iThreads>>>(da, db, dc);
    cudaDeviceSynchronize();

    cudaMemcpy(c, dc, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 10; ++i)
    {
        std::cout << c[i] << " ";
    }
    std::cout << std::endl;

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    free(a);
    free(b);
    free(c);
    return 0;
}

The void function addvec marked with __global__ is the kernel that every thread runs on the GPU. Here threadIdx, blockIdx, blockDim and gridDim are built-in CUDA variables, triples of uints, which provide the thread index, the block index, the number of threads per block and the number of blocks mentioned in the programming model above. The kernel uses this information to let every thread perform the same scalar addition, but on different element positions. In addition, because the number of elements is larger than the number of actually available cuda cores, a loop is used to process the data in batches.

The main function allocates memory on the host (CPU) and on the device (GPU), and copies the CPU contents into GPU memory. addvec<<<iBlocks, iThreads>>>(da, db, dc) calls the kernel and returns immediately (asynchronous execution); cudaDeviceSynchronize waits until all threads have finished. Finally the results are copied back to host memory. Considering the GTX 1080 architecture, I choose 64 threads (2 warps) per block, so that ideally each SM can fit the computation of two blocks.

Profiling with Nsight
NVIDIA provides the Nsight Systems tool to help analyze program performance. Running

nsys profile --stats=true ./vecadd

performs a detailed analysis of our vecadd executable and outputs reports of run times and call counts. Note that nsys takes quite a long time to run, and the terminal shows the message "The target application terminated. One or more process it created re-parented. Waiting for termination of re-parented processes." At first I thought the program had failed, but this is normal; just be patient. When it finishes, it generates .nsys-rep and .sqlite files in the current folder for later inspection.

Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  --------  --------  --------  --------  -----------  ---------------------------
   100.0             2881          1    2881.0    2881.0      2881      2881          0.0  addvec(int *, int *, int *)

In the example above, our addvec kernel shows a total run time of 2.8 us. With a small modification, e.g. iThreads=48, the run time becomes:

Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  --------  --------  --------  --------  -----------  ---------------------------
   100.0             3007          1    3007.0    3007.0      3007      3007          0.0  addvec(int *, int *, int *)

We can see the time is slightly longer, which matches our expectation, since the computation of some threads in each warp is wasted.
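The GTX 1080 parameters listed above can also be queried at run time with cudaGetDeviceProperties. The following standalone sketch is my own addition (not part of the original demo); all fields used are standard members of cudaDeviceProp:

#include <iostream>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0); // query device 0
    std::cout << prop.name << " (compute capability " << prop.major << "." << prop.minor << ")" << std::endl;
    std::cout << "SMs: " << prop.multiProcessorCount << std::endl;
    std::cout << "Warp size: " << prop.warpSize << std::endl;
    std::cout << "Global memory (GB): " << prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0) << std::endl;
    std::cout << "L2 cache (KB): " << prop.l2CacheSize / 1024 << std::endl;
    std::cout << "Shared memory per block (KB): " << prop.sharedMemPerBlock / 1024 << std::endl;
    return 0;
}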
The Bregman method is an iterative method to solve convex optimization problems with equality constraints. The linearized Bregman method approximates subproblems in Bregman iterations with a linearized version. The split Bregman method decouples variables in the original problem to make subproblems in Bregman iterations much easier to solve, combining the advantages of alternating minimization and the Bregman method. In this post, I'll use the denominator layout for gradients.

What is Bregman Divergence
The Bregman divergence (or Bregman distance) of a convex function $F(\mathbf{x})$ at point $\mathbf{y}$ is defined as the difference between the function and its 1st-order Taylor expansion:

\begin{equation}
\begin{split}
D_F^{\mathbf{p}}(\mathbf{x}, \mathbf{y}) = F(\mathbf{x}) - F(\mathbf{y}) - \mathbf{p}^T(\mathbf{x} - \mathbf{y})
\end{split}
\end{equation}

where $\mathbf{p} \in \partial F(\mathbf{y})$ is a subgradient. Here I list some of its properties; more can be found on wiki:
$D_F^{\mathbf{p}}(\mathbf{x}, \mathbf{y}) \ge 0$
$D_F^{\mathbf{p}}(\mathbf{y}, \mathbf{y}) = 0$
$D_F^{\mathbf{p}}(\mathbf{x}, \mathbf{y}) + D_F^{\mathbf{q}}(\mathbf{y}, \mathbf{z}) - D_F^{\mathbf{q}}(\mathbf{x}, \mathbf{z}) = (\mathbf{p}-\mathbf{q})^T(\mathbf{y}-\mathbf{x})$

Bregman Method
Consider the following constrained problem:

\begin{equation}
\begin{split}
\argmin_{\mathbf{x}}\ &F(\mathbf{x}) \\
&s.t.\ G(\mathbf{x}) = 0
\end{split}
\end{equation}

where $F(\mathbf{x})$ and $G(\mathbf{x})$ are convex, $G(\mathbf{x})$ is differentiable, and $\min_{\mathbf{x}} G(\mathbf{x}) = 0$. An intuitive way to solve the above problem is to transform it into an unconstrained form:

\begin{equation}
\begin{split}
\argmin_{\mathbf{x}}\ F(\mathbf{x}) + \lambda G(\mathbf{x})
\end{split}
\end{equation}

where $\lambda \rightarrow \infty$ to enforce that $G(\mathbf{x}) \approx 0$. However, for many problems, a large $\lambda$ would make the problem numerically unstable. It's also difficult to determine how "large" is large enough for $\lambda$. The Bregman method turns Equation (2) into a sequence of unconstrained problems. Rather than choosing a large $\lambda$, the Bregman method approaches the equality constraint with arbitrary precision by solving subproblems iteratively, with a fixed and smaller $\lambda$.
Instead of solving Equation (3), the Bregman method solves the following problem:

\begin{equation}
\begin{split}
\mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ D_F^{\mathbf{p}_k}(\mathbf{x}, \mathbf{x}_k) + \lambda G(\mathbf{x})\\
&= \argmin_{\mathbf{x}}\ F(\mathbf{x}) - \mathbf{p}_k^T\mathbf{x} + \lambda G(\mathbf{x})
\end{split}
\end{equation}

By the subgradient optimality condition, we know that:

\begin{equation}
\begin{split}
0 \in \partial F(\mathbf{x}_{k+1}) - \mathbf{p}_k + \lambda \nabla G(\mathbf{x}_{k+1})
\end{split}
\end{equation}

Let $\mathbf{p}_{k+1} \in \partial F(\mathbf{x}_{k+1})$; the Bregman iteration then looks like:

\begin{equation}
\begin{split}
\mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ F(\mathbf{x}) - \mathbf{p}_k^T\mathbf{x} + \lambda G(\mathbf{x})\\
\mathbf{p}_{k+1} &= \mathbf{p}_k - \lambda \nabla G(\mathbf{x}_{k+1})
\end{split}
\end{equation}

where $\mathbf{x}_0 = \argmin_{\mathbf{x}} G(\mathbf{x})$ and $\mathbf{p}_0 = 0$. It can be proved that as $k \rightarrow \infty$, $G(\mathbf{x}_k) \rightarrow 0$, thus the minimization problem in Equation (6) approximates the original problem in Equation (2). If $G(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x} -\mathbf{b} \|_2^2$, then $\nabla G(\mathbf{x}_{k+1}) = \mathbf{A}^T\left(\mathbf{A}\mathbf{x}_{k+1} - \mathbf{b}\right)$. Equation (6) can be transformed into a simpler form:

\begin{equation}
\begin{split}
\mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ F(\mathbf{x}) + \frac{\lambda}{2} \|\mathbf{A}\mathbf{x} -\mathbf{b}_k \|_2^2\\
\mathbf{b}_{k+1} &= \mathbf{b}_k + \mathbf{b} - \mathbf{A}\mathbf{x}_k
\end{split}
\end{equation}

which simply adds the error in the constraint back to the right-hand side. Equation (7) can be proved by merging $\mathbf{p}_k^T\mathbf{x}$ into $G(\mathbf{x})$ and substituting $\mathbf{p}_k$ with its explicit form $\mathbf{p}_k = \mathbf{p}_0 - \lambda \sum_{i=1}^{k} \nabla G(\mathbf{x}_i)$.

Linearized Bregman Method
The Bregman method doesn't reduce the complexity of solving Equation (3), especially when $F(\mathbf{x})$ is not differentiable (e.g. the $l_1$ norm) and not separable (its elements are coupled with each other). Supposing that $F(\mathbf{x})$ is separable, it would be easier to solve the problem if we could separate the elements in $G(\mathbf{x})$ as well. That is what the linearized Bregman method does.
It linearizes $G(\mathbf{x})$ with its 1st-order Taylor expansion at $\mathbf{x}_k$:

\begin{equation}
\begin{split}
\mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ D_F^{\mathbf{p}_k}(\mathbf{x}, \mathbf{x}_k) + \lambda G(\mathbf{x}_k) + \lambda\nabla G(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + \frac{\lambda\mu}{2} \|\mathbf{x} -\mathbf{x}_k\|_2^2\\
&= \argmin_{\mathbf{x}}\ D_F^{\mathbf{p}_k}(\mathbf{x}, \mathbf{x}_k) + \frac{\lambda\mu}{2} \left\|\mathbf{x} -\left(\mathbf{x}_k - \frac{1}{\mu} \nabla G(\mathbf{x}_k) \right)\right\|_2^2
\end{split}
\end{equation}

where $\frac{\mu}{2} \|\mathbf{x} -\mathbf{x}_k\|_2^2$ is a penalty term, since this approximation only works when $\mathbf{x}$ is near $\mathbf{x}_k$. Note that $\nabla G(\mathbf{x}_k)$ is merged into the $l_2$ norm in the same way as Equation (6) in the post on RPCA. Treating the $l_2$ norm term as a new function, we can derive $\mathbf{p}_{k+1}$ by Equation (6):

\begin{equation}
\begin{split}
\mathbf{p}_{k+1} = \mathbf{p}_{k} -\lambda\mu \left(\mathbf{x} - \mathbf{x}_k + \frac{1}{\mu} \nabla G(\mathbf{x}_k)\right)
\end{split}
\end{equation}

Equation (8) is much easier to solve since all elements of $\mathbf{x}$ are separable.

Split Bregman Method
The split Bregman method is used when $F(\mathbf{x})$ is not obviously separable. The key idea of the split Bregman method is to decouple the $l_1$ and $l_2$ terms by equality constraints. Consider the following optimization problem:

\begin{equation}
\begin{split}
\argmin_{\mathbf{x}}\ \|F(\mathbf{x})\|_1 + \lambda G(\mathbf{x})
\end{split}
\end{equation}

where $F(\mathbf{x})$ and $G(\mathbf{x})$ are both convex and differentiable. Let $\mathbf{d} = F(\mathbf{x})$; then we transform the original problem into a constrained problem:

\begin{equation}
\begin{split}
\argmin_{\mathbf{x}, \mathbf{d}}\ &\|\mathbf{d}\|_1 + \lambda G(\mathbf{x})\\
&s.t.\ F(\mathbf{x}) - \mathbf{d} = 0
\end{split}
\end{equation}

which is a joint optimization over $\mathbf{x}$ and $\mathbf{d}$.
This can be further transformed into an unconstrained problem with a penalty, as the Bregman method did:

\begin{equation}
\begin{split}
\argmin_{\mathbf{x}, \mathbf{d}}\ \|\mathbf{d}\|_1 + \lambda G(\mathbf{x}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}\|_2^2
\end{split}
\end{equation}

Let $H(\mathbf{x}, \mathbf{d}) = \|\mathbf{d}\|_1 + \lambda G(\mathbf{x})$ and $A(\mathbf{x}, \mathbf{d}) = F(\mathbf{x}) - \mathbf{d}$ (both separable in $\mathbf{x}$ and $\mathbf{d}$); by Equation (6), we have:

\begin{equation}
\begin{split}
\mathbf{x}_{k+1},\mathbf{d}_{k+1} &= \argmin_{\mathbf{x}, \mathbf{d}}\ H(\mathbf{x}, \mathbf{d}) - \mathbf{p}_{\mathbf{x},k}^T\mathbf{x} - \mathbf{p}_{\mathbf{d},k}^T\mathbf{d} + \frac{\mu}{2}\|A(\mathbf{x}, \mathbf{d})\|_2^2\\
&= \argmin_{\mathbf{x}, \mathbf{d}}\ H(\mathbf{x}, \mathbf{d}) - \mathbf{p}_{\mathbf{x},k}^T\mathbf{x} - \mathbf{p}_{\mathbf{d},k}^T\mathbf{d} + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}\|_2^2\\
\mathbf{p}_{\mathbf{x},k+1} &= \mathbf{p}_{\mathbf{x},k} - \mu \left(\nabla_{\mathbf{x}} A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\right)^T A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\\
&= \mathbf{p}_{\mathbf{x},k} - \mu \left(\nabla F(\mathbf{x}_{k+1})\right)^T(F(\mathbf{x}_{k+1}) - \mathbf{d}_{k+1})\\
\mathbf{p}_{\mathbf{d},k+1} &= \mathbf{p}_{\mathbf{d},k} - \mu \left(\nabla_{\mathbf{d}} A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\right)^T A(\mathbf{x}_{k+1}, \mathbf{d}_{k+1})\\
&= \mathbf{p}_{\mathbf{d},k} - \mu (\mathbf{d}_{k+1}- F(\mathbf{x}_{k+1}))
\end{split}
\end{equation}

If $F(\mathbf{x})$ is linear, then the above iteration can be simplified as in Equation (7):

\begin{equation}
\begin{split}
\mathbf{x}_{k+1},\mathbf{d}_{k+1} &= \argmin_{\mathbf{x}, \mathbf{d}}\ H(\mathbf{x}, \mathbf{d}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d} - \mathbf{b}_k \|_2^2\\
\mathbf{b}_{k+1} &= \mathbf{b}_k - F(\mathbf{x}_{k+1}) + \mathbf{d}_{k+1}
\end{split}
\end{equation}

Equation (14) is still a joint minimization problem, which can be solved (by alternating minimization or coordinate descent) by minimizing with respect to $\mathbf{x}$ and $\mathbf{d}$ separately:

\begin{equation}
\begin{split}
\mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ \lambda G(\mathbf{x}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\
\mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|F(\mathbf{x}_{k+1}) - \mathbf{d} - \mathbf{b}_k \|_2^2\\
\mathbf{b}_{k+1} &= \mathbf{b}_k - F(\mathbf{x}_{k+1}) + \mathbf{d}_{k+1}
\end{split}
\end{equation}

The first subproblem can be solved by setting the gradient to zero.
The second subproblem has an explicit solution given by the soft-thresholding operator described in this post. Note that these two subproblems are just one iteration of the alternating minimization method; achieving the same convergence rate as Equation (14) requires more iterations (these are called inner loops, while the Bregman iterations are the outer loops). For most applications, however, only one inner iteration is needed.
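For reference (the original post links to the definition elsewhere), the soft-thresholding operator used for the $\mathbf{d}$-subproblem is the element-wise map

\begin{equation*}
S_t(x) = \mathrm{sign}(x)\,\max(|x| - t, 0),
\qquad
\argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|\mathbf{z} - \mathbf{d}\|_2^2 = S_{1/\mu}(\mathbf{z}),
\end{equation*}

applied to each element of $\mathbf{z}$.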
Total variation (TV) denoising, also known as TV regularization or TV filtering, is a powerful technique widely used in various fields, including medical imaging, computer vision, etc. It removes noises while preserving most important structural features. The first image of black hole, captured by Event Horizon Telescope (EHT), was processed and revealed with this technique in 2019. The concept was proposed in 1992 by Rudin, Osher and Fatemi, known as the ROF model which is a continuous form of TV denoising problem. What is Total Variation The term, total variation, refers to a mathematical concept that is a little hard to understand for me. Nonrigorously, the TV of a function u(x)u(x)u(x) is the integral of its derivative magnitude within its bouned domain Ω\OmegaΩ: ∥u∥TV=∫Ω∣∇u(x)∣dx\begin{equation} \begin{split} \|u\|_{TV} = \int_{\Omega} |\nabla u(x)| dx \end{split} \end{equation} ∥u∥TV=∫Ω∣∇u(x)∣dx For practical use, a discretized version is more favorable. There are two common discretized TVs in literatures, the isotropic and anisotropic TVs. Suppose that u(X)u(\mathcal{X})u(X) is a function of an order-NNN tensor grid X∈RI1×I2×⋯IN\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots I_N}X∈RI1×I2×⋯IN, the isotropic TV is defined as: ∥u∥TV=∑i1∑i2⋯∑iN(∇I1u)i1i2⋯iN2+⋯+(∇INu)i1i2⋯iN2\begin{equation} \begin{split} \|u\|_{TV} = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} \sqrt{\left(\nabla_{I_1} u \right)_{i_1 i_2 \cdots i_N}^2 + \cdots + \left(\nabla_{I_N} u \right)_{i_1 i_2 \cdots i_N}^2} \end{split} \end{equation} ∥u∥TV=i1∑i2∑⋯iN∑(∇I1u)i1i2⋯iN2+⋯+(∇INu)i1i2⋯iN2 where ∇Iku\nabla_{I_k} u∇Iku is the derivatives along the IkI_kIk dimension. The isotropic TV is invariant to rotation of the domain that is if you rotate the image arbitrarily, the isotropic TV would not change. The anisotropic TV is defined as: ∥u∥TV=∑i1∑i2⋯∑iN∣(∇I1u)i1i2⋯iN∣+⋯+∣(∇INu)i1i2⋯iN∣\begin{equation} \begin{split} \|u\|_{TV} = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} |\left(\nabla_{I_1} u \right)_{i_1 i_2 \cdots i_N}| + \cdots + |\left(\nabla_{I_N} u \right)_{i_1 i_2 \cdots i_N}| \end{split} \end{equation} ∥u∥TV=i1∑i2∑⋯iN∑∣(∇I1u)i1i2⋯iN∣+⋯+∣(∇INu)i1i2⋯iN∣ which is not rotation invariant. There is no difference between the isotropic and anisotropic TVs for 1D signals. Discrete Derivatives There are many ways to define derivatives for discretized signals. For better understanding, I will use a 1-d signal u(x)u(x)u(x) (and its discrete form ui,i=0,1,2,⋯ ,N−1u_i,i=0,1,2,\cdots,N-1ui,i=0,1,2,⋯,N−1) as an example to illustrate all these ways. Let ∇xu\nabla_{x} u∇xu denote the derivatives of u(x)u(x)u(x) in the x direction, (∇x+u)i=ui+1−ui(\nabla_{x}^+ u)_i = u_{i+1} - u_i(∇x+u)i=ui+1−ui denote the forward difference, and (∇x−u)i=ui−ui−1(\nabla_{x}^- u)_i = u_{i} - u_{i-1}(∇x−u)i=ui−ui−1 denote the backward difference. 
A few definitions of derivatives are: one-sided difference, (∇xu)i2=(∇x+u)i2(\nabla_{x}u)^2_{i} = (\nabla_{x}^+ u)^2_i(∇xu)i2=(∇x+u)i2 central difference, (∇xu)i2=(((∇x+u)i+(∇x−u)i)/2)2(\nabla_{x}u)^2_{i} = (((\nabla_{x}^+ u)_i + (\nabla_{x}^- u)_i)/2)^2(∇xu)i2=(((∇x+u)i+(∇x−u)i)/2)2 geometric average, (∇xu)i2=((∇x+u)i2+(∇x−u)i2)/2(\nabla_{x}u)^2_{i} = ((\nabla_{x}^+ u)^2_i + (\nabla_{x}^- u)^2_i)/2(∇xu)i2=((∇x+u)i2+(∇x−u)i2)/2 minmod difference, (∇xu)i2=m((∇x+u)i,(∇x−u)i)(\nabla_{x}u)^2_{i} = m((\nabla_{x}^+ u)_i, (\nabla_{x}^- u)_i)(∇xu)i2=m((∇x+u)i,(∇x−u)i), where m(a,b)=(sign(a)+sign(b)2)min(∣a∣,∣b∣)m(a,b)=\left(\frac{sign(a) + sign(b)}{2}\right)min(|a|, |b|)m(a,b)=(2sign(a)+sign(b))min(∣a∣,∣b∣) upwind discretization, (∇xu)i2=(max((∇x+u)i,0)2+max((∇x−u)i,0)2)/2(\nabla_{x}u)^2_{i} = (max((\nabla_{x}^+ u)_i, 0)^2+max((\nabla_{x}^- u)_i, 0)^2)/2(∇xu)i2=(max((∇x+u)i,0)2+max((∇x−u)i,0)2)/2 The one-sided difference (or forward difference) may be the most common way to discretize the derivatives but it’s not symmetric. Although the central difference is symmetric, it is not able to reveal thin and small structures (considering a single point with one and others are zero, the central difference at this point would be zero which is counter-intuitive). The geometric average, minmod difference and upwind discretization are both symmectic and being able to reveal thin structures at the cost of nonlinearity. I will use one-sided difference in the following content. Since we are dealing with finite signals, special handling is needed on the boundaries. I have seen two common ways to do that in literatures. One is reflective extension and the other is circulant extension. The reflective extention assumes that the signals extend in a reflective way and the circulant extension assumes that the signals extend in a circulant way. For the reflective extension, the matricized differential operator and its adjoint operator are: D=[−11−11⋱⋱−110]D∗=−[1−11⋱⋱−11−10]=DT\begin{equation} \begin{split} \mathbf{D} &= \begin{bmatrix} -1&1&&&\\ &-1&1&&\\ & &\ddots&\ddots&\\ & & &-1&1\\ &&&&0 \end{bmatrix}\\ \mathbf{D}^{\ast} &= - \begin{bmatrix} 1&&&&\\ -1&1&&&\\ &\ddots&\ddots&&\\ & &-1&1&\\ &&&-1&0 \end{bmatrix} = \mathbf{D}^T\\ \end{split} \end{equation} DD∗=−11−11⋱⋱−110=−1−11⋱⋱−11−10=DT For the circulant extension, the matricized differential operator and its adjoint operator are: D=[−11−11⋱⋱−111−1]D∗=−[1−1−11⋱⋱−11−11]=DT\begin{equation} \begin{split} \mathbf{D} &= \begin{bmatrix} -1&1&&&\\ &-1&1&&\\ & &\ddots&\ddots&\\ & & &-1&1\\ 1&&&&-1 \end{bmatrix}\\ \mathbf{D}^{\ast} &= - \begin{bmatrix} 1&&&&-1\\ -1&1&&&\\ &\ddots&\ddots&&\\ & &-1&1&\\ &&&-1&1 \end{bmatrix} = \mathbf{D}^T\\ \end{split} \end{equation} DD∗=−111−11⋱⋱−11−1=−1−11⋱⋱−11−1−11=DT As we’ve seen, the adjoint operators of differential operators are the negative backward differences. In practice, we don’t need to construct explicit matrices for those operators as long as we know how to perform the corresponding operations. Total Variation Denoising A general TV denoising problem minimizes the following objective function: arg minu∥u∥TV+λ2∥A(u)−v∥2\begin{equation} \begin{split} \argmin_{u} \|u\|_{TV} + \frac{\lambda}{2} \|A(u) - v\|^2 \end{split} \end{equation} uargmin∥u∥TV+2λ∥A(u)−v∥2 where u,vu, vu,v are functions of a given tensor grid X\mathcal{X}X, AAA is a linear operator, and λ\lambdaλ controls how much smoothing is performed. 
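As a concrete illustration of the discretization above, the anisotropic TV of a finite 1D signal with the one-sided (forward) difference reduces to the sum of absolute differences of neighbouring samples. The snippet below is my own sketch, not code from the original post; the last sample gets no forward difference, matching the zero row of the reflective operator D.

#include <cmath>
#include <cstddef>

// Anisotropic TV of a 1D signal u[0..n-1] with forward differences.
double tv1d(const double *u, std::size_t n)
{
    double tv = 0.0;
    for (std::size_t i = 0; i + 1 < n; ++i)
    {
        tv += std::abs(u[i + 1] - u[i]);
    }
    return tv;
}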
1D TV Let’s make Equation (6) more concise for the 1D case: arg minu∈RN∥Du∥1+λ2∥Au−v∥22\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^N} \|\mathbf{D} \mathbf{u} \|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation} u∈RNargmin∥Du∥1+2λ∥Au−v∥22 where A∈RM×N\mathbf{A} \in \mathbb{R}^{M \times N}A∈RM×N and D\mathbf{D}D is the differential operator defined above. Iterative Clipping The iterative clipping method solves the 1D TV denoising problem by solving its dual form. A useful fact is that the absolute value ∣x∣|x|∣x∣ can be written as an optimization problem and the l1l_1l1 norm likewisely$: ∣x∣=max∣z∣≤1zx∥x∥1=max∣z∣≤1zTx\begin{equation} \begin{split} |x| &= \max_{|z| \le 1} zx \\ \|\mathbf{x}\|_1 &= \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{x} \end{split} \end{equation} ∣x∣∥x∥1=∣z∣≤1maxzx=∣z∣≤1maxzTx where ∣z∣≤1|\mathbf{z}| \le 1∣z∣≤1 denotes each element of z\mathbf{z}z is less than or equals to 1. Let F(u)=∥Du∥1+λ2∥Au−v∥22F(\mathbf{u}) = \|\mathbf{D} \mathbf{u} \|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2F(u)=∥Du∥1+2λ∥Au−v∥22, the objective function can be written as: F(u)=max∣z∣≤1zTDu+λ2∥Au−v∥22\begin{equation} \begin{split} F(\mathbf{u}) &= \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2\\ \end{split} \end{equation} F(u)=∣z∣≤1maxzTDu+2λ∥Au−v∥22 thus the Equation (7) can be written as: argminumax∣z∣≤1zTDu+λ2∥Au−v∥22\begin{equation} \begin{split} \arg &\min_{\mathbf{u}} \max_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2}\|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2\\ \end{split} \end{equation} argumin∣z∣≤1maxzTDu+2λ∥Au−v∥22 The minmax theorem is that if the function f(x,y)f(x, y)f(x,y) is concave in x and convex in y, then the following equlity holds: maxxminyf(x,y)=minymaxxf(x,y)\begin{equation} \begin{split} \max_{x} \min_{y} f(x, y) = \min_{y} \max_{x} f(x, y) \end{split} \end{equation} xmaxyminf(x,y)=yminxmaxf(x,y) then we can change the order of minmax in Equation (10): argmax∣z∣≤1minuzTDu+λ2∥Au−v∥22\begin{equation} \begin{split} \arg \max_{|\mathbf{z}| \le 1} \min_{\mathbf{u}} \mathbf{z}^T\mathbf{D} \mathbf{u} + \frac{\lambda}{2}\|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2\\ \end{split} \end{equation} arg∣z∣≤1maxuminzTDu+2λ∥Au−v∥22 which is a dual form of the original problem. The solution of the inner minimization problem is: uk+1=(ATA)†(ATv−1λDTzk)\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\left(\mathbf{A}^T\mathbf{v} - \frac{1}{\lambda}\mathbf{D}^T\mathbf{z}_k\right) \end{split} \end{equation} uk+1=(ATA)†(ATv−λ1DTzk) Substituting Equation (13) back into Equation (12) gives: argmin∣z∣≤1zTD(ATA)†DTz−2λ(DA†v)Tz\begin{equation} \begin{split} \arg \min_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\mathbf{z} - 2\lambda \left(\mathbf{D}\mathbf{A}^{\dag}\mathbf{v}\right)^T\mathbf{z} \end{split} \end{equation} arg∣z∣≤1minzTD(ATA)†DTz−2λ(DA†v)Tz There are many ways to solve this quadratic form, one is by the majorization-minimization (MM) method. Given a function F(x)F(x)F(x), the MM method chooses an auxiliary function Gk(x)G_k(x)Gk(x) such that Gk(x)≥F(x), Gk(xk)=F(xk)G_k(x) \ge F(x),\ G_k(x_k) = F(x_k)Gk(x)≥F(x), Gk(xk)=F(xk), then solves xk+1=arg minxGk(x)x_{k+1}=\argmin_{x} G_k(x)xk+1=argminxGk(x). The sequence xkx_kxk converges to the minimizer of F(x)F(x)F(x) when F(x)F(x)F(x) is convex. 
We construct such a function Gk(z)G_k(\mathbf{z})Gk(z) by adding (z−zk)T(αI−D(ATA)†DT)(z−zk)\left(\mathbf{z} - \mathbf{z}_k\right)^T\left(\alpha\mathbf{I} - \mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)(\mathbf{z} - \mathbf{z}_k)(z−zk)T(αI−D(ATA)†DT)(z−zk) to Equation (14): Gk(z)=zTD(ATA)†DTz−2λ(DA†v)Tz+(z−zk)T(αI−D(ATA)†DT)(z−zk)\begin{equation} \begin{split} G_k(\mathbf{z}) &= \mathbf{z}^T\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\mathbf{z} - 2\lambda \left(\mathbf{D}\mathbf{A}^{\dag}\mathbf{v}\right)^T\mathbf{z}\\ &+\left(\mathbf{z} - \mathbf{z}_k\right)^T\left(\alpha\mathbf{I} - \mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T\right)(\mathbf{z} - \mathbf{z}_k) \end{split} \end{equation} Gk(z)=zTD(ATA)†DTz−2λ(DA†v)Tz+(z−zk)T(αI−D(ATA)†DT)(z−zk) where α≥λ0(D(ATA)†DT)\alpha \ge \lambda_0(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T)α≥λ0(D(ATA)†DT) (the max eigenvalue) such that this term is always positive-semidefinite. Simplifing Equation (15) gives: argmin∣z∣≤1zTz−2(zk+λαDuk+1)Tz\begin{equation} \begin{split} \arg \min_{|\mathbf{z}| \le 1} \mathbf{z}^T\mathbf{z} - 2\left(\mathbf{z}_k + \frac{\lambda}{\alpha} \mathbf{D} \mathbf{u}_{k+1}\right)^T\mathbf{z} \end{split} \end{equation} arg∣z∣≤1minzTz−2(zk+αλDuk+1)Tz which is a separable quadratic optimization problem for each element of z\mathbf{z}z. The solution of the above problem is: zk+1=clip(zk+λαDuk+1,1)\begin{equation} \begin{split} \mathbf{z}_{k+1} &= clip(\mathbf{z}_k + \frac{\lambda}{\alpha} \mathbf{D}\mathbf{u}_{k+1}, 1) \end{split} \end{equation} zk+1=clip(zk+αλDuk+1,1) where clip(x,T)clip(x, T)clip(x,T) is a function clippling the input xxx: clip(x,T)={∣x∣∣x∣≤Tsign(x)Totherwise\begin{equation} \begin{split} clip(x, T) = \begin{cases} |x| & |x| \le T\\ sign(x)T & otherwise\\ \end{cases} \end{split} \end{equation} clip(x,T)={∣x∣sign(x)T∣x∣≤Totherwise We can scale z\mathbf{z}z by 1λ\frac{1}{\lambda}λ1, then the iterative clipping method iterates as the following: uk+1=(ATA)†(ATv−DTzk)zk+1=clip(zk+1αDuk+1,1λ)\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\left(\mathbf{A}^T\mathbf{v} - \mathbf{D}^T\mathbf{z}_k\right)\\ \mathbf{z}_{k+1} &= clip(\mathbf{z}_k + \frac{1}{\alpha} \mathbf{D}\mathbf{u}_{k+1}, \frac{1}{\lambda})\\ \end{split} \end{equation} uk+1zk+1=(ATA)†(ATv−DTzk)=clip(zk+α1Duk+1,λ1) where α≥λ0(D(ATA)†DT)\alpha \ge \lambda_0(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T)α≥λ0(D(ATA)†DT). It turns out that the method can be accelerated with a smaller α\alphaα value by contraction mapping principle (basically, it means that there is a fixed point such that f(x)=xf(x)=xf(x)=x). To make zk+1αDuk+1\mathbf{z}_k + \frac{1}{\alpha} \mathbf{D}\mathbf{u}_{k+1}zk+α1Duk+1 a contraction function, we need to make sure I−1αD(ATA)†DT\mathbf{I} - \frac{1}{\alpha}\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^TI−α1D(ATA)†DT is a contraction function. It suggests that α>λ0(D(ATA)†DT)/2\alpha \gt \lambda_0(\mathbf{D}\left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T)/2α>λ0(D(ATA)†DT)/2, which halves the original value. If A=I\mathbf{A} = \mathbf{I}A=I, then Equation (7) becomes the naive TV denoising problem. It turns out that λ0(DDT)\lambda_0(\mathbf{D}\mathbf{D}^T)λ0(DDT) is less than 4 regardless of NNN for both definitions in Equation (4) and (5), thus α=2.3\alpha = 2.3α=2.3 is an appropriate option for A=I\mathbf{A} = \mathbf{I}A=I. 
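To make Equation (19) concrete for the $\mathbf{A} = \mathbf{I}$ case, here is a small sketch of my own (not code from a published package) that runs the iterative clipping update on a 1D signal, with $\mathbf{D}$ the forward-difference operator of Equation (4) and the α = 2.3 suggested above:

#include <algorithm>
#include <vector>

// Clip x to the interval [-T, T].
static double clip(double x, double T)
{
    return std::max(-T, std::min(T, x));
}

// Iterative clipping for A = I in Equation (19):
//   u_{k+1} = v - D^T z_k
//   z_{k+1} = clip(z_k + (1/alpha) * D u_{k+1}, 1/lambda)
// where D is the forward-difference operator (one dual variable per difference).
std::vector<double> tvDenoiseClip(const std::vector<double> &v, double lambda,
                                  int iterations, double alpha = 2.3)
{
    const std::size_t n = v.size();
    std::vector<double> u(v);
    std::vector<double> z(n > 1 ? n - 1 : 0, 0.0);

    for (int it = 0; it < iterations; ++it)
    {
        // u = v - D^T z, where D^T z is the negated backward difference of z
        for (std::size_t i = 0; i < n; ++i)
        {
            double dtz = 0.0;
            if (i + 1 < n) dtz -= z[i];
            if (i > 0)     dtz += z[i - 1];
            u[i] = v[i] - dtz;
        }
        // z = clip(z + (1/alpha) D u, 1/lambda)
        for (std::size_t i = 0; i + 1 < n; ++i)
        {
            z[i] = clip(z[i] + (u[i + 1] - u[i]) / alpha, 1.0 / lambda);
        }
    }
    return u;
}

A larger λ weights the data term more heavily and therefore smooths less, consistent with Equation (7).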
Majorization Minimization We could also derive an algorithm with the MM method directly. Given F(x)=∣x∣F(x) = |x|F(x)=∣x∣ for scalar xxx, then G(x)=12∣xk∣x2+12∣xk∣G(x) = \frac{1}{2|x_k|}x^2 + \frac{1}{2}|x_k|G(x)=2∣xk∣1x2+21∣xk∣ such that G(x)≥F(x)G(x) \ge F(x)G(x)≥F(x) and G(xk)=F(xk)G(x_k) = F(x_k)G(xk)=F(xk). For the l1l_1l1 norm, G(x)G(\mathbf{x})G(x) is: G(x)=12xTΛk−1x+12∥xk∥1\begin{equation} \begin{split} G(\mathbf{x}) &= \frac{1}{2} \mathbf{x}^T\mathbf{\Lambda}_k^{-1}\mathbf{x} + \frac{1}{2}\|\mathbf{x}_k\|_1 \end{split} \end{equation} G(x)=21xTΛk−1x+21∥xk∥1 where Λk=diag(∣xk∣)\mathbf{\Lambda}_k=diag(|\mathbf{x}_k|)Λk=diag(∣xk∣). A majorizer of the TV cost function in Equation (7) is: arg minu12uTΛk−1u+12∥uk∥1+λ2∥Au−v∥22\begin{equation} \begin{split} \argmin_{\mathbf{u}} \frac{1}{2} \mathbf{u}^T\mathbf{\Lambda}_k^{-1}\mathbf{u} + \frac{1}{2}\|\mathbf{u}_k\|_1 + \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation} uargmin21uTΛk−1u+21∥uk∥1+2λ∥Au−v∥22 Since the l1l_1l1 term above is now a constant, it’s easier to derive an explicit solution: uk+1=(ATA+1λDTΛk−1D)†ATv\begin{equation} \begin{split} \mathbf{u}_{k+1} = \left(\mathbf{A}^T\mathbf{A} + \frac{1}{\lambda} \mathbf{D}^T\mathbf{\Lambda}_k^{-1}\mathbf{D}\right)^{\dag} \mathbf{A}^T \mathbf{v} \end{split} \end{equation} uk+1=(ATA+λ1DTΛk−1D)†ATv A problem with this iteration form is that as the iterations progress, some values of Λk\mathbf{\Lambda}_kΛk will go to zero, causing division-by-zero errors. This issue can be solved with the matrix inverse lemma: uk+1=((ATA)†−(ATA)†DT(λΛk+D(ATA)†DT)†D(ATA)†)ATv=A†v−(ATA)†DT(λΛk+D(ATA)†DT)†DA†v\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \left(\left(\mathbf{A}^T\mathbf{A}\right)^{\dag} - \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \right)^{\dag} \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \right) \mathbf{A}^T \mathbf{v}\\ &= \mathbf{A}^{\dag}\mathbf{v} - \left(\mathbf{A}^T\mathbf{A}\right)^{\dag}\mathbf{D}^T \left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right)^{\dag} \mathbf{D}^T \right)^{\dag} \mathbf{D} \mathbf{A}^{\dag} \mathbf{v} \end{split} \end{equation} uk+1=((ATA)†−(ATA)†DT(λΛk+D(ATA)†DT)†D(ATA)†)ATv=A†v−(ATA)†DT(λΛk+D(ATA)†DT)†DA†v The complexity of the algorithm would depends on how quick to solve linear equations (λΛk+D(ATA)DT)x=DA†v\left(\lambda \mathbf{\Lambda}_k + \mathbf{D} \left(\mathbf{A}^T\mathbf{A}\right) \mathbf{D}^T\right) \mathbf{x} = \mathbf{D} \mathbf{A}^{\dag} \mathbf{v}(λΛk+D(ATA)DT)x=DA†v. If A=I\mathbf{A} = \mathbf{I}A=I, then λΛk+DDT\lambda \mathbf{\Lambda}_k + \mathbf{D}\mathbf{D}^TλΛk+DDT is a banded matrix which can be solved fastly. The matrix inverse lemma is: (A+BCD)−1=A−1−A−1B(C−1+DA−1B)−1DA−1\begin{equation} \begin{split} (\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}(\mathbf{C}^{-1} + \mathbf{D}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{D}\mathbf{A}^{-1} \end{split} \end{equation} (A+BCD)−1=A−1−A−1B(C−1+DA−1B)−1DA−1 Split Bregman I wrote more about the split Bregman method in another post. Here I briefly give the method. To solve the following problem: arg minx ∥F(x)∥1+λG(x)\begin{equation} \begin{split} \argmin_{\mathbf{x}}\ \|F(\mathbf{x})\|_1 + \lambda G(\mathbf{x}) \end{split} \end{equation} xargmin ∥F(x)∥1+λG(x) where F(x)F(\mathbf{x})F(x) and G(x)G(\mathbf{x})G(x) are both convex and differentiable. 
F(x)F(\mathbf{x})F(x) is also linear. By setting d=F(x)\mathbf{d} = F(\mathbf{x})d=F(x),the split Bregman method has the following iterations: xk+1=arg minx λG(x)+μ2∥F(x)−dk−bk∥22dk+1=arg mind ∥d∥1+μ2∥F(xk+1)−d−bk∥22bk+1=bk−F(xk+1)+dk+1\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}}\ \lambda G(\mathbf{x}) + \frac{\mu}{2}\|F(\mathbf{x}) - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|F(\mathbf{x}_{k+1}) - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - F(\mathbf{x}_{k+1}) + \mathbf{d}_{k+1} \end{split} \end{equation} xk+1dk+1bk+1=xargmin λG(x)+2μ∥F(x)−dk−bk∥22=dargmin ∥d∥1+2μ∥F(xk+1)−d−bk∥22=bk−F(xk+1)+dk+1 Now back to our 1D TV denoising problem in Equation (7), let d=Du\mathbf{d}=\mathbf{D}\mathbf{u}d=Du and apply the above split Bregman method, we have: uk+1=arg minu λ2∥Au−v∥22+μ2∥Du−dk−bk∥22dk+1=arg mind ∥d∥1+μ2∥Duk+1−d−bk∥22bk+1=bk−Duk+1+dk+1\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}}\ \frac{\lambda}{2} \|\mathbf{A} \mathbf{u} - \mathbf{v}\|_2^2 + \frac{\mu}{2}\|\mathbf{D}\mathbf{u} - \mathbf{d}_k - \mathbf{b}_k \|_2^2\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}}\ \|\mathbf{d}\|_1 + \frac{\mu}{2}\|\mathbf{D}\mathbf{u}_{k+1} - \mathbf{d} - \mathbf{b}_k \|_2^2\\ \mathbf{b}_{k+1} &= \mathbf{b}_k - \mathbf{D}\mathbf{u}_{k+1} + \mathbf{d}_{k+1} \end{split} \end{equation} uk+1dk+1bk+1=uargmin 2λ∥Au−v∥22+2μ∥Du−dk−bk∥22=dargmin ∥d∥1+2μ∥Duk+1−d−bk∥22=bk−Duk+1+dk+1 The first subproblem can be solved by setting the derivative to zero and solving equations: (λATA+μDTD)uk+1=λATv+μDT(dk+bk)\begin{equation} \begin{split} \left(\lambda \mathbf{A}^T\mathbf{A} + \mu \mathbf{D}^T\mathbf{D}\right)\mathbf{u}_{k+1} = \lambda \mathbf{A}^T\mathbf{v} + \mu \mathbf{D}^T(\mathbf{d}_k + \mathbf{b}_k) \end{split} \end{equation} (λATA+μDTD)uk+1=λATv+μDT(dk+bk) and the second subproblem has an explicit solution: dk+1=S1/μ[Duk+1+bk]\begin{equation} \begin{split} \mathbf{d}_{k+1} = S_{1/\mu}[\mathbf{D}\mathbf{u}_{k+1} + \mathbf{b}_k] \end{split} \end{equation} dk+1=S1/μ[Duk+1+bk] where St(⋅)S_t(\cdot)St(⋅) is the soft-thresholding operator introduced in this post. ADMM Alternating Direction Method of Multipliers (ADMM) is definitely the most widely used algorithm to solve L1-regularized problems, especially in MRI. For a separable problem with equality constraints: arg minx,y F(x)+G(y)s.t. Ax+By=c\begin{equation} \begin{split} \argmin_{\mathbf{x}, \mathbf{y}}&\ F(\mathbf{x}) + G(\mathbf{y})\\ &s.t.\ \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} = \mathbf{c} \end{split} \end{equation} x,yargmin F(x)+G(y)s.t. 
Ax+By=c with its augmented Lagrangian form Lρ(x,y,v)=F(x)+G(y)+vT(Ax+By−c)+ρ2∥Ax+By−c∥22L_{\rho}(\mathbf{x}, \mathbf{y}, \mathbf{v})= F(\mathbf{x}) + G(\mathbf{y}) + \mathbf{v}^T(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} - \mathbf{c}) + \frac{\rho}{2} \|\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{y} - \mathbf{c}\|_2^2Lρ(x,y,v)=F(x)+G(y)+vT(Ax+By−c)+2ρ∥Ax+By−c∥22, the ADMM iteration is: xk+1=arg minxLρ(x,yk,vk)yk+1=arg minyLρ(xk+1,y,vk)vk+1=vk+ρ(Axk+1+Byk+1−c)\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \argmin_{\mathbf{x}} L_{\rho}(\mathbf{x}, \mathbf{y}_k, \mathbf{v}_k)\\ \mathbf{y}_{k+1} &= \argmin_{\mathbf{y}} L_{\rho}(\mathbf{x}_{k+1}, \mathbf{y}, \mathbf{v}_k)\\ \mathbf{v}_{k+1} &= \mathbf{v}_k + \rho (\mathbf{A}\mathbf{x}_{k+1} + \mathbf{B}\mathbf{y}_{k+1} - \mathbf{c}) \end{split} \end{equation} xk+1yk+1vk+1=xargminLρ(x,yk,vk)=yargminLρ(xk+1,y,vk)=vk+ρ(Axk+1+Byk+1−c) Letting d=Du\mathbf{d}=\mathbf{D}\mathbf{u}d=Du, we find that our 1D TV denoising problem falls within the ADMM form, and thus we have the following iteration: uk+1=arg minuLρ(u,dk,yk)dk+1=arg mindLρ(uk+1,d,yk)yk+1=yk+ρ(Duk+1−dk+1)\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} L_{\rho}(\mathbf{u}, \mathbf{d}_k, \mathbf{y}_k)\\ \mathbf{d}_{k+1} &= \argmin_{\mathbf{d}} L_{\rho}(\mathbf{u}_{k+1}, \mathbf{d}, \mathbf{y}_k)\\ \mathbf{y}_{k+1} &= \mathbf{y}_k + \rho (\mathbf{D}\mathbf{u}_{k+1} -\mathbf{d}_{k+1}) \end{split} \end{equation} uk+1dk+1yk+1=uargminLρ(u,dk,yk)=dargminLρ(uk+1,d,yk)=yk+ρ(Duk+1−dk+1) where Lρ(u,dk,yk)L_{\rho}(\mathbf{u}, \mathbf{d}_k, \mathbf{y}_k)Lρ(u,dk,yk) is: Lρ(u,d,y)=∥d∥1+λ2∥Au−v∥22+yT(Du−d)+ρ2∥Du−d∥22\begin{equation} \begin{split} L_{\rho}(\mathbf{u}, \mathbf{d}, \mathbf{y}) &= \|\mathbf{d}\|_1 + \frac{\lambda}{2} \|\mathbf{A}\mathbf{u} - \mathbf{v}\|_2^2 \\&+ \mathbf{y}^T(\mathbf{D}\mathbf{u} - \mathbf{d}) + \frac{\rho}{2} \|\mathbf{D}\mathbf{u} - \mathbf{d} \|_2^2 \end{split} \end{equation} Lρ(u,d,y)=∥d∥1+2λ∥Au−v∥22+yT(Du−d)+2ρ∥Du−d∥22 The first subproblem can be solved by setting the derivative to zero: (λATA+ρDTD)uk+1=λATv+DT(ρdk−yk)\begin{equation} \begin{split} (\lambda\mathbf{A}^T\mathbf{A} + \rho \mathbf{D}^T\mathbf{D})\mathbf{u}_{k+1} &= \lambda\mathbf{A}^T\mathbf{v} + \mathbf{D}^T(\rho \mathbf{d}_k - \mathbf{y}_k) \end{split} \end{equation} (λATA+ρDTD)uk+1=λATv+DT(ρdk−yk) and the solution of the second subproblem is: dk+1=S1/ρ[Duk+1+1ρyk]\begin{equation} \begin{split} \mathbf{d}_{k+1} = S_{1/\rho}[\mathbf{D}\mathbf{u}_{k+1} + \frac{1}{\rho} \mathbf{y}_k] \end{split} \end{equation} dk+1=S1/ρ[Duk+1+ρ1yk] 2D TV As we have seen in the previous section, there are two types of TV for high-dimensional data. Equation (6) for the 2D case with the isotropic TV is: arg minu∈RMN∑i∑j(∇Iu)i,j2+(∇Ju)i,j2+λ2∥Au−v∥22\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} \sum_{i} \sum_{j} \sqrt{\left(\nabla_{I} \mathbf{u} \right)_{i,j}^2 + \left(\nabla_{J} \mathbf{u} \right)_{i,j}^2} + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation} u∈RMNargmini∑j∑(∇Iu)i,j2+(∇Ju)i,j2+2λ∥Au−v∥22 where A(⋅) :RMN⟼RKA(\cdot)\colon \mathbb{R}^{MN} \longmapsto \mathbb{R}^{K}A(⋅):RMN⟼RK is a linear function acting on the vectorized matrix u\mathbf{u}u and v∈RK\mathbf{v} \in \mathbb{R}^{K}v∈RK contains the measured signals.
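As a side note before the anisotropic case: the operators ∇_I, ∇_J and their adjoints never need to be formed as explicit matrices (this matrix-free view is described in the next paragraph). Below is a minimal NumPy sketch of one such implementation for the first dimension, together with an inner-product test verifying the adjoint. The forward difference with a zeroed last row is my own assumption about the boundary handling, and grad_J along the second dimension is analogous:

```python
import numpy as np

# Forward difference along axis 0 (last row set to zero), and its adjoint,
# implemented matrix-free. Assumption: zero boundary handling; the axis-1
# versions grad_J / grad_J_adj are analogous.
def grad_I(U):
    G = np.zeros_like(U)
    G[:-1, :] = U[1:, :] - U[:-1, :]
    return G

def grad_I_adj(P):
    # negated backward difference in the interior, with boundary terms
    A = np.zeros_like(P)
    A[0, :] = -P[0, :]
    A[1:-1, :] = P[:-2, :] - P[1:-1, :]
    A[-1, :] = P[-2, :]
    return A

# adjoint test: <grad_I(U), P> should equal <U, grad_I_adj(P)>
rng = np.random.default_rng(0)
U, P = rng.standard_normal((4, 5)), rng.standard_normal((4, 5))
assert np.isclose(np.sum(grad_I(U) * P), np.sum(U * grad_I_adj(P)))
```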
With the anisotrophic TV, Equation (6) is going to be like: arg minu∈RMN∑i∑j(∣∇Iu∣i,j+∣∇Ju∣i,j)+λ2∥Au−v∥22\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} \sum_{i} \sum_{j} \left(|\nabla_{I} \mathbf{u} |_{i,j} + |\nabla_{J} \mathbf{u} |_{i,j}\right) + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 \end{split} \end{equation} u∈RMNargmini∑j∑(∣∇Iu∣i,j+∣∇Ju∣i,j)+2λ∥Au−v∥22 Note that the differential operator ∇I\nabla_{I}∇I acts on vectorized matrices and returns two-dimensional matrices. It’s easier to derive gradients with vectorized matrices rather than matrices themselves. Keep in mind that We don’t actually construct explit differential matrices for ∇I\nabla_{I}∇I and its adjoint. For example, we could reshape vectors into matrices and perform forward differential operations along the corresponding dimension. For ∇I∗\nabla_{I}^*∇I∗, which acts on matrices and returns vectorized matrices, we could perform backward differential operations along the corresponding dimension, then negative all elements, finally reshape matrices into vectors. Split Bregman To apply the split Bregman method to Equation (36), we need to simplify the TV term by setting DI=∇Iu\mathbf{D}_I = \nabla_{I} \mathbf{u}DI=∇Iu and DJ=∇Ju\mathbf{D}_J = \nabla_{J} \mathbf{u}DJ=∇Ju: arg minu∈RMN ∑i∑j(DI)i,j2+(DJ)i,j2+λ2∥Au−v∥22s.t DI=∇Iu DJ=∇Ju\begin{equation} \begin{split} \argmin_{\mathbf{u} \in \mathbb{R}^{MN}} &\ \sum_{i} \sum_{j} \sqrt{\left(\mathbf{D}_{I} \right)_{i,j}^2 + \left(\mathbf{D}_{J} \right)_{i,j}^2} + \frac{\lambda}{2} \|\mathbf{A}\mathbf{u} - \mathbf{v}\|_2^2\\ s.t &\ \mathbf{D}_I = \nabla_{I} \mathbf{u}\\ &\ \mathbf{D}_J = \nabla_{J} \mathbf{u} \end{split} \end{equation} u∈RMNargmins.t i∑j∑(DI)i,j2+(DJ)i,j2+2λ∥Au−v∥22 DI=∇Iu DJ=∇Ju The iteration is like: uk+1=arg minuλ2∥Au−v∥22+μ2∥[DI,kDJ,k]−[∇Iu∇Ju]−[BI,kBJ,k]∥F2DI,k+1,DJ,k+1=arg minDI,DJ∑i∑j(DI)i,j2+(DJ)i,j2+μ2∥[DIDJ]−[∇Iuk+1∇Juk+1]−[BI,kBJ,k]∥F2[BI,k+1BJ,k+1]=[BI,kBJ,k]+[∇Iuk+1∇Juk+1]−[DI,k+1DJ,k+1]\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 + \frac{\mu}{2} \|\begin{bmatrix}\mathbf{D}_{I,k}\\ \mathbf{D}_{J,k}\end{bmatrix} - \begin{bmatrix}\nabla_I \mathbf{u} \\ \nabla_J \mathbf{u}\end{bmatrix} - \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix}\|_F^2\\ \mathbf{D}_{I, k+1}, \mathbf{D}_{J, k+1} &= \argmin_{\mathbf{D}_{I}, \mathbf{D}_{J}} \sum_{i} \sum_{j} \sqrt{\left(\mathbf{D}_{I} \right)_{i,j}^2 + \left(\mathbf{D}_{J} \right)_{i,j}^2} + \frac{\mu}{2} \|\begin{bmatrix}\mathbf{D}_{I}\\ \mathbf{D}_{J}\end{bmatrix} - \begin{bmatrix}\nabla_I \mathbf{u}_{k+1} \\ \nabla_J \mathbf{u}_{k+1}\end{bmatrix} - \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix}\|_F^2\\ \begin{bmatrix}\mathbf{B}_{I,k+1} \\ \mathbf{B}_{J,k+1} \end{bmatrix} &= \begin{bmatrix}\mathbf{B}_{I,k} \\ \mathbf{B}_{J,k} \end{bmatrix} + \begin{bmatrix}\nabla_I \mathbf{u}_{k+1} \\ \nabla_J \mathbf{u}_{k+1}\end{bmatrix} - \begin{bmatrix}\mathbf{D}_{I,k+1}\\ \mathbf{D}_{J,k+1}\end{bmatrix} \end{split} \end{equation} uk+1DI,k+1,DJ,k+1[BI,k+1BJ,k+1]=uargmin2λ∥Au−v∥22+2μ∥[DI,kDJ,k]−[∇Iu∇Ju]−[BI,kBJ,k]∥F2=DI,DJargmini∑j∑(DI)i,j2+(DJ)i,j2+2μ∥[DIDJ]−[∇Iuk+1∇Juk+1]−[BI,kBJ,k]∥F2=[BI,kBJ,k]+[∇Iuk+1∇Juk+1]−[DI,k+1DJ,k+1] The first subproblem can be solved with setting its derivative to zero: (λA∗A+μ(∇I∗∇I+∇J∗∇J))u=λA∗v+μ(∇I∗ZI,k+∇J∗ZJ,k)ZI,k=DI,k−BI,kZJ,k=DJ,k−BJ,k\begin{equation} \begin{split} \left(\lambda \mathbf{A}^*\mathbf{A} + \mu\left( \nabla_I^*\nabla_I + 
\nabla_J^*\nabla_J\right)\right)\mathbf{u} &= \lambda\mathbf{A}^*\mathbf{v} + \mu \left(\nabla_I^*\mathbf{Z}_{I,k} + \nabla_J^*\mathbf{Z}_{J,k}\right)\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{B}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{B}_{J,k}\\ \end{split} \end{equation} (λA∗A+μ(∇I∗∇I+∇J∗∇J))uZI,kZJ,k=λA∗v+μ(∇I∗ZI,k+∇J∗ZJ,k)=DI,k−BI,k=DJ,k−BJ,k The second subproblem can be solved by letting wi,j=[(DI)i,j(DJ)i,j]\mathbf{w}_{i,j} = \begin{bmatrix}(\mathbf{D}_{I})_{i,j}\\ (\mathbf{D}_{J})_{i,j}\end{bmatrix}wi,j=[(DI)i,j(DJ)i,j], then the subproblem can be transformed into: arg minallwi,js ∑i∑j∥wi,j∥2+μ2∑i∑j∥wi,j−yi,j∥22 yi,j=[(∇Iuk+1−BI,k)i,j(∇Juk+1−BJ,k)i,j]\begin{equation} \begin{split} \argmin_{all \mathbf{w}_{i,j}s} &\ \sum_{i} \sum_{j} \|\mathbf{w}_{i,j}\|_2 + \frac{\mu}{2} \sum_{i} \sum_{j} \|\mathbf{w}_{i,j} - \mathbf{y}_{i,j}\|_2^2\\ &\ \mathbf{y}_{i,j} = \begin{bmatrix} (\nabla_I \mathbf{u}_{k+1}-\mathbf{B}_{I,k})_{i,j} \\ (\nabla_J \mathbf{u}_{k+1}-\mathbf{B}_{J,k})_{i,j}\end{bmatrix} \end{split} \end{equation} allwi,jsargmin i∑j∑∥wi,j∥2+2μi∑j∑∥wi,j−yi,j∥22 yi,j=[(∇Iuk+1−BI,k)i,j(∇Juk+1−BJ,k)i,j] We should know that for the 2nd-norm proximal problem: arg minx ∥x∥2+12t∥x−y∥22\begin{equation} \begin{split} \argmin_{\mathbf{x}} &\ \|\mathbf{x}\|_2 + \frac{1}{2t} \|\mathbf{x} - \mathbf{y}\|_2^2 \end{split} \end{equation} xargmin ∥x∥2+2t1∥x−y∥22 it has an explict solution known as vectorial shrinkage: x∗=St[∥y∥2]y∥y∥2\begin{equation} \begin{split} \mathbf{x}^* = S_t[\|\mathbf{y}\|_2] \frac{\mathbf{y}}{\|\mathbf{y}\|_2} \end{split} \end{equation} x∗=St[∥y∥2]∥y∥2y The equation (41) is obviously separable for all wi,j\mathbf{w}_{i,j}wi,js and each separable subproblem has the solution we list above: wi,j∗=[(DI,k+1)i,j(DJ,k+1)i,j]=S1/μ[∥yi,j∥2]yi,j∥yi,j∥2\begin{equation} \begin{split} \mathbf{w}_{i,j}^* &= \begin{bmatrix}(\mathbf{D}_{I,k+1})_{i,j}\\ (\mathbf{D}_{J,k+1})_{i,j}\end{bmatrix}\\ &= S_{1/\mu}[\|\mathbf{y}_{i,j}\|_2] \frac{\mathbf{y}_{i,j}}{\|\mathbf{y}_{i,j}\|_2} \end{split} \end{equation} wi,j∗=[(DI,k+1)i,j(DJ,k+1)i,j]=S1/μ[∥yi,j∥2]∥yi,j∥2yi,j To apply the split Bregman method to Equation (37) by letting DI=∇Iu\mathbf{D}_I = \nabla_{I} \mathbf{u}DI=∇Iu and DJ=∇Ju\mathbf{D}_J = \nabla_{J} \mathbf{u}DJ=∇Ju, the only difference is the 2nd subproblem: DI,k+1,DJ,k+1=arg minDI,DJ∥vec(DI)∥1+∥vec(DJ)∥1+μ2∥vec(DI−∇Iuk+1−BI,k∥22)+μ2∥vec(DJ−∇Juk+1−BJ,k∥22)\begin{equation} \begin{split} \mathbf{D}_{I, k+1}, \mathbf{D}_{J, k+1} = \argmin_{\mathbf{D}_{I}, \mathbf{D}_{J}} &\|vec(\mathbf{D}_I)\|_1 + \|vec(\mathbf{D}_J)\|_1\\ &+ \frac{\mu}{2} \|vec(\mathbf{D}_I -\nabla_I\mathbf{u}_{k+1} - \mathbf{B}_{I,k}\|_2^2)\\ &+ \frac{\mu}{2} \|vec(\mathbf{D}_J -\nabla_J\mathbf{u}_{k+1} - \mathbf{B}_{J,k}\|_2^2) \end{split} \end{equation} DI,k+1,DJ,k+1=DI,DJargmin∥vec(DI)∥1+∥vec(DJ)∥1+2μ∥vec(DI−∇Iuk+1−BI,k∥22)+2μ∥vec(DJ−∇Juk+1−BJ,k∥22) which can be solved with soft-thresholding: DI,k+1=S1/μ[∇Iuk+1+BI,k]DJ,k+1=S1/μ[∇Juk+1+BJ,k]\begin{equation} \begin{split} \mathbf{D}_{I, k+1} &= S_{1/\mu}[\nabla_I \mathbf{u}_{k+1} + \mathbf{B}_{I, k}]\\ \mathbf{D}_{J, k+1} &= S_{1/\mu}[\nabla_J \mathbf{u}_{k+1} + \mathbf{B}_{J, k}]\\ \end{split} \end{equation} DI,k+1DJ,k+1=S1/μ[∇Iuk+1+BI,k]=S1/μ[∇Juk+1+BJ,k] ADMM To apply ADMM to equation (36) by letting DI=∇Iu\mathbf{D}_I = \nabla_{I} \mathbf{u}DI=∇Iu and DJ=∇Ju\mathbf{D}_J = \nabla_{J} \mathbf{u}DJ=∇Ju, the augmented Lagrangian form is: Lρ(u,DI,DJ,YI,YJ)=∑i∑j(DI)i,j2+(DJ)i,j2+λ2∥Au−v∥22+ρ2∥∇Iu−DI,k+YI,k∥F2+ρ2∥∇Ju−DJ,k+YJ,k∥F2\begin{equation} 
\begin{split} L_{\rho}(\mathbf{u}, \mathbf{D}_I,\mathbf{D}_J, \mathbf{Y}_I, \mathbf{Y}_J) = &\sum_i\sum_j \sqrt{(\mathbf{D}_I)_{i,j}^2 + (\mathbf{D}_J)_{i,j}^2} + \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2\\ &+ \frac{\rho}{2} \|\nabla_I\mathbf{u} - \mathbf{D}_{I,k} + \mathbf{Y}_{I,k}\|_F^2\\ &+ \frac{\rho}{2} \|\nabla_J\mathbf{u} - \mathbf{D}_{J,k} + \mathbf{Y}_{J,k}\|_F^2 \end{split} \end{equation} Lρ(u,DI,DJ,YI,YJ)=i∑j∑(DI)i,j2+(DJ)i,j2+2λ∥Au−v∥22+2ρ∥∇Iu−DI,k+YI,k∥F2+2ρ∥∇Ju−DJ,k+YJ,k∥F2 The ADMM iteration is like: uk+1=arg minuλ2∥Au−v∥22+ρ2∥∇Iu−DI,k+YI,k∥F2+ρ2∥∇Ju−DJ,k+YJ,k∥F2DI,k+1,DJ,k+1=arg minDI,DJ∑i∑j(DI)i,j2+(DJ)i,j2+ρ2∥∇Iuk+1−DI+YI,k∥F2+ρ2∥∇Juk+1−DJ+YJ,k∥F2YI,k+1=YI,k+(∇Iuk+1−DI,k+1)YJ,k+1=YJ,k+(∇Juk+1−DJ,k+1)\begin{equation} \begin{split} \mathbf{u}_{k+1} &= \argmin_{\mathbf{u}} \frac{\lambda}{2} \|A\mathbf{u} - \mathbf{v}\|_2^2 + \frac{\rho}{2} \|\nabla_I\mathbf{u} - \mathbf{D}_{I,k} + \mathbf{Y}_{I,k}\|_F^2 + \frac{\rho}{2} \|\nabla_J\mathbf{u} - \mathbf{D}_{J,k} + \mathbf{Y}_{J,k}\|_F^2\\ \mathbf{D}_{I,k+1}, \mathbf{D}_{J,k+1} &= \argmin_{\mathbf{D}_I, \mathbf{D}_J} \sum_i\sum_j \sqrt{(\mathbf{D}_I)_{i,j}^2 + (\mathbf{D}_J)_{i,j}^2} + \frac{\rho}{2} \|\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I} + \mathbf{Y}_{I,k}\|_F^2 + \frac{\rho}{2} \|\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J} + \mathbf{Y}_{J,k}\|_F^2\\ \mathbf{Y}_{I,k+1} &= \mathbf{Y}_{I,k} + (\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I, k+1})\\ \mathbf{Y}_{J,k+1} &= \mathbf{Y}_{J,k} + (\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J, k+1}) \end{split} \end{equation} uk+1DI,k+1,DJ,k+1YI,k+1YJ,k+1=uargmin2λ∥Au−v∥22+2ρ∥∇Iu−DI,k+YI,k∥F2+2ρ∥∇Ju−DJ,k+YJ,k∥F2=DI,DJargmini∑j∑(DI)i,j2+(DJ)i,j2+2ρ∥∇Iuk+1−DI+YI,k∥F2+2ρ∥∇Juk+1−DJ+YJ,k∥F2=YI,k+(∇Iuk+1−DI,k+1)=YJ,k+(∇Juk+1−DJ,k+1) The solution of the first subproblem is: (λA∗A+ρ(∇I∗∇I+∇J∗∇J))u=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)ZI,k=DI,k−YI,kZJ,k=DJ,k−YJ,k\begin{equation} \begin{split} (\lambda \mathbf{A}^*\mathbf{A} + \rho (\nabla_I^*\nabla_I + \nabla_J^*\nabla_J))\mathbf{u} &= \lambda \mathbf{A}^*\mathbf{v} + \rho (\nabla_I^* \mathbf{Z}_{I,k} + \nabla_J^*\mathbf{Z}_{J,k})\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{Y}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{Y}_{J,k} \end{split} \end{equation} (λA∗A+ρ(∇I∗∇I+∇J∗∇J))uZI,kZJ,k=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)=DI,k−YI,k=DJ,k−YJ,k The second subproblem can be solved as the same process in split Bregman by letting wi,j=[(DI)i,j(DJ)i,j]\mathbf{w}_{i,j} = \begin{bmatrix}(\mathbf{D}_{I})_{i,j}\\ (\mathbf{D}_{J})_{i,j}\end{bmatrix}wi,j=[(DI)i,j(DJ)i,j]: wi,j∗=[(DI,k+1)i,j(DJ,k+1)i,j]=S1/ρ[∥yi,j∥2]yi,j∥yi,j∥2yi,j=[(∇Iuk+1+YI,k)i,j(∇Juk+1+YJ,k)i,j]\begin{equation} \begin{split} \mathbf{w}_{i,j}^* &= \begin{bmatrix}(\mathbf{D}_{I,k+1})_{i,j}\\ (\mathbf{D}_{J,k+1})_{i,j}\end{bmatrix}\\ &= S_{1/\rho}[\|\mathbf{y}_{i,j}\|_2] \frac{\mathbf{y}_{i,j}}{\|\mathbf{y}_{i,j}\|_2}\\ \mathbf{y}_{i,j} &= \begin{bmatrix} (\nabla_I \mathbf{u}_{k+1}+\mathbf{Y}_{I,k})_{i,j} \\ (\nabla_J \mathbf{u}_{k+1}+\mathbf{Y}_{J,k})_{i,j}\end{bmatrix} \end{split} \end{equation} wi,j∗yi,j=[(DI,k+1)i,j(DJ,k+1)i,j]=S1/ρ[∥yi,j∥2]∥yi,j∥2yi,j=[(∇Iuk+1+YI,k)i,j(∇Juk+1+YJ,k)i,j] The ADMM iteration to equation (37) is like: (λA∗A+ρ(∇I∗∇I+∇J∗∇J))u=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)ZI,k=DI,k−YI,kZJ,k=DJ,k−YJ,kDI,k+1=S1/ρ[∇Iuk+1+YI,k]DJ,k+1=S1/ρ[∇Juk+1+YJ,k]YI,k+1=YI,k+(∇Iuk+1−DI,k+1)YJ,k+1=YJ,k+(∇Juk+1−DJ,k+1)\begin{equation} \begin{split} (\lambda \mathbf{A}^*\mathbf{A} + \rho (\nabla_I^*\nabla_I + \nabla_J^*\nabla_J))\mathbf{u} &= \lambda \mathbf{A}^*\mathbf{v} + \rho (\nabla_I^* \mathbf{Z}_{I,k} + 
\nabla_J^*\mathbf{Z}_{J,k})\\ \mathbf{Z}_{I,k} &= \mathbf{D}_{I,k} - \mathbf{Y}_{I,k}\\ \mathbf{Z}_{J,k} &= \mathbf{D}_{J,k} - \mathbf{Y}_{J,k}\\ \mathbf{D}_{I,k+1} &= S_{1/\rho}[\nabla_I\mathbf{u}_{k+1} + \mathbf{Y}_{I,k}]\\ \mathbf{D}_{J,k+1} &= S_{1/\rho}[\nabla_J\mathbf{u}_{k+1} + \mathbf{Y}_{J,k}]\\ \mathbf{Y}_{I,k+1} &= \mathbf{Y}_{I,k} + (\nabla_I\mathbf{u}_{k+1} - \mathbf{D}_{I, k+1})\\ \mathbf{Y}_{J,k+1} &= \mathbf{Y}_{J,k} + (\nabla_J\mathbf{u}_{k+1} - \mathbf{D}_{J, k+1}) \end{split} \end{equation} (λA∗A+ρ(∇I∗∇I+∇J∗∇J))uZI,kZJ,kDI,k+1DJ,k+1YI,k+1YJ,k+1=λA∗v+ρ(∇I∗ZI,k+∇J∗ZJ,k)=DI,k−YI,k=DJ,k−YJ,k=S1/ρ[∇Iuk+1+YI,k]=S1/ρ[∇Juk+1+YJ,k]=YI,k+(∇Iuk+1−DI,k+1)=YJ,k+(∇Juk+1−DJ,k+1)
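To make the anisotropic ADMM iteration above concrete, here is a minimal NumPy/SciPy sketch for the pure denoising case A = I. For clarity it builds explicit sparse difference matrices (even though matrix-free operators are preferable, as noted earlier); the parameter values, iteration count, and function names are my own illustrative choices rather than part of the original derivation:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def soft(x, t):
    """Soft-thresholding operator S_t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def diff_matrix(n):
    """Forward difference with a zeroed last row (assumed boundary handling)."""
    D = sp.diags([-np.ones(n), np.ones(n - 1)], [0, 1]).tolil()
    D[-1, :] = 0
    return D.tocsr()

def tv_denoise_aniso_admm(V, lam=10.0, rho=1.0, n_iter=50):
    """ADMM for anisotropic 2D TV denoising with A = I (sketch of the iteration above)."""
    M, N = V.shape
    v = V.ravel()
    DI = sp.kron(diff_matrix(M), sp.identity(N), format='csr')  # differences along axis 0
    DJ = sp.kron(sp.identity(M), diff_matrix(N), format='csr')  # differences along axis 1
    L = (lam * sp.identity(M * N) + rho * (DI.T @ DI + DJ.T @ DJ)).tocsc()
    u = v.copy()
    dI, dJ = np.zeros(M * N), np.zeros(M * N)
    yI, yJ = np.zeros(M * N), np.zeros(M * N)
    for _ in range(n_iter):
        # u-update: (lam*I + rho*(DI^T DI + DJ^T DJ)) u = lam*v + rho*(DI^T(dI-yI) + DJ^T(dJ-yJ))
        rhs = lam * v + rho * (DI.T @ (dI - yI) + DJ.T @ (dJ - yJ))
        u = spsolve(L, rhs)
        # d-updates: soft-thresholding with threshold 1/rho
        dI = soft(DI @ u + yI, 1.0 / rho)
        dJ = soft(DJ @ u + yJ, 1.0 / rho)
        # dual updates
        yI += DI @ u - dI
        yJ += DJ @ u - dJ
    return u.reshape(M, N)
```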
This post contains some useful tensor notation and tricks which I have seen and collected. The majority of the content is from Tamara Kolda and Brett Bader’s review, Tensor Decompositions and Applications. What is A Tensor The mathematical definition of a tensor is hard for me to understand. I would prefer viewing a tensor as a multi-dimensional array. An order-NNN tensor X∈RI1×I2×⋯IN\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots I_N}X∈RI1×I2×⋯IN is an NNN-dimensional array and IkI_kIk is the size of the kkk-th dimension. So matrices are order-2 tensors and vectors are order-1 tensors. Elements in tensor X\mathcal{X}X are denoted as xi1,i2,⋯ ,iNx_{i_1,i_2,\cdots,i_N}xi1,i2,⋯,iN, where iki_kik is the index running from 1 to IkI_kIk. Norm, Inner Product, and Rank-One The inner product of two same-sized tensors X,Y∈RI1×I2×⋯IN\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots I_N}X,Y∈RI1×I2×⋯IN is the sum of the product of their elements: ⟨X,Y⟩=∑i1∑i2⋯∑iNxi1,i2,⋯ ,iNyi1,i2,⋯ ,iN\begin{equation} \begin{split} \langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} x_{i_1,i_2,\cdots,i_N} y_{i_1,i_2,\cdots,i_N} \end{split} \end{equation} ⟨X,Y⟩=i1∑i2∑⋯iN∑xi1,i2,⋯,iNyi1,i2,⋯,iN By this definition, the norm of a tensor X\mathcal{X}X is the square root of the inner product of X\mathcal{X}X and itself: ∥X∥=⟨X,X⟩\begin{equation} \begin{split} \|\mathcal{X}\| = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle} \end{split} \end{equation} ∥X∥=⟨X,X⟩ An order-NNN tensor X\mathcal{X}X is rank-one if it can be written as the outer product of NNN vectors: X=a1∘⋯∘aN\begin{equation} \begin{split} \mathcal{X} = \mathbf{a}^1 \circ \cdots \circ \mathbf{a}^N \end{split} \end{equation} X=a1∘⋯∘aN where xi1,i2,⋯ ,iN=ai11⋯aiNNx_{i_1,i_2,\cdots,i_N} = a_{i_1}^1 \cdots a_{i_N}^Nxi1,i2,⋯,iN=ai11⋯aiNN. K-mode Product As with the matrix product, the kkk-mode product between a tensor X\mathcal{X}X and a matrix A∈RJ×Ik\mathbf{A} \in \mathbb{R}^{J \times I_k}A∈RJ×Ik is defined as: Y=X×kA\begin{equation} \begin{split} \mathcal{Y} &= \mathcal{X} \times_k \mathbf{A} \end{split} \end{equation} Y=X×kA where the elements of tensor Y∈RI1×⋯×Ik−1×J×Ik+1×⋯×IN\mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{k-1} \times J \times I_{k+1} \times \cdots \times I_N}Y∈RI1×⋯×Ik−1×J×Ik+1×⋯×IN can be computed as: yi1,i2,⋯ ,j,⋯ ,iN=∑ik=1Ikxi1,i2,⋯ ,ik,⋯ ,iNaj,ik\begin{equation} \begin{split} y_{i_1,i_2,\cdots,j,\cdots,i_N} &= \sum_{i_k=1}^{I_k} x_{i_1,i_2,\cdots,i_k,\cdots,i_N} a_{j, i_k} \end{split} \end{equation} yi1,i2,⋯,j,⋯,iN=ik=1∑Ikxi1,i2,⋯,ik,⋯,iNaj,ik Continuous kkk-mode products with matrices A1∈RJ1×I1,⋯ ,AN∈RJN×IN\mathbf{A}^1 \in \mathbb{R}^{J_1 \times I_1},\cdots,\mathbf{A}^N \in \mathbb{R}^{J_N \times I_N}A1∈RJ1×I1,⋯,AN∈RJN×IN are denoted as: Y=X×1A1×2A2⋯×NAN\begin{equation} \begin{split} \mathcal{Y} &= \mathcal{X} \times_1 \mathbf{A}^1 \times_2 \mathbf{A}^2 \cdots \times_N \mathbf{A}^N \end{split} \end{equation} Y=X×1A1×2A2⋯×NAN Intuitively, if no two matrices act on the same mode, then the order of the products is interchangeable. 
If two matrices share the same mode, then X×nA×nB=X×n(BA)\begin{equation} \begin{split} \mathcal{X} \times_n \mathbf{A} \times_n \mathbf{B} = \mathcal{X} \times_n \left(\mathbf{B} \mathbf{A}\right) \end{split} \end{equation} X×nA×nB=X×n(BA) Mode-K Unfolding The mode-kkk unfolding of a tensor is a matricization process, that is: X(k)=[⋯v⋯]\begin{equation} \begin{split} \mathcal{X}_{(k)} = \begin{bmatrix}\cdots & \mathbf{v} & \cdots \end{bmatrix} \end{split} \end{equation} X(k)=[⋯v⋯] where X(k)∈RIk×I1⋯Ik−1Ik+1⋯IN\mathcal{X}_{(k)} \in \mathbb{R}^{I_k \times I_1 \cdots I_{k-1}I_{k+1} \cdots I_N}X(k)∈RIk×I1⋯Ik−1Ik+1⋯IN. Each column v\mathbf{v}v is called a fiber, which is just elements along the kkk-th dimension given other indices. For simplicity, let’s assume that this reshape operation follows the row-major layout or C-like order, with the index of the last axis changing the fastest. And the kkk-rank of X\mathcal{X}X is defined as the rank of the mode-k unfolding of X\mathcal{X}X. With the mode-kkk unfolding, the kkk-mode product can be expressed as a normal matrix product: Y(k)=AX(k)\begin{equation} \begin{split} \mathcal{Y}_{(k)} &= \mathbf{A} \mathcal{X}_{(k)} \end{split} \end{equation} Y(k)=AX(k) or for continuous kkk-mode products: Y(k)=AkX(k)(A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN)T\begin{equation} \begin{split} \mathcal{Y}_{(k)} &= \mathbf{A}^k \mathcal{X}_{(k)} \left(\mathbf{A}^1 \otimes \cdots \otimes \mathbf{A}^{k-1} \otimes \mathbf{A}^{k+1} \otimes \cdots \otimes \mathbf{A}^{N} \right)^T \end{split} \end{equation} Y(k)=AkX(k)(A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN)T where ⊗\otimes⊗ is the Kronecker product and A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN∈RJ1J2⋯JN×I1I2⋯IN\mathbf{A}^1 \otimes \cdots \otimes \mathbf{A}^{k-1} \otimes \mathbf{A}^{k+1} \otimes \cdots \otimes \mathbf{A}^{N} \in \mathbb{R}^{J_1J_2\cdots J_N \times I_1I_2\cdots I_N }A1⊗⋯⊗Ak−1⊗Ak+1⊗⋯⊗AN∈RJ1J2⋯JN×I1I2⋯IN is a super big matrix (you would find that Kolda & Bader matricized the tensor with the column-major layout and the order of Kronecker products was reversed). The Kronecker product of matrices A∈RI×J\mathbf{A} \in \mathbb{R}^{I \times J}A∈RI×J and B∈RK×L\mathbf{B} \in \mathbb{R}^{K \times L}B∈RK×L is defined by: A⊗B=[a1,1B⋯a1,JB⋮⋱⋮aI,1B⋯aI,JB]=[a1⊗b1⋯a1⊗bLa2⊗b1⋯aJ⊗bL]\begin{equation} \begin{split} \mathbf{A} \otimes \mathbf{B} &= \begin{bmatrix} a_{1,1} \mathbf{B} & \cdots & a_{1, J} \mathbf{B}\\ \vdots & \ddots & \vdots\\ a_{I,1} \mathbf{B}& \cdots & a_{I,J} \mathbf{B} \end{bmatrix}\\ &= \begin{bmatrix} \mathbf{a}_1 \otimes \mathbf{b}_1 & \cdots & \mathbf{a}_1\otimes \mathbf{b}_L & \mathbf{a}_2\otimes \mathbf{b}_1 & \cdots & \mathbf{a}_J \otimes \mathbf{b}_L \end{bmatrix} \end{split} \end{equation} A⊗B=a1,1B⋮aI,1B⋯⋱⋯a1,JB⋮aI,JB=[a1⊗b1⋯a1⊗bLa2⊗b1⋯aJ⊗bL] Tensor Decompositions Two most common tensor decompositions are CP and Tucker decompositions, considered to be higher-order generalizations of the matrix SVD and PCA, separately. CP Decomposition CP decomposition (cnanonical decomposition/parallel factors, CANDECOMP/PARAFAC) is to express a tensor as the sum of a finite number of rank-one tensors. For example, given a 3-order tensor X∈RI×J×K\mathcal{X} \in \mathbb{R}^{I \times J \times K}X∈RI×J×K, CP decomposes it as: X≈∑i=1Rai∘bi∘ci\begin{equation} \begin{split} \mathcal{X} &\approx \sum_{i=1}^{R} \mathbf{a}_i \circ \mathbf{b}_i \circ \mathbf{c}_i \end{split} \end{equation} X≈i=1∑Rai∘bi∘ci where the combination of the column vectors are called factor matrices, i.e., A=[a1⋯aR]\mathbf{A} = \begin{bmatrix}\mathbf{a}_1 & \cdots & \mathbf{a}_R \end{bmatrix}A=[a1⋯aR]. 
Let’s assume that each vector is normalized to value one with a weight vector λ∈RR\mathbf{\lambda} \in \mathbb{R}^Rλ∈RR so that the decomposition is: X≈∑i=1Rλiai∘bi∘ci\begin{equation} \begin{split} \mathcal{X} &\approx \sum_{i=1}^{R} \lambda_i \mathbf{a}_i \circ \mathbf{b}_i \circ \mathbf{c}_i \end{split} \end{equation} X≈i=1∑Rλiai∘bi∘ci For a general NNN-order tensor X∈RI1×I2×⋯×IN\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}X∈RI1×I2×⋯×IN, with the weight vector λ\mathbf{\lambda}λ and factor matrices Ak∈RIk×R\mathbf{A}^k \in \mathbb{R}^{I_k \times R}Ak∈RIk×R, the CP decomposition is: X≈∑i=1Rλiai1∘ai2∘⋯∘aiN\begin{equation} \begin{split} \mathcal{X} &\approx \sum_{i=1}^{R} \lambda_i \mathbf{a}_i^1 \circ \mathbf{a}_i^2 \circ \cdots \circ \mathbf{a}_i^N \end{split} \end{equation} X≈i=1∑Rλiai1∘ai2∘⋯∘aiN and the mode-kkk matricized version is: X(k)≈AkΛ(A1⊙⋯Ak−1⊙Ak+1⊙⋯AN)T\begin{equation} \begin{split} \mathcal{X}_{(k)} &\approx \mathbf{A}^k \mathbf{\Lambda} \left( \mathbf{A}^1 \odot \cdots \mathbf{A}^{k-1} \odot \mathbf{A}^{k+1} \odot \cdots \mathbf{A}^{N} \right)^T \end{split} \end{equation} X(k)≈AkΛ(A1⊙⋯Ak−1⊙Ak+1⊙⋯AN)T where Λ∈RR×R\mathbf{\Lambda} \in \mathbb{R}^{R \times R}Λ∈RR×R is a diagonal matrix of the weight vector λ\mathbf{\lambda}λ and ⊙\odot⊙ is the Khatri-Rao product. The Khatri-Rao product of two matrices A∈RI×K\mathbf{A} \in \mathbb{R}^{I \times K}A∈RI×K and B∈RJ×K\mathbf{B} \in \mathbb{R}^{J \times K}B∈RJ×K is defined by: A⊙B=[a1⊗b1a2⊗b2⋯aK⊗bK]\begin{equation} \begin{split} \mathbf{A} \odot \mathbf{B} &= \begin{bmatrix} \mathbf{a}_1 \otimes \mathbf{b}_1 & \mathbf{a}_2 \otimes \mathbf{b}_2 & \cdots & \mathbf{a}_K \otimes \mathbf{b}_K \end{bmatrix} \end{split} \end{equation} A⊙B=[a1⊗b1a2⊗b2⋯aK⊗bK] where the resulting matrix is in RIJ×K\mathbb{R}^{IJ \times K}RIJ×K. The rank of a tensor is defined as the smallest number RRR to achieve an exact CP decomposition in Equation (14). The tensor rank is an analogue to the matrix rank, but it has many different properites: the tensor rank may be different over R\mathbb{R}R and C\mathbf{C}C; there is no straightforward algorithm to determine the rank of a given tensor yet; the rank decomposition is generally unique with the exception of permuation and scaling operations. It’s well-known that the best rank-kkk approximation of a rank-RRR matrix A\mathbf{A}A is the leading kkk components of its SVD: A≈∑i=1kλiui∘vi, with λ1≥λ2≥⋯≥λk\begin{equation} \begin{split} \mathbf{A} &\approx \sum_{i=1}^{k} \lambda_i \mathbf{u}_i \circ \mathbf{v}_i,\ with\ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \end{split} \end{equation} A≈i=1∑kλiui∘vi, with λ1≥λ2≥⋯≥λk but this does not hold true for high-order tensors. In fact, the best rank-kkk approximation of a tensor may not exist. Since there is no easy way to determine the rank of a tensor, most CP decomposition procedures would fit multiple models until finding one that is good enough to approximate the given tensor. 
The Alternating least squares (ALS) method is one of them to compute a CP decomposition by solving one factor matrix each time with others fixed: Ak=X(k)(A1⊙⋯Ak−1⊙Ak+1⋯⊙AN)((A1)TA1∗⋯∗(AN)TAN)†\begin{equation} \begin{split} \mathbf{A}^k = &\mathcal{X}_{(k)} \left(\mathbf{A}^1 \odot \cdots \mathbf{A}^{k-1} \odot \mathbf{A}^{k+1} \cdots \odot \mathbf{A}^N \right)\\ &\left( (\mathbf{A}^1)^T\mathbf{A}^1 \ast \cdots \ast (\mathbf{A}^N)^T\mathbf{A}^N\right)^\dag \end{split} \end{equation} Ak=X(k)(A1⊙⋯Ak−1⊙Ak+1⋯⊙AN)((A1)TA1∗⋯∗(AN)TAN)† where ∗\ast∗ is the elementwise product (Hadamard product) and †\dag† is the pseudo-inverse. λ\mathbf{\lambda}λ can be computed by normalizing the columns of Ak\mathbf{A}^kAk. The ALS method iterates until the stopping criteria satisfied. However, the ALS method is not guaranteed to converge to a global minimum or even a stationary point. Tucker Decomposition The Tucker decomposition decomposes a tensor into a core tensor multiplied by a factor matrix along each mode: X=G×1A1×2A2⋯×NAN\begin{equation} \begin{split} \mathcal{X} &= \mathcal{G} \times_1 \mathbf{A}^1 \times_2 \mathbf{A}^2 \cdots \times_N \mathbf{A}^N \end{split} \end{equation} X=G×1A1×2A2⋯×NAN where G∈RR1×⋯×RN\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N}G∈RR1×⋯×RN, Ak∈RIk×Rk\mathbf{A}^k \in \mathbb{R}^{I_k \times R_k}Ak∈RIk×Rk and Ak\mathbf{A}^kAk is orthogonal. It can be seen that the CP decomposition is a special case of the Tucker decomposition where the core tensor is superdiagonal and R1=R2=⋯=RNR_1 = R_2 = \cdots = R_NR1=R2=⋯=RN. An important concept of Tucker Decomposition is the nnn-rank of X\mathcal{X}X, denoted rankn(X)rank_n(\mathcal{X})rankn(X), which is the rank of X(n)\mathcal{X}_{(n)}X(n). The Tucker decomposition requires a rank group (R1,R2,⋯ ,RN)(R_1,R_2,\cdots, R_N)(R1,R2,⋯,RN), which is also the size of the core tensor G\mathcal{G}G. Higher-order orthogonal iteration (HOOI) solves Ak\mathbf{A}^kAk with others fixed, as the ALS method: Y=X×1(A1)T⋯×k−1(Ak−1)T×k+1(Ak+1)T⋯×N(AN)TY(k)=UΣVTAk=[u1u2⋯uRn]\begin{equation} \begin{split} \mathcal{Y} &= \mathcal{X} \times_1 (\mathbf{A}^1)^T \cdots \times_{k-1}\\ &(\mathbf{A}^{k-1})^T \times_{k+1} (\mathbf{A}^{k+1})^T \cdots \times_{N} (\mathbf{A}^{N})^T\\ \mathcal{Y}_{(k)} &= \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T\\ \mathbf{A}^k &= \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_{R_n}\end{bmatrix} \end{split} \end{equation} YY(k)Ak=X×1(A1)T⋯×k−1(Ak−1)T×k+1(Ak+1)T⋯×N(AN)T=UΣVT=[u1u2⋯uRn] and the core tensor G=X×1(A1)T×⋯×(AN)T\mathcal{G} = \mathcal{X} \times_1 (\mathbf{A}^{1})^T \times \cdots \times (\mathbf{A}^{N})^TG=X×1(A1)T×⋯×(AN)T. Note that the Tucker decomposition is not unique.
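To make the unfolding convention and the HOOI procedure concrete, here is a minimal NumPy sketch. The helper names (unfold, fold, mode_k_product, hooi) and the HOSVD-style initialization of the factor matrices are illustrative assumptions, not the only way to do it:

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding with the C-order (row-major) fiber layout assumed in this post."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def fold(M, k, shape):
    """Inverse of unfold: rebuild a tensor with the given shape."""
    full = [shape[k]] + [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape(full), 0, k)

def mode_k_product(X, A, k):
    """Compute X x_k A via the matricized identity Y_(k) = A X_(k)."""
    shape = list(X.shape)
    shape[k] = A.shape[0]
    return fold(A @ unfold(X, k), k, tuple(shape))

def hooi(X, ranks, n_iter=20):
    """Tucker decomposition via higher-order orthogonal iteration (HOOI), a sketch."""
    N = X.ndim
    # initialize factors with leading left singular vectors of each unfolding
    A = [np.linalg.svd(unfold(X, k), full_matrices=False)[0][:, :ranks[k]] for k in range(N)]
    for _ in range(n_iter):
        for k in range(N):
            Y = X
            for j in range(N):
                if j != k:
                    Y = mode_k_product(Y, A[j].T, j)   # project every mode except k
            U, _, _ = np.linalg.svd(unfold(Y, k), full_matrices=False)
            A[k] = U[:, :ranks[k]]
    G = X
    for k in range(N):
        G = mode_k_product(G, A[k].T, k)               # core tensor
    return G, A

# sanity check of the matricized k-mode product against a direct contraction
X = np.random.randn(3, 4, 5)
B = np.random.randn(6, 4)
assert np.allclose(mode_k_product(X, B, 1), np.einsum('jb,abc->ajc', B, X))
```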
This weekend I learned the Preconditioned Conjugate Gradient method with Jonathan Richard Shewchuk’s lecture note “An Introduction to the Conjugate Gradient Method Without the Agonizing Path”. Here I document what I have learned from the note. TL;DR In short, all methods mentioned in this post are proposed to iteratively solve linear equations Ax=b\mathbf{A} \mathbf{x} = \mathbf{b}Ax=b, assuming A\mathbf{A}A is symmetric and positive-definite. The Steepest Gradient Method (SG), which moves along the direction of the residual or negative gradient at each step, converges slowly and requires more matrix-vector multiply operations. SG can be improved by moving along the conjugate direction instead of the residual direction at each step, which converges faster and reduces matrix-vector multiply operations effectively. However, it requires constructing conjugate directions in advance, which sometimes costs as much as SG itself. The Conjugate Gradient Method (CG) is an efficient algorithm that computes conjugate directions on the fly by constructing them from residuals, making it a practical method for solving equations. Finally, the Preconditioned Conjugate Gradient Method (PCG) further improves the convergence rate by reducing the condition number of A\mathbf{A}A. Steepest Gradient Method Consider the following minimization problem: arg minx∈Rn 12xTAx−bTx+c\begin{equation} \begin{split} \argmin_{\mathbf{x} \in \mathbb{R}^{n}} &\ \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} - \mathbf{b}^T\mathbf{x} + c \end{split} \end{equation} x∈Rnargmin 21xTAx−bTx+c where A∈Rn×n\mathbf{A} \in \mathbb{R}^{n \times n}A∈Rn×n is symmetric and positive-definite. Since this is a convex problem, the global minimizer exists and can be derived by setting the gradient f′(x)=xTA−bT∈R1×nf'(\mathbf{x}) = \mathbf{x}^T\mathbf{A} - \mathbf{b}^T \in \mathbb{R}^{1 \times n}f′(x)=xTA−bT∈R1×n to zero. Solving the minimization problem above is equivalent to solving the linear equations Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}Ax=b. Recall that the residual rk\mathbf{r}_krk and the error ek\mathbf{e}_kek are defined as follows: rk=b−Axkek=xk−x⋆\begin{equation} \begin{split} \mathbf{r}_k &= \mathbf{b} - \mathbf{A} \mathbf{x}_k \\ \mathbf{e}_k &= \mathbf{x}_k - \mathbf{x}^{\star}\\ \end{split} \end{equation} rkek=b−Axk=xk−x⋆ where x⋆\mathbf{x}^{\star}x⋆ is the solution of Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}Ax=b. The residual and the error are related by: Aek=A(xk−x⋆)=Axk−b+b−Ax⋆=−rk\begin{equation} \begin{split} \mathbf{A} \mathbf{e}_k &= \mathbf{A} \left( \mathbf{x}_k - \mathbf{x}^{\star} \right)\\ &= \mathbf{A} \mathbf{x}_k - \mathbf{b} + \mathbf{b} - \mathbf{A} \mathbf{x}^{\star}\\ &= -\mathbf{r}_{k} \end{split} \end{equation} Aek=A(xk−x⋆)=Axk−b+b−Ax⋆=−rk With these notations, the Steepest Gradient Method updates each point as follows: xk+1=xk+αkrk\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{r}_k \\ \end{split} \end{equation} xk+1=xk+αkrk where αk\alpha_kαk is the step size at the k-th iteration and rk\mathbf{r}_krk is equal to the negative gradient. SG uses line search to determine how big a step to take. 
By the chain rule ddαkf(xk+1)=f′(xk+1)ddαkxk+1\frac{d}{d\alpha_k} f(\mathbf{x}_{k+1})= f'(\mathbf{x}_{k+1}) \frac{d}{d\alpha_k}\mathbf{x}_{k+1}dαkdf(xk+1)=f′(xk+1)dαkdxk+1 and setting the gradient to zero, we have the following αk\alpha_kαk: αk=rkTrkrkTArk\begin{equation} \begin{split} \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{r}_k^T \mathbf{A} \mathbf{r}_k} \\ \end{split} \end{equation} αk=rkTArkrkTrk The SG is: r0=b−Ax0αk=rkTrkrkTArkxk+1=xk+αkrkrk+1=b−Axk+1\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{r}_k^T \mathbf{A} \mathbf{r}_k} \\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{r}_k \\ \mathbf{r}_{k+1} &= \mathbf{b} - \mathbf{A} \mathbf{x}_{k+1} \end{split} \end{equation} r0αkxk+1rk+1=b−Ax0=rkTArkrkTrk=xk+αkrk=b−Axk+1 or eliminates one matrix-vector multiply (Axk+1\mathbf{A} \mathbf{x}_{k+1}Axk+1) by iterating residuals as follows: r0=b−Ax0αk=rkTrkrkTArkxk+1=xk+αkrkrk+1=rk−αkArk\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{r}_k^T \mathbf{A} \mathbf{r}_k} \\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{r}_k \\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{r}_k \\ \end{split} \end{equation} r0αkxk+1rk+1=b−Ax0=rkTArkrkTrk=xk+αkrk=rk−αkArk Conjugate Directions SG often finds itself taking similar directions. It would be better if we took a step and got it right the first time. An analogous example is that a person stands at the top left corner of a 5 by 5 grid, aiming to reach to the bottom right corner by moving either right or down a few grids in each step. We could move one grid at a time following a zig-zag path, or we could move to the bottom first and then move to the end with just two steps. In short, we’ll take exactly one step for each direction (totally n steps). Given n directions {dk}\{\mathbf{d}_k\}{dk}, the update formula is as follows: xk+1=xk+αkdk\begin{equation} \begin{split} \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k \\ \end{split} \end{equation} xk+1=xk+αkdk which is exactly the update formula of GD with replacing rk\mathbf{r}_krk by dk\mathbf{d}_kdk. These directions {dk}\{\mathbf{d}_k\}{dk} should be orthogonal to each other, and ek+1\mathbf{e}_{k+1}ek+1 should be orthogonal to dk\mathbf{d}_{k}dk, otherwise you would always find a direction aligned with previous directions in next updates. Notice that by subtracting x⋆\mathbf{x}^{\star}x⋆ on both sides, Equation (8) becomes: xk+1−x⋆=xk−x⋆+αkdkek+1=ek+αkdk\begin{equation} \begin{split} \mathbf{x}_{k+1} - \mathbf{x}^{\star} &= \mathbf{x}_k - \mathbf{x}^{\star} + \alpha_k \mathbf{d}_k \\ \mathbf{e}_{k+1} &= \mathbf{e}_{k} + \alpha_k \mathbf{d}_k \end{split} \end{equation} xk+1−x⋆ek+1=xk−x⋆+αkdk=ek+αkdk With orthogonal conditions and Equation (9), we have: dkTek+1=0dkT(ek+αkdk)=0αk=−dkTekdkTdk\begin{equation} \begin{split} \mathbf{d}_{k}^T \mathbf{e}_{k+1} &= 0\\ \mathbf{d}_{k}^T \left(\mathbf{e}_{k} + \alpha_k \mathbf{d}_k\right) &= 0\\ \alpha_k &= -\frac{\mathbf{d}_{k}^T \mathbf{e}_{k}}{\mathbf{d}_{k}^T\mathbf{d}_{k}} \end{split} \end{equation} dkTek+1dkT(ek+αkdk)αk=0=0=−dkTdkdkTek We haven’t known ek\mathbf{e}_kek yet, thus Equation (10) can’t be used to calculate αk\alpha_kαk. The solution is to use A-orthogonal instead of orthogonal. 
Two vectors di\mathbf{d}_idi and dj\mathbf{d}_jdj are A-orthogonal or conjugate, if: diTAdj=0\begin{equation} \begin{split} \mathbf{d}_{i}^T \mathbf{A} \mathbf{d}_{j} &= 0\\ \end{split} \end{equation} diTAdj=0 Once again, without steping into previous directions, the new requirement is that ek+1\mathbf{e}_{k+1}ek+1 be A-orthogonal to dk\mathbf{d}_kdk. This equation is equivalent to finding the minimum point along the search direction dk\mathbf{d}_{k}dk with the line search method: ddαkf(xk+1)=0f′(xk+1)ddαkxk+1=0−rk+1Tdk=0ek+1TAdk=0\begin{equation} \begin{split} \frac{d}{d\alpha_k} f(\mathbf{x}_{k+1}) &= 0\\ f'(\mathbf{x}_{k+1}) \frac{d}{d\alpha_k}\mathbf{x}_{k+1} &= 0\\ -\mathbf{r}_{k+1}^T \mathbf{d}_k &= 0\\ \mathbf{e}_{k+1}^T \mathbf{A} \mathbf{d}_k &= 0 \end{split} \end{equation} dαkdf(xk+1)f′(xk+1)dαkdxk+1−rk+1Tdkek+1TAdk=0=0=0=0 By Equation (9), αk\alpha_kαk is computed as: ek+1TAdk=0(ek+αkdk)TAdk=0αk=−ekTAdkdkTAdk=rkTdkdkTAdk\begin{equation} \begin{split} \mathbf{e}_{k+1}^T \mathbf{A} \mathbf{d}_k &= 0\\ \left( \mathbf{e}_k + \alpha_k \mathbf{d}_k \right)^T \mathbf{A} \mathbf{d}_k &= 0\\ \alpha_k &= - \frac{\mathbf{e}_k^T \mathbf{A} \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ &= \frac{\mathbf{r}_k^T \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k} \end{split} \end{equation} ek+1TAdk(ek+αkdk)TAdkαk=0=0=−dkTAdkekTAdk=dkTAdkrkTdk Similar to the SG method, the iterative formulas are as follows: r0=b−Ax0αk=rkTdkdkTAdkxk+1=xk+αkdkrk+1=rk−αkAdk\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k\\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{d}_k\\ \end{split} \end{equation} r0αkxk+1rk+1=b−Ax0=dkTAdkrkTdk=xk+αkdk=rk−αkAdk We can prove that the error item e0\mathbf{e}_0e0 exactly converges to zero after n steps with these formulas. By Equation (9), we can have that ek=e0+Σi=0k−1αidi\mathbf{e}_k = \mathbf{e}_0 + \Sigma_{i=0}^{k-1} \alpha_i \mathbf{d}_iek=e0+Σi=0k−1αidi. Suppose e0=Σi=0nδidi\mathbf{e}_0 = \Sigma_{i=0}^{n} \delta_i \mathbf{d}_ie0=Σi=0nδidi ({dk}\{\mathbf{d}_k\}{dk} span the whole linear space), all the δi\delta_iδi values can be found by multiplying the expression by dkTA\mathbf{d}_k^T\mathbf{A}dkTA: dkTAe0=Σi=0nδidkTAdi=δkdkTAdkδk=dkTAe0dkTAdk=dkTA(e0+Σi=0k−1αidi)dkTAdk=dkTAekdkTAdk=−dkTrkdkTAdk=−αk\begin{equation} \begin{split} \mathbf{d}_k^T \mathbf{A} \mathbf{e}_0 &= \Sigma_{i=0}^{n} \delta_i \mathbf{d}_k^T\mathbf{A}\mathbf{d}_i\\ &= \delta_k \mathbf{d}_k^T\mathbf{A}\mathbf{d}_k\\ \delta_k &= \frac{\mathbf{d}_k^T \mathbf{A} \mathbf{e}_0}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= \frac{\mathbf{d}_k^T \mathbf{A} \left(\mathbf{e}_0 + \Sigma_{i=0}^{k-1}\alpha_i \mathbf{d}_i \right)}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= \frac{\mathbf{d}_k^T \mathbf{A} \mathbf{e}_k}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= - \frac{\mathbf{d}_k^T \mathbf{r}_k}{\mathbf{d}_k^T\mathbf{A}\mathbf{d}_k}\\ &= - \alpha_k \end{split} \end{equation} dkTAe0δk=Σi=0nδidkTAdi=δkdkTAdk=dkTAdkdkTAe0=dkTAdkdkTA(e0+Σi=0k−1αidi)=dkTAdkdkTAek=−dkTAdkdkTrk=−αk where δk\delta_kδk is exactly equal to the −αk-\alpha_k−αk value. This fact demonstrates that we can eliminate one component of e0\mathbf{e}_0e0 each step. 
Consequently, the remaining error item ek\mathbf{e}_kek is: ek=e0+Σi=0k−1αidi=Σi=knδidi\begin{equation} \begin{split} \mathbf{e}_k &= \mathbf{e}_0 + \Sigma_{i=0}^{k-1} \alpha_i \mathbf{d}_i\\ &= \Sigma_{i=k}^{n} \delta_i \mathbf{d}_i \end{split} \end{equation} ek=e0+Σi=0k−1αidi=Σi=knδidi Conjugate Gradient Method The remaining question is that how to compute a set of A-orthogonal directions {dk}\{\mathbf{d}_k\}{dk}. Here we employ the Gram-Schmidt process. Suppose we have a set of n linearly independent vectors {uk}\{ \mathbf{u}_k \}{uk} and let d0=u0\mathbf{d}_0 = \mathbf{u}_0d0=u0, dk\mathbf{d}_kdk is constructed as: dk=uk+Σi=0k−1βk,idi\begin{equation} \begin{split} \mathbf{d}_k = \mathbf{u}_k + \Sigma_{i=0}^{k-1} \beta_{k,i} \mathbf{d}_{i} \end{split} \end{equation} dk=uk+Σi=0k−1βk,idi where βk,i\beta_{k,i}βk,i is computed as: diTAdk=diTA(uk+Σj=0k−1βk,jdj)0=diTAuk+βk,idiTAdiβk,i=−diTAukdiTAdi\begin{equation} \begin{split} \mathbf{d}_i^T\mathbf{A}\mathbf{d}_k &= \mathbf{d}_i^T\mathbf{A}\left(\mathbf{u}_k + \Sigma_{j=0}^{k-1} \beta_{k,j} \mathbf{d}_{j}\right)\\ 0 &= \mathbf{d}_i^T\mathbf{A}\mathbf{u}_k + \beta_{k,i} \mathbf{d}_i^T\mathbf{A}\mathbf{d}_i\\ \beta_{k,i} &= - \frac{\mathbf{d}_i^T\mathbf{A}\mathbf{u}_k}{\mathbf{d}_i^T\mathbf{A}\mathbf{d}_i} \end{split} \end{equation} diTAdk0βk,i=diTA(uk+Σj=0k−1βk,jdj)=diTAuk+βk,idiTAdi=−diTAdidiTAuk The drawback of the contruction process is that it keeps all older directions in memory to create a new direction. CG employs a more efficient way to constructing these directions by initializing the vectors with residuals uk=rk\mathbf{u}_k = \mathbf{r}_kuk=rk. With these starting points, we will see that most computations can be eliminated. By Equation (17) and (18), dk\mathbf{d}_kdk is constructed as: dk=rk+Σi=0k−1βk,idiβk,i=−diTArkdiTAdi\begin{equation} \begin{split} \mathbf{d}_k &= \mathbf{r}_k + \Sigma_{i=0}^{k-1} \beta_{k,i} \mathbf{d}_{i}\\ \beta_{k,i} &= - \frac{\mathbf{d}_i^T\mathbf{A}\mathbf{r}_k}{\mathbf{d}_i^T\mathbf{A}\mathbf{d}_i} \end{split} \end{equation} dkβk,i=rk+Σi=0k−1βk,idi=−diTAdidiTArk Let’s consider the numerator of βk,i\beta_{k,i}βk,i with Equation (14): diTArk=1αi(ri−ri+1)Trk=1αi(riTrk−ri+1Trk),i=0,1,...,k−1\begin{equation} \begin{split} \mathbf{d}_i^T\mathbf{A}\mathbf{r}_k &= \frac{1}{\alpha_i}(\mathbf{r}_i - \mathbf{r}_{i+1})^T \mathbf{r}_k\\ &= \frac{1}{\alpha_i}(\mathbf{r}_i^T\mathbf{r}_k - \mathbf{r}_{i+1}^T\mathbf{r}_k), i=0,1,...,k-1 \end{split} \end{equation} diTArk=αi1(ri−ri+1)Trk=αi1(riTrk−ri+1Trk),i=0,1,...,k−1 Luckily, we shall see that rk\mathbf{r}_krk is orthogonal to previous residuals. Refering to Equation (16), the error item ek=Σi=knδidi\mathbf{e}_k = \Sigma_{i=k}^{n} \delta_i \mathbf{d}_iek=Σi=knδidi, we have the following equation by multiplying −djTA-\mathbf{d}_j^T \mathbf{A}−djTA on both sides: −djTAek=−Σi=knδidjTAdi,j<kdjTrk=0,j<k\begin{equation} \begin{split} -\mathbf{d}_j^T \mathbf{A} \mathbf{e}_k &= -\Sigma_{i=k}^{n} \delta_i \mathbf{d}_j^T \mathbf{A} \mathbf{d}_i, j<k\\ \mathbf{d}_j^T \mathbf{r}_k &= 0, j<k \end{split} \end{equation} −djTAekdjTrk=−Σi=knδidjTAdi,j<k=0,j<k Thus the residuals are orthogonal to previous conjugate directions if we choose the residuals as starting vectors to construct these directions. 
According to Equation (19), we further conclude that the residuals are orthogonal to previous residuals: djTrk=0,j<k(rj+Σi=0j−1βj,idi)Trk=0rjTrk=0,j<k\begin{equation} \begin{split} \mathbf{d}_j^T \mathbf{r}_k &= 0, j<k\\ \left(\mathbf{r}_j + \Sigma_{i=0}^{j-1} \beta_{j,i} \mathbf{d}_{i}\right)^T\mathbf{r}_k &= 0\\ \mathbf{r}_j^T \mathbf{r}_k &= 0,j<k \end{split} \end{equation} djTrk(rj+Σi=0j−1βj,idi)TrkrjTrk=0,j<k=0=0,j<k By Equation (21) and (22), βk,i\beta_{k,i}βk,i is simplified as: βk,i={1αk−1rkTrkdk−1TAdk−1i=k−10otherwise\begin{equation} \begin{split} \beta_{k,i} = \begin{cases} \frac{1}{\alpha_{k-1}} \frac{\mathbf{r}_k^T\mathbf{r}_k}{\mathbf{d}_{k-1}^T\mathbf{A}\mathbf{d}_{k-1}} & i = k-1\\ 0 & otherwise\\ \end{cases} \end{split} \end{equation} βk,i={αk−11dk−1TAdk−1rkTrk0i=k−1otherwise where we only need to keep one previous direction instead of all directions. The method of CG is: d0=r0=b−Ax0αk=rkTdkdkTAdk=rkTrkdkTAdkxk+1=xk+αkdkrk+1=rk−αkAdkβk+1=1αkrk+1Trk+1dkTAdk=dkTAdkrkTrkrk+1Trk+1dkTAdk=rk+1Trk+1rkTrkdk+1=rk+1+βk+1dk\begin{equation} \begin{split} \mathbf{d}_0 = \mathbf{r}_0 &= \mathbf{b} - \mathbf{A}\mathbf{x}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k} = \frac{\mathbf{r}_k^T \mathbf{r}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k\\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{d}_k\\ \beta_{k+1} &= \frac{1}{\alpha_{k}} \frac{\mathbf{r}_{k+1}^T\mathbf{r}_{k+1}}{\mathbf{d}_{k}^T\mathbf{A}\mathbf{d}_{k}}\\ &= \frac{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}{\mathbf{r}_k^T \mathbf{r}_k}\frac{\mathbf{r}_{k+1}^T\mathbf{r}_{k+1}}{\mathbf{d}_{k}^T\mathbf{A}\mathbf{d}_{k}}\\ &= \frac{\mathbf{r}_{k+1}^T\mathbf{r}_{k+1}}{\mathbf{r}_k^T \mathbf{r}_k}\\ \mathbf{d}_{k+1} &= \mathbf{r}_{k+1} + \beta_{k+1} \mathbf{d}_{k}\\ \end{split} \end{equation} d0=r0αkxk+1rk+1βk+1dk+1=b−Ax0=dkTAdkrkTdk=dkTAdkrkTrk=xk+αkdk=rk−αkAdk=αk1dkTAdkrk+1Trk+1=rkTrkdkTAdkdkTAdkrk+1Trk+1=rkTrkrk+1Trk+1=rk+1+βk+1dk Preconditioned Conjugate Gradient Method The analysis of complexity of CG shows that a small condition number κ(A)\kappa(\mathbf{A})κ(A) would improve the convergence rate. Preconditioning is one such technique for reducing the condition number of a matrix. Let’s consider a symmetric, positive-definite matrix M\mathbf{M}M that approximates A\mathbf{A}A but is easier to invert. Instead of solving Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}Ax=b, we solve: M−1Ax=M−1b\begin{equation} \begin{split} \mathbf{M}^{-1}\mathbf{A}\mathbf{x} = \mathbf{M}^{-1}\mathbf{b} \end{split} \end{equation} M−1Ax=M−1b by ensuring that κ(M−1A)≪κ(A)\kappa(\mathbf{M}^{-1}\mathbf{A}) \ll \kappa(\mathbf{A})κ(M−1A)≪κ(A). The problem is that M−1A\mathbf{M}^{-1}\mathbf{A}M−1A is generally neither symmetric nor positive-definite, even thought M\mathbf{M}M and A\mathbf{A}A are. The solution is to decompose M\mathbf{M}M with Cholesky decompostion. The Cholesky decompostion is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, denoted as M=LLT\mathbf{M}=\mathbf{L}\mathbf{L}^TM=LLT. 
We then solve the following equation: L−1A(L−1)TLTx=L−1bL−1A(L−1)Tx^=b^\begin{equation} \begin{split} \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \mathbf{L}^T \mathbf{x} &= \mathbf{L}^{-1}\mathbf{b}\\ \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \hat{\mathbf{x}} &= \hat{\mathbf{b}}\\ \end{split} \end{equation} L−1A(L−1)TLTxL−1A(L−1)Tx^=L−1b=b^ where x^=LTx\hat{\mathbf{x}} = \mathbf{L}^T \mathbf{x}x^=LTx and b^=L−1b\hat{\mathbf{b}} = \mathbf{L}^{-1}\mathbf{b}b^=L−1b. Notice that L−1A(L−1)T\mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^TL−1A(L−1)T is symmetric and positive-definite, allowing us to apply CG directly. Now the method of PCG is: d^0=r^0=b−L−1A(L−1)Tx^0αk=r^kTr^kd^kTL−1A(L−1)Td^kx^k+1=x^k+αkd^kr^k+1=r^k−αkL−1A(L−1)Td^kβk+1=r^k+1Tr^k+1r^kTr^kd^k+1=r^k+1+βk+1d^k\begin{equation} \begin{split} \hat{\mathbf{d}}_0 = \hat{\mathbf{r}}_0 &= \mathbf{b} - \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T\hat{\mathbf{x}}_0\\ \alpha_k &= \frac{\hat{\mathbf{r}}_k^T \hat{\mathbf{r}}_k}{\hat{\mathbf{d}}_k^T \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \hat{\mathbf{d}}_k}\\ \hat{\mathbf{x}}_{k+1} &= \hat{\mathbf{x}}_k + \alpha_k \hat{\mathbf{d}}_k\\ \hat{\mathbf{r}}_{k+1} &= \hat{\mathbf{r}}_k - \alpha_k \mathbf{L}^{-1}\mathbf{A} \left(\mathbf{L}^{-1}\right)^T \hat{\mathbf{d}}_k\\ \beta_{k+1} &= \frac{\hat{\mathbf{r}}_{k+1}^T\hat{\mathbf{r}}_{k+1}}{\hat{\mathbf{r}}_k^T \hat{\mathbf{r}}_k}\\ \hat{\mathbf{d}}_{k+1} &= \hat{\mathbf{r}}_{k+1} + \beta_{k+1} \hat{\mathbf{d}}_{k}\\ \end{split} \end{equation} d^0=r^0αkx^k+1r^k+1βk+1d^k+1=b−L−1A(L−1)Tx^0=d^kTL−1A(L−1)Td^kr^kTr^k=x^k+αkd^k=r^k−αkL−1A(L−1)Td^k=r^kTr^kr^k+1Tr^k+1=r^k+1+βk+1d^k With some substitutions, we can further eliminate L−1\mathbf{L}^{-1}L−1: r0=b−Ax0d0=M−1r0αk=rkTM−1rkdkTAdkxk+1=xk+αkdkrk+1=rk−αkAdkβk+1=rk+1TM−1rk+1rkTM−1rkdk+1=M−1rk+1+βk+1dk\begin{equation} \begin{split} \mathbf{r}_0 &= \mathbf{b} - \mathbf{A} \mathbf{x}_0\\ \mathbf{d}_0 &= \mathbf{M}^{-1} \mathbf{r}_0\\ \alpha_k &= \frac{\mathbf{r}_k^T \mathbf{M}^{-1} \mathbf{r}_k}{\mathbf{d}_k^T \mathbf{A} \mathbf{d}_k}\\ \mathbf{x}_{k+1} &= \mathbf{x}_k + \alpha_k \mathbf{d}_k\\ \mathbf{r}_{k+1} &= \mathbf{r}_k - \alpha_k \mathbf{A} \mathbf{d}_k\\ \beta_{k+1} &= \frac{\mathbf{r}_{k+1}^T \mathbf{M}^{-1} \mathbf{r}_{k+1}}{\mathbf{r}_k^T \mathbf{M}^{-1} \mathbf{r}_k}\\ \mathbf{d}_{k+1} &= \mathbf{M}^{-1} \mathbf{r}_{k+1} + \beta_{k+1} \mathbf{d}_{k}\\ \end{split} \end{equation} r0d0αkxk+1rk+1βk+1dk+1=b−Ax0=M−1r0=dkTAdkrkTM−1rk=xk+αkdk=rk−αkAdk=rkTM−1rkrk+1TM−1rk+1=M−1rk+1+βk+1dk M\mathbf{M}M is called a preconditioner, and the effectiveness of PCG depends on finding a preconditioner that approximates A\mathbf{A}A well enough without making M−1\mathbf{M}^{-1}M−1 too costly to compute. In fact, it's unnecessary to know the explicit form of M−1\mathbf{M}^{-1}M−1. Instead, we only need to compute M−1rk\mathbf{M}^{-1}\mathbf{r}_kM−1rk directly, which can be viewed as an operator that takes rk\mathbf{r}_krk as input and achieves the same effect as the matrix-vector product. The implementation of PCG benefits from this operator concept, which extends it to more general cases. Implementation Considerations Starting point If you have an estimate of x\mathbf{x}x, use it as the starting point x0\mathbf{x}_0x0; otherwise use the zero vector. Stopping criterion Theoretically, CG/PCG finds the solution when the residual becomes zero after n steps. Therefore, the maximum number of iterations would never exceed n; otherwise a division-by-zero error would occur. 
We may choose to stop the algorithm immediately when the residual falls below a specified threshold. A common criterion is ∥rk∥2<ϵ∥r0∥2\|\mathbf{r}_k\|_2 \lt \epsilon \|\mathbf{r}_0\|_2∥rk∥2<ϵ∥r0∥2. The accumulated roundoff error in the residual iteration formula may yield a false zero residual. To address this issue, once a residual falls below the stopping value, we recalculate the residual using the definition (Equation (2)) and double-check if it still falls below the stopping threshold. If not, we restart the iteration. Matlab's implementation does a little more than that. It checks whether the current direction is small enough in each iteration with the criterion ∥αkdk∥2<ϵ∥xk∥2\|\alpha_k \mathbf{d}_k\|_2 \lt \epsilon \|\mathbf{x}_k\|_2∥αkdk∥2<ϵ∥xk∥2. If this happens 3 times, it concludes that the current solution is stuck and terminates the iteration with warning messages. Choice of Preconditioner M=A\mathbf{M} = \mathbf{A}M=A is an ideal but not practical preconditioner. Applying M−1\mathbf{M}^{-1}M−1 is equivalent to solving Mx=b\mathbf{M}\mathbf{x}=\mathbf{b}Mx=b. One simple preconditioner is a diagonal matrix whose diagonal elements are identical to those of A\mathbf{A}A. Another preconditioner is the incomplete Cholesky factorization. Incomplete Cholesky factorization is a sparse approximation of the Cholesky decomposition obtained by setting elements to zero if the corresponding elements in A\mathbf{A}A are also zero. This preserves the sparsity of A\mathbf{A}A, making it suitable for problems with sparse matrices. Yet I haven't found such a function in the numpy and scipy packages. I'll look into it when I have time, referring to wiki and Julia's code.
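As a summary of the whole post, here is a minimal NumPy sketch of PCG written in the operator form discussed above, where the preconditioner is passed as a callable computing M⁻¹r. With the identity operator it reduces to plain CG; the Jacobi (diagonal) preconditioner and the stopping rule in the example are only illustrative choices:

```python
import numpy as np

def pcg(A, b, M_inv=lambda r: r, x0=None, eps=1e-8, max_iter=None):
    """Preconditioned CG for a symmetric positive-definite A.
    M_inv is the preconditioner applied as an operator r -> M^{-1} r;
    the default (identity) recovers plain CG."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    max_iter = n if max_iter is None else max_iter
    r = b - A @ x
    z = M_inv(r)
    d = z.copy()
    rz = r @ z
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps * r0_norm:   # ||r_k|| < eps * ||r_0||
            break
        Ad = A @ d
        alpha = rz / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz
        d = z + beta * d
        rz = rz_new
    return x

# example: Jacobi (diagonal) preconditioner on a random SPD system
rng = np.random.default_rng(0)
Q = rng.standard_normal((100, 100))
A = Q @ Q.T + 100.0 * np.eye(100)      # symmetric positive-definite
b = rng.standard_normal(100)
x = pcg(A, b, M_inv=lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))
```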
Last week I was attempting to implement an in-place fftshift function in C++. I hoped this function could perform shifting along any given dimension of N-dimensional data, similar to those found in Matlab and Numpy. Interestingly, both Matlab and Numpy call another function within it, known as the circshift function in Matlab and the roll function in Numpy. These functions circularly shift the elements in an array by K positions. Below are some illustrations about what circshift does. In fact, circshift is an array rotation problem that was thoroughly studied decades ago. Many problems in computer science eventually boil down to the array rotation problem (That's a rotate!). The array rotation problem is: given an array with two parts [a, b], how do we swap them to get [b, a]? Allocate and Copy As you already know, the optimal time complexity for rotating an array is O(n), where n represents the number of elements in the array. Implementing an array rotation algorithm with a buffer is intuitive:

```cpp
#include <cstring>  // memcpy, memmove

// left: the size of left part
// right: the size of right part
void rotate_with_buffer(int *pSrcDst, size_t left, size_t right)
{
    int *pBuffer = new int[left + right];
    // copy left-side elements to buffer
    memcpy(pBuffer, pSrcDst, left * sizeof(int));
    // copy right-side elements to the beginning
    memmove(pSrcDst, pSrcDst + left, right * sizeof(int));
    // copy buffer to the rest positions
    memcpy(pSrcDst + right, pBuffer, left * sizeof(int));
    delete[] pBuffer;
}
```

Bridge Rotation rotate_with_buffer is quite simple, provided that you can pay for the cost of memory allocation. In fact, for small arrays, it may outperform other algorithms. Memory requirements can be further reduced by considering the overlap of the two parts: Here we divide the array into three parts (assuming a is smaller than b): bl, which represents the overlap of a and b, while the size of br equals the size of a. If bl is smaller than a, then we move bl to the buffer and sequentially shift br and a to their correct positions. Finally, we move bl back to the beginning of the array. If bl is bigger than a, then we revert to the previous algorithm, buffering a instead. The basic idea is that we always move the smallest part to the buffer. Hoven called it the bridge rotation algorithm. Triple Reversal Rotation Obviously, the above two are not truly in-place rotations. The triple reversal algorithm, as found in Programming Pearls, 2nd Edition, is a prime example of an in-place algorithm. It's remarkably elegant and easy to understand. Let me explain this algorithm with some notations: ababab denotes the array to be rotated, bababa is the result of rotation, ara^rar is the reversal of aaa, brb^rbr is the reversal of bbb. It's straightforward to infer two theorems: (ar)r=a(a^r)^r = a(ar)r=a, and (ab)r=brar(ab)^r = b^ra^r(ab)r=brar (it looks like the transpose in matrix multiplication!). To get bababa with reverse operations, we have the following chains: ab→arb→arbr→(arbr)r=baab \rightarrow a^rb \rightarrow a^rb^r \rightarrow (a^rb^r)^r = baab→arb→arbr→(arbr)r=ba ab→abr→arbr→(arbr)r=baab \rightarrow ab^r \rightarrow a^rb^r \rightarrow (a^rb^r)^r = baab→abr→arbr→(arbr)r=ba ab→(ab)r=brar→bar→baab \rightarrow (ab)^r=b^ra^r \rightarrow ba^r \rightarrow baab→(ab)r=brar→bar→ba ab→(ab)r=brar→bra→baab \rightarrow (ab)^r=b^ra^r \rightarrow b^ra \rightarrow baab→(ab)r=brar→bra→ba We can rotate the array with only three reverse operations in many ways. 
Here is one implementation:

```cpp
// reverse array pSrcDst with N elements
void reverse(int *pSrcDst, size_t N)
{
    int *pa, *pb;
    pa = pSrcDst;
    pb = pSrcDst + N;
    N /= 2;
    for (size_t i = 0; i < N; ++i)
    {
        int buffer = *(pa + i);
        *(pa + i) = *(pb - i - 1);
        *(pb - i - 1) = buffer;
    }
}

// left: the size of left part
// right: the size of right part
void reversal_rotate(int *pSrcDst, size_t left, size_t right)
{
    reverse(pSrcDst, left);
    reverse(pSrcDst + left, right);
    reverse(pSrcDst, left + right);
}
```

Triple reversal rotation involves precisely 2N memory operations. Hoven proposed an enhanced version, named trinity rotation (Hi, Lara!), which improves locality and reduces the number of moves. Honestly, I don't really grasp the idea behind this algorithm and it seems quite complicated. If you are curious, please explore Hoven's page. Additionally, the performance of triple reversal also depends on how efficiently the reverse function is implemented. This blog provides a detailed explanation on how to implement it, utilizing modern computer characteristics, resulting in up to x22 speedups compared to std::reverse in the standard C++ library! Performance Test I compared the performance of bridge rotation, triple reversal rotation, and conjoint reversal rotation (aka trinity rotation) on my PC. It's not a rigorous test. The array size is 1000000, and I chose the left part to be 333334. I ran each rotation algorithm 100 times to compute the average running time. I was expecting bridge rotation to be the worst since it had to allocate memory for one third of the array size. Surprisingly, they got nearly the same performance (bridge rotation: 253.5us, triple reversal rotation: 255.6us, conjoint reversal rotation: 252.6us). How the operating system optimizes memory management may contribute to this phenomenon. If I just freed the allocated memory at the end of each loop and then executed the next loop immediately, I found that the first iteration took much longer (1000us) while the subsequent iterations were faster (200-300us). If I deliberately didn't free the buffer in each rotation, as recommended in this blog, bridge rotation was worse than the others. I can imagine that the system reuses the same memory block in each iteration, thereby making the time for memory allocation within subsequent iterations negligible. The blog I mentioned before also compared a juggling rotation with triple reversal rotation, and it found that most rotation algorithms are almost at the same level. The best way to optimize such an algorithm is to make it as predictable as possible for modern computers so that they can utilize their hardware-level advantages to accelerate the program. It does make sense. Circshift Here I use triple reversal rotation to implement an in-place circshift function capable of shifting along any given dimension of N-dimensional data. The code is here and the interface looks like this:

```cpp
void circshift(const std::vector<int> &vDims, int iAxis, int shift, T *pSrcDst);
```

It's not a fully optimized algorithm since I use two loops to handle data before and after the target axis separately. I compared its performance with Matlab's built-in circshift function, and Matlab is still faster for small and medium array sizes. However, as the size grows larger, it takes much longer for Matlab to allocate a buffer for the shifted elements. In such cases, this naive in-place circshift can outperform it. 
Fftshift Once you get circshift, fftshift and ifftshift are quite easy to implement:

```cpp
template <typename T>
void fftshift(const std::vector<int> &vDims, int iAxis, T *pSrcDst)
{
    int K = vDims[iAxis] / 2;
    circshift(vDims, iAxis, K, pSrcDst);
}

template <typename T>
void ifftshift(const std::vector<int> &vDims, int iAxis, T *pSrcDst)
{
    int K = (vDims[iAxis] - 1) / 2 + 1;
    circshift(vDims, iAxis, K, pSrcDst);
}
```
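As a quick cross-check of the shift amounts K used above (a NumPy sketch, separate from the C++ implementation): for an axis of length n, fftshift is a circular shift by n/2 (integer division) and ifftshift by (n-1)/2+1, which matches NumPy's fftshift/ifftshift for both even and odd lengths:

```python
import numpy as np

for n in (8, 9):                       # even and odd lengths
    x = np.arange(n)
    assert np.array_equal(np.fft.fftshift(x), np.roll(x, n // 2))
    assert np.array_equal(np.fft.ifftshift(x), np.roll(x, (n - 1) // 2 + 1))
    # and the two shifts undo each other
    assert np.array_equal(np.fft.ifftshift(np.fft.fftshift(x)), x)
```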
The Moore-Penrose inverse or the pseudoinverse A+∈Rn×m\mathbf{A}^+ \in \mathbb{R}^{n \times m}A+∈Rn×m of a matrix A∈Rm×n\mathbf{A} \in \mathbb{R}^{m \times n}A∈Rm×n is a kind of generalization of the inverse matrix to non-square matrices or ill-conditioned matrices. The most confusing part in coding a pinv function is how to choose an appropriate tolerance for truncating zero singular values. Basics on Pseudoinverse Mathematically, a pseudoinverse of A\mathbf{A}A is defined as a matrix satisfying some criteria. The pseudoinverse has many good properties, such as being equal to the inverse of A\mathbf{A}A if A\mathbf{A}A is a square matrix and invertible, and it exists for any matrix, etc. The computation of the pseudoinverse is quite intuitive, A+=VS+UT\mathbf{A}^+ = \mathbf{V} \mathbf{S}^+ \mathbf{U}^TA+=VS+UT, where A=USVT\mathbf{A} = \mathbf{U} \mathbf{S} \mathbf{V}^TA=USVT is the SVD of the matrix A\mathbf{A}A. More importantly, the pseudoinverse relates to the least squares problem with Tikhonov regularization: arg minx∈Rn ∥Ax−b∥22+λ∥x∥22\begin{equation} \begin{split} \argmin_{\mathbf{x} \in \mathbb{R}^n} &\ \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 + \lambda \|\mathbf{x}\|_2^2\\ \end{split} \end{equation} x∈Rnargmin ∥Ax−b∥22+λ∥x∥22 where the solution is: x^=(ATA+λI)−1ATb\begin{equation} \begin{split} \mathbf{\hat{x}} = \left( \mathbf{A}^T\mathbf{A} + \lambda \mathbf{I}\right)^{-1} \mathbf{A}^T \mathbf{b}\\ \end{split} \end{equation} x^=(ATA+λI)−1ATb The pseudoinverse A+\mathbf{A}^+A+ is exactly the limit when λ→0\lambda \to 0λ→0: A+=limλ→0(ATA+λI)−1AT\begin{equation} \begin{split} \mathbf{A}^+ = \lim_{\lambda \to 0} \left( \mathbf{A}^T\mathbf{A} + \lambda \mathbf{I}\right)^{-1} \mathbf{A}^T\\ \end{split} \end{equation} A+=λ→0lim(ATA+λI)−1AT If A\mathbf{A}A is square and ill-conditioned, then it has a high condition number and many singular values or eigenvalues gradually decay to 0 in the diagonal matrix S\mathbf{S}S. Computing the inverse of A\mathbf{A}A directly usually causes overflow in real applications due to division-by-zero errors. A common workaround is to replace A−1\mathbf{A}^{-1}A−1 with its pseudoinverse A+\mathbf{A}^+A+. Recalling Equation (3) and the SVD, we have: A+=limλ→0(ATA+λI)−1AT=limλ→0V(STS+λI)−1SUT=limλ→0V[σ1σ12+λ⋯00⋮⋱0000σkσk2+λ00000⋱]UT=V[1σ1⋯00⋮⋱00001σk00000⋱]UT\begin{equation} \begin{split} \mathbf{A}^+ &= \lim_{\lambda \to 0} \left( \mathbf{A}^T\mathbf{A} + \lambda \mathbf{I}\right)^{-1} \mathbf{A}^T\\ &= \lim_{\lambda \to 0} \mathbf{V} \left( \mathbf{S}^T\mathbf{S} + \lambda \mathbf{I}\right)^{-1}\mathbf{S} \mathbf{U}^T\\ &= \lim_{\lambda \to 0} \mathbf{V} \begin{bmatrix} \frac{\sigma_1}{\sigma_1^2+\lambda} & \cdots& 0 &0 & \\ \vdots & \ddots & 0 &0\\ 0 & 0 & \frac{\sigma_k}{\sigma_k^2+\lambda} &0\\ 0 & 0 & 0 &0\\ & & & &\ddots\\ \end{bmatrix} \mathbf{U}^T\\ &= \mathbf{V} \begin{bmatrix} \frac{1}{\sigma_1} & \cdots& 0 &0 & \\ \vdots & \ddots & 0 &0\\ 0 & 0 & \frac{1}{\sigma_k} &0\\ 0 & 0 & 0 &0\\ & & & &\ddots\\ \end{bmatrix} \mathbf{U}^T\\ \end{split} \end{equation} A+=λ→0lim(ATA+λI)−1AT=λ→0limV(STS+λI)−1SUT=λ→0limVσ12+λσ1⋮00⋯⋱0000σk2+λσk00000⋱UT=Vσ11⋮00⋯⋱0000σk100000⋱UT so the pseudoinverse A+\mathbf{A}^+A+ is a truncated SVD that discards all components corresponding to zero singular values. Theoretically, it would not encounter any divide-by-zero issues. Engineering Considerations However, determining what zero means in practical numerical computing is a little tricky. 
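Before looking at the tolerances, here is a minimal NumPy sketch of the truncated-SVD pseudoinverse in Equation (4). The default tolerance below mimics a Matlab/Octave-style rule (max(m, n) times machine epsilon times the largest singular value), which is just one reasonable choice rather than the "right" one:

```python
import numpy as np

def pinv_svd(A, rtol=None):
    """Pseudoinverse via truncated SVD: singular values below tol are treated as zero.
    The default rtol follows a Matlab/Octave-style rule; it is an illustrative choice."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if rtol is None:
        rtol = max(A.shape) * np.finfo(A.dtype).eps
    tol = rtol * s.max()
    s_inv = np.where(s > tol, 1.0 / s, 0.0)   # discard the "zero" singular values
    return (Vt.T * s_inv) @ U.T               # V S^+ U^T

# rank-deficient example: the second column is exactly twice the first
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(pinv_svd(A))
print(np.linalg.pinv(A))   # for comparison
```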
Typically, we classify those singular values that fall below some small tolerance as zeros, and the choice of the default tolerance varies among implementations. Here I list some implementations:

Matlab: max(m,n)*eps(norm(s, inf)) — eps(x) returns the positive distance from abs(x) to the next larger in magnitude floating point number of the same precision as x.
Scipy: atol + max(m, n)*np.finfo(dtype).eps*max(s) — np.finfo(dtype).eps returns the difference between 1.0 and the next smallest floating point number larger than 1.0 of the precision dtype, and atol is the absolute tolerance, which defaults to 0.
Numpy: 1e-15*max(s)
Octave: max(m, n)*max(s)*std::numeric_limits<T>.epsilon() — std::numeric_limits<T>.epsilon() returns the difference between 1.0 and the next smallest floating point number larger than 1.0 of the precision T, and that is the GCC compiler's behavior.
Julia: max(eps(T)*min(m, n)*maximum(s), atol) — eps(T) returns the distance between 1.0 and the next larger representable floating-point value of the precision T, and the Julia community also recommends sqrt(eps(T)) for dense ill-conditioned matrices.

I'm not an expert in numerical computing, so I don't want to talk about the considerations behind these implementations. If you are interested, you can refer to the community discussions on this topic, such as those in Numpy and Julia, or some typical books such as Golub and Van Loan's Matrix Computations and Vetterling's Numerical Recipes. Generally speaking, there is no one-size-fits-all tolerance for all kinds of problems, and Matlab's implementation is considered to be the most conservative one. Nearly all implementations consider two tolerances: absolute and relative. Here I just provide an excerpt from Stata's manuals:

An absolute tolerance is a fixed number that is used to make direct comparisons. If the tolerance for a particular routine were 1e–14, then 8.99e–15 in some calculation would be considered to be close enough to zero to act as if it were, in fact, zero, and 1.000001e–14 would be considered a valid, nonzero number. But is 1e–14 small? The number may look small to you, but whether 1e–14 is small depends on what is being measured and the units in which it is measured. If all the numbers in a certain problem were around 1e–12, you might suspect that 1e–14 is a reasonable number. That leads to relative measures of tolerance. Rather than treating, say, a predetermined quantity as being so small as to be zero, one specifies a value (for example, 1e–14) multiplied by something and uses that as the definition of small. … For the above matrix, the diagonal of U turns out to be (5.5e+14, 2.4e+13, 0.000087). An absolutist would tell you that the matrix is of full rank; the smallest number along the diagonal of U is 0.000087 (8.7e–5), and that is still a respectable number, at least when compared with computer precision, which is about 2.22e–16. Most Mata routines would tell you that the matrix has rank 2. Numbers such as 0.000087 may seem respectable when compared with machine precision, but 0.000087 is, relatively speaking, a very small number, being about 4.6e–19 relative to the average value of the diagonal elements.

and from Vetterling's comments (p795):

Moreover, if a singular value $w_i$ is nonzero but very small, you should also define its reciprocal to be zero, since its apparent value is probably an artifact of roundoff error, not a meaningful number.
A plausible answer to the question "how small is small?" is to edit in this fashion all singular values whose ratio to the largest singular value is less than N times the machine precision $\epsilon$. (This is a more conservative recommendation than the default in section 2.6, which scales as $N^{1/2}$.)
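To make the tolerance discussion concrete, here is a minimal NumPy sketch of a truncated-SVD pinv; the relative-tolerance default below imitates the SciPy/Octave style listed above and is only one reasonable choice:

import numpy as np

def pinv_truncated(A, rtol=None):
    # A+ = V S+ U^T, with small singular values treated as zero
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if rtol is None:
        rtol = max(A.shape) * np.finfo(s.dtype).eps   # SciPy/Octave-like default
    keep = s > rtol * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]                       # reciprocals only for "nonzero" singular values
    return (Vt.T * s_inv) @ U.T

A = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])  # exactly rank 1, so several zero singular values
A_pinv = pinv_truncated(A)
print(np.allclose(A @ A_pinv @ A, A))                 # Moore-Penrose property A A+ A = A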
I've been struggling with calculating memory usage for a week. Here's the case: I got a program that needs to estimate how much memory it may consume during runtime with some predefined inputs, such as the size of images, etc. The problem is that the program is so complicated that nearly no one understands the code fully. Not to mention, there is a lot of parallel code in the program, scaling the memory usage by the dynamic number of threads.

getrusage
I really needed a way to measure the memory usage of each function, so I googled it and found Jacob's video talking about the getrusage function. According to the document, this function returns multiple resource usage measures for the calling process, the calling thread, or all children of the calling process that have terminated and been waited for. The memory measure is defined in the field ru_maxrss of struct rusage, which denotes the maximum resident set size (roughly the amount of memory used) in kilobytes. Here's the code I learned from Jacob's video:

#include <iostream>
#include <string>
#include <cstring>
#include <cstdio>
#include <cstdlib>
#include <sys/resource.h>

long get_mem_usage()
{
    struct rusage usage;
    int ret;
    ret = getrusage(RUSAGE_SELF, &usage);
    return usage.ru_maxrss; // in KB
}

int main()
{
    long currentMem = get_mem_usage();
    for (int i = 0; i < 100; i++)
    {
        char *p = (char *)malloc(1024 * 100);
        std::memset(p, 1, 1024 * 100);
        std::printf("usage: %ld + %ld\n", currentMem, get_mem_usage() - currentMem);
    }
    return 0;
}

The output was quite different from what Jacob got in the video. I got memory usage ranging from 6856 KB to 7016 KB. I did allocate 100 KB 100 times, 10000 KB in total. So, what happened? Well, Jacob mentioned some compiler optimizations: the compiler may not allocate memory if it's not used. However, that couldn't explain my results here. I found this answer on stackexchange, which says:

For instance, if a process allocates a chunk of memory (say 100Mb) and uses it actively (reads/writes to it), its resident set size will be about 100Mb (plus overhead, the code segment, etc.). If after the process then stops using (but doesn't release) that memory for a while, the OS could opt to swap chunks of that memory to swap, to make room for other processes (or cache). The resident set size would then decrease by the amount the kernel swapped out. If the process wakes up and starts re-using that memory, the kernel would re-load the data from swap, and the resident set size would go up again.

Sounds reasonable, doesn't it? I changed my code and allocated 1 MB 100 times, 100 MB in total. This time the memory usage was about 97.33 MB. So, just remember that getrusage may be affected by many factors, and the values it returns may not correspond precisely to the theoretical values.
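The same effect is easy to reproduce from Python; the sketch below is my own illustration (Linux only, where ru_maxrss is reported in KB) showing that the peak resident set size barely moves until the allocated pages are actually written to:

import mmap
import resource

def max_rss_kb():
    # peak resident set size of this process, in kilobytes on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

base = max_rss_kb()
buf = mmap.mmap(-1, 100 * 1024 * 1024)      # reserve ~100 MB of anonymous memory
print("after allocation:", max_rss_kb() - base, "KB")

for offset in range(0, len(buf), mmap.PAGESIZE):
    buf[offset:offset + 1] = b"\x01"        # touch every page so it becomes resident
print("after writing:   ", max_rss_kb() - base, "KB")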
I use Visual Studio when I work at the company. Visual Studio does provide a better coding experience on the Windows platform. But honestly, the majority of the coding at the company is just bug fixing, which is less enjoyable. I've made a plan to improve my C++ coding skills since 2024 in my spare time with my Manjaro system. But I don't want to spend too much time diving into details about compiling, linking, etc., at least for now. I want an effortless dev environment to build C++ projects just like Visual Studio does. After some searching, I gotta say, building C++ projects on Linux isn't as hard as it seems at first glance.

Compilers and Debug Tools
In arch-like systems, installing these tools is easy:

sudo pacman -S gcc gdb clang cmake

VScode CPP Extensions
Just as I've done for my scientific work, VS Code is still my first-choice IDE, only with a few more extensions needed for a C++ building environment. I installed the C/C++ Extension Pack, containing the C/C++, C/C++ Themes, CMake, and CMake Tools extensions.

VSCode CMake Commands
Create a C++ Project
Open an empty folder and press Ctrl+Shift+P, then type CMake: Quick Start to create a new C++ project. It will prompt you to name your project, choose between a C or C++ project, specify whether it is a library or executable, and finally generate a CMakeLists.txt, main.cpp, and a build folder under the empty parent folder. And this is a good starting point to tweak your C++ project settings.

Choose a Compiler
Press Ctrl+Shift+P and type CMake: Select a Kit. It will prompt you to choose among the compilers installed on your system. In my case, I have the GCC and Clang compilers installed, so I choose GCC as my default compiler, which I think is enough for newbies. Don't forget to regenerate your build system with the command CMake: Configure. Interestingly, the output terminal in VS Code shows that CMake uses Ninja instead of Make as its default build tool. I had never used Ninja before, so I thought it was a good opportunity to try this fast, lightweight build tool.

Compile the Project
Press Ctrl+Shift+P and type CMake: Build, or press F7, to actually build the whole project. If any compile error happens, you can find it in the bottom Problems panel. The compiled library or executable would be found under the build folder.

Switch Release/Debug
VS Code compiles the C++ project in Debug mode by default. Sometimes we may switch to Release mode for better optimization. Press Ctrl+Shift+P, type CMake: Select Variant, and select Release or Debug as it prompts. It would automatically regenerate the build system.
Background of PCA
Principal component analysis (PCA) is the most widely used dimension reduction technique. Given a matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ in which each column of $\mathbf{X}$ represents a measurement and the rank of $\mathbf{X}$ is $r$, PCA solves the following optimization problem:

\begin{equation} \begin{split} \argmin_{\mathbf{L} \in \mathbb{R}^{m \times n}} &\ \|\mathbf{X} - \mathbf{L}\|_F^2\\ \text{s.t.} &\ \ rank(\mathbf{L}) = l, l \ll r \end{split} \end{equation}

which is also known as low-rank approximation. Mathematically, it can be expressed as $\mathbf{X}=\mathbf{L}+\mathbf{N}$, where each element of $\mathbf{N}$ follows an iid normal distribution. The Eckart-Young-Mirsky theorem states that the best rank-$l$ approximation in terms of the Frobenius norm is $\mathbf{L} = \sum_{i=1}^{l} \sigma_i u_i v_i^T$, where $\mathbf{X} = \sum_{i=1}^{r} \sigma_i u_i v_i^T$ is the compact SVD of $\mathbf{X}$. The proof of this theorem can be found in this note. I also have a note talking about PCA in the decomposition formulation.

Introduction to RPCA
One drawback of PCA is that it is highly sensitive to corrupted data. "Corrupted" here means significant changes occur in the measurements, like specularities on human face images or missing records in a database. That's where robust principal component analysis (RPCA) comes in. RPCA aims to recover a low-rank matrix $\mathbf{L}$ from highly corrupted measurements $\mathbf{X}=\mathbf{L}+\mathbf{S}$, where the elements in $\mathbf{S}$ can have arbitrarily large magnitude and are supposed to be sparse. The optimization problem is:

\begin{equation} \begin{split} \argmin_{\mathbf{L}, \mathbf{S} \in \mathbb{R}^{m \times n}} &\ \|\mathbf{L}\|_* + \lambda \|vec(\mathbf{S})\|_1\\ \text{s.t.} &\ \mathbf{L} + \mathbf{S} = \mathbf{X} \end{split} \end{equation}

where $vec(\cdot)$ represents the vectorize operator and $\|\cdot\|_*$ and $\|\cdot\|_1$ represent the matrix nuclear norm and the vector $l_1$-norm, respectively. At first sight, the problem seems impossible to solve due to insufficient information. In fact, we need to assume that the low-rank component $\mathbf{L}$ is not sparse and that the sparsity pattern of $\mathbf{S}$ is selected uniformly at random. More formally, an incoherence condition should be made; see Candès' paper for more information. Here are two instances where RPCA is not capable of disentangling the two matrices:

Suppose that the matrix $\mathbf{X}$ has only a one at the top left corner and zeros everywhere else. How can we decide whether $\mathbf{X}$ is low-rank or sparse?
Suppose that the first column of $\mathbf{S}$ is the opposite of the first column of $\mathbf{L}$ and zeros everywhere else. We would not be able to recover $\mathbf{L}$ and $\mathbf{S}$, since the column space of $\mathbf{X}$ falls into that of $\mathbf{L}$.

The main result of Candès' paper says that the RPCA problem, under the above assumptions, recovers the solution with high probability, given $\lambda = 1/\sqrt{max(m, n)}$. This choice of $\lambda$ comes from pure mathematical analysis and it works correctly. However, we may improve the performance of RPCA by choosing $\lambda$ in accordance with prior knowledge about the solution.
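Before looking at solvers, the motivating claim is easy to see numerically: a plain truncated-SVD PCA is thrown off by a handful of gross corruptions. A small NumPy sketch, where the sizes, rank, and corruption level are arbitrary choices of mine:

import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80))  # rank-3 ground truth
S = np.zeros_like(L)
mask = rng.random(L.shape) < 0.02                                  # about 2% of entries corrupted...
S[mask] = 50.0 * rng.standard_normal(mask.sum())                   # ...with large magnitude
X = L + S

U, s, Vt = np.linalg.svd(X, full_matrices=False)
L_pca = (U[:, :3] * s[:3]) @ Vt[:3]          # best rank-3 approximation of the corrupted data
print(np.linalg.norm(L_pca - L) / np.linalg.norm(L))   # relative error, typically far from zero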
Solutions
Proximal Gradient Descent
This kind of technique is also called Principal Component Pursuit (PCP). Before introducing proximal gradient descent (PGD), let's recall how gradient descent works. Considering $f(\mathbf{x}) \in \mathbb{R}, \mathbf{x} \in \mathbb{R}^{m}$, where $f(\mathbf{x})$ is convex and differentiable, we want to find the solution of the problem:

\begin{equation} \argmin_{\mathbf{x} \in \mathbb{R}^m} \ f(\mathbf{x}) \end{equation}

First, we expand the second-order Taylor series of the function $f(\mathbf{x})$ and substitute the Hessian matrix $f''(\mathbf{x})$ with an identity matrix $\frac{1}{t} \mathbf{I}$:

\begin{equation} \begin{split} f(\mathbf{z}) &\approx f(\mathbf{x}) + f'(\mathbf{x})(\mathbf{z}-\mathbf{x}) + \frac{1}{2}(\mathbf{z}-\mathbf{x})^T f''(\mathbf{x}) (\mathbf{z}-\mathbf{x})\\ &\approx f(\mathbf{x}) + f'(\mathbf{x})(\mathbf{z}-\mathbf{x}) + \frac{1}{2}(\mathbf{z}-\mathbf{x})^T \frac{1}{t}\mathbf{I} (\mathbf{z}-\mathbf{x}) \end{split} \end{equation}

and this is a quadratic approximation of $f(\mathbf{x})$ at the point $\mathbf{x}$ (here we use the numerator layout). The original problem can be reduced to the following problem:

\begin{equation} \argmin_{\mathbf{z} \in \mathbb{R}^m} \ f(\mathbf{x}) + f'(\mathbf{x})(\mathbf{z}-\mathbf{x}) + \frac{1}{2}(\mathbf{z}-\mathbf{x})^T \frac{1}{t}\mathbf{I} (\mathbf{z}-\mathbf{x}) \end{equation}

which further reduces to the form:

\begin{equation} \argmin_{\mathbf{z} \in \mathbb{R}^m} \ \frac{1}{2t}\|\mathbf{z}-(\mathbf{x}-tf'(\mathbf{x})^T)\|_2^2 \end{equation}

So the gradient descent update is:

\begin{equation} \mathbf{x}^+ = \mathbf{x}-tf'(\mathbf{x})^T \end{equation}

But what if $f(\mathbf{x})$ is not differentiable? Let's break $f(\mathbf{x})$ into two parts $g(\mathbf{x})$ and $h(\mathbf{x})$, $f(\mathbf{x}) = g(\mathbf{x}) + h(\mathbf{x})$, such that $g(\mathbf{x})$ is convex and differentiable and $h(\mathbf{x})$ is convex but not differentiable.
We can still approximate $f(\mathbf{x})$ at the point $\mathbf{x}$ with the Taylor series of $g(\mathbf{x})$:

\begin{equation} \begin{split} f(\mathbf{z}) &= g(\mathbf{z}) + h(\mathbf{z})\\ &\approx g(\mathbf{x}) + g'(\mathbf{x})(\mathbf{z}-\mathbf{x}) + \frac{1}{2}(\mathbf{z}-\mathbf{x})^T g''(\mathbf{x}) (\mathbf{z}-\mathbf{x}) + h(\mathbf{z})\\ &\approx g(\mathbf{x}) + g'(\mathbf{x})(\mathbf{z}-\mathbf{x}) + \frac{1}{2}(\mathbf{z}-\mathbf{x})^T \frac{1}{t}\mathbf{I} (\mathbf{z}-\mathbf{x}) + h(\mathbf{z}) \end{split} \end{equation}

The optimization problem can be reduced to:

\begin{equation} \argmin_{\mathbf{z} \in \mathbb{R}^m} \ \frac{1}{2t}\|\mathbf{z}-(\mathbf{x}-tg'(\mathbf{x})^T)\|_2^2 + h(\mathbf{z}) \end{equation}

Here we define the proximal mapping $prox_t(\mathbf{x})$ of $h(\mathbf{x})$ as:

\begin{equation} prox_t(\mathbf{x}) = \argmin_{\mathbf{z} \in \mathbb{R}^m} \ \frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|_2^2 + h(\mathbf{z}) \end{equation}

If we know the proximal mapping of $h(\mathbf{x})$, we can easily get the solution of the above optimization problem, also known as the PGD update:

\begin{equation} \begin{split} \mathbf{x}^+ &= prox_t(\mathbf{x}-tg'(\mathbf{x})^T)\\ &= \mathbf{x} - t\frac{\mathbf{x}-prox_t(\mathbf{x}-tg'(\mathbf{x})^T)}{t}\\ &= \mathbf{x} - t G_t(\mathbf{x}) \end{split} \end{equation}

where $G_t(\mathbf{x})$ is the generalized gradient of $f(\mathbf{x})$. Here we list a few common proximal mappings:

$h(\mathbf{x}) = \|\mathbf{x}\|_1$ (the $l_1$-norm of vectors): $prox_t(\mathbf{x}) = S_t(\mathbf{x}) = sign(\mathbf{x}) \max(\lvert\mathbf{x}\rvert-t, 0)$, the soft-thresholding operator.
$h(\mathbf{X}) = \|\mathbf{X}\|_*$ (the nuclear norm of matrices): $prox_t(\mathbf{X}) = SVT_t(\mathbf{X})=\mathbf{U}\, diag(S_t(diag(\mathbf{\Sigma})))\, \mathbf{V}^T$, the singular value thresholding operator.

Back to RPCA, we transform the original problem into an unconstrained problem using a slightly relaxed version of the original problem:

\begin{equation} \argmin_{\mathbf{L}, \mathbf{S} \in \mathbb{R}^{m \times n}} \ \mu\|\mathbf{L}\|_* + \mu\lambda \|vec(\mathbf{S})\|_1 + \frac{1}{2} \|\mathbf{X}-\mathbf{L}-\mathbf{S}\|_F^2 \end{equation}

where, as $\mu$ approaches 0, any solution to the above problem approaches the solution set of the original problem.
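To make the two proximal mappings above concrete, and anticipating the update rule derived in the rest of this section, here is a minimal NumPy sketch; the step size t, the value of μ, and the iteration count are simplistic choices of mine rather than tuned settings:

import numpy as np

def soft_threshold(x, t):
    # prox of t*||x||_1: shrink every entry toward zero by t
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def svt(X, t):
    # prox of t*||X||_*: soft-threshold the singular values
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, t)) @ Vt

def rpca_pgd(X, mu=1e-2, t=0.5, iters=500):
    lam = 1.0 / np.sqrt(max(X.shape))              # the lambda suggested by Candes et al.
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(iters):
        R = X - L - S                              # shared residual term
        L = svt(L + t * R, mu * t)                 # nuclear-norm prox on the L block
        S = soft_threshold(S + t * R, mu * lam * t)   # l1 prox on the S block
    return L, S

# toy low-rank plus sparse data
rng = np.random.default_rng(0)
L0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 60))
S0 = np.where(rng.random((50, 60)) < 0.05, 10.0 * rng.standard_normal((50, 60)), 0.0)
L_hat, S_hat = rpca_pgd(L0 + S0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))

The step size t = 0.5 is chosen to match the convergence condition stated later in this section, since the gradient of the quadratic coupling term with respect to the stacked variable is Lipschitz with constant 2.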
Here we let: h([LS])=μ∥L∥∗+μλ∥vec(S)∥1=μ∥[I,0][LS]∥∗+μλ∥vec([0,I][LS])∥1g([LS])=12∥X−L−S∥F2=12∥X−[I,I][LS]∥F2arg minL,S∈Rm×n h([LS])+g([LS]) \begin{equation} \begin{split} h(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}) &= \mu\|\mathbf{L}\|_* + \mu\lambda \|vec(\mathbf{S})\|_1\\ &= \mu\|\begin{bmatrix} \mathbf{I} , \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}\|_* + \mu\lambda \|vec\left(\begin{bmatrix} \mathbf{0} , \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}\right)\|_1\\ g(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}) &= \frac{1}{2} \|\mathbf{X}-\mathbf{L}-\mathbf{S}\|_F^2\\ &= \frac{1}{2} \|\mathbf{X}-\begin{bmatrix} \mathbf{I} , \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}\|_F^2\\ \argmin_{\mathbf{L}, \mathbf{S} \in \mathbb{R}^{m \times n}} &\ h(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}) + g(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix})\\ \end{split} \end{equation} h([LS])g([LS])L,S∈Rm×nargmin=μ∥L∥∗+μλ∥vec(S)∥1=μ∥[I,0][LS]∥∗+μλ∥vec([0,I][LS])∥1=21∥X−L−S∥F2=21∥X−[I,I][LS]∥F2 h([LS])+g([LS]) The gradient of g([LS])g(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix})g([LS]) in numerator layout is: g′([LS])=−[X−L−SX−L−S]T \begin{equation} \begin{split} g'(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}) &= -\begin{bmatrix} \mathbf{X} - \mathbf{L} - \mathbf{S} \\ \mathbf{X} - \mathbf{L} - \mathbf{S} \end{bmatrix}^T \end{split} \end{equation} g′([LS])=−[X−L−SX−L−S]T The proximal mapping of h([LS])h(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix})h([LS]) can be decomposed into two separate proximal mappings: proxt([LS])=arg minZL,ZS∈Rm×n 12t∥[ZLZS]−[LS]∥F2+h([ZLZS])=arg minZL∈Rm×n 12t∥ZL−L∥F2+μ∥L∥∗+arg minZS∈Rm×n 12t∥ZS−S∥F2+μλ∥vec(S)∥1 \begin{equation} \begin{split} prox_t(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}) &= \argmin_{\mathbf{Z_L}, \mathbf{Z_S} \in \mathbb{R}^{m \times n}} \ \frac{1}{2t}\|\begin{bmatrix} \mathbf{Z_L} \\ \mathbf{Z_S} \end{bmatrix}-\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}\|_F^2 + h(\begin{bmatrix} \mathbf{Z_L} \\ \mathbf{Z_S}\end{bmatrix})\\ &= \argmin_{\mathbf{Z_L} \in \mathbb{R}^{m \times n}} \ \frac{1}{2t}\|\mathbf{Z_L} -\mathbf{L}\|_F^2 + \mu\|\mathbf{L}\|_* + \argmin_{\mathbf{Z_S} \in \mathbb{R}^{m \times n}} \ \frac{1}{2t}\|\mathbf{Z_S} -\mathbf{S}\|_F^2 + \mu\lambda\|vec(\mathbf{S})\|_1 \\ \end{split} \end{equation} proxt([LS])=ZL,ZS∈Rm×nargmin 2t1∥[ZLZS]−[LS]∥F2+h([ZLZS])=ZL∈Rm×nargmin 2t1∥ZL−L∥F2+μ∥L∥∗+ZS∈Rm×nargmin 2t1∥ZS−S∥F2+μλ∥vec(S)∥1 so the solution would be: proxt([LS])=[SVTμt(L)uvec(Sμλt(vec(S)))] \begin{equation} \begin{split} prox_t(\begin{bmatrix} \mathbf{L} \\ \mathbf{S} \end{bmatrix}) &= \begin{bmatrix} SVT_{\mu t}\left(\mathbf{L}\right) \\ uvec\left(S_{\mu \lambda t}\left(vec(\mathbf{S})\right)\right) \end{bmatrix} \end{split} \end{equation} proxt([LS])=[SVTμt(L)uvec(Sμλt(vec(S)))] where uvec(⋅)uvec(\cdot)uvec(⋅) represents unvectorize operator. 
Finally, we get our proximal gradient descent update rules of the RPCA problem: [L(k)S(k)]=proxt([L(k−1)+t(X−L(k−1)−S(k−1))S(k−1)+t(X−L(k−1)−S(k−1))])=[SVTμt(L(k−1)+t(X−L(k−1)−S(k−1)))uvec(Sμλt(vec(S(k−1)+t(X−L(k−1)−S(k−1)))))] \begin{equation} \begin{split} \begin{bmatrix} \mathbf{L}^{(k)} \\ \mathbf{S}^{(k)} \end{bmatrix} &= prox_t\left(\begin{bmatrix} \mathbf{L}^{(k-1)}+t(\mathbf{X}-\mathbf{L}^{(k-1)} - \mathbf{S}^{(k-1)}) \\ \mathbf{S}^{(k-1)}+t(\mathbf{X}-\mathbf{L}^{(k-1)} - \mathbf{S}^{(k-1)}) \end{bmatrix} \right)\\ &= \begin{bmatrix} SVT_{\mu t}\left(\mathbf{L}^{(k-1)}+t(\mathbf{X}-\mathbf{L}^{(k-1)} - \mathbf{S}^{(k-1)})\right) \\ uvec\left(S_{\mu \lambda t}\left(vec(\mathbf{S}^{(k-1)}+t(\mathbf{X}-\mathbf{L}^{(k-1)} - \mathbf{S}^{(k-1)}))\right)\right) \end{bmatrix} \end{split} \end{equation} [L(k)S(k)]=proxt([L(k−1)+t(X−L(k−1)−S(k−1))S(k−1)+t(X−L(k−1)−S(k−1))])=[SVTμt(L(k−1)+t(X−L(k−1)−S(k−1)))uvec(Sμλt(vec(S(k−1)+t(X−L(k−1)−S(k−1)))))] Practicalities of PGD Convergence For f(x)=g(x)+h(x)f(\mathbf{x}) = g(\mathbf{x}) + h(\mathbf{x})f(x)=g(x)+h(x), we assume: ggg is convex , differentiable, and g′g'g′ is Lipschitz continuous with constant L>0L > 0L>0 hhh is convex and the proximal mapping of hhh can be evaluated then proximal gradient descent with fixed step size t≤1Lt \le \frac{1}{L}t≤L1 satisfies: f(x(k))−f∗≤∥x(0)−x∗∥222tk \begin{equation} \begin{split} f(\mathbf{x}^{(k)}) - f^{\ast} \le \frac{\|\mathbf{x}^{(0)} - \mathbf{x}^{\ast}\|_2^2}{2tk} \end{split} \end{equation} f(x(k))−f∗≤2tk∥x(0)−x∗∥22 The above theorem suggests that PGD has convergence rate O(1/ϵ)O(1/\epsilon)O(1/ϵ). Appendix Proof of SVT For each τ≥0\tau \ge 0τ≥0 and Y∈Rm×n\mathbf{Y} \in \mathbf{R}^{m \times n}Y∈Rm×n, the singular value shrinkage operator obeys Dτ(Y)=arg minX12∥X−Y∥F2+τ∥X∥∗ \begin{equation} \begin{split} \mathcal{D}_\tau(\mathbf{Y}) = \argmin_{\mathbf{X}} \frac{1}{2}\| \mathbf{X} - \mathbf{Y} \|_F^2 + \tau \| \mathbf{X} \|_* \end{split} \end{equation} Dτ(Y)=Xargmin21∥X−Y∥F2+τ∥X∥∗ Proof. Since the function h(X):=12∥X−Y∥F2+τ∥X∥∗h(\mathbf{X}) := \frac{1}{2}\| \mathbf{X} - \mathbf{Y} \|_F^2 + \tau \| \mathbf{X} \|_*h(X):=21∥X−Y∥F2+τ∥X∥∗ is strictly convex, it is easy to see that there exists a unique minimizer, and we thus need to prove that it is equal to Dτ(Y)\mathcal{D}_\tau(\mathbf{Y})Dτ(Y). To do this, recall the definition of a subgradient of a convex function f:Rm×n→Rf: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}f:Rm×n→R. We say that Z\mathbf{Z}Z is a subgradient of fff at X0\mathbf{X}_0X0, denoted Z∈∂f(X0)Z \in \partial f(\mathbf{X}_0)Z∈∂f(X0), if f(X)≥f(X0)+⟨Z,X−X0⟩f(\mathbf{X}) \ge f(\mathbf{X}_0) + \langle \mathbf{Z}, \mathbf{X} - \mathbf{X}_0 \ranglef(X)≥f(X0)+⟨Z,X−X0⟩ for all X\mathbf{X}X. Now X^\hat{\mathbf{X}}X^ minimizes hhh if and only if 0 is a subgradient of the functional hhh at the point X^\hat{\mathbf{X}}X^, i.e. 0∈X^−Y+τ∂∥X^∥∗0 \in \hat{\mathbf{X}} − \mathbf{Y} + \tau \partial \|\hat{\mathbf{X}}\|_*0∈X^−Y+τ∂∥X^∥∗, where ∂∥X^∥∗\partial \|\hat{\mathbf{X}}\|_*∂∥X^∥∗ is the set of subgradients of the nuclear norm. Let X∈Rm×n\mathbf{X} \in R^{m \times n}X∈Rm×n be an arbitrary matrix and UΣV∗\mathbf{U} \mathbf{\Sigma} \mathbf{V}^*UΣV∗ be its SVD. It is known that ∂∥X∥∗={UV∗+W:W∈Rm×n,U∗W=0,WV=0,∥W∥2≤1}\partial \|\mathbf{X}\|_* = \{ \mathbf{U} \mathbf{V}^* + \mathbf{W}: \mathbf{W} \in \mathbb{R}^{m \times n}, \mathbf{U}*\mathbf{W}=0, \mathbf{W}\mathbf{V}=0, \|\mathbf{W}\|_2 \le 1\}∂∥X∥∗={UV∗+W:W∈Rm×n,U∗W=0,WV=0,∥W∥2≤1}. 
Set X^=Dτ(Y)\hat{\mathbf{X}} = \mathcal{D}_\tau(\mathbf{Y})X^=Dτ(Y) for short. In order to show that X^\hat{\mathbf{X}}X^ obeys the optimal condition, decompose the SVD of Y\mathbf{Y}Y as Y=U0Σ0V0∗+U1Σ1V1∗\mathbf{Y} = \mathbf{U}_0 \mathbf{\Sigma}_0 \mathbf{V}_0^* + \mathbf{U}_1 \mathbf{\Sigma}_1 \mathbf{V}_1^*Y=U0Σ0V0∗+U1Σ1V1∗ , where U0,V0\mathbf{U}_0,\mathbf{V}_0U0,V0 (resp. U1,V1\mathbf{U}_1,\mathbf{V}_1U1,V1) are the singular vectors associated with singular values greater than τ\tauτ (resp. smaller than or equal to τ\tauτ). With these notations, we have X^=U0(Σ0−τI)V0\hat{\mathbf{X}} = \mathbf{U}_0 \left(\mathbf{\Sigma}_0 - \tau \mathbf{I} \right)\mathbf{V}_0X^=U0(Σ0−τI)V0 and, therefore, Y−X^=τ(U0V0∗+W),W=τ−1U1Σ1V1∗\mathbf{Y} - \hat{\mathbf{X}} = \tau \left( \mathbf{U}_0 \mathbf{V}_0^* + \mathbf{W}\right), \mathbf{W} = \tau^{-1} \mathbf{U}_1 \mathbf{\Sigma}_1 \mathbf{V}_1^*Y−X^=τ(U0V0∗+W),W=τ−1U1Σ1V1∗. By definition, U0∗W=0WV0=0\mathbf{U}_0*\mathbf{W}=0 \mathbf{W}\mathbf{V}_0=0U0∗W=0WV0=0 and since the diagonal elements of Σ1\mathbf{\Sigma}_1Σ1 have magnitudes bounded by τ\tauτ , we also have ∥W∥2≤1\|\mathbf{W}\|_2 \le 1∥W∥2≤1. Hence Y−X^∈∂∥X^∥∗\mathbf{Y} - \hat{\mathbf{X}} \in \partial \|\hat{\mathbf{X}}\|_*Y−X^∈∂∥X^∥∗, which concludes the proof. References
I used to do all scientific computing work on Jupyter notebooks. My most common way of debugging was print, which is definitely not the best way to do so. Things got complex when I sometimes had to debug other packages installed in the Python environment. The debug mode provided in Jupyter is not very convenient. But I do like the freedom of running cells in Jupyter, which is important for testing new algorithms, so I don't want to move my workflow to other IDEs like Pycharm. My final choice is VS Code with its gorgeous Python extensions.

Virtual Environment
Python has a gorgeous ecosystem of libraries and packages. Virtual environments allow users to specify which versions of dependencies and even which version of the Python interpreter to use. Environments also provide isolation between projects, which is important for resolving conflicting dependencies. I am using Conda as my environment management tool due to its ease of use. Specifically, I choose Miniconda, which only includes essential components, allowing me to customize my environments from the ground up. On Arch Linux, Miniconda is included in the AUR, so just install it:

yay -S miniconda3

After the installation, it would prompt you to add conda to your terminal:

conda init zsh

This command automatically initializes conda for the zsh shell (writing things into .zshrc in your home directory). You may want to change the conda mirrors before creating any environments in China. See the conda mirror help for more information.

Create Virtual Environments

conda create -n test python=3.10

This command creates a virtual environment named test with Python 3.10 installed.

Activate/Deactivate Virtual Environments
The default virtual environment is "base". Changing to our "test" environment is quite simple:

conda activate test

Or deactivate it:

conda deactivate

List All Environments

conda env list

Pretty easy, hah.

Package Management
Although conda is able to handle both package and environment management, I prefer to use pip to install necessary packages. pip is included in any conda environment. Again, change the pip mirrors before installing any packages in China. See the pip mirror help for more information.

Install Basic Scientific Packages

pip install numpy scipy pandas scikit-learn

Install Basic Visualization Packages

pip install matplotlib seaborn pyqt5

Here I choose PyQt as the backend of matplotlib.

Install JupyterLab

pip install jupyterlab

The following commands generate the Jupyter server configuration files and set a password, so you can customize it yourself. The configuration files are in the .jupyter folder in your home directory.

jupyter server --generate-config
jupyter server password

We can start our scientific coding with Jupyter Lab, but I would recommend VS Code for a more comfortable coding experience. Now, let's just start a Jupyter server without taking you to the default browser page:

jupyter lab --no-browser

then we would go back into the VS Code configuration.

VS Code
VS Code is developed by Microsoft. It is a lightweight but powerful code editor which has a rich ecosystem of extensions for nearly every language. Here I install the binary release from the AUR:

yay -S visual-studio-code-bin

Basic Extensions
After starting VS Code, install the Python and Jupyter extensions from the extension market. And this pretty much sets everything up for Python development automatically.

Python Formatting
A formatter makes your code look clean and readable. In any Python file, hit Ctrl+Shift+P and type format, select Format Document With and hit Python.
If you don't have a formatter installed, VS Code would pop up a window to suggest you install one. I personally choose the Black formatter. Install the Black Formatter extension. In the settings, search Format On Save and turn it on. Then you will be able to format each document automatically every time you save it. Below is a picture of how the Black formatter formats your code.

Python Intellisense
The default Python intellisense should work out of the box. If not, go to the Settings and search for python.languageserver, then change the Python language server from default to Pylance or Jedi. Then restart VS Code. If this still does not work, perhaps you need to disable all extensions, restart VS Code, and enable all extensions again.

Python Debugging
The default Jupyter debug only steps through user-written code. To debug library code, open the Settings with Ctrl+, and search justmycode, uncheck the option and restart VS Code.

Run Jupyter Notebook in VS Code
There are many ways to create a Jupyter notebook in VS Code (shortcuts, commands, etc.). The easiest way is to create a new file named xxx.ipynb and open it directly in VS Code. The first thing is to make sure you select the right Python kernel or environment to run the notebook. You may have multiple choices. Then toggle the Jupyter server selection menu and select Existing server. You will be prompted to enter your Jupyter password. Running cells in VS Code is almost the same as in the Jupyter browser app. Triggering debug mode requires clicking the debug cell button alongside each cell. Just like MATLAB, you can set multiple breakpoints and watch how variables change with the program.

Docker for Deep Learning
Docker allows you to package applications and their dependencies into isolated containers. One good reason for using Docker is that official PyTorch releases usually depend on older CUDA and cuDNN versions, whereas Arch-based distributions typically have the latest NVIDIA driver. This may cause unexpected failures during installation. For example, my CUDA version is 12.2 whereas the officially suggested CUDA version for PyTorch is 11.8. You can use Docker to create a container with a different version of NVIDIA CUDA than what you have installed locally on your machine. Docker containers are designed to be isolated environments that can run software with specific dependencies, regardless of what is installed on the host system.

Install Docker
Install the Docker engine on Arch with these commands:

sudo pacman -S docker docker-compose
sudo systemctl enable docker.service
sudo systemctl start docker.service

After that, add the current user to the docker group:

sudo groupadd docker
sudo usermod -aG docker $USER

Log out and log in again, then verify a successful installation with:

docker run hello-world

If everything works, it would pull a small image and print hello world. My home directory is quite empty, so I want to change the default Docker images location to my home directory:

sudo systemctl stop docker.service
sudo cp -r /var/lib/docker /home/swolf/docker

Configure data-root in /etc/docker/daemon.json:

{
    "data-root": "/home/swolf/docker"
}

Restart docker.service to apply these changes. Perhaps you may also need docker-compose:

sudo pacman -S docker-compose

Install NVIDIA Container Toolkit
This is the most tricky part and I cannot guarantee my installation steps will work for you.
First, install the NVIDIA Container Toolkit from the AUR:

yay -S nvidia-container-toolkit

After that, restart docker.service and type:

docker run --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi

and hopefully you can get the same output as running nvidia-smi on the host machine. Be aware of the maximum supported CUDA version for your host NVIDIA driver; check this list.

Build A Docker Image for DL
NVIDIA has pre-built PyTorch images combined with CUDA, cuDNN, etc. Check the release note to see if your requirements have been satisfied already. I am building my deep learning image based on anibali's PyTorch image. Here is my Dockerfile:

FROM anibali/pytorch:2.0.1-cuda11.8
RUN pip install --upgrade pip
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install --no-cache-dir numpy scipy pandas scikit-learn
RUN pip install --no-cache-dir matplotlib seaborn pyqt5
RUN pip install --no-cache-dir jupyterlab
RUN jupyter server --generate-config
WORKDIR /home/user/MyProjects
EXPOSE 8889
# Start Jupyter
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8889", "--no-browser", "--autoreload", "--IdentityProvider.token=''", "--PasswordIdentityProvider.hashed_password=''"]

This Dockerfile builds an image that makes Jupyter available to external users. I personally change the default port to 8889 for compatibility with local Jupyter servers. And my docker-compose.yml file looks like this:

version: '3'
services:
  my_dl:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: dl
    ports:
      - "8889:8889"
    volumes:
      - /home/swolf/MyProjects/:/home/user/MyProjects/
    stdin_open: true # Enable stdin for interaction
    tty: true # Allocate a pseudo-TTY
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]

So all my stuff in /home/swolf/MyProjects/ is a persistent volume in the Docker container. Building the image is also simple:

docker-compose build

To start the Docker container, use this command:

docker-compose up

and stop the container after coding:

docker-compose down

Connect to Jupyter in the Container
After starting the container above, open your browser and type the URL http://127.0.0.1:8889/. Then you can access any resources inside the container from the browser app.

Connect to the Container in VS Code
Of course you could open any file in /home/swolf/MyProjects/ with VS Code, but it cannot parse packages installed in the container, so Go to definition or something else may not work well. We actually want to "open" the IDE inside the container. For this, two extensions are needed: Docker and Dev Containers. After that, there would be a small button in the bottom left corner named Open a Remote Window. Press the button and it would pop up a selection menu prompting you to select how to connect to a container. Since we have started our deep learning container, we choose Attach to Running Container. VS Code would open a new window, connecting to the running container with a few necessary extensions installed automatically. You may need to install the Python and Jupyter extensions inside the container from the extension market. Then you will be able to perform all deep learning tasks just as you would in a local development environment.
The Fourier transform decomposes a function into different frequency components, which can be summed to represent the original function.

Fourier Series
TL;DR
Any continuous periodic function can be represented by a Fourier series, where the Fourier coefficients are weighted integrals of the periodic function over one period.

Definition
A periodic function $f_{T}(x)$ with a period $T$ can be decomposed into the following exponential form:

\begin{equation} f_{T}(x) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n \frac{x}{T}} \end{equation}

where $c_n$ are the Fourier coefficients:

\begin{equation} c_n = \frac{1}{T} \int_{T} f_{T}(x) e^{-i 2\pi n \frac{x}{T}} dx \end{equation}

Continuous Fourier Transform
TL;DR
The Fourier transform works for both periodic and non-periodic continuous functions. The Fourier transform is a special case of the Fourier series when the period goes to infinity. The Fourier transform of a periodic function is a summation of weighted delta functions at specific frequencies (harmonics of $1/T$), where the weights are the Fourier coefficients.

Definition
For any integrable function $f: \mathbb{R} \rightarrow \mathbb{C}$, the Fourier transform is defined as:

\begin{equation} \hat{f}(k) = \int_{-\infty}^{\infty} f(x) e^{-i 2\pi k x} dx \end{equation}

where $k \in \mathbb{R}$ represents continuous frequencies. For example, if $x$ is measured in seconds, then frequency is in hertz. The Fourier transform is able to represent periodic and non-periodic functions, whereas the Fourier series only works for periodic functions. The inverse Fourier transform is defined as:

\begin{equation} f(x) = \int_{-\infty}^{\infty} \hat{f}(k) e^{i 2\pi x k} dk \end{equation}

$f(x)$ and $\hat{f}(k)$ are often referred to as a Fourier transform pair. Here, we use $\mathcal{F}$ and $\mathcal{F}^{-1}$ to denote the Fourier transform (FT) and the inverse Fourier transform (iFT), respectively.

Sign of the Fourier transform
Remember that the sign used in the Fourier transform and the inverse Fourier transform is just a convention. Mathematicians usually choose a negative sign for the inverse Fourier transform while engineers stick to a positive sign for it. It is not that one is better than the other; consistency is the key, otherwise errors and confusion may arise.

Connection to Fourier Series
Only periodic signals can be decomposed with a Fourier series, so what about non-periodic signals? We will see that the Fourier transform is actually the Fourier series when $T$ goes to infinity, which corresponds to a non-periodic signal.
f(x)=limT→∞fT(x)=limT→∞∑n=−∞∞cnei2πnxT=limT→∞∑n=−∞∞[1T∫TfT(x)e−i2πnxTdx]ei2πnxT=limΔfn→0∑n=−∞∞[∫−T/2T/2fT(x)e−i2πknxdx]ei2πxknΔkn, kn=nT Δkn=1T=∫−∞∞[∫−∞∞f(x)e−i2πkxdx]ei2πxkdk=∫−∞∞f^(k)ei2πxkdk\begin{equation} \begin{split} f(x) &= \lim_{T \rightarrow \infty} f_T(x)\\ &= \lim_{T \rightarrow \infty} \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n \frac{x}{T}}\\ &= \lim_{T \rightarrow \infty} \sum_{n=-\infty}^{\infty} \left[ \frac{1}{T} \int_{T} f_{T}(x) e^{-i 2\pi n \frac{x}{T}} dx \right] e^{i 2\pi n \frac{x}{T}}\\ &= \lim_{\Delta{f_n} \rightarrow 0} \sum_{n=-\infty}^{\infty} \left[ \int_{-T/2}^{T/2} f_{T}(x) e^{-i 2\pi k_n x} dx \right] e^{i 2\pi x k_n} \Delta{k_n},\ k_n = \frac{n}{T}\ \Delta{k_n}=\frac{1}{T}\\ &= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f(x) e^{-i 2\pi k x} dx \right] e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} \hat{f}(k) e^{i 2\pi x k} dk \end{split} \end{equation} f(x)=T→∞limfT(x)=T→∞limn=−∞∑∞cnei2πnTx=T→∞limn=−∞∑∞[T1∫TfT(x)e−i2πnTxdx]ei2πnTx=Δfn→0limn=−∞∑∞[∫−T/2T/2fT(x)e−i2πknxdx]ei2πxknΔkn, kn=Tn Δkn=T1=∫−∞∞[∫−∞∞f(x)e−i2πkxdx]ei2πxkdk=∫−∞∞f^(k)ei2πxkdk and that is exactly Fourier transform (note the definition of integral above). Properties Linearity For any complex numbers aaa and bbb, if h(x)=af(x)+bg(x)h(x)=af(x)+bg(x)h(x)=af(x)+bg(x), then h^(k)=f^(k)+g^(k)\hat{h}(k) = \hat{f}(k) + \hat{g}(k)h^(k)=f^(k)+g^(k). Time Shifting For any real number x0x_0x0, if h(x)=f(x−x0)h(x)=f(x-x_0)h(x)=f(x−x0), then h^(k)=e−i2πx0kf^(k)\hat{h}(k)=e^{-i 2\pi x_0 k} \hat{f}(k)h^(k)=e−i2πx0kf^(k). F[h(x)]=∫−∞∞f(x−x0)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^+x0)d(x^+x0)=e−i2πx0k∫−∞∞f(x^)e−i2πkx^dx^=e−i2πx0kf^(k)\begin{equation} \begin{split} \mathcal{F}\left[h(x)\right] &= \int_{-\infty}^{\infty} f(x-x_0) e^{-i 2\pi k x} dx\\ &= \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi k (\hat{x}+x_0)} d(\hat{x}+x_0)\\ &= e^{-i 2\pi x_0 k} \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi k \hat{x}} d\hat{x}\\ &= e^{-i 2\pi x_0 k} \hat{f}(k) \end{split} \end{equation} F[h(x)]=∫−∞∞f(x−x0)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^+x0)d(x^+x0)=e−i2πx0k∫−∞∞f(x^)e−i2πkx^dx^=e−i2πx0kf^(k) Frequency Shifting For any real number k0k_0k0, if h^(k)=f^(k−k0)\hat{h}(k) = \hat{f}(k-k_0)h^(k)=f^(k−k0), then h(x)=ei2πk0xf(x)h(x)=e^{i 2 \pi k_0 x} f(x)h(x)=ei2πk0xf(x). F−1[h^(k)]=∫−∞∞f^(k−k0)ei2πxkdk=∫−∞∞f^(k^)ei2πx(k^+k0)d(k^+k0)=ei2πk0x∫−∞∞f^(k^)ei2πxk^dk^=ei2πk0xf(x)\begin{equation} \begin{split} \mathcal{F}^{-1}\left[\hat{h}(k)\right] &= \int_{-\infty}^{\infty} \hat{f}(k-k_0) e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} \hat{f}(\hat{k}) e^{i 2\pi x(\hat{k}+k_0)} d(\hat{k}+k_0)\\ &= e^{i 2\pi k_0 x} \int_{-\infty}^{\infty} \hat{f}(\hat{k}) e^{i 2\pi x\hat{k}} d\hat{k}\\ &= e^{i 2\pi k_0 x} f(x) \end{split} \end{equation} F−1[h^(k)]=∫−∞∞f^(k−k0)ei2πxkdk=∫−∞∞f^(k^)ei2πx(k^+k0)d(k^+k0)=ei2πk0x∫−∞∞f^(k^)ei2πxk^dk^=ei2πk0xf(x) Scale Property For any real number aaa, if h(x)=f(ax)h(x)=f(ax)h(x)=f(ax), then h^(k)=1∣a∣f^(ka)\hat{h}(k)=\frac{1}{|a|}\hat{f}(\frac{k}{a})h^(k)=∣a∣1f^(ak). 
Let’s assuming a>0a \gt 0a>0: F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^/a)d(x^/a)=1a∫−∞∞f(x^)e−i2π(k/a)x^dx^=1af^(ka)\begin{equation} \begin{split} \mathcal{F}\left[h(x)\right] &= \int_{-\infty}^{\infty} f(ax) e^{-i 2\pi k x} dx\\ &= \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi k (\hat{x}/a)} d(\hat{x}/a)\\ &= \frac{1}{a} \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi (k / a) \hat{x}} d\hat{x}\\ &= \frac{1}{a} \hat{f}(\frac{k}{a}) \end{split} \end{equation} F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫−∞∞f(x^)e−i2πk(x^/a)d(x^/a)=a1∫−∞∞f(x^)e−i2π(k/a)x^dx^=a1f^(ak) and if a<0a \lt 0a<0: F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫∞−∞f(x^)e−i2πk(x^/a)d(x^/a)=1−a∫−∞∞f(x^)e−i2π(k/a)x^dx^=1−aF(ka)\begin{equation} \begin{split} \mathcal{F}\left[h(x)\right] &= \int_{-\infty}^{\infty} f(ax) e^{-i 2\pi k x} dx\\ &= \int_{\infty}^{-\infty} f(\hat{x}) e^{-i 2\pi k (\hat{x}/a)} d(\hat{x}/a)\\ &= \frac{1}{-a} \int_{-\infty}^{\infty} f(\hat{x}) e^{-i 2\pi (k/ a) \hat{x}} d\hat{x}\\ &= \frac{1}{-a} F(\frac{k}{a}) \end{split} \end{equation} F[h(x)]=∫−∞∞f(ax)e−i2πkxdx=∫∞−∞f(x^)e−i2πk(x^/a)d(x^/a)=−a1∫−∞∞f(x^)e−i2π(k/a)x^dx^=−a1F(ak) Time Convolution Theorem For Fourier transform pairs f(x)↔f^(k)f(x) \leftrightarrow \hat{f}(k)f(x)↔f^(k) and g(x)↔g^(k)g(x) \leftrightarrow \hat{g}(k)g(x)↔g^(k), we have: F[f(x)∗g(x)]=∫−∞∞∫−∞∞f(τ)g(x−τ)dτe−i2πkxdx=∫−∞∞∫−∞∞g(x−τ)e−i2πkxdxf(τ)dτ=∫−∞∞∫−∞∞g(x^)e−i2πk(x^+τ)d(x^+τ)f(τ)dτ=∫−∞∞(∫−∞∞g(x^)e−i2πkx^dx^)f(τ)e−i2πkτdτ=g^(k)∫−∞∞f(τ)e−i2πkτdτ=g^(k)f^(k)\begin{equation} \begin{split} \mathcal{F}\left[f(x) \ast g(x)\right] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau)g(x-\tau) d\tau e^{-i 2\pi k x} dx \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x-\tau) e^{-i 2\pi k x} dx f(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(\hat{x}) e^{-i 2\pi k (\hat{x}+\tau)} d(\hat{x}+\tau) f(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} g(\hat{x}) e^{-i 2\pi k \hat{x}} d\hat{x}\right) f(\tau) e^{-i 2\pi k \tau} d\tau\\ &= \hat{g}(k) \int_{-\infty}^{\infty} f(\tau) e^{-i 2\pi k \tau} d\tau\\ &= \hat{g}(k) \hat{f}(k) \end{split} \end{equation} F[f(x)∗g(x)]=∫−∞∞∫−∞∞f(τ)g(x−τ)dτe−i2πkxdx=∫−∞∞∫−∞∞g(x−τ)e−i2πkxdxf(τ)dτ=∫−∞∞∫−∞∞g(x^)e−i2πk(x^+τ)d(x^+τ)f(τ)dτ=∫−∞∞(∫−∞∞g(x^)e−i2πkx^dx^)f(τ)e−i2πkτdτ=g^(k)∫−∞∞f(τ)e−i2πkτdτ=g^(k)f^(k) Frequency Convolution Theorem For Fourier transform pairs f(x)↔f^(k)f(x) \leftrightarrow \hat{f}(k)f(x)↔f^(k) and g(x)↔g^(k)g(x) \leftrightarrow \hat{g}(k)g(x)↔g^(k), we have: F−1[f^(k)∗g^(k)]=∫−∞∞∫−∞∞f^(τ)g^(k−τ)dτei2πxkdk=∫−∞∞∫−∞∞g^(k−τ)ei2πxkdkf^(τ)dτ=∫−∞∞∫−∞∞g^(k^)ei2πx(k^+τ)d(k^+τ)f^(τ)dτ=∫−∞∞(∫−∞∞g^(k^)ei2πxk^dk^)f^(τ)ei2πxτdτ=g(x)∫−∞∞f^(τ)ei2πxτdτ=g(x)f(x)\begin{equation} \begin{split} \mathcal{F}^{-1}\left[\hat{f}(k) \ast \hat{g}(k)\right] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \hat{g}(k-\tau) d\tau e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{g}(k-\tau) e^{i 2\pi x k} dk \hat{f}(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{g}(\hat{k}) e^{i 2\pi x (\hat{k}+\tau)} d(\hat{k}+\tau) \hat{f}(\tau) d\tau\\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} \hat{g}(\hat{k}) e^{i 2\pi x \hat{k}} d\hat{k}\right) \hat{f}(\tau) e^{i 2\pi x \tau} d\tau\\ &= g(x) \int_{-\infty}^{\infty} \hat{f}(\tau) e^{i 2\pi x \tau} d\tau\\ &= g(x) f(x) \end{split} \end{equation} F−1[f^(k)∗g^(k)]=∫−∞∞∫−∞∞f^(τ)g^(k−τ)dτei2πxkdk=∫−∞∞∫−∞∞g^(k−τ)ei2πxkdkf^(τ)dτ=∫−∞∞∫−∞∞g^(k^)ei2πx(k^+τ)d(k^+τ)f^(τ)dτ=∫−∞∞(∫−∞∞g^(k^)ei2πxk^dk^)f^(τ)ei2πxτdτ=g(x)∫−∞∞f^(τ)ei2πxτdτ=g(x)f(x) 
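The discrete analogue of these two theorems is easy to check numerically: for finite sequences, circular convolution corresponds to pointwise multiplication of DFTs. A small NumPy sketch, with arbitrary array sizes:

import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(128)
b = rng.standard_normal(128)
N = len(a)

# circular convolution computed directly from its definition
direct = np.array([np.sum(a * np.roll(b[::-1], k + 1)) for k in range(N)])
# ... and via the convolution theorem: multiply the spectra, then invert
via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
print(np.allclose(direct, via_fft))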
Conjugation

\begin{equation} \begin{split} \mathcal{F}\left[\overline{f(x)}\right] &= \int_{-\infty}^{\infty} \overline{f(x)} e^{-i 2\pi k x} dx \\ &= \overline{ \int_{-\infty}^{\infty} f(x) e^{i 2\pi k x} dx}\\ &= \overline{ \int_{-\infty}^{\infty} f(x) e^{-i 2\pi (-k) x} dx}\\ &= \overline{\hat{f}(-k)} \end{split} \end{equation}

If $f(x)$ is a real-valued function, then $\hat{f}(-k)=\overline{\hat{f}(k)}$, which is referred to as the conjugate symmetric property. If $f(x)$ is an imaginary-valued function, then $\hat{f}(-k)=-\overline{\hat{f}(k)}$, which is referred to as the conjugate anti-symmetric property. The same properties occur in the inverse Fourier transform.

Common FT Pairs

$1 \leftrightarrow \delta(k)$, since $\delta(k) = \int_{-\infty}^{\infty} e^{-i 2\pi k x} dx$.
$\delta(x) \leftrightarrow 1$, since $1 = \int_{-\infty}^{\infty} \delta(x) e^{-i 2\pi k x} dx$.
$\mathrm{sgn}(x) \leftrightarrow 1/(i\pi k)$, where $\mathrm{sgn}(x)$ is the sign function (1 for $x \gt 0$, 0 for $x = 0$, $-1$ for $x \lt 0$).
$u(x) \leftrightarrow \frac{1}{2}\left(\delta(k) + 1/(i\pi k)\right)$, where $u(x)$ is the unit step function, $u(x)=\frac{1}{2}(1+\mathrm{sgn}(x))$.
$e^{i a x} \leftrightarrow \delta(k-a/(2\pi))$, by frequency shifting.
$\mathrm{cos}(ax) \leftrightarrow \frac{1}{2}\left(\delta(k-a/(2\pi)) + \delta(k+a/(2\pi))\right)$, since $\mathrm{cos}(ax) = \frac{1}{2}(e^{i a x} + e^{-i a x})$.
$\mathrm{sin}(ax) \leftrightarrow \frac{1}{2i}\left(\delta(k-a/(2\pi)) - \delta(k+a/(2\pi))\right)$, since $\mathrm{sin}(ax) = \frac{1}{2i}(e^{i a x} - e^{-i a x})$.
$\mathrm{rect}(x) \leftrightarrow \mathrm{sinc}(k) = \mathrm{sin}(\pi k)/(\pi k)$, where $\mathrm{rect}(x)$ is 1 for $\lvert x \rvert \lt 1/2$, $1/2$ for $\lvert x \rvert = 1/2$, and 0 for $\lvert x \rvert \gt 1/2$; $\mathrm{rect}(x) = u(x+1/2)-u(x-1/2)$.
$\mathrm{sinc}(x) \leftrightarrow \mathrm{rect}(k)$.
$\sum_{n=-\infty}^{\infty} \delta(x-n \Delta x) \leftrightarrow \frac{1}{\Delta x} \sum_{m=-\infty}^{\infty} \delta(k-m/\Delta x)$, the comb function.

Comb Function
A comb function (a.k.a. sampling function, sometimes referred to as impulse sampling) is a periodic function with the formula:

\begin{equation} S_{\Delta{x}}(x) = \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x}) \end{equation}

where $\Delta{x}$ is the given period.
Fourier transform could be extended to [[generalized functions]] like δ(x)\delta(x)δ(x), which makes it possible to bypass the limitation of absolute integrable property on f(x)f(x)f(x). The key is to decompose periodic functions into Fourier series and use additive property of Fourier transform . The Fourier transform of SΔx(x)S_{\Delta{x}}(x)SΔx(x) is: F(SΔx(x))=∫−∞∞∑n=−∞∞δ(x−nΔx)e−i2πkxdx=∑n=−∞∞∫−∞∞δ(x−nΔx)e−i2πkxdx=∑n=−∞∞e−i2πknΔx\begin{equation} \begin{split} \mathcal{F}\left(S_{\Delta{x}}(x)\right) &= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x}) e^{-i 2\pi k x} dx\\ &= \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \delta(x-n\Delta{x}) e^{-i 2\pi k x} dx\\ &= \sum_{n=-\infty}^{\infty} e^{-i 2\pi k n \Delta{x}}\\ \end{split} \end{equation} F(SΔx(x))=∫−∞∞n=−∞∑∞δ(x−nΔx)e−i2πkxdx=n=−∞∑∞∫−∞∞δ(x−nΔx)e−i2πkxdx=n=−∞∑∞e−i2πknΔx It is not obvious to see what the transform is, but we can prove it with Fourier series. Note that SΔx(x)S_{\Delta{x}}(x)SΔx(x) is a periodic function, its Fourier series is represented as: cn=1Δx∫ΔxSΔx(x)e−i2πnx/Δxdx=1Δx∫−Δx2Δx2∑m=−∞∞δ(x−mΔx)e−i2πnx/Δxdx=1Δx∫−Δx2Δx2δ(x)e−i2πnx/Δxdx=1Δx∫−∞∞δ(x)e−i2πnx/Δxdx=1ΔxSΔx(x)=∑n=−∞∞cnei2πnx/Δx=1Δx∑n=−∞∞ei2πnx/Δx\begin{equation} \begin{split} c_n &= \frac{1}{\Delta{x}} \int_{\Delta{x}} S_{\Delta{x}}(x) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}} \int_{-\frac{\Delta{x}}{2}}^{\frac{\Delta{x}}{2}} \sum_{m=-\infty}^{\infty} \delta(x-m\Delta{x}) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}} \int_{-\frac{\Delta{x}}{2}}^{\frac{\Delta{x}}{2}} \delta(x) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}} \int_{-\infty}^{\infty} \delta(x) e^{-i 2\pi n x / \Delta{x}} dx\\ &= \frac{1}{\Delta{x}}\\ S_{\Delta{x}}(x) &= \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n x / \Delta{x}}\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} e^{i 2\pi n x / \Delta{x}} \end{split} \end{equation} cnSΔx(x)=Δx1∫ΔxSΔx(x)e−i2πnx/Δxdx=Δx1∫−2Δx2Δxm=−∞∑∞δ(x−mΔx)e−i2πnx/Δxdx=Δx1∫−2Δx2Δxδ(x)e−i2πnx/Δxdx=Δx1∫−∞∞δ(x)e−i2πnx/Δxdx=Δx1=n=−∞∑∞cnei2πnx/Δx=Δx1n=−∞∑∞ei2πnx/Δx and Dirac delta function is expressed as: δ(x)=∫−∞∞ei2πxkdk=∫−∞∞e−i2πx(−k)dk=∫∞−∞e−i2πxk^d(−k^)=∫−∞∞e−i2πxk^d(k^)\begin{equation} \begin{split} \delta(x) &= \int_{-\infty}^{\infty} e^{i 2\pi x k} dk\\ &= \int_{-\infty}^{\infty} e^{-i 2\pi x (-k)} dk\\ &= \int_{\infty}^{-\infty} e^{-i 2\pi x \hat{k}} d(-\hat{k})\\ &= \int_{-\infty}^{\infty} e^{-i 2\pi x \hat{k}} d(\hat{k}) \end{split} \end{equation} δ(x)=∫−∞∞ei2πxkdk=∫−∞∞e−i2πx(−k)dk=∫∞−∞e−i2πxk^d(−k^)=∫−∞∞e−i2πxk^d(k^) Now we can apply Fourier transform to SΔx(x)S_{\Delta{x}}(x)SΔx(x): F(SΔx(x))=∫−∞∞∑n=−∞∞cnei2πnx/Δxe−i2πkxdx=1Δx∑n=−∞∞∫−∞∞ei2π(n/Δx−k)xdx=1Δx∑n=−∞∞δ(k−n/Δx)\begin{equation} \begin{split} \mathcal{F}\left(S_{\Delta{x}}(x)\right) &= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n x / \Delta{x}} e^{-i 2\pi k x} dx\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} e^{i 2\pi (n/\Delta{x} - k) x} dx\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \delta({k - n/\Delta{x}}) \end{split} \end{equation} F(SΔx(x))=∫−∞∞n=−∞∑∞cnei2πnx/Δxe−i2πkxdx=Δx1n=−∞∑∞∫−∞∞ei2π(n/Δx−k)xdx=Δx1n=−∞∑∞δ(k−n/Δx) so F(SΔx(x))\mathcal{F}\left(S_{\Delta{x}}(x)\right)F(SΔx(x)) is also a periodic function with the period as 1/Δx1/\Delta{x}1/Δx. 
Hence, we again apply Fourier series: cn=Δx∫1ΔxF(SΔx(x))e−i2πnkΔxdk=∫−12Δx12Δx∑m=−∞∞δ(k−m/Δx)e−i2πnkΔxdk=∫−12Δx12Δxδ(k)e−i2πnkΔxdk=∫−∞∞δ(k)e−i2πnkΔxdk=1F(SΔx(x))=∑n=−∞∞cnei2πnkΔx=∑n=−∞∞ei2πnkΔx=∑n=−∞∞e−i2πnkΔx\begin{equation} \begin{split} c_n &= \Delta{x} \int_{\frac{1}{\Delta{x}}} \mathcal{F}\left(S_{\Delta{x}}(x)\right) e^{-i 2\pi n k \Delta{x}} dk\\ &= \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \sum_{m=-\infty}^{\infty} \delta(k-m/\Delta{x}) e^{-i 2\pi n k \Delta{x}} dk\\ &= \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \delta(k) e^{-i 2\pi n k \Delta{x}} dk\\ &= \int_{-\infty}^{\infty} \delta(k) e^{-i 2\pi n k \Delta{x}} dk\\ &= 1\\ \mathcal{F}\left(S_{\Delta{x}}(x)\right) &= \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n k \Delta{x}}\\ &= \sum_{n=-\infty}^{\infty} e^{i 2\pi n k \Delta{x}}\\ &= \sum_{n=-\infty}^{\infty} e^{-i 2\pi n k \Delta{x}} \end{split} \end{equation} cnF(SΔx(x))=Δx∫Δx1F(SΔx(x))e−i2πnkΔxdk=∫−2Δx12Δx1m=−∞∑∞δ(k−m/Δx)e−i2πnkΔxdk=∫−2Δx12Δx1δ(k)e−i2πnkΔxdk=∫−∞∞δ(k)e−i2πnkΔxdk=1=n=−∞∑∞cnei2πnkΔx=n=−∞∑∞ei2πnkΔx=n=−∞∑∞e−i2πnkΔx and we can see that why Fourier transform of a comb function is still a comb function. FT of Periodic Functions With the help of Dirac delta function, Fourier transform could also be used on periodic functions. Considering a periodic function fT(x)f_{T}(x)fT(x) with period TTT, we can write it as a Fourier series: fT(x)=∑n=−∞∞cnei2πnx/T\begin{equation} f_{T}(x) = \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi n x/T} \end{equation} fT(x)=n=−∞∑∞cnei2πnx/T Let’s compute the Fourier transform: F(fT(x))=∫−∞∞fT(x)e−i2πkxdx=∑n=−∞∞cn∫−∞∞ei2π(n/T−k)xdx=∑n=−∞∞cnδ(n/T−k)=∑n=−∞∞cnδ(k−n/T)\begin{equation} \begin{split} \mathcal{F}(f_T(x)) &= \int_{-\infty}^{\infty} f_T(x) e^{-i 2\pi k x} dx\\ &= \sum_{n=-\infty}^{\infty} c_n \int_{-\infty}^{\infty} e^{i 2\pi (n/T-k) x} dx\\ &= \sum_{n=-\infty}^{\infty} c_n \delta(n/T - k)\\ &= \sum_{n=-\infty}^{\infty} c_n \delta(k - n/T) \end{split} \end{equation} F(fT(x))=∫−∞∞fT(x)e−i2πkxdx=n=−∞∑∞cn∫−∞∞ei2π(n/T−k)xdx=n=−∞∑∞cnδ(n/T−k)=n=−∞∑∞cnδ(k−n/T) so the Fourier transform of a periodic function is a sum of delta functions at the Fourier series frequencies and the weight of each delta function is the Fourier series coefficient, as we proved in the Fourier transform of the comb function. The inverse Fourier transform is: F(∑n=−∞∞cnδ(k−n/T))=∫−∞∞∑n=−∞∞cnδ(k−n/T)ei2πxkdk=∑n=−∞∞cn∫−∞∞δ(k−n/T)ei2πxkdk=∑n=−∞∞cnei2πxn/T=fT(x)\begin{equation} \begin{split} \mathcal{F}(\sum_{n=-\infty}^{\infty} c_n \delta(k - n/T)) &= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_n \delta(k - n/T) e^{i 2\pi x k} dk\\ &= \sum_{n=-\infty}^{\infty} c_n \int_{-\infty}^{\infty} \delta(k - n/T) e^{i 2\pi x k} dk\\ &= \sum_{n=-\infty}^{\infty} c_n e^{i 2\pi x n/T}\\ &= f_T(x) \end{split} \end{equation} F(n=−∞∑∞cnδ(k−n/T))=∫−∞∞n=−∞∑∞cnδ(k−n/T)ei2πxkdk=n=−∞∑∞cn∫−∞∞δ(k−n/T)ei2πxkdk=n=−∞∑∞cnei2πxn/T=fT(x) Discrete-Time Fourier Transform TL;DR The discrete-time Fourier transform (DTFT) is a special case of the Fourier transform when the original function is sampled. The frequency domain of DTFT is continuous but periodic, with a period of 1/Δx1/\Delta{x}1/Δx, where Δx\Delta{x}Δx is the sampling interval. The DTFT of periodic sequence is a summation of weighted delta functions. 
Definition For a discrete sequence of real or complex values f[n]f[n]f[n] with all integers nnn, the discrete-time Fourier transform is defined as: f^dtft(k)=∑n=−∞∞f[n]e−i2πnkΔx\begin{equation} \hat{f}_{dtft}(k) = \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi n k \Delta{x}} \end{equation} f^dtft(k)=n=−∞∑∞f[n]e−i2πnkΔx where 1Δx\frac{1}{\Delta{x}}Δx1 is the sampling frequency in the time domain. This formula can be seen as a Fourier series (−-− and +++ signs are the same thing) and f^dtft(k)\hat{f}_{dtft}(k)f^dtft(k) is actually a periodic function with period 1/Δx1/\Delta{x}1/Δx. The inverse discrete-time Fourier transform is defined as: f[n]=Δx∫1/Δxf^dtft(k)ei2πnkΔxdk\begin{equation} f[n] = \Delta{x} \int_{1/\Delta{x}} \hat{f}_{dtft}(k) e^{i 2\pi n k \Delta{x}} dk \end{equation} f[n]=Δx∫1/Δxf^dtft(k)ei2πnkΔxdk note the integral is only evaluated in a period. Here we use Fdtft\mathcal{F}_{dtft}Fdtft to represent the discrete-time Fourier transform (DTFT) and Fdtft−1\mathcal{F}_{dtft}^{-1}Fdtft−1 to represent the inverse discrete-time Fourier transform (iDTFT). Connection to the FT Modern computers can only handle discrete values instead of continuous signals. The most basic discretization technique is sampling. Considering a continuous function f(x)f(x)f(x) and an uniform sampling pattern SΔx(x)=∑n=−∞∞δ(x−nΔx)S_{\Delta{x}}(x) = \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x})SΔx(x)=∑n=−∞∞δ(x−nΔx), which is the [[#Comb Function]] we described above, the sampling process can be simulated as: fS(x)=f(x)SΔx(x)=∑n=−∞∞f(x)δ(x−nΔx)\begin{equation} \begin{split} f_S(x) &= f(x) S_{\Delta{x}}(x)\\ &= \sum_{n=-\infty}^{\infty} f(x)\delta(x-n\Delta{x}) \end{split} \end{equation} fS(x)=f(x)SΔx(x)=n=−∞∑∞f(x)δ(x−nΔx) The Fourier transform (using the definition) of the above function is: f^S(k)=∫−∞∞fS(x)e−i2πkxdx=∫−∞∞(∑n=−∞∞f(x)δ(x−nΔx))e−i2πkxdx=∑n=−∞∞(∫−∞∞f(x)e−i2πkxδ(x−nΔx)dx)=∑n=−∞∞f(nΔx)e−i2πknΔx, f[n]=f(nΔx)=∑n=−∞∞f[n]e−i2πnkΔx=f^dtft(k)\begin{equation} \begin{split} \hat{f}_S(k) &= \int_{-\infty}^{\infty} f_S(x) e^{-i 2\pi k x} dx\\ &= \int_{-\infty}^{\infty} \left( \sum_{n=-\infty}^{\infty} f(x)\delta(x-n\Delta{x}) \right) e^{-i 2\pi k x} dx\\ &= \sum_{n=-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x) e^{-i 2\pi k x} \delta(x-n\Delta{x}) dx\right)\\ &= \sum_{n=-\infty}^{\infty} f(n\Delta{x})e^{-i 2\pi k n\Delta{x}},\ f[n]=f(n\Delta{x})\\ &=\sum_{n=-\infty}^{\infty} f[n] e^{-i 2\pi n k \Delta{x}}\\ &= \hat{f}_{dtft}(k) \end{split} \end{equation} f^S(k)=∫−∞∞fS(x)e−i2πkxdx=∫−∞∞(n=−∞∑∞f(x)δ(x−nΔx))e−i2πkxdx=n=−∞∑∞(∫−∞∞f(x)e−i2πkxδ(x−nΔx)dx)=n=−∞∑∞f(nΔx)e−i2πknΔx, f[n]=f(nΔx)=n=−∞∑∞f[n]e−i2πnkΔx=f^dtft(k) which is the definition of discrete-time Fourier transform. 
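Since the DTFT is just a sum, it can be evaluated directly on any frequency grid; the sketch below (the signal and sampling interval are arbitrary choices of mine) confirms numerically that the result repeats with period $1/\Delta{x}$:

import numpy as np

dx = 0.1                                     # sampling interval Delta x
n = np.arange(64)
f = np.exp(-0.5 * (n * dx - 3.0) ** 2)       # samples f[n] of a smooth bump

def dtft(k):
    # evaluate the DTFT sum at arbitrary (continuous) frequencies k
    return np.sum(f * np.exp(-2j * np.pi * np.outer(k, n) * dx), axis=1)

k = np.linspace(-2.0, 2.0, 5)
print(np.allclose(dtft(k), dtft(k + 1.0 / dx)))   # periodic with period 1/dx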
The next step is to prove the correctness of the inverse discrete-time Fourier transform: Δx∫1/Δxf^dtft(k)ei2πnkΔxdk=Δx∫−12Δx12Δxf^dtft(k)ei2πnkΔxdk=Δx∫−12Δx12Δxf^S(k)ei2πnkΔxdk=Δx∫−12Δx12Δx[∑m=−∞∞f[m]e−i2πmkΔx]ei2πnkΔxdk=Δx∑m=−∞∞f[m][∫−12Δx12Δxei2π(n−m)kΔxdk]=Δx∑m=−∞∞f[m][1i2π(n−m)Δxei2π(n−m)kΔx∣−12Δx12Δx]=Δx∑m=−∞∞f[m][1Δxsin(π(n−m))π(n−m)]=f[n]\begin{equation} \begin{split} \Delta{x} \int_{1/\Delta{x}} \hat{f}_{dtft}(k) e^{i 2\pi n k \Delta{x}} dk &= \Delta{x} \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \hat{f}_{dtft}(k) e^{i 2\pi n k \Delta{x}} dk\\ &= \Delta{x} \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \hat{f}_{S}(k) e^{i 2\pi n k \Delta{x}} dk\\ &=\Delta{x} \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \left[\sum_{m=-\infty}^{\infty} f[m] e^{-i 2\pi m k \Delta{x}}\right] e^{i 2\pi n k \Delta{x}} dk\\ &=\Delta{x} \sum_{m=-\infty}^{\infty} f[m] \left[ \int_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} e^{i 2\pi (n-m) k \Delta{x}} dk \right]\\ &= \Delta{x} \sum_{m=-\infty}^{\infty} f[m] \left[ \left. \frac{1}{i 2\pi (n-m) \Delta{x}} e^{i 2\pi (n-m) k \Delta{x}} \right|_{-\frac{1}{2\Delta{x}}}^{\frac{1}{2\Delta{x}}} \right]\\ &= \Delta{x} \sum_{m=-\infty}^{\infty} f[m] \left[\frac{1}{\Delta{x}} \frac{\sin(\pi(n-m))}{\pi (n-m)}\right]\\ &= f[n] \end{split} \end{equation} Δx∫1/Δxf^dtft(k)ei2πnkΔxdk=Δx∫−2Δx12Δx1f^dtft(k)ei2πnkΔxdk=Δx∫−2Δx12Δx1f^S(k)ei2πnkΔxdk=Δx∫−2Δx12Δx1[m=−∞∑∞f[m]e−i2πmkΔx]ei2πnkΔxdk=Δxm=−∞∑∞f[m][∫−2Δx12Δx1ei2π(n−m)kΔxdk]=Δxm=−∞∑∞f[m][i2π(n−m)Δx1ei2π(n−m)kΔx−2Δx12Δx1]=Δxm=−∞∑∞f[m][Δx1π(n−m)sin(π(n−m))]=f[n] note that sinc(x)=sin(πx)πx\mathrm{sinc}(x)=\frac{\sin(\pi x)}{\pi x}sinc(x)=πxsin(πx) only has a nonzero value 1 at x=0x=0x=0 and 0 for other integers. It is clear that f^S(k)\hat{f}_S(k)f^S(k) is a periodic function with period 1/Δx1/\Delta{x}1/Δx, we can see that from the convolution theorem of the Fourier transform: f^S(k)=F[f(x)SΔx(x)]=F[f(x)]∗F[SΔx(x)]=f^(k)∗S^Δx(k)=f^(k)∗1Δx∑n=−∞∞δ(k−n/Δx)=1Δx∑n=−∞∞f^(k)∗δ(k−n/Δx)=1Δx∑n=−∞∞∫−∞∞f^(τ)δ(k−τ−n/Δx)dτ=1Δx∑n=−∞∞∫−∞∞f^(τ)δ(−(τ−k+n/Δx))dτ=1Δx∑n=−∞∞∫−∞∞f^(τ)δ(τ−(k−n/Δx))dτ=1Δx∑n=−∞∞f^(k−n/Δx)\begin{equation} \begin{split} \hat{f}_S(k) &= \mathcal{F}\left[f(x)S_{\Delta{x}}(x)\right]\\ &= \mathcal{F}\left[f(x)\right] \ast \mathcal{F}\left[S_{\Delta{x}}(x)\right]\\ &= \hat{f}(k) \ast \hat{S}_{\Delta{x}}(k)\\ &= \hat{f}(k) \ast \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \delta(k - n/\Delta{x})\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}(k) \ast \delta(k - n/\Delta{x})\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \delta(k - \tau - n/\Delta{x}) d\tau\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \delta(-(\tau - k + n/\Delta{x})) d\tau\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{f}(\tau) \delta(\tau - (k - n/\Delta{x})) d\tau\\ &= \frac{1}{\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}(k-n/\Delta{x})\\ \end{split} \end{equation} f^S(k)=F[f(x)SΔx(x)]=F[f(x)]∗F[SΔx(x)]=f^(k)∗S^Δx(k)=f^(k)∗Δx1n=−∞∑∞δ(k−n/Δx)=Δx1n=−∞∑∞f^(k)∗δ(k−n/Δx)=Δx1n=−∞∑∞∫−∞∞f^(τ)δ(k−τ−n/Δx)dτ=Δx1n=−∞∑∞∫−∞∞f^(τ)δ(−(τ−k+n/Δx))dτ=Δx1n=−∞∑∞∫−∞∞f^(τ)δ(τ−(k−n/Δx))dτ=Δx1n=−∞∑∞f^(k−n/Δx) so the discrete-time Fourier transform of f[n]f[n]f[n] is a summation of shifted replicates of f^(k)\hat{f}(k)f^(k) in terms of a frequency period 1/Δx1/\Delta{x}1/Δx. 
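A small numerical check of this replication property (again a sketch, assuming the Gaussian f(x) = e^{-πx²}, whose Fourier transform is f̂(k) = e^{-πk²}): the directly evaluated DTFT sum should match (1/Δx) Σ_n f̂(k − n/Δx) up to truncation error.

```python
import numpy as np

dx = 0.5                                  # deliberately coarse sampling
n = np.arange(-200, 201)
f = np.exp(-np.pi * (n * dx)**2)          # samples of f(x) = exp(-pi x^2)

k = np.linspace(-2.0, 2.0, 9)

# left-hand side: DTFT sum evaluated directly
lhs = np.array([np.sum(f * np.exp(-2j * np.pi * ki * n * dx)) for ki in k])

# right-hand side: (1/dx) * sum of shifted replicates of the analytic FT exp(-pi k^2)
shifts = np.arange(-20, 21)
rhs = np.array([np.sum(np.exp(-np.pi * (ki - shifts / dx)**2)) / dx for ki in k])

print(np.max(np.abs(lhs - rhs)))          # tiny: the spectrum is replicated every 1/dx
```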
Properties Common DTFT Pairs DTFT of Periodic Sequences Considering f[n]f[n]f[n] is an NNN-periodic sequence, we can write f[n]f[n]f[n] as fS(x)=fT(x)SΔx(x)f_S(x) = f_T(x) S_{\Delta{x}}(x)fS(x)=fT(x)SΔx(x), where NΔx=TN\Delta{x}=TNΔx=T. fS(x)f_S(x)fS(x) is also a periodic function with a period TTT: fS(x+T)=fT(x+T)SΔx(x+T)=fT(x)SΔx(x+NΔx)=fT(x)SΔx(x)=fS(x)f[n+N]=fS((n+N)Δx)=fS(nΔx+T)=fS(nΔx)=f[n]\begin{equation} \begin{split} f_{S}(x+T) &= f_{T}(x+T) S_{\Delta{x}}(x+T)\\ &= f_T(x) S_{\Delta{x}}(x+N\Delta{x})\\ &= f_T(x) S_{\Delta{x}}(x)\\ &= f_S(x)\\ f[n+N] &= f_S((n+N)\Delta{x})\\ &= f_S(n\Delta{x}+T)\\ &= f_S(n\Delta{x})\\ &= f[n] \end{split} \end{equation} fS(x+T)f[n+N]=fT(x+T)SΔx(x+T)=fT(x)SΔx(x+NΔx)=fT(x)SΔx(x)=fS(x)=fS((n+N)Δx)=fS(nΔx+T)=fS(nΔx)=f[n] For delta function, we also know that: ∫abf(x)δ(x−x0)dx==∫−∞∞f(x)(u(x−a)−u(x−b))δ(x−x0)dx=f(x0)(u(x0−a)−u(x0−b))={f(x0)x0∈(a,b)f(x0)/2x0=a,b0else\begin{equation} \begin{split} \int_{a}^{b} f(x) \delta(x-x_0) dx &=\\ &= \int_{-\infty}^{\infty} f(x) \left(u(x-a) - u(x-b)\right) \delta(x-x_0) dx\\ &= f(x_0)\left(u(x_0-a) - u(x_0-b)\right)\\ &= \begin{cases} f(x_0) & x_0 \in (a, b)\\ f(x_0)/2 & x_0=a,b\\ 0 & \text{else} \end{cases} \end{split} \end{equation} ∫abf(x)δ(x−x0)dx==∫−∞∞f(x)(u(x−a)−u(x−b))δ(x−x0)dx=f(x0)(u(x0−a)−u(x0−b))=⎩⎨⎧f(x0)f(x0)/20x0∈(a,b)x0=a,belse where u(x)u(x)u(x) is the unit step function. Remember that Fourier transform of a periodic function is a summation of delta functions weighted by the Fourier coefficients: cm=1T∫TfS(x)e−i2πmx/Tdx=1T∫−−Δx2T−−Δx2fT(x)SΔx(x)e−i2πmx/Tdx=1T∫−−Δx2T−−Δx2fT(x)∑n=−∞∞δ(x−nΔx)e−i2πmx/Tdx=1T∫−−Δx2T−−Δx2fT(x)∑n=0N−1δ(x−nΔx)e−i2πmx/Tdx=1T∑n=0N−1∫−−Δx2T−−Δx2fT(x)e−i2πmx/Tδ(x−nΔx)dx=1T∑n=0N−1fT(nΔx)e−i2πnmΔx/T=1T∑n=0N−1fT(nΔx)e−i2πnmΔx/(NΔx))=1T∑n=0N−1fT(nΔx)e−i2πnm/N)=1T∑n=0N−1f[n]e−i2πnm/Nf^dtft(k)=F(fS(x))=∑m=−∞∞cmδ(k−m/T)=1T∑m=−∞∞(∑n=0N−1f[n]e−i2πnm/N)δ(k−m/T)=1T∑m=−∞∞f^[m]δ(k−m/T)=1NΔx∑m=−∞∞f^[m]δ(k−mNΔx)\begin{equation} \begin{split} c_m &= \frac{1}{T} \int_{T} f_{S}(x) e^{-i 2\pi m x / T} dx\\ &= \frac{1}{T} \int_{-\frac{-\Delta{x}}{2}}^{T-\frac{-\Delta{x}}{2}} f_T(x) S_{\Delta{x}}(x) e^{-i 2\pi m x / T} dx\\ &= \frac{1}{T} \int_{-\frac{-\Delta{x}}{2}}^{T-\frac{-\Delta{x}}{2}} f_T(x) \sum_{n=-\infty}^{\infty} \delta(x-n\Delta{x}) e^{-i 2\pi m x / T} dx\\ &= \frac{1}{T} \int_{-\frac{-\Delta{x}}{2}}^{T-\frac{-\Delta{x}}{2}} f_T(x) \sum_{n=0}^{N-1} \delta(x-n\Delta{x}) e^{-i 2\pi m x / T} dx\\ &= \frac{1}{T} \sum_{n=0}^{N-1} \int_{-\frac{-\Delta{x}}{2}}^{T-\frac{-\Delta{x}}{2}} f_T(x) e^{-i 2\pi m x / T} \delta(x-n\Delta{x}) dx\\ &= \frac{1}{T} \sum_{n=0}^{N-1} f_T(n\Delta{x}) e^{-i 2\pi n m \Delta{x} / T}\\ &= \frac{1}{T} \sum_{n=0}^{N-1} f_T(n\Delta{x}) e^{-i 2\pi n m \Delta{x} / (N\Delta{x}))}\\ &= \frac{1}{T} \sum_{n=0}^{N-1} f_T(n\Delta{x}) e^{-i 2\pi n m / N)}\\ &= \frac{1}{T} \sum_{n=0}^{N-1} f[n] e^{-i 2\pi n m / N}\\ \hat{f}_{dtft}(k) &= \mathcal{F}(f_S(x))\\ &= \sum_{m=-\infty}^{\infty} c_m \delta(k - m/T)\\ &= \frac{1}{T} \sum_{m=-\infty}^{\infty} \left(\sum_{n=0}^{N-1} f[n] e^{-i 2\pi n m / N}\right) \delta(k - m/T)\\ &= \frac{1}{T} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k - m/T)\\ &= \frac{1}{N\Delta{x}} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k - \frac{m}{N\Delta{x}}) \end{split} \end{equation} 
cmf^dtft(k)=T1∫TfS(x)e−i2πmx/Tdx=T1∫−2−ΔxT−2−ΔxfT(x)SΔx(x)e−i2πmx/Tdx=T1∫−2−ΔxT−2−ΔxfT(x)n=−∞∑∞δ(x−nΔx)e−i2πmx/Tdx=T1∫−2−ΔxT−2−ΔxfT(x)n=0∑N−1δ(x−nΔx)e−i2πmx/Tdx=T1n=0∑N−1∫−2−ΔxT−2−ΔxfT(x)e−i2πmx/Tδ(x−nΔx)dx=T1n=0∑N−1fT(nΔx)e−i2πnmΔx/T=T1n=0∑N−1fT(nΔx)e−i2πnmΔx/(NΔx))=T1n=0∑N−1fT(nΔx)e−i2πnm/N)=T1n=0∑N−1f[n]e−i2πnm/N=F(fS(x))=m=−∞∑∞cmδ(k−m/T)=T1m=−∞∑∞(n=0∑N−1f[n]e−i2πnm/N)δ(k−m/T)=T1m=−∞∑∞f^[m]δ(k−m/T)=NΔx1m=−∞∑∞f^[m]δ(k−NΔxm) Note we use a trick with a range between −−Δx2-\frac{-\Delta{x}}{2}−2−Δx and T−−Δx2T-\frac{-\Delta{x}}{2}T−2−Δxto make sure only N points are available for δ(x−mΔx)\delta(x-m\Delta{x})δ(x−mΔx) in the integral equation, and introduce a new symbol f^[m]=∑n=0N−1f[n]e−i2πnm/N\hat{f}[m] = \sum_{n=0}^{N-1} f[n] e^{-i 2\pi n m / N}f^[m]=∑n=0N−1f[n]e−i2πnm/N, which is referred to as the discrete Fourier transform (explained latter). So the discrete-time Fourier transform of a periodic sequence can be seen as the Fourier transform of a periodic function with exact sampling interval (Δx=TN\Delta x=\frac{T}{N}Δx=NT). The result is a summation of delta functions, where the weights are the discrete Fourier transform values. f^dtft(k)\hat{f}_{dtft}(k)f^dtft(k) converges to zero everywhere except at integer multiples of 1T\frac{1}{T}T1, known as harmonic frequencies. And the period 1/Δx1/\Delta{x}1/Δx still holds for f^dtft(k)\hat{f}_{dtft}(k)f^dtft(k): f^dtft(k+1/Δx)=1NΔx∑m=−∞∞f^[m]δ(k+1/Δx−m/(NΔx))=1NΔx∑m=−∞∞f^[m]δ(k−(m−N)/(NΔx)), n=m−N=1NΔx∑n=−∞∞f^[n+N]δ(k−n/(NΔx))=1NΔx∑n=−∞∞f^[n]δ(k−n/(NΔx))=f^dtft(k)\begin{equation} \begin{split} \hat{f}_{dtft}(k + 1/\Delta{x}) &= \frac{1}{N\Delta{x}} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k + 1/\Delta{x} - m/(N\Delta{x}))\\ &= \frac{1}{N\Delta{x}} \sum_{m=-\infty}^{\infty} \hat{f}[m] \delta(k - (m-N)/(N\Delta{x})),\ n=m-N\\ &= \frac{1}{N\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}[n+N] \delta(k - n/(N\Delta{x}))\\ &= \frac{1}{N\Delta{x}} \sum_{n=-\infty}^{\infty} \hat{f}[n] \delta(k - n/(N\Delta{x}))\\ &= \hat{f}_{dtft}(k) \end{split} \end{equation} f^dtft(k+1/Δx)=NΔx1m=−∞∑∞f^[m]δ(k+1/Δx−m/(NΔx))=NΔx1m=−∞∑∞f^[m]δ(k−(m−N)/(NΔx)), n=m−N=NΔx1n=−∞∑∞f^[n+N]δ(k−n/(NΔx))=NΔx1n=−∞∑∞f^[n]δ(k−n/(NΔx))=f^dtft(k) Substituting f^dtft(k)\hat{f}_{dtft}(k)f^dtft(k) into the inverse discrete-time Fourier transform formula (note that ks=1Δxk_s=\frac{1}{\Delta{x}}ks=Δx1 is the sampling frequency), we can verify that: 1ks∫ksf^dtft(k)ei2πnkksdk=1ks∫−ks2Nks−ks2NksN∑m=−∞∞f^dft[m]δ(k−mks/N)ei2πnkksdk=1N∫−ks2Nks−ks2N∑m=0N−1f^dft[m]δ(k−mks/N)ei2πnkksdk=1N∑m=0N−1∫−ks2Nks−ks2Nf^dft[m]ei2πnkksδ(k−mks/N)dk=1N∑m=0N−1f^dft[m]ei2πnmN=1N∑m=0N−1∑l=0N−1f[l]e−i2πmlNei2πnmN=1N∑l=0N−1f[l]∑m=0N−1ei2π(n−l)mN=1N∑l=0N−1f[l][eiπN−1N(n−l)sin(π(n−l))sin(πn−lN)]=1N∑l=0N−1f[l]g[l]=f[n]\begin{equation} \begin{split} \frac{1}{k_s} \int_{k_s} \hat{f}_{dtft}(k) e^{i 2\pi n \frac{k}{k_s}} dk &= \frac{1}{k_s} \int_{-\frac{k_s}{2N}}^{k_s - \frac{k_s}{2N}} \frac{k_s}{N} \sum_{m=-\infty}^{\infty} \hat{f}_{dft}[m] \delta(k - m k_s / N) e^{i 2\pi n \frac{k}{k_s}} dk\\ &= \frac{1}{N} \int_{-\frac{k_s}{2N}}^{k_s - \frac{k_s}{2N}} \sum_{m=0}^{N-1} \hat{f}_{dft}[m] \delta(k - m k_s / N) e^{i 2\pi n \frac{k}{k_s}} dk\\ &= \frac{1}{N} \sum_{m=0}^{N-1} \int_{-\frac{k_s}{2N}}^{k_s - \frac{k_s}{2N}} \hat{f}_{dft}[m] e^{i 2\pi n \frac{k}{k_s}} \delta(k - m k_s / N) dk\\ &= \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}_{dft}[m] e^{i 2\pi n \frac{m}{N}} \\ &= \frac{1}{N} \sum_{m=0}^{N-1} \sum_{l=0}^{N-1} f[l] e^{-i 2\pi m \frac{l}{N}} e^{i 2\pi n \frac{m}{N}}\\ &= \frac{1}{N} \sum_{l=0}^{N-1} f[l] \sum_{m=0}^{N-1} 
e^{i 2\pi (n-l) \frac{m}{N}}\\ &= \frac{1}{N} \sum_{l=0}^{N-1} f[l] \left[e^{i \pi \frac{N-1}{N} (n-l)} \frac{\sin(\pi (n-l))}{\sin(\pi \frac{n-l}{N})}\right]\\ &= \frac{1}{N} \sum_{l=0}^{N-1} f[l] g[l]\\ &= f[n] \end{split} \end{equation} ks1∫ksf^dtft(k)ei2πnkskdk=ks1∫−2Nksks−2NksNksm=−∞∑∞f^dft[m]δ(k−mks/N)ei2πnkskdk=N1∫−2Nksks−2Nksm=0∑N−1f^dft[m]δ(k−mks/N)ei2πnkskdk=N1m=0∑N−1∫−2Nksks−2Nksf^dft[m]ei2πnkskδ(k−mks/N)dk=N1m=0∑N−1f^dft[m]ei2πnNm=N1m=0∑N−1l=0∑N−1f[l]e−i2πmNlei2πnNm=N1l=0∑N−1f[l]m=0∑N−1ei2π(n−l)Nm=N1l=0∑N−1f[l][eiπNN−1(n−l)sin(πNn−l)sin(π(n−l))]=N1l=0∑N−1f[l]g[l]=f[n] where g[l]=eiπN−1N(n−l)sin(π(n−l))sin(πn−lN)g[l]=e^{i \pi \frac{N-1}{N} (n-l)} \frac{\sin(\pi (n-l))}{\sin(\pi \frac{n-l}{N})}g[l]=eiπNN−1(n−l)sin(πNn−l)sin(π(n−l)) only has a value NNN when l=nl=nl=n otherwise 0. Discrete-Time Fourier Series Definition For a N-periodic sequence f[n]f[n]f[n], it has the following series representation: cm=∑n=0N−1f[n]e−i2πmnNf[n]=1N∑m=0N−1cmei2πnmN\begin{equation} \begin{split} c_m &= \sum_{n=0}^{N-1} f[n] e^{-i2\pi\frac{mn}{N}}\\ f[n] &= \frac{1}{N} \sum_{m=0}^{N-1} c_m e^{i2\pi \frac{nm}{N}} \end{split} \end{equation} cmf[n]=n=0∑N−1f[n]e−i2πNmn=N1m=0∑N−1cmei2πNnm Here cmc_mcm’s are the DTFS coefficients and they are periodic with period NNN. The series representation of f[n]f[n]f[n] is called the inverse DTFS. Connection to the DTFT As we proved in DTFT of Periodic Sequences, DTFS is actually a discretized version of DTFT of periodic sequences. Given a signal f(x)f(x)f(x) with period TTT and its N-periodic sequence f[n]f[n]f[n], we first compute its DTFS coefficients cmc_mcm, then the DTFT f^dtft(k)\hat{f}_{dtft}(k)f^dtft(k) of f[n]f[n]f[n] is represented by: f^dtft(k)=1T∑m=−∞∞cmδ(k−m/T)\begin{equation} \begin{split} \hat{f}_{dtft}(k) &= \frac{1}{T} \sum_{m=-\infty}^{\infty} c_m \delta(k - m/T)\\ \end{split} \end{equation} f^dtft(k)=T1m=−∞∑∞cmδ(k−m/T) The original sequence f[n]f[n]f[n] can be recovered from the inverse DTFT, which is also the inverse DTFS. Discrete Fourier Transform TL;DR I found that Professor Jeffery Fessler’s note gives a clear picture of relations of different transforms. In general, the DFT is a simpified way to transform sampled time-domain sequences to sampled frequency-domain sequences. Definition For a sequence of NNN complex numbers {f[n]}n=0N−1{\{f[n]\}}_{n=0}^{N-1}{f[n]}n=0N−1, the discrete Fourier Transform transforms the sequence into another sequence of complex numbers {f^[m]}m=0N−1{\{\hat{f}[m]\}}_{m=0}^{N-1}{f^[m]}m=0N−1: f^[m]=∑n=0N−1f[n]e−i2πknN\begin{equation} \hat{f}[m] = \sum_{n=0}^{N-1} f[n] e^{-i 2\pi k \frac{n}{N}} \end{equation} f^[m]=n=0∑N−1f[n]e−i2πkNn The inverse discrete Fourier transform is given by: f[n]=1N∑m=0N−1f^[m]ei2πnmN\begin{equation} f[n] = \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}[m] e^{i 2\pi n \frac{m}{N}} \end{equation} f[n]=N1m=0∑N−1f^[m]ei2πnNm Connection to the DTFT and DTFS The DFT can be motivated by the sampling of the DTFT. Consider sampling the DTFT with Δk\Delta{k}Δk sampling interval such that one period was sampled with exactly NNN points (NΔk=1/ΔxN \Delta{k} = 1/\Delta{x}NΔk=1/Δx): f^dtft[m]=f^dtft(mΔk)=∑n=−∞∞f[n]e−i2πnΔxmΔk=∑n=−∞∞f[n]e−i2πnmN\begin{equation} \begin{split} \hat{f}_{dtft}[m] &= \hat{f}_{dtft}(m\Delta{k})\\ &= \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi n\Delta{x} m\Delta{k}}\\ &= \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi \frac{nm}{N}} \end{split} \end{equation} f^dtft[m]=f^dtft(mΔk)=n=−∞∑∞f[n]e−i2πnΔxmΔk=n=−∞∑∞f[n]e−i2πNnm Let’s break f[n]f[n]f[n] into NNN-length segments such as f[0]...f[N−1]f[0] ... 
f[N-1]f[0]...f[N−1], f[N]...f[2N−1]f[N] ... f[2N-1]f[N]...f[2N−1], let n=l−jNn=l-jNn=l−jN where l∈[0,N−1]l \in [0, N-1]l∈[0,N−1] and j∈[−∞,∞]j \in [-\infty, \infty]j∈[−∞,∞], then f^dtft[m]\hat{f}_{dtft}[m]f^dtft[m] can be redefined with NNN-point periodic superposition fps[l]=∑j=−∞∞f[l−jN]f_{ps}[l] = \sum_{j=-\infty}^{\infty} f[l-jN]fps[l]=∑j=−∞∞f[l−jN]: f^dtft[m]=∑n=−∞∞f[n]e−i2πnmN=∑j=−∞∞∑l=0N−1f[l−jN]e−i2π(l−jN)mN=∑j=−∞∞∑l=0N−1f[l−jN]e−i2πlmN=∑l=0N−1(∑j=−∞∞f[l−jN])e−i2πlmN=∑l=0N−1fps[l]e−i2πlmN\begin{equation} \begin{split} \hat{f}_{dtft}[m] &= \sum_{n=-\infty}^{\infty} f[n] e^{-i2\pi \frac{nm}{N}}\\ &= \sum_{j=-\infty}^{\infty} \sum_{l=0}^{N-1} f[l-jN] e^{-i2\pi (l-jN) \frac{m}{N}}\\ &= \sum_{j=-\infty}^{\infty} \sum_{l=0}^{N-1} f[l-jN] e^{-i2\pi l \frac{m}{N}}\\ &= \sum_{l=0}^{N-1} \left(\sum_{j=-\infty}^{\infty} f[l-jN]\right) e^{-i2\pi l \frac{m}{N}}\\ &= \sum_{l=0}^{N-1} f_{ps}[l] e^{-i2\pi l \frac{m}{N}} \end{split} \end{equation} f^dtft[m]=n=−∞∑∞f[n]e−i2πNnm=j=−∞∑∞l=0∑N−1f[l−jN]e−i2π(l−jN)Nm=j=−∞∑∞l=0∑N−1f[l−jN]e−i2πlNm=l=0∑N−1(j=−∞∑∞f[l−jN])e−i2πlNm=l=0∑N−1fps[l]e−i2πlNm Obviously, fps[l]f_{ps}[l]fps[l] is a NNN-periodic sequence. Since fps[l]f_{ps}[l]fps[l] is NNN-periodic, it can be represented with the DTFS: cm=∑l=0N−1fps[l]e−i2πlmNfps[l]=1N∑m=0N−1cmei2πmlN\begin{equation} \begin{split} c_m &= \sum_{l=0}^{N-1} f_{ps}[l] e^{-i 2\pi \frac{lm}{N}}\\ f_{ps}[l] &= \frac{1}{N} \sum_{m=0}^{N-1} c_m e^{i 2\pi \frac{ml}{N}} \end{split} \end{equation} cmfps[l]=l=0∑N−1fps[l]e−i2πNlm=N1m=0∑N−1cmei2πNml Comparing the DTFS coefficients and the above DTFT samples, we see that: cm=f^dtft[m]\begin{equation} \begin{split} c_m &= \hat{f}_{dtft}[m] \end{split} \end{equation} cm=f^dtft[m] Thus, we can recover the periodic sequence fps[l]f_{ps}[l]fps[l] from those DTFT samples with the inverse DTFS: fps[l]=1N∑m=0N−1f^dtft[m]ei2πmlN\begin{equation} \begin{split} f_{ps}[l] &= \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}_{dtft}[m] e^{i 2\pi \frac{ml}{N}} \end{split} \end{equation} fps[l]=N1m=0∑N−1f^dtft[m]ei2πNml However, this recovery does not ensure that we can recover the original sequence f[n]f[n]f[n] with those DTFT samples. fps[l]f_{ps}[l]fps[l] is a sum of shifted replicates of f[n]f[n]f[n]. Thus there is no perfect reconstruction for non time-limited sequences since time-domain replicates overlap and aliasing occurs. There is a special case where time-domain replicates do not overlap. Considering a time-limited sequence f[n]f[n]f[n] with duration LLL which it has nonzero values only in the interval 0,...,L−10,..., L-10,...,L−1. If N≥LN \ge LN≥L, then there is no overlap in the replicates. 
The original sequence f[n] can be recovered from f_{ps}[l]:

\begin{equation} f[n] = \begin{cases} f_{ps}[n] & n \in [0, L-1]\\ 0 & \text{otherwise} \end{cases} \end{equation}

In fact, if f[n] is time-limited, the DTFT samples simplify to a finite summation:

\begin{equation} \begin{split} \hat{f}[m] &= \hat{f}_{dtft}[m]\\ &= \sum_{n=0}^{L-1} f[n] e^{-i2\pi n \frac{m}{N}}\\ &= \sum_{n=0}^{N-1} f[n] e^{-i2\pi n \frac{m}{N}} \end{split} \end{equation}

where f[n]=0 for n \ge L. The above formula is called the DFT, and f[n] can be recovered from the inverse DFT formula:

\begin{equation} f[n] = \begin{cases} \frac{1}{N} \sum_{m=0}^{N-1} \hat{f}[m] e^{i 2\pi \frac{mn}{N}} & n=0, ..., N-1\\ 0 & \text{otherwise} \end{cases} \end{equation}

Properties

TODO

Common DFT Pairs

TODO

Non-uniform Fast Fourier Transform

TODO
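Returning to the DFT/IDFT pair defined above, here is a small numpy sanity check (a sketch; the random test sequence is arbitrary): the summation form of the DFT should match numpy's FFT, and the inverse formula should recover f[n] exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

n = np.arange(N)
m = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(m, n) / N)   # DFT matrix, W[m, n] = e^{-i 2 pi m n / N}

f_hat = W @ f                                  # DFT by its definition
print(np.allclose(f_hat, np.fft.fft(f)))       # True: same convention as numpy's FFT

f_rec = (W.conj().T @ f_hat) / N               # inverse DFT formula
print(np.allclose(f_rec, f))                   # True: perfect recovery
```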
出于工作需要,我要开始系统学习c++了。目前我的主力台式机是Linux系统,最常用的编辑器是VS Code,所以想要得到一个比较完整的C/C++工程方案,似乎学习Makefile的相关规则是必不可少的。本文的内容主要源于makefile tutorial by example(感谢作者大大)。 为什么要使用makefile 对于大型C/C++项目而言,完整的程序功能往往是很多子模块组合得到的。试想开发者改变或添加了几个新的功能,如果手动重新编译整个项目(比如在shell里输入一长串的g++命令以及一堆的lib),不仅费时费力、还容易出错。make工具正是用来处理这种某些模型源代码文件需要重新编译的情况,它所依赖的配置文件就是Makefile。简而言之,make通过预先配置好的Makefile,按一定的规则负责处理C/C++项目的编译,从而得到最终的可执行程序。目前,大多数开源项目都会提供Makefile,开发者可以很方便的调用make命令编译整个开源项目。 当然,make并不是唯一用于处理C/C++项目编译的解决方案,一些替代的解决方案包括CMake、Bazel等等。在Windows平台上,Visual Studio也有其内置的build工具。 本篇所使用的make是指GNU make,在Linux和MacOS上是默认安装的make实现。 基本语法 现在我们来创建一个最简单的Makefile。首先在任何目录下创建一个名为Makefile的文件(注意文件名就是Makefile,大些的M,也没有类似.txt的后缀),然后在其中输入这些文本: hello: echo "hello make" 注意,Makefile的缩进只能是Tab,不可以用空格,不然make会报缺失分隔符错误,要小心你的编辑器是不是自动将Tab缩进转换成了4个空格。接下来在该目录下的terminal里运行make命令,输出如下: $ makeecho "hello make"hello make 表明需要执行echo "hello make"命令,执行结果是hello make。 Makefile文件是由一系列rule组成的,一个rule定义如下: targets: prerequisties command command command targets是本次操作希望达成的目标,它实质是一系列文件名,相互之间用空格隔开,通常只有一个文件名,比如例子中的hello文件,而每一行的command则是用来达成targets的手段,通过执行command,我们希望在所有命令执行完成后能够产生targets所规定的文件。prerequisties同样也是一系列文件名,相互之间用空格隔开,它表示执行命令前这些文件应当已经存在了,有些类似程序的依赖概念。一个rule可以简单理解为在prerequisties存在的情况下,运行一系列command,希望最终能够得到targets中的文件。 基本执行关系 blah: blah.o gcc blah.o -o blah # Runs thirdblah.o: blah.c gcc -c blah.c -o blah.o # Runs secondblah.c: echo "int main() { return 0; }" > blah.c # Runs firstclean: rm -f blah blah.o blah.c 上面是一个有四个rule的Makefile文件,不指定target情况下,运行make会默认以第一个rule中的targets为目标,从而进行如下操作: 第一个rule想产生名为blah的文件,但是需要文件blah.o,所以跳转到第二个rule 第二个rule又需要文件blah.c,所以跳转到第三个rule 第三个rule为了产生blah.c,执行命令产生一个blah.c文件,接着返回第二个rule 现在有了blah.c,第二个rule编译生成blah.o二进制文件,接着返回第一个rule 现在有了blah.o,第一个rule链接生成blah可执行文件,任务完成 现在我们的目录下已经有blah文件了,所以再运行一次make不会产生任何效果。注意到blah目标的产生与clean无关,所以clean所代表的rule不会执行,不过可以显式运行make clean命令来调用该rule规定的清理文件操作。 最后需要注意的是,make似乎只会检查一次targets和prerequisties是否存在,至于调用之后是不是成功产生了目标文件,make是不会去检查的,而是默认成功了。 TODO
我经常会在线性代数教材以及论坛讨论中看到不建议使用逆矩阵A−1\mathbf{A}^{-1}A−1来求解线性方程Ax=b\mathbf{A}\mathbf{x}=\mathbf{b}Ax=b,尽管我一直遵循这样的原则(实践中逆矩阵确实不够稳定),但仍然不明白不使用逆矩阵的理由。本文总结了我在网上看到的一些关于逆矩阵的讨论,希望能解释为什么要少用逆矩阵来求解线性方程。 数学原理 解线性方程组(逆矩阵) 考虑求解线性方程组中的x∈Rn\mathbf{x} \in \mathbb{R}^nx∈Rn: Ax=b\begin{equation} \mathbf{A} \mathbf{x} = \mathbf{b} \end{equation} Ax=b 其中A∈Rn×n,b∈Rn\mathbf{A} \in \mathbb{R}^{n \times n},\mathbf{b} \in \mathbb{R}^nA∈Rn×n,b∈Rn。 显然一种简单直观的求解方法是计算A\mathbf{A}A的逆矩阵A−1\mathbf{A}^{-1}A−1: x=A−1b\begin{equation} \mathbf{x} = \mathbf{A}^{-1} \mathbf{b} \end{equation} x=A−1b 求解逆矩阵通常需要2n32n^32n3次浮点数操作(floating-point operations,flops),此外计算A−1b\mathbf{A}^{-1}\mathbf{b}A−1b需要2n22n^22n2 flops,总共2n3+2n22n^3+2n^22n3+2n2 flops。 不过实践中不推荐逆矩阵求解,因为可能在准确性上产生问题。求解线性方程更常用的是分解方法,比如接下来的LU分解。 解线性方程组(LU分解) LU分解(lower–upper decomposition)是一种常用的矩阵分解方法,基本思路是利用高斯消元法将A\mathbf{A}A转化为上三角矩阵U\mathbf{U}U,主要过程如下所示: [×××××××××]⏞A→[×××0××0××]⏞L1A→[×××0××00×]⏞L2L1A\begin{equation}\nonumber \overbrace{\begin{bmatrix} \times & \times & \times\\ \times & \times & \times\\ \times & \times & \times\\ \end{bmatrix}}^{\mathbf{A}} \rightarrow \overbrace{\begin{bmatrix} \times & \times & \times\\ 0 & \times & \times\\ 0 & \times & \times\\ \end{bmatrix}}^{\mathbf{L}_1\mathbf{A}} \rightarrow \overbrace{\begin{bmatrix} \times & \times & \times\\ 0 & \times & \times\\ 0 & 0 & \times\\ \end{bmatrix}}^{\mathbf{L}_2\mathbf{L}_1\mathbf{A}} \end{equation} ×××××××××A→×00××××××L1A→×00××0×××L2L1A 其中L1,L2\mathbf{L}_1,\mathbf{L}_2L1,L2是一系列下三角矩阵,用来表示第n行加上第n-1行乘以某个数的消元操作,所以矩阵A\mathbf{A}A可以消元成为一个上三角矩阵U\mathbf{U}U: Ln−1Ln−2⋯L1A=U\begin{equation} \mathbf{L}_{n-1}\mathbf{L}_{n-2} \cdots \mathbf{L}_{1}\mathbf{A} = \mathbf{U} \end{equation} Ln−1Ln−2⋯L1A=U 矩阵A\mathbf{A}A的LU分解为: A=(L1−1⋯Ln−2−1Ln−1−1)U=LU\begin{equation} \begin{split} \mathbf{A} &= \left(\mathbf{L}_{1}^{-1} \cdots \mathbf{L}_{n-2}^{-1} \mathbf{L}_{n-1}^{-1}\right) \mathbf{U}\\ &= \mathbf{L} \mathbf{U} \end{split} \end{equation} A=(L1−1⋯Ln−2−1Ln−1−1)U=LU 下三角矩阵的逆依然是下三角矩阵,并且一系列下三角矩阵的乘积也是一个三角矩阵,因此L\mathbf{L}L是下三角矩阵。此外,虽然涉及到求Li\mathbf{L}_iLi的逆,但是这一步骤实施起来是非常简单的,Li\mathbf{L}_iLi主对角线元素不变、下三角元素全部乘以-1即可得到Li−1\mathbf{L}_i^{-1}Li−1。因为Li\mathbf{L}_iLi每次都只对第iii列上的主元做消元操作,所以下三角部分只会在第iii列存在不为0的元素。以3×33 \times 33×3矩阵为例,L1\mathbf{L}_1L1可以写成如下的形式: [100a10b01]\begin{equation}\nonumber \begin{bmatrix} 1 & 0 & 0\\ a & 1 & 0\\ b & 0 & 1\\ \end{bmatrix} \end{equation} 1ab010001 很自然的可以证明L1−1\mathbf{L}_1^{-1}L1−1为: [100−a10−b01]\begin{equation}\nonumber \begin{bmatrix} 1 & 0 & 0\\ -a & 1 & 0\\ -b & 0 & 1\\ \end{bmatrix} \end{equation} 1−a−b010001 上述步骤的LU分解需要的浮点数操作近似为23n3\frac{2}{3}n^332n3 flops。 接下来我们看看如何利用LU分解解线性方程组。约定A\b\mathbf{A} \backslash \mathbf{b}A\b代表Ax=b\mathbf{A}\mathbf{x}=\mathbf{b}Ax=b的解x\mathbf{x}x(这也是MATLAB求解线性方程组的代码),按照以下步骤求解出x\mathbf{x}x: LU=AL\b=yU\y=x\begin{equation} \begin{split} \mathbf{L} \mathbf{U} &= \mathbf{A}\\ \mathbf{L} \backslash \mathbf{b} &= \mathbf{y}\\ \mathbf{U} \backslash \mathbf{y} &= \mathbf{x}\\ \end{split} \end{equation} LUL\bU\y=A=y=x 所以先求解Ly=b\mathbf{L}\mathbf{y}=\mathbf{b}Ly=b得到中间变量y\mathbf{y}y,再求解Ux=y\mathbf{U}\mathbf{x}=\mathbf{y}Ux=y得到x\mathbf{x}x,看上去比直接求解Ax=b\mathbf{A}\mathbf{x} = \mathbf{b}Ax=b复杂了不少,为什么要这么做呢? 
原因在于L\mathbf{L}L和U\mathbf{U}U作为三角矩阵能显著简化线性方程组求解,比如Ly=b\mathbf{L}\mathbf{y}=\mathbf{b}Ly=b,我们有: [1⋮⋱li,1⋯1⋮⋮⋮⋱ln,1⋯⋯⋯1][y1⋮yi⋮yn]=[b1⋮bi⋮bn]\begin{equation} \begin{bmatrix} 1 & & & & \\ \vdots&\ddots& & &\\ l_{i,1}&\cdots&1& &\\ \vdots&\vdots&\vdots&\ddots&\\ l_{n,1}&\cdots&\cdots&\cdots&1\\ \end{bmatrix} \begin{bmatrix} y_1\\ \vdots\\ y_i\\ \vdots\\ y_n\\ \end{bmatrix} = \begin{bmatrix} b_1\\ \vdots\\ b_i\\ \vdots\\ b_n\\ \end{bmatrix} \end{equation} 1⋮li,1⋮ln,1⋱⋯⋮⋯1⋮⋯⋱⋯1y1⋮yi⋮yn=b1⋮bi⋮bn 可以用递归方法求解y\mathbf{y}y: y1=b1yi=bi−li,1y1−⋯−li,i−1yi−1\begin{equation} \begin{split} y_1 &= b_1\\ y_i &= b_i - l_{i,1}y_1 - \cdots - l_{i, i-1}y_{i-1} \end{split} \end{equation} y1yi=b1=bi−li,1y1−⋯−li,i−1yi−1 这种思路叫forward substitution,即从上往下求解y\mathbf{y}y。之后可以用相似的方式从下往上求解x\mathbf{x}x,叫backward substitution。forward substitution的成本为(每一个元素各需要i−1i-1i−1次乘法和减法)n2−nn^2-nn2−n flops,同理可得backward substitution的计算成本为n2+nn^2+nn2+n flops,所以用LU分解求解线性方程组的总计算成本为23n3+2n2\frac{2}{3}n^3+2n^232n3+2n2 flops。 解线性方程组(QR分解) QR分解(QR decomposition)也常用于求解线性方程组。QR分解将A∈Rn×n\mathbf{A} \in \mathbb{R}^{n \times n}A∈Rn×n分解为正交矩阵Q∈Rn×n\mathbf{Q} \in \mathbb{R}^{n \times n}Q∈Rn×n和上三角矩阵R∈Rn×n\mathbf{R} \in \mathbb{R}^{n \times n}R∈Rn×n: A=QR\begin{equation} \mathbf{A} = \mathbf{Q} \mathbf{R} \end{equation} A=QR 对于正交矩阵Q\mathbf{Q}Q,它的逆矩阵Q−1\mathbf{Q}^{-1}Q−1实际就是QT\mathbf{Q}^TQT本身,所以对于线性方程组可以简化为: Ax=bQRx=bRx=QTb\begin{equation} \begin{split} \mathbf{A} \mathbf{x} &= \mathbf{b}\\ \mathbf{Q}\mathbf{R}\mathbf{x} &= \mathbf{b}\\ \mathbf{R}\mathbf{x} &= \mathbf{Q}^T\mathbf{b}\\ \end{split} \end{equation} AxQRxRx=b=b=QTb 因为R\mathbf{R}R是上三角矩阵,对于x\mathbf{x}x的求解可以采用LU分解中介绍的backward substitution方式求解,从而避免了求逆的操作。 QR分解可以采用Gram-Schmidt正交化过程计算得到,这一过程与正交的几何表示密切相关,因此易于理解,不过这种正交化过程容易数值不稳定,因此很少直接在实践中使用。实践中更常用的是使用Givens rotations方法来实现QR分解。 Givens rotations的步骤与LU分解相似,对于矩阵A\mathbf{A}A,采用一系列旋转矩阵Gij\mathbf{G}_i^jGij将A\mathbf{A}A逐步转化为上三角矩阵: [×××××××××]⏞A→[××××××0××]⏞G31A→[×××0××0××]⏞G21G31A→[×××0××00×]⏞G32G21G31A\begin{equation}\nonumber \overbrace{\begin{bmatrix} \times & \times & \times\\ \times & \times & \times\\ \times & \times & \times\\ \end{bmatrix}}^{\mathbf{A}} \rightarrow \overbrace{\begin{bmatrix} \times & \times & \times\\ \times & \times & \times\\ 0 & \times & \times\\ \end{bmatrix}}^{\mathbf{G}_3^1\mathbf{A}} \rightarrow \overbrace{\begin{bmatrix} \times & \times & \times\\ 0 & \times & \times\\ 0 & \times & \times\\ \end{bmatrix}}^{\mathbf{G}_2^1\mathbf{G}_3^1\mathbf{A}} \rightarrow \overbrace{\begin{bmatrix} \times & \times & \times\\ 0 & \times & \times\\ 0 & 0 & \times\\ \end{bmatrix}}^{\mathbf{G}_3^2\mathbf{G}_2^1\mathbf{G}_3^1\mathbf{A}} \end{equation} ×××××××××A→××0××××××G31A→×00××××××G21G31A→×00××0×××G32G21G31A 对于需要置0的第aija_{ij}aij个元素(i>j)(i \gt j)(i>j),Gij\mathbf{G}_i^jGij构造为: Gij=[1⋯0⋯0⋯0⋮⋱⋮⋮⋮0⋯cjj⋯−sji⋯0⋮⋮⋱⋮⋮0⋯sij⋯cii⋯0⋮⋮⋮⋱⋮0⋯0⋯0⋯1]\begin{equation} \mathbf{G}_i^j = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & & \vdots & & \vdots\\ 0 & \cdots & c_{jj} & \cdots & -s_{ji} & \cdots & 0\\ \vdots & & \vdots & \ddots & \vdots & & \vdots\\ 0 & \cdots & s_{ij} & \cdots & c_{ii} & \cdots & 0\\ \vdots & & \vdots & & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1\\ \end{bmatrix} \end{equation} Gij=1⋮0⋮0⋮0⋯⋱⋯⋯⋯0⋮cjj⋮sij⋮0⋯⋯⋱⋯⋯0⋮−sji⋮cii⋮0⋯⋯⋯⋱⋯0⋮0⋮0⋮1 其中cjj=cii=cos(θ)c_{jj} = c_{ii} = \cos(\theta)cjj=cii=cos(θ),sij=sji=sin(θ)s_{ij} = s_{ji} = 
\sin(\theta)sij=sji=sin(θ)。显而易见这是一个对ijijij平面上的向量进行旋转的矩阵,对于任意向量,若想将其在第iii个轴上的分量置0,则需要将其按θ\thetaθ角旋转至对应的坐标轴上,旋转变化不会改变除第iii、jjj行外的其他元素,因此通过反复运用旋转变换,可以将A\mathbf{A}A的下三角区域置0,有: Gnn−1⋯Gn−11Gn1A=R\begin{equation} \mathbf{G}_n^{n-1} \cdots \mathbf{G}_{n-1}^1 \mathbf{G}_n^1 \mathbf{A} = \mathbf{R} \end{equation} Gnn−1⋯Gn−11Gn1A=R 又因为旋转矩阵本身就是正交矩阵,所以其逆就是矩阵转置本身,所以有: A=(Gnn−1⋯Gn−11Gn1)TR=QR\begin{equation} \begin{split} \mathbf{A} &= \left(\mathbf{G}_n^{n-1} \cdots \mathbf{G}_{n-1}^1 \mathbf{G}_n^1\right)^T \mathbf{R}\\ &= \mathbf{Q}\mathbf{R} \end{split} \end{equation} A=(Gnn−1⋯Gn−11Gn1)TR=QR Givens rotations实现的QR分解的复杂度大约是73n3\frac{7}{3}n^337n3 flops(我也不确定),优势在于可以较为容易的实现并行化操作。 复杂度分析 从上面的分析中可以看出,LU分解大概需要23n3\frac{2}{3}n^332n3 flops,逆矩阵大概需要2n32n^32n3 flops,是LU分解的3倍,因此线性方程组求解采用逆矩阵方法会更慢一些。尽管逆矩阵求解存在更快的算法,但是通常来说LU分解还是更有效率,所以一般而言还是应当直接求解线性方程组而不是计算逆矩阵。 此外,逆矩阵计算往往需要更大的内存。如果A\mathbf{A}A是稀疏矩阵(大多数元素是0),则可以使用更加有效率的稀疏结构存储。但是A−1\mathbf{A}^{-1}A−1通常是稠密的,因此存储逆矩阵需要更多的内存。 准确性分析 逆矩阵的另一个缺点在于求解的误差更大,特别在A\mathbf{A}A是病态矩阵的情况下尤为明显。 数值分析领域常用前向误差(forward error)和反向误差(backward error)来分析误差。比如对于函数y=f(x)y=f(x)y=f(x),用y^\hat{y}y^来近似输出yyy,那么前向误差被定义为∣y^−y∣|\hat{y}-y|∣y^−y∣(这似乎就是我们常说的误差),而反向误差则衡量问题的敏感性,即为了产生近似解,输入数据同真实数据的偏移情况。y^\hat{y}y^的反向误差是满足y^=f(x+Δx)\hat{y}=f(x+\Delta x)y^=f(x+Δx)的最小Δx\Delta xΔx。 前向误差和反向误差联合起来可以分析一个问题是良态(well-conditioned)还是病态(ill-conditioned)的。对于良态问题,输入较小的改变会导致输出产生较小的改变;而病态问题,输入较小的改变会导致输出较大的改变。条件数(condition number)就是用来判断良态和病态的工具,条件数越大,输入很小的差异就会产生较大的输出误差,更倾向于病态问题。矩阵的条件数κ(A)\kappa(\mathbf{A})κ(A)定义为: κ(A)=∥A∥∥A−1∥\begin{equation} \kappa(\mathbf{A}) = \|\mathbf{A}\| \|\mathbf{A}^{-1}\| \end{equation} κ(A)=∥A∥∥A−1∥ 如果采用矩阵的2范数,上式可以进一步简化为最大特征值λ1\lambda_1λ1与最小特征值λn\lambda_nλn之比: κ(A)=λ1λn\begin{equation} \kappa(\mathbf{A}) = \frac{\lambda_1}{\lambda_n} \end{equation} κ(A)=λnλ1 stackexchange和这篇blog详细分析了前向误差∥x−x^∥\|\mathbf{x}-\hat{\mathbf{x}}\|∥x−x^∥和反向误差∥b−Ax^∥\|\mathbf{b}-\mathbf{A}\hat{\mathbf{x}}\|∥b−Ax^∥(注意这里x\mathbf{x}x是我们求解问题的输出),有如下结论: 对于良态矩阵A\mathbf{A}A,LU分解的前向误差和逆矩阵的前向误差是很接近的 对于病态矩阵A\mathbf{A}A,逆矩阵的反向误差会远大于LU分解的反向误差 即使对于良态矩阵A\mathbf{A}A,逆矩阵的反向误差也比LU分解的反向误差更大 此外,前向误差和反向误差有如下不等式关系: ∥x−x^∥≤κ(A)∥b−Ax^∥\begin{equation} \|\mathbf{x}-\hat{\mathbf{x}}\| \le \kappa(\mathbf{A}) \|\mathbf{b}-\mathbf{A}\hat{\mathbf{x}}\| \end{equation} ∥x−x^∥≤κ(A)∥b−Ax^∥ 综合来看的话,利用LU分解、QR分解等方法直接求解线性方程组比逆矩阵求解要更加准确。 源码分析 TODO: compare python and matlab code
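这里先放一个简单的numpy/scipy数值对比草图（仅为示意，用Hilbert矩阵作为病态矩阵的例子，并非正文TODO计划中的完整源码分析）：分别用LU分解（scipy的lu_factor/lu_solve）和显式逆矩阵求解同一个方程组，比较反向误差∥b−Ax̂∥。

```python
import numpy as np
from scipy.linalg import hilbert, lu_factor, lu_solve

n = 12
A = hilbert(n)                       # 病态矩阵，条件数非常大
x_true = np.ones(n)
b = A @ x_true

print("cond(A) =", np.linalg.cond(A))

# 方法1: LU分解 + 前向/后向替换
lu, piv = lu_factor(A)
x_lu = lu_solve((lu, piv), b)

# 方法2: 显式计算逆矩阵再相乘
x_inv = np.linalg.inv(A) @ b

for name, x in [("LU", x_lu), ("inv", x_inv)]:
    print(name, "backward error:", np.linalg.norm(b - A @ x))
```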
矩阵微分和矩阵求导几乎是求解优化问题不可避免的必学内容,这一方面的内容老实说我很难完全掌握。这里记录一下一些常用的矩阵微分求导的规范和技巧。 符号约定和布局规范 首先微分(differential)是在自变量微小变化下造成的因变量的微小变化,而导数(derivative)则是这种变化的速率。联系导数和微分的方程式叫做微分方程(differential equation)。比如函数y=f(x)y=\mathrm{f}(x)y=f(x),xxx的微小变化用符号dxdxdx表示,导致的yyy的微小变化用dydydy表示,xxx的变化引起的yyy的变化的速率用∂y∂x\frac{\partial y}{\partial x}∂x∂y表示,函数f(x)\mathrm{f}(x)f(x)的微分方程就是: dy=∂y∂xdx\begin{equation} dy=\frac{\partial y}{\partial x} dx \end{equation} dy=∂x∂ydx 矩阵微分/导数同函数微分/导数基本一致,只不过现在输入输出都是矩阵(向量也是矩阵的一种)的表现形式,比如y=f(x)\mathbf{y} = \mathrm{f}(\mathbf{x})y=f(x)。这里用加粗大写字母表示矩阵,例如A\mathbf{A}A;用加粗小写字母表示向量,例如a\mathbf{a}a;用不加粗小写字母表示标量,例如aaa。 矩阵微分求导的难点在于没有固定的布局规范,导致有些文章和教材看起来互相冲突。比如对于∂y∂x\frac{\partial \mathbf{y}}{\partial \mathbf{x}}∂x∂y,其中y∈Rm×1\mathbf{y} \in \mathbb{R}^{m \times 1}y∈Rm×1,x∈Rn×1\mathbf{x} \in \mathbb{R}^{n \times 1}x∈Rn×1,向量对向量的导数用矩阵形式可以有两种表示布局方案: 分子布局(numerator layout),这种布局要求y\mathbf{y}y是按列排的,x\mathbf{x}x是按行排列的(即xT\mathbf{x}^TxT),最终输出∂y∂x∈Rm×n\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n}∂x∂y∈Rm×n,我觉得可以简单理解为与分子上矩阵元素相关的输出排列规则不变(跟原来一致),而与分母上矩阵元素相关输出排列规则应当是原来元素的转置。 分母布局(denominator layout),这种布局要求y\mathbf{y}y是按行排的(即yT\mathbf{y}^TyT),x\mathbf{x}x是按列排列的,最终输出∂y∂x∈Rn×m\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{n \times m}∂x∂y∈Rn×m,正好跟分子布局相反。 为了方便推导和记忆,我选择采用分子布局,那么矩阵间导数的形式应当是这样的: yyy y∈Rm×1\mathbf{y} \in \mathbb{R}^{m \times 1}y∈Rm×1 Y∈Rm×n\mathbf{Y} \in \mathbb{R}^{m \times n}Y∈Rm×n xxx ∂y∂x\frac{\partial y}{\partial x}∂x∂y ∂y∂x∈Rm×1\frac{\partial \mathbf{y}}{\partial x} \in \mathbb{R}^{m \times 1}∂x∂y∈Rm×1 ∂Y∂x∈Rm×n\frac{\partial \mathbf{Y}}{\partial x} \in \mathbb{R}^{m \times n}∂x∂Y∈Rm×n x∈Rp×1\mathbf{x} \in \mathbb{R}^{p \times 1}x∈Rp×1 ∂y∂x∈R1×p\frac{\partial y}{\partial \mathbf{x}} \in \mathbb{R}^{1 \times p}∂x∂y∈R1×p ∂y∂x∈Rm×p\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times p}∂x∂y∈Rm×p X∈Rp×q\mathbf{X} \in \mathbb{R}^{p \times q}X∈Rp×q ∂y∂X∈Rq×p\frac{\partial y}{\partial \mathbf{X}} \in \mathbb{R}^{q \times p}∂X∂y∈Rq×p 微分规则 矩阵求导常需要用到sum rule、product rule和chain rule,其中链式法则不适用于matrix-by-scalar或scalar-by-matrix的形式,所以直接复合函数求导蛮麻烦的。wiki上说可以先用微分的规则求微分,然后再转换成导数的形式。 sum rules: h(x)=f(x)+g(x)dh(x)=df(x)+dg(x)\begin{equation} \begin{split} h(x) &= f(x) + g(x)\\ dh(x) &= df(x) + dg(x)\\ \end{split} \end{equation} h(x)dh(x)=f(x)+g(x)=df(x)+dg(x) product rules: h(x)=f(x)g(x)dh(x)=df(x)g(x)+f(x)dg(x)\begin{equation} \begin{split} h(x) &= f(x)g(x)\\ dh(x) &= df(x)g(x) + f(x)dg(x)\\ \end{split} \end{equation} h(x)dh(x)=f(x)g(x)=df(x)g(x)+f(x)dg(x) chain rules: h(x)=f(g(x))dh(x)=f(g(x+dx))−f(g(x))=f(g(x)+dg(x))−f(g(x))=df(y)∣y=g(x),dy=dg(x)\begin{equation} \begin{split} h(x) &= f(g(x))\\ dh(x) &= f(g(x+dx)) - f(g(x))\\ &= f(g(x)+dg(x)) - f(g(x))\\ &= df(y)|_{y=g(x),dy=dg(x)}\\ \end{split} \end{equation} h(x)dh(x)=f(g(x))=f(g(x+dx))−f(g(x))=f(g(x)+dg(x))−f(g(x))=df(y)∣y=g(x),dy=dg(x) trace tricks: a=tr(a)tr(A)=tr(AT)tr(A+B)=tr(A)+tr(B)tr(ABC)=tr(BCA)=tr(CAB)\begin{equation} \begin{split} a &= \mathrm{tr}(a)\\ \mathrm{tr}(\mathbf{A}) &= \mathrm{tr}(\mathbf{A}^T)\\ \mathrm{tr}(\mathbf{A}+\mathbf{B}) &= \mathrm{tr}(\mathbf{A}) + \mathrm{tr}(\mathbf{B})\\ \mathrm{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) &= \mathrm{tr}(\mathbf{B}\mathbf{C}\mathbf{A})\\ &= \mathrm{tr}(\mathbf{C}\mathbf{A}\mathbf{B})\\ \end{split} \end{equation} atr(A)tr(A+B)tr(ABC)=tr(a)=tr(AT)=tr(A)+tr(B)=tr(BCA)=tr(CAB) kronecker product rules: 
A⊗(B+C)=A⊗B+A⊗C(kA)⊗B=A⊗(kB)=k(A⊗B)(A⊗B)⊗C=A⊗(B⊗C)(A⊗B)(C⊗D)=(AC)⊗(BD)(A⊗B)∘(C⊗D)=(A∘C)⊗(B∘D)(A⊗B)−1=A−1⊗B−1(A⊗B)T=AT⊗BTtr(A⊗B)=tr(A)tr(B)\begin{equation} \begin{split} \mathbf{A}\otimes(\mathbf{B}+\mathbf{C}) &= \mathbf{A}\otimes\mathbf{B} + \mathbf{A}\otimes\mathbf{C}\\ (k\mathbf{A})\otimes\mathbf{B} &= \mathbf{A}\otimes(k\mathbf{B})=k(\mathbf{A}\otimes\mathbf{B})\\ (\mathbf{A}\otimes\mathbf{B})\otimes\mathbf{C} &= \mathbf{A}\otimes(\mathbf{B}\otimes\mathbf{C})\\ (\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D}) &= (\mathbf{A}\mathbf{C})\otimes(\mathbf{B}\mathbf{D})\\ (\mathbf{A}\otimes\mathbf{B})\circ(\mathbf{C}\otimes\mathbf{D}) &= (\mathbf{A}\circ\mathbf{C})\otimes(\mathbf{B}\circ\mathbf{D})\\ (\mathbf{A}\otimes\mathbf{B})^{-1} &= \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}\\ (\mathbf{A}\otimes\mathbf{B})^{T} &= \mathbf{A}^{T} \otimes \mathbf{B}^{T}\\ \mathrm{tr}(\mathbf{A}\otimes\mathbf{B}) &= \mathrm{tr}(\mathbf{A})\mathrm{tr}(\mathbf{B})\\ \end{split} \end{equation} A⊗(B+C)(kA)⊗B(A⊗B)⊗C(A⊗B)(C⊗D)(A⊗B)∘(C⊗D)(A⊗B)−1(A⊗B)Ttr(A⊗B)=A⊗B+A⊗C=A⊗(kB)=k(A⊗B)=A⊗(B⊗C)=(AC)⊗(BD)=(A∘C)⊗(B∘D)=A−1⊗B−1=AT⊗BT=tr(A)tr(B) hadamard product rules: A∘B=B∘AA∘(B+C)=A∘B+A∘C(kA)∘B=A∘(kB)=k(A∘B)(A∘B)∘C=A∘(B∘C)\begin{equation} \begin{split} \mathbf{A}\circ\mathbf{B} &= \mathbf{B}\circ\mathbf{A}\\ \mathbf{A}\circ(\mathbf{B}+\mathbf{C}) &= \mathbf{A}\circ\mathbf{B} + \mathbf{A}\circ\mathbf{C}\\ (k\mathbf{A})\circ\mathbf{B} &= \mathbf{A}\circ(k\mathbf{B})=k(\mathbf{A}\circ\mathbf{B})\\ (\mathbf{A}\circ\mathbf{B})\circ\mathbf{C} &= \mathbf{A}\circ(\mathbf{B}\circ\mathbf{C})\\ \end{split} \end{equation} A∘BA∘(B+C)(kA)∘B(A∘B)∘C=B∘A=A∘B+A∘C=A∘(kB)=k(A∘B)=A∘(B∘C) 总结矩阵常用的微分规则如下: 说明 表达式 微分结果 A\mathbf{A}A不是X\mathbf{X}X的函数 d(A)d\left(\mathbf{A}\right)d(A) 0\mathbf{0}0 aaa不是X\mathbf{X}X的函数 d(aX)d(a\mathbf{X})d(aX) adXad\mathbf{X}adX d(X⊗Y)d(\mathbf{X} \otimes \mathbf{Y})d(X⊗Y) (dX)⊗Y+X⊗(dY)(d\mathbf{X}) \otimes \mathbf{Y} + \mathbf{X} \otimes (d\mathbf{Y})(dX)⊗Y+X⊗(dY) d(X∘Y)d(\mathbf{X} \circ \mathbf{Y})d(X∘Y) (dX)∘Y+X∘(dY)(d\mathbf{X}) \circ \mathbf{Y} + \mathbf{X} \circ (d\mathbf{Y})(dX)∘Y+X∘(dY) d(XT)d(\mathbf{X}^T)d(XT) (dX)T(d\mathbf{X})^T(dX)T 共轭转置 d(XH)d(\mathbf{X}^H)d(XH) (dX)H(d\mathbf{X})^H(dX)H d(X−1)d(\mathbf{X}^{-1})d(X−1) −X−1(dX)X−1-\mathbf{X}^{-1}(d\mathbf{X})\mathbf{X}^{-1}−X−1(dX)X−1 nnn是正整数 d(Xn)d(\mathbf{X}^n)d(Xn) ∑i=0n−1Xi(dX)Xn−1−i\sum_{i=0}^{n-1} \mathbf{X}^i(d\mathbf{X})\mathbf{X}^{n-1-i}∑i=0n−1Xi(dX)Xn−1−i d(eX)d(e^{\mathbf{X}})d(eX) ∫01eaX(dX)e(1−a)Xda\int_0^1 e^{a\mathbf{X}}(d\mathbf{X}) e^{(1-a)\mathbf{X}}da∫01eaX(dX)e(1−a)Xda d(log(X))d(\mathrm{log}(\mathbf{X}))d(log(X)) ∫0∞(X+zI)−1(dX)(X+zI)−1dz\int_0^{\infty} (\mathbf{X}+z\mathbf{I})^{-1}(d\mathbf{X})(\mathbf{X}+z\mathbf{I})^{-1}dz∫0∞(X+zI)−1(dX)(X+zI)−1dz d(tr(X))d(\mathrm{tr}(\mathbf{X}))d(tr(X)) tr(dX)\mathrm{tr}(d\mathbf{X})tr(dX) d(det(X))d(\det(\mathbf{X}))d(det(X)) det(X)tr(X−1dX)\det(\mathbf{X})\mathrm{tr}(\mathbf{X}^{-1}d\mathbf{X})det(X)tr(X−1dX) d(log(det(X)))d(\log(\det(\mathbf{X})))d(log(det(X))) tr(X−1dX)\mathrm{tr}(\mathbf{X}^{-1}d\mathbf{X})tr(X−1dX) 微分-导数转换 在获得表达式的微分形式后,可以按如下规则进行导数形式的转化: 微分形式 导数形式 dy=adxdy=adxdy=adx ∂y∂x=a\frac{\partial y}{\partial x}=a∂x∂y=a dy=aTdxdy=\mathbf{a}^Td\mathbf{x}dy=aTdx ∂y∂x=aT\frac{\partial y}{\partial \mathbf{x}}=\mathbf{a}^T∂x∂y=aT dy=tr(AdX)dy=\mathrm{tr}(\mathbf{A}d\mathbf{X})dy=tr(AdX) ∂y∂X=A\frac{\partial y}{\partial \mathbf{X}}=\mathbf{A}∂X∂y=A dy=adxd\mathbf{y}=\mathbf{a}dxdy=adx ∂y∂x=a\frac{\partial \mathbf{y}}{\partial x}=\mathbf{a}∂x∂y=a dy=Adxd\mathbf{y}=\mathbf{A} 
d\mathbf{x}dy=Adx ∂y∂x=A\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{A}∂x∂y=A dY=Adxd\mathbf{Y}=\mathbf{A}dxdY=Adx ∂Y∂x=A\frac{\partial \mathbf{Y}}{\partial x}=\mathbf{A}∂x∂Y=A
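下面用一个简单的numpy数值草图（随机矩阵和扰动大小都是随意取的）验证表中的一条微分规则 d(X^{-1}) = -X^{-1}(dX)X^{-1}：给X一个很小的扰动dX，比较 (X+dX)^{-1}−X^{-1} 与 −X^{-1}(dX)X^{-1}。

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n)) + n * np.eye(n)   # 加上 n*I 保证可逆且良态
dX = 1e-6 * rng.standard_normal((n, n))           # 微小扰动

Xinv = np.linalg.inv(X)
lhs = np.linalg.inv(X + dX) - Xinv                 # 逆矩阵的实际变化
rhs = -Xinv @ dX @ Xinv                            # 微分规则给出的一阶近似

print(np.max(np.abs(lhs - rhs)) / np.max(np.abs(lhs)))  # 相对误差 ~ O(||dX||)，非常小
```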
主成分分析(Principle Component Analysis,PCA)是常用的一种矩阵分解算法,PCA通过旋转原始空间来使得数据在各个正交轴上的投影最大,通过选择前几个正交轴可以实现数据降维的目的。 PCA数学原理 优化问题 PCA的优化问题如下: arg maxWtrace(WTXXTW)s.t.WTW=I\begin{equation} \begin{split} \argmax_{\mathbf{W}}\quad &\mathrm{trace}\left(\mathbf{W}^T\mathbf{X}\mathbf{X}^T\mathbf{W}\right)\\ \textrm{s.t.}\quad &\mathbf{W}^T\mathbf{W} = \mathbf{I} \end{split} \end{equation} Wargmaxs.t.trace(WTXXTW)WTW=I 其中X∈RM×N\mathbf{X} \in \mathbb{R}^{M \times N}X∈RM×N是数据,W∈RM×M\mathbf{W} \in \mathbb{R}^{M \times M}W∈RM×M是投影矩阵,NNN是样本点个数,MMM是特征个数。PCA要求数据X\mathbf{X}X做零均值处理,优化问题的解可以转化为如下特征值分解问题的解: (XXT)W=WΛ\begin{equation} \left(\mathbf{X}\mathbf{X}^T\right)\mathbf{W} = \mathbf{W}\mathbf{\Lambda} \end{equation} (XXT)W=WΛ 这里假设W\mathbf{W}W的列向量按相应特征值的大小从大到小排列,保留W\mathbf{W}W前K列即前K个成分的列向量W^\hat{\mathbf{W}}W^,降维后的数据特征为: X^=W^TX\begin{equation} \hat{\mathbf{X}} = \hat{\mathbf{W}}^T\mathbf{X} \end{equation} X^=W^TX 其中X^∈RK×N\hat{\mathbf{X}} \in \mathbb{R}^{K \times N}X^∈RK×N。 实现分析 svd替代eig sklearn中的PCA实现并未使用eig而是使用svd,主要原因是svd比eig具有更好的数值稳定性(当然代价是其计算时间要比eig更长)。使用svd代替eig也是很多学者如Andrew Ng建议的策略,在StackExchange上也有关于svd和eig的相关讨论讨论1、讨论2。sklearn中直接对数据矩阵X\mathbf{X}X而不是协方差矩阵XXT\mathbf{X}\mathbf{X}^TXXT做svd,其等价关系如下: X=UΣVTXXT=UΣ2UTW=UΛ=Σ2\begin{equation} \begin{split} \mathbf{X} &= \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T\\ \mathbf{X}\mathbf{X}^T &= \mathbf{U} \mathbf{\Sigma}^2 \mathbf{U}^T\\ \mathbf{W} &= \mathbf{U}\\ \mathbf{\Lambda} &= \mathbf{\Sigma}^2 \end{split} \end{equation} XXXTWΛ=UΣVT=UΣ2UT=U=Σ2 sign ambiguity问题 sklearn的PCA代码中还考虑了svd的sign ambiguity问题,即每个奇异向量的符号在求解过程中是不确定的(例如,将uk\mathbf{u}_kuk和vk\mathbf{v}_kvk同时乘以-1也满足求解条件)。svd算法(包括eig)中的奇异向量符号只是确保数值稳定性的副产品,类似随机分配符号,并无实际意义。 sklearn使用svd_flip(u, v, u_based_descision=True)函数来确保输出确定性的奇异向量符号,例如,如果u_based_decision=True,则要求U\mathbf{U}U的每一列奇异向量中绝对值最大的元素的符号始终为正,V\mathbf{V}V也要相对的做出调整。 鉴于MATLAB是算法开发的标准之一,我很好奇MATLAB是如何处理SVD的sign ambiguity问题的。MATLAB的svd函数的官方文档中有这样一句话: Different machines and releases of MATLAB® can produce different singular vectors that are still numerically accurate. Corresponding columns in U and V can flip their signs, since this does not affect the value of the expression A = USV’. MATLAB的eig函数的官方文档中亦提到: For real eigenvectors, the sign of the eigenvectors can change. 
可以看出MATLAB也未保证符号的确定性。同样在MATALB的社区里也有人问了这个问题,并引导我看了这篇Resolving the Sign Ambiguity in the Singular Value Decompostion的文献。 文献中指出,sklearn的svd_flip方法是一种临时方案(ad hoc),并未从数据分析或者解释的角度来解决sign ambiguity问题。解决sign ambiguity的核心是如何为奇异向量选择一个“有意义”的符号。什么叫“有意义”?比方说我们要研究4种品牌汽车的每公里耗油量,做了4次抽样,构成数据矩阵: X=[4223515111169101411691014]\begin{equation} \begin{split} \mathbf{X} = \begin{bmatrix} 4 &22&3 &5 \\ 1 &5 &1 &1 \\ 11&69&10&14 \\ 11&69&10&14 \\ \end{bmatrix} \end{split} \end{equation} X=4111112256969311010511414 其中每一列是一种品牌汽车的耗油量,每一行为抽样情况,svd分解有X=UΣVT\mathbf{X}=\mathbf{U} \mathbf{\Sigma} \mathbf{V}^TX=UΣVT。我们来看一下numpy.linalg.svd计算得到的V\mathbf{V}V的第一个奇异向量: v0=[−0.15−0.96−0.14−0.20]\begin{equation} \begin{split} \mathbf{v}_0 = \begin{bmatrix} -0.15 \\ -0.96 \\ -0.14 \\ -0.20 \\ \end{bmatrix} \end{split} \end{equation} v0=−0.15−0.96−0.14−0.20 v0\mathbf{v}_0v0实际指明了耗油量空间的一个向量,然而我们知道耗油量没有负值(如果有的话人类就拥有无限能源了),一个完全指向负的方向意义不大,如果改变v0\mathbf{v}_0v0的符号,变为: v0=[0.150.960.140.20]\begin{equation} \begin{split} \mathbf{v}_0 = \begin{bmatrix} 0.15 \\ 0.96 \\ 0.14 \\ 0.20 \\ \end{bmatrix} \end{split} \end{equation} v0=0.150.960.140.20 结果就合理多了。 文献中指出,奇异向量的符号应当与大多数数据样本向量的符号相同,从几何上来看,奇异向量应当指向大多数向量指向的方向。下图是我从文献中截取的,深色蓝线是正确的奇异向量方向,浅色蓝线是数据向量。 翻译成数学语言(我按照自己的理解和习惯转化成优化问题,与文献的原始表述并不一致,不一定对,有兴趣的读者可以看原始文献😃),纠正符号算法的核心是对于每一对奇异向量uk\mathbf{u}_kuk和vk\mathbf{v}_kvk,寻找符号sks_ksk优化以下目标函数 arg maxsk∈{1,−1}sk(∑j=1NukTX⋅,j+∑i=1MXi,⋅vk)\begin{equation} \argmax_{s_k \in \{1,-1\}}\quad s_k \left(\sum_{j=1}^N \mathbf{u}_k^T\mathbf{X}_{\cdot,j} + \sum_{i=1}^M\mathbf{X}_{i,\cdot}\mathbf{v}_k\right) \end{equation} sk∈{1,−1}argmaxsk(j=1∑NukTX⋅,j+i=1∑MXi,⋅vk) 根据两项求和项的符号即可决定sks_ksk的符号。对于可能存在的左右奇异向量符号冲突的情况(例如单独看左奇异向量有意义的符号是-1,单独看右奇异向量有意义的符号为1),该算法选择以求和绝对值最大的一项的符号为主(反应在上式就是两项求和)。文献中指出,该算法仅在上述求和项不为0的情况下有效(即在0附近奇异向量的符号可以为任意情况)。 Python版具体算法实现如下,Matlab可以使用这个版本: import warningsimport numpy as npdef sign_flip(u, s, vh=None): """Flip signs of SVD or EIG. """ left_proj = 0 if vh is not None: left_proj = np.sum(s[:, np.newaxis]*vh, axis=-1) right_proj = np.sum(u*s, axis=0) total_proj = left_proj + right_proj signs = np.sign(total_proj) random_idx = (signs==0) if np.any(random_idx): signs[random_idx] = 1 warnings.warn("The magnitude is close to zero, the sign will become arbitrary.") u = u*signs if vh is not None: vh = signs[:, np.newaxis]*vh return u, s, vh
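下面是上文 sign_flip 函数的一个简单调用示例（假设已按原样定义好该函数），用的是前面耗油量的例子；翻转后第一个右奇异向量的各分量应当为正，与数据向量的指向一致。

```python
import numpy as np

# 前文的耗油量数据矩阵
X = np.array([[ 4, 22,  3,  5],
              [ 1,  5,  1,  1],
              [11, 69, 10, 14],
              [11, 69, 10, 14]], dtype=float)

u, s, vh = np.linalg.svd(X, full_matrices=False)
print(vh[0])                     # numpy给出的第一个右奇异向量，符号可能整体为负

u, s, vh = sign_flip(u, s, vh)   # 调用上文定义的sign_flip纠正符号
print(vh[0])                     # 此时各分量为正
```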
从2019年到2022年,manjaro发行版渡过了我的整个博士生涯。最近毕业重新装了系统,依然选择了最新的manjaro KDE Plasma 21.2.4(本来装了arch,大小问题不断被劝退了😜)。基本上这台linux主机要跟着我进入人生下一阶段,作为主力台式机也不打算再折腾了。安装过程中有一些新的学习体会(坑),在这里更新记录一下,希望能帮到有需要的朋友~ 三年manjaro使用感悟 总体而言,作为一个linux系统小白,我对manjaro还是相当满意的,基本上,manjaro能满足我日常工作、娱乐的需要。manjaro系统安装显卡驱动和切换内核确实简单,只需在系统设置里改变即可,此外,arch文档翔实、aur软件丰富,大部分问题和需求都能找到对应的解决方案。linux下的开发、科研等涉及编程的工作确实要比windows下爽很多,一行命令搞定一堆安装包,然后用就完了,计算速度上似乎还比window下快一点(也许是心理作用?😁)。游戏方面steam上兼容linux的游戏还是挺多的,我常玩的饥荒、博德之门3等游戏都还能运行,偶尔有问题的话去protonDB上查一查还是能找到解决方案的,Valve不愧是要搞Steam Deck,估计这方面的兼容性支持会更好。 manjaro的缺点也是大多数linux系统的通病。为了满足正常使用,用户要做一些文本方面的配置,因此至少需要知道一些命令行的基础知识,这一点上远不如windows点点点直观,而且经常会出现一些奇奇怪怪的小问题,很影响使用体验。此外,linux系统的驱动相对windows依然是个大问题,驱动(尤其是显卡)出问题小白用户就直接GAME OVER了。最后,尽管9成需求我都能在manjaro下解决,仍有1成的需求由于各种原因必须要使用windows,我通常都是挂个windows虚拟机以备不时之需。 综上所述,我感觉manjaro系统适合满足以下条件的小白群体使用: 至少有1台独立的windows笔记本 不惧怕查阅资料,能科学上网 有一段完整的折腾时间 没有大型3A游戏或windows专用软件需求 安装manjaro的硬件不是最新的 不满足第一条的朋友还是老老实实用windows,在虚拟机里尝尝鲜得了😜,至于mac用户,俺们不跟土豪做朋友😢 系统选择与安装 manjaro提供了XFCE、KDE和GNOME三个桌面的环境的安装iso,我个人偏向于KDE,用着舒服,看着也不赖,倒是没必要再去捣鼓桌面美化啥的。这里建议下载Minimal LTS(长期支持版内核)安装镜像,以最大限度的避免硬件驱动等各种乱七八糟的问题。我选用的5.4版内核会一直支持到2025年12月,虽然听说kernel版本越高,硬件支持越好,但我实际装的时候最新版本各种诡异驱动问题(咱也不懂,就很玄学),所以还是从LTS出发,先达成一个基本可用的环境,再慢慢升级比较靠谱。 最好进BIOS把内存频率调低,比如2400MHz或2666MHz,我的内存一开始是3000MHz,很容易卡在进图形界面的步骤,查资料好像是啥显卡驱动没有加载上,需要Early Loading,但是我没有成功过,后来发现把内存频率调低就可以了,就很玄学 下载ISO文件后,用空余的U盘制作启动盘,插上U盘,进BIOS里关掉安全启动(Secure Boot)选项,把U盘的启动顺序调到前面,保存退出后就能进入manjaro的启动界面环境。 这里设置一下时区为Asia/Shanghai,选择以开源驱动boot,其它的选项都不用改,反正后期都能调整,核心目的是进入live环境。 进去之后会弹出欢迎界面,选择中文语言,一路点击下一步直到分区步骤(这里有时候会卡一会,可能是在联网检测啥东西)。 分区界面根据实际硬件的不同会有各种选项,抹除磁盘是自动分区安装的意思,适合不太清楚什么是分区的朋友,一路点点点就行。如果硬盘上还装了windows,manjaro还会有双引导的安装选项,可以说挺简单智能的了,桌面上的Installation Guide会有这些选项的详细介绍。由于manjaro会将所有可用空间全部归到root分区,我想单独划个home分区出来,所以选择了手动方法。这里我分了一个500MB的efi分区,2GB的swap分区(感觉用不上),64GB的root分区,剩下的都划到home分区了。划好分区后点击下一步,设置一些用户名、密码啥的,就可以进入安装过程,安装结束后重启、拔U盘,一切顺利的话就进入manjaro系统了~ 基础设置 设置manjaro更新镜像源 由于众所周知的原因,不更改镜像源和设置科学上网,大部分的开发工具在国内基本没法用。所以进系统的第一步是更改manjaro的系统更新镜像源,选择所在地区的镜像。 sudo pacman-mirrors -c China -m rank && sudo pacman -Syyu 该命令选择China地区的镜像源,并对系统做一次更新,因此可能需要等待一会,更新完最好直接重启。 安装Nvidia显卡驱动 重启过后,可以选择安装显卡驱动了。在系统设置-硬件设定里选择闭源驱动。我的显卡是GTX1080,好几年前的老卡了,video-nvidia-470xx驱动比较靠谱,如果想用最新的驱动,选择video-nvidia驱动就可以。 右键安装,输入管理员密码,安装完毕后重新启动,如果一切顺利进入桌面就表明没问题啦!!! 
科学上网 国内软件安装的大部分问题都是因为众所周知的原因,并且优秀开发和参考资料多为英文,因此科学上网属于一切学术研究和开发工作的必要条件,将科学上网作为终身学习的课题,花时间研究是值得的。 我采用的是proxychains结合v2ray的方式,不再使用之前的shadowsocks: sudo pacman -S proxychains-ng v2ray proxychains的配置文件为/etc/proxychains.conf,用kate打开该文件,修改最后一行: socks5 127.0.0.1 1080 v2ray的配置文件在/etc/v2ray目录下,这一部分有很多的学习资料了,我写了一个脚本自动获取生成配置文件config1.json,启动部分我采用手动挡输入命令,以后有空再研究自动挡的方式: v2ray -c /etc/v2ray/config1.json 科学上网的基本设置就结束了,浏览器可以在网络设置中选择socks5代理,转发本地1080端口;想要代理命令行程序,可以采用proxychains+命令的方式,比如: proxychains -q wget www.google.com 安装yay 安装yay yay可以当作pacman使用,也是用来安装AUR里软件包的工具,尽管manjaro自带的软件包管理器Pamac可以开启aur选项,以图形化界面的形式安装软件,但是Pamac似乎有许多bug,所以还是使用yay这一更常用的命令行工具。 manjaro下yay的安装非常简单,甚至不需要自己去编译: sudo pacman -S yay proxychains+yay 至此yay已经可以正常使用了,不过AUR里的软件包经常需要下载github等外网代码、文件,由于众所周知的原因,速度会慢的跟龟爬一样,所以最好还是搭配proxychains等工具使用。默认的yay采用go编译,这一版本同proxychains等代理工具有冲突,解决方案是用gcc-go重新编译,但是目前的v11.1.2版本的yay编译过不去,我没有能力解决问题,只能选择v11.1.1的yay。 yay -S base-devel gcc-gomkdir build && cd build && git clone https://aur.archlinux.org/yay.gitcd yayproxychains -q wget https://github.com/Jguer/yay/archive/v11.1.1.tar.gz 然后用kate修改PKGBUILD里如下部分: pkgname=yaypkgver=11.1.1 # 修改版本为11.1.1...makedepends=('gcc-go>=1.16') # 修改为gcc-go>=1.16source=("${pkgname}-${pkgver}.tar.gz::https://github.com/Jguer/yay/archive/v${pkgver}.tar.gz")sha256sums=('31ed6d828574601e77b8df90c6e734a230ea60531b704934038d52fe213c0898') # 修改sha256的值... 由于yay会下载一些go的依赖,所以也要设置go的代理(众所周知😥),最后yay的目录下直接编译安装,此时的yay就可以跟proxychains完美配合啦,接下来我基本都使用yay安装manjaro官方和AUR的软件~ export GO111MODULE=onexport GOPROXY=https://goproxy.cnmakepkg -sic 忽略yay的更新 由于目前11.1.2版本的yay是manjaro默认的版本,更新系统时会自动替换老版本11.1.1,如果不想更新yay,可以在/etc/pacman.conf中忽略yay的更新,添加如下内容: IgnorePkg = yay 中文字体和中文输入法 开源中文字体 国内习惯了用windows自带的中文字体,比如楷体、宋体等,而这些在linux上因为版权问题发行版不会默认自带,需要我们自己“安装”使用(毕竟已经买了windows的笔记本了,用就完了哈哈哈)。当然有些字体是免费开源的: yay -S wqy-microhei wqy-microhei-lite wqy-zenhei noto-fonts-cjk adobe-source-han-sans-cn-fonts adobe-source-han-serif-cn-fonts 中文输入法 中文输入法采用fcitx5,输入以下命令安装: yay -S fcitx5 fcitx5-configtool fcitx5-chinese-addons fcitx5-qt fcitx5-gtk fcitx5-lua 安装完毕后用kate打开/etc/environment文件,在其中输入如下变量,然后注销再重新登陆,就可以使用中文输入法了: GTK_IM_MODULE=fcitxQT_IM_MODULE=fcitxXMODIFIERS=@im=fcitxINPUT_METHOD=fcitxSDL_IM_MODULE=fcitxGLFW_IM_MODULE=ibus 默认拼音和英文的切换快捷键是ctrl+shift,不喜欢的话可以在系统设置-区域设置-输入法里进行调整,更多的相关设置也可以参考arch的中文输入法。 win10字体安装 想要安装win10的字体(十分有必要),AUR提供了ttf-ms-win10的安装包,不过不提供字体文件,需要自己从已有的win10系统(拷贝所有C:\Windows\Fonts下的字体文件)或从win10镜像中抽取字体文件,这里介绍如何抽取字体,首先从AUR拷贝tff-ms-win10: mkdir -p build && cd buildgit clone https://aur.archlinux.org/ttf-ms-win10.git 然后挂载win10安装镜像,manjaro下只需要右键选择挂载ISO,Dolphin的左边即可出现ISO的访问文件路径,找出source文件夹下的install.esd或install.wim文件,把该文件拷贝到ttf-ms-win10文件夹下,执行如下命令解锁所有字体文件: wimextract install.esd 1 /Windows/{Fonts/"*".{ttf,ttc},System32/Licenses/neutral/"*"/"*"/license.rtf} --dest-dir . 
然后修改PKGBUILD如下,添加的字体文件表示仿宋、黑体和楷体: _ttf_ms_win10_zh_cn=(simsun.ttc simfang.ttf simhei.ttf simkai.ttf # 增加这行内容simsunb.ttf msyh.ttc msyhbd.ttc msyhl.ttc 最后在ttf-ms-win10目录下执行安装命令,注意如果报错,大概率是当前抽取的字体文件中没有该字体,可以按照错误提示从网上下载ttf文件加入其中,或者在PKGBUILD里删掉该字体,毕竟只有中文字体比较重要: makepkg -sic --skipchecksums 安装完成后可以在系统设置-外观-字体管理中检查字体安装是否正确。 Ryzen随机卡死问题 这个问题三年前就遇到了,当时系统会随机卡死无响应(切terminal什么都没用)。这个问题是Ryzen处理器的一个bug,不知道现在的Ryzen系列有没有解决这个问题(我是AMD Ryzen 5 1600,也是老处理器了),总之我重装系统后依然有这个问题。解决方案就是disable C6 state,最好重启后再执行如下命令: yay -S disable-c6-systemdsudo modprobe msr 编辑/etc/modules-load.d/modules.conf,添加msr这一行,以便在启动时加载msr模块: msr 最后,启动如下service: sudo systemctl enable disable-c6.servicesudo systemctl start disable-c6.service 过去三年里基本没有出现这种随机卡死的问题了,感恩大佬。 其他优化 SSD优化 sudo systemctl enable fstrim.timersudo systemctl start fstrim.timer 切换登陆终端 manjaro默认的zsh十分好用,不过非图形界面下的terminal还是bash,可以设为zsh: cat /etc/shellschsh -s /bin/zsh 切换内核版本 在系统设置-内核中点点就好啦,会安装一大堆东西,装完重启一下。 我切回了5.4的内核,新换内核后原内核最好保留一段时间,避免系统挂掉,还可以在初始启动界面选择从哪个内核进入系统。 常用软件安装 miniconda+python python作为我科研的主力编程语言,我选择用conda管理不同的python版本,首先安装miniconda,运行后一路回车或yes,最后会询问要不要把conda加入环境变量,这里可以选择no: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.shchmod +x Miniconda3-latest-Linux-x86_64.sh./Miniconda3-latest-Linux-x86_64.sh 注意在terminal中还是没法直接使用conda的,因为不知道conda安装在哪,这里执行如下命令写入环境变量: ~/miniconda3/bin/conda init zsh 退出terminal再重开,就能使用conda啦~ manjaro原来的terminal使用的是bash,2022版konsole使用了zsh(自带颜色、命令记忆补全,超级赞😃),如果想在bash中使用conda,将上面的zsh换成bash即可完成初始化的操作。bash的相关设置在.bashrc里,zsh的相关设置在.zshrc里,两者是默认不互通的。 老规矩,由于众所周知的原因,需要更换conda的镜像源,这里用清华tuna的镜像: conda config --set show_channel_urls yes 在.condarc文件里粘贴以下内容: channels: - defaultsshow_channel_urls: truedefault_channels: - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2custom_channels: conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud 另外pip的源最好也更改一下: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple wps office办公 linux上最好用的office软件套装,搭配前面的win10字体可以做很多文字工作,不用切换到windows,安装如下AUR软件包: yay -S wps-office wps-office-mui-zh-cn ttf-wps-fonts 装完最好重启一下,目前我没遇到啥大问题,如果有问题的话可以看看arch的wiki。 vscode编辑器 超好用的编辑器,建议安装微软的二进制版本,可以多搜到一些好用的扩展包: yay -S visual-studio-code-bin virtualbox虚拟机 如果要使用微信之类国产软件的话,用虚拟机装个windows就行了,还能跟主系统隔离开来,台式机也不太考虑性能问题,这里建议参考manjaro的wiki,注意自己的linux内核版本,比如我的是linux54,别的版本需要替换下面命令中的linux54。 sudo pacman -S virtualbox linux54-virtualbox-host-modules 安装完成后最好重启再完成后续工作。重启后需要安装扩展增强包,先看一下自己的virtualbox版本: vboxmanage --version 比如我是6.1.32r149290,那我就需要安装对应版本的扩展包,使用yay搜索可用的扩展包: yay virtualbox-ext-oracle 会弹出很多选项,要安装哪个输入序号回车。 最后需要将当前的用户加入vboxuser组,重启或注销就可以使用虚拟机啦~ sudo gpasswd -a $USER vboxusers LaTex论文写作 即使作为科研垃圾,也不得不产出论文😟。科技论文写作里LaTex可比word好用多了(前提是有模板),manjaro安装LaTex也很简单,配合vscode的LaTex扩展写论文不要太爽。 yay -S texlive-most texlive-lang biber texlive-bibtexextra texlive-fontsextra 安完之后去vscode扩展里装LaTex Workshop,就可以开始写作了。如果要写中文论文,可以在vscode的settings.json里输入如下内容(一般vscode敲完latex-workshop.latex.tools之后就会自动补全后面值,在最前面添加一个就好): { "latex-workshop.latex.tools": [ { "name": "xelatexmk", "command": "latexmk", "args": [ "-synctex=1", "-interaction=nonstopmode", 
"-file-line-error", "-xelatex", "-outdir=%OUTDIR%", "%DOC%" ], "env": {} } ], "latex-workshop.latex.recipes": [ { "name": "xelatexmk 🔃", "tools": [ "xelatexmk" ] } ], "latex-workshop.view.pdf.viewer": "external", "latex-workshop.latex.recipe.default": "lastUsed"} 然后使用扩展菜单中的xelatexmk就可以编译中文内容啦~ OneDrive网盘 一直用onedrive习惯了,配合代理速度也还行,不太想用国内的其他网盘🤐。manjaro下使用这个项目的onedrive命令行来同步: yay -S onedrive-abrauneggonedrive 按照提示进行设置,设置完成后就可以使用了,因为我比较懒,没有研究自动同步功能,所以都是配合代理手动同步: proxychains -q onedrive --synchronize 反正又不是不能用,有空再看看自动同步咋搞。 xmind思维导图 AUR仓库里自带xmind8,直接输入以下命令即可: yay -S xmind 不过这一版本的xmind需要openjdk8的依赖才能运行,执行以下命令安装openjdk8: yay -S jdk8-openjdk 然后用kate或code打开/usr/share/xmind/XMind/XMind.ini文件,在文件开头添加如下文本: -vm/usr/lib/jvm/java-8-openjdk/bin 保存退出,xmind就能正常运行啦! hexo博客管理 我的博客部署在github pages上,采用hexo管理,首先需要安装nodejs,用AUR的nvm管理不同的node版本: yay -S nvm#使用nvm前需要运行这一句,可以将其写入.zshrc或.bashrcsource /usr/share/nvm/init-nvm.shnvm install node 然后安装npm及hexo: yay -S npmnpm install hexo-cli -gnpm install hexo-deployer-git --save hexo的使用方法可见参考文档。 其他软件 qBittorrent 还没有把硬盘填满吗?快使用qBittorrent吧~ yay -S qbittorrent yesplaymusic+spotify yesplaymusic是网易云音乐的替代,超漂亮的云音乐播放器,没有乱七八糟的功能,颜值党狂喜,安装简单(需要proxychains,老实讲大部分从github下载文件的都需要): proxychains -q yay -S yesplaymusic 除此之外也可以安装spotify,让我们一起聆听IU的美妙歌声😍。 proxychains -q yay -S spotify 大陆地区反正不挂代理能用,不能用再说,又没有交钱,要什么自行车~ PS:视频播放器直接用自带的VLC就好,功能强大,没啥不能播的。 文件名编码转换 windows默认GB2312,linux一般用UTF-8,从windows拷贝过来的中文文件名有时候是乱码,可以用convmv转化一下: yay -S convmv# 测试转换是否成功,不实际执行转换convmv -f GBK -t UTF-8 -r your_folder_or_file# 执行实际转换convmv -f GBK -t UTF-8 -r --notest your_folder_or_file ufw防火墙服务 manjaro默认不带ufw防火墙,虽然我听说可以用iptables添加规则,但目前不懂怎么设置,先装了gufw: yay -S ufw gufwsudo systemctl enable ufw.servicesudo systemctl start ufw.service 开始菜单里就会出现防火墙配置的程序,先使用默认的就好,以后再研究。 远程桌面 如果有远程桌面的需求,比如连接windows笔记本、树莓派之类的,可以使用Remmina: yay -S remmina freerdp libvncserver spice-gtk caj2pdf 中国知网大部分论文都是caj格式(什么垃圾玩意),在linux下先转换成pdf格式再阅读比较方便,这里推荐caj2pdf工具,当然成功与否全部靠命。 proxychains -q yay -S caj2pdf# 转换caj到pdfcaj2pdf convert 某篇博士论文.caj -o 某篇博士论文.pdf colorpicker 还在为做PPT找不到好配色烦恼吗?安装colorpicker,运行命令,鼠标一点即可获取颜色的RGB和Hex值,获取完直接ctrl+c退出。 proxychains -q yay -S colorpickercolorpicker Troubleshooting 这里放一些或许有的问题,方便大家排查,没事干时多看看KSystemlog~ spam log baloo limit USB无线网卡不工作 大概率是驱动问题,我的无线网卡是TP-Link,连接后输入lsusb命令查看芯片组信号: Bus 005 Device 003: ID 148f:7601 Ralink Technology, Corp. MT7601U Wireless Adapter 可以看到用了Ralink的MT7601U,输入如下命令安装对应的开源驱动: proxychains -q yay -S mt7601u-dkms-git 重启后就有无线网络啦。 Ark解压中文乱码 proxychains -q yay -S p7zip-natspec `` 在Ark设置中禁用LibZip插件
本篇的内容可能过时啦 虽然我很久不用MATLAB处理日常工作,但是实验室主流依然是MATLAB(用Python的就那么几个T_T)。以前小伙伴们跑程序都是拷贝程序和数据到实验室的计算服务器上,手工开N个MATLAB窗口做运算。现在实验室规模扩大,这种手工的方式越来越繁琐。我从前用MATLAB时就想试试集群计算,奈何当时实验室没啥硬件条件,正好现在有机会,我干脆搭了个MATLAB集群供小伙伴使用。 软硬件 硬件方面: 4核心, 16GB内存, 百兆网卡普通台式机(manage节点) 40核心, 128GB内存, 千兆网卡计算服务器(compute1节点) 346TB存储, 千兆网卡存储服务器(storage节点) 软件方面: Windows10专业版系统 centos7 matlab2017b 网络环境: 192.168.130.12(matlab-manage.xxx.org) – manage节点 192.168.130.11(matlab-compute1.xxx.org) – compute1节点 192.168.130.10 – storage节点 MATLAB的帮助文档中提出,想要使用集群计算服务应该满足以下条件: 推荐一个CPU核心最多创建一个worker 推荐每个worker最少可以使用2GB内存 最少5GB的硬盘空间容纳暂时性的数据文件 计算集群之间应当使用同构的计算架构(要求计算节点的硬件配置、系统和软件配置一致) 集群安装配置 ip域名设置 修改compute节点和manage节点的计算机名、ip地址以及DNS域名解析,例如compute1节点的计算机名为matlab-compute1.xxx.org(xxx.org为后缀域名),DNS域名也应该为matlab-compute1.xxx.org,ip地址为192.168.130.11。 MATLAB分布式计算服务似乎要求计算机名要添加后缀域名(xxx.org),否则在集群测试时会有解析不匹配的警告,Windows专业版可在这台电脑-属性-更改设置-更改-其他中添加主DNS后缀。 manage节点 在manage节点安装matlab2017b,manage节点在安装过程中应该勾选MATLAB License Server和MATLAB Distributed Computing Server工具箱,前者为集群提供license认证服务,后者是分布式计算的核心服务组件。对于破解版的MATLAB,应该输入floating license的key而不是standalone的key,才能安装MATLAB License Server。安装完毕(并破解)后,在Windows服务选项卡中启动MATLAB License Server服务。 同时修改C:\Program Files\MATLAB\R2017b\licenses\network.lic为如下内容 SERVER this_host ANYUSE_SERVER 修改C:\Program Files\MATLAB\R2017b\toolbox\distcomp\bin\mdce_def.bat其中的security level为2 set SECURITY_LEVEL=2 设置security level为2的效果是要求用户在使用分布式计算服务时输入用户名,从而可以监控集群使用情况。 启动MATLAB,切换到C:\Program Files\MATLAB\R2017b\toolbox\distcomp\bin目录下,在MATLAB命令行窗口输入如下命令安装并启动mdce服务 !mdce install !mdce start 最好在MATLAB命令行窗口内启动mdce服务,如果在Windows服务选项卡中启动服务,会出现权限问题导致集群worker无法连接。 启动mdce服务后最好双击运行C:\Program Files\MATLAB\R2017b\toolbox\distcomp\bin\addMatlabToWindowsFirewall.bat文件(我的做法是直接关闭Windows防火墙避免多余的问题) compute节点 compute节点的安装配置同manage节点,仅以下内容不同 安装matlab时无需勾选MATLAB License Server工具箱 修改C:\Program Files\MATLAB\R2017b\licenses\network.lic为如下内容 SERVER matlab-manage.xxx.org ANYUSE_SERVER 此外为了跟storage节点连接,compute节点需要安装NFS服务,在程序和功能-启用或关闭Windows功能中勾选NFS服务 storage节点 storage节点设置NFS服务,NFS服务端安装和配置网上都有,我就不写了。 添加集群节点 在manage节点运行C:\Program Files\MATLAB\R2017b\toolbox\distcomp\bin\admincenter.bat,启动管理面板,点击Add or Find,添加manage节点和compute1节点,添加完毕后,点击Test Connectivity,测试通过如下图 在MATLAB Job Scheduler面板点击start启动scheduler,输入名称,选择scheduler的节点为matlab-manage.xxx.org,因为security level为2,还需要设置管理员的密码。 设置好scheduler后,右键scheduler点击Start Workers,勾选compute1节点,设置启动的worker数量(我只有40个核心,所以启动40个worker)。 客户端配置和使用 MATLAB集群计算要求客户端的matlab版本和服务端一致,因为我服务端安装的是2017b,客户端也应该是matlab2017b。客户端可以选择standalone安装方式,也需要安装mdce服务添加防火墙配置并启动。 如果客户端想直接使用NFS服务,也需要在程序和功能-启用或关闭Windows功能中勾选NFS服务。 安装完毕后,在MATLAB主页中的Parallel选项选择Discover Cluster,勾选On your network,点击Next等待发现集群mjs40_2,选择集群,点击Next,Finish,就可以使用集群了,集群的使用情况可以在Parallel选项里Monitor Jobs查看。 这里提供两个matlab并行计算脚本检测集群配置是否正确 %This demo shows how to use distributed computing serverprimeNumbers = primes(uint64(2^21));compositeNumbers = primeNumbers.*primeNumbers(randperm(numel(primeNumbers)));factors = zeros(numel(primeNumbers),2);tic;parfor idx = 1:numel(compositeNumbers) factors(idx,:) = factor(compositeNumbers(idx));endtoc %This demo shows how to load data from nfs server, target_folder is nfs server ip addresstarget_folder='\\192.168.130.10\pub\data\';factors=zeros(400,2);tic;parfor i=0:399 tmp = load([target_folder,num2str(i),'.mat']); data = tmp.data; factors(i+1, :)=factor(data);endtoc
Psychopy事件响应 Psychopy提供了很多IO交互方式,当然,最根本的还是键盘和鼠标。本节介绍Psychopy鼠标和键盘的编程技巧。 全局按键响应 编写刺激界面免不了要反复调试,要看看字体颜色对不对、图形大小合不合适,一旦发现刺激界面需要改进就得退出程序修改源代码。 如果采用普通的按键检测方式,则需要在一个循环体内检查按键状态,这显然有可能造成不可知的错误(比如在检测按键前进入一个死循环函数,程序永远无法退出啦),这个时候全局按键响应就很有用了。 Psychopy用psychopy.event.globalkeys来设置全局按键,官方文档里没有如何使用全局按键的说明,但在coder的Demo里有演示global_event_keys.py。 global_event.keys.py程序注册了三个按键,按键“b”调用python的setattr函数,设置rect对象的填充颜色为蓝色,按键“ctrl”+“r”调用python的setattr函数,设置rect对象的填充颜色为红色。按键“q”调用core.quit方法终止程序退出。 # -*- coding: utf-8 -*-from psychopy import core, event, visual, monitorsif __name__=='__main__': mon = monitors.Monitor( name='my_monitor', width=53.704, # 显示器宽度,单位cm distance=45, # 被试距显示器距离,单位cm gamma=None, # gamma值 verbose=False) # 是否输出详细信息 mon.setSizePix((1920, 1080)) # 设置显示器分辨率 mon.save() # 保存显示器信息 win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0, winType='pyglet', units='norm', allowGUI=False) rect = visual.Rect(win, fillColor='blue', pos=(0, -0.2)) text = visual.TextStim( win, pos=(0, 0.5), text=('Press\n\n' 'B for blue rectangle,\n' 'CTRL + R for red rectangle,\n' 'Q or ESC to quit.')) # Add an event key. event.globalKeys.add(key='b', func=setattr, func_args=(rect, 'fillColor', 'blue'), name='blue rect') # Add an event key with a "modifier" (CTRL). event.globalKeys.add(key='r', modifiers=['ctrl'], func=setattr, func_args=(rect, 'fillColor', 'red'), name='red rect') # Add multiple shutdown keys "at once". for key in ['q', 'escape']: event.globalKeys.add(key, func=core.quit) # Print all currently defined global event keys. print(event.globalKeys) print(repr(event.globalKeys)) while True: text.draw() rect.draw() win.flip() 以下是event.globalkeys.add()方法的参数介绍 event.globalkeys.add(key, func, func_args=(), func_kwargs=None, modifiers=(), name=None) parameters type description key string 按键字符串 func function 按键时执行的函数 func_args iterable 函数的args参数 func_kwargs dict 函数的kwargs参数 modifiers iterable 组合按键字符串列表,例如’shift’,‘ctrl’,‘alt’,‘capslock’,'scrollock’等 name string 按键事件的名称 此外还有event.globalkeys.remove()方法以移除全局按键 event.globalkeys.remove(key, modifiers=()) parameters type description key string 按键字符串 modifiers iterable 组合按键字符串列表,例如’shift’,‘ctrl’,‘alt’,‘capslock’,'scrollock’等 等待按键和检测按键 除了全局按键响应,Psychopy还提供了等待按键响应和检测按键响应两种方式。 以下为等待按键函数event.waitKeys()的演示程序,按’esc’或五次其他按键退出程序。 # -*- coding: utf-8 -*-from psychopy import core, event, visual, monitorsif __name__=='__main__': mon = monitors.Monitor( name='my_monitor', width=53.704, # 显示器宽度,单位cm distance=45, # 被试距显示器距离,单位cm gamma=None, # gamma值 verbose=False) # 是否输出详细信息 mon.setSizePix((1920, 1080)) # 设置显示器分辨率 mon.save() # 保存显示器信息 win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0, winType='pyglet', units='norm', allowGUI=False) msg = visual.TextStim(win, text='press a key\n < esc > to quit') msg.draw() win.flip() k = [''] count = 0 while k[0] not in ['escape', 'esc'] and count < 5: k = event.waitKeys() print(k) count += 1 win.close() core.quit() event.waitKeys()阻塞函数进程直到被试按键,以下是event.waitKeys()方法的参数介绍 event.waitKeys(maxWait=inf, keyList=None, modifiers=False, timeStamped=False, clearEvents=True) parameters type description maxWait numeric value 最大等待时间,默认为inf keyList iterable 指定函数检测的按键名称,函数仅在按指定键时返回 modifiers bool 如果True,返回(keyname, modifiers)的tuple timeStamped bool 如果True,返回(keyname, time) clearEvents bool 如果True,在检测新的按键前清理event buffer return type description keys iterable 按键列表;超时返回None 等待按键会阻塞进程,Psychopy还提供了另一种非阻塞检测方式event.getKeys()。 以下代码如下不断检测按键并输出,直到按’escape’键退出程序。 # -*- coding: utf-8 -*-from psychopy import core, event, visual, monitorsif 
__name__=='__main__': mon = monitors.Monitor( name='my_monitor', width=53.704, # 显示器宽度,单位cm distance=45, # 被试距显示器距离,单位cm gamma=None, # gamma值 verbose=False) # 是否输出详细信息 mon.setSizePix((1920, 1080)) # 设置显示器分辨率 mon.save() # 保存显示器信息 win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0, winType='pyglet', units='norm', allowGUI=False) msg = visual.TextStim(win, text='press a key\n < esc > to quit') msg.draw() win.flip() count = 0 while True: k = event.getKeys() if k: if 'escape' in k: break print(k) win.close() core.quit() 以下是event.getKeys()方法的参数介绍 event.getKeys(keyList=None, modifiers=False, timeStamped=False) parameters type description keyList iterable 指定函数检测的按键名称,函数仅在按指定键时返回 modifiers bool 如果True,返回(keyname, modifiers)的tuple timeStamped bool 如果True,返回(keyname, time) return type description keys iterable 按键列表;超时返回None 鼠标事件 Psychopy提供event.Mouse类来处理鼠标相关的事件,官方文档对此有详细的介绍。以下的代码显示了一个含有矩形的窗,在矩形内部单击左右键可以改变颜色,而按中央滚轮键则退出程序。 # -*- coding: utf-8 -*-from psychopy import core, event, visual, monitorsif __name__=='__main__': mon = monitors.Monitor( name='my_monitor', width=53.704, # 显示器宽度,单位cm distance=45, # 被试距显示器距离,单位cm gamma=None, # gamma值 verbose=False) # 是否输出详细信息 mon.setSizePix((1920, 1080)) # 设置显示器分辨率 mon.save() # 保存显示器信息 win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0, winType='pyglet', units='norm', allowGUI=True) rect = visual.Rect(win, fillColor='blue', pos=(0, 0)) # 创建Mouse类 mouse = event.Mouse(visible=True, newPos=(0, 0), win=win) while True: # 重置单击事件状态 mouse.clickReset() # 检测左键是否在矩形内单击 if mouse.isPressedIn(rect, buttons=[0]): rect.fillColor = 'red' # 检测右键是否在矩形内单击 if mouse.isPressedIn(rect, buttons=[2]): rect.fillColor = 'blue' # 检测是否单击滚轮键 # button1: left click # button2: middle click # button3: right click button1, button2, button3 = mouse.getPressed(getTime=False) if button2: break rect.draw() win.flip() core.quit()
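在上面按键检测的基础上,实验程序里最常见的需求之一是测反应时。下面给出一个把event.waitKeys和core.Clock组合起来的最小示意脚本(试次数、等待时长等参数都是随手设的,仅供参考):

```python
# -*- coding: utf-8 -*-
from psychopy import core, event, visual

if __name__ == '__main__':
    win = visual.Window(size=(800, 600), fullscr=False, units='norm')
    msg = visual.TextStim(win, text='看到文字后请尽快按空格键')
    rt_clock = core.Clock()
    rts = []

    for trial in range(3):
        win.flip()              # 空屏
        core.wait(1.0)          # 试次间隔1秒

        msg.draw()
        win.flip()              # 呈现刺激
        rt_clock.reset()        # flip后立即重置计时器,近似以刺激呈现为零点

        # 只响应空格和esc,最多等2秒,超时返回None
        keys = event.waitKeys(maxWait=2.0, keyList=['space', 'escape'])
        rt = rt_clock.getTime()

        if keys is None:
            print('trial %d: miss' % trial)
        elif 'escape' in keys:
            break
        else:
            rts.append(rt)
            print('trial %d: RT = %.3f s' % (trial, rt))

    win.close()
    core.quit()
```

更精确的做法还可以配合win.callOnFlip等机制,这里只演示最基本的组合方式。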
共空间模式(common spatial pattern,CSP)是脑-机接口领域常用的一类空间滤波算法,尤其在运动想象范式分类上具有较好的效果,是运动想象范式的基准算法之一。

目前,CSP及其改进算法的发展速度放缓,看似到达了算法的瓶颈期,近几年鲜少有较大的突破。尽管如此,CSP中的一些思想对脑-机接口算法设计仍然具有一定的启示作用。本文从CSP原始算法出发,讨论其变形和一系列改进算法,试图阐明其中的数学思想与神经科学的联系。

CSP数学原理

原始形式

2000年Graz的论文中提出的CSP是为2分类问题设计的,形式较为简单。然而如果你读CSP相关论文,就会发现CSP存在至少三种表述形式。这三种形式相互联系,又有所区分,很容易让初学者陷入混乱,不知道哪一种是正确形式。我接下来从2000年Graz的论文中的算法出发,讨论三种形式间的联系和不同。

假设我们采集的脑电数据为$\{(\mathbf{X}^{(i)},y^{(i)})\}_{i=1}^{N_t}$,其中$\mathbf{X}^{(i)} \in \mathbb{R}^{N_c \times N_s}$是第$i$个样本,$N_c$是EEG导联个数,$N_s$是采样时间点的个数,$y^{(i)}$是第$i$个样本的标签,$N_t$为总样本个数。第$i$个样本的协方差矩阵$\mathbf{C}^{(i)} \in \mathbb{R}^{N_c \times N_c}$为(所有样本均经过零均值处理):

\begin{equation} \mathbf{C}^{(i)} = \frac{1}{N_s-1}\mathbf{X}^{(i)}\left(\mathbf{X}^{(i)}\right)^T \end{equation}

第$l$类的平均协方差矩阵$\overline{\mathbf{C}}^l$为:

\begin{equation} \overline{\mathbf{C}}^l = \frac{1}{|\mathcal{I}_l|} \sum_{i \in \mathcal{I}_l} \frac{\mathbf{C}^{(i)}}{\mathrm{tr}\left(\mathbf{C}^{(i)}\right)} \end{equation}

其中$\mathcal{I}_l$是标签为$l$的样本索引集合,$|\mathcal{I}_l|$是集合中样本的个数,$\mathrm{tr}(\cdot)$求矩阵的迹。

为什么要使用$\mathrm{tr}(\cdot)$来对协方差矩阵实现迹归一化?

1990年Koles等人的文章中指出,迹归一化的目的是为了消除"被试间脑电信号幅值的变化"。注意到Koles等人的主要目的是区分健康人群和精神疾病人群,而个体的脑电幅值是有差异的。方差可以表征信号在时域上的能量高低,不同人群的协方差矩阵的绝对数值不同。为了消除这种差异带来的影响,利用$\mathrm{tr}(\cdot)$求得所有导联的总体能量,并对协方差矩阵做迹归一化,从而排除不同个体带来的干扰。Graz小组对同一个体不同试次的数据沿用了这种归一化方式,试图消除试次间的差异,发现也有一定的作用,这种迹归一化方式就一直流传下来。

然而,有些分析显示这种归一化方式会不利于最终的空间滤波器排序,建议不要使用迹归一化。实践中使不使用迹归一化还是要具体问题具体分析。我的感觉是没有必要在这里加入迹归一化,因为很多时候EEG预处理阶段已经使用了各种归一化手段来减弱噪声的影响。

接下来构建复合协方差矩阵$\overline{\mathbf{C}}^c$,对其做特征值分解,构建白化(whitening)矩阵$\mathbf{P}$:

\begin{equation} \begin{split} \overline{\mathbf{C}}^c &= \overline{\mathbf{C}}^1 + \overline{\mathbf{C}}^2 = \mathbf{U}^c \mathbf{\Lambda}^c \left(\mathbf{U}^c\right)^T\\ \mathbf{P} &= \left(\mathbf{\Lambda}^c\right)^{-1/2}\left(\mathbf{U}^c\right)^T \end{split} \end{equation}

其中$\mathbf{U}^c$是特征向量矩阵(每一列是特征向量),$\mathbf{\Lambda}^c$是由特征值组成的对角矩阵。$\mathbf{P}$是白化矩阵,使得$\mathbf{P}\overline{\mathbf{C}}^c\mathbf{P}^T=\mathbf{I}$成立。注意到:

\begin{equation} \begin{split} \mathbf{I} &= \mathbf{P}\overline{\mathbf{C}}^c\mathbf{P}^T = \mathbf{P}\overline{\mathbf{C}}^1\mathbf{P}^T + \mathbf{P}\overline{\mathbf{C}}^2\mathbf{P}^T = \mathbf{S}^1 + \mathbf{S}^2\\ \mathbf{S}^1 &= \mathbf{P}\overline{\mathbf{C}}^1\mathbf{P}^T\\ \mathbf{S}^2 &= \mathbf{P}\overline{\mathbf{C}}^2\mathbf{P}^T \end{split} \end{equation}

对$\mathbf{S}^1$或$\mathbf{S}^2$做特征值分解,得到最终的空间滤波器$\mathbf{W}$:

\begin{equation} \begin{split} \mathbf{S}^1 &= \mathbf{U} \mathbf{\Lambda}^1 \mathbf{U}^T\\ \mathbf{S}^2 &= \mathbf{U} \mathbf{\Lambda}^2 \mathbf{U}^T\\ \mathbf{W} &= \mathbf{P}^T\mathbf{U} \end{split} \end{equation}
其中$\mathbf{S}^1$和$\mathbf{S}^2$具有相同的特征向量$\mathbf{U}$(这也是共空间模式名称的由来)。这里假设$\mathbf{U}$的每一列是按照$\mathbf{\Lambda}^1$中的特征值从大到小排列的,可以看出$\mathbf{\Lambda}^2$中的特征值是从小到大排列的,满足$\mathbf{\Lambda}^1+\mathbf{\Lambda}^2=\mathbf{I}$的关系。

为什么$\mathbf{S}^1$和$\mathbf{S}^2$具有同样的特征向量和此消彼长的特征值关系?

这一点可以简单地证明如下:假设$\mathbf{u}_j$和$\lambda_j^{1}$分别是$\mathbf{S}^1$的特征向量和特征值,即:

\begin{equation} \mathbf{S}^1\mathbf{u}_j=\lambda_j^{1}\mathbf{u}_j \end{equation}

注意到$\mathbf{S}^1+\mathbf{S}^2=\mathbf{I}$,把上式中的$\mathbf{S}^1$置换掉可得:

\begin{equation} \left(\mathbf{I}-\mathbf{S}^2\right)\mathbf{u}_j=\lambda_j^{1}\mathbf{u}_j \end{equation}

变形一下可得:

\begin{equation} \mathbf{S}^2\mathbf{u}_j=\left(1-\lambda_j^{1}\right)\mathbf{u}_j \end{equation}

显然$\mathbf{u}_j$也是$\mathbf{S}^2$的特征向量,只不过其特征值为$1-\lambda_j^{1}$。

脑-机接口中的空间滤波器是一组作用于EEG导联信号的向量,目的是为了加强空间分辨率或信噪比,可以简单理解为对导联信号的线性组合。事实上,不少空间滤波器本质上就是某些特征值分解问题的特征向量。

以上就是原始CSP算法的基本流程。在得到空间滤波器矩阵$\mathbf{W}$后($\mathbf{W}$的每一列都是一个空间滤波器),选择前后各$m$个空间滤波器构建特征向量$\tilde{\mathbf{x}}$如下:

\begin{equation} \begin{split} \tilde{\mathbf{W}} &= \begin{bmatrix} \mathbf{w}_1, \cdots, \mathbf{w}_m, \mathbf{w}_{N_c-m+1}, \cdots, \mathbf{w}_{N_c} \end{bmatrix}\\ \tilde{\mathbf{x}} &= \mathrm{log}\left(\frac{\mathrm{diag}\left(\tilde{\mathbf{W}}^T\mathbf{X}\mathbf{X}^T\tilde{\mathbf{W}}\right)}{\mathrm{tr}\left(\tilde{\mathbf{W}}^T\mathbf{X}\mathbf{X}^T\tilde{\mathbf{W}}\right)}\right) \end{split} \end{equation}

其中$\mathbf{w}_m$表示$\mathbf{W}$的第$m$列,$\tilde{\mathbf{W}}$是最终选定的空间滤波器组,$\mathrm{diag}(\cdot)$取矩阵主对角线上的元素,$\mathrm{log}(\cdot)$对每个元素做对数变换,其主要目的是使数据近似正态分布。获得特征向量$\tilde{\mathbf{x}}$后,则可以使用线性判别分析(Linear Discriminant Analysis,LDA)、支持向量机(Support Vector Machine,SVM)等常见的机器学习模型构建分类器。

简单回顾一下CSP算法,不难发现CSP实质求解的是这样一个问题:寻找可逆矩阵$\mathbf{W}$同时对角化$\overline{\mathbf{C}}^1$和$\overline{\mathbf{C}}^2$,使得以下条件成立:

\begin{equation} \begin{split} \mathbf{W}^T\overline{\mathbf{C}}^1\mathbf{W} &= \mathbf{\Lambda}^1\\ \mathbf{W}^T\overline{\mathbf{C}}^2\mathbf{W} &= \mathbf{\Lambda}^2\\ \mathbf{\Lambda}^1 + \mathbf{\Lambda}^2 &= \mathbf{I} \end{split} \end{equation}

让我们对以上的公式做一些变换,把第一个和第二个公式相加:

\begin{equation} \mathbf{W}^T\left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W} = \mathbf{\Lambda}^1 + \mathbf{\Lambda}^2 = \mathbf{I} \end{equation}

由于$\mathbf{W}$可逆,在上式两边左乘$\left(\mathbf{W}^T\right)^{-1}$,从而:

\begin{equation} \left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W} = \left(\mathbf{W}^T\right)^{-1} \end{equation}

同样地,由第一个公式可得$\overline{\mathbf{C}}^1\mathbf{W}=\left(\mathbf{W}^T\right)^{-1}\mathbf{\Lambda}^1$,把上式代入其中替换掉$\left(\mathbf{W}^T\right)^{-1}$,可得:

\begin{equation} \overline{\mathbf{C}}^1\mathbf{W} = \left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W}\mathbf{\Lambda}^1 \end{equation}
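推导先暂停一下。为了把原始形式的计算步骤串起来,下面给出一个基于numpy的示意实现(函数名、变量名都是我自己拟的,只用于演示"白化+特征分解"这一流程,并非任何库的源码):

```python
import numpy as np

def csp_original(C1, C2):
    """原始CSP:先白化复合协方差矩阵,再对白化后的类协方差做特征分解(示意实现)。

    C1, C2: 两类的平均协方差矩阵,形状 (Nc, Nc)
    返回空间滤波器矩阵 W(每一列一个滤波器,按 lambda^1 从大到小排列)及对应特征值
    """
    # 复合协方差矩阵的特征分解,构建白化矩阵 P
    lam_c, U_c = np.linalg.eigh(C1 + C2)
    P = np.diag(lam_c ** -0.5) @ U_c.T

    # 对白化后的 S1 做特征分解,S1 与 S2 共享特征向量 U
    S1 = P @ C1 @ P.T
    lam1, U = np.linalg.eigh(S1)

    # 按 lambda^1 从大到小排序,最终空间滤波器 W = P^T U
    order = np.argsort(lam1)[::-1]
    W = P.T @ U[:, order]
    return W, lam1[order]
```

注意这只是按公式逐步翻译,没有做任何数值稳定性处理(例如特征值接近0时的截断),实际使用还是建议参考MNE等成熟实现。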
上面最后推出的式子$\overline{\mathbf{C}}^1\mathbf{W}=\left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W}\mathbf{\Lambda}^1$看起来很像特征向量定义的公式$\overline{\mathbf{C}}^1\mathbf{W}=\mathbf{W}\mathbf{\Lambda}^1$,只不过等式右边多了一个矩阵$\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2$。这类形式的特征值求解问题叫广义特征值问题,求解广义特征值问题是脑-机接口领域传统空间滤波方法的基础,大量的算法都可以转化为这一形式。

第二种形式

CSP的第二种形式与$\overline{\mathbf{C}}^1\mathbf{W} = \left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W}\mathbf{\Lambda}^1$密切相关。首先我们需要了解一个数学概念——广义雷利商(generalized Rayleigh quotient):

\begin{equation} \begin{split} \lambda &= \frac{\mathbf{w}^T\mathbf{A}\mathbf{w}}{\mathbf{w}^T\mathbf{B}\mathbf{w}}\\ \mathbf{A}, \mathbf{B} &\succeq 0 \end{split} \end{equation}

其中$\mathbf{A}$和$\mathbf{B}$为半正定矩阵,$\mathbf{w}$是列向量。如果我们求如下广义雷利商的优化问题,就会有一些有趣的结果:

\begin{equation} \max_{\mathbf{w}} \frac{\mathbf{w}^T\mathbf{A}\mathbf{w}}{\mathbf{w}^T\mathbf{B}\mathbf{w}} \end{equation}

寻找$\mathbf{w}$使得$\lambda$最大,通常令$\mathbf{w}^T\mathbf{B}\mathbf{w}=1$,在数学上可以等价为求解下式:

\begin{equation} \mathbf{A}\mathbf{w} = \lambda\mathbf{B}\mathbf{w} \end{equation}

这个公式就是上一节提到的广义特征值问题。也就是说,寻找$\mathbf{w}$使广义雷利商最大的优化问题可以等价为求解$\mathbf{A}$和$\mathbf{B}$的广义特征值问题。如果我们继续寻找能够使$\lambda$第二大、第三大的$\mathbf{w}$,就会发现只要解出广义特征值问题的矩阵形式即可:

\begin{equation} \mathbf{A}\mathbf{W} = \mathbf{B}\mathbf{W}\mathbf{\Lambda} \end{equation}

不难发现,上一节中推导的CSP求解问题可以变形为求解广义雷利商问题:

\begin{equation} \overline{\mathbf{C}}^1\mathbf{W} = \left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W}\mathbf{\Lambda}^1 \ \Longleftrightarrow \ \argmax_{\mathbf{W}} \frac{\mathrm{tr}\left(\mathbf{W}^T\overline{\mathbf{C}}^1\mathbf{W}\right)}{\mathrm{tr}\left(\mathbf{W}^T\left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W}\right)} \end{equation}

其中应满足$\mathbf{W}^T\left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W}=\mathbf{I}$的约束条件。

这就是CSP常见的第二种形式,它跟原始形式在数学上相互等价。由于分母在约束下是单位矩阵,也常写作如下优化问题:

\begin{equation} \begin{split} \argmax_{\mathbf{W}}\quad &\mathrm{tr}\left(\mathbf{W}^T\overline{\mathbf{C}}^1\mathbf{W}\right)\\ \textrm{s.t.}\quad &\mathbf{W}^T\left(\overline{\mathbf{C}}^1+\overline{\mathbf{C}}^2\right)\mathbf{W} = \mathbf{I} \end{split} \end{equation}

第三种形式

CSP的第三种表述形式需要绕点弯路。首先还是从CSP的原始形式出发,即寻找可逆矩阵$\mathbf{W}$使得以下条件成立:

\begin{equation} \begin{split} \mathbf{W}^T\overline{\mathbf{C}}^1\mathbf{W} &= \mathbf{\Lambda}^1\\ \mathbf{W}^T\overline{\mathbf{C}}^2\mathbf{W} &= \mathbf{\Lambda}^2\\ \mathbf{\Lambda}^1 + \mathbf{\Lambda}^2 &= \mathbf{I} \end{split} \end{equation}

在第二个公式的左右两边同时右乘矩阵$\mathbf{W}^{-1}\left(\overline{\mathbf{C}}^2\right)^{-1}$,可以得到:

\begin{equation} \mathbf{W}^T=\mathbf{\Lambda}^2\mathbf{W}^{-1}\left(\overline{\mathbf{C}}^2\right)^{-1} \end{equation}

将该式代入$\mathbf{W}^T\overline{\mathbf{C}}^1\mathbf{W} = \mathbf{\Lambda}^1$,替换掉$\mathbf{W}^T$,可得:
\begin{equation} \mathbf{\Lambda}^2\mathbf{W}^{-1}\left(\overline{\mathbf{C}}^2\right)^{-1}\overline{\mathbf{C}}^1\mathbf{W} = \mathbf{\Lambda}^1 \end{equation}

左右两边左乘$\overline{\mathbf{C}}^2 \mathbf{W} \left(\mathbf{\Lambda}^2\right)^{-1}$,有:

\begin{equation} \begin{split} \overline{\mathbf{C}}^1\mathbf{W} &= \overline{\mathbf{C}}^2 \mathbf{W} \left(\mathbf{\Lambda}^2\right)^{-1} \mathbf{\Lambda}^1 = \overline{\mathbf{C}}^2 \mathbf{W} \mathbf{\Lambda}\\ \mathbf{\Lambda} &= \left(\mathbf{\Lambda}^2\right)^{-1} \mathbf{\Lambda}^1 \end{split} \end{equation}

没错,我们又推出了熟悉的广义特征值问题。再考虑广义雷利商与之的联系,可以得到CSP的第三种形式:

\begin{equation} \begin{split} \argmax_{\mathbf{W}}\quad &\mathrm{tr}\left(\mathbf{W}^T\overline{\mathbf{C}}^1\mathbf{W}\right)\\ \textrm{s.t.}\quad &\mathbf{W}^T\overline{\mathbf{C}}^2\mathbf{W} = \mathbf{I} \end{split} \end{equation}

相比CSP的原始形式和第二种形式,第三种形式更适合从直观上解释CSP在运动想象上有效的原因。运动想象会产生事件相关同步(ERS)和事件相关去同步(ERD)的现象,简单来说就是从电信号上看,某些脑区能量升高,某些脑区能量降低,故能量变化才是运动想象分类的关键特征。而方差可以看作一导信号能量的高低(协方差矩阵则是多导信号的综合反映),因此CSP的第三种形式实质体现的是这样一个问题:

寻找一种变换方式$\mathbf{w}$,使得变换后任务1的能量($\mathbf{w}^T\overline{\mathbf{C}}^1\mathbf{w}$)和任务2的能量($\mathbf{w}^T\overline{\mathbf{C}}^2\mathbf{w}$)的差异最大化(即其比值最大)。

CSP的这种特性恰好和运动想象产生的神经机制变化现象一致,CSP对能量特征做变换,从而强化了不同任务间能量的差异。

关于CSP的第三种形式,最后还需要注意的一点是:它同CSP的第二种形式(或原始形式)并不完全等价,我们在推导第三种形式的过程中始终没有用到$\mathbf{\Lambda}^1 + \mathbf{\Lambda}^2 = \mathbf{I}$这一约束条件。这表明,第三种形式是CSP的一种泛化形式,其与CSP原始形式、第二种表述的差异仅在于特征值$\mathbf{\Lambda}$不要求在0~1的范围内。具体来说,它们的特征值间存在这样一种关系:

\begin{equation} \begin{split} \mathbf{\Lambda} &= \left(\mathbf{\Lambda}^2\right)^{-1} \mathbf{\Lambda}^1\\ \mathbf{\Lambda}^1 &= \left(\mathbf{\Lambda} + \mathbf{I}\right)^{-1}\mathbf{\Lambda}\\ \mathbf{\Lambda}^2 &= \left(\mathbf{\Lambda} + \mathbf{I}\right)^{-1} \end{split} \end{equation}

实现分析

CSP作为经典算法有各种实现,这里主要分析MNE的CSP源码,看看有啥可以学习的地方。

空间滤波器的选择

基本上,目前CSP算法中$m$的选择方案大多是根据经验确定(通常选择2~4个)。MNE的CSP采用了第二种形式,最后求解的特征值范围在0~1之间,因此可以对$\left| \lambda_i^1 - 0.5 \right|$先从大到小排序,再取前$M$个成分的特征向量组成空间滤波器组$\tilde{\mathbf{W}}$(与前后各取$m$个的做法有些许差别,但实践中很难有显著性上的差异,这种做法相对方便一些)。

实际上,针对2分类CSP算法,特征值与类平均协方差矩阵间的黎曼距离在各个特征向量分量上的投影长度密切相关,具体的证明细节可以看Alexandre Barachant的这篇会议论文。对于$\overline{\mathbf{C}}^1$和$\overline{\mathbf{C}}^2$,有如下关系:

\begin{equation} \delta_R\left(\overline{\mathbf{C}}^1,\overline{\mathbf{C}}^2\right) = \sqrt{\sum_{i=1}^{N_c} \mathrm{log}^2\left(\frac{\lambda_i^1}{1-\lambda_i^1}\right)} \end{equation}

也就是说,我们可以选定一个阈值$\epsilon$,将特征值及特征向量按$\mathrm{log}^2\left(\frac{\lambda}{1-\lambda}\right)$从大到小排序,选取最小的$M$使下式成立:

\begin{equation} \frac{\sqrt{\sum_{i=1}^{M} \mathrm{log}^2\left(\frac{\lambda_i^1}{1-\lambda_i^1}\right)}}{\delta_R\left(\overline{\mathbf{C}}^1,\overline{\mathbf{C}}^2\right)} \ge \epsilon \end{equation}

再将前$M$个特征向量组成空间滤波器组$\tilde{\mathbf{W}}$。当$\epsilon=0.9$时,我们可以说选定的空间滤波器组大约贡献了两类之间90%的黎曼距离。
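把第二种形式的求解和上面基于黎曼距离的滤波器选择放在一起,可以写成下面这个示意实现(接口和变量名是我自己拟的;scipy.linalg.eigh可以直接求解广义特征值问题,一般情况下其结果与前面白化路径得到的滤波器只相差排序和符号):

```python
import numpy as np
from scipy.linalg import eigh

def csp_generalized(C1, C2, epsilon=0.9):
    """第二种形式的CSP:求解 C1 w = lambda (C1 + C2) w,并按黎曼距离贡献选滤波器(示意实现)。"""
    # eigh求解广义特征值问题,特征值按从小到大返回,特征向量满足 W^T (C1 + C2) W = I
    lam1, W = eigh(C1, C1 + C2)

    # 每个分量对黎曼距离平方的贡献 log^2(lambda / (1 - lambda))
    contrib = np.log(lam1 / (1 - lam1)) ** 2
    order = np.argsort(contrib)[::-1]
    contrib, W = contrib[order], W[:, order]

    # 取最小的M,使累计贡献(开方后)不低于总黎曼距离的epsilon倍
    cum = np.sqrt(np.cumsum(contrib))
    M = int(np.searchsorted(cum, epsilon * cum[-1]) + 1)
    return W[:, :M]

def csp_features(X, W_sel):
    """对单试次数据 X (Nc, Ns) 做空间滤波并提取归一化对数方差特征。"""
    var = np.var(W_sel.T @ X, axis=1)
    return np.log(var / var.sum())
```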
最后一种常用的空间滤波器选择方法是计算空间滤波后的特征同标签之间的互信息,再按互信息从大到小排列,选择前$M$个空间滤波器。互信息常用于CSP的衍生算法FBCSP的特征筛选过程,也可以用于CSP(MNE中的CSP也提供了按互信息排序的选项)。至于哪一种选择方法是最优的,目前似乎还没有定论(我感觉)。

协方差矩阵正则化

CSP中的正则化方法主要是对协方差矩阵做正则化处理。从十几年前开始,BCI研究者就在协方差矩阵的正则化处理上做了大量的工作,有些正则化方法与BCI的变异性问题也有着密切的联系,因此这一方面的正则化方法展开来讲就收不住啦。我们这里介绍的正则化方法的目的非常单纯,就是为了解决EEG中可能存在的协方差矩阵非正定的问题。

一般而言,本文第一个公式中的$\mathbf{C}^{(i)}$在大多数情况下都是正定的。所谓矩阵$\mathbf{M}$是正定(positive definite)的,是指对任意非0实向量$\mathbf{z}$,$\mathbf{z}^T\mathbf{M}\mathbf{z}$都是一个正数。不过在实践中,经常会遇到程序报类似 b matrix is not definite positive 这样的错误,这种情况来源于底层的特征值分解或广义特征值分解函数对矩阵的正定性有较为严格的要求,但输入的协方差矩阵却不是正定的。

那么为啥协方差矩阵不是正定的呢?大概率可按照以下三种情况逐步排查:

1. EEG采样信号$\mathbf{X}^{(i)}$中的$N_c > N_s$,即导联多于采样点;
2. 导联虽然少于采样点,但$\mathbf{X}^{(i)}$的秩小于$N_c$,即可能存在导联打串或做过共平均参考变换等导致矩阵不满秩的预处理操作;
3. 前两条都不满足,则可能是数值计算精度上出了问题,比如单个样本的协方差矩阵满足正定性,但平均协方差矩阵却不是正定的(我也不太懂数值计算方面的内容,这一条有待考证,但确实碰到过这样的现象)。

总之,为了让计算进行下去,对协方差矩阵做正则化处理是很有必要的。协方差矩阵的正则化就是对协方差矩阵做以下变换:

\begin{equation} (1-\lambda)\mathbf{C} + \lambda\mu\mathbf{I} \end{equation}

其中$\lambda$是待估计的正则化系数,$\mu=\frac{\mathrm{tr}(\mathbf{C})}{N_c}$是为了对单位矩阵的数值范围做限定。sklearn的covariance模块中列出了多种正则化处理方法,比如著名的Ledoit-Wolf正则化、OAS正则化等方法,选个顺眼的用就行。
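下面是一个协方差正则化的示意片段:固定系数的收缩可以直接按上面的公式写,收缩系数也可以用sklearn的ledoit_wolf自动估计(X_epoch等变量名是为演示而拟的,并非本文前面定义过的对象):

```python
import numpy as np
from sklearn.covariance import ledoit_wolf

def regularize_cov(C, reg=0.05):
    """固定系数的收缩正则化:(1 - reg) * C + reg * mu * I(示意实现)。"""
    n_channels = C.shape[0]
    mu = np.trace(C) / n_channels
    return (1 - reg) * C + reg * mu * np.eye(n_channels)

# 用随机数据演示。注意sklearn约定输入形状为(样本数, 特征数),
# 对应这里的(Ns, Nc),即每一行是一个时间点的观测
X_epoch = np.random.randn(200, 64)
C = X_epoch.T @ X_epoch / (X_epoch.shape[0] - 1)

C_fixed = regularize_cov(C, reg=0.05)      # 手动指定收缩系数
C_lw, shrinkage = ledoit_wolf(X_epoch)     # 自动估计收缩系数
print(shrinkage)
```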
新建单窗口 窗口(windows)是刺激呈现的舞台,任何刺激对象都需要指定其所属的窗口对象。Pyschopy的Window对象位于psychopy.visual模块中,一个最简单的窗口示例如下 # -*- coding: utf-8 -*-from psychopy import visual, event, monitors, coreimport numpy as np# 根据你自己的显示器调整显示器信息mon = monitors.Monitor( name='my_monitor', width=53.704, # 显示器宽度,单位cm distance=45, # 被试距显示器距离,单位cm gamma=None, # gamma值 verbose=False) # 是否输出详细信息mon.setSizePix((1920, 1080)) # 设置显示器分辨率mon.save() # 保存显示器信息win = visual.Window(monitor=mon, size=(800, 600), fullscr=False, screen=0, winType='pyglet', units='norm', allowGUI=True)event.waitKeys() # 等待按键# 修复或预防原始gamma不能恢复bug(运行Psychopy程序显示器变暗加入以下代码)# Pyschopy 3.0.0 版似乎修复了此bug,如果显示器没有变暗的现象可以不加入以下代码origLUT = np.round(win.backend._origGammaRamp * 65535.0).astype("uint16")origLUT = origLUT.byteswap() / 255.0win.backend._origGammaRamp = origLUTcore.quit() # 退出Psychopy程序 Window对象用size参数申明窗口尺寸为800*600像素;fullscr参数决定是否全屏显示;screen参数决定了窗口在哪个显示器上显示,通常0是主显示器;winType参数决定了Psychopy使用的后端程序,有’pyglet’和’pygame’两种选择(Psychopy官方未来主要采用pyglet作为后端程序,我推荐采用pyglet)。 新建多窗口 Psychopy也可以同时建立多个窗口对象,注意仅pyglet后端支持多窗口行为。下面的代码展示了如何新建两个位于不同位置的窗口 # -*- coding: utf-8 -*-from psychopy import visual, event, monitors, coreimport numpy as np# 根据你自己的显示器调整显示器信息mon = monitors.Monitor( name='my_monitor', width=53.704, # 显示器宽度,单位cm distance=45, # 被试距显示器距离,单位cm gamma=None, # gamma值 verbose=False) # 是否输出详细信息mon.setSizePix((1920, 1080)) # 设置显示器分辨率mon.save() # 保存显示器信息# 窗口1在屏幕左上角win1 = visual.Window(monitor=mon, size=(800, 600), pos=(0, 0), fullscr=False, screen=0, winType='pyglet', units='norm', allowGUI=True)# 窗口2在屏幕右下角win2 = visual.Window(monitor=mon, size=(800, 600), pos=(1120, 480), fullscr=False, screen=0, winType='pyglet', units='norm', allowGUI=True)event.waitKeys() # 等待按键# 修复或预防原始gamma不能恢复bug(运行Psychopy程序显示器变暗加入以下代码)# Pyschopy 3.0.0 版似乎修复了此bug,如果显示器没有变暗的现象可以不加入以下代码origLUT = np.round(win1.backend._origGammaRamp * 65535.0).astype("uint16")origLUT = origLUT.byteswap() / 255.0win1.backend._origGammaRamp = origLUTwin2.backend._origGammaRamp = origLUTcore.quit() # 退出Psychopy程序 pos参数调整窗口在屏幕上显示的位置,单位始终为像素,这里的坐标系不同于Psychopy的坐标系,以屏幕的左上角为原点,向下和向右分别为y轴和x轴的正方向。Window对象有很多可调整的参数和行为,具体细节可见官方文档Window API
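下面是一个在两个窗口中分别呈现不同刺激的最小示意脚本(窗口大小和位置是随手设的):刺激在创建时就绑定了所属窗口,draw只会画到对应的窗口上,两个窗口需要各自flip。

```python
# -*- coding: utf-8 -*-
from psychopy import core, event, visual

win1 = visual.Window(size=(400, 300), pos=(0, 0), screen=0,
                     winType='pyglet', units='norm')
win2 = visual.Window(size=(400, 300), pos=(800, 400), screen=0,
                     winType='pyglet', units='norm')

text1 = visual.TextStim(win1, text='window 1')
text2 = visual.TextStim(win2, text='window 2')

# 交替刷新两个窗口,按esc退出
while 'escape' not in event.getKeys():
    text1.draw()
    win1.flip()
    text2.draw()
    win2.flip()

win1.close()
win2.close()
core.quit()
```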
Psychopy的坐标系统 Psychopy提供了5种不同的坐标单位(unit),使用者只需提供刺激对应的坐标单位,Psychopy会自动计算刺激所对应的像素点范围。这种多坐标单位的好处在于,能够开发和设备无关的刺激呈现,不需要每次实验都对刺激的大小和呈现位置进行调整。其劣势则是需要精心挑选刺激对应的坐标单位,有时还要在不同单位间进行转换,一不小心就容易出错。 无论何种单位,Psychopy的坐标系统始终以屏幕中心为原点(0, 0),原点向上为正y轴,原点向右为正x轴。不同的Psychopy坐标单位的所需参考对象不同,例如norm、height的坐标单位是针对视窗对象(window),而cm、deg、degFlat、degFlatPos、pix则是针对屏幕(screen)。 归一化单位(norm) 归一化单位可能是最常用的单位之一。在该单位下,window左下角坐标为(-1, -1),window右上角坐标为(1, 1)。如图为长宽均为0.5的三个色块,其中心点分别位于(-0,5, 0)、(0, 0)和(0.5, 0)坐标下,注意window的分辨率为800*600,因此尽管色块的归一化长宽均为0.5,但其长实际为200像素点,宽实际为150像素点,表现为长方形。 像素单位(pix) 像素单位的坐标范围取决于screen的宽、高像素点数,假设screen宽度有w个像素点,高度有h个像素点,则screen左下角坐标为(-w/2, -h/2),右上角坐标为(w/2, h/2)。 厘米单位(cm) 厘米单位的坐标范围取决于screen的宽度和高度,假设screen宽度为w厘米,高度为h厘米,则screen左下角坐标为(-w/2, -h/2),右上角坐标为(w/2, h/2),每cm所代表的像素长度则由screen的像素点数确定。 高度单位(height) 高度单位的坐标范围取决于window的宽高比。无论何种window,y轴的坐标范围始终是从-0.5到0.5。因此,如果window是4:3的尺寸,则window左下角坐标为(-0.6667, -0.5),window右上角坐标为(0.6667, 0.5);如果是16:9的尺寸,则window左下角坐标为(-0.8, -0.5),window右上角坐标为(0.8, 0.5)。如图是800*600的window,色块的长宽均为1,则色块会占满y轴方向的所有空间。 视角(deg, degFlatPos, degFlat) 视角单位是五种单位种最复杂的坐标单位,使用该单位,不仅要知道屏幕的大小、像素点的多少,还要知道被试距离屏幕的距离,Psychopy提供三种不同的视角单位deg、degFlatPos和degFlat。 deg deg单位默认视角在screen所有位置具有相同的像素长度,即在screen边缘位置和中心位置会产生相同大小的刺激图形。采用deg单位可以认为screen是球形曲面,而人眼则是球心,每度视角在screen所投射的像素长度完全相同。上图deg行红绿蓝三色块的长宽均为5度,位置分别为(-25, 10)、(0, 10)、(25, 10)。 degFlatPos degFlatPos在deg的基础上考虑了位置在水平屏幕上的修正,远离屏幕中心的位置,刺激间的间隔越大,但是不改变刺激本身的大小。上图degFlatPos行三色块的参数同deg行完全相同,但因为采用了degFlatPos单位,红蓝色块距离绿色色块的距离要比deg行更大。 degFlat degFlat不仅修正了位置信息,还修正了刺激的大小,因此,远离屏幕中心的位置,刺激尺寸越大,刺激间的间隔也越大。上图degFlat行三色块的参数同deg行完全相同,但因为采用了degFlat单位,不仅红蓝色块距离绿色色块的距离要比deg行更大,红蓝色块的形状也产生了畸变。 Psychopy单位转换 Psychopy提供了不同单位间的转换方法,位于psychopy.tools.unittools模块中。 Psychopy显示器信息设置 以上提到的pix、cm、deg、degFlatPos和degFlag均需要提供显示器信息(尺寸、分辨率等),Psychopy提供两种方式设定显示设备信息。 Moniter Center界面 Anaconda Prompt下切换到%Anaconda的安装目录%\Anaconda\Lib\site-packages\psychopy\monitors目录,输入命令 python MonitorCenter.py Moniter类 Moniter类位于psychopy.monitors模块中,负责显示器参数和刺激环境设置。 # Create my primary monitormon = monitors.Monitor( 'monitor1', width=53.704, # width of the monitor in cm distance=114, # distance from viewer to the screen in cm notes="This is my primary monitor")mon.setSizePix((1920, 1080)) # set pixel size of the monitormon.save() # save the monitor information to disk# Reuse my primary monitormon = monitors.Monitor('monitor1')# Change the distance from viewer to the screenmon.setDistance(200) Monitor.save()函数保存的显示器信息位于%APPDATA%psychopy3monitors文件夹下,保存过一次后可以直接在Monitor类中以name调用。
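以我目前的使用经验,与显示器相关的单位换算函数(如deg2pix、cm2pix)位于psychopy.tools.monitorunittools模块中,下面是一个换算的小例子(显示器参数沿用上文,函数的具体位置和签名请以所用版本的官方文档为准):

```python
# -*- coding: utf-8 -*-
from psychopy import monitors
from psychopy.tools import monitorunittools

mon = monitors.Monitor(
    name='my_monitor',
    width=53.704,   # 显示器宽度,单位cm
    distance=45)    # 被试距显示器距离,单位cm
mon.setSizePix((1920, 1080))

# 5度视角、5厘米分别对应多少像素
print(monitorunittools.deg2pix(5, mon))
print(monitorunittools.cm2pix(5, mon))
```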
swolf的博客开通啦!本博客记录我在学习过程中的心得体会。
什么是Psychopy Psychopy是基于Python的心理学实验设计软件,由英国诺丁汉大学的Jon Peirce主持开发。Psychopy结合了OpenGL的图形优势和Python的语法特性,给科学家们提供了快速构建高性能的图形刺激界面的工具。 我为什么选择Psychopy 在我研究生阶段,我做脑机接口实验编写刺激界面的工具主要是Matlab平台的Psychtoolbox。很早之前,我也用过e-Prime,但很快就放弃了。e-Prime提供GUI界面,简单易学,但是无法设计复杂的刺激界面。相比之下,Psychtoolbox能够实现大多数脑-机接口刺激界面,同时基于Matlab平台,集成了大量简单方便的函数,对科研人员的编程要求不高,基本上是科研人员的第一选择。 然而对我而言,我一直不喜欢Matlab,理由有三: Matlab不是一门真正的通用编程语言。Matlab本质是为不懂CS的科研人员设计的编程语言,很难进行普通程序的开发,例如GUI、网络编程等等。脑机接口一个很重要的方面是开发和机器交互的程序,这些程序有时很注重性能,Matlab开发这些功能不太方便。 Matlab是收费的商业软件,其工具包的价格不是穷学生能承受的。尽管天朝存在“破解版”这种Matlab版本,MathWorks公司对科研人员使用破解版也视而不见,但谁知道以后会怎么样呢?为了不受制于人,我决定转向开源软件阵营。 最重要的一点,学Matlab找不到工作。研究生转博士期间,我也跟着校招参加了不少面试。很遗憾,在脑机接口领域,对口的工作几乎没有;就算扩大了从生物医学工程专业来说,大部分工作机会还是集中在医疗图像领域。这些领域的招聘要求中可没有熟练使用Matlab这一项,大部分还是C/C++、Java等通用编程语言。 综上所述,我在博士阶段毅然决然的放弃了Matlab,放弃了以前所有的代码,转向了Python。而在Python平台下,Psychopy几乎是唯一选择(个人认为)。Psychopy目前虽然仍处在开发阶段,还有不少bug,官方文档也不完善,但是官方社区和开发者相当活跃,使用人数也越来越多,借着Python语言的上升势头,我认为不久之后Psychopy很可能成为神经科学及脑机接口设计实验的首选框架。 当然由于Psychopy还很年轻,我在Psychopy实践过程中遇到许多问题。写Psychopy系列博文的目的是记录我的开发经验,给想使用Psyhcopy却遇到问题的朋友提供帮助。 安装Anaconda和Psychopy Anaconda是一个用于科学计算的Python发行版,集成了大量Python科学计算所需的环境库,提供包管理和环境管理的功能,免去了手动安装Python及其各种工具包的麻烦。Anaconda支持Linux、Mac和Windows系统,在Windows下几乎是科学计算的唯一选择。我推荐安装Anaconda或Miniconda的最新版本,Anaconda的下载地址可在其官网或清华TUNA镜像站找到。 Psychopy在2019年10月8号release了3.2.4 (PyPi上为3.2.3),相比3.0.0添加了很多特性,修复了大量bug。尽管官方提供了多种安装方式,我仍建议使用Anaconda或Miniconda安装(pip会存在一些编译依赖缺失问题)。 Psychopy在Python3.6版本下较为稳定,在Anaconda Prompt中键入如下命令创建环境并安装: conda create -n psypy3 python=3.6conda activate psypy3conda install numpy scipy matplotlib pandas pyopengl pillow lxml openpyxl xlrd configobj pyyaml gevent greenlet msgpack-python psutil pytables requests[security] cffi seaborn wxpython cython pyzmq pyserialconda install -c conda-forge pyglet pysoundfile python-bidi moviepy pyosfpip install zmq json-tricks pyparallel sounddevice pygame pysoundcard psychopy_ext psychopy 在python控制台中运行如下命令检查Psychopy版本 import psychopyprint(psychopy.__version__) 使用Psychopy Psychopy提供两种刺激界面设计方式,一种是类似e-Prime的GUI界面Builder,另一种是普通的脚本编写方式Coder。 Builder 在Anaconda Prompt中切换到Psychopy的app安装目录,Windows下通常为cd %Anaconda的安装目录%\Anaconda\Lib\site-packages\psychopy\app,在该目录下运行命令 python psychopyApp.py -b Builder的使用在Builder - building experiments in a GUI文档中有详细的介绍,builder很适合设计一些简单的刺激界面,设计完成的界面也可以转换为Coder中的脚本程序。 Coder 在Anaconda Prompt中切换到Psychopy的app安装目录,Windows下通常为%Anaconda的安装目录%\Anaconda\Lib\site-packages\psychopy\app,在该目录下运行命令 python psychopyApp.py -c Coder的使用在Coder - writing experiments with scripts文档中有详细的介绍,相比Builder,Coder提供的编程设计方式更加灵活,可以实现更为复杂的刺激界面。当然Coder本身只是提供了开发环境,脚本编写可以在任何编辑器下进行,我很少直接使用Coder,通常会使用Pycharm和Sublime Text vscode 来编写程序。 Psychopy相关资源 Psychopy的官方文档更新不算及时,大部分文档还是基于Python2的版本Psychopy官方文档已更新至Python3,并不再支持Python2。官方文档和demo仍然是学习Psychopy的不二之选。如果有问题在官方文档里没有说明,Google也没有相关信息的话,可以去Psychopy的论坛问问或Github提个issue。 Psychopy 官方文档 Psychopy API手册 Psychopy 论坛 Psychopy Github仓库