
The previous post implemented a basic CUDA program; this post records how to implement matrix multiplication and then speed it up.

CMake is a popular cross-platform build system that gives developers a common interface for defining the rules that build their source code with different compilers, such as GCC, Clang and Visual Studio. CMake supports not only C/C++ but also other languages such as C# and Fortran. It is an essential tool for building software projects and makes the whole process easier to reproduce across platforms.

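Scientific computing keeps getting more complex and models keep getting larger. The days of using an SVM for classification are long gone, and the importance of GPU acceleration goes without saying. In my years working in the MRI field, I have often run into highly complex MRI applications whose Matlab computation time ranges from several hours to several days; even a highly optimized C++ program needs roughly 10 to 30 minutes. CPU utilization in these applications is already close to 100%, so optimizing them further on the GPU has become the most practical option.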

Total variation (TV) denoising, also known as TV regularization or TV filtering, is a powerful technique widely used in fields such as medical imaging and computer vision. It removes noise while preserving the most important structural features. The first image of a black hole, captured by the Event Horizon Telescope (EHT), was processed and revealed with this technique in 2019.

Last week I was attempting to implement an in-place fftshift function in C++. I hoped this function could perform the shift along any given dimension of N-dimensional data.
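
To make the idea concrete, here is a minimal sketch of an in-place shift along one axis. It is written in Python rather than C++ for brevity, and it is not the implementation from the post. It assumes a flat, row-major buffer and uses the classic three-reversal rotation; the names `fftshift_inplace` and `_reverse_strided` are hypothetical.

```python
from math import prod


def _reverse_strided(buf, start, stride, lo, hi):
    """Reverse elements lo..hi-1 of one strided line of buf, in place."""
    i, j = lo, hi - 1
    while i < j:
        a, b = start + i * stride, start + j * stride
        buf[a], buf[b] = buf[b], buf[a]
        i += 1
        j -= 1


def fftshift_inplace(buf, shape, axis):
    """Circularly shift a flat row-major buffer by n // 2 along `axis`.

    Assumption: buf holds prod(shape) elements in row-major (C) order.
    """
    n = shape[axis]
    stride = prod(shape[axis + 1:])  # distance between neighbours on `axis`
    outer = prod(shape[:axis])       # number of blocks in front of `axis`
    k = n // 2                       # fftshift moves each element k places right
    for o in range(outer):
        for inner in range(stride):
            start = o * n * stride + inner
            # Rotate right by k: reverse everything, then each part.
            _reverse_strided(buf, start, stride, 0, n)
            _reverse_strided(buf, start, stride, 0, k)
            _reverse_strided(buf, start, stride, k, n)
    return buf
```

For example, `fftshift_inplace([0, 1, 2, 3, 4], (5,), 0)` turns the list into `[3, 4, 0, 1, 2]`. The three-reversal trick needs no scratch buffer, at the cost of touching every element twice.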

The Moore-Penrose inverse, or pseudoinverse, $\mathbf{A}^+ \in \mathbb{R}^{n \times m}$ of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is a generalization of the inverse matrix to non-square or ill-conditioned matrices. The most confusing part of coding a pinv function is how to choose an appropriate tolerance for truncating near-zero singular values.
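
As an illustration of the tolerance question, the sketch below uses one common convention: treat a singular value as zero when it does not exceed `max(m, n) * eps * s_max`. This is similar to the default in `numpy.linalg.matrix_rank` and close to MATLAB's `pinv` default, but it is only one possible choice and not necessarily the one the post settles on. The function names are hypothetical, and the singular values are assumed to come from an SVD computed elsewhere.

```python
import sys


def pinv_tolerance(singular_values, shape):
    """Cutoff below which a singular value is treated as zero.

    Assumed convention: max(m, n) * machine epsilon * largest singular value.
    """
    eps = sys.float_info.epsilon  # double-precision machine epsilon
    return max(shape) * eps * max(singular_values)


def invert_singular_values(singular_values, shape):
    """Reciprocals of the singular values, with tiny ones truncated to 0."""
    tol = pinv_tolerance(singular_values, shape)
    return [1.0 / s if s > tol else 0.0 for s in singular_values]
```

With singular values `[2.0, 1e-17, 0.0]` of a 3×3 matrix, the cutoff is about `1.3e-15`, so only the first value is inverted and the other two are set to zero. The pseudoinverse is then assembled from these truncated reciprocals and the singular vectors.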

I’ve been struggling with calculating memory usage for a week. Here’s the case: I have a program that needs to estimate how much memory it may consume at runtime for some predefined inputs, such as the size of the images. The problem is that the program is so complicated that almost no one understands the code fully. On top of that, the program contains a lot of parallel code, so memory usage scales with the dynamic number of threads.