I used to do all my scientific computing work in Jupyter notebooks. My most common way of debugging was print statements, which is definitely not the best approach. Things got complicated when I occasionally had to debug other packages installed in the Python environment, and the debug mode provided by Jupyter is not very convenient. But I do like the freedom of running cells in Jupyter, which is important for testing new algorithms, so I don't want to move my workflow to other IDEs like PyCharm.
My final choice is VS Code with its gorgeous Python extensions.
Virtual Environment
Python has a gorgeous ecosystem of libraries and packages. Virtual environments allow users to pin specific versions of dependencies and even of the Python interpreter itself. Environments also provide isolation between projects, which is important for resolving conflicting dependencies.
I am using Conda as my environment management tool due to its ease of use. Specifically, I chose Miniconda, which only includes the essential components, allowing me to customize my environments from the ground up.
On Arch Linux, Miniconda is available in the AUR, so just install it:
yay -S miniconda3
After the installation, you will be prompted to add conda to your shell:
conda init zsh
This command automatically initializes conda for the zsh shell (by writing setup code into .zshrc in your home directory).
If you are in China, you may want to change the conda mirrors before creating any environments. See the conda mirror help for more information.
Create Virtual Environments
conda create -n test python=3.10
This command creates a virtual environment named test with Python 3.10 installed.
Activate/Deactivate Virtual Environments
The default virtual environment is “base”. Switching to our “test” environment is quite simple:
conda activate test
Or deactivate it:
conda deactivate
List All Environments
conda env list
Pretty easy, hah.
Package Management
Although conda is able to handle both package and environment management, I prefer to use pip to install the necessary packages. pip is included in every conda environment.
Again, if you are in China, change the pip mirrors before installing any packages. See the pip mirror help for more information.
Install Basic Scientific Packages
pip install numpy scipy pandas scikit-learn
Install Basic Visualization Packages
pip install matplotlib seaborn pyqt5
Here I choose PyQt as the backend of matplotlib.
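With PyQt5 installed, matplotlib usually picks the Qt backend on its own; if you want to request it explicitly, a minimal sketch looks like this:
import matplotlib
matplotlib.use("Qt5Agg")  # PyQt5-based interactive backend
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2 * np.pi, 200)
plt.plot(x, np.sin(x))  # a simple curve to confirm the interactive window works
plt.show()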
Install JupyterLab
pip install jupyterlab
The following commands generate the Jupyter server configuration files and set a password, so you can customize the server yourself. The configuration files live in the .jupyter folder in your home directory.
jupyter server --generate-config
jupyter server password
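The generated ~/.jupyter/jupyter_server_config.py is plain Python; as an illustration (the values here are just examples), you could set options such as:
c.ServerApp.port = 8888           # port the local Jupyter server listens on
c.ServerApp.open_browser = False  # do not launch a browser automatically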
We could start our scientific coding in JupyterLab right away, but I would recommend VS Code for a more comfortable coding experience. For now, let's just start a Jupyter server without opening the default browser page:
jupyter lab --no-browser
Then we can move on to the VS Code configuration.
VS Code
VS Code is developed by Microsoft. It is a lightweight but powerful code editor with a rich ecosystem of extensions for nearly every language.
Here I install the binary release from the AUR:
yay -S visual-studio-code-bin
Basic Extensions
After starting VS Code, install the Python and Jupyter extensions from the extension marketplace. This pretty much sets everything up for Python development automatically.
Python Formatting
A formatter makes your code look clean and readable. In any Python file, hit Ctrl+Shift+P, type format, select Format Document With, and hit Python. If you don't have a formatter installed, VS Code will pop up a window suggesting that you install one. I personally choose the black formatter.
Install the Black Formatter extension. In Settings, search for Format On Save and turn it on. Then every document will be formatted automatically each time you save it. Below is a picture of how the black formatter formats your code.
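As a rough illustration (my_function is just a placeholder), black turns the first line below into the second:
# before:
result=my_function(1,2,  3,key= 'value')
# after black:
result = my_function(1, 2, 3, key="value")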
Python Intellisense
The default Python IntelliSense should work out of the box. If not, go to Settings, search for python.languageServer, and change the Python language server from Default to Pylance or Jedi. Then restart VS Code. If it still does not work, you may need to disable all extensions, restart VS Code, and enable them again.
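If you prefer editing settings.json directly, the language server entry should look like this:
"python.languageServer": "Pylance"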
Python Debugging
The default Jupyter debugger only steps through user-written code. To debug library code, open Settings with Ctrl+, and search for justmycode, then uncheck the option and restart VS Code.
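In settings.json, this should correspond to an entry along these lines (key name as exposed by the Jupyter extension):
"jupyter.debugJustMyCode": false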
Run Jupyter Notebook in VS Code
There are many ways to create a Jupyter notebook in VS Code (shortcuts, commands, etc.). The easiest way is to create a new file named xxx.ipynb and open it directly in VS Code.
The first thing is to make sure you select the right Python kernel or environment to run the notebook; you may have multiple choices. Then toggle the Jupyter server selection menu and select Existing server. You will be prompted to enter your Jupyter password.
Running cells in VS Code is almost the same as in the Jupyter browser app. Triggering debug mode requires clicking the debug cell button alongside each cell. Just like MATLAB, you can set multiple breakpoints and watch how variables change with the program.
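For example, a throwaway cell like this (purely illustrative) gives you something to step through:
import numpy as np
values = np.random.rand(5)
total = 0.0
for v in values:
    total += v  # set a breakpoint here and watch total accumulate
print(total)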
Docker for Deep Learning
Docker allows you to package applications and their dependencies into isolated containers. One good reason for using Docker is that official PyTorch releases usually depend on older CUDA and cuDNN versions, whereas Arch-based distributions typically ship the latest NVIDIA driver. This may cause unexpected failures during installation. For example, my CUDA version is 12.2, whereas the officially suggested CUDA version for PyTorch is 11.8.
You can use Docker to create a container with a different version of NVIDIA CUDA than what you have installed locally on your machine. Docker containers are designed to be isolated environments that can run software with specific dependencies, regardless of what is installed on the host system.
Install Docker
Install the Docker engine on Arch with this command:
sudo pacman -S docker docker-compose
After that, create the docker group (if it does not already exist) and add the current user to it:
sudo groupadd docker
sudo usermod -aG docker $USER
Log out and log back in, then verify the installation with:
docker run hello-world
If everything works, it will pull a small image and print a hello-world message.
My home directory is quite empty, so I want to change Docker's default image location to my home directory:
sudo systemctl stop docker.service
Configure data-root in /etc/docker/daemon.json, pointing it at the new location (use your own path):
{
  "data-root": "/path/to/new/docker-data"
}
Restart docker.service to apply these changes.
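Since we stopped the service above, this simply starts it again:
sudo systemctl restart docker.service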
You may also need docker-compose if you have not installed it already:
sudo pacman -S docker-compose
Install NVIDIA Container Toolkit
This is the trickiest part, and I cannot guarantee that my installation steps will work for you. First, install the NVIDIA Container Toolkit from the AUR:
yay -S nvidia-container-toolkit
After that, restart docker.service and type:
docker run --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi
and hopefully you will get the same output as running nvidia-smi on the host machine.
Be aware of the maximum CUDA version supported by your host NVIDIA driver; check this list.
Build A Docker Image for DL
NVIDIA has pre-built PyTorch images bundled with CUDA, cuDNN, etc. Check the release notes to see if your requirements are already satisfied.
I am building my deep learning image based on anibali's PyTorch image. Here is my Dockerfile:
FROM anibali/pytorch:2.0.1-cuda11.8
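# A minimal sketch of the remaining steps (adapt to your own needs):
# install JupyterLab inside the image and expose it on port 8889
RUN pip install --no-cache-dir jupyterlab
EXPOSE 8889
# serve on all interfaces so the host can reach Jupyter through the mapped port
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8889", "--no-browser"]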
This Dockerfile builds an image that makes Jupyter available to external users. I personally change the default port to 8889 for compatibility with local Jupyter servers. My docker-compose.yml file looks like this:
version: '3'
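# A minimal sketch of the rest of the file; the service name and the
# in-container path are illustrative, adjust them to your setup.
services:
  dl:
    build: .
    ports:
      - "8889:8889"                                   # publish the in-container Jupyter port
    volumes:
      - /home/swolf/MyProjects:/workspace/MyProjects  # persistent project files
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]                     # GPU access via the NVIDIA Container Toolkit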
So everything in /home/swolf/MyProjects/ becomes a persistent volume inside the Docker container. Building the image is also simple:
docker-compose build
To start the Docker container, use this command:
docker-compose up
and stop the container after coding:
docker-compose down
Connect to Jupyter in Container
After starting the container above, open your browser and go to http://127.0.0.1:8889/. Then you can access any resources inside the container from the browser app.
Connect to Container in VS Code
Of course you could open any file in /home/swolf/MyProjects/ with VS Code, but it cannot resolve packages installed in the container, so Go to Definition and similar features may not work well. We actually want to "open" the IDE inside the container. For this, two extensions are needed: Docker and Dev Containers.
After that, there will be a small button in the bottom left corner named Open a Remote Window. Press it and a selection menu will pop up, prompting you to choose how to connect to a container.
Since we have already started our deep learning container, we choose Attach to Running Container. VS Code will open a new window connected to the running container, with a few necessary extensions installed automatically. You may need to install the Python and Jupyter extensions inside the container from the extension marketplace. Then you will be able to perform all deep learning tasks just as you would in a local development environment.