1. Install CUDA Toolkit
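The CUDA Toolkit includes the nvcc compiler, the CUDA runtime, and GPU libraries such as cuBLAS and cuFFT. cuDNN and TensorRT are separate NVIDIA downloads that build on it.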
- Download the CUDA Toolkit. Download link: CUDA Toolkit Archive
- Set up environment variables:
    export PATH=/usr/local/cuda-12.2/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
- Test the installation:
    nvcc --version # Display CUDA version
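The exports above only last for the current shell. A minimal sketch of a snippet that could go in ~/.bashrc instead, assuming the cuda-12.2 path used in this guide; `prepend_once` is a hypothetical helper name, and it only guards against duplicate entries when the file is sourced more than once:

```bash
# Prepend a directory to a colon-separated list unless it is already present.
prepend_once() {
    local dir="$1" list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;                 # already there, keep as-is
        *)          printf '%s\n' "${dir}${list:+:$list}" ;; # add in front
    esac
}

# Path taken from this guide; adjust to the version you installed.
CUDA_HOME=/usr/local/cuda-12.2
export PATH="$(prepend_once "$CUDA_HOME/bin" "$PATH")"
export LD_LIBRARY_PATH="$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
```

After opening a new shell, `nvcc --version` should work without re-running the exports by hand.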
2. Install Drivers
- Check if the system detects the NVIDIA GPU:
    lspci | grep -i nvidia
- Use ubuntu-drivers to check for the recommended NVIDIA driver version:
    ubuntu-drivers devices
- Install the recommended driver. To let the system automatically install the recommended NVIDIA driver:
    sudo ubuntu-drivers autoinstall
  If you need to install a specific driver version manually:
    sudo apt install nvidia-driver-535 # The recommended version on my system is 535
- After installation, reboot the system:
    sudo reboot
- Check if the NVIDIA driver is loaded:
    nvidia-smi # The output should match Step 3 from the first part
At this point, the installation should be complete.
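If you want to script the manual route, the recommended package name can be pulled out of the `ubuntu-drivers devices` output. This is a rough sketch that assumes the recommended entry is a single line containing both the package name and the word "recommended"; `recommended_driver` is a hypothetical helper, not part of ubuntu-drivers:

```bash
# Read `ubuntu-drivers devices` output on stdin and print the package
# name found on the line marked "recommended" (first match only).
recommended_driver() {
    grep 'recommended' | grep -oE 'nvidia-driver-[0-9]+(-[a-z]+)*' | head -n 1
}

# Possible usage (not run here):
#   pkg=$(ubuntu-drivers devices | recommended_driver)
#   [ -n "$pkg" ] && sudo apt install "$pkg"
```

If the helper prints nothing, fall back to reading the `ubuntu-drivers devices` output by eye.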
Troubleshooting
- If nvidia-smi produces no output or you see an error such as the following, work through the checks below:
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
- Run the following command to check if the driver is properly installed:
    dpkg -l | grep nvidia
  Look for an entry like nvidia-driver-535.
- Check if the NVIDIA module is loaded:
    lsmod | grep nvidia
- If there’s no output, try manually loading it:
    sudo modprobe nvidia
  If you encounter this error:
    modprobe: ERROR: could not insert 'nvidia': Operation not permitted
  This is likely due to Secure Boot being enabled.
- Check if Secure Boot is enabled:
    mokutil --sb-state
  If it shows SecureBoot enabled, you need to disable it, as Secure Boot prevents unsigned kernel modules (including the driver) from loading.
- Disabling Secure Boot:
  - Method 1: BIOS Settings
    - Restart the computer and enter the BIOS/UEFI settings.
    - Find the Secure Boot option and set it to Disabled.
    - Save the changes and exit the BIOS.
  - Method 2: MOK Settings
    - Run the following command to disable Secure Boot or register the key:
        sudo mokutil --disable-validation
    - Set a password when prompted.
    - Reboot the system:
        sudo reboot
    - Enter the MOK management interface:
      - “Continue Boot” to proceed with normal startup.
      - “Enroll MOK” to register keys (if you selected to import keys).
      - “Disable Secure Boot” (if you ran mokutil --disable-validation).
      - “Change Password” to change the password.
- Finally, check if the NVIDIA driver is properly loaded:
    nvidia-smi
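The module and Secure Boot checks in this section can be chained into one small script. The sketch below is only an illustration: `classify_sb_state` and `diagnose_nvidia` are hypothetical names, and it assumes `mokutil --sb-state` prints either "SecureBoot enabled" or "SecureBoot disabled", treating anything else (including mokutil not being installed) as unknown.

```bash
# Map the text printed by `mokutil --sb-state` to enabled/disabled/unknown.
classify_sb_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# Check whether the nvidia module is loaded; if not, report Secure Boot state.
diagnose_nvidia() {
    local state
    if lsmod 2>/dev/null | grep -q '^nvidia'; then
        echo "nvidia kernel module is loaded"
        return 0
    fi
    echo "nvidia kernel module is not loaded"
    state=$(classify_sb_state "$(mokutil --sb-state 2>&1)")
    echo "Secure Boot state: $state"
    if [ "$state" = "enabled" ]; then
        echo "Secure Boot may be blocking the module; see the steps above"
    fi
    return 1
}
```

Calling `diagnose_nvidia` after a failed `nvidia-smi` gives a quick hint about whether Secure Boot is the likely cause before you reboot into the BIOS.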
3. Nvidia-container-toolkit
This toolkit helps users access/build/run GPU-accelerated applications in containerized environments (like Docker). It includes a runtime library and associated utilities that automatically configure containers to leverage NVIDIA GPUs for efficient GPU acceleration in containerized applications.
Installation
- Check if nvidia-container-toolkit is installed:
    dpkg -l | grep nvidia-container-toolkit
- Install the toolkit:
    sudo apt-get install -y nvidia-container-toolkit
  If you encounter the error:
    E: Unable to locate package nvidia-container-toolkit
  Follow the steps below.
- Add NVIDIA GPG key:
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
- Add the NVIDIA container toolkit repository:
    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
- Update the package list:
    sudo apt-get update
- Install nvidia-container-toolkit:
    sudo apt-get install -y nvidia-container-toolkit
- Restart Docker service:
    sudo systemctl restart docker
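The repository URL depends on the `$distribution` string, so it can help to see what that expression evaluates to before running curl. A small sketch, with `distribution_id` as a hypothetical helper that does the same thing as the one-liner above but in a subshell, so the variables from os-release do not leak into your session:

```bash
# Print ID followed by VERSION_ID from an os-release style file.
# Defaults to /etc/os-release; the path argument is only for testing.
distribution_id() {
    local os_release="${1:-/etc/os-release}"
    ( . "$os_release" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Possible usage:
#   distribution=$(distribution_id)
#   echo "$distribution"
```

On Ubuntu 22.04 this prints ubuntu22.04. If the printed value looks wrong or is empty, the repository list URL built from it will not resolve.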
Usage
- Start a container with GPU support: When running a Docker container, use the --gpus all flag to enable GPU support, and the -v flag to mount the host system’s CUDA directory to the container. For example, if the host’s CUDA installation path is /usr/local/cuda, use:
    docker run --gpus all -it \
      -v /usr/local/cuda:/usr/local/cuda \
      your_docker_image
- Set up environment variables inside the container to ensure it can find nvcc:
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
- Test CUDA:
    nvcc --version
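If the host CUDA path varies between machines, the docker arguments can be assembled by a small wrapper that first checks the directory exists. This is a sketch only; `build_gpu_run_args` is a hypothetical helper, and it always mounts the host directory at /usr/local/cuda inside the container, as in the example above.

```bash
# Print the `docker` arguments (one per line) for a GPU-enabled container
# that mounts the host CUDA directory at /usr/local/cuda.
build_gpu_run_args() {
    local image="$1" cuda_dir="${2:-/usr/local/cuda}"
    if [ ! -d "$cuda_dir" ]; then
        echo "CUDA directory not found: $cuda_dir" >&2
        return 1
    fi
    printf '%s\n' run --gpus all -it -v "$cuda_dir:/usr/local/cuda" "$image"
}

# Possible usage in bash (not run here):
#   mapfile -t args < <(build_gpu_run_args your_docker_image /usr/local/cuda-12.2)
#   docker "${args[@]}"
```

Failing early on a missing directory avoids starting a container with an empty mount, which would otherwise only show up later when nvcc cannot be found.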
4. GPU Power Settings
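To limit power usage (changing the limit requires root):
sudo nvidia-smi -i 0 -pl 100 # -i 0 for the first GPU, -pl 100 to limit power to 100W
To restore the power limit (160 W on my GPU; nvidia-smi -q -d POWER shows the default for yours):
sudo nvidia-smi -i 0 -pl 160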
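nvidia-smi rejects a limit outside the range the card supports, so a script can clamp the requested value first. The sketch below is an assumption-laden illustration: `clamp_power_limit`, `is_uint`, and `set_power_limit` are hypothetical helpers, and it assumes the `power.min_limit` and `power.max_limit` query returns two comma-separated numbers such as "100.00, 250.00". Fractions are truncated, so double-check the result on cards with non-integer limits.

```bash
# Clamp a requested power limit (whole watts) to the range [min, max].
clamp_power_limit() {
    local requested="$1" min="$2" max="$3"
    if [ "$requested" -lt "$min" ]; then
        echo "$min"
    elif [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# Succeed only for a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Query the supported range for one GPU and apply the clamped limit.
set_power_limit() {
    local gpu="$1" requested="$2" limits min max
    limits=$(nvidia-smi -i "$gpu" --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits) || return 1
    min=${limits%%,*}; min=${min%%.*}                 # text before the comma, fraction dropped
    max=${limits##*,}; max=${max# }; max=${max%%.*}   # text after the comma, fraction dropped
    if ! is_uint "$min" || ! is_uint "$max"; then
        echo "could not parse power limits: $limits" >&2
        return 1
    fi
    sudo nvidia-smi -i "$gpu" -pl "$(clamp_power_limit "$requested" "$min" "$max")"
}
```

With these helpers, `set_power_limit 0 100` would apply 100 W to the first GPU, or the nearest supported value if 100 W is out of range.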