1. Install CUDA Toolkit

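The CUDA Toolkit includes the nvcc compiler, the CUDA runtime and math libraries (such as cuBLAS and cuFFT), and debugging and profiling tools. cuDNN and TensorRT are not part of it; they are separate downloads.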

Download the installer for your CUDA version (this guide uses 12.2) from the CUDA Toolkit Archive on NVIDIA's developer site.

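Set up environment variables:
```
export PATH=/usr/local/cuda-12.2/bin:$PATH
# ${LD_LIBRARY_PATH:+:...} avoids a trailing ":" (an empty entry, i.e. the current directory) when the variable is unset
export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```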
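The export lines above only apply to the current shell session. A minimal sketch for persisting them, assuming bash reads ~/.bashrc at startup; append_once is a hypothetical helper, not part of the CUDA tooling:

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
# Hypothetical helper; calling it repeatedly leaves a single copy of each line.
append_once() {
  local file=$1 line=$2
  touch "$file"
  # -x: whole-line match, -F: literal text, so "$" and "." are not treated as regex syntax
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (uncomment to modify your real ~/.bashrc):
# append_once "$HOME/.bashrc" 'export PATH=/usr/local/cuda-12.2/bin:$PATH'
# append_once "$HOME/.bashrc" 'export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}'
```

Open a new terminal (or run source ~/.bashrc) for the change to take effect.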

Test the installation
```
nvcc --version # Display CUDA version
```
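For scripts that need only the version number, it can be taken from the "Cuda compilation tools, release X.Y, ..." line of the nvcc --version banner. A minimal sketch, assuming that line format (cuda_release is a hypothetical helper):

```shell
# cuda_release: read `nvcc --version` output on stdin and print the "X.Y" release number.
# Assumes a banner line like "Cuda compilation tools, release 12.2, V12.2.140".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p' | head -n 1
}

# Usage on a real install:
#   nvcc --version | cuda_release
```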

2. Install Drivers

Check if the system detects the NVIDIA GPU:
```
lspci | grep -i nvidia
```
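grep -i nvidia also matches the card's HDMI audio function, so it can list more devices than there are GPUs. A minimal sketch that counts only display controllers, assuming lspci's usual "VGA compatible controller" / "3D controller" class names (count_nvidia_gpus is a hypothetical helper):

```shell
# count_nvidia_gpus: read `lspci` output on stdin and print how many NVIDIA display
# controllers it lists. Audio functions on the same card are not counted.
count_nvidia_gpus() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the exit status clean
  grep -ciE '(VGA compatible controller|3D controller): NVIDIA' || true
}

# Usage:
#   lspci | count_nvidia_gpus
```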
Use ubuntu-drivers to check for the recommended NVIDIA driver version:
```
ubuntu-drivers devices
```
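ubuntu-drivers devices tags one package as recommended. A minimal sketch that extracts that package name, assuming the usual "driver   : <package> - ... recommended" line layout (recommended_driver is a hypothetical helper):

```shell
# recommended_driver: read `ubuntu-drivers devices` output on stdin and print the
# package name (third whitespace-separated field) on the line tagged "recommended".
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Usage:
#   pkg=$(ubuntu-drivers devices | recommended_driver)
#   echo "$pkg"              # inspect before installing
#   sudo apt install "$pkg"
```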
Install the recommended driver. To let the system choose and install it automatically:
```
sudo ubuntu-drivers autoinstall
```
If you need to install a specific driver version manually:
```
sudo apt install nvidia-driver-535  # The recommended version on my system is 535
```
After installation, reboot the system:
```
sudo reboot
```

Check if the NVIDIA driver is loaded:
```
nvidia-smi  # The output should match Step 3 from the first part
```

At this point, the installation should be complete.

Troubleshooting

If there’s no output, or you see an error such as:
```
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```

Run the following command to check whether the driver package is installed:
```
dpkg -l | grep nvidia
```
Look for an entry like nvidia-driver-535.
Check if the NVIDIA module is loaded:
```
lsmod | grep nvidia
```
If there’s no output, try manually loading it:
```
sudo modprobe nvidia
```
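For a yes/no answer in a script, it is enough to check the first column of lsmod for the core nvidia module; helper modules such as nvidia_uvm and nvidia_drm appear on their own lines. A minimal sketch (nvidia_module_loaded is a hypothetical helper):

```shell
# nvidia_module_loaded: read `lsmod` output on stdin; exit 0 if the core "nvidia"
# module is listed in the first column, 1 otherwise.
nvidia_module_loaded() {
  awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nvidia_module_loaded; then echo "nvidia module loaded"; else sudo modprobe nvidia; fi
```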

If you encounter this error:
```
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
```
it is most likely because Secure Boot is enabled.

Check if Secure Boot is enabled:
```
mokutil --sb-state
```
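If it shows SecureBoot enabled, the kernel will refuse to load driver modules that are not signed with a trusted key. You can either disable Secure Boot as described below, or keep it on and enroll a Machine Owner Key (MOK) so the driver module can be signed.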
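The same check can be scripted. A minimal sketch, assuming mokutil prints a line beginning with "SecureBoot enabled" when Secure Boot is on (secure_boot_enabled is a hypothetical helper):

```shell
# secure_boot_enabled: read `mokutil --sb-state` output on stdin; exit 0 only if
# a line starts with "SecureBoot enabled".
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}

# Usage:
#   if mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
#     echo "Secure Boot is on: unsigned kernel modules will be rejected"
#   fi
```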
Disabling Secure Boot:

Method 1: BIOS Settings
1. Restart the computer and enter the BIOS/UEFI settings.
2. Find the Secure Boot option and set it to Disabled.
3. Save the changes and exit the BIOS.

Method 2: MOK Settings
1. Run the following command to ask shim to stop validating boot signatures:
```
sudo mokutil --disable-validation
```
2. Set a password when prompted; you will need it after the reboot.
3. Reboot the system:
```
sudo reboot
```
4. Enter the MOK management interface that appears during boot and choose the relevant option:
  - “Continue Boot” to proceed with normal startup.
  - “Enroll MOK” to register keys (if you chose to import a key).
  - “Disable Secure Boot” (if you ran mokutil --disable-validation); depending on the shim version, this may be labelled “Change Secure Boot state”.
  - “Change Password” to change the password.

Finally, check if the NVIDIA driver is properly loaded:
```
nvidia-smi
```

3. Nvidia-container-toolkit

This toolkit lets you build and run GPU-accelerated applications in containers (such as Docker). It provides a runtime library and utilities that automatically configure containers to use NVIDIA GPUs.

Installation

Check if nvidia-container-toolkit is installed:
```
dpkg -l | grep nvidia-container-toolkit
```
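grep nvidia-container-toolkit also matches related packages such as nvidia-container-toolkit-base, and entries that were removed but not purged (status rc). A minimal sketch that only succeeds for a package in the installed "ii" state (pkg_installed is a hypothetical helper; the same check works for nvidia-driver-535 in the troubleshooting section):

```shell
# pkg_installed NAME: read `dpkg -l` output on stdin; exit 0 if NAME is listed with
# status "ii" (desired=install, current=installed). An ":arch" suffix is ignored.
pkg_installed() {
  awk -v want="$1" '
    { name = $2; sub(/:.*/, "", name) }       # strip e.g. ":amd64"
    $1 == "ii" && name == want { found = 1 }
    END { exit !found }'
}

# Usage:
#   dpkg -l | pkg_installed nvidia-container-toolkit && echo "installed"
```

dpkg-query -W -f='${Status}' nvidia-container-toolkit is a more direct alternative; parsing dpkg -l keeps the sketch aligned with the command above.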

Install the toolkit:
```
sudo apt-get install -y nvidia-container-toolkit
```

If you encounter the error:
```
E: Unable to locate package nvidia-container-toolkit
```
the NVIDIA package repository has not been added to apt yet. Follow the steps below.

Add the NVIDIA GPG key:
```
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```

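Add the NVIDIA container toolkit repository:
```
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list |
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' |
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```
These commands use the older nvidia-docker repository; NVIDIA's current installation guide serves the GPG key and package list from https://nvidia.github.io/libnvidia-container/ instead.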
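Before writing the list file, it can help to see which $distribution string will be used and what the rewritten deb line looks like. A minimal sketch of both steps as reusable functions (distribution_id and add_signed_by are hypothetical helper names):

```shell
# distribution_id [FILE]: print ID followed by VERSION_ID from an os-release file
# (default /etc/os-release), e.g. "ubuntu22.04". The file is sourced in a subshell
# so its variables do not leak into the caller's environment.
distribution_id() {
  ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# add_signed_by KEYRING: rewrite "deb https://..." lines on stdin to
# "deb [signed-by=KEYRING] https://...", the same substitution as the sed above.
add_signed_by() {
  sed "s#deb https://#deb [signed-by=$1] https://#g"
}

# Preview (prints to the terminal, writes nothing):
#   curl -s -L "https://nvidia.github.io/nvidia-docker/$(distribution_id)/nvidia-docker.list" |
#     add_signed_by /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```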

Update the package list:
```
sudo apt-get update
```

Install nvidia-container-toolkit:
```
sudo apt-get install -y nvidia-container-toolkit
```

Restart Docker service:
```
sudo systemctl restart docker
```
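NVIDIA's installation guide also registers the NVIDIA runtime with Docker by running sudo nvidia-ctk runtime configure --runtime=docker and then restarting Docker; that command adds an "nvidia" entry under "runtimes" in /etc/docker/daemon.json. A minimal sketch for checking whether that entry is present, assuming the default file location (docker_has_nvidia_runtime is a hypothetical helper and does a plain text match, not full JSON parsing):

```shell
# docker_has_nvidia_runtime [FILE]: exit 0 if the Docker daemon config
# (default /etc/docker/daemon.json) exists and mentions an "nvidia" key.
docker_has_nvidia_runtime() {
  local cfg=${1:-/etc/docker/daemon.json}
  [ -f "$cfg" ] && grep -q '"nvidia"' "$cfg"
}

# Usage:
#   docker_has_nvidia_runtime || sudo nvidia-ctk runtime configure --runtime=docker
#   sudo systemctl restart docker
```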

Usage

Start a container with GPU support: pass --gpus all to docker run to expose the GPUs, and use -v to mount the host’s CUDA directory into the container. For example, if CUDA is installed at /usr/local/cuda on the host:
```
docker run --gpus all -it \
  -v /usr/local/cuda:/usr/local/cuda \
  your_docker_image
```

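Set up environment variables inside the container so that nvcc and the CUDA libraries can be found:
```
export PATH=/usr/local/cuda/bin:$PATH
# ${LD_LIBRARY_PATH:+:...} avoids a trailing ":" (an empty entry, i.e. the current directory) when the variable is unset
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```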
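If these export lines go into a startup file that can run more than once (for example, in nested shells inside the container), PATH collects duplicate entries. A minimal sketch of an idempotent prepend (path_prepend is a hypothetical helper; it only handles PATH):

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already present.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                     # already present: leave PATH unchanged
    *) PATH="$1${PATH:+:$PATH}" ;;   # prepend, avoiding a trailing ":" if PATH is empty
  esac
  export PATH
}

# Usage:
#   path_prepend /usr/local/cuda/bin
```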

Test CUDA:
```
nvcc --version
```

4. GPU Power Settings

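To limit power usage (changing the limit requires root):
```
sudo nvidia-smi -i 0 -pl 100  # -i 0 selects the first GPU, -pl 100 limits it to 100 W
```
To restore the original limit (160 W on this card; check your GPU's default with nvidia-smi -i 0 -q -d POWER):
```
sudo nvidia-smi -i 0 -pl 160
```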
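-pl only accepts values between the card's minimum and maximum power limits, which differ by GPU model. A minimal sketch that checks a requested wattage against that range before applying it, assuming the "min, max" output of nvidia-smi --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits (power_limit_ok is a hypothetical helper):

```shell
# power_limit_ok WATTS RANGE: exit 0 if WATTS lies within RANGE, a single "min, max"
# line such as the one printed by
#   nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits
power_limit_ok() {
  printf '%s\n' "$2" | awk -F', *' -v w="$1" '
    { ok = (w + 0 >= $1 + 0 && w + 0 <= $2 + 0) }   # numeric comparison on both bounds
    END { exit !ok }'
}

# Usage:
#   range=$(nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits)
#   power_limit_ok 100 "$range" && sudo nvidia-smi -i 0 -pl 100
```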