OS             OS version   GPU         Driver
Ubuntu Server  20.04        RTX A6000   515.48.07

Install Ubuntu Server on the Server

Ubuntu Server

Download the Ubuntu Server ISO file from the Kakao mirror server.

Flash the ISO File to a USB Drive

Rufus Software

Flash the Ubuntu Server ISO file to a USB drive using Rufus.
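
Before flashing, it can be worth confirming that the download is intact. Ubuntu mirrors publish a SHA256SUMS file next to the ISO images, and the sketch below compares a file's SHA-256 digest against an expected value. The function name verify_sha256 and the file name in the usage comment are illustrative assumptions, and the sketch presumes a Unix-like shell that provides sha256sum.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds when the SHA-256 digest of FILE equals EXPECTED_HEX (case-insensitive).
# Returns 2 when FILE does not exist.
verify_sha256() {
    [ -f "$1" ] || return 2
    actual="$(sha256sum "$1" | awk '{print $1}')"
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    [ "$actual" = "$expected" ]
}

# Usage (hypothetical file name; take the digest from the mirror's SHA256SUMS file):
#   verify_sha256 ubuntu-server.iso "DIGEST_FROM_SHA256SUMS" && echo "ISO OK"
```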

Install Ubuntu Server on the Server Machine

  1. GNU GRUB
    • Install Ubuntu Server
  2. Select Language
    • English(US)
  3. Select Keyboard configuration
    • English(US)
  4. Network connections
    • Edit IPv4
    • IPv4 Method:
      • Manual
      • Subnet : ..*.0/24
      • Address : ...
      • Gateway : ..*.1
      • Name servers : ..., ...
      • Search domains : (leave empty)
      • Save
  5. Proxy address
    • Skip (Done)
  6. Mirror address
    • Skip (Done)
  7. Select Disk and Disk Size
    • Use an entire disk
      • Done
    • USED DEVICES
      • ubuntu-lv … size -> Entire size
      • Done
  8. Profile setup
  9. Install OpenSSH server
    • Check
    • Done
  10. Featured Server Snaps
    • Skip (Done)
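
The static IPv4 settings from step 4 end up in a netplan file on the installed system (typically /etc/netplan/00-installer-config.yaml). As a rough sketch of what such a file looks like on Ubuntu 20.04, the function below prints an equivalent configuration. The interface name eno1 and the 192.0.2.x addresses are placeholders from the documentation address range, not the values used on this server, and print_netplan is a hypothetical helper name.

```shell
# print_netplan IFACE ADDRESS/PREFIX GATEWAY DNS1 DNS2
# Prints a netplan (version 2) configuration for one statically addressed interface.
print_netplan() {
    printf 'network:\n'
    printf '  version: 2\n'
    printf '  ethernets:\n'
    printf '    %s:\n' "$1"
    printf '      dhcp4: false\n'
    printf '      addresses: [%s]\n' "$2"
    printf '      gateway4: %s\n' "$3"
    printf '      nameservers:\n'
    printf '        addresses: [%s, %s]\n' "$4" "$5"
}

# Placeholder values only (192.0.2.0/24 is reserved for documentation)
print_netplan eno1 192.0.2.10/24 192.0.2.1 192.0.2.53 192.0.2.54
```

If the real file is edited later, `sudo netplan apply` activates the change.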

Disable Nouveau

Disabling Nouveau and installing CUDA on Ubuntu 20.04 LTS

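sudo vim /etc/modprobe.d/blacklist.conf
# Add the following lines

blacklist nouveau
options nouveau modeset=0

# Rebuild the initramfs so that the blacklist applies at boot
sudo update-initramfs -u

sudo reboot

# After the reboot, this prints nothing if Nouveau is disabled
lsmod | grep nouveau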
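
The manual edit and the lsmod check above could also be scripted. The following is a minimal sketch, assuming the same file path and module name as above; the helper names add_line_once and module_listed are made up for illustration.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical line already exists.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# module_listed NAME: reads lsmod-style text on stdin and succeeds
# when NAME appears in the first column.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage, from a root shell (not executed here):
#   add_line_once /etc/modprobe.d/blacklist.conf "blacklist nouveau"
#   add_line_once /etc/modprobe.d/blacklist.conf "options nouveau modeset=0"
#   update-initramfs -u
#   reboot
#   lsmod | module_listed nouveau || echo "nouveau is not loaded"
```

Because add_line_once only appends missing lines, running the script twice does not duplicate the blacklist entries.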

Language Settings

sudo su

apt-get update && apt-get install -y dialog language-pack-en

sudo update-locale && sudo vi /etc/default/locale

# Add the following lines

LANG="en_US.UTF-8"
LANGUAGE="en_US:en"
LC_ALL="en_US.UTF-8"
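
Instead of editing /etc/default/locale by hand, update-locale also accepts VAR=value arguments and writes the file itself. The sketch below wraps that call and adds a small checker. Both function names are hypothetical, and the optional file argument of check_locale_file exists only so that the check can be tried on a copy.

```shell
# set_server_locale: writes the three variables through update-locale (needs root).
set_server_locale() {
    update-locale LANG=en_US.UTF-8 LANGUAGE=en_US:en LC_ALL=en_US.UTF-8
}

# check_locale_file [FILE]: succeeds when FILE (default /etc/default/locale)
# sets LANG to en_US.UTF-8, with or without surrounding quotes.
check_locale_file() {
    grep -Eq '^LANG="?en_US\.UTF-8"?$' "${1:-/etc/default/locale}"
}

# Usage, from a root shell (not executed here):
#   set_server_locale
#   check_locale_file && echo "locale configured"
```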

Install the NVIDIA RTX A6000 Driver

Installing the NVIDIA driver on a server - Ubuntu 16.04 LTS Server (Titan XP)

# Check which NVIDIA driver packages are available for the server

apt-cache search nvidia

sudo apt-get install nvidia-driver-515-server

sudo reboot

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    Off  | 00000000:41:00.0 Off |                  Off |
| 30%   57C    P0    86W / 300W |      0MiB / 49140MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
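
For scripted checks, the table above is awkward to parse. nvidia-smi can print selected fields as CSV instead, and the sketch below compares the reported driver version with the one listed at the top of this page. The helper name driver_matches is an assumption made for illustration.

```shell
# driver_matches EXPECTED: reads driver versions (one per line) on stdin and
# succeeds only if there is at least one line and every line equals EXPECTED.
driver_matches() {
    awk -v want="$1" '
        NF { n++; if (($1 "") != (want "")) bad = 1 }
        END { exit (n == 0 || bad) }'
}

# Only query the GPU when nvidia-smi is actually installed
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
    if nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_matches 515.48.07; then
        echo "driver version matches"
    else
        echo "unexpected driver version" >&2
    fi
fi
```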

Install Docker Engine

https://docs.docker.com/engine/install/ubuntu/

# Remove any older Docker packages
sudo apt-get remove docker docker-engine docker.io containerd runc

# Install the prerequisites
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg lsb-release

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update && sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
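
The echo command above builds the APT source entry from two command substitutions. As an illustration of what ends up in docker.list, the sketch below produces an equivalent entry from explicit arguments. docker_repo_line is a hypothetical helper, and amd64 and focal are the values one would expect on a 64-bit x86 Ubuntu 20.04 host.

```shell
# docker_repo_line ARCH CODENAME: prints the APT source entry for Docker's repository.
docker_repo_line() {
    printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu %s stable\n' "$1" "$2"
}

docker_repo_line amd64 focal
# prints: deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu focal stable

# On the server itself the arguments would come from the same commands as above:
#   docker_repo_line "$(dpkg --print-architecture)" "$(lsb_release -cs)"
```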

# Install nvidia-container-toolkit

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

sudo systemctl restart docker

Install Nvidia-Docker

nvidia-docker GitHub

nvidia-docker Install Guide

# Set up the NVIDIA Container Toolkit
## Set up the package repository and the GPG key

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install nvidia-docker2

sudo apt-get update && sudo apt-get install -y nvidia-docker2

# Restart Docker

sudo systemctl restart docker

# Verify that the installation works

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
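
If the container test fails, one thing worth checking is whether Docker knows about the nvidia runtime, which the nvidia-docker2 package registers in /etc/docker/daemon.json. The sketch below inspects the runtime list reported by docker info. The helper name has_nvidia_runtime is hypothetical, and the check assumes the JSON form of the output, where each runtime name appears as a quoted key.

```shell
# has_nvidia_runtime: reads the JSON runtime map from `docker info` on stdin
# and succeeds when it contains a key named "nvidia".
has_nvidia_runtime() {
    grep -q '"nvidia":'
}

# Only query Docker when the CLI is installed; prefix the command with sudo
# if the current user is not in the docker group yet.
if command -v docker >/dev/null 2>&1; then
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered"
    else
        echo "nvidia runtime not found" >&2
    fi
fi
```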

Add the User to the Docker Group

sudo usermod -aG docker $USER && sudo service docker restart
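
The new group membership only reaches a login session after logging out and back in (or after running `newgrp docker`). A small sketch for checking the membership recorded in the group database follows; in_group is a hypothetical helper name.

```shell
# in_group USER GROUP: succeeds when USER is listed as a member of GROUP.
# `id -nG USER` reads the group database, so it reflects usermod changes
# even before the user has logged in again.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

if in_group "$(id -un)" docker; then
    echo "$(id -un) is in the docker group"
else
    echo "$(id -un) is not in the docker group" >&2
fi
```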

Pull Docker Image

pytorchlightning Image

docker pull pytorchlightning/pytorch_lightning
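
Once the image is pulled, a quick smoke test can confirm that PyTorch inside the container sees the GPU. The sketch below is one way to do that. It assumes the image provides a python interpreter with torch installed, and gpu_smoke_test is a hypothetical helper whose arguments are the command used to invoke Docker, which makes a dry run possible.

```shell
IMAGE="pytorchlightning/pytorch_lightning"

# gpu_smoke_test DOCKER_CMD...: starts a throwaway container with all GPUs
# attached and asks PyTorch whether CUDA is usable.
gpu_smoke_test() {
    "$@" run --rm --gpus all "$IMAGE" \
        python -c 'import torch; print(torch.cuda.is_available())'
}

# Dry run: prints the docker command instead of executing it
gpu_smoke_test echo docker

# Real run (not executed here); this should print True when the GPU is visible:
#   gpu_smoke_test docker
```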