Installing the NVIDIA GPU driver on CentOS 8 and running machine learning in Docker
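1. Download the driver
a. Check the GPU model. This machine has a GTX 1050 Ti, so the driver for that model has to be downloaded from the NVIDIA website.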
[root@localhost ~]# lspci|grep -i nvidia
00:10.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
00:10.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
Go to the NVIDIA website (the "Official Drivers | NVIDIA" page) and choose the driver that matches your GPU model.
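When several machines need checking, the model name can be pulled out of the lspci output with a script. The following is only a minimal sketch: the helper name nvidia_gpu_models is made up for illustration, and it assumes the model is the bracketed text on the display controller line, as in the output above.

```python
import re
from typing import List


def nvidia_gpu_models(lspci_output: str) -> List[str]:
    """Return the bracketed model names of NVIDIA display controllers.

    Hypothetical helper: assumes the model appears in square brackets,
    as in the lspci output shown in this article.
    """
    models = []
    for line in lspci_output.splitlines():
        # Keep display controllers only; skip the HDMI audio function.
        if "NVIDIA" not in line:
            continue
        if not re.search(r"VGA compatible controller|3D controller", line):
            continue
        match = re.search(r"\[([^\]]+)\]", line)
        if match:
            models.append(match.group(1))
    return models


sample = (
    "00:10.0 VGA compatible controller: NVIDIA Corporation GP107 "
    "[GeForce GTX 1050 Ti] (rev a1)\n"
    "00:10.1 Audio device: NVIDIA Corporation GP107GL "
    "High Definition Audio Controller (rev a1)"
)
print(nvidia_gpu_models(sample))
```

The returned name is what you would look up on the driver download page.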
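Install the driver
chmod a+x NVIDIA-Linux-x86_64-515.76.run
./NVIDIA-Linux-x86_64-515.76.run
a. The installer warns that a built-in driver is already present on the system. Ignore the warning and press Enter to continue.
b. Disable the built-in driver. When the installer asks whether to create the blacklist configuration automatically, use Tab to select Yes, then press Enter.
After that, keep pressing Enter until the installer exits with an error.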
The installer writes the configuration that disables nouveau to two files, which can be inspected with:
cat /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
cat /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
[root@localhost ~]# cat /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
[root@localhost ~]# cat /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
c. Reboot the system so the configuration takes effect, then log in and install the dependencies.
dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils elfutils-libelf-devel libglvnd-devel
dnf install -y epel-release
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
dnf install -y kernel kernel-core kernel-modules
Run the driver installer again.
./NVIDIA-Linux-x86_64-515.76.run
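If the second run still fails, two prerequisites from the steps above are worth re-checking: that nouveau really is blacklisted, and that kernel build files exist for the running kernel. The sketch below is illustrative only. The function names are invented here, and it assumes that kernel-devel exposes its files through /lib/modules/<release>/build, which is the usual layout on CentOS but is not stated in this article.

```python
import os
import platform


def nouveau_is_blacklisted(conf_text: str) -> bool:
    """True if the text has an uncommented 'blacklist nouveau' directive."""
    for line in conf_text.splitlines():
        # Drop trailing comments, then compare the remaining words.
        directive = line.split("#", 1)[0].strip()
        if directive.split() == ["blacklist", "nouveau"]:
            return True
    return False


def kernel_build_dir(release: str = "", modules_root: str = "/lib/modules") -> str:
    """Path where build files for the given kernel release are expected.

    Assumption: kernel-devel links its tree at <modules_root>/<release>/build.
    """
    return os.path.join(modules_root, release or platform.release(), "build")


if __name__ == "__main__":
    conf = "/etc/modprobe.d/nvidia-installer-disable-nouveau.conf"
    if os.path.exists(conf):
        with open(conf) as fh:
            print("nouveau blacklisted:", nouveau_is_blacklisted(fh.read()))
    else:
        print("not found:", conf)
    build = kernel_build_dir()
    print(build, "exists:", os.path.isdir(build))
```

A missing build directory would suggest that the installed kernel-devel package does not match the output of uname -r, for example because the kernel was updated without a reboot.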
Check the GPU information
[root@localhost ~]# nvidia-smi
Mon Oct 24 20:36:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:10.0 Off |                  N/A |
| 20%   38C    P0    N/A /  75W |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The NVIDIA modules are now loaded in the kernel:
[root@localhost ~]# lsmod|grep nvidia
nvidia_drm             69632  0
nvidia_modeset       1142784  1 nvidia_drm
nvidia              40812544  1 nvidia_modeset
drm_kms_helper        266240  5 drm_vram_helper,bochs_drm,nvidia_drm
drm                   585728  8 drm_kms_helper,drm_vram_helper,bochs_drm,nvidia,drm_ttm_helper,nvidia_drm,ttm
Install docker-ce, with support for using the GPU inside containers
dnf install -y tar bzip2 make automake gcc gcc-c++ vim pciutils elfutils-libelf-devel libglvnd-devel iptables
Set up the docker-ce repository
dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
Install docker-ce
dnf install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
Start Docker and enable it at boot
sudo systemctl --now enable docker
Test that Docker works
sudo docker run --rm hello-world
The output should look roughly like this:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete
Digest: sha256:18a657d0cc1c7d0678a3fbea8b7eb4918bba25968d3e1b0adebfa71caddbc346
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
Install the NVIDIA container support toolkit
Set up the repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
Install the package and restart the Docker service
dnf install -y nvidia-docker2
systemctl restart docker
Test that containers can use the GPU
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
The output is as follows:
[root@localhost ~]# sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Mon Oct 24 12:59:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:10.0 Off |                  N/A |
| 20%   39C    P0    N/A /  75W |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Test
Start a container with GPU access and run a test
docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
The output is as follows:
[root@localhost ~]# docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
[I 01:39:15.201 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
jupyter_http_over_ws extension initialized. Listening on /http_over_websocket
[I 01:39:16.364 NotebookApp] Serving notebooks from local directory: /tf
[I 01:39:16.364 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 01:39:16.364 NotebookApp] http://b6f4b9f884f9:8888/?token=5dbb788fda348efc71e58ed07407d83a1ad0b26c5496fdaf
[I 01:39:16.364 NotebookApp] or http://127.0.0.1:8888/?token=5dbb788fda348efc71e58ed07407d83a1ad0b26c5496fdaf
[I 01:39:16.364 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 01:39:16.384 NotebookApp]
1. Open a new terminal window and run the following command to monitor the GPU
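watch -n1 nvidia-smi
# nvidia-smi -l 1   # this variant floods the terminal with output
2. In a browser, open the server's address on port 8888, then enter the token to log in and test.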
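The token is the value after token= in the Jupyter startup log above. If the log has been saved to a file or captured from the container, it can be extracted with a short script. This is only a sketch: find_jupyter_token is a name chosen for illustration, <server-address> is a placeholder for your own host, and the regular expression assumes the token is a hexadecimal string, as it is in the log shown here.

```python
import re
from typing import Optional


def find_jupyter_token(log_text: str) -> Optional[str]:
    """Return the first token found in Jupyter log text, or None.

    Assumes the token is hexadecimal, as in the log shown in this article.
    """
    match = re.search(r"[?&]token=([0-9a-f]+)", log_text)
    return match.group(1) if match else None


sample = (
    "[I 01:39:16.364 NotebookApp] or "
    "http://127.0.0.1:8888/?token=5dbb788fda348efc71e58ed07407d83a1ad0b26c5496fdaf"
)
token = find_jupyter_token(sample)
# <server-address> is a placeholder; substitute the real host name or IP.
print(f"http://<server-address>:8888/?token={token}")
```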
Create a new file with the following content:
import tensorflow as tf
import timeit


def cpu_run():
    # Multiply two random matrices on the CPU
    with tf.device('/cpu:0'):
        cpu_a = tf.random.normal([10000, 1000])
        cpu_b = tf.random.normal([1000, 2000])
        c = tf.matmul(cpu_a, cpu_b)
    return c


def gpu_run():
    # Same multiplication, placed on the first GPU
    with tf.device('/gpu:0'):
        gpu_a = tf.random.normal([10000, 1000])
        gpu_b = tf.random.normal([1000, 2000])
        c = tf.matmul(gpu_a, gpu_b)
    return c


# Time 10 runs of each version
cpu_time = timeit.timeit(cpu_run, number=10)
gpu_time = timeit.timeit(gpu_run, number=10)
print("cpu:", cpu_time, " gpu:", gpu_time)
While it runs, GPU usage is visible in the nvidia-smi window.
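To record the GPU load instead of watching it by eye, nvidia-smi can print selected fields as CSV through its --query-gpu and --format options. The sketch below is one possible way to read them from Python on the host. The function names are invented for this example, and it assumes every queried field is numeric; a field reported as unsupported would need extra handling.

```python
import subprocess
from typing import List, Tuple

# Query GPU utilization (%) and used memory (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'util, mem' lines into (utilization_percent, memory_mib) tuples.

    Assumes numeric fields; one line per GPU.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return the parsed values (needs the driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() in a loop with a one-second sleep while the benchmark runs gives a simple log of the same numbers that watch -n1 nvidia-smi displays.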
TensorFlow now works correctly inside the container.
List the physical devices
>>> import tensorflow as tf
>>> tf.config.experimental.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>>
Errors encountered
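1. The system was installed in a KVM virtual machine. TensorFlow failed to identify the CPU, and the Python process crashed immediately.
Aborted (core dumped)
[root@localhost ~]# lscpu |grep 'Model name'
Model name: Common KVM processor
BIOS Model name: pc-i440fx-6.2
The likely cause is that TensorFlow does not recognize this generic virtual CPU type.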
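One way to probe this is to look at the instruction-set flags the guest CPU exposes. The sketch below rests on an assumption that this article does not confirm: that the crash comes from a missing extension such as AVX, which prebuilt TensorFlow binaries are generally compiled to use. The function names are illustrative.

```python
import os
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text: str) -> bool:
    """True if the CPU advertises AVX (assumed here to be the missing feature)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as fh:
            print("AVX available:", has_avx(fh.read()))
```

If this prints False inside the virtual machine but True on the host, the generic virtual CPU model is hiding the extension from the guest.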
Fix: set the virtual machine's CPU type to host, so the guest sees the physical CPU.
[root@localhost ~]# lscpu |grep 'Model name'
Model name: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
BIOS Model name: pc-i440fx-6.2