Installing the NVIDIA GPU driver on CentOS 8 and running machine learning in Docker

Published: 2024/3/13

1. Download the driver

a. Check the GPU model. The card here is a GTX 1050 Ti, so the driver for that model must be downloaded from the NVIDIA website.

[root@localhost ~]# lspci|grep -i nvidia
00:10.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
00:10.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
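The model lookup can also be scripted, for example when checking several machines. The following is a minimal sketch that assumes `lspci` output in the format shown above; the names `find_nvidia_devices` and `marketing_name` are illustrative helpers, not part of any existing tool.

```python
import re

# Assumption: each lspci line looks like
# "00:10.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)"
LSPCI_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[^:]+):\s+(?P<desc>.*NVIDIA.*)$", re.IGNORECASE
)


def find_nvidia_devices(lspci_output):
    """Return (slot, device class, description) for every NVIDIA line in lspci output."""
    devices = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            devices.append((match["slot"], match["cls"].strip(), match["desc"].strip()))
    return devices


def marketing_name(description):
    """Return the bracketed product name (e.g. 'GeForce GTX 1050 Ti'), or None if absent."""
    match = re.search(r"\[([^\]]+)\]", description)
    return match.group(1) if match else None
```

The bracketed product name is the string to search for on the driver download page; the audio function of the card has no bracketed name, so `marketing_name` returns None for it.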

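Go to the official driver page on the NVIDIA website (Official Drivers | NVIDIA) and select the driver that matches your GPU model.

Install

chmod a+x NVIDIA-Linux-x86_64-515.76.run
./NVIDIA-Linux-x86_64-515.76.run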

a. The installer reports that a built-in driver already exists on the system. Ignore the message and press Enter to continue.

b. The installer asks whether it should automatically create a configuration file that disables the built-in driver. Use Tab to select Yes, then press Enter.

Then keep pressing Enter until the installer reports an error and exits. The installer has now written the following configuration files:

[root@localhost ~]# cat /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
[root@localhost ~]# cat /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
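Before rebooting, it can be worth confirming that the generated file really disables nouveau. The sketch below checks the text of such a file; it assumes the two-directive layout shown above, and `nouveau_is_disabled` is a hypothetical helper name.

```python
def nouveau_is_disabled(conf_text):
    """Return True if the modprobe config text blacklists nouveau and sets modeset=0."""
    # Assumption: the file uses the "blacklist nouveau" / "options nouveau modeset=0"
    # directives written by nvidia-installer, as in the listing above.
    blacklisted = False
    modeset_off = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        tokens = line.split()
        if tokens[:2] == ["blacklist", "nouveau"]:
            blacklisted = True
        elif tokens[:2] == ["options", "nouveau"] and "modeset=0" in tokens[2:]:
            modeset_off = True
    return blacklisted and modeset_off
```

A typical call would read /etc/modprobe.d/nvidia-installer-disable-nouveau.conf with `pathlib.Path(...).read_text()` and pass the result to the function.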

c. Reboot the system so that the configuration takes effect, then log in and install the required packages.

dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils elfutils-libelf-devel libglvnd-devel
dnf install -y epel-release
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
dnf install -y kernel kernel-core kernel-modules
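The `$(uname -r)` suffix matters because the installer compiles a kernel module and needs development files for the kernel that is actually running. As a rough sketch, the check below looks for a matching tree under /usr/src/kernels, which is where kernel-devel normally places its files on CentOS; treat that path and the helper names as assumptions.

```python
import os
import platform


def kernel_source_dir(release=None, root="/usr/src/kernels"):
    """Return the directory where kernel-devel for `release` is expected to live."""
    # Assumption: kernel-devel installs to <root>/<kernel release>, as on CentOS/RHEL.
    release = release or platform.release()  # same string that `uname -r` prints
    return os.path.join(root, release)


def has_matching_kernel_devel(release=None, root="/usr/src/kernels"):
    """Return True if a kernel-devel tree exists for the given (or running) kernel."""
    return os.path.isdir(kernel_source_dir(release, root))
```

If `has_matching_kernel_devel()` returns False after the packages are installed, the running kernel and the installed kernel-devel version probably differ, and a reboot into the newer kernel may be needed before running the installer again.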

Run the driver installer again

./NVIDIA-Linux-x86_64-515.76.run

Check the GPU information

The NVIDIA modules are now loaded in the kernel:

[root@localhost ~]# lsmod|grep nvidia
nvidia_drm 69632 0
nvidia_modeset 1142784 1 nvidia_drm
nvidia 40812544 1 nvidia_modeset
drm_kms_helper 266240 5 drm_vram_helper,bochs_drm,nvidia_drm
drm 585728 8 drm_kms_helper,drm_vram_helper,bochs_drm,nvidia,drm_ttm_helper,nvidia_drm,ttm
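The same check can be automated. The sketch below parses `lsmod` text and reports which of the three NVIDIA modules from the listing are missing; the function names and the default list of required modules are illustrative choices based on that output.

```python
def parse_lsmod(lsmod_output):
    """Map module name -> (size in bytes, use count, list of dependent modules)."""
    modules = {}
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines, the "Module Size Used by" header, and shell prompts.
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        used_by = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[1]), int(fields[2]), used_by)
    return modules


def missing_nvidia_modules(modules, required=("nvidia", "nvidia_modeset", "nvidia_drm")):
    """Return the required module names that are not loaded."""
    # Assumption: these three modules, taken from the listing above, are the ones to expect.
    return [name for name in required if name not in modules]
```

An empty list from `missing_nvidia_modules` means all three modules are present in the parsed output.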

Install docker-ce with support for GPU use inside containers

dnf install -y tar bzip2 make automake gcc gcc-c++ vim pciutils elfutils-libelf-devel libglvnd-devel iptables

Set up the docker-ce repository

dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo

Install docker-ce

dnf install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y

Start Docker and enable it at boot

sudo systemctl --now enable docker

Test that Docker works

sudo docker run --rm hello-world

The output looks roughly like this

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete
Digest: sha256:18a657d0cc1c7d0678a3fbea8b7eb4918bba25968d3e1b0adebfa71caddbc346
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Install the NVIDIA container support toolkit

Set up the repository

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
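The shell one-liner builds the repository URL from the ID and VERSION_ID fields of /etc/os-release. To see which URL it will request on a given machine, the same logic can be sketched in Python; `parse_os_release` and `repo_url` are hypothetical helper names, and the URL pattern is copied from the command above.

```python
import shlex


def parse_os_release(text):
    """Parse os-release KEY=value lines into a dict, removing shell-style quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # handles "8" and "CentOS Linux" alike
        values[key] = parts[0] if parts else ""
    return values


def repo_url(os_release_text):
    """Build the repo URL the same way the shell one-liner does ($ID$VERSION_ID)."""
    # Assumption: the URL layout is exactly the one used in the curl command above.
    info = parse_os_release(os_release_text)
    distribution = info["ID"] + info["VERSION_ID"]
    return ("https://nvidia.github.io/libnvidia-container/"
            + distribution + "/libnvidia-container.repo")
```

On a system whose os-release contains ID="centos" and VERSION_ID="8", the distribution string is `centos8`.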

Install the package and restart the Docker service

dnf install -y nvidia-docker2
systemctl restart docker

Test that containers can use the GPU

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

The output is as follows

[root@localhost ~]# sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Mon Oct 24 12:59:21 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:00:10.0 Off |                  N/A |
| 20%   39C    P0    N/A /  75W |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
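The header line of this table is the quickest thing to check from a script. Note that the "CUDA Version" shown there is the highest CUDA version the host driver supports, which is why it reads 11.7 even though the image is tagged 11.0.3. A minimal sketch for extracting the versions, assuming the header layout shown above (the name `parse_smi_header` is illustrative):

```python
import re

# Assumption: the header line has the form
# "| NVIDIA-SMI 515.76  Driver Version: 515.76  CUDA Version: 11.7 |"
HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(smi_output):
    """Return {'smi', 'driver', 'cuda'} version strings from nvidia-smi text, or None."""
    match = HEADER.search(smi_output)
    return match.groupdict() if match else None
```

A result of None means no header was found, for example because the container could not reach the GPU and `nvidia-smi` printed an error instead of the table.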

Test

Start a GPU-enabled container and run a test

docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter

The output is as follows

[root@localhost ~]# docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
[I 01:39:15.201 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
jupyter_http_over_ws extension initialized. Listening on /http_over_websocket
[I 01:39:16.364 NotebookApp] Serving notebooks from local directory: /tf
[I 01:39:16.364 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 01:39:16.364 NotebookApp] http://b6f4b9f884f9:8888/?token=5dbb788fda348efc71e58ed07407d83a1ad0b26c5496fdaf
[I 01:39:16.364 NotebookApp] or http://127.0.0.1:8888/?token=5dbb788fda348efc71e58ed07407d83a1ad0b26c5496fdaf
[I 01:39:16.364 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 01:39:16.384 NotebookApp]
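The URLs in the log use the container ID or 127.0.0.1 as the host, so from another machine the host part has to be replaced with the server's address. The sketch below pulls the token URL out of the log text and swaps the host; the helper names are hypothetical and the address in the usage note is a documentation placeholder.

```python
import re
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Assumption: the notebook server prints URLs of the form http://<host>:<port>/?token=<hex>
URL_WITH_TOKEN = re.compile(r"https?://\S+\?token=[0-9a-f]+")


def jupyter_urls(log_text):
    """Return every token-bearing URL printed in the notebook server log."""
    return URL_WITH_TOKEN.findall(log_text)


def url_for_host(url, host):
    """Swap the hostname in a notebook URL, keeping port, path and token."""
    parts = urlsplit(url)
    netloc = host if parts.port is None else "{}:{}".format(host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def token_of(url):
    """Return the token query parameter of a notebook URL."""
    return parse_qs(urlsplit(url).query)["token"][0]
```

For example, `url_for_host(jupyter_urls(log)[0], "192.0.2.10")` gives a URL that can be pasted into a browser on another machine, where 192.0.2.10 stands in for the real server address.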

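1. Open a new terminal window and run the following command to monitor the GPU

watch -n1 nvidia-smi
# nvidia-smi -l 1   # this variant floods the terminal with scrolling output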
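When the numbers need to be logged rather than watched, `nvidia-smi` can print selected fields as CSV through its `--query-gpu` and `--format` options. The following is a small sketch built on that; the choice of fields and the function names are assumptions made for illustration.

```python
import subprocess

# Assumption: these three query fields are enough for a quick utilization log.
QUERY_FIELDS = ("utilization.gpu", "memory.used", "memory.total")


def parse_gpu_stats(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # ignore lines that do not match the requested fields
        # Non-numeric values such as "[N/A]" are stored as None.
        stats.append({
            field: int(value) if value.isdigit() else None
            for field, value in zip(QUERY_FIELDS, values)
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return utilization (%) and memory (MiB) per GPU."""
    output = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_stats(output)
```

Calling `query_gpu_stats()` in a loop with `time.sleep(1)` gives roughly the same view as `watch -n1 nvidia-smi`, but as data that can be written to a file.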

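2. In a browser, open the server's address on port 8888, then enter the token to log in and test

Create a new file with the following content

import tensorflow as tf
import timeit

# Multiply two random matrices on the CPU.
def cpu_run():
    with tf.device('/cpu:0'):
        cpu_a = tf.random.normal([10000, 1000])
        cpu_b = tf.random.normal([1000, 2000])
        c = tf.matmul(cpu_a, cpu_b)
    return c

# Run the same multiplication on the first GPU.
def gpu_run():
    with tf.device('/gpu:0'):
        gpu_a = tf.random.normal([10000, 1000])
        gpu_b = tf.random.normal([1000, 2000])
        c = tf.matmul(gpu_a, gpu_b)
    return c

# Time 10 runs of each version and print the totals in seconds.
cpu_time = timeit.timeit(cpu_run, number=10)
gpu_time = timeit.timeit(gpu_run, number=10)
print("cpu:", cpu_time, " gpu:", gpu_time)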
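One caveat with this kind of comparison: the first call on a device typically includes one-time setup cost, which can make the GPU look slower than it is. A common remedy is to run each function once untimed before measuring. The helper below is a generic sketch of that idea, not something from the original post; `time_after_warmup` is a hypothetical name and it works with any zero-argument callable.

```python
import timeit


def time_after_warmup(fn, warmup=1, number=10):
    """Call fn `warmup` times untimed, then return mean seconds per call over `number` calls."""
    # Assumption: one warm-up call is enough to absorb one-time initialization cost.
    for _ in range(warmup):
        fn()
    return timeit.timeit(fn, number=number) / number
```

With the functions from the listing above, the comparison becomes `time_after_warmup(cpu_run)` versus `time_after_warmup(gpu_run)`.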

While the code is running, the GPU usage is visible in the nvidia-smi window.

TensorFlow now works correctly inside the container.

List the physical devices

>>> import tensorflow as tf
>>> tf.config.experimental.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>>

Errors encountered

1. The system was installed in a KVM virtual machine. CPU detection failed and the Python process crashed immediately:

Aborted (core dumped)

[root@localhost ~]# lscpu |grep 'Model name'
Model name: Common KVM processor
BIOS Model name: pc-i440fx-6.2

TensorFlow presumably does not recognize this CPU type, which causes the failure.

Fix: set the virtual machine's CPU type to host.

[root@localhost ~]# lscpu |grep 'Model name'
Model name: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
BIOS Model name: pc-i440fx-6.2
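The post does not identify the exact cause. One plausible explanation, offered here only as an assumption, is that the generic KVM CPU model hides instruction-set extensions such as AVX that prebuilt TensorFlow binaries are commonly compiled to use. The sketch below checks the flags line of /proc/cpuinfo for such extensions; the helper names and the default flag list are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("avx",)):
    """Return the wanted flags that the CPU does not advertise."""
    # Assumption: AVX is the extension of interest; the original post does not confirm this.
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]
```

Running `missing_flags(open("/proc/cpuinfo").read())` inside the virtual machine before and after switching the CPU type to host shows whether the change exposed additional flags.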
