Updating the NVIDIA GPU driver on an Ubuntu 16.04 server: command-line install and fixes for the errors I hit
Why would you need this?
- My tensorflow 2.1.0 requires CUDA 10.1 + cuDNN 7.6.5, and CUDA 10.1 in turn requires the NVIDIA GPU driver to be upgraded to >= 418.39. See: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components
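Before upgrading, it can help to check whether the installed driver already meets that minimum. The following is a minimal sketch, not part of the original steps: the helper names version_ge and check_driver are made up for illustration, and it assumes GNU sort (for the -V option) is available, as it is on Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of any NVIDIA tool.

# Minimum driver for CUDA 10.1, as quoted above from the release notes.
REQUIRED_DRIVER="418.39"

# Succeed (exit 0) if version $1 >= version $2.
# sort -V orders dotted version strings; if the smaller one is $2, then $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print whether the driver version given as $1 is new enough.
check_driver() {
    if version_ge "$1" "$REQUIRED_DRIVER"; then
        echo "driver $1 is new enough for CUDA 10.1"
    else
        echo "driver $1 is too old; need >= $REQUIRED_DRIVER"
    fi
}
```

If a driver is already installed, something like check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" should feed in the current version.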
Installation
1. Look up the right driver version on the official site
First, the official download link: https://www.nvidia.cn/Download/index.aspx?lang=cn
Next, how do you pick the right NVIDIA driver? Pay attention to the following:
- Operating system (mine is 64-bit Linux)
- GPU model (mine is a Tesla V series; run lspci | grep -i nvidia and the answer is obvious at a glance)
- CUDA toolkit version (you should already know this, or you can query it with nvcc -V; mine is CUDA 10.1)
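To pull just the release number out of the nvcc -V output, a small filter like the one below may be handy. This is a sketch under the assumption that the output contains a line of the form "Cuda compilation tools, release 10.1, V10.1.243"; the function name cuda_release is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc -V` output on stdin and print the release
# number, e.g. "10.1". Lines without "release X.Y," produce no output.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

Usage would be nvcc -V | cuda_release.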
Download the .run file, then run it directly from the command line:
sudo ./NVIDIA-Linux-x86_64-418.126.02.run -no-x-check -no-nouveau-check -no-opengl-files
Errors encountered during installation:
Problem 1:
An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
This one is simple. As the message says, some program is still holding the 'nvidia-uvm' module, so the installation cannot proceed. What to do?
Run the following command to see which programs are using nvidia-uvm:
sudo lsof | grep nvidia.uvm
Once you have the PID, kill the process with "sudo kill -9 <pid>". Run the downloaded .run file again and this error no longer appears.
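When several processes hold the device, listing just the distinct PIDs saves some squinting at the lsof output. A minimal sketch, assuming the standard lsof layout where the PID is the second column; the function name uvm_pids is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lsof` output on stdin and print the unique PIDs
# (second column) of every line that mentions nvidia-uvm, sorted numerically.
uvm_pids() {
    awk '/nvidia-uvm/ { print $2 }' | sort -un
}
```

Usage would be sudo lsof 2>/dev/null | uvm_pids. Check what each PID is (for example with ps -p <pid>) before killing it, since on a shared server it may be someone else's job.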
Problem 2:
The CC version check failed
The kernel was built with gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) , but the current compiler version is cc (Ubuntu 4.8.5-4ubuntu2) 4.8.5.
This may lead to subtle problems; if you are not certain whether the mismatched compiler will be compatible with your kernel, you may wish to abort installation, set the CC environment variable to the name of the compiler used to compile your kernel, and restart installation. (Answer: Abort installation)
This one is simple too. As the message says, the kernel was built with gcc 5.4.0, but the compiler currently in use is gcc 4.8.5. We need to install gcc 5.4.0 and switch the active compiler to it.
How? Download the gcc 5.4.0 tarball from the official site, extract it locally, and install it step by step. Following this article is enough: https://blog.csdn.net/Marilynviolet/article/details/100009979
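To confirm which gcc built the running kernel before and after switching compilers, the version can be read from /proc/version. A minimal sketch, assuming the "gcc version X.Y.Z" wording that 16.04-era kernels print there (newer kernels format this differently); the function name kernel_gcc_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read /proc/version-style text on stdin and print the
# gcc version that built the kernel, e.g. "5.4.0".
kernel_gcc_version() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}
```

Usage would be kernel_gcc_version < /proc/version, compared by eye against cc --version | head -n 1. The installer message itself also suggests setting CC: if a matching compiler is already installed (the path /usr/bin/gcc-5 is only an assumption), something like sudo CC=/usr/bin/gcc-5 ./NVIDIA-Linux-x86_64-418.126.02.run -no-x-check -no-nouveau-check -no-opengl-files may avoid rebuilding gcc from source.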
Problem 3:
Some prompts you will run into during installation:
- The distribution-provided pre-install script failed! Are you sure you want to continue? Choose Yes to continue.
- Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. Choose No to continue.
- Nvidia's 32-bit compatibility libraries? Choose No to continue.
- Would you like to run the nvidia-xconfig utility to automatically update your X configuration so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. Choose Yes to continue.
Additional references:
- Installing, querying, and switching gcc versions: https://cloud.tencent.com/developer/article/1430839
- A roundup of problems during installation: https://blog.csdn.net/u013832707/article/details/93157805
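One last bit of rambling. I did this install and configuration on a company server, so everyone had to stop their GPU jobs and wait for me. At first I wanted to ask a veteran for help, but while chatting I realized that plenty of veterans just follow blog posts too. This kind of configuration problem is not really hard, just fiddly; you have to spend the time and do it hands-on. After all, nobody is born a veteran.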