Building TensorFlow 2.4.1 on macOS with GPU (CUDA 10.1) and AVX2/FMA support

Published: 2024/3/12

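Contents

  • 1. Why build TensorFlow yourself?
  • 2. Build environment
    • 2.1 Install the required software
  • 3. Build steps
    • 3.1 Install the Python packages
    • 3.2 Clone the source
    • 3.3 Patch the source
    • 3.4 Configure the build
    • 3.5 Build
    • 3.6 Package the wheel
    • 3.7 Install the built wheel
    • 3.8 Run a test
  • 4. Summary
    • 4.1 References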

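1. Why build TensorFlow yourself?

TensorFlow no longer publishes official GPU packages for macOS, because Nvidia no longer provides macOS graphics drivers. The official CPU packages are also not optimized for instruction sets such as AVX2 and FMA, so running a model prints this message: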

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Reportedly, enabling AVX2, FMA, and similar instruction sets can speed up CNN models by about 40%.

To see which instruction sets your CPU supports, run:

sysctl -a | grep "machdep.cpu.*features:"
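The output of that command can also be checked programmatically. The sketch below is only an illustration: `parse_cpu_features` and the `SAMPLE` text are hypothetical, and the sample is a shortened stand-in for real `sysctl` output, which lists many more flags.

```python
import re

def parse_cpu_features(sysctl_output: str) -> set:
    """Collect the flags from every 'machdep.cpu.*features:' line."""
    features = set()
    for line in sysctl_output.splitlines():
        match = re.match(r"machdep\.cpu\.\w*features:\s*(.*)", line)
        if match:
            features.update(match.group(1).split())
    return features

# Hypothetical, shortened sample of what sysctl prints on an Intel Mac.
SAMPLE = """\
machdep.cpu.features: FPU VME SSE4.2 AVX1.0 FMA
machdep.cpu.leaf7_features: BMI1 AVX2 BMI2
machdep.cpu.extfeatures: SYSCALL XD EM64T
"""

found = parse_cpu_features(SAMPLE)
missing = {"AVX2", "FMA"} - found
print("missing:", sorted(missing))
```

To run it against the real machine, pass in `subprocess.run(["sysctl", "-a"], capture_output=True, text=True).stdout` instead of `SAMPLE`.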

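There are usually two reasons to build TensorFlow yourself:

  • You run a Hackintosh with an Nvidia card, or a Mac with an external GPU, and want to use the GPU.
  • You run on the CPU and want the speedup from AVX2, FMA, and similar instructions.

2. Build environment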

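I used a PC with an Intel i5 CPU and a GTX 1050 Ti GPU, running macOS as a Hackintosh. Because Nvidia's drivers only support macOS up to 10.13, the system has to run High Sierra.

Note:

To drive the GPU, you must install High Sierra.

If you only build the optimized CPU package, you can use a newer macOS version.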

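2.1 Install the required software

  • Graphics driver and CUDA 10.1
    A shortcut for installing the Nvidia driver on macOS is available at https://github.com/Benjamin-Dobell/nvidia-update
    Just run:

        bash <(curl -s https://raw.githubusercontent.com/Benjamin-Dobell/nvidia-update/master/nvidia-update.sh)

    After installation, check that System Information shows the graphics card correctly.

    Download the CUDA 10.1 macOS installer from the Nvidia website and install it. A new "CUDA" item then appears in System Preferences.

    Download the cuDNN 7.6 package from the Nvidia website, unpack it, and copy the files into the CUDA installation directory.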

  • Xcode 10.1
    Download and install it from the Apple Developer website, or find a copy on Baidu Cloud.

  • Python 3
    Download and install: https://www.python.org/ftp/python/3.7.9/python-3.7.9-macosx10.9.pkg

  • bazel 3.7.2
    This is the minimum version required to build TensorFlow 2.4.
    Download the executable: https://github.com/bazelbuild/bazel/releases/download/3.7.2/bazel-3.7.2-darwin-x86_64
    Then make it executable, link it into /usr/local/bin, and check that it prints its version:

        chmod +x ~/Downloads/bazel-3.7.2-darwin-x86_64
        ln -s ~/Downloads/bazel-3.7.2-darwin-x86_64 /usr/local/bin/bazel
        bazel --version
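If you script the setup, the version check can be automated. This is a minimal sketch, assuming `bazel --version` prints a line of the form `bazel 3.7.2`. The helper names are hypothetical, and `MIN_BAZEL` is simply the minimum stated in this article; TensorFlow's own configure script enforces its own accepted range.

```python
import re

MIN_BAZEL = (3, 7, 2)  # minimum as stated in this article, not read from TensorFlow

def parse_bazel_version(output: str) -> tuple:
    """Turn a line like 'bazel 3.7.2' into the tuple (3, 7, 2)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())

def is_new_enough(output: str, minimum: tuple = MIN_BAZEL) -> bool:
    """Compare numerically, so that 3.10.0 correctly counts as newer than 3.7.2."""
    return parse_bazel_version(output) >= minimum

print(is_new_enough("bazel 3.7.2"))
print(is_new_enough("bazel 3.1.0"))
```

Comparing integer tuples rather than strings avoids the classic mistake where "3.10.0" sorts before "3.7.2".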

3. Build steps

3.1 Install the Python packages

    pip3 install -U pip numpy wheel
    pip3 install -U keras_preprocessing --no-deps

3.2 Clone the source

    git clone https://github.com/tensorflow/tensorflow
    cd tensorflow
    git checkout v2.4.1

3.3 Patch the source

A few source files need changes before they compile on macOS. Download the patch file from the GitHub project linked in section 4, then apply it:

    git am 2.4.1.patch

3.4 Configure the build

If you are building the GPU package, answer y at the CUDA support prompt; otherwise answer N.

    ./configure
    You have bazel 3.7.2 installed.
    Please specify the location of python. [Default is /usr/local/bin/python3]:

    Found possible Python library paths:
      /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
    Please input the desired Python library path to use. Default is [/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages]

    Do you wish to build TensorFlow with ROCm support? [y/N]: N
    No ROCm support will be enabled for TensorFlow.

    Do you wish to build TensorFlow with CUDA support? [y/N]: y
    CUDA support will be enabled for TensorFlow.

    Found CUDA 10.1 in:
        /usr/local/cuda/lib
        /Developer/NVIDIA/CUDA-10.1/include
    Found cuDNN 7 in:
        /usr/local/cuda/lib
        /Developer/NVIDIA/CUDA-10.1/include

    Please specify a list of comma-separated CUDA compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
    Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.0,3.5,5.0,6.1,7.0

    WARNING: XLA does not support CUDA compute capabilities lower than 3.5. Disable XLA when running on older GPUs.
    Do you want to use clang as CUDA compiler? [y/N]: N
    nvcc will be used as CUDA compiler.

    Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:

    Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
    Not configuring the WORKSPACE for Android builds.

    Do you wish to build TensorFlow with iOS support? [y/N]: N
    No iOS support will be enabled for TensorFlow.

    Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl             # Build with MKL support.
        --config=mkl_aarch64     # Build with oneDNN support for Aarch64.
        --config=monolithic      # Config for mostly static monolithic build.
        --config=ngraph          # Build with Intel nGraph support.
        --config=numa            # Build with NUMA support.
        --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
        --config=v2              # Build TensorFlow 2.x instead of 1.x.
    Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws           # Disable AWS S3 filesystem support.
        --config=nogcp           # Disable GCP support.
        --config=nohdfs          # Disable HDFS support.
        --config=nonccl          # Disable NVIDIA NCCL support.
    Configuration finished
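The prompts above can usually be answered ahead of time through environment variables, which helps when the build is scripted. The sketch below is an assumption-heavy illustration: `configure_env` is a hypothetical helper, and the variable names are the ones I believe TensorFlow's configure script reads, so check them against `configure.py` in your checkout before relying on them. The default capability 6.1 matches the GTX 1050 Ti used here; the transcript above entered a longer list, at the cost of build time.

```python
def configure_env(cuda: bool, capabilities=("6.1",)) -> dict:
    """Build environment variables mirroring the answers in the transcript."""
    env = {
        "PYTHON_BIN_PATH": "/usr/local/bin/python3",  # default shown by the prompt
        "TF_NEED_ROCM": "0",
        "TF_NEED_CUDA": "1" if cuda else "0",
        "CC_OPT_FLAGS": "-Wno-sign-compare",          # default shown by the prompt
        "TF_SET_ANDROID_WORKSPACE": "0",
        "TF_CONFIGURE_IOS": "0",
    }
    if cuda:
        # The CUDA-only questions are skipped for a CPU build.
        env["TF_CUDA_COMPUTE_CAPABILITIES"] = ",".join(capabilities)
        env["TF_CUDA_CLANG"] = "0"                    # use nvcc, as in the transcript
        env["GCC_HOST_COMPILER_PATH"] = "/usr/bin/gcc"
    return env

# Print the settings as shell exports to paste before running ./configure.
for key, value in sorted(configure_env(cuda=True).items()):
    print(f"export {key}={value}")
```

Any question without a matching variable is still asked interactively, so a wrong or missing name shows up as an extra prompt rather than a silent misconfiguration.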

3.5 Build

Build with the following command:

    bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 //tensorflow/tools/pip_package:build_pip_package
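Each `--copt` flag should correspond to an instruction set the CPU actually supports, otherwise the resulting binary can crash with an illegal-instruction error. As a rough sketch, the command can be assembled from the detected features. The `COPT_BY_FEATURE` mapping and the function name are assumptions for illustration; the keys follow the spelling macOS uses in its `sysctl` output.

```python
# Feature name as printed by sysctl on macOS -> compiler flag used in this article.
COPT_BY_FEATURE = {
    "AVX1.0": "-mavx",
    "AVX2": "-mavx2",
    "FMA": "-mfma",
    "SSE4.2": "-msse4.2",
}
TARGET = "//tensorflow/tools/pip_package:build_pip_package"

def bazel_build_command(cpu_features) -> list:
    """Return the bazel argv, adding a --copt only for supported features."""
    cmd = ["bazel", "build", "-c", "opt"]
    for feature, flag in COPT_BY_FEATURE.items():
        if feature in cpu_features:
            cmd.append(f"--copt={flag}")
    cmd.append(TARGET)
    return cmd

print(" ".join(bazel_build_command({"AVX1.0", "AVX2", "FMA", "SSE4.2"})))
```

With all four features present, this reproduces the command shown above; on an older CPU without AVX2, the `--copt=-mavx2` flag is simply left out.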

If you hit the errors below, comment out the two offending functions in bazel-tensorflow/external/com_google_absl/absl/container/internal/compressed_tuple.h:

    external/com_google_absl/absl/container/internal/compressed_tuple.h:171:53: error: use 'template' keyword to treat 'Storage' as a dependent template name
        return (std::move(*this).internal_compressed_tuple::Storage< CompressedTuple, I> ::get());
        ^
        template
    external/com_google_absl/absl/container/internal/compressed_tuple.h:177:54: error: use 'template' keyword to treat 'Storage' as a dependent template name
        return (absl::move(*this).internal_compressed_tuple::Storage< CompressedTuple, I> ::get());
        ^
        template
    2 errors generated.

The build is slow and may take up to 8 hours (the run logged below took about 5). If nothing fails, it ends with:

    ...
    Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
      bazel-bin/tensorflow/tools/pip_package/build_pip_package
    INFO: Elapsed time: 17902.809s, Critical Path: 684.61s
    INFO: 7578 processes: 41 internal, 7537 local.
    INFO: Build completed successfully, 7578 total actions
    INFO: Build completed successfully, 7578 total actions
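Bazel reports the elapsed time in seconds, which is hard to read at this scale. A small sketch of the conversion follows; `elapsed_hours` is a hypothetical helper, and the input is the line from the log above.

```python
import re

def elapsed_hours(bazel_output: str) -> float:
    """Extract bazel's 'Elapsed time: <seconds>s' value and convert it to hours."""
    match = re.search(r"Elapsed time: ([\d.]+)s", bazel_output)
    if match is None:
        raise ValueError("no 'Elapsed time' line found")
    return float(match.group(1)) / 3600

line = "INFO: Elapsed time: 17902.809s, Critical Path: 684.61s"
print(f"{elapsed_hours(line):.2f} hours")
```

For this run, 17902.809 s / 3600 comes to just under 5 hours.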

3.6 Package the wheel

    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

This writes a wheel into /tmp/tensorflow_pkg, for example tensorflow-2.4.1-cp37-cp37m-macosx_10_13_x86_64.whl
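The wheel's file name encodes which interpreter and platform it was built for, which explains why it only installs under a matching Python. The sketch below splits the name into its standard fields. `parse_wheel_name` is a hypothetical helper, and it assumes the common five-field form without the optional build tag.

```python
def parse_wheel_name(filename: str) -> dict:
    """Split '<name>-<version>-<python>-<abi>-<platform>.whl' into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    stem = filename[: -len(".whl")]
    # Assumes no optional build tag, so the stem has exactly five fields.
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

info = parse_wheel_name("tensorflow-2.4.1-cp37-cp37m-macosx_10_13_x86_64.whl")
print(info)
```

Here `cp37` means the wheel targets CPython 3.7, and `macosx_10_13_x86_64` means it was built for macOS 10.13 on 64-bit Intel.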

3.7 Install the built wheel

    pip3 install /tmp/tensorflow_pkg/tensorflow-2.4.1-cp37-cp37m-macosx_10_13_x86_64.whl

3.8 Run a test

Run any model. If you see output like the following, the GPU build is working:

    2021-02-19 12:27:55.699299: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.10.1.dylib
    2021-02-19 12:27:57.935779: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
    2021-02-19 12:27:57.959578: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.dylib
    2021-02-19 12:27:57.984405: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:902] OS X does not support NUMA - returning NUMA node zero
    2021-02-19 12:27:57.984958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
    coreClock: 1.392GHz coreCount: 6 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
    2021-02-19 12:27:57.985295: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.10.1.dylib
    2021-02-19 12:27:58.067490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.10.dylib
    2021-02-19 12:27:58.067768: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.10.dylib
    2021-02-19 12:27:58.121329: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.10.dylib
    2021-02-19 12:27:58.143269: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.10.dylib
    2021-02-19 12:27:58.234257: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.10.dylib
    2021-02-19 12:27:58.288708: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.10.dylib
    2021-02-19 12:27:58.372196: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.7.dylib
    2021-02-19 12:27:58.372467: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:902] OS X does not support NUMA - returning NUMA node zero
    2021-02-19 12:27:58.372987: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:902] OS X does not support NUMA - returning NUMA node zero
    2021-02-19 12:27:58.373252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
    ...
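If you would rather not train a model just to check the install, a short script can ask TensorFlow directly. This is a minimal sketch: `gpu_report` is a hypothetical helper built on `tf.test.is_built_with_cuda()` and `tf.config.list_physical_devices("GPU")`, and it returns None when TensorFlow is not importable.

```python
def gpu_report():
    """Describe the installed TensorFlow's GPU support, or return None without it."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means the build works but no GPU is visible.
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }

print(gpu_report())
```

On a working GPU build, `built_with_cuda` should be True and `gpus` should contain one entry per card. A CPU-only build reports False and an empty list.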

If you see the error below, upgrade numpy:

    F tensorflow/python/lib/core/bfloat16.cc:714] Check failed: PyBfloat16_Type.tp_base != nullptr
    Abort trap: 6

Upgrade with:

    pip3 install -U numpy

4. Summary

Good luck with your build! If you run into problems, feel free to contact me. The project is on GitHub at https://github.com/evan-wu/tensorflow-macosx-build. Follows, likes, and stars are welcome.

4.1 References

  • Official build documentation: https://www.tensorflow.org/install/source#macos_1
  • Existing CPU-optimized packages: https://github.com/lakshayg/tensorflow-build
  • Source patches up to TensorFlow 2.2: https://github.com/TomHeaven/tensorflow-osx-build
