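MPI TCP connection error: MPI Distributed Programming -- 3. OpenMPI multi-node run error
1. OpenMPI multi-node run error
Problem description: node one (host3) uses mpirun to launch an MPI program on node two (host4), and the following error is reported.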
$ mpirun -np 1 --host host4 ./main
[[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 367
[[INVALID],INVALID]-[[59225,0],0] mca_oob_tcp_peer_try_connect: connect to 255.255.255.255:51754 failed: Network is unreachable (101)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
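Solution
First confirm that node one and node two can each run OpenMPI programs on their own as single machines. Then check whether both nodes have the same OpenMPI version installed. If the versions differ, reinstall OpenMPI so that they match.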
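The version check described above can be sketched as a small shell helper. This is a minimal sketch rather than a definitive tool. It assumes passwordless ssh to each host, and that the first line printed by `mpirun --version` looks like `mpirun (Open MPI) 4.1.2`. The function names `ompi_version_of`, `compare_versions`, and `check_hosts` are hypothetical and do not come from the original post.

```shell
#!/bin/sh
# Sketch: verify that two nodes report the same Open MPI version.
# Assumptions (not from the original post): passwordless ssh to each host,
# and `mpirun --version` printing a first line like "mpirun (Open MPI) 4.1.2".

# Extract the version number from the text printed by `mpirun --version`.
# Prints nothing if the first line does not contain "(Open MPI)".
ompi_version_of() {
    printf '%s\n' "$1" | sed -n '1s/.*(Open MPI) *//p'
}

# Compare two version strings; print the result and return non-zero on mismatch.
# An empty version (for example, mpirun not found) also counts as a mismatch.
compare_versions() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "OK: both nodes run Open MPI $1"
    else
        echo "MISMATCH: '$1' vs '$2'"
        return 1
    fi
}

# Query both hosts over ssh and compare what they report.
check_hosts() {
    v1=$(ompi_version_of "$(ssh "$1" mpirun --version 2>&1)")
    v2=$(ompi_version_of "$(ssh "$2" mpirun --version 2>&1)")
    compare_versions "$v1" "$v2"
}
```

After sourcing the file, `check_hosts host3 host4` would run the comparison for the two nodes in this post. If a host reports an empty version, mpirun was probably not found in the non-interactive ssh session, which corresponds to the PATH and LD_LIBRARY_PATH cause listed in the error message.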
References
[1. OpenMPI error issue] https://www.slothparadise.com/fix-orte-error-unknown-option-hnp-topo-sig/