System: Ubuntu 18.04.6
I followed these instructions to install Open MPI:
wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.1.tar.gz
tar -zxvf openmpi-5.0.1.tar.gz
cd openmpi-5.0.1
./configure --prefix=$HOME/openmpi CC=gcc CXX=g++ --disable-mpi-fortran --disable-mca-dso
make
make install
I then added two lines at the bottom of ~/.bashrc to update the environment variables, and installed mpi4py.
When I run the train.py script, it fails with:
ImportError: libmpi.so.12: cannot open shared object file: No such file or directory
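One way to narrow this down is to see which libmpi versions are actually installed. As far as I know, Open MPI 5.0.x installs libmpi.so.40, while libmpi.so.12 is the name shipped by MPICH and by the old Open MPI 1.10 series, so mpi4py may have been built against a different MPI than the one compiled above. The sketch below only lists candidates. The helper name list_libmpi is made up for illustration, and the system library directory is an assumption.

```shell
# List every libmpi shared library found directly under the given directories.
# Directories that do not exist are skipped silently.
list_libmpi() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -name 'libmpi.so*' 2>/dev/null
    done
}

# $HOME/openmpi/lib matches the --prefix used above; the system path is a guess.
list_libmpi "$HOME/openmpi/lib" /usr/lib/x86_64-linux-gnu
```

If no libmpi.so.12 shows up in any of these directories, the import error is expected regardless of what ~/.bashrc exports.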
Then I followed a suggestion I found online and installed Open MPI with conda:
conda install openmpi
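With both the source build and the conda package present, more than one mpirun may be on PATH, and the first one found is the one that gets used. A minimal sketch for listing them in lookup order follows. The helper name all_on_path is hypothetical, and it assumes a POSIX-style shell such as bash or sh.

```shell
# Print every executable called "$1" found on PATH, in lookup order.
all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -x "$dir/$name" ] && printf '%s\n' "$dir/$name"
    done
    IFS=$old_ifs
    return 0
}

# The first line printed is the mpirun the shell would actually run.
all_on_path mpirun
```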
Now the error is:
--------------------------------------------------------------------------
The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:
plm_rsh_agent: ssh : rsh
Please either unset the parameter, or check that the path is correct
--------------------------------------------------------------------------
[718c7e141fd5:75498] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error plm_rsh_component.c(327)
[718c7e141fd5:75498] *** Process received signal ***
[718c7e141fd5:75498] Signal: Segmentation fault (11)
[718c7e141fd5:75498] Signal code: Address not mapped (1)
[718c7e141fd5:75498] Failing at address: (nil)
[718c7e141fd5:75498] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f82eb125980]
[718c7e141fd5:75498] *** End of error message ***
[718c7e141fd5:75416] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 716
[718c7e141fd5:75416] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 172
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[718c7e141fd5:75416] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
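The first lines of this output say that the launch agents listed in plm_rsh_agent (ssh, then rsh) could not be found, which is common in minimal container images. The sketch below checks whether either one is on PATH. The function name have_launch_agent is made up for illustration, and the workaround mentioned in the comment (clearing the parameter through the OMPI_MCA_plm_rsh_agent environment variable) is a frequently suggested one for single-node runs that has not been verified for this setup.

```shell
# Succeed if a launch agent named in the error (ssh or rsh) is on PATH.
have_launch_agent() {
    command -v ssh >/dev/null 2>&1 || command -v rsh >/dev/null 2>&1
}

if have_launch_agent; then
    echo "ssh or rsh found on PATH"
else
    echo "no ssh or rsh on PATH"
    # Possible workaround (unverified here): clear the parameter so that
    # Open MPI does not look for ssh/rsh when running on a single node.
    # export OMPI_MCA_plm_rsh_agent=
fi
```

Installing an ssh client is the other option the error message points to, since it asks to either unset the parameter or make the path valid.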
I have tried many approaches since then, but I still cannot solve the problem.
Could anybody help me? Thanks a lot!