CUDA Runtime Error(46): all CUDA-capable devices are busy or unavailable on Summit #11
Running on Summit with
jsrun -n 1 -r 1 -c 42 -g 6 -a 6 -b rs js_task_info ../../build/src/weak
causes CUDA Runtime Error(46): all CUDA-capable devices are busy or unavailable.

This is possibly because all GPUs in this configuration are reported to be in `cudaComputeModeExclusiveProcess`, which allows only one process at a time to hold a context on a given GPU, even though all processes have visibility to all GPUs. It may mean that the first MPI rank that calls `cudaSetDevice` on a GPU gets exclusive access to it, and later ranks that try the same GPU fail.

Running with only a single process on the node works:
jsrun -n 1 -r 1 -c 42 -g 6 -a 1 -b rs js_task_info ../../build/src/weak
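
If the exclusive-process hypothesis above is right, one way to check it, and a possible workaround when each rank only needs a single GPU, is to print the compute mode of every visible device and bind each rank to its own device. The following is a minimal sketch, not what `weak` currently does. The helper names (`device_for_local_rank`, `local_rank_from_env`, `report_and_select_device`) are hypothetical, and reading the node-local rank from `OMPI_COMM_WORLD_LOCAL_RANK` is an assumption about the MPI environment under `jsrun`. The CUDA part is guarded by `__CUDACC__`, so it is only built when the file is compiled as CUDA source with `nvcc`.

```cpp
#include <cstdio>
#include <cstdlib>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: map a node-local rank onto one of the visible GPUs.
// Returns -1 if there are no devices or the rank is negative.
inline int device_for_local_rank(int local_rank, int device_count) {
  if (device_count <= 0 || local_rank < 0) return -1;
  return local_rank % device_count;
}

// Hypothetical helper: read the node-local rank from the environment.
// Assumption: the launcher exports OMPI_COMM_WORLD_LOCAL_RANK; falls back to 0.
inline int local_rank_from_env() {
  const char *s = std::getenv("OMPI_COMM_WORLD_LOCAL_RANK");
  return s ? std::atoi(s) : 0;
}

#ifdef __CUDACC__
// Print the compute mode of every visible device, then bind this process to
// the device chosen for its local rank. Returns the device index, or -1 on error.
inline int report_and_select_device() {
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

  for (int d = 0; d < count; ++d) {
    int mode = 0;
    if (cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, d) != cudaSuccess) {
      continue;  // skip devices we cannot query
    }
    std::printf("device %d: compute mode %d%s\n", d, mode,
                mode == cudaComputeModeExclusiveProcess ? " (exclusive process)" : "");
  }

  int dev = device_for_local_rank(local_rank_from_env(), count);
  if (dev < 0) return -1;
  if (cudaSetDevice(dev) != cudaSuccess) return -1;
  return dev;
}
#endif
```

Calling `report_and_select_device()` once per rank before any other CUDA work would show whether the devices really are in exclusive-process mode. With `-a 6 -g 6`, the six ranks would then each touch a different GPU. This does not help if every rank must use all six GPUs, which exclusive-process mode would not permit.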