Compile Guide
Note: A working Makefile is already provided in this repository.
To use the existing file, change the cluster node names in it to match the cluster you're using.
In order to use DPU-MPI, the additional dependencies must be included in your Makefile.
Ensure that you are compiling both the library itself and the protobuf dependency inside
the protobuf/ folder in the package.
You can put the following in the Makefile:
NANOPB_DIR = protobuf
# Compiler flags: enable all warnings and light optimization (add -g for debug info)
CFLAGS = -Wall -O1
CFLAGS += "-I$(NANOPB_DIR)"
# C source code files that are required
CSRC += $(NANOPB_DIR)/common.pb.c
CSRC += $(NANOPB_DIR)/pb_encode.c # The nanopb encoder
CSRC += $(NANOPB_DIR)/pb_decode.c # The nanopb decoder
CSRC += $(NANOPB_DIR)/pb_common.c # The nanopb common parts
LIBS = -lrdmacm -libverbs
The command in the Makefile should be (replace my_program.* with your program):
mpicc $(CFLAGS) get_ip.c dpucommon.c dpulib.c my_program.c $(CSRC) $(LIBS) -o my_program.o
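When running the MPI job, start the server task first. This can be done with the following command (replace num_hosts with the number of processes and hostnames with a comma-separated list of hosts):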
mpirun -np num_hosts --map-by node -H hostnames ./bf.o
Warning: Depending on your cluster networking setup, MPI may choose the onboard interface for transmission when running the job.
The BlueField server includes a basic MPI_Bcast check that ensures the configuration is correct before starting the server.
If you receive the error below, the problem lies in the mpirun configuration, not in the program itself.
A reliable fallback is to use the TCP BTL, although this is much slower than letting MPI use RDMA.
*** An error occurred in MPI_Bcast
*** reported by process [4009623553,0]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_INTERN: internal error
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
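The TCP fallback mentioned above might look like the following sketch. It assumes Open MPI's mpirun (the --mca options are specific to Open MPI), and the process count and host names are hypothetical placeholders. Forcing pml ob1 is an extra assumption for builds where another messaging layer, such as UCX, would otherwise bypass the BTL selection.

```shell
#!/bin/sh
# Sketch: launch the BlueField server over TCP instead of RDMA.
# Assumptions: Open MPI's mpirun; the values below are hypothetical placeholders.
NUM_HOSTS=2
HOSTNAMES="bf-node01,bf-node02"

# "pml ob1" selects the BTL-based messaging layer;
# "btl tcp,self" limits it to TCP plus the loopback (self) transport.
BF_CMD="mpirun --mca pml ob1 --mca btl tcp,self -np $NUM_HOSTS --map-by node -H $HOSTNAMES ./bf.o"

# Print the command for review; set RUN=1 in the environment to execute it.
echo "$BF_CMD"
if [ "${RUN:-0}" = "1" ]; then
    $BF_CMD
fi
```

If MPI keeps picking the onboard interface, Open MPI's btl_tcp_if_include parameter can restrict the TCP BTL to a named interface; the interface name depends on your cluster.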
Warning: mpirun for the BlueField task may fail when the command is executed on either the same node or an x86 node.
You may need to allocate an ARM node and launch mpirun from it.
Once the server is running, you can run the client MPI job. An example is provided in main_host_use_lib.c. The example program takes the offset between your host IP and the BlueField IP as a command-line parameter.
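As a rough illustration of the offset parameter, the sketch below computes the numeric difference between two IPv4 addresses. This is an assumption about what "offset" means; check main_host_use_lib.c for the exact convention it expects. The addresses and the ip_to_int helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical addresses; replace with your host and BlueField IPs.
HOST_IP=10.0.0.11
BF_IP=10.0.0.111

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Assumption: "offset" is the BlueField address minus the host address.
OFFSET=$(( $(ip_to_int "$BF_IP") - $(ip_to_int "$HOST_IP") ))
echo "offset: $OFFSET"
```

The resulting value would then be passed to the client program on its command line.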