Run on Cavium ThunderX
- Cavium ThunderX has a number of hardware flaws (many of them appear to be PCI/interrupt related), and patches must be applied to work around them on vanilla 4.4 kernels. Some critical patches are already applied, and you can list them with:
$ git log --author="David Daney"
The stock 4.4 kernel device drivers for Mellanox ConnectX-3 InfiniBand (IB) cards are so outdated that they do not work properly on our Cavium ThunderX machines; a 4.14+ kernel is required to make the cards work with vanilla device drivers alone. Instead, Mellanox provides up-to-date device drivers as the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED), and we have to use those drivers to use ConnectX-3 on Cavium ThunderX.
-
Boot into the Popcorn kernel first, so that the following steps do not require complicated path settings.
-
Install prerequisite packages
$ sudo apt-get install quilt dkms make gcc coreutils pciutils grep perl procps lsof python-libxml2 libssl-dev
- Download and untar the OFED source
$ wget http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.2-1.2.0.0/MLNX_OFED_SRC-debian-4.2-1.2.0.0.tgz
$ tar xzf MLNX_OFED_SRC-debian-4.2-1.2.0.0.tgz
$ cd MLNX_OFED_SRC-debian-4.2-1.2.0.0
- Build and install it. This takes around 10 minutes.
$ sudo ./install.pl --kernel-only --without-dkms --without-iser-modules --without-isert-modules --without-srp-modules --without-knem-modules
- Load the newly built OFED device drivers
$ sudo /etc/init.d/openibd restart
-
If loading fails because of an old module, try rebooting. In that case, you do not have to start the service manually after boot.
-
Check that the modules are properly loaded.
$ ls /sys/class/infiniband
mlx4_0
$ ls /sys/class/net/
... .. ib0 ib1 ...
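The same check can be scripted. The following is a minimal sketch, not part of the original procedure: the function names and the optional sysfs-root argument are hypothetical (the argument exists only so the check can be exercised against a test directory), and it assumes IPoIB interfaces keep their default ib0, ib1, ... names.

```shell
#!/bin/sh
# Sketch: confirm that the OFED drivers exposed an IB device and IPoIB interfaces.
# The optional first argument is a hypothetical sysfs root override (default: /sys).

list_ib_devices() {
    root="${1:-/sys}"
    # Each loaded HCA appears as one entry, e.g. mlx4_0.
    ls "$root/class/infiniband" 2>/dev/null
}

list_ipoib_ifaces() {
    root="${1:-/sys}"
    # Assumes the default IPoIB naming: ib0, ib1, ...
    ls "$root/class/net" 2>/dev/null | grep -E '^ib[0-9]+$'
}

check_ofed_loaded() {
    root="${1:-/sys}"
    if [ -n "$(list_ib_devices "$root")" ] && [ -n "$(list_ipoib_ifaces "$root")" ]; then
        echo "ok"
    else
        echo "missing"
        return 1
    fi
}
```

Calling check_ofed_loaded with no argument inspects the live /sys tree and returns a non-zero status when either the IB device or the IPoIB interfaces are absent.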
- ib0 can be configured just like an Ethernet NIC: assign an IP address by editing /etc/network/interfaces, then reload the interface.
$ sudo vi /etc/network/interfaces
...
auto ib0
allow-hotplug ib0
iface ib0 inet static
address 10.4.6.32
netmask 255.255.255.0
...
$ sudo ifdown ib0 && sudo ifup ib0
$ sudo ifconfig
enx70886b806129 Link encap:Ethernet HWaddr 70:88:6b:80:61:29
inet addr:10.4.4.32 Bcast:10.4.4.255 Mask:255.255.255.0
...
ib0 Link encap:UNSPEC HWaddr A0-00-02-20-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:10.4.6.32 Bcast:10.4.6.255 Mask:255.255.255.0
^^^^^^^^^^^^^^^^^^^
...
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
...
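To pick out only the IPoIB addresses without scanning the full listing, a small filter can help. This is a sketch under assumptions: it uses `ip -o -4 addr show` from iproute2 instead of the ifconfig shown above, the function name ipoib_addrs is hypothetical, and it assumes the default ib0, ib1, ... interface names.

```shell
#!/bin/sh
# Sketch: read `ip -o -4 addr show` output on stdin and print
# "<interface> <IPv4 address>" for IPoIB interfaces only.
ipoib_addrs() {
    # One-line format: "<idx>: <ifname> inet <addr>/<prefix> ..."
    awk '$2 ~ /^ib[0-9]+$/ && $3 == "inet" { split($4, a, "/"); print $2, a[1] }'
}

# Usage on a live system:
#   ip -o -4 addr show | ipoib_addrs
```

The addresses it prints are the ones that belong to the InfiniBand side of the machine, as opposed to the Ethernet addresses.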
-
Congratulations! At this point, you can use the InfiniBand card through msg_socket.ko with IPoIB (IP over InfiniBand).
-
If the kernel fails to load modules with complaints like the following, delete the DEBS directory and rebuild the driver. An "Unknown symbol" error with err -22 can be resolved the same way.
[ 91.696060] mlx_compat: version magic '4.4.55-popcorn+ SMP mod_unload modversions aarch64' should be '4.4.55-popcorn+ SMP mod_unload aarch64'
[ 91.729824] mlx_compat: version magic '4.4.55-popcorn+ SMP mod_unload modversions aarch64' should be '4.4.55-popcorn+ SMP mod_unload aarch64'
[ 91.768016] mlx_compat: version magic '4.4.55-popcorn+ SMP mod_unload modversions aarch64' should be '4.4.55-popcorn+ SMP mod_unload aarch64'
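In the log above, the module carries a modversions token that the running kernel does not expect. A small helper can make such differences easier to spot before loading a module. This is only a sketch: the function name is hypothetical, and the usage comment assumes `modinfo -F vermagic` is run on both the freshly built module and some module that is known to load on the running kernel (the paths shown are placeholders).

```shell
#!/bin/sh
# Sketch: print the vermagic tokens that appear in the module's string
# but not in the kernel's expected string.
#   $1: vermagic of the module, $2: vermagic expected by the kernel
extra_vermagic_tokens() {
    for tok in $1; do
        case " $2 " in
            *" $tok "*) ;;          # token is expected by the kernel
            *) echo "$tok" ;;       # token only present in the module
        esac
    done
}

# Usage (placeholder paths):
#   extra_vermagic_tokens \
#       "$(modinfo -F vermagic /path/to/mlx_compat.ko)" \
#       "$(modinfo -F vermagic /path/to/a/module/that/loads.ko)"
```

Empty output means the module has no tokens the kernel lacks; for the log above it would print modversions.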
-
The IB message layer module should be rebuilt against the OFED device drivers. However, to the best of my knowledge, there is no obvious way to compile an external module (msg_rdma.ko) atop external modules (MLNX_OFED) that use custom/overridden kernel headers. So, the last resort is to convert the msg_layer module into an OFED submodule.
- Get the kernel source from the OFED source.
$ cd /path/to/MLNX_OFED_SRC-debian-4.2-1.2.0.0
$ cd SOURCES
$ tar xzf mlnx-ofed-kernel_4.2.orig.tar.gz
$ cd mlnx-ofed-kernel-4.2
- Copy the msg_layer directory into the OFED kernel source tree
$ cp -pr /path/to/popcorn/kernel/msg_layer net/
-
This is a good moment to check that the module has a proper IP list in msg_layer/config.h. The IPs should be those of the ib* IPoIB interfaces, not those of the eth* devices. In the above example, use 10.4.6.32, not 10.4.4.32.
- To adapt to an API change in OFED, modify __setup_rdma_buffer() in msg_layer/rdma.c (around line 771)
$ vi net/msg_layer/rdma.c
/__setup_rdma_buffer
int ret;
+ unsigned int sg_index = 0;
DECLARE_COMPLETION_ONSTACK(done);
...
...
ib_update_fast_reg_key(reg_mr, cb->key);
- ret = ib_map_mr_sg(mr, &sg, 1, PAGE_SIZE);
+ ret = ib_map_mr_sg(mr, &sg, 1, &sg_index, PAGE_SIZE);
if (ret != 1) {
...
- Include msg_layer in the build list by appending the following line to the end of Makefile
$ vi Makefile
...
obj-$(CONFIG_POPCORN_KMSG_RDMA) += net/msg_layer/
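If the build is repeated, editing by hand risks appending the line twice. The following sketch appends it only when it is not already present; the helper name append_once is hypothetical, and it assumes the Makefile already ends with a newline.

```shell
#!/bin/sh
# Sketch: append an exact line to a file unless the file already contains it.
#   $1: file, $2: line to append
append_once() {
    # -x: whole-line match, -F: fixed string, so "$(...)" is matched literally.
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage, from the top of the mlnx-ofed-kernel source tree.
# Single quotes keep the shell from expanding $(CONFIG_POPCORN_KMSG_RDMA).
#   append_once Makefile 'obj-$(CONFIG_POPCORN_KMSG_RDMA) += net/msg_layer/'
```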
- Configure the modules. This takes 20+ minutes :-(
$ ./configure --with-core-mod --with-ipoib-mod --with-ipoib-cm --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mlx4-mod --with-mlx4_core-mod --with-mlx4_en-mod --with-mlx4_inf-mod
- Build and install the modules. If the kernel has changed significantly and the modules no longer load properly, rebuild and reinstall the drivers.
$ make -j 96 && sudo make install
- Try to load the message module.
$ sudo insmod net/msg_layer/msg_rdma.ko
- As of November 17, 2017, the official Dolphin device driver (v5.4.2) for aarch64 does not work. An in-house snapshot of the 5.5.0 release candidate seems to work but still experiences interrupt mishandling. Dolphin tech support suggested turning off global DMA by setting ntb_disable_global_dma=1 in /opt/DIS/lib/modules/dis_px.conf