
Run on Cavium ThunderX

Sang-Hoon Kim edited this page Jun 27, 2018 · 6 revisions

Patches for working around Cavium Errata

  • Cavium ThunderX hardware has a number of flaws (many of them appear to be related to PCI and interrupt handling), and the vanilla 4.4 kernel needs patches to work around them. Some critical patches are already applied; you can list them with:
$ git log --author="David Daney"

Message Layer

InfiniBand

The Mellanox ConnectX-3 InfiniBand (IB) device drivers in the stock 4.4 kernel are too outdated to work properly on our Cavium ThunderX machines; the vanilla drivers only work from kernel 4.14 onward. Mellanox provides an up-to-date version of the device drivers as the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED), and we have to use those drivers to run ConnectX-3 on Cavium ThunderX.

Installing MLNX OFED

  • Boot with the popcorn kernel so that the following steps do not need complicated path settings.

  • Install prerequisite packages

$ sudo apt-get install quilt dkms make gcc coreutils pciutils grep perl procps lsof python-libxml2 libssl-dev
  • Download and untar the OFED source
$ wget http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.2-1.2.0.0/MLNX_OFED_SRC-debian-4.2-1.2.0.0.tgz
$ tar xzf MLNX_OFED_SRC-debian-4.2-1.2.0.0.tgz
$ cd MLNX_OFED_SRC-debian-4.2-1.2.0.0
  • Build and install it. This takes around 10 minutes.
$ sudo ./install.pl --kernel-only --without-dkms --without-iser-modules --without-isert-modules --without-srp-modules --without-knem-modules
  • Load the newly built OFED device drivers
$ sudo /etc/init.d/openibd restart
  • If loading fails because of an old module, reboot. In that case, you do not have to start the service manually after the boot.

  • Check that the modules are properly loaded.

$ ls /sys/class/infiniband
mlx4_0

$ ls /sys/class/net/
... .. ib0 ib1 ...
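
The check above can also be scripted. The following is a minimal sketch, not part of the original procedure; the function name ib_ready and the SYSFS_ROOT variable are hypothetical and exist only so the check can be pointed at a different directory tree.

```shell
#!/bin/sh
# Sketch: succeed only if at least one InfiniBand device and one ib*
# network interface are visible in sysfs.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
ib_ready() {
    root="${SYSFS_ROOT:-/sys}"
    # An HCA such as mlx4_0 should appear here once the drivers are loaded.
    [ -n "$(ls "$root/class/infiniband" 2>/dev/null)" ] || return 1
    # IPoIB interfaces show up as ib0, ib1, ...
    ls -d "$root"/class/net/ib* >/dev/null 2>&1
}

if ib_ready; then
    echo "InfiniBand devices and IPoIB interfaces found"
else
    echo "InfiniBand devices or IPoIB interfaces missing"
fi
```
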
  • ib0 can be configured in the same way as an Ethernet NIC: assign an IP address by editing /etc/network/interfaces and reload the interface.
$ sudo vi /etc/network/interfaces
...
auto ib0
allow-hotplug ib0
iface ib0 inet static
    address 10.4.6.32
    netmask 255.255.255.0
...

$ sudo ifdown ib0 && sudo ifup ib0

$ sudo ifconfig
enx70886b806129 Link encap:Ethernet  HWaddr 70:88:6b:80:61:29
          inet addr:10.4.4.32  Bcast:10.4.4.255  Mask:255.255.255.0
...

ib0       Link encap:UNSPEC  HWaddr A0-00-02-20-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.4.6.32  Bcast:10.4.6.255  Mask:255.255.255.0
          ^^^^^^^^^^^^^^^^^^^
...

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
...
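
The IPoIB address shown above is the one that has to go into msg_layer/config.h later on. As a small sketch, assuming the iproute2 ip tool is installed, the address can be extracted from its one-line output instead of being read off ifconfig by eye. The helper name ipv4_of is hypothetical.

```shell
#!/bin/sh
# Sketch: print the first IPv4 address found in one-line
# `ip -4 -o addr show` output read from stdin. In that format the
# third field is "inet" and the fourth is ADDRESS/PREFIX.
ipv4_of() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Usage on a machine where ib0 is configured as in the example:
#   ip -4 -o addr show dev ib0 | ipv4_of
```
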
  • Congratulations! At this point, you can use the InfiniBand card through msg_socket.ko with IPoIB (IP over InfiniBand).

  • If the kernel fails to load the modules with complaints like the following, delete the DEBS directory and rebuild the driver. The same procedure also resolves "Unknown symbol" errors with err -22.

[   91.696060] mlx_compat: version magic '4.4.55-popcorn+ SMP mod_unload modversions aarch64' should be '4.4.55-popcorn+ SMP mod_unload aarch64'
[   91.729824] mlx_compat: version magic '4.4.55-popcorn+ SMP mod_unload modversions aarch64' should be '4.4.55-popcorn+ SMP mod_unload aarch64'
[   91.768016] mlx_compat: version magic '4.4.55-popcorn+ SMP mod_unload modversions aarch64' should be '4.4.55-popcorn+ SMP mod_unload aarch64'
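
The two version magic strings in such a message differ by only one flag, which is easy to miss. The following is a minimal sketch that prints the flags present in one string but not the other; the function name vermagic_diff is hypothetical, and the two strings are copied from the log above.

```shell
#!/bin/sh
# Sketch: list flags that appear in only one of two vermagic strings.
# $1: vermagic of the module, $2: vermagic the kernel expects.
vermagic_diff() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                       # present in both
            *) echo "module-only: $flag" ;;
        esac
    done
    for flag in $2; do
        case " $1 " in
            *" $flag "*) ;;                       # present in both
            *) echo "kernel-only: $flag" ;;
        esac
    done
}

vermagic_diff \
    "4.4.55-popcorn+ SMP mod_unload modversions aarch64" \
    "4.4.55-popcorn+ SMP mod_unload aarch64"
# prints "module-only: modversions"
```

The module's string can also be read with modinfo -F vermagic on the module file; the expected string is the one the kernel prints after "should be".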

Using the native InfiniBand message layer

  • The IB message layer module should be rebuilt against the OFED device drivers. However, to the best of my knowledge, there is no obvious way to compile an external module (msg_rdma.ko) on top of external modules (MLNX_OFED) that use custom/overridden kernel headers. So, the last resort is to convert the msg_layer module into an OFED submodule.

  • Get the kernel source from the OFED source.

$ cd /path/to/MLNX_OFED_SRC-debian-4.2-1.2.0.0
$ cd SOURCES
$ tar xzf mlnx-ofed-kernel_4.2.orig.tar.gz
$ cd mlnx-ofed-kernel-4.2
  • Copy the msg_layer
$ cp -pr /path/to/popcorn/kernel/msg_layer net/
  • This is a good moment to check that the module has the proper IP list in msg_layer/config.h. The IPs should be those of the ib* IPoIB interfaces, not those of the eth* devices. In the above example, use 10.4.6.32, not 10.4.4.32.

  • To adapt to an API change in OFED, modify __setup_rdma_buffer() in msg_layer/rdma.c (around line 771)

$ vi net/msg_layer/rdma.c

/__setup_rdma_buffer

   int ret;
+ unsigned int sg_index = 0;
  DECLARE_COMPLETION_ONSTACK(done);
  ...

  ...
    ib_update_fast_reg_key(reg_mr, cb->key);
-   ret = ib_map_mr_sg(mr, &sg, 1, PAGE_SIZE);
+   ret = ib_map_mr_sg(mr, &sg, 1, &sg_index, PAGE_SIZE);
    if (ret != 1) {
  ...
  • Include msg_layer in the build by appending the following line to the end of the Makefile
$ vi Makefile
...
obj-$(CONFIG_POPCORN_KMSG_RDMA) += net/msg_layer/
  • Configure the modules. This takes 20+ minutes :-(
$ ./configure --with-core-mod --with-ipoib-mod --with-ipoib-cm --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mlx4-mod --with-mlx4_core-mod --with-mlx4_en-mod --with-mlx4_inf-mod
  • Build and install the modules. If the kernel has changed significantly and the modules no longer load properly, rebuild and reinstall the drivers.
$ make -j 96 && sudo make install
  • Try to load the message module.
$ sudo insmod net/msg_layer/msg_rdma.ko
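
Whether the load succeeded can be confirmed from /proc/modules. This is a minimal sketch rather than part of the original procedure; the function name module_loaded and the MODULES_FILE variable are hypothetical, and msg_rdma is the module name taken from msg_rdma.ko.

```shell
#!/bin/sh
# Sketch: report whether a module is listed in /proc/modules.
# MODULES_FILE is a hypothetical override; it defaults to /proc/modules.
module_loaded() {
    # The first whitespace-separated field of each line is the module name.
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${MODULES_FILE:-/proc/modules}"
}

if module_loaded msg_rdma; then
    echo "msg_rdma is loaded"
else
    echo "msg_rdma is not loaded; check dmesg for the reason"
fi
```
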

Dolphin

  • As of November 17, 2017, the official device driver (v5.4.2) for aarch64 does not work. An in-house snapshot of the 5.5.0 release candidate seems to work but still experiences interrupt mishandling. Dolphin tech support suggested turning off global DMA by setting ntb_disable_global_dma=1 in /opt/DIS/lib/modules/dis_px.conf.