Tips for Configuring InfiniBand Adapters
After reconfiguring clusters from scratch several times, I seem to be gradually adapting to this mysterious and strange InfiniBand world...
Relationship among InfiniBand, RoCE, IPoIB, and Ethernet Mode
Let us take the Mellanox ConnectX adapter as an example. This adapter can work in either InfiniBand mode or Ethernet mode, and the mode is configurable with tools provided by the vendor. As iWARP is not widely adopted, this article does not discuss it.
Feature | InfiniBand Mode | Ethernet Mode |
---|---|---|
Supported by ConnectX | Yes | Yes |
RDMA Support | Yes | Yes |
Programmable with Verbs | Yes | Yes |
TCP/IP Support | Needs IPoIB | Yes |
Configurable with Netplan (e.g. Assign IP Address) | Needs IPoIB | Yes |
Layout of RDMA Packet | IB Frame + IB Header | ETH Frame + RoCE Header |
Layout of TCP Packet | IB Frame + IB/IPoIB/IP/TCP Headers | ETH Frame + IP/TCP Headers |
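Note that "RoCE Header" is used here as a general term; RoCEv1 and RoCEv2 define this part differently. RoCEv1 places the InfiniBand transport packet directly in an Ethernet frame (EtherType 0x8915), whereas RoCEv2 wraps it in IP and UDP headers (UDP destination port 4791).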
Identify InfiniBand / Ethernet Mode
The easiest way is to look at the interface name and link type with `ifconfig` or `ip` under Linux. An InfiniBand adapter working in Ethernet mode looks exactly like a regular Ethernet adapter.

    $ ip a
Besides, `ibdev2netdev` can also help by showing which network interface each RDMA device port maps to.

    $ ibdev2netdev
Another approach is `ibstat`, where the `Link layer` field shows which mode the adapter is working in.

    $ ibstat
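For scripting, the link layer can also be read without parsing `ibstat`. The sketch below assumes the kernel's RDMA sysfs layout, where each port exposes a `link_layer` file under `/sys/class/infiniband`. The function name `list_link_layers` and the optional root-directory argument are illustrative only.

```shell
# Print the link layer ("InfiniBand" or "Ethernet") of every RDMA port.
# Assumption: ports appear as <root>/<device>/ports/<n>/link_layer.
list_link_layers() {
  root=${1:-/sys/class/infiniband}
  for f in "$root"/*/ports/*/link_layer; do
    # Skip the unexpanded glob when no RDMA device is present.
    [ -r "$f" ] || continue
    port_dir=${f%/link_layer}
    port=${port_dir##*/}
    dev_dir=${port_dir%/ports/*}
    dev=${dev_dir##*/}
    printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$f")"
  done
}
```

Calling `list_link_layers` with no argument inspects the live system; passing another directory makes it easy to try the function against a copied or mocked tree.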
Change InfiniBand / Ethernet Mode
There is currently no vendor-neutral way to change the working mode. For Mellanox ConnectX adapters, the vendor provides a tool called `mlxconfig`. Here is the usage listed in the official documentation, where you can find more information about it.

    $ sudo mlxconfig -d /dev/mst/mt4103_pci_cr0 set LINK_TYPE_P1=1 LINK_TYPE_P2=1
Note that P1 and P2 refer to the two separate ports on the adapter. Attention: Please make sure the network switch can handle the frame type you are switching to (InfiniBand or Ethernet) before changing the working mode. If the switch cannot recognize the frames sent from the server, you might observe `Physical state: Polling` reported by `ibstat`, as the packets are not forwarded by the switch correctly. Certain network switches can only forward one type of frame at a time, which means you may need to manually reconfigure the switch to make it work with the other type.
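In `mlxconfig`, a `LINK_TYPE` value of `1` selects InfiniBand and `2` selects Ethernet, so the command above puts both ports in InfiniBand mode. The following sketch wraps that mapping in a helper that only prints the command, so it can be reviewed before being run with `sudo`. The function names are hypothetical, and the device path is the one from the example above; it assumes `mst start` has already created the device node.

```shell
# Map a human-readable mode to the mlxconfig LINK_TYPE value.
link_type_value() {
  case "$1" in
    ib|infiniband) echo 1 ;;
    eth|ethernet)  echo 2 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print (not execute) the mlxconfig command for both ports.
# Usage: build_mlxconfig_cmd <mst-device> <ib|eth>
build_mlxconfig_cmd() {
  dev=$1
  val=$(link_type_value "$2") || return 1
  echo "mlxconfig -d $dev set LINK_TYPE_P1=$val LINK_TYPE_P2=$val"
}
```

For example, `build_mlxconfig_cmd /dev/mst/mt4103_pci_cr0 eth` prints the command for Ethernet mode. The new setting is stored in firmware and typically takes effect only after a reboot or firmware reset; `mlxconfig -d <device> query` shows the current values.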
Configure IPoIB
By default, IPoIB is configured automatically when an IP address is assigned to the interface. The IP address can be managed by `netplan` or `NetworkManager`, depending on your Linux distro. As for the configuration file, there is no difference between InfiniBand and regular Ethernet adapters.

    # Assign a static IP address with netplan for an InfiniBand interface
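A minimal sketch of what such a netplan file might contain is shown below as a small generator function. Everything concrete in it is an assumption for illustration: the function name `render_netplan`, the interface name `ibp129s0` (borrowed from the `ethtool` example in this post), and the address `192.168.100.10/24`. The interface is declared under `ethernets`, the same way a regular Ethernet adapter would be.

```shell
# Print a netplan document assigning one static address to an interface.
# Usage: render_netplan <interface> <address/prefix>
render_netplan() {
  iface=$1
  cidr=$2
  cat <<EOF
network:
  version: 2
  ethernets:
    $iface:
      dhcp4: false
      addresses:
        - $cidr
EOF
}
```

The output of `render_netplan ibp129s0 192.168.100.10/24` can be saved as a `.yaml` file under `/etc/netplan/` and activated with `sudo netplan apply`.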
Once the above configuration is applied and the interface is brought up successfully, we can see that the `ib_ipoib` module is loaded.

    $ lsmod | grep ipoib
If the IP address doesn't appear in `ip a`, we need to check the status of the InfiniBand adapter and make sure its state is `Active` in `ibstat`. A common mistake is forgetting to enable `opensm` / `opensmd`, which leaves the adapter stuck at `State: Initializing`. Note that `opensmd` will not launch on startup by default.

    # Start OpenSM
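The two symptoms mentioned in this post, `Physical state: Polling` and `State: Initializing`, can be spotted mechanically. The sketch below reads `ibstat` output on standard input and prints a hint for each affected port. The function name `diagnose_ports` and the hint wording are illustrative, and the parser assumes the usual `CA '...'` / `Port N:` / `State:` layout of `ibstat`.

```shell
# Read `ibstat` output on stdin and flag ports in a problematic state.
diagnose_ports() {
  awk -v q="'" '
    # Remember the current adapter name, e.g. CA mlx5_0 (quotes removed).
    /^CA /                         { ca = $2; gsub(q, "", ca) }
    # Remember the current port number ("Port 1:" -> 1).
    /^[[:space:]]*Port [0-9]+:/    { port = $2; sub(/:$/, "", port) }
    /^[[:space:]]*State:/          { if ($NF == "Initializing")
        print ca "/" port ": waiting for a subnet manager (is opensm running?)" }
    /^[[:space:]]*Physical state:/ { if ($NF == "Polling")
        print ca "/" port ": no link established (check cable and switch mode)" }
  '
}
```

Typical use is `ibstat | diagnose_ports`. If a port is waiting for a subnet manager, starting the service usually resolves it, for example `sudo systemctl enable --now opensm`; the unit name (`opensm` or `opensmd`) depends on how the package was installed, so treat that command as an assumption to verify on your system.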
Identify RoCE Version
The major difference between RoCEv1 and RoCEv2 is that RoCEv2 can be routed across IP networks, while RoCEv1 is addressed by MAC and stays within a single Ethernet broadcast domain. A fun fact is that RoCEv1 and RoCEv2 may be enabled simultaneously, and we can choose the version at runtime by specifying the Global Identifier (GID) to use. Mellanox provides a script named `show_gids` that displays the RoCE version associated with each GID.

    $ show_gids
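When `show_gids` is not installed, similar information can be read from sysfs. The sketch below assumes the layout `<port>/gids/<index>` and `<port>/gid_attrs/types/<index>`, where the type file contains a string such as `IB/RoCE v1` or `RoCE v2`. The function name `list_gid_types` is hypothetical, and this is not a reimplementation of the Mellanox script.

```shell
# List populated GID table entries of one port as: index <TAB> GID <TAB> type.
# Usage: list_gid_types /sys/class/infiniband/<device>/ports/<n>
list_gid_types() {
  port_dir=$1
  for t in "$port_dir"/gid_attrs/types/*; do
    idx=${t##*/}
    # Unpopulated slots may fail to read; skip them quietly.
    gtype=$(cat "$t" 2>/dev/null) || continue
    gid=$(cat "$port_dir/gids/$idx" 2>/dev/null) || continue
    # An all-zero GID marks an empty table slot.
    if [ "$gid" = "0000:0000:0000:0000:0000:0000:0000:0000" ]; then
      continue
    fi
    printf '%s\t%s\t%s\n' "$idx" "$gid" "$gtype"
  done
}
```

The index in the first column is the value an application would pass as its GID index to pick RoCEv1 or RoCEv2 at runtime.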
Check Adapter Speed
`ethtool` can read out this information, and it works in both InfiniBand and Ethernet mode.

    $ ethtool ibp129s0
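To use the speed in a script, the relevant line can be extracted from the `ethtool` output. This is a small sketch that assumes the speed is reported as a line of the form `Speed: <N>Mb/s`; the function name `speed_gbps` is illustrative.

```shell
# Read `ethtool <iface>` output on stdin and print the link speed in Gb/s.
# Values that are not in Mb/s (e.g. "Unknown!") are printed unchanged.
speed_gbps() {
  awk '/^[[:space:]]*Speed:/ {
    v = $2
    if (sub(/Mb\/s$/, "", v)) print v / 1000 " Gb/s"
    else print $2
  }'
}
```

For example, `ethtool ibp129s0 | speed_gbps` prints a single line with the negotiated speed.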
References
- https://www.advancedclustering.com/act_kb/infiniband-port-states/
- https://zhuanlan.zhihu.com/p/32105832
- https://wiki.archlinux.org/title/InfiniBand
- https://docs.nvidia.com/networking/display/MLNXOFEDv461000/OpenSM
- https://www.cnblogs.com/juzib/p/13273380.html
- https://blog.51cto.com/liangchaoxi/4044293