Installing the MLNX-OFED Driver on Proxmox

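  I spent 400 yuan on a ConnectX-5 NIC for my R7515, specifically the MCX542B-ACAN. The R7515's OCP NIC slot is OCP 2.0 Type 1 with PCIe 3.0 x8 lanes, so this model is about as high-end as the slot will take. Getting it working took about a week of spare time and had a few twists, so I am writing it down here. The seller said the card was pulled from an Inspur server; my guess is that it actually came from a decommissioned Baidu server.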

Driver installation

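  Proxmox 7 is a distribution based on Debian 11, but it does not use Debian 11's default 5.10 kernel. It uses the 5.15 kernel from Ubuntu 22.04 instead. As a result, the official Debian repo cannot be used to install the MLNX-OFED driver, and neither can the Ubuntu repo, because the dependency versions are completely different. Instead, follow the official documentation and build deb packages that match the kernel running on the machine.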

Reference: Installing MLNX_OFED
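
Before building anything, it can help to confirm which kernel series the host is actually running, since that is what the rebuilt packages have to match. The following is a minimal sketch: the helper name `kernel_series` is made up for illustration and is not part of any Proxmox or MLNX_OFED tool.

```shell
#!/bin/sh
# kernel_series: reduce a kernel release string to its "major.minor" series.
# Hypothetical helper, named here only for illustration.
kernel_series() {
    # Keep the first two dot-separated fields, e.g. 5.15.107-2-pve -> 5.15
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Series of the kernel that is running right now.
running=$(kernel_series "$(uname -r)")
echo "running kernel series: $running"

# Stock Debian 11 uses the 5.10 series; anything else needs rebuilt packages.
if [ "$running" != "5.10" ]; then
    echo "kernel differs from stock Debian 11; rebuild the packages for this kernel"
fi
```

On a Proxmox 7 host this would be expected to report the 5.15 series, which is the mismatch described above.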

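  1. Download the driver source package. Open the official site -> pick a suitable version (I chose the latest LTS release at the time, 5.8-3.0.7.0-LTS) -> Debian -> Debian 11.3 -> x86_64 -> tgz. For Proxmox 8 you need the latest 24.04 release, since that is the one that supports the 6.8 kernel.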
wget https://content.mellanox.com/ofed/MLNX_OFED-5.8-3.0.7.0/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
  2. Build a local repo
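# Run the script from inside the extracted directory
cd MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64
./mlnx_add_kernel_support.sh -m $(pwd)

# The rebuilt package is written to /tmp
cd /tmp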
tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext.tgz

cd /usr/local/src
mv /tmp/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext ./
  3. Add the local repo to apt
cd /etc/apt/sources.list.d
echo "deb [trusted=yes] file:/usr/local/src/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext/DEBS ./" > mlnx_ofed.list
  4. Install the mlnx-ofed driver
apt update
apt install mlnx-ofed-basic
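
Once the install finishes, a quick check that the expected release is in place can save a reboot cycle. The sketch below assumes that `ofed_info -s` prints a single line shaped like `MLNX_OFED_LINUX-5.8-3.0.7.0:`; that output format and the helper name `ofed_release` are assumptions, so adjust the parsing if your output looks different.

```shell
#!/bin/sh
# ofed_release: pull the release number out of a line shaped like
# "MLNX_OFED_LINUX-5.8-3.0.7.0:" (assumed format of `ofed_info -s`).
# Prints nothing when the line does not match that shape.
ofed_release() {
    printf '%s\n' "$1" | sed -n 's/^MLNX_OFED_LINUX-\([0-9][0-9.-]*\):\{0,1\}$/\1/p'
}

# Only query the real tool when it is installed.
if command -v ofed_info >/dev/null 2>&1; then
    installed=$(ofed_release "$(ofed_info -s)")
    echo "installed MLNX_OFED release: ${installed:-unknown}"
fi
```

If the reported release is not the one you built, the apt source entry is the first thing worth rechecking.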

Updating the firmware

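  The PSID of the card I received differs from the official one, so the official firmware update tool cannot update it online automatically. My guess is that this is a batch of cards built for Baidu.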

# A direct online update finds no firmware, because the PSID does not match
$ mlxfwmanager --online -u -d 02:00.0
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX5
Part Number: MCX542B-ACAN_C07_Ax
Description: ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
PSID: BAI0000000010
PCI Device Name: 02:00.1
Base GUID: b8599f0300ab1a7c
Base MAC: b8599fab1a7c
Versions: Current Available
FW 16.25.4062 N/A
PXE 3.5.0701 N/A
UEFI 14.18.0019 N/A

Status: No matching image found
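
To make the PSID check repeatable, the report can be parsed instead of read by eye. This is a rough sketch: the function names are hypothetical, and treating an `MT_` prefix as the mark of a stock image is a heuristic based on the two PSIDs that appear in this post, not a documented rule.

```shell
#!/bin/sh
# psid_from_report: read an mlxfwmanager report on stdin, print the PSID value.
psid_from_report() {
    awk '$1 == "PSID:" { print $2; exit }'
}

# is_stock_psid: heuristic (assumption), treats PSIDs starting with "MT_"
# as stock images that the online updater can match.
is_stock_psid() {
    case "$1" in
        MT_*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example with the two PSIDs that appear in this post.
for psid in BAI0000000010 MT_0000000248; do
    if is_stock_psid "$psid"; then
        echo "$psid: stock PSID, online update should find images"
    else
        echo "$psid: vendor-specific PSID, expect 'No matching image found'"
    fi
done
```

On a real host the report would be piped in, for example `mlxfwmanager -d 02:00.0 | psid_from_report`.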

Reference: Updating Firmware After Installation

  1. Download the firmware. Open the official site -> pick a suitable version (I chose the latest LTS release at the time, 16.35.3006-LTS) -> MCX542B-ACA -> MT_0000000248
wget https://www.mellanox.com/downloads/firmware/fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
unzip fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
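  2. Back up the current firmware. After the update, the model description and other details change, so it is worth saving those too (see the reference). I was too quick and did not make a backup T_T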
flint -d /dev/mst/mt4119_pciconf0 ri BAI0000000010.bin
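  3. Force-flash the new firmware. Changing the PSID requires the flint command. Think twice before doing this, and make sure the model is right.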
$ flint --allow_psid_change -d /dev/mst/mt4119_pciconf0 -i fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin burn
Done.
Current FW version on flash: 16.25.4062
New FW version: 16.35.3006


You are about to replace current PSID on flash - "BAI0000000010" with a different PSID - "MT_0000000248".
Note: It is highly recommended not to change the PSID.

Do you want to continue ? (y/n) [n] : y
Burning FW image without signatures - OK
Burning FW image without signatures - OK
Restoring signature - OK
-I- To load new FW run mlxfwreset or reboot machine.

  Querying the card shows that the flash succeeded.

$ mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX5
Part Number: MCX542B-ACAN_C07_Ax
Description: ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
PSID: BAI0000000010
PCI Device Name: /dev/mst/mt4119_pciconf0
Base GUID: b8599f0300ab1a7c
Base MAC: b8599fab1a7c
Versions: Current Available
FW 16.35.3006 16.25.4062
FW (Running) 16.25.4062 N/A
PXE 3.5.0701 3.5.0701
UEFI 14.18.0019 14.18.0019

Status: Up to date
  4. Do a hardware reset
$ mlxfwreset -d /dev/mst/mt4119_pciconf0 reset

Minimal reset level for device, /dev/mst/mt4119_pciconf0:

3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
-I- Stopping Driver -Done
-I- Resetting PCI -Done
-I- Starting Driver -Done
-I- Restarting MST -Done
-I- FW was loaded successfully.

  The card is now running the new firmware.

$ mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX5
Part Number: MCX542B-ACA_Ax_Bx
Description: ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket; ROHS R6 Halogen free
PSID: MT_0000000248
PCI Device Name: /dev/mst/mt4119_pciconf0
Base GUID: b8599f0300ab1a7c
Base MAC: b8599fab1a7c
Versions: Current Available
FW 16.35.3006 16.35.3006
PXE 3.6.0902 3.6.0902
UEFI 14.29.0015 14.29.0015

Status: Up to date
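
The reports above differ in one useful way: before the reset there is an extra `FW (Running)` row, and after it that row is gone. A small sketch of using that as a "reset still pending" check follows. It relies only on the output shown in this post, so treat the rule as an observation rather than documented behavior; the function name is hypothetical.

```shell
#!/bin/sh
# reset_pending: read an mlxfwmanager report on stdin and succeed when it
# contains a "FW (Running)" row, i.e. the flashed and running firmware differ.
reset_pending() {
    grep -q '^[[:space:]]*FW (Running)'
}

# Sample rows copied from the report taken right after flashing.
report='FW 16.35.3006 16.25.4062
FW (Running) 16.25.4062 N/A'

if printf '%s\n' "$report" | reset_pending; then
    echo "new firmware is on flash but not loaded yet; run mlxfwreset or reboot"
else
    echo "flash and running firmware match"
fi
```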

Online update

  Once the new PSID has been flashed, later firmware updates can be done online directly with mlxfwmanager.

$ mlxfwmanager --online -u -d 02:00.0
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX5
Part Number: MCX542B-ACA_Ax_Bx
Description: ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket; ROHS R6 Halogen free
PSID: MT_0000000248
PCI Device Name: 02:00.0
Base GUID: b8599f0300ab1a7c
Base MAC: b8599fab1a7c
Versions: Current Available
FW 16.35.3006 16.35.3502
PXE 3.6.0902 3.6.0902
UEFI 14.29.0015 14.29.0015

Status: Update required

Release notes for the available Firmware:
-----------------------------------------

For more details, please refer to the following FW release notes:
1- ConnectX3 (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf
2- ConnectX3Pro (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_42_5000-release_notes.pdf
3- Connect-IB (10.16.1200): http://www.mellanox.com/pdf/firmware/ConnectIB-FW-10_16_1200-release_notes.pdf
4- ConnectX4 (12.28.2006): http://docs.mellanox.com/display/ConnectX4Firmwarev12282006
5- ConnectX4Lx (14.32.1010): http://docs.mellanox.com/display/ConnectX4LxFirmwarev14321010
6- ConnectX5 (16.35.3502): http://docs.mellanox.com/display/ConnectX5Firmwarev16353502
7- ConnectX6 (20.41.1000): http://docs.mellanox.com/display/ConnectX6Firmwarev20411000
8- ConnectX6Dx (22.41.1000): http://docs.mellanox.com/display/ConnectX6DxFirmwarev22411000
9- ConnectX6Lx (26.41.1000): http://docs.mellanox.com/display/ConnectX6LxFirmwarev26411000
10- BlueField2 (24.41.1000): http://docs.mellanox.com/display/BlueField2Firmwarev24411000
11- ConnectX7 (28.41.1000): http://docs.mellanox.com/display/ConnectX7Firmwarev28411000
12- BlueField3 (32.41.1000): http://docs.mellanox.com/display/BlueField3Firmwarev32411000

---------
Found 1 device(s) requiring firmware update...

Perform FW update? [y/N]: y

Please wait while downloading MFA(s) 100%
Device #1: Updating FW ...
FSMST_INITIALIZE - OK
Writing Boot image component - OK
Done

Restart needed for updates to take effect.
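
For unattended checks, the `Status:` line of the report is enough to decide whether an update is waiting. The sketch below uses the status strings seen in this post (`Update required`, `Up to date`, `No matching image found`); the function names are hypothetical.

```shell
#!/bin/sh
# update_status: read an mlxfwmanager report on stdin and print the text that
# follows "Status:" on the first such line, with trailing whitespace removed.
update_status() {
    sed -n -e 's/[[:space:]]*$//' -e 's/^[[:space:]]*Status:[[:space:]]*//p' | head -n 1
}

# needs_update: succeed only when the report says "Update required".
needs_update() {
    [ "$(update_status)" = "Update required" ]
}

# Example with a status line copied from the report above.
if printf 'Status: Update required\n' | needs_update; then
    echo "a newer firmware image is available"
fi
```

A possible use is `mlxfwmanager --online -d 02:00.0 | needs_update`, on the assumption that the command only queries when `-u` is left out.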