Installing the MLNX-OFED Driver on Proxmox

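  I spent 400 yuan on a ConnectX-5 NIC for my R7515, specifically the MCX542B-ACAN. The R7515's OCP NIC slot is OCP 2.0 Type 1 with PCIe 3.0 x8 lanes, so this model is about the best that fits. Getting it working took roughly a week of spare time and a few detours, so I am writing the process down here. The seller said the card was pulled from an Inspur server; my guess is that it actually came out of a server retired by Baidu.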

Driver installation

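  Proxmox 7 is based on Debian 11, but it does not use Debian 11's default 5.10 kernel; it ships the 5.15 kernel used by Ubuntu 22.04. As a result, the official MLNX-OFED repo for Debian cannot be used, and neither can the Ubuntu repo, because the dependency versions are completely different. Instead, follow the official documentation and build deb packages that match the local kernel version.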

Reference: Installing MLNX_OFED
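Before building anything, it can help to confirm the kernel mismatch described above on the machine itself. The following is a minimal sketch, not part of the official procedure: the helper name `kernel_flavor` is made up for illustration, and it assumes that Proxmox kernels carry a `-pve` suffix in `uname -r` and that installed kernel headers are reachable through `/lib/modules/<release>/build`.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel release string.
# Assumption: Proxmox VE kernels end in "-pve"; anything else is
# reported as "other" (for example a stock Debian kernel).
kernel_flavor() {
    case "$1" in
        *-pve) echo "proxmox" ;;
        *)     echo "other" ;;
    esac
}

release=$(uname -r)
echo "running kernel: $release ($(kernel_flavor "$release"))"

# Rebuilding the driver packages compiles kernel modules, so headers
# for the running kernel have to be present. Assumption: the headers
# package provides the usual "build" symlink under /lib/modules.
if [ -d "/lib/modules/$release/build" ]; then
    echo "kernel headers found"
else
    echo "kernel headers not found; install the headers package for $release first"
fi
```

If the script reports missing headers, installing them before the rebuild step is likely to save a failed build.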

  1. Download the driver source package: open the official site -> pick a suitable version (I chose the latest LTS at the time, 5.8-3.0.7.0-LTS) -> Debian -> Debian 11.3 -> x86_64 -> tgz
wget https://content.mellanox.com/ofed/MLNX_OFED-5.8-3.0.7.0/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
  2. Generate a local repo
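# Run from inside the directory extracted in the download step
cd MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64
./mlnx_add_kernel_support.sh -m "$(pwd)"

# The rebuilt tarball is written to /tmp
cd /tmp/
tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext.tgz

# Move the extracted tree to a permanent location
cd /usr/local/src
mv /tmp/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext ./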
  3. Add the local repo to apt
cd /etc/apt/sources.list.d
echo "deb [trusted=yes] file:/usr/local/src/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext/DEBS ./" > mlnx_ofed.list
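Because the source line points at a local directory, a typo in the path only shows up later as an apt error. As a small sketch under that assumption, the hypothetical helper `ofed_source_line` below prints the same kind of line only when the tree really contains a `DEBS` directory. The demonstration uses a throwaway directory rather than `/usr/local/src`.

```shell
#!/bin/sh
# Hypothetical helper: print an apt source line for a locally built
# MLNX_OFED tree. Fails when the tree has no DEBS directory, which
# usually means the path is wrong or the rebuild did not finish.
ofed_source_line() {
    repo_dir=$1
    if [ ! -d "$repo_dir/DEBS" ]; then
        echo "no DEBS directory under $repo_dir" >&2
        return 1
    fi
    printf 'deb [trusted=yes] file:%s/DEBS ./\n' "$repo_dir"
}

# Demonstration against a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/DEBS"
ofed_source_line "$demo"
rm -rf "$demo"
```

On the real system, the output could be redirected into `mlnx_ofed.list` in place of the hand-written `echo`.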
  4. Install the mlnx-ofed driver
apt update
apt install mlnx-ofed-basic
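After the install it is worth checking that the version that landed is the one that was built. The sketch below assumes that `ofed_info -s` prints a single line of the form `MLNX_OFED_LINUX-<version>:`; the helper name `ofed_version` is hypothetical, and the sample line stands in for real command output.

```shell
#!/bin/sh
# Hypothetical helper: extract the version from a line such as
# "MLNX_OFED_LINUX-5.8-3.0.7.0:" read on stdin.
# Assumption: this is the format printed by `ofed_info -s`.
ofed_version() {
    sed -n 's/^MLNX_OFED_LINUX-\([0-9][0-9.-]*\).*/\1/p'
}

# Sample line used instead of calling ofed_info, so the sketch runs
# on a machine without the driver installed.
sample='MLNX_OFED_LINUX-5.8-3.0.7.0:'
installed=$(printf '%s\n' "$sample" | ofed_version)

if [ "$installed" = "5.8-3.0.7.0" ]; then
    echo "installed version matches the build: $installed"
else
    echo "unexpected version: $installed"
fi
```

On the Proxmox host the sample would be replaced with `ofed_info -s | ofed_version`.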

Updating the firmware

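  The PSID on the card I received differs from the official one, so the official firmware tool cannot update it automatically online. My guess is that this batch of cards was made for Baidu.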

# An online update finds no firmware because the PSID does not match
$ mlxfwmanager --online -u -d 02:00.0
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX5
Part Number: MCX542B-ACAN_C07_Ax
Description: ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
PSID: BAI0000000010
PCI Device Name: 02:00.1
Base GUID: b8599f0300ab1a7c
Base MAC: b8599fab1a7c
Versions:         Current        Available
   FW             16.25.4062     N/A
   PXE            3.5.0701       N/A
   UEFI           14.18.0019     N/A

Status: No matching image found
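The "No matching image found" status follows directly from the PSID comparison, which can be reproduced by hand. The sketch below is illustrative only: `get_psid` and `psid_matches` are hypothetical helpers, the sample text is abbreviated from the output above, and `MT_0000000248` is the PSID of the stock firmware for this model.

```shell
#!/bin/sh
# Hypothetical helper: print the PSID found in mlxfwmanager-style
# output read on stdin.
get_psid() {
    awk '$1 == "PSID:" { print $2; exit }'
}

# Hypothetical helper: succeed only when the card's PSID equals the
# PSID that a firmware image targets.
psid_matches() {
    [ "$1" = "$2" ]
}

# Abbreviated sample of the query output shown above.
sample='Device Type: ConnectX5
PSID: BAI0000000010
Status: No matching image found'

card_psid=$(printf '%s\n' "$sample" | get_psid)
if psid_matches "$card_psid" "MT_0000000248"; then
    echo "PSID $card_psid matches the stock image"
else
    echo "PSID $card_psid differs from MT_0000000248, so the online update finds nothing"
fi
```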

Reference: Updating Firmware After Installation

  1. Download the firmware: open the official site -> pick a suitable version (I chose the latest LTS at the time, 16.35.3006-LTS) -> MCX542B-ACA -> MT_0000000248
wget https://www.mellanox.com/downloads/firmware/fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
unzip fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
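  2. Back up the current firmware. Flashing changes the model description and other fields, so those are worth backing up as well (see the reference). I was too quick and did not back them up T_T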
flint -d /dev/mst/mt4119_pciconf0 ri BAI0000000010.bin
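Since the backup is the only way back to the original image, it seems sensible to verify it before flashing. The following sketch uses a hypothetical helper, `check_backup`, that refuses an empty or missing file and stores a SHA-256 checksum next to it. The demonstration runs on a placeholder file, not on a real firmware dump.

```shell
#!/bin/sh
# Hypothetical helper: check that a firmware backup exists and is
# non-empty, then record its SHA-256 checksum in "<file>.sha256".
check_backup() {
    file=$1
    if [ ! -s "$file" ]; then
        echo "backup $file is missing or empty" >&2
        return 1
    fi
    sha256sum "$file" > "$file.sha256"
    echo "backup ok: $file"
}

# Demonstration with a placeholder file standing in for the image
# that flint would have written.
demo=$(mktemp)
printf 'placeholder image bytes' > "$demo"
check_backup "$demo"
rm -f "$demo" "$demo.sha256"
```

On the real host this would be called as `check_backup BAI0000000010.bin`, and `sha256sum -c BAI0000000010.bin.sha256` can later confirm the copy is intact.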
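  3. Force-flash the new firmware. Changing the PSID requires the flint command; think twice before proceeding and make sure the model is correct.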
$ flint --allow_psid_change -d /dev/mst/mt4119_pciconf0 -i fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin burn
Done.
Current FW version on flash: 16.25.4062
New FW version: 16.35.3006


You are about to replace current PSID on flash - "BAI0000000010" with a different PSID - "MT_0000000248".
Note: It is highly recommended not to change the PSID.

Do you want to continue ? (y/n) [n] : y
Burning FW image without signatures - OK
Burning FW image without signatures - OK
Restoring signature - OK
-I- To load new FW run mlxfwreset or reboot machine.
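Because `--allow_psid_change` removes the safety net, a cheap sanity check before a burn like the one above is to confirm that the image file name at least mentions the intended model. This is a rough sketch only: `image_matches_model` is a hypothetical helper, it relies on the naming pattern of the downloaded file, and it says nothing about the contents of the image.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the firmware file name contains
# "-<model>_", as in "...-MCX542B-ACA_Ax_Bx-...".
# Assumption: the download keeps the vendor's file naming scheme.
image_matches_model() {
    case "$(basename "$1")" in
        *-"$2"_*) return 0 ;;
        *)        return 1 ;;
    esac
}

image='fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin'
if image_matches_model "$image" "MCX542B-ACA"; then
    echo "file name matches the intended model"
else
    echo "file name does not mention the intended model; do not flash"
fi
```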

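  A query confirms that the new image (FW 16.35.3006) is on flash, although the card is still running 16.25.4062 and still reports the old PSID until it is reset.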

$ mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX5
Part Number: MCX542B-ACAN_C07_Ax
Description: ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
PSID: BAI0000000010
PCI Device Name: /dev/mst/mt4119_pciconf0
Base GUID: b8599f0300ab1a7c
Base MAC: b8599fab1a7c
Versions:         Current        Available
   FW             16.35.3006     16.25.4062
   FW (Running)   16.25.4062     N/A
   PXE            3.5.0701       3.5.0701
   UEFI           14.18.0019     14.18.0019

Status: Up to date
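The difference between the `FW` and `FW (Running)` rows in the output above is what signals that a reset is still pending. As an illustrative sketch, the hypothetical helper `fw_reset_needed` below compares the two. It assumes the `FW (Running)` row only appears while the flashed and running versions differ, which is how the outputs in this post behave.

```shell
#!/bin/sh
# Hypothetical helper: read mlxfwmanager-style output on stdin and
# report whether the firmware on flash differs from the one running.
fw_reset_needed() {
    awk '
        $1 == "FW" && $2 == "(Running)" { running = $3; next }
        $1 == "FW"                      { flash = $2 }
        END {
            if (running != "" && running != flash) {
                print "reset needed: flash " flash ", running " running
            } else {
                print "no reset needed"
            }
        }
    '
}

# Version rows taken from the query output above.
printf '%s\n' \
    'FW 16.35.3006 16.25.4062' \
    'FW (Running) 16.25.4062 N/A' | fw_reset_needed
```

On the real host, `mlxfwmanager | fw_reset_needed` would give the same kind of answer.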
  4. Reset the hardware
$ mlxfwreset -d /dev/mst/mt4119_pciconf0 reset

Minimal reset level for device, /dev/mst/mt4119_pciconf0:

3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
-I- Stopping Driver -Done
-I- Resetting PCI -Done
-I- Starting Driver -Done
-I- Restarting MST -Done
-I- FW was loaded successfully.

  The card is now running the new firmware.

$ mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

Device Type: ConnectX5
Part Number: MCX542B-ACA_Ax_Bx
Description: ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket; ROHS R6 Halogen free
PSID: MT_0000000248
PCI Device Name: /dev/mst/mt4119_pciconf0
Base GUID: b8599f0300ab1a7c
Base MAC: b8599fab1a7c
Versions:         Current        Available
   FW             16.35.3006     16.35.3006
   PXE            3.6.0902       3.6.0902
   UEFI           14.29.0015     14.29.0015

Status: Up to date