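I spent 400 yuan on a ConnectX-5 NIC for my R7515, model MCX542B-ACAN. The R7515's OCP NIC slot is OCP 2.0 Type 1 with PCIe 3.0 x8 lanes, so this model is about the highest spec the slot can take. Getting it working took roughly a week of spare time, on and off, with a few detours along the way, so I am writing it all down here. The seller said the card was pulled from an Inspur server; my guess is that it actually came out of decommissioned Baidu servers.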
Driver installation
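Proxmox 7 is based on Debian 11, but its kernel is not the 5.10 that Debian 11 ships by default; it uses the 5.15 kernel from Ubuntu 22.04. As a result, the official Debian repo cannot be used to install the MLNX-OFED driver, and neither can the Ubuntu repo, because the dependency versions are completely different. Instead, follow the official documentation to build deb packages that match the local kernel version.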
Reference: Installing MLNX_OFED
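Before picking a package source, it may help to confirm which kernel series the host is actually running. The following is a minimal sketch; the helper names `kernel_series` and `needs_kernel_support` are made up for illustration and are not part of MLNX_OFED, and the sample version strings in the comments are only examples.

```shell
#!/bin/sh
# kernel_series: print "major.minor" from a kernel release string.
# Hypothetical helper for illustration only.
kernel_series() {
    # e.g. "5.15.107-2-pve" -> "5.15"
    printf '%s\n' "$1" | cut -d. -f1,2
}

# needs_kernel_support: succeed when the kernel is NOT the 5.10 series
# that Debian 11 ships by default, i.e. when packages have to be rebuilt
# for the local kernel.
needs_kernel_support() {
    [ "$(kernel_series "$1")" != "5.10" ]
}

if needs_kernel_support "$(uname -r)"; then
    echo "kernel $(uname -r): rebuild packages with mlnx_add_kernel_support.sh"
else
    echo "kernel $(uname -r): stock Debian 11 kernel series"
fi
```

On a Proxmox 7 host this should take the first branch, since the running kernel is a 5.15 build.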
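Download the driver source package: open the official site and choose a suitable version (I picked the latest LTS release at the time, 5.8-3.0.7.0-LTS) -> Debian -> Debian 11.3 -> x86_64 -> tgz. Proxmox 8 needs the latest 24.04 release, which is the one that supports the 6.8 kernel.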
    wget https://content.mellanox.com/ofed/MLNX_OFED-5.8-3.0.7.0/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
    tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.3-x86_64.tgz
Build the local repo
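    # Run from inside the extracted MLNX_OFED_LINUX directory; -m points at it
    ./mlnx_add_kernel_support.sh -m $(pwd)
    # The rebuilt "-ext" tarball is written to /tmp
    cd /tmp/
    tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext.tgz
    cd /usr/local/src
    mv /tmp/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext ./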
Add the local repo to apt
    cd /etc/apt/sources.list.d
    echo "deb [trusted=yes] file:/usr/local/src/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext/DEBS ./" > mlnx_ofed.list
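Since the directory name changes with every OFED release and Debian point release, the source line can be generated from the path instead of typed by hand. This is a small sketch; `mlnx_repo_line` is a hypothetical helper name, and the path passed to it is the one used in this post. The `[trusted=yes]` option tells apt to accept the repo without a signature check, which is what makes an unsigned local directory usable.

```shell
#!/bin/sh
# mlnx_repo_line: print the apt source line for an unpacked "-ext" directory.
# Hypothetical helper for illustration only.
mlnx_repo_line() {
    # ${1%/} drops one trailing slash so the path does not end up with "//"
    printf 'deb [trusted=yes] file:%s/DEBS ./\n' "${1%/}"
}

# Print the line for the directory used in this post.
mlnx_repo_line /usr/local/src/MLNX_OFED_LINUX-5.8-3.0.7.0-debian11.7-x86_64-ext
```

Redirecting that output to `/etc/apt/sources.list.d/mlnx_ofed.list` gives the same file as the `echo` command above.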
Install the mlnx-ofed driver
    apt update
    apt install mlnx-ofed-basic
Firmware update
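The PSID on the card I received differs from the official one, so the official firmware update tool cannot update it online automatically. My guess is that this is a batch of cards made for Baidu.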
    # An online update finds no firmware because the PSID does not match
    $ mlxfwmanager --online -u -d 02:00.0
    Querying Mellanox devices firmware ...

    Device #1:
    ----------

      Device Type:      ConnectX5
      Part Number:      MCX542B-ACAN_C07_Ax
      Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
      PSID:             BAI0000000010
      PCI Device Name:  02:00.1
      Base GUID:        b8599f0300ab1a7c
      Base MAC:         b8599fab1a7c
      Versions:         Current        Available
         FW             16.25.4062     N/A
         PXE            3.5.0701       N/A
         UEFI           14.18.0019     N/A

      Status:           No matching image found
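The PSID can be pulled out of this output with a short filter, which is handy when checking several cards. The sketch below is illustrative only: `get_psid` and `is_oem_psid` are hypothetical helper names, and treating a missing `MT_` prefix as a sign of a vendor-specific PSID is an assumption based on the two PSIDs seen in this post, not a documented rule.

```shell
#!/bin/sh
# get_psid: read mlxfwmanager-style output on stdin and print the first PSID.
# Hypothetical helper for illustration only.
get_psid() {
    awk '$1 == "PSID:" { print $2; exit }'
}

# is_oem_psid: succeed when the PSID does not start with "MT_".
# Assumption: stock PSIDs such as MT_0000000248 carry that prefix,
# while vendor-specific ones such as BAI0000000010 do not.
is_oem_psid() {
    case "$1" in
        MT_*) return 1 ;;
        *)    return 0 ;;
    esac
}

# Example with one line copied from the output above.
sample='  PSID:             BAI0000000010'
psid=$(printf '%s\n' "$sample" | get_psid)
if is_oem_psid "$psid"; then
    echo "$psid looks vendor-specific; an online update will likely find no image"
fi
```

On a live system the input would be the output of `mlxfwmanager` itself rather than a pasted sample.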
Reference: Updating Firmware After Installation
Download the firmware: open the official site and choose a suitable version (I picked the latest LTS release at the time, 16.35.3006-LTS) -> MCX542B-ACA -> MT_0000000248
    wget https://www.mellanox.com/downloads/firmware/fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
    unzip fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin.zip
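Back up the current firmware first. After the update, the part number, description and similar fields all change. Those can be backed up as well (see the reference), but I was too quick on the keyboard and skipped that T_T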
    flint -d /dev/mst/mt4119_pciconf0 ri BAI0000000010.bin
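Force-flash the new firmware. Changing the PSID requires the flint command. Think twice before doing this and make sure the model is correct.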
    $ flint --allow_psid_change -d /dev/mst/mt4119_pciconf0 -i fw-ConnectX5-rel-16_35_3006-MCX542B-ACA_Ax_Bx-UEFI-14.29.15-FlexBoot-3.6.902.bin burn
    Done.
        Current FW version on flash:  16.25.4062
        New FW version:               16.35.3006

        You are about to replace current PSID on flash - "BAI0000000010" with a different PSID - "MT_0000000248".
        Note: It is highly recommended not to change the PSID.

     Do you want to continue ? (y/n) [n] : y
    Burning FW image without signatures - OK
    Burning FW image without signatures - OK
    Restoring signature                 - OK
    -I- To load new FW run mlxfwreset or reboot machine.
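Querying again shows that the image has been written to flash: FW is now 16.35.3006, while FW (Running) is still 16.25.4062 and the PSID still reads BAI0000000010, because the card has not been reset yet.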
    $ mlxfwmanager
    Querying Mellanox devices firmware ...

    Device #1:
    ----------

      Device Type:      ConnectX5
      Part Number:      MCX542B-ACAN_C07_Ax
      Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket Halogen free
      PSID:             BAI0000000010
      PCI Device Name:  /dev/mst/mt4119_pciconf0
      Base GUID:        b8599f0300ab1a7c
      Base MAC:         b8599fab1a7c
      Versions:         Current        Available
         FW             16.35.3006     16.25.4062
         FW (Running)   16.25.4062     N/A
         PXE            3.5.0701       3.5.0701
         UEFI           14.18.0019     14.18.0019

      Status:           Up to date
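The gap between the FW row and the FW (Running) row is what signals that a reset is still pending. A rough way to detect that from the text is sketched below; `fw_reset_pending` is a hypothetical helper name, and the parsing assumes the row layout shown in this post, which may differ between tool versions.

```shell
#!/bin/sh
# fw_reset_pending: read mlxfwmanager-style output on stdin and succeed when
# an "FW (Running)" row exists and its version differs from the "FW" row.
# Hypothetical helper; assumes the row layout shown in this post.
fw_reset_pending() {
    awk '
        $1 == "FW" && $2 == "(Running)" { running = $3; next }
        $1 == "FW"                      { flash = $2 }
        END {
            if (running != "" && running != flash) exit 0
            exit 1
        }
    '
}

# Example with the two rows copied from the output above.
sample='     FW             16.35.3006     16.25.4062
     FW (Running)   16.25.4062     N/A'
if printf '%s\n' "$sample" | fw_reset_pending; then
    echo "new firmware is on flash but not loaded: run mlxfwreset or reboot"
fi
```

When the card is already running the flashed image there is no FW (Running) row, so the function returns failure and nothing is printed.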
Do a hardware reset
    $ mlxfwreset -d /dev/mst/mt4119_pciconf0 reset

    Minimal reset level for device, /dev/mst/mt4119_pciconf0:

    3: Driver restart and PCI reset
    Continue with reset?[y/N] y
    -I- Sending Reset Command To Fw             -Done
    -I- Stopping Driver                         -Done
    -I- Resetting PCI                           -Done
    -I- Starting Driver                         -Done
    -I- Restarting MST                          -Done
    -I- FW was loaded successfully.
The card is now running the new firmware and reports the new PSID.
    $ mlxfwmanager
    Querying Mellanox devices firmware ...

    Device #1:
    ----------

      Device Type:      ConnectX5
      Part Number:      MCX542B-ACA_Ax_Bx
      Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket; ROHS R6 Halogen free
      PSID:             MT_0000000248
      PCI Device Name:  /dev/mst/mt4119_pciconf0
      Base GUID:        b8599f0300ab1a7c
      Base MAC:         b8599fab1a7c
      Versions:         Current        Available
         FW             16.35.3006     16.35.3006
         PXE            3.6.0902       3.6.0902
         UEFI           14.29.0015     14.29.0015

      Status:           Up to date
Online update
With the new PSID flashed, later firmware updates can be done online directly with mlxfwmanager.
    $ mlxfwmanager --online -u -d 02:00.0
    Querying Mellanox devices firmware ...

    Device #1:
    ----------

      Device Type:      ConnectX5
      Part Number:      MCX542B-ACA_Ax_Bx
      Description:      ConnectX-5 EN network interface card for OCP; with host management; 25GbE dual-port SFP28; PCIe3.0 x8; no bracket; ROHS R6 Halogen free
      PSID:             MT_0000000248
      PCI Device Name:  02:00.0
      Base GUID:        b8599f0300ab1a7c
      Base MAC:         b8599fab1a7c
      Versions:         Current        Available
         FW             16.35.3006     16.35.3502
         PXE            3.6.0902       3.6.0902
         UEFI           14.29.0015     14.29.0015

      Status:           Update required

    Release notes for the available Firmware:
    -----------------------------------------

    For more details, please refer to the following FW release notes:
      1- ConnectX3 (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf
      2- ConnectX3Pro (2.42.5000): http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_42_5000-release_notes.pdf
      3- Connect-IB (10.16.1200): http://www.mellanox.com/pdf/firmware/ConnectIB-FW-10_16_1200-release_notes.pdf
      4- ConnectX4 (12.28.2006): http://docs.mellanox.com/display/ConnectX4Firmwarev12282006
      5- ConnectX4Lx (14.32.1010): http://docs.mellanox.com/display/ConnectX4LxFirmwarev14321010
      6- ConnectX5 (16.35.3502): http://docs.mellanox.com/display/ConnectX5Firmwarev16353502
      7- ConnectX6 (20.41.1000): http://docs.mellanox.com/display/ConnectX6Firmwarev20411000
      8- ConnectX6Dx (22.41.1000): http://docs.mellanox.com/display/ConnectX6DxFirmwarev22411000
      9- ConnectX6Lx (26.41.1000): http://docs.mellanox.com/display/ConnectX6LxFirmwarev26411000
      10- BlueField2 (24.41.1000): http://docs.mellanox.com/display/BlueField2Firmwarev24411000
      11- ConnectX7 (28.41.1000): http://docs.mellanox.com/display/ConnectX7Firmwarev28411000
      12- BlueField3 (32.41.1000): http://docs.mellanox.com/display/BlueField3Firmwarev32411000

    ---------
    Found 1 device(s) requiring firmware update...

    Perform FW update? [y/N]: y
    Please wait while downloading MFA(s) 100%

    Device
    FSMST_INITIALIZE - OK
    Writing Boot image component - OK
    Done

    Restart needed for updates to take effect.
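The "Update required" status comes down to the Available version being newer than the Current one. For scripting around this, a version comparison can be sketched as follows; `fw_is_newer` is a hypothetical helper name, and it relies on `sort -V` (version sort), which is available in GNU coreutils as shipped on Debian and Proxmox.

```shell
#!/bin/sh
# fw_is_newer CURRENT AVAILABLE: succeed when AVAILABLE is strictly newer.
# Hypothetical helper; relies on "sort -V" from GNU coreutils.
fw_is_newer() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}

# Example with the versions from the output above.
if fw_is_newer 16.35.3006 16.35.3502; then
    echo "update required"
fi
```

The versions here are the Current and Available FW values from the output above; equal versions or an older Available version make the function return failure.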