I want more NVMe RAID0 performance

hash02

I have 3x 3.8 TB NVMe PCIe 3.0 enterprise SSDs in RAID0 (mdadm on Linux), and I'm getting read speeds the same as or below those of the individual underlying drives.
These drives are rated at 2.4-2.5 GB/s maximum sequential read, so in a RAID0 setup I should get more (ideally closer to 3 × 2.4 ≈ 7 GB/s), but sometimes I get less.

Setup:
EPYC 7502P
3x 3.8 TB Samsung NVMe PCIe 3.0 SSDs

128 GB DDR4

Debian 10, stock default kernel

 

On the software side: mdadm RAID0 across all three devices with XFS on top. No special flags passed at XFS creation, defaults only.
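For reference, a rough reconstruction of how the array and filesystem would have been created with defaults (not the exact commands used; device names match the nvme list below, the chunk size matches the mdadm output, and the mount point is just an example):

mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
mkfs.xfs /dev/md0         # defaults; mkfs.xfs picks up sunit/swidth from the md device
mount /dev/md0 /mnt/raid  # example mount point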

Tested using hdparm and dd. 
nvme list:
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S438N*********       SAMSUNG MZQLB3T8HALS-00007               1           1.92  TB /   3.84  TB    512   B +  0 B   EDA5502Q
/dev/nvme1n1     S438N*********       SAMSUNG MZQLB3T8HALS-00007               1           1.92  TB /   3.84  TB    512   B +  0 B   EDA5502Q
/dev/nvme2n1     S438N*********       SAMSUNG MZQLB3T8HALS-00007               1           1.92  TB /   3.84  TB    512   B +  0 B   EDA5502Q

xfs_info output:

meta-data=/dev/md0               isize=512    agcount=32, agsize=87904768 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=2812952576, imaxpct=5
         =                       sunit=128    swidth=384 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

mdadm --detail output:
 

/dev/md0:
           Version : 1.2
     Creation Time : Fri Jun 18 18:11:41 2021
        Raid Level : raid0
        Array Size : 11251817472 (10730.57 GiB 11521.86 GB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

       Update Time : Fri Jun 18 18:11:41 2021
             State : clean 
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 512K

Consistency Policy : none

              Name : epyc-server-01:0  (local to host epyc-server-01)
              UUID : 9c1a9795:81025516:ab1b34a2:4d27d794
            Events : 0

    Number   Major   Minor   RaidDevice State
       0     259        1        0      active sync   /dev/nvme1n1
       1     259        2        1      active sync   /dev/nvme2n1
       2     259        0        2      active sync   /dev/nvme0n1

Speed: 

hdparm -t /dev/md0: average 2000 MB/s

hdparm -t /dev/nvme0n1 (or any of the other drives): average 2300 MB/s

What's wrong here?

2 hours ago, hash02 said:

Tested using hdparm and dd. 

Those aren't great benchmarks; try using fio here.
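For example, something along these lines gives a much more realistic picture of sequential read throughput than hdparm (the device path, queue depth and job count here are only illustrative):

fio --name=seqread --filename=/dev/md0 --rw=read --bs=1M \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting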

 

What board are you using, and how are the drives connected to it?
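You can also check what link each drive actually negotiated, e.g. (the PCI address below is just a placeholder; take the real ones from the first command):

lspci | grep -i 'non-volatile'                        # list the NVMe controllers and their PCI addresses
lspci -vv -s 0000:41:00.0 | grep -E 'LnkCap|LnkSta'   # placeholder address; compare max vs. negotiated link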

  • 1 year later...

I've just been given the task of analyzing a serious performance issue on an mdadm RAID10 array.

 

Very similar setup. Probably hitting the same problem.

 

Motherboard: Supermicro H12DSU-iN

2 x AMD EPYC 7352 24-Core Processor (CPU1 and CPU2)

 

Storage (nvme list):
 

Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     BTLJ113301G41P0FGN   INTEL SSDPE2KX010T8                      1           1.00  TB /   1.00  TB    512   B +  0 B   VDV10170
/dev/nvme1n1     BTLJ1133031W1P0FGN   INTEL SSDPE2KX010T8                      1           1.00  TB /   1.00  TB    512   B +  0 B   VDV10170
/dev/nvme2n1     S438NC0R600133       SAMSUNG MZQLB3T8HALS-00007               1           3.09  TB /   3.84  TB    512   B +  0 B   EDA5402Q
/dev/nvme3n1     S438NC0R600129       SAMSUNG MZQLB3T8HALS-00007               1           3.09  TB /   3.84  TB    512   B +  0 B   EDA5402Q
/dev/nvme4n1     S438NC0R600128       SAMSUNG MZQLB3T8HALS-00007               1           3.09  TB /   3.84  TB    512   B +  0 B   EDA5402Q
/dev/nvme5n1     S438NC0R600131       SAMSUNG MZQLB3T8HALS-00007               1           3.09  TB /   3.84  TB    512   B +  0 B   EDA5402Q
/dev/nvme6n1     S438NC0R600124       SAMSUNG MZQLB3T8HALS-00007               1           3.09  TB /   3.84  TB    512   B +  0 B   EDA5402Q
/dev/nvme7n1     S438NC0R600126       SAMSUNG MZQLB3T8HALS-00007               1           3.09  TB /   3.84  TB    512   B +  0 B   EDA5402Q
 

Software RAID setup (/proc/mdstat):

md0 : active raid10 nvme4n1[2] nvme7n1[5] nvme5n1[3] nvme6n1[4] nvme3n1[1] nvme2n1[0]
      11251817472 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]
      bitmap: 24/84 pages [96KB], 65536KB chunk

md2 : active raid1 nvme1n1p3[1] nvme0n1p3[0] -> Root
      975578112 blocks super 1.2 [2/2] [UU]
      bitmap: 4/8 pages [16KB], 65536KB chunk

md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      1046528 blocks super 1.2 [2/2] [UU]

 

Right now all of the NVMe drives are attached to CPU1.

Read latency on /dev/md0 is very bad.
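To confirm where each controller actually sits, something like this should print the NUMA node per NVMe controller (sysfs paths assumed; they can differ slightly between kernel versions):

for c in /sys/class/nvme/nvme*; do
    pci=$(readlink -f "$c/device")                    # PCI device behind this controller
    echo "$c -> NUMA node $(cat "$pci/numa_node")"
done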

 

I was planning to spread the NVMe drives across both CPUs, but I need to know which devices mdadm is mirroring and which it is striping across.
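My understanding of the near=2 layout is that the two copies of each chunk go to adjacent RaidDevice slots, so the mirror pairs should be slots (0,1), (2,3) and (4,5) in the device table shown by:

mdadm --detail /dev/md0    # RaidDevice column: with near=2, consecutive slots hold the two copies of each chunk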

 

Any clues or suggestions? Thanks!

 

 

 
