NVMe RAID on ESXi?

Archaic0

I've been experimenting with NVMe storage in a VMware environment and so far it's going very well with single-NVMe datastores.  I've seen a direct performance increase on a core daily operation: a particular task that used to take 35 minutes now finishes in 12.  It's amazing.

 

This is still very much experimental, though, in that single NVMe drives give me a single point of failure I'm not comfortable with, especially when I cannot monitor the SMART status of an NVMe drive while it's two steps removed from any OS (hardware -> ESXi -> vSphere datastore -> Windows VM).  (Any tips on this?)
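The closest thing I can think of is querying the drive from the ESXi host shell itself rather than from inside the VM. Below is a minimal sketch using the host's bundled Python; the esxcli nvme namespace and the vmhba2 adapter name are assumptions I'd still need to verify on my build.

# Rough sketch: query NVMe health from the ESXi host shell, where the drive is
# still visible even though the guest OS can't see it. Uses the host's bundled
# Python; the esxcli nvme namespace and the vmhba2 adapter name are assumptions
# to verify on your ESXi build.
import subprocess

def esxcli(*args):
    # Shell out to esxcli on the ESXi host and return its text output.
    return subprocess.check_output(["esxcli"] + list(args)).decode()

# List the NVMe controllers the host knows about.
print(esxcli("nvme", "device", "list"))

# Dump the SMART/health log for one controller (adapter name is a placeholder).
print(esxcli("nvme", "device", "log", "smart", "get", "-A", "vmhba2"))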

 

I found a HighPoint software RAID card, bought one, experimented with it, and saw amazing numbers with a RAID 10 setup.  None of their cards appear to be hardware RAID, though, so I can't use them in my ESXi setup.

 

I keep finding old information online, plus some bleeding-edge stuff from the most recent CES, and then there's Intel's VROC and AMD's equivalent with all their drama.  I'm left wondering whether anyone besides LTT has a functioning NVMe RAID setup, and whether anyone has one set up as an ESXi datastore.

 

It appears we're just not there yet, but we should be in 2019 or 2020?  Or maybe there's something I'm missing?

 

My ESXi hosts are Dell R730s running Intel Xeon E5-2667 v3 CPUs.  I'm unsure about PCIe versions / lane configuration and CPU VROC support - I'm just not educated about the hardware at that level yet.

 

We will be replacing these ESXi hosts in 2020 or 2021 though, so if NVMe RAID for an ESXi host just isn't possible yet, what should I be sure to look for in our next hardware cycle?

 

Thanks in advance!

 


You will need vROC (Virtual RAID on CPU) to be supported on the motherboard to use NVMe drives in RAID.


vROC only arrived on the latest eval units; the R730, as good a chassis as it is, doesn't support it.  In addition, vROC needs to be licensed for your requirement, and that's a Premium license which costs £££ too :( AMD's version is license-free, so keep that in mind.

 

I'm also unsure about vROC driver support on ESXi.  I haven't tested vROC on ESXi myself; I've only tested Server 2016, Server 2019, CentOS and Red Hat.
Here's vROC running in Windows using RSTe:
 

[screenshot: the vROC array in Windows via RSTe]

 

vROC also has some limitations in dual-socket configurations; for example, you can't create a bootable VMD (Volume Management Device) volume that spans multiple CPU sockets, because each PCIe device is bound to one CPU for its PCIe lanes.  You can create a VMD volume which spans multiple CPUs, but it cannot be used to boot, and some OSes have problems seeing the device correctly due to early driver implementations.

 

One caveat to using a VMD: currently nothing other than Intel's own software can correctly read the S.M.A.R.T. data of the NVMe drives behind the VMD.  This is something I raised with Intel recently for their team to review, which I'm sure they will sort out.

 

Hope this helps a bit; give me a shout if you want any more specific info, as I still have a vROC-capable NVMe unit in storage eval testing.

Please quote or tag me if you need a reply


15 hours ago, Falconevo said:

In addition, vROC needs to be licensed for your requirement, and that's a Premium license which costs £££ too :( AMD's version is license-free, so keep that in mind.

For me, if it requires money then I'd start considering an LSI/Broadcom RAID card that supports NVMe, though I imagine that costs a fair amount more than a vROC Premium key. At least I also know ESXi supports that.


1 minute ago, leadeater said:

For me, if it requires money then I'd start considering an LSI/Broadcom RAID card that supports NVMe, though I imagine that costs a fair amount more than a vROC Premium key. At least I also know ESXi supports that.

Yeah, I've been testing the tri-mode LSI controllers; they have some serious performance and support NVMe RAID using U.2 drives.

Just a pain how expensive the chassis are to support the stuff at the moment.

I also have an Intel Ruler Eval system in if you want any info on that beast ;)

Please quote or tag me if you need a reply


1 minute ago, Falconevo said:

I also have an Intel Ruler Eval system in if you want any info on that beast ;)

Just the number of organs I have to sacrifice to get one ?


2 hours ago, leadeater said:

... I'd start considering an LSI/Broadcom RAID card that supports NVMe...

Where can I find these cards?

 

EDIT: I found Broadcom's website with a card listed, but I am not finding any retail source for them.  I'll contact Broadcom sales but this is feeling like it's meant for OEMs and not retail customers.

 

https://www.broadcom.com/products/storage/host-bus-adapters/sas-nvme-9400-8i#overview


I finally found that I can buy these cards at CDW, but I could use some further explanation if someone is willing.

 

These cards appear to have two mini-SAS connectors on them, and they list NVMe support for 'internal only' connections.  I'm unclear on exactly how I physically connect NVMe drives to this card.

 

Some card descriptions mention that this card lets you run NVMe drives 'in one drive bay', so should I be able to find some cage, mount, or breakout board that lets me physically connect multiple NVMe drives and ends in a connector I can then plug into this card?

 

https://www.cdw.com/product/broadcom-hba-9400-8i-storage-controller-sata-6gb-s-sas-12gb-s-pcie/4865638#PO


57 minutes ago, Archaic0 said:

Where can I find these cards? 

You need to find a tri-mode LSI card that supports NVMe; these are quite expensive at the moment, and you will need a chassis backplane that supports SATA/SAS/NVMe (the NVMe drives being in the U.2 form factor).

 

LSI 9400-series cards that support it with internal connectivity (LSI SAS3416 / SAS3616W):

9405W-16i (16 internal ports)

9400-8i8e (8 internal and 8 external ports)

9400-8i (8 internal ports)

9400-16i (16 internal ports)

 

The above should support up to 24 NVMe devices (both CPU sockets need to be populated for 24 NVMe drives).  The connectivity is provided by SFF-8643, which will need to be supported by the backplane you are connecting the NVMe drives to.

 

The R730 isn't the right chassis for the job, even though it holds a very high standing in my experience; I run many in production.

Please quote or tag me if you need a reply


Also, just a note: you would be better off running NVMe without hardware RAID and using software technologies for redundant storage instead, such as vSAN if you are using ESXi, Storage Spaces Direct if you are on Windows, or something like Ceph on Linux.

 

Don't get me wrong, I'm in the process of testing NVMe RAID because it comes up in conversation a lot when discussing redundancy, as most people are scared to death of 'software'-based RAID tech even though that is the future.

 

If you have 3 or more ESXi nodes, I would advise going with vSAN, getting a single NVMe device (PCI Express HHHL) for the caching disk and using commodity drives for the underlying storage, something with a better £/$ per GB.
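For reference, claiming the devices into a disk group from the ESXi shell looks roughly like the sketch below; the device IDs are placeholders, and the esxcli vsan flags are worth double-checking against your ESXi build rather than taking them as gospel.

# Rough sketch: claim one NVMe cache device plus commodity HDDs into a hybrid
# vSAN disk group from the ESXi shell. The device IDs are placeholders, and the
# `esxcli vsan storage add` flags should be checked against your ESXi build.
import subprocess

CACHE_NVME = "t10.NVMe____EXAMPLE_CACHE_DEVICE"               # placeholder ID
CAPACITY_HDDS = ["naa.example_hdd_1", "naa.example_hdd_2"]    # placeholder IDs

def esxcli(*args):
    return subprocess.check_output(["esxcli"] + list(args)).decode()

# -s names the cache-tier device, each -d adds a capacity-tier device.
cmd = ["vsan", "storage", "add", "-s", CACHE_NVME]
for hdd in CAPACITY_HDDS:
    cmd += ["-d", hdd]
print(esxcli(*cmd))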

Please quote or tag me if you need a reply


15 hours ago, Falconevo said:

You need to find a tri-mode LSI card that supports NVMe; these are quite expensive at the moment, and you will need a chassis backplane that supports SATA/SAS/NVMe (the NVMe drives being in the U.2 form factor).

 

LSI 9400-series cards that support it with internal connectivity (LSI SAS3416 / SAS3616W):

9405W-16i (16 internal ports)

9400-8i8e (8 internal and 8 external ports)

9400-8i (8 internal ports)

9400-16i (16 internal ports)

 

The above should support up to 24 NVMe devices (both CPU sockets need to be populated for 24 NVMe drives).  The connectivity is provided by SFF-8643, which will need to be supported by the backplane you are connecting the NVMe drives to.

 

The R730 isn't the right chassis for the job, even though it holds a very high standing in my experience; I run many in production.

Do you know how these 9400 series cards work? I have done some googling and there aren't many reviews or people talking about them. 

 

From what I see in the user guide:

 

Quote

The Tri-Mode device interface contains a SAS core and a PCIe device bridge (PDB). The PDB enables the PCIe (NVMe) storage interface connections and each PDB can support direct connect to NVMe devices or to x4 PCIe switches

 

So that makes me think they're converting NVMe PCIe devices to SAS, but that would seem to hurt speed a good amount and throw away some of the benefits of NVMe.

 

One dude on the internet says their NVMe drive went from 3,000 MB/s connected to the CPU to 700 MB/s connected to this series of cards, but that just doesn't seem right.

 

@leadeater Do you know?


44 minutes ago, Electronics Wizardy said:

So that makes me think they're converting NVMe PCIe devices to SAS, but that would seem to hurt speed a good amount and throw away some of the benefits of NVMe.

 

One dude on the internet says their NVMe drive went from 3,000 MB/s connected to the CPU to 700 MB/s connected to this series of cards, but that just doesn't seem right.

 

@leadeater Do you know?

As far as I know, the U.2 connector is a bit like SAS and SATA in that it is physically compatible with the other two and differs electrically. A universal drive slot will physically accept all three, and the pin contact used depends on the drive type inserted.

 


https://www.boston.co.uk/technical/2017/09/nvme-the-future-of-storage-as-we-know-it.aspx

 

As for the speed drop, I can only think of two reasons: either the SSD isn't optimized for a hardware RAID card and really doesn't like it, or the RAID card was installed into a PCIe slot connected to the chipset, going through the DMI/QPI interface.


The reason for the performance drop is quite simple: the LSI card sits on x8 or x16 PCIe 3.1 lanes.  To provide up to 24 NVMe drives it goes through a PCIe PLX switch, and 24 x4 NVMe drives cannot all communicate through that uplink at the same time, so the drives are effectively reduced to x1 bandwidth.  Some reduction in performance is also present when you bring hardware RAID into the NVMe stack, as the SSD has its local DRAM cache disabled and the RAID controller's cache is used instead; most tri-mode adapters have a 4-8 GB cache, which is large by the RAID controller standards of past years.
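To put rough numbers on that, here is a quick back-of-the-envelope sketch; the ~985 MB/s of usable bandwidth per PCIe 3.0/3.1 lane is an approximation, and controller processing overhead isn't modelled.

# Back-of-the-envelope check on the oversubscription described above.
# ~985 MB/s usable per PCIe 3.0/3.1 lane is an approximation; controller
# processing overhead is not modelled here.
LANE_MBPS = 985

uplink_lanes = 16              # card sitting on an x16 slot
drives = 24                    # maximum drives behind the tri-mode card
lanes_wanted = drives * 4      # every NVMe drive would like x4

oversubscription = lanes_wanted / uplink_lanes        # 96 / 16 = 6x
per_drive_mbps = uplink_lanes * LANE_MBPS / drives    # fair share per drive

print(f"{oversubscription:.0f}x oversubscribed; ~{per_drive_mbps:.0f} MB/s "
      f"per drive if all {drives} push at once (vs ~{4 * LANE_MBPS} MB/s at x4)")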

 

This is why vROC exists: it eliminates that reduction in performance for the NVMe drives and allows them up to x4 PCIe lanes each, depending on the configuration, system layout, and whether it is single or dual socket.  vROC has some pitfalls in that it can't do complex RAID configurations, but for the most part it covers the common requirements: RAID 0, RAID 1, RAID 5 and RAID 10.  You can also enable the DRAM cache on the NVMe SSDs behind vROC while in RAID, but that comes with a significant warning about potential data loss.

 

Software-defined storage does a much better job of getting performance out of NVMe drives that are not behind a RAID controller.  As you are using ESXi, I would look at vSAN, using a PCIe HHHL NVMe device for the cache drive and commodity mechanical HDDs for the back-end storage.

Please quote or tag me if you need a reply


9 hours ago, leadeater said:

As far as I know, the U.2 connector is a bit like SAS and SATA in that it is physically compatible with the other two and differs electrically. A universal drive slot will physically accept all three, and the pin contact used depends on the drive type inserted.

 

 

 

As for the speed drop, I can only think of two reasons: either the SSD isn't optimized for a hardware RAID card and really doesn't like it, or the RAID card was installed into a PCIe slot connected to the chipset, going through the DMI/QPI interface.

 

4 hours ago, Falconevo said:

The reason for the performance drop is quite simple: the LSI card sits on x8 or x16 PCIe 3.1 lanes.  To provide up to 24 NVMe drives it goes through a PCIe PLX switch, and 24 x4 NVMe drives cannot all communicate through that uplink at the same time, so the drives are effectively reduced to x1 bandwidth.  Some reduction in performance is also present when you bring hardware RAID into the NVMe stack, as the SSD has its local DRAM cache disabled and the RAID controller's cache is used instead; most tri-mode adapters have a 4-8 GB cache, which is large by the RAID controller standards of past years.

 

This is why vROC exists: it eliminates that reduction in performance for the NVMe drives and allows them up to x4 PCIe lanes each, depending on the configuration, system layout, and whether it is single or dual socket.  vROC has some pitfalls in that it can't do complex RAID configurations, but for the most part it covers the common requirements: RAID 0, RAID 1, RAID 5 and RAID 10.  You can also enable the DRAM cache on the NVMe SSDs behind vROC while in RAID, but that comes with a significant warning about potential data loss.

 

Software-defined storage does a much better job of getting performance out of NVMe drives that are not behind a RAID controller.  As you are using ESXi, I would look at vSAN, using a PCIe HHHL NVMe device for the cache drive and commodity mechanical HDDs for the back-end storage.

But does the RAID controller chip on these cards convert the PCIe NVMe drives to SAS, or does it do RAID with native PCIe calls?

 

The 9400-series cards seem to be more than just a PLX chip, judging by the user guide.  Look at the user guide here https://docs.broadcom.com/docs/pub-005851, page 11; it makes me think it's a SAS RAID card with a PCIe-to-SAS adapter.

 

The card being a PLX chip wouldn't explain the speed difference, as running a single drive on a PLX chip should get about the same speed as not using a PLX chip at all.

5 hours ago, Falconevo said:

The reason for the performance drop is quite simple: the LSI card sits on x8 or x16 PCIe 3.1 lanes.  To provide up to 24 NVMe drives it goes through a PCIe PLX switch, and 24 x4 NVMe drives cannot all communicate through that uplink at the same time, so the drives are effectively reduced to x1 bandwidth.

But wouldn't a PLX chip allow max speed to any one drive at a time? And with a single drive you should get the full speed. The figure I found seems to be from using one drive at a time.


2 minutes ago, Electronics Wizardy said:

 

But does the RAID controller chip on these cards convert the PCIe NVMe drives to SAS, or does it do RAID with native PCIe calls?

 

The 9400-series cards seem to be more than just a PLX chip, judging by the user guide.  Look at the user guide here https://docs.broadcom.com/docs/pub-005851, page 11; it makes me think it's a SAS RAID card with a PCIe-to-SAS adapter.

 

The card being a PLX chip wouldn't explain the speed difference, as running a single drive on a PLX chip should get about the same speed as not using a PLX chip at all.

But wouldn't a PLX chip allow max speed to any one drive at a time? And with a single drive you should get the full speed. The figure I found seems to be from using one drive at a time.

The PLX is only in place to split the x8 or x16 PCIe 3.1 lanes out to 24 downstream links (x1 per NVMe drive if the maximum of 24 drives is populated; I believe 24 is the current maximum the cards are capable of, depending on the model you purchase and backplane support).
 

If you only have 4 NVMe drives connected, it allows each device to run at x4, but the total x16 bandwidth is 'limited' by the LSI RAID card, including the overhead from the card's own processing.  This depends on the RAID type, of course; RAID 0, 1, and 10 require less computation than RAID 5, so overheads are reduced.
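As a rough sketch of that 4-drive case (the per-lane bandwidth is approximate and the RAID overhead percentages are illustrative guesses, not measured figures):

# Sketch of the 4-drive case: each drive links at x4 behind the PLX switch,
# but the aggregate is capped by the card's x16 uplink plus whatever the RAID
# processing costs. Overhead fractions are illustrative guesses only.
LANE_MBPS = 985
UPLINK_MBPS = 16 * LANE_MBPS          # x16 uplink ceiling, ~15.8 GB/s

drives = 4
per_drive_link = 4 * LANE_MBPS        # what one x4 drive could push on its own

for raid_level, overhead in [("RAID 0/1/10", 0.05), ("RAID 5", 0.15)]:
    usable = UPLINK_MBPS * (1 - overhead)
    per_drive = min(per_drive_link, usable / drives)
    print(f"{raid_level}: ~{usable:.0f} MB/s aggregate ceiling, "
          f"~{per_drive:.0f} MB/s per drive with all four busy")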


As for whether the devices are presented as NVMe or SAS: the device is shown with a RAID NVMe bus type rather than SAS, and vROC is the same.  That threw me a little, as in 'mixed' mode (RAID plus pass-through) with vROC the devices show a RAID NVMe bus type even if they are not inside a VMD, which is essentially pass-through.  Disable vROC and the device shows an NVMe bus type, which is what you would expect, as it's then just a PCIe device with no vROC middleman.

Please quote or tag me if you need a reply


4 minutes ago, Falconevo said:

The PLX is only in place to split the x8 or x16 PCIe 3.1 lanes out to 24 downstream links (x1 per NVMe drive if the maximum of 24 drives is populated; I believe 24 is the current maximum the cards are capable of, depending on the model you purchase and backplane support).
 

If you only have 4 NVMe drives connected, it allows each device to run at x4, but the total x16 bandwidth is 'limited' by the LSI RAID card, including the overhead from the card's own processing.  This depends on the RAID type, of course; RAID 0, 1, and 10 require less computation than RAID 5, so overheads are reduced.


As for whether the devices are presented as NVMe or SAS: the device is shown with a RAID NVMe bus type rather than SAS, and vROC is the same.  That threw me a little, as in 'mixed' mode (RAID plus pass-through) with vROC the devices show a RAID NVMe bus type even if they are not inside a VMD, which is essentially pass-through.  Disable vROC and the device shows an NVMe bus type, which is what you would expect, as it's then just a PCIe device with no vROC middleman.

Have you used the cards before? Got any performance numbers?

 

So if you have one x4 NVMe drive connected, do you see full performance?


3 hours ago, Electronics Wizardy said:

Have you used the cards before? Got any performance numbers?

 

So if you have one x4 NVMe drive connected, do you see full performance?

Only if the device is set to pass-through mode and the cache on the SSD is active.  If the device is put in RAID mode and the cache is disabled, it doesn't perform anywhere near as well.  I'll have a look through my past eval-unit notes and see if I have anything specific for you to chew on; nothing should be under NDA anymore :)

Please quote or tag me if you need a reply


There is an OEM Dell card made for this purpose... 0P31H2 is the part number AFAIK, and it should give you 4x PCIe 3.0 lanes for your U.2 SSDs. As you use a Dell server, I would highly recommend using this card over the RocketRAID one.

 

I don't know if this card supports RAID, but I would be surprised if it didn't. If it does, you should either see the logical device directly in ESXi or have to install a driver (VIB) for it. You are using the Dell OEM ESXi image, right?


  • 1 year later...

I am trying to use VROC for a datastore RAID with two M.2 drives.  The BIOS sees the RAID, but ESXi sees two separate drives when I try to add a new datastore.  I have loaded the latest Intel VROC driver (INT_bootbank_intel-nvme-vmd_1.8.1.1001-1OEM.670.0.0.8169922) and rebooted, but still no go.  Any ideas what I can try next?  I don't want to give up the performance of the M.2 drives going through the CPU, and I don't want to go without redundancy for the datastore either.
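A few checks that seem worth running from the ESXi host shell while I wait for ideas, sketched with the host's bundled Python; the esxcli namespaces should exist on 6.7, but I'm treating them as assumptions to verify.

# Checks worth running from the ESXi host shell before digging deeper, sketched
# with the host's bundled Python. The esxcli namespaces are standard, but
# confirm them on your 6.7 build.
import subprocess

def esxcli(*args):
    return subprocess.check_output(["esxcli"] + list(args)).decode()

# 1. Confirm the Intel VMD/VROC VIB really is installed.
vibs = esxcli("software", "vib", "list")
print([line for line in vibs.splitlines() if "intel-nvme-vmd" in line])

# 2. With VMD enabled in the BIOS and the driver loaded, the adapter list
#    should show an Intel VMD entry rather than two plain NVMe controllers.
print(esxcli("storage", "core", "adapter", "list"))

# 3. See what logical devices the host actually presents for the datastore.
print(esxcli("storage", "core", "device", "list"))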
