Supermicro with 1PB SSD in 1U

10 hours ago, leadeater said:

Oooo nice, an actual product. Now it's worth looking at, out of the theory box and into reality :)

Can't say too much but I have one of these coming in for testing :)


41 minutes ago, Falconevo said:

Can't say too much but I have one of these coming in for testing :)

Not a 1PB model I hope, well I mean I wish, but for testing it'd be a huge waste xD


47 minutes ago, leadeater said:

Not a 1PB model I hope, well I mean I wish, but for testing it'd be a huge waste xD

Na, just 6 rulers to start with.


They are essentially the DC4600-series SSDs in a new form factor; looking forward to seeing how it copes with hot-removal of drives during sustained IO.
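For reference, a rough way to rehearse that on a Linux box is to keep fio hammering one drive while soft-removing another via the PCIe sysfs interface (or physically pulling it, if the backplane supports surprise removal). A sketch, with device names and the PCI address purely as placeholders:

```
# Sustained write load on one ruler (placeholder device, destroys data on it)
fio --name=sustained --filename=/dev/nvme1n1 --rw=write --bs=1M \
    --ioengine=libaio --iodepth=32 --direct=1 --time_based --runtime=600 &

# Find the PCI address of the drive to "remove" (placeholder NVMe name)
readlink -f /sys/class/nvme/nvme2/device

# Soft-remove it from the PCIe bus mid-IO, then rescan to bring it back
echo 1 > /sys/bus/pci/devices/0000:5e:00.0/remove   # address is an example
echo 1 > /sys/bus/pci/rescan
```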


2 minutes ago, Falconevo said:

Na, just 6 rulers to start with.


They are essentially the DC4600-series SSDs in a new form factor; looking forward to seeing how it copes with hot-removal of drives during sustained IO.

Wouldn't that also depend a lot on what you're using storage-wise? Lustre, Ceph, Gluster, etc. Do these even have hardware RAID options? Seems like too many drives for that.


4 minutes ago, leadeater said:

Wouldn't that also depend a lot on what you're using storage-wise? Lustre, Ceph, Gluster, etc. Do these even have hardware RAID options? Seems like too many drives for that.

Yea it would. PCI-E hot-plug (non-U.2) isn't new, but the only experience I have with it personally is the DSSD tech from EMC, which wasn't exactly flawless.

 

It will be used for testing in Ceph. It does support vROC via the CPU, but that requires a license and has some unusual caveats regarding booting (not that you would ever boot from it) and drive grouping across multiple CPUs, as the PCI-E lanes are spread across both processors.
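For anyone curious about the cross-socket side of that, you can check which CPU each ruler actually hangs off before deciding how to group drives. A quick sketch (device names will obviously vary):

```
# List NVMe devices and the NUMA node (i.e. CPU socket) each one is attached to
for dev in /sys/class/nvme/nvme*; do
    echo "$(basename "$dev"): NUMA node $(cat "$dev/device/numa_node")"
done

# Or eyeball the PCIe topology directly
lspci -tv | grep -i 'non-volatile'
```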


2 minutes ago, Falconevo said:

Yea it would. PCI-E hot-plug isn't new, but the only experience I have with it personally is the DSSD tech from EMC, which wasn't exactly flawless.

 

It will be used for testing in Ceph. It does support vROC via the CPU, but that requires a license and has some unusual caveats regarding booting (not that you would ever boot from it) and drive grouping across multiple CPUs, as the PCI-E lanes are spread across both processors.

I'd be interested in the performance under Ceph, throughput- and latency-wise. Going with Bluestore OSDs?
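If it helps, the sort of quick first pass I'd run is just rados bench against a throwaway pool (pool name and PG count here are only examples):

```
# Throwaway pool for benchmarking (PG count is a guess for a small test cluster)
ceph osd pool create rulerbench 128 128

# 60s of writes, keeping the objects so the read tests have something to hit
rados bench -p rulerbench 60 write --no-cleanup

# Sequential and random reads against the same objects
rados bench -p rulerbench 60 seq
rados bench -p rulerbench 60 rand

# rados bench reports average/max latency alongside bandwidth; clean up after
rados -p rulerbench cleanup
ceph osd pool delete rulerbench rulerbench --yes-i-really-really-mean-it
```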


Just now, leadeater said:

I'd be interested in the performance under Ceph, throughput- and latency-wise. Going with Bluestore OSDs?

Unfortunately our 'live' platform is still using filestore as it's a few releases back :| but there are plans to bring it forward, which are in the final planning/auth stages at the moment.

 

Some serious problems occurred when an upgrade was attempted; those have been ironed out now, so it should be able to upgrade with less drama (hopefully).

 

For testing this equipment we would be using the latest build of Luminous, which I believe defaults to Bluestore now?
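If we go that route, the minimal deployment would look something like this (device path is just an example; the metadata check shows what objectstore an OSD actually ended up with):

```
# Create an OSD on a whole NVMe device (Bluestore is the default objectstore on Luminous)
ceph-volume lvm create --data /dev/nvme0n1

# Confirm what the OSD actually ended up with (osd.0 as an example)
ceph osd metadata 0 | grep osd_objectstore
```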


23 minutes ago, Falconevo said:

For testing this equipment we would be using the latest build of Luminous, which I believe defaults to Bluestore now?

Yep, had a few issues prepping disks though, but I think that's related to not using proper HBAs and to using decommissioned servers. ceph-volume didn't work properly for me but ceph-disk did, although it's being deprecated; some of those issues also relate to trying to put the wal and block.db on SSD for the SAS disks I was using (everything is behind per-disk RAID 0, so it's a super hack).
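The layout I was trying to end up with was roughly this (ceph-volume syntax shown purely as an illustration; device names are placeholders, and it assumes the db/wal partitions on the SSD already exist):

```
# Bluestore OSD on a SAS HDD with block.db and wal offloaded to SSD partitions
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/sdk1 \
    --block.wal /dev/sdk2
```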

 

I found that using the wal and block.db on SSD for the HDDs didn't really improve performance much, if at all, not like filestore SSD journaling does. I also wasn't that impressed by the throughput of the SSD-only pool: 6x DL380p Gen8 (2x 2667v2, 64GB, 3 SSDs each) was pushing around 1.5GB/s, but the other 11 10k SAS disks per server would do basically the same. Then again, SAS disks aren't slow and that's a fair amount compared to the number of SSDs; I suppose I could put more SSDs in and test again.

 

I've heard a lot of mixed reports performance-wise for Bluestore, but I do see it as being much more consistent and also more reliable config-wise.

 

Edit:

Also, the new OSD device classes are super awesome; they make storage rules easier/more interesting and help simplify crush maps, particularly if you're using mixed-disk servers (helps me anyway in my lab setup).
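E.g. the sort of thing that used to need a hand-edited crush map now boils down to a few commands (pool/rule names here are just placeholders):

```
# Luminous usually detects hdd/ssd/nvme classes itself, but you can set them by hand
ceph osd crush set-device-class ssd osd.3

# Replicated rule that only picks SSD-class OSDs, spread across hosts
ceph osd crush rule create-replicated ssd-only default host ssd

# Point a pool at it
ceph osd pool set rbd-ssd crush_rule ssd-only
```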


1 minute ago, leadeater said:

Yep, had a few issues prepping disks though, but I think that's related to not using proper HBAs and to using decommissioned servers. ceph-volume didn't work properly for me but ceph-disk did, although it's being deprecated; some of those issues also relate to trying to put the wal and block.db on SSD for the SAS disks I was using (everything is behind per-disk RAID 0, so it's a super hack).

 

I found that using the wal and block.db on SSD for the HDDs didn't really improve performance much, if at all, not like filestore SSD journaling does. I also wasn't that impressed by the throughput of the SSD-only pool: 6x DL380p Gen8 (2x 2667v2, 64GB, 3 SSDs each) was pushing around 1.5GB/s, but the other 11 10k SAS disks per server would do basically the same. Then again, SAS disks aren't slow and that's a fair amount compared to the number of SSDs; I suppose I could put more SSDs in and test again.

 

I've heard a lot of mixed reports performance-wise for Bluestore, but I do see it as being much more consistent and also more reliable config-wise.

Yea, similar issues... before I was involved the hardware choice was all wrong and it was using RAID0 with write cache disabled rather than an actual HBA. That has since changed, but there is a mix of the two; some older devices which haven't been replaced yet are still using RAID0 :( makes me sad inside.

 

The PCI-E NVMe performance needs some serious work; I know SanDisk have made some strides in this area, but they don't perform anything like they should.


4 minutes ago, Falconevo said:

Yea, similar issues... before I was involved the hardware choice was all wrong and it was using RAID0 with write cache disabled rather than an actual HBA. That has since changed, but there is a mix of the two; some older devices which haven't been replaced yet are still using RAID0 :( makes me sad inside.

 

The PCI-E NVMe performance needs some serious work; I know SanDisk have made some strides in this area, but they don't perform anything like they should.

I'm also wondering if some of my performance is being limited by the NICs; they're not exactly the best for this. They're just standard 10Gb with no RDMA. I checked NIC utilization and it's not maxing out the link fully, but I get the feeling that's only because it's latency-limited. I've got newer servers with 25Gb and RDMA, but they're all prod and years away from making it down to the dev lab/play toys.
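The quick sanity checks I've been leaning on to separate bandwidth from latency between OSD hosts look roughly like this (hostnames are placeholders):

```
# On one OSD host
iperf3 -s

# From another: raw throughput, then with a few parallel streams
iperf3 -c osd-host-1
iperf3 -c osd-host-1 -P 4

# Rough latency picture between the hosts
ping -c 100 -i 0.2 osd-host-1
```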

 

I'm hoping to get funding to do a POC with some HPE Apollo kit, probably Apollo 4200s.


1 minute ago, leadeater said:

I'm also wondering if some of my performance is being limited by the NICs; they're not exactly the best for this. They're just standard 10Gb with no RDMA. I checked NIC utilization and it's not maxing out the link fully, but I get the feeling that's only because it's latency-limited. I've got newer servers with 25Gb and RDMA, but they're all prod and years away from making it down to the dev lab/play toys.

 

I'm hoping to get funding to do a POC with some HPE Apollo kit, probably Apollo 4200s.

Unfortunately we have a mix, and some work better than others. Some are using 10G Solarflare cards and Intel X540s (a previous Linux engineer had a hardon for Solarflare), which have no RDMA support but do have good driver support. They perform 'OK' and don't often get close to maxing their ports, but when they do there's an obvious latency increase; neither card in the old chassis has RDMA or iWARP support... A bigger problem is that the SMART information from some of the disk firmwares is bullshit when the disks sit behind a RAID controller in RAID0. Had to spend a lot of time updating disk firmware versions to get rid of bogus SMART data being generated and/or disks failing randomly with no warning, and of course the tools don't support flashing behind a RAID controller...
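For anyone fighting the same thing, smartctl can usually be pointed at the physical disk behind the controller; which -d flag you need depends on the controller family (these are generic examples, not our exact setup):

```
# LSI/Broadcom MegaRAID: address physical disk 0 behind the virtual drive
smartctl -a -d megaraid,0 /dev/sdb

# HP Smart Array: address physical disk 0 behind the controller
smartctl -a -d cciss,0 /dev/sda
```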

Another fun one from the early days was SGI equipment (stay well away from this shit-show vendor), where the 100+ disk JBOD chassis had a 5-minute chassis-intrusion shutdown when the main lid was off to replace hot-swap disks. Could it be disabled in the BIOS? No. Did the vendor provide a fix? Nope. After a number of incidents we had to literally tape the chassis intrusion switches down whenever anyone was working on the chassis. The incidents were that multiple disks were being replaced, the 5-minute window expired, and the SGI unit force powered down... can you imagine what that does in Ceph. Was this documented anywhere in the official docs for the chassis versions we had? Nope :D


7 minutes ago, Falconevo said:

SGI equipment (stay well away from this shit-show vendor), where the 100+ disk JBOD chassis had a 5-minute chassis-intrusion shutdown when the main lid was off to replace hot-swap disks.

Too late xD. Not my problem though and what we have is old.


1 minute ago, leadeater said:

Too late xD. Not my problem though and what we have is old.

Hahaha, I think in the early days of Ceph they were the only real 'mass storage' JBOD vendor, and their marketing was on point for it.

 

We wanted to throw them into the canal; absolute shower of shit, causing outages all the time.

