
Linus' latest Stornado video

In reference to the latest video: 

I would be really curious to see how this system is going to perform.

 

It's a bit of a pity that the SSDs are connected to a PCIe 3.0 x8 LSI 9305-16i, which means each card only has about 64 Gbps of total interface bandwidth, and each SAS port is rated at SAS 12 Gbps, meaning that 5 SSDs should, theoretically, come close to saturating the HBA's interface bandwidth already.
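
A quick back-of-the-envelope sketch of that saturation math (a rough estimate only, assuming ~64 Gbps usable on a PCIe 3.0 x8 slot and drives running at full line rate):

```python
# Back-of-the-envelope HBA saturation math (rough figures, ignoring protocol
# and encoding overhead). Assumes ~64 Gbps usable on a PCIe 3.0 x8 slot.

PCIE_3_X8_GBPS = 64     # approx. host-interface bandwidth of the 9305-16i
SAS3_PORT_GBPS = 12     # SAS 12 Gbps per lane
SATA3_PORT_GBPS = 6     # SATA 6 Gbps (what the IronWolf SSDs actually negotiate)

def drives_to_saturate(host_gbps: float, per_drive_gbps: float) -> float:
    """Number of drives at full line rate needed to fill the HBA's host link."""
    return host_gbps / per_drive_gbps

print(drives_to_saturate(PCIE_3_X8_GBPS, SAS3_PORT_GBPS))   # ~5.3 drives at 12 Gbps
print(drives_to_saturate(PCIE_3_X8_GBPS, SATA3_PORT_GBPS))  # ~10.7 drives at 6 Gbps
```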

 

At a build price of close to $40k, I would have expected more/better.

 



You're forgetting that the SSDs are SATA III, not SAS, so the SAS3 bandwidth for each SSD is worthless here. SAS2 would suffice with no bottleneck.

 

I have very mixed feelings about how well this is going to perform, seeing as how they're going to use ZFS. I've worked with ZFS for a number of years now, and although it is no slouch by any means, it doesn't respond to SSDs like you'd think it would compared to something like NTFS.

 

IOPS will be through the roof, but raw throughput may not be what you're expecting. ZFS sacrifices performance for data integrity. They ought to test a few other options like Btrfs or ReFS just to see how it compares.

 

Network-performance-wise, I expect to see some very surprising results, whether they be good or bad. Which it'll be I can't say, but from my experience it wouldn't surprise me if it's only as fast as your average 6- or 8-drive mechanical pool when writing to it over the network.


You know he's made other SSD builds before to get the highest possible speeds, right?

Maybe go watch those videos.

This server is not intended to be the fastest storage server in the world or anything close to that.

It's just meant to be faster than hard drives.


 


37 minutes ago, alpha754293 said:

At a build price of close to $40k, I would have expected more/better.

You've got to realize that this is the budget way of doing storage. If you want fast SSD storage, you get something like a Dell R7415, where you have 24x 2.5" NVMe SSD bays.

 

Performance on this will be much better than spinning drives, and making it faster here really won't matter, as the video is only a certain bitrate.


6 hours ago, alpha754293 said:

It's a bit of a pity that the SSDs are connected to a PCIe 3.0 x8 LSI 9305-16i, which means each card only has about 64 Gbps of total interface bandwidth, and each SAS port is rated at SAS 12 Gbps, meaning that 5 SSDs should, theoretically, come close to saturating the HBA's interface bandwidth already.

Sequential performance is pretty irrelevant anyway; the only task that will do that is copying the footage from the ingest workstation, and that's limited to 1 Gb/s, which is not likely to be achieved over ZFS + SMB anyway.

 

4K and 64K 70/30 read/write performance is more important, and SSDs are great for that; with those smaller block sizes, saturating network links and PCIe subsystems is a non-issue.


The point is IOPs.

 

Multiple editors need the I/O responsiveness. The speed per editor will be limited to 10Gb anyway.


 


On 6/11/2019 at 6:15 PM, Windows7ge said:

You're forgetting that the SSDs are SATA III, not SAS, so the SAS3 bandwidth for each SSD is worthless here. SAS2 would suffice with no bottleneck.

 

I have very mixed feelings about how well this is going to perform, seeing as how they're going to use ZFS. I've worked with ZFS for a number of years now, and although it is no slouch by any means, it doesn't respond to SSDs like you'd think it would compared to something like NTFS.

 

IOPS will be through the roof, but raw throughput may not be what you're expecting. ZFS sacrifices performance for data integrity. They ought to test a few other options like Btrfs or ReFS just to see how it compares.

 

Network-performance-wise, I expect to see some very surprising results, whether they be good or bad. Which it'll be I can't say, but from my experience it wouldn't surprise me if it's only as fast as your average 6- or 8-drive mechanical pool when writing to it over the network.

SAS 6 Gbps and SATA 6 Gbps would have the same bandwidth.

 

Even with SATA 6 Gbps, you'd be able to saturate the HBA's PCIe 3.0 x8 interface (64 Gbps) with 11 drives out of the 16 that the HBA can host/support.

 

I thought that ZFS had an SSD caching mode?

 

Btrfs, according to Wendell, isn't ready for production deployment (which this would be).



On 6/11/2019 at 6:30 PM, Enderman said:

You know he's made other SSD builds before to get the highest possible speeds, right?

Maybe go watch those videos.

This server is not intended to be the fastest storage server in the world or anything close to that.

It's just meant to be faster than hard drives.

But not for a multi-user, production (literally) environment.

 

It's one thing to have one stream run at 10 Gbps. It's another thing entirely when you have your production crew/staff trying to pull concurrent streams from it. At last count, he had somewhere between 4 and 8 editors, which means that ideally he should be using 100 GbE if that's what he really wants to do; otherwise, even this is going to struggle.

 

The other thing about SSDs is that as you constantly read, write, and delete data on them, you pretty much have to keep re-trimming them to keep performance up; otherwise, you can actually get an SSD into a state (with a LOT of use) where it performs WORSE than mechanical hard drives due to this fact alone. The long-standing rule for ZFS was that you don't bother running fsck on it to defrag the zpool, because ZFS manages that inherently.

To the best of my knowledge, that doesn't extend to TRIM: you still have to constantly re-trim the SSDs so that cells that were previously marked "full" get properly flushed and marked as empty.



On 6/12/2019 at 12:02 AM, leadeater said:

Sequential performance is pretty irrelevant anyway; the only task that will do that is copying the footage from the ingest workstation, and that's limited to 1 Gb/s, which is not likely to be achieved over ZFS + SMB anyway.

 

4K and 64K 70/30 read/write performance is more important, and SSDs are great for that; with those smaller block sizes, saturating network links and PCIe subsystems is a non-issue.

I don't know. If they're trying to edit directly off of it in Final Cut Pro (and/or it's their Premiere batch renderer), I would think the feed rate would be the number of streams * the number of concurrent users * the number of concurrent sessions (if there's more than one session per user). It's a weird "world" of randomly sequential I/O: the streams themselves, unless there's a lot of trimming waiting to happen, are quite sequential, but in a multi-user, concurrent editing environment the server is requesting stream data from all over the place because it's pulling in multiple streams at once, which gives it a bit more of a random-access I/O profile, though not quite as random as truly random.
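
For what it's worth, here's a rough sketch of that feed-rate estimate; the per-stream bitrate, editor count, and session count below are hypothetical placeholders, not figures from the video:

```python
# Rough aggregate read-rate estimate for concurrent editing. All inputs are
# hypothetical placeholders, not figures from the video.

def aggregate_gbps(streams_per_session: int, concurrent_users: int,
                   sessions_per_user: int, per_stream_mbps: float) -> float:
    """Total pull rate in Gbps = streams * users * sessions * per-stream bitrate."""
    return streams_per_session * concurrent_users * sessions_per_user * per_stream_mbps / 1000

# Example: 4 editors, 1 session each, 3 timeline streams of ~400 Mbps 4K footage.
demand = aggregate_gbps(streams_per_session=3, concurrent_users=4,
                        sessions_per_user=1, per_stream_mbps=400)
print(f"~{demand:.1f} Gbps aggregate demand vs. ~20 Gbps for dual 10 GbE")  # ~4.8 Gbps
```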

 

And it's too bad that IOPs can't be translated into an interface transfer bit rate.



On 6/12/2019 at 7:26 AM, unijab said:

The point is IOPs.

 

Multiple editors need the I/O responsiveness. The speed per editor will be limited to 10Gb anyway.

If I recall, the server itself only has like a dual 10 GbE connection to it anyways, correct? Something like that? i.e. NOT 100 GbE which they probably could have done with the host server.



2 minutes ago, alpha754293 said:

SAS 6 Gbps and SATA 6 Gbps would have the same bandwidth.

 

Even with SATA 6 Gbps, you'd be able to saturate the HBA's PCIe 3.0 x8 interface (64 Gbps) with 11 drives out of the 16 that the HBA can host/support.

 

I thought that ZFS had an SSD caching mode?

 

Btrfs, according to Wendell, isn't ready for production deployment (which this would be).

Yes. Yes it would. It's called SAS2, so I don't see the point you're making. Each SSD has its own dedicated cable; it's not being broken out by any backplane, so the SAS3 link is a waste.

 

This is entirely dependent on the software. ZFS is not a performance-oriented file system, so I don't expect they'll see anywhere near that.

 

SSD caching on ZFS (ZIL/L2ARC) only works to accelerate synchronous operations. As a file server using SMB, NFS, SSH, or SFTP, most if not all operations are asynchronous, making SSD caching worthless. Not to mention there's no point in having an SSD cache on an SSD pool.

 

That still leaves ReFS and a plethora of other file system options. They could have used two 8-port RAID cards and striped them in software (not recommended when using ZFS); there are many options they could have considered besides ZFS if max performance were the goal.


34 minutes ago, alpha754293 said:

server itself only has like a dual 10 GbE connection

The Jellyfish server they are competing against uses one 10 GbE connection per editor.

I think the max number of editors they are shooting for on a non-NVMe server is 4.


 


51 minutes ago, Windows7ge said:

ZFS is not a performance-oriented file system, so I don't expect they'll see anywhere near that

You don't expect they'll hit what level of performance with ZFS?


 


1 minute ago, unijab said:

You don't expect they'll hit what level of performance with ZFS?

Even if they used a 100 Gbit NIC (a large enough network pipe), I don't expect they'll saturate the PCIe 3.0 x8 connection the HBA has. I expect ludicrous IOPS, but from my own experience ZFS doesn't respond to SSDs the way SSDs respond to other file systems like NTFS.


45 minutes ago, Windows7ge said:

Yes. Yes it would. It's called SAS2, so I don't see the point you're making. Each SSD has its own dedicated cable; it's not being broken out by any backplane, so the SAS3 link is a waste.

 

This is entirely dependent on the software. ZFS is not a performance-oriented file system, so I don't expect they'll see anywhere near that.

 

SSD caching on ZFS (ZIL/L2ARC) only works to accelerate synchronous operations. As a file server using SMB, NFS, SSH, or SFTP, most if not all operations are asynchronous, making SSD caching worthless. Not to mention there's no point in having an SSD cache on an SSD pool.

 

That still leaves ReFS and a plethora of other file system options. They could have used two 8-port RAID cards and striped them in software (not recommended when using ZFS); there are many options they could have considered besides ZFS if max performance were the goal.

Actually, it's backwards.

 

Serial Attached SCSI, following the convention of the parallel SCSI it was built on top of, has ALWAYS been named after its interface throughput capacity, starting with Ultra160 SCSI.

(Prior to that, it was named Wide SCSI, Ultra Wide SCSI, etc.)

Ultra320 SCSI was the last SCSI iteration that had widespread adoption even though they were already working on the Ultra640 SCSI spec by that point.

As such, SAS 3 Gbps was the first generation of the Serial Attached SCSI protocol originally developed by the SCSI Trade Association, and it was only labeled "the first generation of SAS" after the fact.

Conversely, SATA 1.5 Gbps (SATA 1.0) was named the other way (generation, then interface speed).

This is why you get a mix of the two.

(https://www.tvtechnology.com/opinions/reaching-for-24g-storage)

You will see this in the press release about 24G SAS as well (http://www.scsita.org/content/library/24g-sas-data-storage-specification-development-complete-scsi-trade-association-spotlights-technology-at-2017-flash-memory-summit/): note again that they list the interface speed first, and then state that it is "comprised of SAS-4 and SPL-4...".

 

"Each SSD has it's own dedicated cable it's not being broken by any backplane so the SAS3 link is a waste."

I'm not sure what this is in reference to.

 

You are correct that the Seagate IronWolf SSDs are only SATA 6 Gbps, so they won't be able to take full advantage of the SAS 12 Gbps link that each port on the Broadcom 9305-16i CAN run at, but it still means that if you load up 16 drives per HBA, you're going to exceed the PCIe 3.0 x8 bandwidth rather than the bandwidth per port. It just takes more drives to overload the interface bandwidth (11 SATA/SAS 6 Gbps drives vs. 5 SAS 12 Gbps drives), but the point still remains: you're still overloading the interface bandwidth. (i.e. it's somewhat surprising that Broadcom/LSI would make a card where the total number of ports * bandwidth per port > interface bandwidth, and that they didn't put the card on a PCIe 3.0 x16 connector.)

 

That's my point.

 

Yeah, I'm sure there are. 

 

It is interesting, though, that Lustre's wiki page specifically calls out ext4 and ZFS. (https://en.wikipedia.org/wiki/Lustre_(file_system))

 

Again, I'm sure there are lots of choices/options available out there, but I also figure that if it's good enough for the Top500, it can't possibly be that different/difficult/complicated (given how long the Top500 has been using it), and there should be plenty of support, documentation, and how-tos available by now on how to deploy it on bare-metal systems.

 

Again, though, if I can avoid having to do that, it would make the management of my cluster easier/simpler. But if it comes to it, then I'll have to bite the bullet and deploy it.

 

(And I am looking at this mostly because the InfiniBand stuff works WAYYY better in Linux than in Windows.)



1 minute ago, Windows7ge said:

like SSDs respond to other file systems like NTFS.

first: LMAO

 

second: the point of this build is IOPs not peak throughput.

 

Depending on how they/he configures the vdevs... it could easily affect peak performance (IOPS vs. throughput), but either way, 29 SATA SSDs could easily saturate 64 Gbps.
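
For a rough sense of how the vdev layout trades IOPS against raw throughput, here's a sketch using the common ZFS rule of thumb (random IOPS scale roughly with the number of vdevs, streaming throughput with the number of data disks); the per-drive numbers are generic SATA SSD assumptions, not measurements of this build:

```python
# Rule-of-thumb ZFS pool scaling estimates for 29 SATA SSDs.
# Assumptions (generic SATA SSD figures, not measurements of this build):
# ~500 MB/s streaming and ~80k random-read IOPS per drive; random IOPS scale
# roughly with the number of vdevs, streaming throughput with data disks.

PER_DRIVE_MBS = 500
PER_DRIVE_IOPS = 80_000

def pool_estimate(vdevs: int, data_disks_per_vdev: int) -> tuple:
    """Return (streaming GB/s, random IOPS) for a given vdev layout."""
    throughput_gbs = vdevs * data_disks_per_vdev * PER_DRIVE_MBS / 1000
    iops = vdevs * PER_DRIVE_IOPS
    return throughput_gbs, iops

# e.g. 14 mirror vdevs (28 drives + 1 spare) vs. 4 x 7-wide RAIDZ2 (+ 1 spare)
print("mirrors:", pool_estimate(vdevs=14, data_disks_per_vdev=1))
print("raidz2 :", pool_estimate(vdevs=4, data_disks_per_vdev=5))
# 64 Gbps is roughly 8 GB/s; compare against the HBA's host-link ceiling.
```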


 


5 minutes ago, Windows7ge said:

Even if they used a 100 Gbit NIC (a large enough network pipe), I don't expect they'll saturate the PCIe 3.0 x8 connection the HBA has. I expect ludicrous IOPS, but from my own experience ZFS doesn't respond to SSDs the way SSDs respond to other file systems like NTFS.

I'd buy that.

 

Heck, even ca. 2007, when SAS drives were first a thing, the way ZFS handled file I/O was very kind and did NOT thrash my Sun Microsystems SunFire X4200 with four 73 GB 10k rpm 2.5" SAS 6 Gbps hard drives while it was serving a LAN party of about 40 players. Updates and the like barely made a dent in iostat and top.

 

But again, if I do end up building a system like this, you guys can be sure that I'll be testing the heck out of it.



2 minutes ago, unijab said:

Depending on how they/he configures the vdevs... it could easily affect peak performance (IOPS vs. throughput), but either way, 29 SATA SSDs could easily saturate 64 Gbps.

I don't doubt that; I just don't think it's going to happen on ZFS. I've tried building an SSD pool on ZFS and the results were very underwhelming. I tried striping and striping with mirrors; the performance was nowhere near what the SSDs' cumulative bandwidth was really capable of.


3 minutes ago, unijab said:

first: LMAO

 

second: the point of this build is IOPs not peak throughput.

 

Depending on how they/he configures the vdevs... it could easily affect peak performance (IOPS vs. throughput), but either way, 29 SATA SSDs could easily saturate 64 Gbps.

Well... he does have two Broadcom/LSI 9305-16i cards, so it would be 128 Gbps across the interfaces alone. But again, in the current build as shown in the video, by loading 16 drives on one card and 11 on the other (rather than getting closer to a 50/50 balance between the two HBAs), he's going to saturate one more than the other simply because more drives are connected to it.

 

It'll be interesting to see what they'll be able to get both in terms of IOPs and raw bandwidth, either way.



11 minutes ago, alpha754293 said:

You are correct that the Seagate IronWolf SSDs are only SATA 6 Gbps, so they won't be able to take full advantage of the SAS 12 Gbps link that each port on the Broadcom 9305-16i CAN run at

You validated the point I was trying to get across; that's all that mattered to me.


41 minutes ago, Windows7ge said:

You validated the point I was trying to get across; that's all that mattered to me.

It can still saturate the HBA though (limited by the PCIe 3.0 x8 interface) despite that.



I just hope that Linus plans it well and has Wendell (since he'll be in CA soon for LTX) there to do some performance tweaks. That way nobody will default to blaming Linus for doing it wrong.

 

@LinusTech


 


Albeit with a much smaller array of SSDs, I've gotten exactly the speeds I expected out of my SSDs via ZFS. I have 3 SSDs striped and I'm getting a little over 1400 MB/s.

 

Now, that's raw performance tested locally on the NAS with an empty pool, because once you start mixing in protocols like SMB / iSCSI / NFS and filling up the pool, you introduce other limiting factors (including the network stack). For my homelab, however, I can at least attest to ~800-900 MB/s on my plug-and-play 10 Gb (SFP+) network. I would expect that once you go beyond 10 Gb networking, you'll start seeing more problems with the common network protocols.
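
As a quick sanity check on those numbers, here's a rough conversion to line-rate terms (the figures are from the post above; real-world 10 GbE throughput depends on overhead and tuning):

```python
# Convert the reported figures to line-rate terms for a quick comparison
# against a 10 GbE link (rough conversion; ignores protocol overhead).

def mbytes_to_gbps(mbytes_per_s: float) -> float:
    return mbytes_per_s * 8 / 1000

local_pool = mbytes_to_gbps(1400)   # ~11.2 Gbps raw from 3 striped SSDs
networked = mbytes_to_gbps(850)     # ~6.8 Gbps at the ~800-900 MB/s observed over the network
print(f"local: {local_pool:.1f} Gbps, networked: {networked:.1f} Gbps vs. 10 Gbps line rate")
```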


On 6/20/2019 at 12:21 AM, Mikensan said:

Albeit with a much smaller array of SSDs, I've gotten exactly the speeds I expected out of my SSDs via ZFS. I have 3 SSDs striped and I'm getting a little over 1400 MB/s.

 

Now, that's raw performance tested locally on the NAS with an empty pool, because once you start mixing in protocols like SMB / iSCSI / NFS and filling up the pool, you introduce other limiting factors (including the network stack). For my homelab, however, I can at least attest to ~800-900 MB/s on my plug-and-play 10 Gb (SFP+) network. I would expect that once you go beyond 10 Gb networking, you'll start seeing more problems with the common network protocols.

The access protocol is the killer of all storage systems; things like SMB Direct / iSCSI offload etc. are a godsend here. Without those, getting past 1 GB/s is not easily done, and it leaves huge amounts of storage subsystem performance wasted to achieve it; at least with multi-user access it doesn't go to waste.

