
LTT: This is 50x faster than your PC… HOLY $H!T

jakkuh_t
10 hours ago, Vitamanic said:

This isn't even a clickbait title, it's a straight up lie.

 

"This" is not 50x faster than someone's PC, it's a component that has no use unless it's in a system.

 

I know clickbait is a foregone conclusion now, but please avoid just straight up lying about what the video is in the title.

If you think this was clickbait, then good for you.

A guy asking and answering in his bedroom since he has nothing else to do

What I don't understand is: how does this card get around the issues that the 24-disk all-NVMe storage server (New Whonnock) has? Is it just the on-board PCIe switch being way more efficient than the motherboard equivalent?

Anyone know if they've ever published their SSD benchmarking scripts? I've been looking for something on Linux to bench SSDs, and they've used some in-house scripts based around fio in a few videos now.

fio seems to have a million settings; it would be good to have a ready-to-go script with everything dialled in.

[global]
        log_avg_msec=5
        write_bw_log=bandwidth.results
        write_iops_log=iops.results
        write_lat_log=lat.results


        # bs = block size for the test. Common sizes are 4k, 64k, 128k, 256k, 512k, 1M

        bs=512k

        # iodepth = queue depth. Common depths are 32, 64, 128, 256, 512
        iodepth=32

        # no reason to change direct
        direct=1

        # only change ioengine if you need to; libaio is fine.
        ioengine=libaio

        # do not change randrepeat; it keeps the test valid and makes sure you're not just hitting data held in the m.2's DRAM buffer.
        randrepeat=1

        # reports all jobs into the same output. No need to change unless you think there's a single m.2 that's under-performing.
        group_reporting=1

        # sets the jobs to run as a time based run.
        time_based

        # runtime is how long the test will run in seconds.
        runtime=120

        # filesize sets the size of the test file.
        filesize=10G

        # cpus_allowed sets which CPU threads fio is allowed to use. Comma-separated for specific threads,
        # or a dash for a span of CPUs, i.e. 0,1,2,3 or 0-24
        cpus_allowed=0-24

        # cpus_allowed_policy changes how the threads are locked. This can be changed to shared or split. Split dedicates one thread to one job
        # shared will allow more than one job to run a thread. Tweak for performance.
        cpus_allowed_policy=shared

        # numjobs sets the number of threads to run per job, i.e. if numjobs=2 then each job will run two tests on the m.2. Change the value to see where performance is best.
        numjobs=2

        # do not change invalidate. If set to 0 your test will not be valid.
        invalidate=1

        # ramp_time sets the amount of time to warm up the drives. Longer ramp times will fill the write buffer so you are able to log data for sustained writes only.
        ramp_time=10

        # How to use this job file:
        # Each job is broken down into four values: the job number defined as [job$], the job type defined by rw=$$,
        # the target for the test defined by filename=/dev/nvme$n$, and the name of the test defined by name=$$



        # Remove comment to switch test mode.
        # write = seq write
        # read = seq read
        # rw = seq read and write
        # randread = random reads
        # randwrite = random writes
        # randrw = random reads and writes
        # trimwrite = before writing the block will be trimmed and then written. Good for testing file system performance.

        # Remove jobs and change "filename" to suit your target(s).
        # This test is currently set up for 4 NVMe drives. If you want to test 1 drive, remove or comment out jobs 2, 3 and 4.
        # If you want to perform multiple tests on one drive, change the filenames as required.
        # If you want to test more drives, add more jobs.

        [job1]


        rw=read
        # rw=write
        # rw=rw
        # rw=randread
        # rw=randwrite
        # rw=randrw
        # rw=trimwrite

        filename=/dev/nvme0n1
        name=raw-test

        [job2]

        rw=read
        # rw=write
        # rw=rw
        # rw=randread
        # rw=randwrite
        # rw=randrw
        # rw=trimwrite

        filename=/dev/nvme1n1
        name=raw-test

        [job3]

        rw=read
        # rw=write
        # rw=rw
        # rw=randread
        # rw=randwrite
        # rw=randrw
        # rw=trimwrite

        filename=/dev/nvme2n1
        name=raw-test

        [job4]

        rw=read
        # rw=write
        # rw=rw
        # rw=randread
        # rw=randwrite
        # rw=randrw
        # rw=trimwrite

        filename=/dev/nvme3n1
        name=raw-test

Here's a job file I wrote that should pretty much take care of anything you need to do in FIO. Just adjust the number of disks you're running against. I only test NVMe, so if you have a SATA drive you'll want to change it to /dev/sd$ or whatever. These are the scripts I run on any of our storage products to get a baseline of performance; it's not the official test, but it gives me a good idea of what to expect.
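
For example (my own illustration, not part of the file above), a SATA variant of job1 would just swap the device path; double-check with lsblk which /dev/sdX is actually your target before running anything destructive:

        [job1]
        rw=read
        filename=/dev/sda
        name=raw-test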

 

Also, remove or comment out the log entries if you don't need them. That would be log_avg_msec=5 and the write_blahblahblah entries. It's good practice to save these job files with a .fio extension, e.g. jobfilename.fio.

 

Also - DO NOT RUN A WRITE TEST on a drive you don't want obliterated.

 

To run the job file

 

sudo fio jobfilename.fio
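
If you'd rather not edit the job file each time, fio's command line can narrow things down too. As far as I know (worth checking against your fio version's man page), --section runs only the named job and --output writes the results to a file, so something like this should bench just the first drive:

sudo fio --section=job1 --output=job1-results.txt jobfilename.fio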

4 hours ago, Woodgnome said:

What I don't understand is: how does this card get around the issues that the 24-disk all-NVMe storage server (New Whonnock) has? Is it just the on-board PCIe switch being way more efficient than the motherboard equivalent?

An example is something like a Dell PERC where it has an x8 connection to the CPU, so you're trying to stuff 24 NVMe drives down an x8 link. I'm not super familiar with the Dell PERC and I'm just going off the listed specifications. I don't think it has a real switch in it, and if it does it's not a beeftastic one. Not knocking Dell; usually that front array isn't going to be used for ultra-high IO, and there will be an expansion box or PCIe AICs in the mix for the real grunt.

 

We are able to fit a ton of data through the pipe by using switching and our firmware to drive the switch. Fundamentally it works just like an Ethernet switch. We can fit a crap ton of data through the PCIe slot by taking advantage of the tiny gaps in each request to and from the drives. So if drive 1 is "idle" for 5 microseconds, then drive 2 is able to communicate during that time. The context switching is insanely fast, so we're able to really saturate the link. I'm not sure if that helps.
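
To put some rough numbers on that oversubscription (my own back-of-envelope, assuming PCIe 3.0 and roughly 3.5 GB/s of sequential reads per drive):

        24 drives x ~3.5 GB/s ≈ 84 GB/s of aggregate drive bandwidth
        PCIe 3.0 x8 link      ≈ 8 GB/s usable

So the drives can supply roughly ten times what an x8 link can carry, which is why keeping the link busy during every drive's idle gaps matters so much.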

 

On 5/31/2020 at 1:54 PM, SkillTim said:

Patriot Viper VP4100 does 800K IOPS read and write. Should have used that bad boy.

I'll take a look into that m.2 :D

On 5/31/2020 at 1:08 PM, Vitamanic said:

That card isn't faster than any PC, that's my point. It's a PCIe card, not a computer. A Commodore 64 is faster than that card in the context they used for the title. 

 

They clickbaited their clickbait, some kind of clickbait inception.

I can assure you, almost any one of the components on this device is more powerful than a MOS 6510 and VIC II, combined.

13 hours ago, xnamkcor said:

I can assure you, almost any one of the components on this device is more powerful than a MOS 6510 and VIC II, combined.

This is true, there is an ARM CPU onboard.

1 minute ago, Liqid_Mark said:

This is true, there is an ARM CPU onboard.

Oh, that is very interesting.

Sorry if this is a stupid question, but what exactly does the ARM CPU do? :D

 

PC: Motherboard: ASUS B550M TUF-Plus, CPU: Ryzen 3 3100, CPU Cooler: Arctic Freezer 34, GPU: GIGABYTE WindForce GTX1650S, RAM: HyperX Fury RGB 2x8GB 3200 CL16, Case: CoolerMaster MB311L ARGB, Boot Drive: 250GB MX500, Game Drive: WD Blue 1TB 7200RPM HDD.

 

Peripherals: GK61 (Optical Gateron Red) with Mistel White/Orange keycaps, Logitech G102 (Purple), BitWit Ensemble Grey Deskpad. 

 

Audio: Logitech G432, Moondrop Starfield, Mic: Razer Siren Mini (White).

 

Phone: Pixel 3a (Purple-ish).

 

Build Log: 

1 hour ago, TofuHaroto said:

Oh, that is very interesting.

Sorry if this is a stupid question, but what exactly does the ARM CPU do? :D

 

I'm gonna guess it does something akin to a "controller".

That or a hardware/data interpreter.

The ARM part is sort of like a controller.

 

 

These are enterprise devices meant for extreme IO and bandwidth needs. If you're going to install it in a gaming rig you can... just make sure your butler wears an ESD safe strap that's grounded. :)

23 hours ago, Liqid_Mark said:

An example is something like a Dell PERC where it has an x8 connection to the CPU, so you're trying to stuff 24 NVMe drives down an x8 link. I'm not super familiar with the Dell PERC and I'm just going off the listed specifications. I don't think it has a real switch in it, and if it does it's not a beeftastic one. Not knocking Dell; usually that front array isn't going to be used for ultra-high IO, and there will be an expansion box or PCIe AICs in the mix for the real grunt.

 

We are able to fit a ton of data through the pipe by using switching and our firmware to drive the switch. Fundamentally it works just like an Ethernet switch. We can fit a crap ton of data through the PCIe slot by taking advantage of the tiny gaps in each request to and from the drives. So if drive 1 is "idle" for 5 microseconds, then drive 2 is able to communicate during that time. The context switching is insanely fast, so we're able to really saturate the link. I'm not sure if that helps.

 

The New Whonnock/all-NVMe server runs PCI Express 3.0 x4 lanes per drive on an AMD EPYC CPU with 128 PCI Express 4.0 lanes, so while I understand what you're getting at, it's not the case here.

 

Personally, I just put together an EPYC server with 6x Intel P4510 disks on an ASUS KRPA-U16 motherboard (6x dedicated PCI Express 4.0 x4 OCuLink drive headers) and I experienced the same underwhelming results as Linus did (luckily I don't have the instability issues, though).

 

So the question stands - you guys are doing something differently, but is it all down to a super efficient PCI Express switch or something else?

1 hour ago, Woodgnome said:

The New Whonnock/all-NVMe server runs PCI Express 3.0 x4 lanes per drive on an AMD EPYC CPU with 128 PCI Express 4.0 lanes, so while I understand what you're getting at, it's not the case here.

 

Personally, I just put together an EPYC server with 6x Intel P4510 disks on an ASUS KRPA-U16 motherboard (6x dedicated PCI Express 4.0 x4 OCuLink drive headers) and I experienced the same underwhelming results as Linus did (luckily I don't have the instability issues, though).

 

So the question stands - you guys are doing something differently, but is it all down to a super efficient PCI Express switch or something else?

Well... it really comes down to Rome. A lot of it has to do with Preferred IO and Infinity Fabric. This Liqid card performs the same on EPYC as it does on TR or Ryzen, as long as you set Preferred IO and turn off ARI.

 

The PCIe switch is ultra efficient and our custom super sauce has a lot to do with how it operates.
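
For anyone who wants to try the Preferred IO route on their own board: the BIOS option (exact menu naming varies by vendor, typically somewhere under the AMD CBS/NBIO settings) expects the PCIe bus of the device you want prioritised. On Linux, something like the line below lists devices with their bus numbers; the grep pattern is only an example, so match on whatever your card actually shows up as:

lspci -nn | grep -i nvme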

On 6/2/2020 at 11:56 PM, Liqid_Mark said:

Well... it really comes down to Rome. A lot of it has to do with Preferred IO and Infinity Fabric. This Liqid card performs the same on EPYC as it does on TR or Ryzen, as long as you set Preferred IO and turn off ARI.

 

The PCIe switch is ultra efficient and our custom super sauce has a lot to do with how it operates.

So basically you could say the multiple-NVMe-drive setup suffers because you can't set Preferred IO for all of them (unlike the Honey Badger)?

2 hours ago, Woodgnome said:

So basically you could say the multiple-NVMe-drive setup suffers because you can't set Preferred IO for all of them (unlike the Honey Badger)?

Correct. Preferred IO targets a single bus, and everything on that bus benefits. If you have two badgers on a TR or Rome motherboard, only one will perform at full speed. On Ryzen we don't have enough PCIe lanes to cover both badgers.

Also, update: MSRP of the LQD4500 7.6TB is somewhere around $5,600. Seems stupid expensive, but it is an enterprise card. Remember that :)

Hello everybody!

 

I have a small question related to the video, which is not directly tech related... At 11:35 there is music playing in the background. Does anyone know the title? Sorry if this is not the correct place to ask, and for my bad English :/

 

Thanks in advance!
