nbritton

Member
  • Posts: 4
  • Joined
  • Last visited

Reputation Activity

  1. Informative
    nbritton got a reaction from EtaCarinae in This Server Deployment was HORRIBLE   
    I would recommend trying a low-latency (1,000 Hz) kernel. The default generic kernel in Ubuntu runs at 250 Hz. From a raw throughput standpoint, 250 or even 100 Hz is better because the system takes fewer timer interruptions, but 1,000 Hz can offer better stability because it gives the scheduler more opportunities to check in on the needs of the system as a whole. I'd also recommend enabling x2APIC mode and MSI-X; these give the system more interrupt vectors and more efficient interrupt delivery.
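     
    As a rough sketch of how you might check and change this on Ubuntu (the lowlatency package name and config file path are my assumptions, so verify them against your release):
     
    # Timer frequency the running kernel was built with:
    grep 'CONFIG_HZ' /boot/config-$(uname -r)
    # Ubuntu's 1,000 Hz build:
    sudo apt install linux-lowlatency
    # After a reboot, confirm x2APIC is active and the NVMe driver is using MSI-X:
    dmesg | grep -i x2apic
    lspci -vv | grep -i 'msi-x'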
     
    I strongly recommend ZFS (for my work I've done extensive comparison testing against Ext4 and XFS), but I must urge you to use the latest 0.8.x branch because it's way faster; in some cases it offers 2x the performance of the 0.7 branch. It is available natively in Ubuntu 19.10, and via PPA in Ubuntu 16.04 and 18.04: https://launchpad.net/~jonathonf/+archive/ubuntu/zfs
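     
    Installing from that PPA looks roughly like this (the exact package names are my assumption; check what the PPA publishes for your release):
     
    sudo add-apt-repository ppa:jonathonf/zfs
    sudo apt update
    sudo apt install zfs-dkms zfsutils-linux
    # Confirm the loaded module is from the 0.8.x branch:
    modinfo zfs | grep -iw version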
     
    In the video you said nothing about NUMA domains; you are most assuredly hitting inter-NUMA transfer bandwidth limitations. You need to ensure everything is pinned to the same NUMA domain that the NVMe devices are attached to. You may be better off segregating the entire storage subsystem to a single NUMA domain.
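     
    A quick way to see which node each drive sits on, and to pin a workload to it (the node number and <command> below are placeholders):
     
    # NUMA node for each NVMe controller:
    cat /sys/class/nvme/nvme*/device/numa_node
    # Run a process with its CPUs and memory bound to that node (node 0 as an example):
    numactl --cpunodebind=0 --membind=0 <command>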
     
    From a ZFS tuning perspective, a 128 KB record size offers the best overall performance for a mixed workload; however, 64 KB is a really good choice too if you work with small files and random access patterns. The qcow2 format uses 64 KB as its default cluster size, so that is also a good choice if the primary use case is VM storage. For 24 NVMe drives I would recommend 4x 6-disk raidz (or raidz2) vdevs; this will be nearly as fast as striped mirrors but offers a lot more usable capacity. Finding the right ashift value is not straightforward; it's best to simply performance-test each value (e.g. 9, 12, 13, etc.). Use atime=off.
     
    zpool create -O recordsize=64k -O compression=on -O atime=off -o ashift=9 data
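     
    Spelled out for the 4x 6-disk raidz layout, it would look something like this (the device names are placeholders; substitute your own, ideally /dev/disk/by-id paths):
     
    zpool create -O recordsize=64k -O compression=on -O atime=off -o ashift=9 data \
      raidz nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1 \
      raidz nvme6n1 nvme7n1 nvme8n1 nvme9n1 nvme10n1 nvme11n1 \
      raidz nvme12n1 nvme13n1 nvme14n1 nvme15n1 nvme16n1 nvme17n1 \
      raidz nvme18n1 nvme19n1 nvme20n1 nvme21n1 nvme22n1 nvme23n1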
     
    For benchmarking I recommend the following:
     
    fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=test --bsrange=64k-192k --numjobs=16 --group_reporting=1 --random_distribution=zipf:0.5 --norandommap=1 --iodepth=24 --size=32G --rwmixread=50 --time_based=1 --runtime=90 --readwrite=randrw
  2. Informative
    nbritton got a reaction from Ben17 in This Server Deployment was HORRIBLE   
  3. Informative
    nbritton got a reaction from Ben17 in This Server Deployment was HORRIBLE   
    Mellanox ConnectX cards rarely meet expectations with regard to performance. The ASIC is gimped, and Mellanox overstates the cards' capabilities, I suspect because they know very few people have the skill set to performance-test high-speed networking equipment correctly, and fewer still have enough confidence in their benchmarking to call them on their bs. I've personally seen the performance problems in their ConnectX-2, ConnectX-3, and ConnectX-5 chips. We have ConnectX-5 Ex 100 GbE cards in the lab right now that can barely hit 40 GbE in a Xeon Platinum 8160 system. Without the card, the same system can push 2,000 Gb/s over loopback, using network namespaces to isolate each side of the connection. If you ever run across their dual-port cards, assume only one port is really active: the second port is basically a virtual function hardwired to a second physical connector, and if both ports are active at the same time the card behaves like a half-duplex device.
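     
    For reference, that kind of loopback test can be reproduced with two network namespaces joined by a veth pair and iperf3 (run as root; the names and addresses below are arbitrary, and iperf3 is assumed to be installed):
     
    ip netns add ns_a
    ip netns add ns_b
    ip link add veth_a type veth peer name veth_b
    ip link set veth_a netns ns_a
    ip link set veth_b netns ns_b
    ip netns exec ns_a ip addr add 10.0.0.1/24 dev veth_a
    ip netns exec ns_b ip addr add 10.0.0.2/24 dev veth_b
    ip netns exec ns_a ip link set veth_a up
    ip netns exec ns_b ip link set veth_b up
    # Server in one namespace, multi-stream client in the other:
    ip netns exec ns_a iperf3 -s &
    ip netns exec ns_b iperf3 -c 10.0.0.1 -P 8 -t 30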
  4. Agree
    nbritton got a reaction from Brad Man in This Server Deployment was HORRIBLE   
  5. Like
    nbritton got a reaction from TechyBen in This Server Deployment was HORRIBLE   
  6. Agree
    nbritton got a reaction from Airbornchaos in Suggestions for our iMac Pro repair   
    This one is easy.
     
    Step 1: Buy a new iMac Pro that is identical to the one you have.
    Step 2: Swap the display from the new iMac to the old iMac.
    Step 3: Return the new iMac with the broken screen.
     
     