
NVIDIA REFUSED To Send Us This

jakkuh_t

Correction: the SXM4 version of this card does have an IHS on it. [image: SXM4 card showing the IHS]

Main System: 2 x Intel Xeon Platinum 8268, 384GB DDR4 2933 ECC, 2 x NVIDIA 2080 Ti FE, 2 x Samsung Enterprise 3.2TB NVMe PCIe Gen 3 x8 SSD, custom water cooling.


3 hours ago, Minionflo said:

Can you post the command to recreate the ResNet-50 benchmark?

The command was:
 

wget https://github.com/tensorflow/benchmarks/archive/master.zip && unzip master.zip && cd benchmarks-master/scripts/tf_cnn_benchmarks && \
python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=512 --num_batches=100 --model=resnet50 \
  --optimizer=momentum --variable_update=replicated --all_reduce_spec=nccl --nodistortions \
  --gradient_repacking=2 --datasets_use_prefetch=True --per_gpu_thread_count=2 \
  --loss_type_to_report=base_loss --compute_lr_on_cpu=True --single_l2_loss_op=True \
  --xla_compile=True --local_parameter_device=gpu --num_gpus=1 --display_every=10 --use_fp16


Note that, per a forum thread I found here: https://forum.level1techs.com/t/testing-resnet50-performance-nvidia-docker-ubuntu/145182, it would appear that the

--xla_compile=True

flag passed in the above command is specifically Intel-only. For AMD, this should be changed to:

--xla_compile=False


I didn't change this, however, and it only appears to cause warnings, but I can't say it didn't impact my performance in the benchmark (although, as I note below, I appeared to be power limited at a lower wattage than the MSI card used in the video, which was expected).
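For reference, this is what the AMD-CPU variant of the invocation would presumably look like, i.e. the same command with only that one flag flipped (untested on my end, so treat it as a sketch):

python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=512 --num_batches=100 --model=resnet50 \
  --optimizer=momentum --variable_update=replicated --all_reduce_spec=nccl --nodistortions \
  --gradient_repacking=2 --datasets_use_prefetch=True --per_gpu_thread_count=2 \
  --loss_type_to_report=base_loss --compute_lr_on_cpu=True --single_l2_loss_op=True \
  --xla_compile=False --local_parameter_device=gpu --num_gpus=1 --display_every=10 --use_fp16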

 

Also, I don't know exactly what container they were actually running it in, but the latest container seems to be tensorflow:22.01-tf1-py3.
Be aware that the container is ~12 gigabytes.

Run with:

docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:22.01-tf1-py3
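Putting the pieces together, a rough end-to-end sequence would look something like this (my own sketch, assuming the NGC container above has wget and unzip available; the nvidia-smi call is just a sanity check that the GPU is visible inside the container):

docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:22.01-tf1-py3
# then, inside the container:
nvidia-smi
wget https://github.com/tensorflow/benchmarks/archive/master.zip && unzip master.zip
cd benchmarks-master/scripts/tf_cnn_benchmarks
# finally, run the full tf_cnn_benchmarks.py command from earlier in this post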



I ran this using WSL2 Ubuntu and Docker on Windows 11, and I was able to generally reproduce the video's results for an RTX 3090 with the above command and container, obtaining a result of total images/sec: 1378.18 on my PNY 3090, which was power limited at 360W.
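(If you want to compare power limits on your own card before reading too much into the numbers, a generic nvidia-smi query like the following should show them; this is standard nvidia-smi, nothing specific to this benchmark:)

nvidia-smi --query-gpu=name,power.limit,power.max_limit --format=csv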


I feel like the capability of my NVIDIA Tesla T4 is more flexible.

I wonder if it will show up under Task Manager?


So, price tag aside, this would be an efficient card to get for GPU-intensive tasks, especially in a server chassis where multiple PSUs can be installed to power 8 to 10 of these things. Even in a workstation tower arrangement you could stack 3 to 4 of these, depending on your mobo, along with an A6000 to do the video-related stuff - that's impressive.

 

What's NOT so impressive is the price, and the only guy to blame for that is Jensen. Even when you factor in the 80GB of HBM2, there is no way in hell the BOM for this should exceed four digits. You are saving nothing in efficiency whatsoever spending $10-14K on this card versus the inflated $3K on a 3090. In fact, a water-cooled 3090 would more than likely outlast one of these when used 24/7. Obviously, for those buying HUNDREDS, it won't matter if a card costing 10 grand dies under that 5-year extended service plan and they can get a replacement for free - the average Joe won't be throwing that kind of money at just a GPU every 5 years.

 

Server installs also benefit from renewable energy infrastructure, so things like massive solar arrays can be deployed to help bring down that power cost further. Never mind the subsidies companies can get for doing green energy installs, much of which just doesn't apply to a residential setup. You would need more than just a handful of panels on your roof, and many years before your wallet sees any difference.

 

Workstation users MAY be able to afford it, though, considering the price of a high-end system ($50-60K) and the kind of money it can bring in for its users over that same 5-year period. Once the warranty period is up, the cards can continue to be used, either until they die or in a secondary system, extending that ROI. Reselling can be a smart option at that point as well, especially if the card is still in demand because it performs well, or some guy just needs a spare for cheap.


4 hours ago, Luscious said:

So, price tag aside, this would be an efficient card to get for GPU-intensive tasks, especially in a server chassis where multiple PSUs can be installed to power 8 to 10 of these things. Even in a workstation tower arrangement you could stack 3 to 4 of these, depending on your mobo, along with an A6000 to do the video-related stuff - that's impressive.

 

What's NOT so impressive is the price, and the only guy to blame for that is Jensen. Even when you factor in the 80GB of HBM2, there is no way in hell the BOM for this should exceed four digits. You are saving nothing in efficiency whatsoever spending $10-14K on this card versus the inflated $3K on a 3090. In fact, a water-cooled 3090 would more than likely outlast one of these when used 24/7. Obviously, for those buying HUNDREDS, it won't matter if a card costing 10 grand dies under that 5-year extended service plan and they can get a replacement for free - the average Joe won't be throwing that kind of money at just a GPU every 5 years.

 

Server installs also benefit from renewable energy infrastructure, so things like massive solar arrays can be deployed to help bring down that power cost further. Never mind the subsidies companies can get for doing green energy installs, much of which just doesn't apply to a residential setup. You would need more than just a handful of panels on your roof, and many years before your wallet sees any difference.

 

Workstation users MAY be able to afford it, though, considering the price of a high-end system ($50-60K) and the kind of money it can bring in for its users over that same 5-year period. Once the warranty period is up, the cards can continue to be used, either until they die or in a secondary system, extending that ROI. Reselling can be a smart option at that point as well, especially if the card is still in demand because it performs well, or some guy just needs a spare for cheap.

From the research I did, if you need something that's easy to install and you're willing to wait a bit longer in exchange for saving a ton of power, I would look at:

NVIDIA Tesla T4 (at 70W) or NVIDIA A2 (at 45W), and only get higher-end GPUs if you need the VRAM.


I think it was interesting to see consumer GPU performance in the data science space. As such, I feel like it would be a valuable benchmark to add to GPU reviews when you guys make them... though I have a side interest in data science/machine learning, so I couldn't say whether the greater LTT community feels the same way.


3 hours ago, T8z5h3 said:

From the research I did, if you need something that's easy to install and you're willing to wait a bit longer in exchange for saving a ton of power, I would look at:

NVIDIA Tesla T4 (at 70W) or NVIDIA A2 (at 45W), and only get higher-end GPUs if you need the VRAM.

That T4 is a quirky little card with its HH/HL form factor, LOL. Not sure which one (Supermicro or Tyan) actually had a server SKU that crammed TWENTY of those into a single 4U chassis, all forced-air cooled. The motherboard they used for it was just as insane, with a completely proprietary daughterboard that had 20 PCIe x8 slots!!! Made for a nice 1400W space heater, not including the dual Xeons and everything else in there.

 

FWIW, I would like to see LTT now cover the AMD side of things, specifically the Instinct MI100 card. It's priced about the same as the outgoing 40GB A100 shown here. How does the MI100 compare performance-wise to NVIDIA, and how does it measure up when it comes to efficiency - questions that I'm sure many people interested in it would want to ask.


Cross-posting from Reddit, as I feel this is an important consideration.

 

The A100 is even faster than you showed for deep learning tasks. 

 

A major selling point of the A100 is its larger memory, which is not taken advantage of in the video. It _appears_ as if you are using all the memory on both cards, but that is simply TensorFlow allocating the whole GPU without actually using all of it. That means you can drastically increase the batch size in deep learning for the A100. Batch size is one of the first hyperparameters in deep learning to tune for your data<>device combination when you're optimizing your processing efficiency.

 

Batch size is basically "how many items should I push through the neural network at the same time". The more items you push through at the same time, the faster you'll be in the long run, but the more memory you need. Imagine you are moving watermelons from the store to your car with a cart. The larger the cart (memory), the more watermelons you can fit (batch size), and the faster you're done. In this case, you have a small cart (RTX 3090) and a large cart (A100), and you're only filling the A100 with the same number of watermelons as the small cart holds. So you're not making optimal use of the whole cart; you could be done a lot faster if you filled the whole cart (larger batch size).

 

So, in conclusion, the A100 is likely even faster than what you are currently seeing when making full use of the memory and optimising the batch size for each device.
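In terms of the tf_cnn_benchmarks command posted earlier in this thread, that just means raising --batch_size until the card runs into out-of-memory errors; a rough sketch with an illustrative (not tuned) value:

python tf_cnn_benchmarks.py --model=resnet50 --batch_size=1024 --num_batches=100 --use_fp16 \
  --data_format=NCHW --num_gpus=1 --display_every=10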

 

Furthermore, as was noted on Reddit, ResNet-50 is an easy nut to crack in general. With its mere 23 million parameters, it's a drop of rain compared to the ocean of today's models. If you look at the field of natural language processing, we have models with hundreds of billions of parameters (GPT-3) up to trillions (T5-XXL). But I understand that having a ready-made benchmark script to test out the device is more feasible. I would still have loved to see how far you could push the batch sizes in both cases before running into out-of-memory issues, though!


On 2/23/2022 at 7:24 PM, ACastanza said:

I ran this using WSL2 Ubuntu and Docker on Windows 11, and I was able to generally reproduce the video's results for an RTX 3090 with the above command and container, obtaining a result of total images/sec: 1378.18 on my PNY 3090, which was power limited at 360W.

When trying to do such a benchmark on Windows, you're leaving tons of performance on the table; see: https://medium.com/analytics-vidhya/comparing-gpu-performance-for-deep-learning-between-pop-os-ubuntu-and-windows-69aa3973cc1f

 

Even under WSL2 it's still slower, sadly. Here are my results with a 3060 using FP32 on Linux:

[image: 3060 FP32 results on Linux]

 

And here are the results from a 3060 Ti on Windows, from an acquaintance:

[image: 3060 Ti results on Windows]

 

Whereas the 3060 Ti should be around 30% faster than my 3060.

 

Anyway, here are some comparison values from runs I did some time ago on ResNet-50:

[image: ResNet-50 comparison results]

As you can see, the V100 is not that far behind while using a batch size 4x smaller (I didn't try a 256 batch size because there was no point in trying to compare it; I can try it again later), so a higher batch size (2056 maybe?) would be nice to see.

 

23 hours ago, Luscious said:

What's NOT so impressive is the price, and the only guy to blame for that is Jensen. Even when you factor in the 80GB HBM2 there is no way in hell the BOM for this should exceed four digits.

That's just the list price; you can get those way cheaper when you're an actual big company looking to buy many GPUs at once.

 

14 hours ago, Luscious said:

FWIW, I would like to see LTT now cover the AMD side of things, specifically the Instinct MI100 card. It's priced about the same as the outgoing 40GB A100 shown here. How does the MI100 compare performance-wise to NVIDIA, and how does it measure up when it comes to efficiency - questions that I'm sure many people interested in it would want to ask.

Sadly, you can't just run most of the workloads those NVIDIA GPUs are meant to run. Most of the big ML frameworks are built on top of CUDA, and AMD's software stack is severely lacking when it comes to ML.

Now, if you're looking into FP64 workloads, then an AMD GPU is what you're looking for (think physics simulations).

 

13 hours ago, MountainGoatAOE said:

Cross-posting from Reddit, as I feel this is an important consideration.

 

The A100 is even faster than you showed for deep learning tasks. 

 

A major selling point of the A100 is its larger memory, which is not taken advantage of in the video. It _appears_ as if you are using all the memory on both cards, but that is simply TensorFlow allocating the whole GPU without actually using all of it. That means you can drastically increase the batch size in deep learning for the A100. Batch size is one of the first hyperparameters in deep learning to tune for your data<>device combination when you're optimizing your processing efficiency.

 

Batch size is basically "how many items should I push through the neural network at the same time". The more items you push through at the same time, the faster you'll be in the long run, but the more memory you need. Imagine you are moving watermelons from the store to your car with a cart. The larger the cart (memory), the more watermelons you can fit (batch size), and the faster you're done. In this case, you have a small cart (RTX 3090) and a large cart (A100), and you're only filling the A100 with the same number of watermelons as the small cart holds. So you're not making optimal use of the whole cart; you could be done a lot faster if you filled the whole cart (larger batch size).

 

So, in conclusion, the A100 is likely even faster than what you are currently seeing when making full use of the memory and optimising the batch size for each device.

 

Furthermore, as was noted on Reddit, ResNet-50 is an easy nut to crack in general. With its mere 23 million parameters, it's a drop of rain compared to the ocean of today's models. If you look at the field of natural language processing, we have models with hundreds of billions of parameters (GPT-3) up to trillions (T5-XXL). But I understand that having a ready-made benchmark script to test out the device is more feasible. I would still have loved to see how far you could push the batch sizes in both cases before running into out-of-memory issues, though!

Sadly, LTT is mostly a gamer-focused channel, so they're neither really knowledgeable enough to do this kind of stuff, nor would their audience really appreciate it. PugetSystems, ServeTheHome or Level1 on the other hand...

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


On 2/23/2022 at 10:38 PM, danwat1234 said:

On your Facebook page, the video is cut off, like numerous others on your page. Why not do the little bit of work needed to make the aspect ratio correct? It's just a matter of re-encoding once you find the magic settings. Because of ad revenue?

 

https://www.facebook.com/LinusTech/posts/517830633046006

From my experience, Facebook's and Instagram's algorithms tend to prefer taller videos, presumably because they take up the majority of a user's phone screen.

Quote or mention me or I won't be notified of your reply!

Main Rig: R7 3700x New!, EVGA GTX 1060 6GB, ROG STRIX B450-F Gaming New!, Corsair RGB 2x16GB 3200MHz New!, 512GB Crucial P5, 120GB Samsung SSD, 1TB Seagate SSHD, 2TB Barracuda HDD

MacBook Pro 14" (M1 Max, 32GB RAM)

Links: My beautiful sketchy case | My website


13 hours ago, ImAlsoRan said:

From my experience, Facebook's and Instagram's algorithms tend to prefer taller videos, presumably because they take up the majority of a user's phone screen.

I believe that. But I think a larger percentage of the Linus community than of general users would gladly turn their mobile devices to landscape to get the full experience if they could.


On 2/24/2022 at 4:27 AM, Luscious said:

That T4 is a quirky little card with its HH/HL form factor, LOL. Not sure which one (Supermicro or Tyan) actually had a server SKU that crammed TWENTY of those into a single 4U chassis, all forced-air cooled. The motherboard they used for it was just as insane, with a completely proprietary daughterboard that had 20 PCIe x8 slots!!! Made for a nice 1400W space heater, not including the dual Xeons and everything else in there.

 

FWIW, I would like to see LTT now cover the AMD side of things, specifically the Instinct MI100 card. It's priced about the same as the outgoing 40GB A100 shown here. How does the MI100 compare performance-wise to NVIDIA, and how does it measure up when it comes to efficiency - questions that I'm sure many people interested in it would want to ask.

I have an NVIDIA Tesla T4 waiting to go in a Dell Precision 3930 1U workstation with an Intel 9900, 16GB DDR4, a 500GB HDD, dual 550W power supplies and an NVIDIA T400, running Win 10 Pro (likely going to Win 11 Pro). I will be using that card mostly for video encoding and AI upscaling.

 


I hope LTT looks at Instinct server GPUs. Those things are monsters.

MSI X399 SLI Plus | AMD Threadripper 2990WX all-core 3GHz lock | Thermaltake Floe Riing 360 | EVGA 2080, Zotac 2080 | G.Skill Ripjaws 128GB 3000MHz | Corsair RM1200i | 150TB | ASUS TUF Gaming mid tower | 10Gb NIC


Shout out to the only true thermal paste pattern, the rice mark.

 


  • 3 weeks later...

I did some tests myself with an A100 in case anyone is interested:

 

Just now, igormp said:
Got an A100 to try out because I was bored, got some nice numbers, and we can clearly see that this workload is simply too simple for this GPU.


+-------------------+---------------+----------------+----------------+----------------+----------------+---------------+----------------+----------------+----------------+----------------+-----------------+
|    GPU-Imgs/s     | FP32 Batch 64 | FP32 Batch 128 | FP32 Batch 256 | FP32 Batch 384 | FP32 Batch 512 | FP16 Batch 64 | FP16 Batch 128 | FP16 Batch 256 | FP16 Batch 384 | FP16 Batch 512 | FP16 Batch 1024 |
+-------------------+---------------+----------------+----------------+----------------+----------------+---------------+----------------+----------------+----------------+----------------+-----------------+
| 2060 Super        | 172           | NA             |            NA  |            NA  |            NA  | 405           |            444 |            NA  |            NA  |            NA  |            NA   |
| 3060              | 220           | NA             |            NA  |            NA  |            NA  | 475           |            500 |            NA  |            NA  |            NA  |            NA   |
| 3080              | 396           | NA             |            NA  |            NA  |            NA  | 900           |            947 |            NA  |            NA  |            NA  |            NA   |
| V100              | 369           | 394            |            NA  |            NA  |            NA  | 975           |           1117 |            NA  |            NA  |            NA  |            NA   |
| A100              | 766           | 837            |           873  |           865  |           OOM  | 1892          |           2148 |          2379  |          2324  |          2492  |          2362   |
| Radeon VII (ROCm) | 288           | 304            |            NA  |            NA  |            NA  | 393           |            426 |            NA  |            NA  |            NA  |            NA   |
| 6800XT (DirectML) | NA            | 63             |            NA  |            NA  |            NA  | NA            |             52 |            NA  |            NA  |            NA  |            NA   |
+-------------------+---------------+----------------+----------------+----------------+----------------+---------------+----------------+----------------+----------------+----------------+-----------------+

 

Also did an AI-benchmark run:

 

Device Inference Score: 21692
Device Training Score: 23542
Device AI Score: 45234
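For anyone who wants to produce comparable scores on their own card: these numbers are in the output format of the ai-benchmark Python package, so a minimal run would be something along these lines (assuming a working TensorFlow GPU environment, e.g. the NGC container mentioned earlier in the thread):

pip install ai-benchmark
python -c "from ai_benchmark import AIBenchmark; AIBenchmark().run()"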


 

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


  • 4 months later...
On 2/24/2022 at 12:21 AM, jakkuh_t said:

 

What components are used in this build? Can someone provide the details?

I can only make out that the PSU is a Corsair AX1600i, plus an MSI 3090 Suprim and the A100. What about the CPU and mobo?

 

Also, is it mandatory to use the 3090 for display output, or is any regular GPU fine?


  • 10 months later...

@jakkuh_t Did you ever revise that 3D print of the cooler you designed for the A100? I'm thinking about slapping 3 similar form-factor cards in an ATX rig, and cooling is going to be an issue, to say the least.

