
AMD introduces new data center GPU: AMD Instinct MI100

AndreiArgeanu
Solved by AlexGoesHigh

 

Summary

 

AMD has just released its new data center GPU, the Instinct MI100. AMD claims to be the first to surpass the 10 TFLOPS FP64 barrier on a data center GPU. These GPUs also seem to have some kind of equivalent to Nvidia's NVLink. The graphs AMD includes also show the card to be faster than Ampere at a lower power consumption (300 W for AMD vs. 400 W for Nvidia).

 

Quotes

Quote

World’s Fastest HPC GPU1

Delivering up to 11.5 TFLOPs of double precision (FP64) theoretical peak performance, the AMD Instinct™ MI100 accelerator delivers leadership performance for HPC applications and a substantial up-lift in performance over previous gen AMD accelerators. The MI100 delivers up to a 74% generational double precision performance boost for HPC applications.13


AMD Instinct™ MI100 accelerator is the world’s fastest HPC GPU, engineered from the ground up for the new era of computing.1 Powered by the AMD CDNA architecture, the MI100 accelerators deliver a giant leap in compute and interconnect performance, offering a nearly 3.5x the boost for HPC (FP32 matrix) and a nearly 7x boost for AI (FP16) throughput compared to prior generation AMD accelerators.2
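AMD's 74% figure is straightforward to check against the previous generation. A quick sketch, assuming the MI50's 6.6 TFLOPS FP64 datasheet number:

```python
# Sanity check of AMD's claimed generational FP64 uplift.
# Assumed datasheet figures: MI50 = 6.6 TFLOPS FP64, MI100 = 11.5 TFLOPS FP64.
mi50_fp64 = 6.6
mi100_fp64 = 11.5
uplift = (mi100_fp64 - mi50_fp64) / mi50_fp64
print(f"Generational FP64 uplift: {uplift:.0%}")  # prints 74%
```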

 

 

 

My thoughts

Slowly but surely, AMD is starting to take the enterprise market as well with its Epyc CPUs and now the Instinct accelerator cards, and I'm happy to see them succeed. Of course these are enterprise cards and less relevant to an enthusiast, but it's still great to know. I'm also a bit curious about the interconnect the cards use to communicate with one another; it looks quite different from anything else I've seen before. It's also a little interesting that AMD released these server GPUs not long after the Intel Server GPU announcement. Oh, and the video AMD made about the card is pretty cool.


 

Sources

 AMD


That "superior performance for AI & Machine Learning" graph looks very sketchy and misleading.

 

For those interested, the 19.5 TFLOPS figure is the A100's FP32 throughput on its CUDA cores. Run the same workload on the Tensor cores and the A100 reaches 156 TFLOPS, which is more than 3 times as high as the MI100.

So it seems like AMD is crippling Nvidia's card in one of their comparisons.
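For reference, the comparison in numbers. A quick sketch; the peak figures are assumed from the public A100 and MI100 datasheets:

```python
# Theoretical peak throughput, in TFLOPS (assumed datasheet figures).
a100_fp32 = 19.5          # A100, standard FP32 on the CUDA cores
a100_tf32_tensor = 156.0  # A100, TF32 on the Tensor cores (no sparsity)
mi100_fp32_matrix = 46.1  # MI100, "FP32 Matrix"

ratio = a100_tf32_tensor / mi100_fp32_matrix
print(f"A100 Tensor vs MI100 Matrix: {ratio:.1f}x")  # ~3.4x, i.e. more than 3x
```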


1 hour ago, LAwLz said:

-snip-

So it seems like AMD are crippling Nvidia's card in one of their comparisons.

It's called marketing as usual.

PC #1 : Gigabyte Z170XP-SLI | i7-7700 | Cryorig C7 Cu | 32GB DDR4-2400 | LSI SAS 9211-8i | 240GB NVMe M.2 PCIe PNY CS2030 | SSD&HDDs 59.5TB total | Quantum LTO5 HH SAS drive | GC-Alpine Ridge | Corsair HX750i | Cooler Master Stacker STC-T01 | ASUS TUF Gaming VG27AQ 2560x1440 @ 60 Hz (plugged HDMI port, shared with PC #2) | Win10
PC #2 : Gigabyte MW70-3S0 | 2x E5-2689 v4 | 2x Intel BXSTS200C | 32GB DDR4-2400 ECC Reg | MSI RTX 3080 Ti Suprim X | 2x 1TB SSD SATA Samsung 870 EVO | Corsair AX1600i | Lian Li PC-A77 | ASUS TUF Gaming VG27AQ 2560x1440 @ 144 Hz (plugged DP port, shared with PC #1) | Win10
PC #3 : Mini PC Zotac 4K | Celeron N3150 | 8GB DDR3L 1600 | 250GB M.2 SATA WD Blue | Sound Blaster X-Fi Surround 5.1 Pro USB | Samsung Blu-ray writer USB | Genius SP-HF1800A | TV Panasonic TX-40DX600E UltraHD | Win10
PC #4 : ASUS P2B-F | PIII 500MHz | 512MB SDR 100 | Leadtek WinFast GeForce 256 SDR 32MB | 2x Guillemot Maxi Gamer 3D² 8MB in SLI | Creative Sound Blaster AWE64 ISA | 80GB HDD UATA | Fortron/Source FSP235-60GI | Zalman R1 | DELL E151FP 15" TFT 1024x768 | Win98SE

Laptop : Lenovo ThinkPad T460p | i7-6700HQ | 16GB DDR4 2133 | GeForce 940MX | 240GB SSD PNY CS900 | 14" IPS 1920x1080 | Win11

PC tablet : Fujitsu Point 1600 | PMMX 166MHz | 160MB EDO | 20GB HDD UATA | external floppy drive | 10.4" DSTN 800x600 touchscreen | AGFA SnapScan 1212u blue | Win98SE

Laptop collection #1 : IBM ThinkPad 340CSE | 486SLC2 66MHz | 12MB RAM | 360MB IDE | internal floppy drive | 10.4" DSTN 640x480 256 color | Win3.1 with MS-DOS 6.22

Laptop collection #2 : IBM ThinkPad 380E | PMMX 150MHz | 80MB EDO | NeoMagic MagicGraph128XD | 2.1GB IDE | internal floppy drive | internal CD-ROM drive | Intel PRO/100 Mobile PCMCIA | 12.1" FRSTN 800x600 16-bit color | Win98

Laptop collection #3 : Toshiba T2130CS | 486DX4 75MHz | 32MB EDO | 520MB IDE | internal floppy drive | 10.4" STN 640x480 256 color | Win3.1 with MS-DOS 6.22

And 6 others computers (Intel Compute Stick x5-Z8330, Giada Slim N10 WinXP, 2 Apple classic and 2 PC pocket WinCE)


At this point, it doesn't matter how much FP64/FP32 compute AMD can muster anymore. Much of the ML and AI market is already used to CUDA and the various ways Nvidia has come up with to accelerate workloads through its hardware/software stack. It's a decent attempt if you limit the comparison to FP64/FP32, but in any other workload type Nvidia has them beat, and that's exactly where the ML and AI market is moving anyway.


I consulted the Nvidia A100 datasheet: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet.pdf

 

The 156 TFLOPS figure is TF32 without sparsity; with structured sparsity enabled, it reaches 312 TFLOPS. That makes it about 7 times faster than the AMD Instinct MI100 with its "FP32 Matrix".
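Spelling out the arithmetic behind the "about 7 times". A sketch using the assumed datasheet figures (156 TFLOPS TF32 on the A100, doubled by structured sparsity, versus 46.1 TFLOPS "FP32 Matrix" on the MI100):

```python
a100_tf32 = 156.0                  # A100 TF32 Tensor peak, TFLOPS
a100_tf32_sparse = a100_tf32 * 2   # structured sparsity doubles the peak
mi100_fp32_matrix = 46.1           # MI100 "FP32 Matrix" peak, TFLOPS

ratio = a100_tf32_sparse / mi100_fp32_matrix
print(f"{ratio:.1f}x")  # ~6.8x, i.e. "about 7 times"
```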



4 hours ago, LAwLz said:

That "superior performance for AI & Machine Learning" graph looks very sketchy and misleading.

 

For those interested, 19.5 TFLOPs of compute is on the CUDA cores. If they do it on the Tensor cores the A100 gets 156 TFLOPS of performance, which is more than 3 times as high as the MI100.

So it seems like AMD are crippling Nvidia's card in one of their comparisons.

but aren't tensor cores a special use case only? (just asking, i don't know anything about them tbh)

"If a Lobster is a fish because it moves by jumping, then a kangaroo is a bird" - Admiral Paulo de Castro Moreira da Silva

"There is nothing more difficult than fixing something that isn't all the way broken yet." - Author Unknown

Spoiler

Intel Core i7-3960X @ 4.6 GHz - Asus P9X79WS/IPMI - 12GB DDR3-1600 quad-channel - EVGA GTX 1080ti SC - Fractal Design Define R5 - 500GB Crucial MX200 - NH-D15 - Logitech G710+ - Mionix Naos 7000 - Sennheiser PC350 w/Topping VX-1


4 hours ago, LAwLz said:

For those interested, 19.5 TFLOPs of compute is on the CUDA cores. If they do it on the Tensor cores the A100 gets 156 TFLOPS of performance, which is more than 3 times as high as the MI100.

But you don't (and can't) do FP32 on Tensor cores; TensorFloat32 is a very different thing and is not FP32. General compute may be faster on the MI100, but the equivalent accelerated paths are faster on Ampere.
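To make the difference concrete: TF32 keeps FP32's 8-bit exponent but trims the mantissa from 23 bits to 10. A minimal sketch of that precision loss (using truncation for simplicity; the real hardware rounds to nearest):

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 by truncating a float32 mantissa to 10 bits."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits &= ~((1 << 13) - 1)  # drop the 13 low mantissa bits (23 - 10)
    return struct.unpack('<f', struct.pack('<I', bits))[0]

print(to_tf32(1 + 2**-10))  # 1.0009765625 -- still representable in TF32
print(to_tf32(1 + 2**-11))  # 1.0          -- below TF32's mantissa precision
```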


Sounds like they're a bit better if you're using FP32, but not if you're using TF32 or FP16 (which tensor cores also accelerate) which are commonly used in mixed-precision learning.

 

I think the bigger problem will be their lack of CUDA. Every major deep learning framework supports CUDA, but many have limited support for OpenCL. AMD's card could be 50% faster in FP32, but that wouldn't matter as it can't run the code that people want it to.

CPU: i7 4790k, RAM: 16GB DDR3, GPU: GTX 1060 6GB


I'm glad to see them attempting to compete in this space.  Even if the card were objectively better (as I'm sure it will be for at least some calculation types), the big elephant in the room isn't the card, but CUDA.  They need an answer to that, which is easy to widely adopt, if they want to actually compete heavily in the compute space.


12 hours ago, thechinchinsong said:

At this point, it doesn't matter how much FP64/FP32 compute AMD can muster anymore. Much of the ML and AI market is already used to CUDA and the various ways Nvidia has come up with to accelerate workloads through its hardware/software stack. It's a decent attempt if you limit the comparison to FP64/FP32, but in any other workload type Nvidia has them beat, and that's exactly where the ML and AI market is moving anyway.

 

8 hours ago, tim0901 said:

Sounds like they're a bit better if you're using FP32, but not if you're using TF32 or FP16 (which tensor cores also accelerate) which are commonly used in mixed-precision learning.

 

I think the bigger problem will be their lack of CUDA. Every major deep learning framework supports CUDA, but many have limited support for OpenCL. AMD's card could be 50% faster in FP32, but that wouldn't matter as it can't run the code that people want it to.

 

3 hours ago, justpoet said:

I'm glad to see them attempting to compete in this space.  Even if the card were objectively better (as I'm sure it will be for at least some calculation types), the big elephant in the room isn't the card, but CUDA.  They need an answer to that, which is easy to widely adopt, if they want to actually compete heavily in the compute space.

They have a CUDA competitor/equivalent called ROCm, and they're launching v4.0 alongside this GPU. Part of it is meant to ease the transition/port from CUDA.

 


 

More details from the phoronix article: https://www.phoronix.com/scan.php?page=article&item=amd-mi100-rocm4&num=1
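To give an idea of the porting story: HIP mirrors the CUDA runtime API almost name-for-name, and AMD's hipify tools perform a largely mechanical translation. A toy sketch of that idea (the mapping here is a tiny illustrative subset; the real tools are far more thorough):

```python
import re

# A few real CUDA -> HIP runtime API renames (small illustrative subset).
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(src: str) -> str:
    """Mechanically rename CUDA runtime calls to their HIP equivalents."""
    pattern = re.compile("|".join(re.escape(k) for k in CUDA_TO_HIP))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], src)

print(toy_hipify("cudaMalloc(&x, n); cudaDeviceSynchronize();"))
# hipMalloc(&x, n); hipDeviceSynchronize();
```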

 

this is one of the greatest thing that has happened to me recently, and it happened on this forum, those involved have my eternal gratitude http://linustechtips.com/main/topic/198850-update-alex-got-his-moto-g2-lets-get-a-moto-g-for-alexgoeshigh-unofficial/ :')

i use to have the second best link in the world here, but it died ;_; its a 404 now but it will always be here

 


I really like that faceplate, it looks so sleek. And yeah, the separate design for these CDNA-based ones is a good move.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lanccool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver)Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |


9 hours ago, AlexGoesHigh said:

 

 

They have a CUDA competitor/equivalent, it's called ROCm and they are launching v4.0 with this GPU and one part of it is to ease the transition/port from CUDA.

 

 

 

More details from the phoronix article: https://www.phoronix.com/scan.php?page=article&item=amd-mi100-rocm4&num=1

 

ROCm isn't a perfect solution though.

 

1. It doesn't support consumer GPUs. Only CDNA compute GPUs are supported (not RDNA or RDNA2), whilst CUDA can be used on anything. Lots of universities will buy Titans for machine learning applications rather than spending 10x more on server-grade components. The same is true for startups, or even individuals trying to get into machine learning. None of them will be buying a multi-thousand-dollar CDNA card; they have no choice but to go RTX just based on price.

 

2. It still requires work on the developers' side to add support for AMD GPUs, no matter how easy AMD makes the transition (and then they need to optimise their code for the new platform, which takes significantly longer than the initial port). It does look promising and I hope it can shake things up, but previous versions have been around for a while now and haven't done much to change things, so I can't see this doing much either. Just like Intel in the CPU space, Nvidia has a stronghold on the server compute market that isn't going to change in a single generation; their reputation there is far higher than AMD's.

CPU: i7 4790k, RAM: 16GB DDR3, GPU: GTX 1060 6GB


The only market where I can see this card thriving is FP64-heavy, OpenCL-based computation/simulation in HPC, which is a really small niche.

23 hours ago, bcredeur97 said:

but aren't tensor cores a special use case only? (just asking, i don't know anything about them tbh)

A "special use case" that's very widely used. Most ML workloads rely on FMAs of low-precision matrices, which is exactly what Tensor cores do.
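For the curious: the core operation a Tensor core performs each step is a fused multiply-add on small matrix tiles, D = A x B + C, with low-precision inputs and higher-precision accumulation. A pure-Python sketch on a 2x2 tile (real hardware works on e.g. 4x4 FP16 tiles accumulating in FP32):

```python
def tile_fma(A, B, C):
    """D = A @ B + C on an n x n tile -- the matrix FMA a Tensor core runs."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j]
             for j in range(n)] for i in range(n)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
print(tile_fma(A, B, C))  # [[20.0, 23.0], [44.0, 51.0]]
```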

 

11 hours ago, AlexGoesHigh said:

They have a CUDA competitor/equivalent, it's called ROCm and they are launching v4.0 with this GPU and one part of it is to ease the transition/port from CUDA.

Adding to the excellent points made by @tim0901, ROCm is also a pain to get everything working, and performance is really subpar. Even my 2060 Super can beat a Radeon VII in everything under ROCm vs CUDA.

 

1 hour ago, tim0901 said:

It doesn't support consumer GPUs

It does work with pre-RDNA cards (like the Polaris cards), but it isn't worth it.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


Lisa Su is laying a smaketh teh downeth and you know it will be the rock bottom for Nvidia... 

 

-Yours faithfully WWE & The Rock.

My computer for gaming & work. AMD Ryzen 3600x with XFR support on - Arctic Cooling LF II - ASUS Prime X570-P - Gigabyte 5700XT - 32GB Geil Orion 3600 - Crucial P1 1TB NVME - Crucial BX 500 SSD - EVGA GQ 650w - NZXT Phantom 820 Gun Metal Grey colour - Samsung C27FG73FU monitor - Blue snowball mic - External best connectivity 24 bit/ 96khz DAC headphone amp -Pioneer SE-205 headphone - Focal Auditor 130mm speakers in custom sealed boxes - inPhase audio XT 8 V2 wired at 2ohm 300RMS custom slot port compact box - Vibe Audio PowerBox 400.1


6 hours ago, tim0901 said:

1. It doesn't support consumer GPUs. Only CDNA compute GPUs are supported (not RDNA or RDNA2), whilst CUDA can be used on anything. Lots of universities will buy Titans for machine learning applications, rather than spending 10x more on server-grade components. Same is true for startups, or even just individuals trying to get into machine learning. None of them will be buying a multi-thousand-dollar CDNA card - they have no choice but to go RTX just based on price.

 

i wonder if AMD will make cheaper CDNA GPUs for that kind of stuff

✨FNIGE✨


6 minutes ago, SlimyPython said:

i wonder if AMD will make cheaper CDNA GPUs for that kind of stuff

CDNA is essentially a rebranding of GCN, which AMD gave up on in the consumer market.



3 minutes ago, igormp said:

CDNA is just a rebranding of GCN, which they gave up on the consumer market.

https://www.anandtech.com/show/15593/amd-unveils-cdna-gpu-architecture-a-dedicated-gpu-architecture-for-data-centers

As far as I know, no, it isn't.

 

11 minutes ago, SlimyPython said:

i wonder if AMD will make cheaper CDNA GPUs for that kind of stuff

They were already cheaper, but yeah, they'll have at least 2-3 more. The last gen had the MI50, MI60 and MI25.

Good luck, Have fun, Build PC, and have a last gen console for use once a year. I should answer most of the time between 9 to 3 PST

NightHawk 3.0: R7 5700x @, B550A vision D, H105, 2x32gb Oloy 3600, Sapphire RX 6700XT  Nitro+, Corsair RM750X, 500 gb 850 evo, 2tb rocket and 5tb Toshiba x300, 2x 6TB WD Black W10 all in a 750D airflow.
GF PC: (nighthawk 2.0): R7 2700x, B450m vision D, 4x8gb Geli 2933, Strix GTX970, CX650M RGB, Obsidian 350D

Skunkworks: R5 3500U, 16gb, 500gb Adata XPG 6000 lite, Vega 8. HP probook G455R G6 Ubuntu 20. LTS

Condor (MC server): 6600K, z170m plus, 16gb corsair vengeance LPX, samsung 750 evo, EVGA BR 450.

Spirt  (NAS) ASUS Z9PR-D12, 2x E5 2620V2, 8x4gb, 24 3tb HDD. F80 800gb cache, trueNAS, 2x12disk raid Z3 stripped

PSU Tier List      Motherboard Tier List     SSD Tier List     How to get PC parts cheap    HP probook 445R G6 review

 

"Stupidity is like trying to find a limit of a constant. You are never truly smart in something, just less stupid."

Camera Gear: X-S10, 16-80 F4, 60D, 24-105 F4, 50mm F1.4, Helios44-m, 2 Cos-11D lavs


20 minutes ago, GDRRiley said:

https://www.anandtech.com/show/15593/amd-unveils-cdna-gpu-architecture-a-dedicated-gpu-architecture-for-data-centers

as far as I know no it isn't

 

They were already cheaper, but yeah, they'll have at least 2-3 more. The last gen had the MI50, MI60 and MI25.

These cards were pretty much only available in bulk (100+ card orders) via an AMD representative, and were hardly "cheap". They didn't really have a sticker price; like many products of this kind, the price depended on who was inquiring. A small, underfunded university? Maybe a few grand each. A government organisation? Triple that as a minimum. An RTX Titan or a midrange Quadro was (and will likely remain) pretty competitively priced in comparison, for startups and universities alike.

 

Unfortunately the Radeon Pro lineup still uses RDNA, rather than CDNA, and that doesn't look to be changing anytime soon. You have to go up two product categories to get ROCm support compared to Nvidia's offerings.



13 minutes ago, tim0901 said:

Unfortunately the Radeon Pro lineup still uses RDNA, rather than CDNA, and that doesn't look to be changing anytime soon. You have to go up two product categories to get ROCm support compared to Nvidia's offerings.

Right, because the Pro line is focused on workloads that need RDNA's gaming performance with pro drivers.



14 minutes ago, GDRRiley said:

right because the pro line is focused on workloads that need RDNAs gaming performance with pro drivers.

All I'm saying is that AMD are massively limiting their target audience by not giving those cards ROCm support - they would be far more appealing to many people if they did.



2 hours ago, GDRRiley said:

as far as I know no it isn't

They clearly state it in their whitepaper:

Quote

While inspired by the prior-generation GCN architecture, each CU is rearchitected and enhanced with a Matrix Core Engine

...

The CUs are derived from the earlier GCN architecture and execute wavefronts that contain 64 work-items

...

The AMD CDNA architecture builds on GCN’s foundation of scalars and vectors and adds matrices as a first class citizen

It's based on GCN, with some new features to try to reach feature parity with Nvidia's Tensor cores.

 

1 hour ago, tim0901 said:

by not giving those cards ROCm support

Support is coming soon: 

 

1 hour ago, tim0901 said:

they would be far more appealing to many people if they did.

Only if all you want to do is toy with ROCm and get stuff working; otherwise you can't really get much done, and the performance would be way worse than on an equivalent Nvidia card (so you'd effectively be wasting money to get the same thing done).

 



10 minutes ago, igormp said:

Support is coming soon: 

Support for RDNA has been "coming soon" since its launch over a year ago. I'll believe it when I see it.


