
Nvidia shows off Post-Volta GPUs in Next-Generation Drive PX

AlTech
7 hours ago, GoodBytes said:

And you allow the software to catch up to the hardware. A 1080 already runs a wide variety of games at max or near-max settings at super fast fps. So what is the rush?

The problem is mainly the prices. The 1080 is quite expensive for a 314 mm² GPU (well, to be fair, it's not like AMD has great prices either).
The 1080 can just about run 4K at medium/high settings; look at Deus Ex: Mankind Divided, for example.

We need more 4K-capable video cards; currently, the only ones capable of running 4K at somewhat decent settings in modern titles are 1070-class or higher GPUs. Of course, we also need developers to catch up to the hardware so that Vulkan and DirectX 12 are properly utilized.

Heck, the hardware still needs to catch up to the hardware... Where are the HDR monitors? We have plenty of HDR TVs but no G-Sync HDR or FreeSync 2 monitors.


12 hours ago, Sniperfox47 said:

Planes also receive signals from ground control and don't rely on computer vision though... They don't need to read traffic signs and detect when little Suzy's dog jumps out in front of them. There are electronic communications in place that *drastically* reduce the complexity of control for a plane.

 

Why do you even want them to release Volta to consumers? If you take the V100 they're using in the Teslas, you'd be looking at a $1500-2000 graphics card minimum, and if they cut it down and removed the new enterprise-level features to make it cheaper, you'd have Pascal for all intents and purposes. Volta has no practical benefit on the consumer side, and the dies are *insanely* expensive.

 

What are the Volta features?

  • 8-bit/16-bit optimized SM architecture (which gets cut out on consumer systems)
  • Higher-bandwidth NVLink (which doesn't exist on consumer systems)
  • HBM2 (which would be cut out and replaced with GDDR5 for consumer systems)
  • Multi-Process Service for QoS assurances while running multiple CUDA applications (a datacenter feature that doesn't apply to consumer users)
  • Unified memory and address translation with IBM POWER machines using NVLink (not compatible with x86 machines or PCIe)
  • And the Cooperative Groups API for CUDA 9 (backported to Maxwell and Pascal, with only some patterns exclusive to Volta)

 

I'd much rather have a consumer Pascal chip with all the reduced-precision support left enabled than have a Volta chip with all of that stuff still cut out.

I mean, part of the reason they want all cars to have a signal for self-driving is to make it more like planes. Planes give off signals announcing their position, allowing other planes to navigate around them. Once all cars have this too, self-driving cars will become a reality.


13 hours ago, Sniperfox47 said:

Planes also receive signals from ground control and don't rely on computer vision though... They don't need to read traffic signs and detect when little Suzy's dog jumps out in front of them. There are electronic communications in place that *drastically* reduce the complexity of control for a plane.

 

Why do you even want them to release Volta to consumers? If you take the V100 they're using in the Teslas, you'd be looking at a $1500-2000 graphics card minimum, and if they cut it down and removed the new enterprise-level features to make it cheaper, you'd have Pascal for all intents and purposes. Volta has no practical benefit on the consumer side, and the dies are *insanely* expensive.

 

What are the Volta features?

  • 8-bit/16-bit optimized SM architecture (which gets cut out on consumer systems)
  • Higher-bandwidth NVLink (which doesn't exist on consumer systems)
  • HBM2 (which would be cut out and replaced with GDDR5 for consumer systems)
  • Multi-Process Service for QoS assurances while running multiple CUDA applications (a datacenter feature that doesn't apply to consumer users)
  • Unified memory and address translation with IBM POWER machines using NVLink (not compatible with x86 machines or PCIe)
  • And the Cooperative Groups API for CUDA 9 (backported to Maxwell and Pascal, with only some patterns exclusive to Volta)

 

I'd much rather have a consumer Pascal chip with all the reduced-precision support left enabled than have a Volta chip with all of that stuff still cut out.
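(For reference on the Cooperative Groups bullet above: it is CUDA 9's API for naming and synchronizing thread groups at sub-block granularity. A minimal sketch of a warp-level reduction using it, as illustration only, compiled with something like nvcc -arch=sm_60:)

#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void block_sum(const float* in, float* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> warp = cg::tiled_partition<32>(block);
    float v = 0.0f;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v = in[i];
    // Tree reduction within the 32-thread tile using the tile's own shuffle.
    for (int offset = warp.size() / 2; offset > 0; offset /= 2)
        v += warp.shfl_down(v, offset);
    if (warp.thread_rank() == 0) atomicAdd(out, v);  // one add per warp
}

(This tile-level pattern is among the parts that were backported; the grid-wide and multi-device group patterns are the ones that need newer hardware.)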

All these Volta features except for CUDA are actually in Vega. Vega and the GCN variants all have full 8-, 16-, 32-, and 64-bit data support that isn't nerfed (except for FP64 on some cards). Vega and Fiji came with HBM2, and GCN is focused on compute: stuff like async threads, the ability for the GPU to run by itself without the CPU, and multitasking.

Unified memory and address translation is an old GPU feature, something that has been around since DX10 and OpenCL due to their requirements (even phones that support OpenCL have unified memory, like Snapdragon).
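(For context, the API-level unified memory most people mean today is managed memory, which CUDA has had since CUDA 6 and which is distinct from Volta's hardware coherence over NVLink. A minimal sketch, illustration only:)

#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;  // GPU dereferences the same pointer the CPU uses
}

int main() {
    const int n = 1 << 20;
    float* x = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));  // one allocation, visible to CPU and GPU
    for (int i = 0; i < n; ++i) x[i] = 1.0f;   // CPU writes directly, no explicit copy
    scale<<<(n + 255) / 256, 256>>>(x, n);
    cudaDeviceSynchronize();                   // the driver handles page migration
    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}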

 

So I don't see how Volta's features are special, especially because before FP32 became the norm for graphics, many GPUs were 8- or 16-bit focused. The reason Nvidia nerfs all the data widths except FP32 on consumer GPUs is mainly to get you to buy their Tesla line, when the consumer cards are perfectly capable of full performance and are only artificially nerfed.

 

What I would like to see from Nvidia is a consumer GPU that is not cut down and has nothing artificially nerfed. For instance, with AMD the only differences between their pro and standard Radeons are display outputs, VRAM amount, ECC, and drivers, and I like how they forced Nvidia to make the Titan relevant again by releasing the Vega FE as an in-between card of pro and consumer. Ever since Kepler, Nvidia has artificially nerfed their consumer GPUs for anything not FP32 related, even though the core/chip is totally capable of full performance with all the units in (talking about 8-bit and 16-bit, for example). So if you compare AMD's 16-bit performance to Nvidia's consumer-card 16-bit performance, Nvidia's cards look embarrassingly slow.

 

However, while Nvidia is focusing on compute cards for Cray and automotive, AMD is focusing on gaming and server compute, so prices are only going to go up from both brands, as they aren't really competing against each other. Consumers are going to get the worst of it in dedicated GPUs; APUs and even Intel's mainstream chips with an IGP may be the only relevant and affordable options in the near future.

 

And regarding planes flying themselves without using Nvidia, here's an example:

https://www.usatoday.com/story/travel/columnist/cox/2014/02/09/autoland-low-visibility-landings/5283931/#

There are loads more examples, though this only applies to modern commercial jets, but I want to make a couple of things clear regarding this:

- Pilots aren't strictly necessary, but they are there because you wouldn't want to fly if there wasn't one, and so that pilots can deal with the weird issues that arise.

- Cars aren't exactly random. If cars only drove themselves, they could communicate and coordinate with each other when deciding things.

- Humans tend to be assholes, which is what makes driving dangerous. Flying is statistically safer than driving because of this, not because of the random things that happen on the ground: pilots go through proper training, while the road gets people who don't care about others and are a danger to everyone.

- Aviation rules are very strict compared to driving rules. You can break the rules on the road without getting fined or pulled over, but you can't break aviation rules without strict penalties.

 

So if we started treating driving like we do flying (proper training + strict enforcement), the roads would be much safer, and self-driving cars could develop at a much faster pace, allowing early release and continual development with regular changes. You don't need an Nvidia supercomputer to process everything; you just need more cars to have self-driving computers and to communicate with other cars. The problem is that the automotive modules from Nvidia cost thousands, and that will significantly inflate car prices. Since Nvidia is finding out they won't be able to compete in the gamer space with the prices they want to charge (see AMD's roadmap), they figure they'll charge loads to big companies, who will pass the costs down to us; I mean, if you expect one of these in every car, it's more like us buying it ourselves when we buy a car.

 

So the future of GPUs is bleak because of pricing and a lack of actual competition. It's not about whether one side is better than the other: if left unchecked, any company (Intel/AMD/Nvidia) will just stop innovating and start charging loads. If AMD were in Nvidia's shoes, they'd be doing the same thing. I have yet to see a good monopoly, when in theory a monopoly should mean better and cheaper products with a lot of capacity for innovation. It's a human quality to be greedy and complacent when you have wealth.


15 hours ago, System Error Message said:

Planes can fly themselves even with traffic, and can land themselves too, even take off by themselves.

You have various modes for autopilot in commercial planes; it's different for private aircraft, though.

Planes also have a mode that allows remote control; this is usually done from another aircraft that can physically see what is going on and can fly nearby.

 

15 hours ago, GoodBytes said:

Maybe in some research lab. But the FAA certification is so extensive... there is a reason a plane sports 3dfx chips for its graphical interface, and not a GeForce GTX 1080.

 

Flying in the air is simple. Landing and takeoff are where you need a pilot, and they are the most dangerous parts of flying. A highly trained pilot is needed, with quick thinking, to manage all sorts of changing situations as the plane takes off and lands.

 

If AI in planes were a legit thing, trust me, airlines would ditch pilots on day 1.

Large airliners can land themselves and have been able to for a very long time. The airport itself needs the equipment to support it, obviously, but manual landing is still preferred, and not all the reasons are strictly technical or to do with safety.

 

  • Passengers: would you get on a plane without a pilot? Would you feel more or less safe if the pilot wasn't actually flying the plane? If pilots don't fly the plane, then how good are they actually? What happens in a disaster, can the pilot be counted on?
  • Pilot skill: if most airports support auto-landing, what happens when you actually do need to land manually? Are you as skilled as you would have been?
  • Traffic control: who has final authority once the automated landing procedure has been initiated? Who can abort the landing? What if there needs to be a runway change on approach?

There will be many more factors; these are just the ones I can think of or have heard as a non-air-transport person.

 

What it actually comes down to most is the first point, passengers. Very few people would be willing to get on a plane with no pilot, and while that remains the overriding factor, very little investment and development will be put into autonomous commercial air travel. We already know it would be safer and more efficient, way more efficient at very busy airports, but this isn't the only case where we don't act despite knowing there is a better way or a better technology.


7 minutes ago, leadeater said:

 

What it actually comes down to most is the first point, passengers. Very few people would be willing to get on a plane with no pilot, and while that remains the overriding factor, very little investment and development will be put into autonomous commercial air travel. We already know it would be safer and more efficient, way more efficient at very busy airports, but this isn't the only case where we don't act despite knowing there is a better way or a better technology.

Actually, if both the control tower and the planes were autonomous and communication were reliable, it would be more efficient than manning both the towers and the planes. However, commercial planes aren't the only thing in the air. There are many private planes, and control towers must be manned for emergencies too.


14 hours ago, AluminiumTech said:

*FP16 & FP8.

Actually, it's INT8, not FP8.

 

14 hours ago, AluminiumTech said:

VEGA has great FP16 and FP8 perf so there's no reason why GeForce Volta shouldn't have it.

However, Nvidia always removes it, even from the Quadro line of cards now too. You only get good FP16 and INT8 from the Gx100 chips, never the lesser ones. I forget when they started doing this, but it was a few generations ago. AMD does not cut down any of the raw compute performance that the hardware configuration can support.

 

13 hours ago, Sniperfox47 said:

They had to switch to AMD recently after the fallout from the big crash. Mostly PR/damage control in my opinion; it also helped that Jim Keller now works for Tesla.


3 minutes ago, System Error Message said:

Actually if both the control tower and planes were autonomous, communication is reliable, it would be more efficient than manning both the towers and planes. However commercial planes arent the only thing in the air. There are many private planes and control towers must be manned for emergencies too.

Still, all that aside, go do a survey of 10,000 people and ask them whether they would get on a plane without a pilot; I'm fairly sure you won't be surprised by the results ;).

 

Unless that changes, nothing else will.


2 hours ago, Brooksie359 said:

I mean, part of the reason they want all cars to have a signal for self-driving is to make it more like planes. Planes give off signals announcing their position, allowing other planes to navigate around them. Once all cars have this too, self-driving cars will become a reality.

The issue is that even if *all* cars have positional signalling, pedestrians don't. Animals don't. Road hazards don't. There are a large number of factors that rely on computer vision in cars that simply do not exist in aviation.

 

1 hour ago, System Error Message said:

All these Volta features except for CUDA are actually in Vega. Vega and the GCN variants all have full 8-, 16-, 32-, and 64-bit data support that isn't nerfed (except for FP64 on some cards). Vega and Fiji came with HBM2, and GCN is focused on compute: stuff like async threads, the ability for the GPU to run by itself without the CPU, and multitasking.

Unified memory and address translation is an old GPU feature, something that has been around since DX10 and OpenCL due to their requirements (even phones that support OpenCL have unified memory, like Snapdragon).

So I don't see how Volta's features are special, especially because before FP32 became the norm for graphics, many GPUs were 8- or 16-bit focused. The reason Nvidia nerfs all the data widths except FP32 on consumer GPUs is mainly to get you to buy their Tesla line, when the consumer cards are perfectly capable of full performance and are only artificially nerfed.

What I would like to see from Nvidia is a consumer GPU that is not cut down and has nothing artificially nerfed. For instance, with AMD the only differences between their pro and standard Radeons are display outputs, VRAM amount, ECC, and drivers, and I like how they forced Nvidia to make the Titan relevant again by releasing the Vega FE as an in-between card of pro and consumer. Ever since Kepler, Nvidia has artificially nerfed their consumer GPUs for anything not FP32 related, even though the core/chip is totally capable of full performance with all the units in (talking about 8-bit and 16-bit, for example). So if you compare AMD's 16-bit performance to Nvidia's consumer-card 16-bit performance, Nvidia's cards look embarrassingly slow.

However, while Nvidia is focusing on compute cards for Cray and automotive, AMD is focusing on gaming and server compute, so prices are only going to go up from both brands, as they aren't really competing against each other. Consumers are going to get the worst of it in dedicated GPUs; APUs and even Intel's mainstream chips with an IGP may be the only relevant and affordable options in the near future.

A) My statement had nothing to do with Vega; it was comparing Volta to Pascal. But since you bring up Vega...

 

B) Vega does not have high-throughput DP FP64 support. Nor does it have reduced-precision Int8 support. It has FP8, but not Int8.

 

C) Yes, they came with HBM2. And how many issues have they had because of this, in terms of both cost and supply?

 

D) Volta's unified memory is more akin to an HSA architecture in an ARM SoC than what you're talking about. OpenCL unified memory is the same thing as CUDA unified memory: it's not there to let the GPU use system memory directly, but rather to make memory management easier for CUDA/OpenCL apps. Volta can actually directly leverage system memory when using NVLink with a POWER-based CPU.

 

E) Improved-throughput reduced precision does require hardware in the core. It adds to the complexity, size, and cost of the GPU. It's not just fused off in their cut-down GPUs; it's actually removed to save size and complexity.
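(To illustrate what that in-core hardware buys you: packed FP16 executes two half-precision FMAs per instruction. A minimal CUDA sketch, assuming a part with full-rate FP16 such as GP100/GV100, illustration only:)

#include <cuda_fp16.h>

// Each __half2 carries two FP16 lanes, so one __hfma2 performs two FMAs.
// This packed path is the hardware that gets cut down on consumer chips:
// the same code still compiles there, it just runs at a crippled rate.
__global__ void fma_packed(const __half2* a, const __half2* b, __half2* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = __hfma2(a[i], b[i], c[i]);  // c = a*b + c, two lanes at once
}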


6 minutes ago, leadeater said:

Actually, it's INT8, not FP8.

To be fair, FP8 is a thing, so it's not unrealistic that he thought I was referring to that. And Vega does have good FP8 performance, as he mentioned, though not good Int8.


20 minutes ago, Sniperfox47 said:

To be fair, FP8 is a thing, so it's not unrealistic that he thought I was referring to that. And Vega does have good FP8 performance, as he mentioned, though not good Int8.

12 hours ago, Sniperfox47 said:

It literally has 1/5 to 1/4 the Int8 TOPS of the V100... 4/5 the FP32 performance... and 1/10 the DP FP64 performance of the GV100... at 230 W vs 250 W... What are you even talking about? Unless you're gaming on it (mostly FP32), it makes no sense for a datacenter. Either you're doing FP64 (scientific models, data analysis, etc.) or you're doing Int8 (neural-network tensors).

 

You do need to be careful when talking about and comparing Volta to anything else, from both vendors. The Tensor Cores are much less flexible than the CUDA cores and are more fixed-function than not; that is a benefit too, though, as it is how they get 120 TFLOPS out of them.
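(As a rough sanity check on that figure, using the launch-spec numbers: the V100 has 640 Tensor Cores, each doing a 4×4×4 mixed-precision matrix FMA per clock, i.e. 64 FMAs = 128 FLOPs. At the quoted ~1455 MHz boost clock, that is 640 × 128 × 1.455 GHz ≈ 119 TFLOPS, which is where the marketing 120 TFLOPS comes from, and it applies only to that one operation.)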

 

AMD, on the other hand, has generalist INT8 hardware capability, but whether that is a benefit I don't know. When most of the industry is developing on TensorFlow, how is this better than something specifically dedicated to it? Also, to get that good FP16 and INT8 hardware performance, AMD had to cut out the hardware-level FP64 capability; chasing the growing market is good, though.

 

Btw Tesla V100 SXM2 is 300W and MI25 is also 300W.


16 hours ago, michaelocarroll007 said:

Seems like Intel is responding quickly to AMD, which either means they really were sitting on products and holding back, or the huge amount of extra money helped them respond quicker.

Their latest two releases have been a bit haphazard, and the whole X-series thing, with so many bits missing or seeming irrelevant and bits you have to pay extra for, suggests they weren't ready to launch but had to anyway. This would indicate they were not sitting on products, as they nearly got caught with their pants down.

 

17 hours ago, JuztBe said:

Sadly, that's what you get with a lack of competition. Intel was sitting on its ass for ~5 years; now Nvidia will do the same.

I just wish the passion CEOs once had for their fields wouldn't get sucked out so fast after seeing their wallets increase exponentially. Imagine EA or Activision run by people who are passionate about gaming.

 

Sadly, that is just a myth the internet likes to perpetuate; any company that sits idle ceases to be a company. There is, to a certain degree, the whole sandbagging thing, but it is nowhere near as major as people seem to make out. Money is always the end goal, and thus, in order to maintain a revenue stream, they have to convince consumers (both domestic and commercial) to upgrade, which means they have to offer something better than their last product. I know this is a very raw summation of corporate economics, but it explains why it is a myth that companies without competition don't innovate.


51 minutes ago, Sniperfox47 said:

The issue is that even if *all* cars have positional signalling, pedestrians don't. Animals don't. Road hazards don't. There are a large number of factors that rely on computer vision in cars that simply do not exist in aviation.

 

A) My statement had nothing to do with Vega; it was comparing Volta to Pascal. But since you bring up Vega...

 

B) Vega does not have high-throughput DP FP64 support. Nor does it have reduced-precision Int8 support. It has FP8, but not Int8.

 

C) Yes, they came with HBM2. And how many issues have they had because of this, in terms of both cost and supply?

 

D) Volta's unified memory is more akin to an HSA architecture in an ARM SoC than what you're talking about. OpenCL unified memory is the same thing as CUDA unified memory: it's not there to let the GPU use system memory directly, but rather to make memory management easier for CUDA/OpenCL apps. Volta can actually directly leverage system memory when using NVLink with a POWER-based CPU.

 

E) Improved-throughput reduced precision does require hardware in the core. It adds to the complexity, size, and cost of the GPU. It's not just fused off in their cut-down GPUs; it's actually removed to save size and complexity.

A) I know, I just brought it up as a comparison to show there's nothing new or special about Volta.

 

B) Vega has Int8 support. https://medium.com/intuitionmachine/building-a-50-teraflops-amd-vega-deep-learning-box-for-under-3k-ebdd60d4a93c

Key points about Vega:

- Unrestricted Int8, FP16, and FP64 (on Nvidia's consumer cards these things can't be physically removed, as they are actually part of the shader units themselves; a shader consists of ALUs and FPUs that can do various different things, all part of the same unit).

- Vega can address system RAM and even storage directly, so Volta is late with this feature, and Vega does it on the consumer line as well.

- The article mentions Volta having dedicated deep-learning hardware; this is something additional that isn't a native part of standard GPU shaders but sits at the chip level, integrated into every GPU core (not the shaders). As we all know, GPU cores consist of a bunch of things, so Volta is going to have a lot of these extra bits.

 

However, AMD has had this capability since before Nvidia, but no one utilized it. To quote a Scottish YouTuber, "Nvidia has mindshare": they make you believe their hardware is superior when it is not. AMD has had the capability on consumer-level GPUs for a while, but it was never utilized; they never mention it, when in fact they could have needled Nvidia when the AI fad became popular, since their cards were already good at it even at the consumer level, before Nvidia came out with Pascal, which didn't restrict Int8.


1 hour ago, System Error Message said:

Being able to do Int8 and having accelerated Int8 are different things. Nowhere in that article does it mention accelerated "packed" reduced-precision Int8, just "Int8 capability".
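(The distinction in code terms: accelerated Int8 means packed instructions like CUDA's __dp4a, which does a four-way 8-bit dot product with a 32-bit accumulate in a single instruction on sm_61+ parts. A minimal sketch, illustration only, compiled with nvcc -arch=sm_61:)

__global__ void int8_dot(const int* a, const int* b, int* out, int n) {
    // Each int packs four signed 8-bit values; __dp4a computes the
    // 4-element dot product and adds it to a 32-bit accumulator in
    // one instruction (requires compute capability 6.1 or newer).
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __dp4a(a[i], b[i], 0);
}

(Hardware without the packed path can still compute this, just one byte at a time, which is the "capability vs acceleration" difference.)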

 

1 hour ago, System Error Message said:

Vega can address system RAM and even storage directly, so Volta is late with this feature, and Vega does it on the consumer line as well.

Again, with external cards that's via a memory copy to maintain coherency. And sure, Vega, like Polaris, can access system memory in an HSA setup without memory copies, but that's exclusive to the APUs, since it needs an HSA setup.

 

The Volta/NVLink/Power setup basically creates an HSA setup using that external interconnect to save memory overhead. That's a massively oversimplified explanation, but it's true for most practical purposes. It's not something that's going to happen with x86 systems anytime soon, if ever.

 

2 hours ago, leadeater said:

 

You do need to be careful when talking about and comparing Volta to anything else, from both vendors. The Tensor Cores are much less flexible than the CUDA cores and are more fixed-function than not; that is a benefit too, though, as it is how they get 120 TFLOPS out of them.

AMD, on the other hand, has generalist INT8 hardware capability, but whether that is a benefit I don't know. When most of the industry is developing on TensorFlow, how is this better than something specifically dedicated to it? Also, to get that good FP16 and INT8 hardware performance, AMD had to cut out the hardware-level FP64 capability; chasing the growing market is good, though.

Btw Tesla V100 SXM2 is 300W and MI25 is also 300W.

Yeah, but he had mentioned the WX 9100, which is why I figured comparing it to the PCIe version was fairer, and why I mentioned that it was more of a workstation card than a datacenter card.

 

And yeah, they may be fixed-function, but they work with TF, Caffe, and other major network frameworks, so what does it really matter? When are you realistically going to be using accelerated Int8 for anything other than deep learning?


19 hours ago, DildorTheDecent said:

No point in releasing Volta for the gamer crowd at this time. 

 

Nvidia would only be competing with themselves. 

 

Vega would look like a toy. 

Vega is already a bit of a joke to me. I don't see how AMD did so well in the CPU market while the GPU division is still just sitting in the corner eating glue.


1 hour ago, Sniperfox47 said:

And yeah, they may be fixed-function, but they work with TF, Caffe, and other major network frameworks, so what does it really matter? When are you realistically going to be using accelerated Int8 for anything other than deep learning?

Um... that one AMD-sponsored title, and then... :P

 

Anyway, I know you didn't make the mistake, but others have: that 120 TFLOPS applies only to that operation and nothing else, and I've seen people use that figure for non-INT8 work.


1 hour ago, Sniperfox47 said:

The Volta/NVLink/Power setup basically creates an HSA setup using that external interconnect to save memory overhead. That's a massively oversimplified explanation, but it's true for most practical purposes. It's not something that's going to happen with x86 systems anytime soon, if ever.

Gen-Z will be what x86 can use; HPE have a working system using it, but it's very early days, and it's ARM. AMD is a pretty big player and a founding member of the consortium.

 

When we had the briefing meeting with HPE for the upcoming Gen10 servers about 3-4 months ago, they were talking up Gen-Z a lot, in the ProLiant Gen10 context for both Intel and AMD. I assume I'm allowed to say that; Gen10 has been announced, but very little about Gen-Z has, and they could have been full of it, so time will tell.

 

Radeon Instinct has a lot of the features you talked about in a few posts; what I don't know is how equivalent they are. Nvidia has the maturity advantage, and AMD has the cross-ecosystem advantage (when they get there).

http://gpgpu10.athoura.com/ROCM_GPGPU_Keynote.pdf

 

Quote

In-node

  • Large BAR support (BAR = Base Address Register): making the GPU memory visible; the BAR 1 region is supported on Radeon Instinct MI25, MI8, MI6
  • ROCr base driver has P2P API support
  • ROCr (HSA) agent API with peer-to-peer support
  • HCC language runtime support of P2P (ROCr agent API)
  • HIP language runtime support of P2P (P2P APIs modeled after the CUDA P2P APIs)
  • OpenCL language runtime P2P API: peer-to-peer API with autocopy support over the Intel QPI bus
  • API name: clEnqueueBufferCopyP2PAMD
  • Releasing in OpenCL with ROCm 1.6.2
  • HIP-based communication primitives helper library to make it easier to use P2P (in development)
  • ROCr-level IPC (Inter-Process Communication) API
  • IPC is supported in the HIP API

Out of Node

  • Remote DMA technology (RDMA): peer-to-peer bridge driver for PeerDirect
  • libibverbs (Linux RDMA library): yes, since ROCm 1.0
  • PeerDirect: Mellanox peer API for InfiniBand

https://rocm.github.io/ROCmMultiGPU.html

 

Quote

The ROCm platform uses the new PCI Express 3.0 (PCIe 3.0) features for atomic read-modify-write transactions, which extend inter-processor synchronization mechanisms to I/O, supporting the defined set of HSA capabilities needed for queuing and signaling memory operations.

 

The new PCIe AtomicOps operate as completers for CAS (Compare-and-Swap), FetchADD, and SWAP atomics. The AtomicOps are initiated by the I/O device, support 32-, 64-, and 128-bit operands, and require target addresses naturally aligned to the operation size.

 

Currently, ROCm uses this capability as follows:

  • Update the HSA queue's read_dispatch_id: a 64-bit atomic add used by the command processor on the GPU agent to update the packet ID it has processed.
  • Update the HSA queue's write_dispatch_id: a 64-bit atomic add used by the CPU and GPU agents to support multi-writer queue insertions.
  • Update HSA signals: 64-bit atomic ops are used for CPU and GPU synchronization.

https://rocm.github.io/ROCmPCIeFeatures.html
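(Since the HIP P2P calls are modeled after CUDA's, per the first quote, the API shape is roughly the following — a minimal CUDA sketch, assuming two peer-capable GPUs; as far as I know the HIP names just swap the cuda prefix for hip:)

#include <cuda_runtime.h>

int main() {
    int can01 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can device 0 reach device 1's memory?
    if (can01) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);    // map device 1's memory into device 0
    }
    float *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0); cudaMalloc(&d0, 1 << 20);
    cudaSetDevice(1); cudaMalloc(&d1, 1 << 20);
    // Direct GPU-to-GPU copy; with P2P enabled this avoids bouncing
    // through host memory (using the BAR/atomics plumbing quoted above).
    cudaMemcpyPeer(d0, 0, d1, 1, 1 << 20);
    cudaFree(d1);
    cudaSetDevice(0); cudaFree(d0);
    return 0;
}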

 

 


5 hours ago, mr moose said:

Their latest two releases have been a bit haphazard, and the whole X-series thing, with so many bits missing or seeming irrelevant and bits you have to pay extra for, suggests they weren't ready to launch but had to anyway. This would indicate they were not sitting on products, as they nearly got caught with their pants down.

 

 

Sadly, that is just a myth the internet likes to perpetuate; any company that sits idle ceases to be a company. There is, to a certain degree, the whole sandbagging thing, but it is nowhere near as major as people seem to make out. Money is always the end goal, and thus, in order to maintain a revenue stream, they have to convince consumers (both domestic and commercial) to upgrade, which means they have to offer something better than their last product. I know this is a very raw summation of corporate economics, but it explains why it is a myth that companies without competition don't innovate.

Yeah, remind me again why monopolies are illegal. Oh yeah, it's the fact that companies have no competition. I agree you're probably right about them needing to make their chips better and better overall to keep giving customers a reason to actually purchase their new stuff. But it can and does affect pricing, and at times progress. Probably nowhere near the extent people on this forum believe, though, with some superchips they're just sitting on waiting to release.

 

A prime example of why competition matters and monopolies shouldn't exist is this: AMD comes out with chips, and Intel responds by offering better chips at lower prices than before. Weird how after Ryzen launches, suddenly the i7 line can have a 6-core processor in it.


2 hours ago, leadeater said:

Gen-Z will be what x86 can use; HPE have a working system using it, but it's very early days, and it's ARM. AMD is a pretty big player and a founding member of the consortium.

 

When we had the briefing meeting with HPE for the upcoming Gen10 servers about 3-4 months ago, they were talking up Gen-Z a lot, in the ProLiant Gen10 context for both Intel and AMD. I assume I'm allowed to say that; Gen10 has been announced, but very little about Gen-Z has, and they could have been full of it, so time will tell.

 

Radeon Instinct has a lot of the features you talked about in a few posts; what I don't know is how equivalent they are. Nvidia has the maturity advantage, and AMD has the cross-ecosystem advantage (when they get there).

http://gpgpu10.athoura.com/ROCM_GPGPU_Keynote.pdf

I stand corrected. Based on the Radeon Instinct article on ROCm (https://instinct.radeon.com/en/the-potential-disruptiveness-of-amds-open-source-deep-learning-strategy/), it sounded like they had a shared memory address space but not actual shared memory, and that their HSA intermediate layer handled the memory copies. My apologies for spreading misinformation.

 

Gen-Z is going to be interesting, although I don't know much about it beyond "Epic storage-class memory! Super-speed storage! Petabytes of directly addressable memory! Storage-class features with memory-class latency!".

 

And just to be clear, I'm pretty sure that Nvidia's NVLink solution with POWER CPUs isn't actual HSA; it's just HSA-esque. AFAIK they're not even a member and seem to just be doing their own thing for now.


7 hours ago, System Error Message said:

However, AMD has had this capability since before Nvidia, but no one utilized it. To quote a Scottish YouTuber, "Nvidia has mindshare": they make you believe their hardware is superior when it is not. AMD has had the capability on consumer-level GPUs for a while, but it was never utilized; they never mention it, when in fact they could have needled Nvidia when the AI fad became popular, since their cards were already good at it even at the consumer level, before Nvidia came out with Pascal, which didn't restrict Int8.

The funny thing is, NVIDIA focused on FP16 before this was even a thing and asked developers to write FP16 shaders. But they all said "screw it" and used FP24. And suddenly FP16 is some sort of box to check off on a list of whizzbang features that nobody will care about other than people looking for ammo in their flame wars.


17 minutes ago, M.Yurizaki said:

The funny thing is, NVIDIA focused on FP16 before this was even a thing and asked developers to write FP16 shaders. But they all said "screw it" and used FP24. And suddenly FP16 is some sort of box to check off on a list of whizzbang features that nobody will care about other than people looking for ammo in their flame wars.

FP16 does come in handy for scientific modeling and certain professional workloads (e.g. preview rendering). It gets listed as a selling point because, while not important at all to normal consumers/gamers, it is important to certain parts of the professional and enterprise market.


36 minutes ago, Sniperfox47 said:

FP16 does come in handy for scientific modeling and certain professional workloads (e.g. preview rendering). It gets listed as a selling point because, while not important at all to normal consumers/gamers, it is important to certain parts of the professional and enterprise market.

I meant that towards the gaming sector. I see NVIDIA's manufacturing strategy as making sense, because why give the consumer version, which is/was mostly used for gaming, features that game developers don't use or care about? AMD's "one-size-fits-all" approach is noble but ultimately doesn't let them be flexible.

 

If you really care about those things because you actually use them, more power to you. But then again I'm not sure if you should be using a consumer card for anything critical.


7 hours ago, michaelocarroll007 said:

Yeah, remind me again why monopolies are illegal. Oh yeah, it's the fact that companies have no competition. I agree you're probably right about them needing to make their chips better and better overall to keep giving customers a reason to actually purchase their new stuff. But it can and does affect pricing, and at times progress. Probably nowhere near the extent people on this forum believe, though, with some superchips they're just sitting on waiting to release.

 

A prime example of why competition matters and monopolies shouldn't exist is this: AMD comes out with chips, and Intel responds by offering better chips at lower prices than before. Weird how after Ryzen launches, suddenly the i7 line can have a 6-core processor in it.

I never said companies were above reproach; I simply said they don't lie idle just because they have no competition.

 

That said, the problem with statements like the one I quoted originally is that, because there is an easily perceived element of truth, people believe them without question. The next thing you know, we can't have a discussion about a product on these forums because people are salty about said myth and can't be rational with new information.

 

 

 


6 hours ago, Sniperfox47 said:

I stand corrected. Based on the Radeon Instinct article on ROCm (https://instinct.radeon.com/en/the-potential-disruptiveness-of-amds-open-source-deep-learning-strategy/), it sounded like they had a shared memory address space but not actual shared memory, and that their HSA intermediate layer handled the memory copies. My apologies for spreading misinformation.

 

Gen-Z is going to be interesting, although I don't know much about it beyond "Epic storage-class memory! Super-speed storage! Petabytes of directly addressable memory! Storage-class features with memory-class latency!".

 

And just to be clear, I'm pretty sure that Nvidia's NVLink solution with POWER CPUs isn't actual HSA; it's just HSA-esque. AFAIK they're not even a member and seem to just be doing their own thing for now.

All of this is very new; I don't even know how well it actually works, so it's not surprising it isn't well known. AMD is still very much fighting for legitimacy in the datacenter; most won't even consider them as an option, so they don't know about these developments or refuse to look at them. Sticking with the tried and tested is not a bad thing though, especially in the datacenter, on projects worth tens of millions of dollars, where you don't want to put everything on the line for something that might work.

 

Radeon Instinct isn't going to change this, but EPYC will. Even though it's unrelated, it will have a much greater effect in getting AMD's name back into the minds of system architects and systems engineers, who will then start looking at other options from AMD.

 

Edit:

Also wow this got rather off topic lol, pff car tech xD


3 hours ago, M.Yurizaki said:

AMD's "one-size-fits-all" approach is noble, but ultimately doesn't let them be flexible.

Probably less noble and more of a necessity due to budget constraints. If AMD had the resources, I'd fully expect them to customize architectures and designs for datacenter/professional and for gaming, which would actually be better for us gamers. Part of the reason AMD GPUs are not able to turn their full power into gaming performance is that AMD can't tweak the architecture to improve gaming without hurting other use cases.

 

I would say it would have been better for AMD to drop professional and server GPUs and just focus on the consumer market, but I bet there was a lot of fear and resistance to that within the company after the last time they tried something similar, when AMD dropped out of the server market by choice to do ARM.


21 hours ago, michaelocarroll007 said:

Weird how after Ryzen launches, suddenly the i7 line can have a 6-core processor in it.

To be fair, it would have happened regardless; Coffee Lake has been on Intel's roadmap for a while. Having said that, I'm quite certain it wouldn't have happened so soon. I suspect they were planning a 2018/2019 launch rather than a late-2017 one, hence the extremely short release cycle for Kaby Lake and the 200-series boards.

