
NVIDIA Prepping High-Performance Volta GPU Lineup

VagabondWraith
11 hours ago, marldorthegreat said:

It won't be soon at all. It makes no sense at all: they would undercut the 1080 and 1070. It will be 2H 2017.

It could be early 2017, just in time for when Vega is anticipated to arrive. I doubt the rumors about Vega launching this year are true, especially with the shortages they've been having.


12 hours ago, marldorthegreat said:

-snip-

What people fail to see is that AMD has managed to achieve an architecture that is (or almost is) as efficient as Maxwell while retaining full compute capabilities. If Nvidia wants ACEs in the Volta GPUs, then you can expect only a minor difference in power efficiency between Vega and Volta. In fact, I can see AMD edging this one out next round.


14 hours ago, marldorthegreat said:

It still doesn't explain the fact that AMD is still behind Nvidia even AFTER a process jump.

Nvidia can't produce enough cards, so that advantage is worthless.

Besides, Nvidia's efficiency is based on a rasterization approach that causes image degradation in some reproducible cases, so AMD isn't behind at all; they just didn't make that compromise.


15 hours ago, dalekphalm said:

I was gonna say: from a business perspective, they would be fucking idiots to release Volta so soon, when half of Pascal is still in short supply. The only possible reason they would rush Volta out earlier is if they're scared shitless of Vega and think it will utterly destroy their lineup.

 

(Because we've seen in the past that even if the two companies are neck and neck technology- and performance-wise, NVIDIA has nothing to worry about. People still buy NVIDIA in droves, even when AMD offers the superior alternative at a specific price segment.)

That depends on their intention. If the intention is to drown AMD completely and end this dog and pony show, it's not dumb at all.

 

Nvidia really has no reason to be afraid of Vega, as disappointing as Polaris was.


2 hours ago, Citadelen said:

What people fail to see is that AMD has managed to achieve an architecture that is (or almost is) as efficient as Maxwell while retaining full compute capabilities. If Nvidia wants ACEs in the Volta GPUs, then you can expect only a minor difference in power efficiency between Vega and Volta. In fact, I can see AMD edging this one out next round.

Not even close. The 480 doesn't even match the 970 in efficiency. It's blown completely out of the water.

 

35 minutes ago, laminutederire said:

Nvidia can't produce enough cards, so that advantage is worthless.

Besides, Nvidia's efficiency is based on a rasterization approach that causes image degradation in some reproducible cases, so AMD isn't behind at all; they just didn't make that compromise.

That's fixable with drivers or by working with the game devs, so I wouldn't call it a permanent problem. You also forget that Nvidia has SMP, which has completely flipped the table on AMD's VR prospects.


1 hour ago, patrickjp93 said:

Not even close. The 480 doesn't even match the 970 in efficiency. It's blown completely out of the water.

 

That's fixable with drivers or by working with the game devs, so I wouldn't call it a permanent problem. You also forget that Nvidia has SMP, which has completely flipped the table on AMD's VR prospects.

It's not as easily fixable as that. They gain a lot of FPS by doing it that way. They can fix it, but they'll lose the performance advantage they had.


10 minutes ago, laminutederire said:

It's not as easily fixable as that. They gain a lot of FPS by doing it that way. They can fix it, but they'll lose the performance advantage they had.

Says who? It could be as simple as changing the way the data enters the rasterization engine.


Just now, patrickjp93 said:

Says who? It could be as simple as changing the way the data enters the rasterization engine.

Says them. To do tiled rasterization they put data in smaller caches, which improves framerate by allowing faster access, but those caches cause graphical artifacts when the GPU has to replace the data in them. To prevent that, they need either higher-bandwidth VRAM to make those swaps unnoticeable, or to fall back to accessing VRAM directly, which would be slower and therefore hurt framerate.
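To make the trade-off being described concrete, here is a minimal CPU-side sketch of the tile-binning idea behind tiled rasterization: the screen is split into small tiles whose data fits in on-chip cache, triangles are binned to the tiles they overlap, and each tile is then shaded in one pass. This is a toy illustration under those assumptions, not Nvidia's actual implementation; the tile size, structures, and names are all invented.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Toy tile binning: the screen is split into TILE x TILE pixel tiles so that
// each tile's color/depth data can stay resident in a small on-chip cache,
// and triangles are assigned to every tile their bounding box overlaps.
struct Tri { float minX, minY, maxX, maxY; };   // screen-space bounding box

constexpr int W = 1920, H = 1080, TILE = 32;    // assumed tile size

int main() {
    const int tilesX = (W + TILE - 1) / TILE;
    const int tilesY = (H + TILE - 1) / TILE;
    std::vector<std::vector<int>> bins(tilesX * tilesY);

    std::vector<Tri> tris = { {10, 10, 200, 150}, {1800, 900, 1919, 1079} };

    // Binning pass: each triangle lands in every tile its bounds touch.
    for (int i = 0; i < (int)tris.size(); ++i) {
        int tx0 = std::max(0, (int)tris[i].minX / TILE);
        int ty0 = std::max(0, (int)tris[i].minY / TILE);
        int tx1 = std::min(tilesX - 1, (int)tris[i].maxX / TILE);
        int ty1 = std::min(tilesY - 1, (int)tris[i].maxY / TILE);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(i);
    }

    // Shading pass would then process one tile at a time, so the per-tile
    // framebuffer stays cache-resident; evicting and refilling that cache is
    // the cost (and potential artifact source) discussed above.
    int nonEmpty = 0;
    for (const auto& bin : bins) nonEmpty += !bin.empty();
    std::printf("%d of %d tiles have work\n", nonEmpty, tilesX * tilesY);
}
```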


Just now, laminutederire said:

Says them. To do tiled rasterization they put data in smaller caches, which improves framerate by allowing faster access, but those caches cause graphical artifacts when the GPU has to replace the data in them. To prevent that, they need either higher-bandwidth VRAM to make those swaps unnoticeable, or to fall back to accessing VRAM directly, which would be slower and therefore hurt framerate.

Actually, they need lower-latency VRAM (cache optimization is kind of my thing), but that's not the problem. The problem is that the reconciliation when reusing a tile isn't quite correct, and the reconciliation algorithm isn't perfect yet.


Just now, patrickjp93 said:

Actually, they need lower-latency VRAM (cache optimization is kind of my thing), but that's not the problem. The problem is that the reconciliation when reusing a tile isn't quite correct, and the reconciliation algorithm isn't perfect yet.

I'm not sure it's that, because people notice low-resolution textures very often when changing areas. It may be both, then.


Just now, laminutederire said:

I'm not sure it's that, because people notice low-resolution textures very often when changing areas. It may be both, then.

The problem isn't being able to pull enough data (seriously, at 256 GB/s+, it would be horrifying if that were still the problem); it's how quickly it can be retrieved. GDDR5's latency (and HBM's) is about 2x that of fairly loose DDR3. Instead of a 12-nanosecond access time and 100-nanosecond transfer time, you're looking at double that.
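As a rough back-of-the-envelope check of that latency-versus-bandwidth point, here is the arithmetic using the figures quoted in the post above, taken purely as assumptions rather than measured values:

```cpp
#include <cstdio>

// Back-of-the-envelope: time for one dependent fetch of a block of tile data,
// using the (assumed) figures from the post above. Total time is access
// latency plus transfer time; doubling both roughly doubles the stall a
// latency-bound access pattern sees, even if peak bandwidth looks healthy.
int main() {
    const double ddr3_access_ns    = 12.0;                  // assumed access time
    const double ddr3_transfer_ns  = 100.0;                 // assumed transfer time
    const double gddr5_access_ns   = 2.0 * ddr3_access_ns;  // "~2x as high" claim
    const double gddr5_transfer_ns = 2.0 * ddr3_transfer_ns;

    std::printf("DDR3-like : %.0f ns per fetch\n", ddr3_access_ns + ddr3_transfer_ns);
    std::printf("GDDR5-like: %.0f ns per fetch\n", gddr5_access_ns + gddr5_transfer_ns);
}
```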


2 minutes ago, patrickjp93 said:

The problem isn't being able to pull enough data (seriously, at 256 GB/s+, it would be horrifying if that were still the problem); it's how quickly it can be retrieved. GDDR5's latency (and HBM's) is about 2x that of fairly loose DDR3. Instead of a 12-nanosecond access time and 100-nanosecond transfer time, you're looking at double that.

Maybe your interpretation is the right one, but in both mine and yours, software can't make up for hardware unless they stop doing tiled rasterization like that; they'd have to switch to global ways of rendering the frame, which would affect performance. That said, it probably isn't the sole difference in the performance-per-watt ratio, but it gives a significant edge on the performance side without changing consumption significantly.


1 hour ago, patrickjp93 said:

-snip-

From what I've seen, the 480 is slightly more power-hungry than the 970 while being about as fast. If I'm wrong, I'd like some evidence of it; I'm not just going to take your word for it.


No title tags allowed. I have removed it.

After investigating, the URL was changed to the Motley Fool URL directly.


18 hours ago, VagabondWraith said:

I am curious to see if they manage to implement Asynchronous Compute in Volta, considering it's a brand-new architecture.

The problems with Async Compute are two-fold.

 

The first problem is definition. By definition, Async Compute is a means of utilizing gaps in the GPU workload, thus increasing efficiency. Some people still believe it's some kind of magic, or that huge performance gains can be had by using it.

 

The second problem is the method of achieving Async Compute. AMD have brainwashed a number of people into believing that there is only one way to skin a cat, and that any vendor not using AMD's method (interleaving and parallel) to accomplish Async Compute is doing it wrong.

 

Unlike Pascal/Volta, GCN hardware has been designed from top to bottom for the consoles and console APIs. AMD do not have the resources to create a different architecture for PC gaming, so the Vulkan and DX12 APIs are crucial to bringing console efficiency to PC for GCN. The console/DX12/Vulkan APIs send multiple queues of instructions to the GPU, which in turn are handled concurrently if the hardware supports concurrency, or one at a time on architectures like Kepler/Fermi/etc.

 

The giant purple elephant in the room is efficiency. GCN already has a huge efficiency deficit compared to Pascal, and even to Maxwell or Kepler if you account for process nodes. I firmly believe that if AMD had not landed the contracts for the gaming consoles, AMD GPUs would more closely resemble the efficiency and characteristics of Nvidia GPUs. DX12 and Vulkan are still good leaps forward in technology because of multiple queues, but the asynchronous compute element is a giant red herring. AMD need it to avoid ugly context switching, but Nvidia simply don't.

 

So the big question is: why on earth would Nvidia alter their hardware in any way that would decrease efficiency, only to then rely upon the new APIs to bring that efficiency back up to what they already have? What you will likely see instead is more hardware tricks used in black-box code (e.g. Vulkan Doom using AMD's proprietary shader functions or frame-flip optimization code, or games that use Nvidia GameWorks code).
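To illustrate what "utilizing gaps in the GPU workload" means in the abstract, here is a toy CPU-side scheduling sketch. It models nothing about any real driver, API, or GPU; every number and name in it is invented, and it only shows the accounting behind the gap-filling idea.

```cpp
#include <cstdio>
#include <vector>

// Toy model of async compute as gap-filling: a frame's graphics work is a
// sequence of busy intervals with idle gaps (e.g. waiting on fixed-function
// stages). Independent compute jobs can be slotted into those gaps instead
// of running serially after the graphics work. All numbers are invented.
struct Interval { int busy_us, idle_us; };

int main() {
    std::vector<Interval> graphics = { {300, 80}, {450, 120}, {200, 50} };
    std::vector<int> compute_jobs  = { 60, 90, 70 };  // independent compute work

    int frame_us = 0, gap_us = 0;
    for (const auto& iv : graphics) { frame_us += iv.busy_us + iv.idle_us; gap_us += iv.idle_us; }

    int compute_us = 0;
    for (int c : compute_jobs) compute_us += c;

    // Serial: compute runs after graphics. Async: compute hides inside the
    // gaps, and only the overflow (if any) extends the frame time.
    int serial_us = frame_us + compute_us;
    int overflow  = compute_us > gap_us ? compute_us - gap_us : 0;
    int async_us  = frame_us + overflow;

    std::printf("serial: %d us, async (gap-filling): %d us\n", serial_us, async_us);
}
```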


28 minutes ago, Briggsy said:

The problems with Async Compute are two-fold.

The first problem is definition. By definition, Async Compute is a means of utilizing gaps in the GPU workload, thus increasing efficiency.

The second problem is the method of achieving Async Compute. AMD have brainwashed a number of people into believing that there is only one way to skin a cat, and that any vendor not using AMD's method (interleaving and parallel) to accomplish Async Compute is doing it wrong.

Unlike Pascal/Volta, GCN hardware has been designed from top to bottom for the consoles and console APIs. AMD do not have the resources to create a different architecture for PC gaming, so the Vulkan and DX12 APIs are crucial to bringing console efficiency to PC for GCN. The console/DX12/Vulkan APIs send multiple queues of instructions to the GPU, which in turn are handled concurrently if the hardware supports concurrency, or one at a time on architectures like Kepler/Fermi/etc.

The giant purple elephant in the room is efficiency. GCN already has a huge efficiency deficit compared to Pascal, and even to Maxwell or Kepler if you account for process nodes. I firmly believe that if AMD had not landed the contracts for the gaming consoles, AMD GPUs would more closely resemble the efficiency and characteristics of Nvidia GPUs. DX12 and Vulkan are still good leaps forward in technology because of multiple queues, but the asynchronous compute element is a giant red herring. AMD need it to avoid ugly context switching, but Nvidia simply don't.

So the big question is: why on earth would Nvidia alter their hardware in any way that would decrease efficiency, only to then rely upon the new APIs to bring that efficiency back up to what they already have? What you will likely see instead is more hardware tricks used in black-box code (e.g. Vulkan Doom using AMD's proprietary shader functions or frame-flip optimization code, or games that use Nvidia GameWorks code).

You have some very interesting points here.

 

However, I've highlighted one thing. You say that GCN has been designed for consoles from "top to bottom". I highly question this statement, because of a few simple facts.

 

Fact #1: GCN was first introduced in 2011

Fact #2: The first modern Console APU manufactured by AMD was released in 2013 - two years after GCN was introduced

Fact #3: It takes YEARS to develop a GPU architecture

 

So by these three facts, I highly doubt that AMD planned GCN "for the consoles and console APIs". Why? Because when AMD started developing GCN (probably 2008-ish? This is entirely guesswork), the Xbox One was likely just some vague project on a whiteboard at Microsoft's Xbox division headquarters. It was certainly not far enough along for Microsoft to have been planning the exact hardware and commissioning AMD to create a whole new architecture just for consoles.

 

My point is that I think it's entirely a coincidence that GCN happened to be very well suited for consoles. Once AMD got the console contracts, of course they would continue to develop GCN with the console architecture in mind, but that in no way makes it "designed from top to bottom" for them. Current and ongoing development may have consoles in mind first, but definitely not from the beginning, and definitely not from top to bottom.


1 hour ago, laminutederire said:

Maybe your interpretation is the right one, but in both mine and yours, software can't make up for hardware unless they stop doing tiled rasterization like that; they'd have to switch to global ways of rendering the frame, which would affect performance. That said, it probably isn't the sole difference in the performance-per-watt ratio, but it gives a significant edge on the performance side without changing consumption significantly.

Eh, yes it can. In high-performance computing you always treat the hardware you have as the platform and marry your code to it. Apart from some cache-oblivious algorithms (mostly divide and conquer) that recurse until the problem size fits nicely in cache, you always have to pay attention to the cache. If your algorithm doesn't respect it, you lose performance.

 

One way I would handle this, for instance, is to do ray tracing just from the available light sources to a given tile, build the light map, move on to the next tile, and if there are reflective surfaces, go back and do a second pass on the previous tile AND leave a flag up so future tiles also consider the reflective portions of that tile in their lighting phases. It's a self-correcting algorithm that minimizes performance hits up front while ensuring accuracy down the line.

 

Mind you, that's a very general explanation, but you get the idea.
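A very rough sketch of that per-tile, self-correcting idea, purely as an illustration: the tile structure, flags, and light handling below are all made up rather than anyone's actual renderer, and the "lighting" is just a stand-in print.

```cpp
#include <cstdio>
#include <vector>

// Sketch of the per-tile lighting idea described above: light each tile from
// the known light sources, and when a tile turns out to contain reflective
// surfaces, flag it so the previous tile gets a correction pass and future
// tiles treat it as an extra (secondary) light source.
struct Tile { bool reflective; bool needs_second_pass; };

static void lightTile(int idx, const std::vector<int>& extraSources) {
    std::printf("tile %d lit with %zu extra reflective source(s)\n",
                idx, extraSources.size());
}

int main() {
    std::vector<Tile> tiles = { {false, false}, {true, false}, {false, false} };
    std::vector<int> reflectiveTiles;   // tiles that later tiles must also consider

    for (int i = 0; i < (int)tiles.size(); ++i) {
        lightTile(i, reflectiveTiles);                  // first pass for this tile
        if (tiles[i].reflective) {
            reflectiveTiles.push_back(i);               // future tiles see it as a source
            if (i > 0) tiles[i - 1].needs_second_pass = true;  // correct the previous tile
        }
    }

    // Deferred correction passes: only tiles actually affected pay the extra
    // cost, which is why the up-front performance hit stays small.
    for (int i = 0; i < (int)tiles.size(); ++i)
        if (tiles[i].needs_second_pass)
            std::printf("tile %d gets a correction pass\n", i);
}
```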


1 hour ago, Citadelen said:

From what I've seen, the 480 is slightly more power-hungry than the 970 while being about as fast. If I'm wrong, I'd like some evidence of it; I'm not just going to take your word for it.

https://www.techpowerup.com/reviews/AMD/RX_480/25.html

 

I wouldn't call 20% a slight difference.


3 minutes ago, patrickjp93 said:

Eh, yes it can. In high-performance computing you always treat the hardware you have as the platform and marry your code to it. Apart from some cache-oblivious algorithms (mostly divide and conquer) that recurse until the problem size fits nicely in cache, you always have to pay attention to the cache. If your algorithm doesn't respect it, you lose performance.

 

One way I would handle this, for instance, is to do ray tracing just from the available light sources to a given tile, build the light map, move on to the next tile, and if there are reflective surfaces, go back and do a second pass on the previous tile AND leave a flag up so future tiles also consider the reflective portions of that tile in their lighting phases. It's a self-correcting algorithm that minimizes performance hits up front while ensuring accuracy down the line.

 

Mind you, that's a very general explanation, but you get the idea.

It minimizes the performance hit, but it doesn't eliminate it in all cases either; that's the point I'm trying to make.


Just now, laminutederire said:

It minimizes the performance hit, but it doesn't eliminate it in all cases either; that's the point I'm trying to make.

At 1920x1080, where reflections show up in the last tile and the entire tile list must be traversed, it looks like roughly a 7% hit. That's negligible.


Just now, patrickjp93 said:

At 1920x1080, where reflections show up in the last tile and the entire tile list must be traversed, it looks like roughly a 7% hit. That's negligible.

Well, 7% is about the performance difference between a 1060 and a 480, and most people say that isn't negligible.

The point I'm trying to make with that example is that, in the grand scheme of things, it is. But compared against something roughly equivalent, it isn't so negligible, since it affects the performance gap between them.

Mind you, 7% isn't going to change the performance-per-watt ratio from 1 to 0.1, but you can expect a change of roughly 5% at least, up to 15% or more if this approach also increases power consumption at the same time as it hinders performance. If you look at TechPowerUp's review of the RX 480, that kind of impact on the perf/watt ratio puts Maxwell at around Fury level, which is what Maxwell was up against in the high tiers. (That still makes the RX 480 a bit disappointing, though.)
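For what it's worth, the perf/watt arithmetic in that paragraph can be checked directly. The sketch below only uses the percentages under discussion (an assumed 7% performance hit and a couple of assumed power deltas), not measured figures:

```cpp
#include <cstdio>

// Quick check of the perf/watt arithmetic above: a 7% performance loss with
// unchanged power lowers perf/watt by 7%; if power also rises a few percent,
// the combined drop moves toward the ~15% end of the range mentioned.
int main() {
    const double perf_loss = 0.07;                 // assumed 7% fps hit
    const double power_gains[] = {0.00, 0.03, 0.07};  // assumed power increases

    for (double gain : power_gains) {
        double ratio = (1.0 - perf_loss) / (1.0 + gain);  // new perf/watt vs old
        std::printf("power +%.0f%% -> perf/watt drops %.1f%%\n",
                    gain * 100.0, (1.0 - ratio) * 100.0);
    }
}
```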


1 hour ago, dalekphalm said:

You have some very interesting points here.

 

However, I've highlighted one thing. You say that GCN has been designed for consoles from "top to bottom". I highly question this statement, because of a few simple facts.

 

Fact #1: GCN was first introduced in 2011

Fact #2: The first modern Console APU manufactured by AMD was released in 2013 - two years after GCN was introduced

Fact #3: It takes YEARS to develop a GPU architecture

 

So by these three facts, I highly doubt that AMD planned GCN "for the consoles and console APIs". Why? Because when AMD started developing GCN (probably 2008-ish? This is entirely guesswork), the Xbox One was likely just some vague project on a whiteboard at Microsoft's Xbox division headquarters. It was certainly not far enough along for Microsoft to have been planning the exact hardware and commissioning AMD to create a whole new architecture just for consoles.

 

My point is that I think it's entirely a coincidence that GCN happened to be very well suited for consoles. Once AMD got the console contracts, of course they would continue to develop GCN with the console architecture in mind, but that in no way makes it "designed from top to bottom" for them. Current and ongoing development may have consoles in mind first, but definitely not from the beginning, and definitely not from top to bottom.

If you read into the history of the PS4's design, it started in early 2009, when Sony worked with their best game designers to figure out what they wanted and needed in order to make the development learning curve as easy as possible (unlike the PS3), keep hardware costs down, and hammer out the specs. They knew even then that they wanted 8 CPU cores, async compute to get the GPU performing other tasks like audio and compute, etc. The original design called for DDR3 memory and a small SRAM cache to speed up graphical performance, along with 2 async compute engines and (I believe) a 7790-sized GPU (not that specific part, since no such GPU existed yet; just for comparison). The PS4 APU design was later changed to 8 async compute engines, a 7870-sized GPU (again, for comparison), GDDR5, and no SRAM cache. Microsoft stuck with the original design because it was cheaper to build. AMD worked with Sony and Microsoft to create the custom silicon, and rumors were flying in 2009/2010 about AMD designing Fusion 2.0 for the "Xbox 720."

 

When AMD launched GCN in 2012, they were way ahead of Nvidia on 28nm, by up to half a year. The HD 7000 series had 2 async compute engines, and the architecture was capable of handling multiple queues and interleaving them in parallel. No API existed on PC to utilize that hardware, and none would until AMD released Mantle, which matches the features of the console APIs quite closely. Microsoft created DX11.3 to do most of what DX12 now does, and Sony had already had their API(s) ready or in development since 2010 for the PS4.

 

I'm not necessarily pulling an assumption out of my ass about what GCN was originally designed for. The work they did with Sony from 2009 onward (and what Sony was looking for in a GPU for the PS4) is reflected in modern AMD discrete GPUs.


2 hours ago, patrickjp93 said:

-snip-

Either I'm useless at reading charts, or the biggest difference shown was the 970 being 5% less efficient than the 480. Am I missing something here?


38 minutes ago, Briggsy said:

If you read into the history of the PS4's design, it started in early 2009, when Sony worked with their best game designers to figure out what they wanted and needed in order to make the development learning curve as easy as possible (unlike the PS3), keep hardware costs down, and hammer out the specs. They knew even then that they wanted 8 CPU cores, async compute to get the GPU performing other tasks like audio and compute, etc. The original design called for DDR3 memory and a small SRAM cache to speed up graphical performance, along with 2 async compute engines and (I believe) a 7790-sized GPU (not that specific part, since no such GPU existed yet; just for comparison). The PS4 APU design was later changed to 8 async compute engines, a 7870-sized GPU (again, for comparison), GDDR5, and no SRAM cache. Microsoft stuck with the original design because it was cheaper to build. AMD worked with Sony and Microsoft to create the custom silicon, and rumors were flying in 2009/2010 about AMD designing Fusion 2.0 for the "Xbox 720."

When AMD launched GCN in 2012, they were way ahead of Nvidia on 28nm, by up to half a year. The HD 7000 series had 2 async compute engines, and the architecture was capable of handling multiple queues and interleaving them in parallel. No API existed on PC to utilize that hardware, and none would until AMD released Mantle, which matches the features of the console APIs quite closely. Microsoft created DX11.3 to do most of what DX12 now does, and Sony had already had their API(s) ready or in development since 2010 for the PS4.

I'm not necessarily pulling an assumption out of my ass about what GCN was originally designed for. The work they did with Sony from 2009 onward (and what Sony was looking for in a GPU for the PS4) is reflected in modern AMD discrete GPUs.

I'll agree that, mid-development, AMD likely saw opportunities to adjust and change the course of GCN development, but even if they did start working with Sony in 2009 to help plan the PS4 SoC, that would most likely have been after AMD had started designing GCN. I'm more inclined to believe that AMD altered the APU designs specifically to be more console-optimized, and then rolled those optimizations into further iterations of GCN.

 

In either case, I think there was a lot more influencing GCN's design than just the consoles, even if they did have a significant impact on the direction and evolution of the design.


15 hours ago, That Norwegian Guy said:

The past 2 generations?

 

Launch: Fury X beaten by 980 Ti.

6 months after launch and beyond: Fury X beats 980 Ti.

 

Launch: 290X beaten by 780 Ti.

6 months after launch and beyond: 290X destroys 780 Ti (especially in price/perf).

 

Nvidia just cook their releases more, having already used up more of the driver potential on every fresh SKU than AMD has. This is how they take advantage of their cult, which seemingly lacks the mental faculties to catch on to this strategy.

But that is the problem: if you buy a $650 GPU, you want it to perform when you buy it, not 6 months later. By that point the GPU has been branded as less powerful, and people will keep citing out-of-date benchmarks.

