
[Updated] Oxide responds to AotS Conspiracies, Maxwell Has No Native Support For DX12 Asynchronous Compute

I haven't ignored him. I'm a senior and a master's student in college; I have 20 hours of classes, homework, research to do, time and events to plan, and exercise to get in better shape for a new suit before the career fair. I am busy! I'm not going to go trawling through the net for 4K benches from months ago when I only get breaks in 10-minute batches. Sheesh...

So you are too busy to uphold your argument. I'm sorry, but if you can't be bothered to provide proof, I can't be bothered to believe anything you say.

Have a nice degree.


this needs further clarification from an nVidia source, because Maxwell can do async compute

there is no question of whether Maxwell can do async compute - they already do it in Tesla accelerators

They can do async compute... but what you are basically looking at is the difference in compute lanes...

 

in essence... Maxwell vs GCN is like watching a Core i7 4790K take on a 32-thread Xeon in Cinebench... sure, the i7 is fast... but that Xeon will win by such a margin it could almost run the test twice before the i7 finishes....


if it's "optional", then something smells fishy because Oxyde claimed nVidia asked them to disable it

but, if it's "optional" why Oxyde hasn't disabled it?!

Maxwell has 31 async compute threads. AMD has 64...

 

So at 31 vs 31 threads, Nvidia will be much faster. Once you hit over 64 threads being used simultaneously, you are looking at Nvidia getting rekt, because they can only deal with chunks of 31 operations/cycle, while AMD does 64 operations/cycle....
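To put rough numbers on that "chunks" claim, here is a minimal Python sketch of the model. The 31 and 64 figures are the ones quoted in this thread, not verified hardware specs, and the "pass" model is a deliberate simplification:

```python
import math

# Toy model of the "chunks of 31 vs chunks of 64" claim above.
# 31 and 64 are the figures from the post, not verified hardware numbers.
def passes_needed(num_tasks: int, queue_width: int) -> int:
    """How many passes it takes to drain num_tasks if only queue_width run per pass."""
    return math.ceil(num_tasks / queue_width)

for num_tasks in (16, 31, 32, 64, 128):
    print(f"{num_tasks:>3} tasks: 31-wide needs {passes_needed(num_tasks, 31)} pass(es), "
          f"64-wide needs {passes_needed(num_tasks, 64)} pass(es)")
```

Under that toy model both queue widths cope with up to 31 tasks in a single pass; past that point the 31-wide queue starts needing extra passes while the 64-wide one stays at a single pass until 64 is exceeded.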


Maxwell has 31 async compute threads. AMD has 64...

 

So at 31 vs 31 threads, Nvidia will be much faster. Once you hit over 64 threads being used simultaneously, you are looking at Nvidia getting rekt, because they can only deal with chunks of 31 operations/cycle, while AMD does 64 operations/cycle....

that's beyond the scope, since they're different architectures

the "issue" at hand is that the internet declared Maxwell doesn't have async compute

it was natural that running GCN-optimized code on Maxwell would return broken results - it's the same as running Intel-optimized code on AMD CPUs


that's beyond the scope, since they're different architectures

the "issue" at hand is that the internet declared Maxwell doesn't have async compute

it was natural that running GCN-optimized code on Maxwell would return broken results - it's the same as running Intel-optimized code on AMD CPUs

honestly, it is probably NOT about the code, more about the workload...

 

you can use Async Compute for AI... and how much AI is there in those screenshots we see? There's a shitload, all over the place....

 

Why does this matter?

 

thread(s) vs workload....

I strongly believe that we are looking at a Maxwell HARDWARE limit (31 threads) rather than "optimized AMD code"....

In all honesty, AMD doesn't have the funding to pay off devs... their Q2 results took a dump, and they are far from releasing anything new, so in a marketing sense it doesn't make sense to start the flame war/hype train now... it is too early. Nvidia has plenty of time to "recover" from a marketing attack by launching their own PR campaign. So consider the whole "AMD paid someone off" or "Oxide did this for AMD" line a fallacy.

 

Apart from those who are jumping ship, AMD aren't getting any long-term benefit from this, short of customers holding onto their GCN cards longer (thus both Nvidia and AMD sell fewer GPUs... so neither wins in this case; actually AMD loses, as they still have to make new drivers for old cards)...

 

 

What we can assume is that Oxide tried to offload many threads, perhaps not very complex tasks, but many of them, onto the GPU using AC. Depending on how many simultaneous tasks there are, Nvidia would hit their 31-thread limitation before AMD hits their 64-thread limitation. And while Nvidia has a much quicker GPU, eating through its 31 "jobs" faster, AMD simply beats them on quantity...

 

Even if Nvidia is twice as fast... 31 x 2 = 62....

that means even at TWICE the speed, they are still 2 threads behind in every damn scenario there is... you cannot catch up to that... you simply cannot.
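As a sanity check on that arithmetic, here is a quick Python sketch under the same simplified assumptions (a 31-wide queue running at double speed versus a 64-wide queue at base speed; all timings are made-up units, not measurements):

```python
import math

BASE_PASS_TIME = 1.0  # arbitrary time unit for one pass on the "slower" card

def drain_time(num_tasks: int, queue_width: int, speed_factor: float) -> float:
    """Time to finish num_tasks if queue_width tasks run per pass and each
    pass takes BASE_PASS_TIME / speed_factor (a deliberately crude model)."""
    return math.ceil(num_tasks / queue_width) * (BASE_PASS_TIME / speed_factor)

for n in (64, 128, 1024, 10_000):
    nvidia = drain_time(n, 31, 2.0)  # "twice as fast", 31-wide, per the post
    amd = drain_time(n, 64, 1.0)     # base speed, 64-wide, per the post
    winner = "AMD" if amd < nvidia else "Nvidia"
    print(f"{n:>6} tasks: 31-wide@2x {nvidia:8.2f}  64-wide@1x {amd:8.2f}  -> {winner} first")
```

With enough tasks in flight, doubling the speed of a 31-wide queue still only gets you an effective 62 per unit of time against 64, which is exactly the 31 x 2 = 62 point above.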

 

Moreover, if there IS indeed complex math involved, AMD's cards should deal with it better, as the GPUs themselves are more mathematically tuned than consumer Nvidia cards (Teslas cannot be brought up... you would be comparing a 650 USD AMD card to a 3000+ USD Nvidia workstation card... so Tesla and Quadro are completely useless for comparison here).

 

In all honesty, this is almost like watching an FX vs an i3 in rendering... the i3 is faster, nearly 55% higher IPC in fact at stock vs stock... yet an FX would wreck an i3 in multithreaded environments by a factor of 2, despite being much slower in per-core performance...


snip

and you're still trying to make it about something other than the issue at hand - the issue at hand being whether Maxwell has async compute or not

as for your i3 vs Xeon comparison, no... it's not the same


and you're still trying to make it about something other than the issue at hand - the issue at hand being that Maxwell doesn't have async compute

It does. By all definitions, it DOES.

 

But it does a poor job at it.....

 

By all technical and philosophical definitions, Maxwell DOES have async compute. But the internal scheduler is not up to the task of doing it efficiently, in addition to a low thread count limiting the performance.

 

I am by no means trying to defend Nvidia's shitty behaviour. If anything, I am an AMD fanboy by far, and I really do hope my R9 295X2 can benefit from DX12.

HOWEVER.

 

Justice should be served where justice is needed.

 

Maxwell has Async Compute. It is shit at it; it wasn't built to do it efficiently. It was built to HAVE it (by definition) so it would meet the 12.0 spec....

Without Async Compute (of some sort), Maxwell would NEVER EVER have been given the "D3D12.0 Compliant" classification.

 

This is a fact. If Nvidia passed MS validation for DX12, then it HAS IT..... it just SUCKS AT IT.

 

 

If anything, slapping on a half-broken feature just so you could, by definition, get certified is a more half-arsed and destructive solution than simply NOT adding it at all.


that's beyond the scope, since they're different architectures

the "issue" at hand is that the internet declared Maxwell doesn't have async compute

it was natural that running GCN-optimized code on Maxwell would return broken results - it's the same as running Intel-optimized code on AMD CPUs

 

It doesn't. Maxwell cannot run the async compute and graphics stacks at the same time, so there is no parallel compute. The tests you yourself have linked earlier (below) show this: Maxwell's result is the graphics + compute render time in milliseconds, while GCN's is only the compute time, meaning the latter is using actual async compute.

 

 

This is not a benchmark, btw, so no, lower is not better. What it shows is that Maxwell can only handle 31 compute threads, so each time it gets an extra thread past 31, it doubles in latency, as it needs to wait for the previous 31 threads to finish. On AMD you have 64 compute threads that all work in parallel, so it can handle all the threads instantly (the test used 128 threads, I believe).

The test is fundamentally pointless though, as no game would ever be structured in such a way, but it does show the limitation of 31 compute threads on Maxwell. The test's design and functionality are still being discussed with the dev on the forum.

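For what it's worth, the shape being described can be sketched in a few lines of Python. The 10 ms costs, the 31-deep and 64-deep figures, and the "graphics plus compute adds up" versus "compute overlaps graphics" behaviour are all taken from this thread's description of the test, not from any measurement:

```python
import math

COMPUTE_MS = 10.0    # hypothetical cost of one batch of compute kernels
GRAPHICS_MS = 10.0   # hypothetical cost of the graphics workload

def maxwell_like(n_kernels: int) -> float:
    # Shape described in the thread: graphics and compute times add up,
    # and compute is drained in batches of at most 31 kernels.
    return GRAPHICS_MS + math.ceil(n_kernels / 31) * COMPUTE_MS

def gcn_like(n_kernels: int) -> float:
    # Shape described in the thread: compute overlaps the graphics work,
    # with up to 64 kernels in flight at once.
    return max(GRAPHICS_MS, math.ceil(n_kernels / 64) * COMPUTE_MS)

for n in (1, 31, 32, 63, 64, 65, 128):
    print(f"{n:>3} kernels: step model {maxwell_like(n):5.1f} ms, overlap model {gcn_like(n):5.1f} ms")
```

In this toy model the step curve jumps every time the kernel count crosses a multiple of 31, while the overlap curve stays flat until 64 is exceeded - roughly the pattern people are reading out of the B3D numbers.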


 

If anything, slapping on a half-broken feature just so you could, by definition, get certified is a more half-arsed and destructive solution than simply NOT adding it at all.

But they didn't "Slap it on", async shaders have been a part of nvidia hardware since fermi.  And B3D has already been showing that the async compute works fine, and we need more information.

 

For you to exclaim immediately "It's shit" off of one benchmark, means you aren't thinking scientifically.


But they didn't "Slap it on", async shaders have been a part of nvidia hardware since fermi.  And B3D has already been showing that the async compute works fine, and we need more information.

 

For you to exclaim immediately "It's shit" off of one benchmark, means you aren't thinking scientifically.

if preliminary tests are showing clear issues, then digging deeper will only worsen the deal.

The fact that preliminary tests are struggling to make graphics + compute work at the same time on Maxwell, asynchronously, means there is a hardware limitation. Most likely in the scheduler.

So Fermi had it? SO WHAT?

Outside of supercomputers and lab tests, I cannot think of ANY CONSUMER-GRADE SYSTEM THAT HAS BEEN ACTIVELY USING ASYNC COMPUTE FOR CONSUMER NEEDS BEFORE NOW.

So in short, while they may have had it, they never had the time, chance or need to test it....


Wat?!? Star control? Syreens and huNams?! I need a paper map to be utilized for copyright protection for the memories and great justice!


if preliminary tests are showing clear issues, then digging deeper will only worsen the deal.

The fact that preliminary tests are struggling to make graphics + compute work at the same time on Maxwell, asynchronously, means there is a hardware limitation. Most likely in the scheduler.

So Fermi had it? SO WHAT?

Outside of supercomputers and lab tests, I cannot think of ANY CONSUMER-GRADE SYSTEM THAT HAS BEEN ACTIVELY USING ASYNC COMPUTE FOR CONSUMER NEEDS BEFORE NOW.

So in short, while they may have had it, they never had the time, chance or need to test it....

Preliminary TEST. TEST. And while I cannot think of why async shaders would have been used until now, I have proven your point about "slapping it on" to be invalid, so I wish to remain on that topic instead of changing to a strawman.


It doesn't. Maxwell cannot run the async compute and graphics stacks at the same time, so there is no parallel compute. The tests you yourself have linked earlier (below) show this: Maxwell's result is the graphics + compute render time in milliseconds, while GCN's is only the compute time, meaning the latter is using actual async compute.

 

 

This is not a benchmark, btw, so no, lower is not better. What it shows is that Maxwell can only handle 31 compute threads, so each time it gets an extra thread past 31, it doubles in latency, as it needs to wait for the previous 31 threads to finish. On AMD you have 64 compute threads that all work in parallel, so it can handle all the threads instantly (the test used 128 threads, I believe).

The test is fundamentally pointless though, as no game would ever be structured in such a way, but it does show the limitation of 31 compute threads on Maxwell. The test's design and functionality are still being discussed with the dev on the forum.

 

Given that AMD has 8 compute engines running in parallel with a separate graphics pipeline, that test is a poster child for the GCN architecture and how it functions.

From my understanding, Maxwell doesn't do 31 compute threads in parallel; rather, it can queue up to 31 compute operations in the asynchronous compute warp buffer and execute any one of them out of order, but it cannot have both compute and graphics operations in the pipeline at the same time, so it has to bounce back and forth. In this test, the compute and graphics tasks appear to be dished out in a predictable manner, so the Nvidia scheduler can go back and forth without a problem. If my guess is correct, they are not throwing random graphics and compute tasks at the cards in this test, which would probably decimate Nvidia's asynchronous compute warp. Just a guess, but time will reveal whether it's true or not.
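That guess can be turned into a toy simulation. The switch penalty, task costs, and task mix below are all invented numbers; the only point is to illustrate why a predictable stream of work can hide a context-switch cost that a random mix would expose, while a genuinely parallel design would not care about the ordering at all:

```python
import random

SWITCH_MS = 0.5   # hypothetical cost of flipping one pipeline between graphics and compute
TASK_MS = 1.0     # hypothetical cost of a single task of either type

def serial_with_switches(tasks):
    """Run everything on one pipeline, paying SWITCH_MS whenever the task type
    changes (the 'bounce back and forth' behaviour guessed at above)."""
    total, prev = 0.0, None
    for kind in tasks:
        if prev is not None and kind != prev:
            total += SWITCH_MS
        total += TASK_MS
        prev = kind
    return total

def fully_parallel(tasks):
    """Idealised async: graphics and compute drain side by side, so the total
    is just the longer of the two streams."""
    gfx = sum(TASK_MS for k in tasks if k == "gfx")
    cmp_ = sum(TASK_MS for k in tasks if k == "cmp")
    return max(gfx, cmp_)

random.seed(0)
batched = ["gfx"] * 32 + ["cmp"] * 32            # predictable: one switch in total
shuffled = random.sample(batched, len(batched))  # same mix of work, random arrival order

for name, tasks in (("batched", batched), ("shuffled", shuffled)):
    print(f"{name:>8}: serial+switches {serial_with_switches(tasks):5.1f} ms, "
          f"parallel {fully_parallel(tasks):5.1f} ms")
```

In this crude model the parallel total is identical regardless of ordering, while the single-pipeline total grows with every forced switch.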


 


Apart from those who are jumping ship, AMD aren't getting any long-term benefit from this, short of customers holding onto their GCN cards longer (thus both Nvidia and AMD sell fewer GPUs... so neither wins in this case; actually AMD loses, as they still have to make new drivers for old cards)...

What we can assume is that Oxide tried to offload many threads, perhaps not very complex tasks, but many of them, onto the GPU using AC. Depending on how many simultaneous tasks there are, Nvidia would hit their 31-thread limitation before AMD hits their 64-thread limitation. And while Nvidia has a much quicker GPU, eating through its 31 "jobs" faster, AMD simply beats them on quantity...

 

I generally agree with your post, but these two points stick out:

 

If people get the notion that AMD handles DX12 better, AMD should gain market share both in the near future and in the long term, as people look forward to upcoming titles utilizing DX12. So I very much doubt AMD will suffer from this. On the contrary.

 

NVidia hardware is not faster. Not in flops and not in DX12. NVidia's performance gains are due to multithreaded low overhead DX11 drivers. Those are rendered irrelevant in DX12, which is one of the biggest reasons we are seeing this shift.



Given that AMD has 8 compute engines running in parallel with a 9th, separate graphics pipeline, that test is a poster child for the GCN architecture and how it functions.

From my understanding, Maxwell doesn't do 31 compute threads in parallel; rather, it can queue up to 31 compute operations in the asynchronous compute warp buffer and execute any one of them out of order, but it cannot have both compute and graphics operations in the pipeline at the same time, so it has to bounce back and forth. In this test, the compute and graphics tasks appear to be dished out in a predictable manner, so the Nvidia scheduler can go back and forth without a problem. If my guess is correct, they are not throwing random graphics and compute tasks at the cards in this test, which would probably decimate Nvidia's asynchronous compute warp. Just a guess, but time will reveal whether it's true or not.

 

Exactly. I guess it depends on the definition of asynchronous compute, but from this it does not sound like proper support, as the total render time should be limited by the longer of the two parts (either graphics or compute, but usually compute, it seems). On Maxwell we see the two render times add up, which means they run in serial, not parallel. If the scheduler gets screwed by more complex work, that just sounds even worse.



Given that AMD has 8 compute engines running in parallel with a separate graphics pipeline, that test is a poster child for the GCN architecture and how it functions.

From my understanding, Maxwell doesn't do 31 compute threads in parallel; rather, it can queue up to 31 compute operations in the asynchronous compute warp buffer and execute any one of them out of order, but it cannot have both compute and graphics operations in the pipeline at the same time, so it has to bounce back and forth. In this test, the compute and graphics tasks appear to be dished out in a predictable manner, so the Nvidia scheduler can go back and forth without a problem. If my guess is correct, they are not throwing random graphics and compute tasks at the cards in this test, which would probably decimate Nvidia's asynchronous compute warp. Just a guess, but time will reveal whether it's true or not.

Correct. That is what everything is pointing towards at this point.


Given that AMD has 8 compute engines running in parallel with a separate graphics pipeline, that test is a poster child for the GCN architecture and how it functions.

From my understanding, Maxwell doesn't do 31 compute threads in parallel; rather, it can queue up to 31 compute operations in the asynchronous compute warp buffer and execute any one of them out of order, but it cannot have both compute and graphics operations in the pipeline at the same time, so it has to bounce back and forth. In this test, the compute and graphics tasks appear to be dished out in a predictable manner, so the Nvidia scheduler can go back and forth without a problem. If my guess is correct, they are not throwing random graphics and compute tasks at the cards in this test, which would probably decimate Nvidia's asynchronous compute warp. Just a guess, but time will reveal whether it's true or not.

 

 

 

This is the issue. When a workload consists entirely of compute tasks, Maxwell's issues aren't highlighted as clearly. Those compute tasks will be loaded into its 31 queues and completed in whatever time it takes.

 

The issue arises when there is a mix of compute and graphics work. It seems Maxwell can't mix in results from both compute and general graphics tasks on the fly as quickly as GCN; if it needs to, due to the demands of a game, it has to incur some sort of context-switching penalty. This is why results from some of the recent tests show workloads using asynchronous compute adding up the time taken for graphics and compute. A useful example was given by silverforce, who frequents the Anandtech forums and Reddit.

 

https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

 

SilverforceG, 11 points, 17 hours ago:

Let me explain again why the OP's interpretation of the data is wrong, simplified.

Example: Compute takes 10ms. Graphics takes 10ms.

If Async Compute functions, doing Compute + Graphics together = 10ms.

NOT 20ms. 20ms indicates serial operation.

This is what happens with Maxwell. When it does compute or graphics separately, it does each in less time. When it does both tasks at once, the time is EXACTLY the sum of the two tasks individually, thus it's operating in serial mode. It cannot do Async Compute.

Don't believe me? Make a b3d forum account and ask the creator of the program.
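Written out as code, the arithmetic in that quoted comment is just sum versus max (the 10 ms figures are the example's, not a measurement):

```python
COMPUTE_MS = 10.0
GRAPHICS_MS = 10.0

serial_time = COMPUTE_MS + GRAPHICS_MS       # tasks run one after the other
overlap_time = max(COMPUTE_MS, GRAPHICS_MS)  # tasks genuinely run concurrently

print(f"serial:  {serial_time:.0f} ms")    # 20 ms - the Maxwell-style result described above
print(f"overlap: {overlap_time:.0f} ms")   # 10 ms - the GCN-style result described above
```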

 

 

Now, I think the statement that it cannot do async compute is perhaps a bit too far; it can do it well enough, so long as that is ALL that's going on. But I am not 100% sure on any of these details. I still feel like I am staring through a darkened glass, trying to glean more understanding of what's going on as a layperson, but so far, async compute is not being executed in parallel with other graphics tasks on Nvidia cards like it is on AMD cards with GCN.

 

Note, even nvidia fanboy extraordinaire patrick green blood in his veins jr has semi-shifted his hate rants about AMD toward things like Nvidia designing for the CURRENT market and not "over-engineering" cards for future performance gains. He's actually trying to make a case that year-long planned obsolescence is a great business feature, not a bug, as Nvidia can suck more money from its sycophants.

 

Either way, this looks good for AMD. And now that the async performance cat is out of the bag, people will be looking for it. I suspect upcoming DX12 games using Unreal will rely on this to a lesser extent, to buoy up Nvidia performance numbers along with GameWorks, but at least now people can look at that and call them out if they do.



nvidia fanboy extraordinaire patrick green blood in his veins jr 

 

Not to pick on anyone but ain't that name calling? Though it sounds funny... 


 


Not to pick on anyone but ain't that name calling? Though it sounds funny... 

 

well, it would be if it were accurate. But there is a lot of blue in there too, so you basically get some sort of turquoise blood, I guess


As a GTX 780 user I am very happy to see this happen. Finally AMD can wipe the floor with nVidia and force them to get their ass in gear and start thinking before selling their GPUs at exorbitant prices. Go AMD!

 

In the meantime, I'll only upgrade to either nVidia Pascal or AMD Arctic Islands to play GTAV in all its glory at maxed settings, including MSAA and Advanced Settings.



well, it would be if it were accurate. But there is a lot of blue in there too, so you basically get some sort of turquoise blood, I guess

 

Heh... That took me a while to get...


 

