
[Updated] Oxide responds to AotS Conspiracies, Maxwell Has No Native Support For DX12 Asynchronous Compute


 

http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1200#post_24356995

Oxide developer "Kollock" made a lengthy response on the Overclock.net forums last night, dispelling some of the myths surrounding Ashes of the Singularity and GPU vendor bias.

 

I have now divided this post up into sections with spoilers, so it's somewhat readable.

 

 


 

 

Originally Posted by PhantomTaco:

 
This doesn't really prove anything; past performance isn't an indicator of anything. It may hold merit sometimes, but without evidence it's nothing more than suspicion.
This doesn't say much either.
This, though, does say something. I'm interested to see when the UE4-based Ark launches its DX12 patch next week to get some more data points to add. While it is nice to know that they did open the source code up, it doesn't entirely mean it is unbiased. As I recall, Oxide Games was one of the first to work with AMD on Mantle, meaning they had a past track record of working with AMD on developing their engine. In that respect, it makes me wonder whether or not they made choices that specifically benefited AMD back with Mantle that were repeated with Ashes. It also means (in theory at least) that AMD has had more than the past year working with Oxide on this title, whereas Intel and Nvidia have had a year working on it. I'm not calling foul play, but I am still questioning the data until more titles are launched based on different engines.
 
 
Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher than we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
 
Certainly I could see how one might think that we are working closer with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel (and 0 from Microsoft, but they never come visit anyone ;( ). Nvidia was actually a far more active collaborator over the summer than AMD was; if you judged from email traffic and code check-ins, you'd draw the conclusion we were working closer with Nvidia rather than AMD ;) As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) and AMD for Ashes. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles, as they have also lined up a few other D3D12 games.
 
If you use this metric, however, given Nvidia's promotions with Unreal (and integration with GameWorks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark, since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely different topic ;) )
 
Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only 'vendor' specific code is for Nvidia, where we had to shut down async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute, so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.
 
From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is still far, far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is the min frame times, with the 290X beating out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application controlled feature, so I was expecting them to be close to identical. This would appear to be GPU-side variance rather than software variance. We'll have to dig into this one.
 
I suspect that one thing that is helping AMD on GPU performance is that D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic, where we just took a few compute tasks we were already doing and made them asynchronous; Ashes really isn't a poster child for advanced GCN features.
 
Our use of Async Compute, however, pales in comparison to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
 
In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their rights to do so. (Complain, anyway; we would have still done it ;) )
 
--
P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion arose because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.
 
 

TL;DR version:

 - Oxide has been working openly with Nvidia, AMD and Intel, but more with Nvidia in recent months than with AMD or Intel.

 - If Ashes running on the Nitrous engine is AMD biased, then any Unreal Engine 4 game must be Nvidia biased (exposing a logical fallacy).

 - Ashes is not the only game being developed with the Nitrous engine, so it is not some benchmark outlier.

 - Nvidia DX11 performance improved heavily over the past month; DX12 CPU overhead is still lower than DX11 for Nvidia and has not been fully tuned yet.

 - A separate rendering path was made at Nvidia's request to disable async compute on Nvidia hardware.

 - Enabling DX12 async compute for Maxwell was an unmitigated disaster.

 - Maxwell does not appear to support async compute, at least not natively for DX12 gaming.

 - Ashes uses a modest amount of DX12 async compute; Ashes is not a poster child for advanced GCN features.

 - Some newer console games in development are seeing up to 30% performance gains by turning on async compute for gaming effects.

 - Minimum frame times in DX12 should be the same for AMD and Nvidia, but the 290X benched slightly better minimum frame times than the 980 Ti, which appears to be a variance in GPU hardware and not the application.

 - AMD has not complained to Oxide about working with Nvidia, even though AMD has a marketing agreement with Stardock.

 - Oxide is not in a fight with Nvidia. Nvidia is not disputing what Oxide said in their blog. Nvidia had a knee-jerk reaction to Oxide not giving in to Nvidia PR pressure to disable certain features in the benchmark.

 

So, from the horse's mouth (again): there is no GPU vendor bias in the AotS benchmark results.
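To illustrate the kind of "vendor specific" branch Kollock describes (a check on the adapter's Vendor ID that switches the async compute path off), here is a minimal, hypothetical C++/DXGI sketch. The AppSettings struct and the function names are my own illustration, not Oxide's actual code:

    #include <windows.h>
    #include <dxgi.h>

    // Hypothetical engine settings struct (illustration only).
    struct AppSettings {
        bool useAsyncCompute = true;
    };

    // PCI vendor ID that Nvidia adapters report through DXGI.
    constexpr UINT kVendorNvidia = 0x10DE;

    // "Vendor specific" in Kollock's sense: look at the Vendor ID and change
    // the rendering path; here, by turning the async compute path off.
    void ApplyVendorWorkarounds(IDXGIAdapter1* adapter, AppSettings& settings)
    {
        DXGI_ADAPTER_DESC1 desc = {};
        if (SUCCEEDED(adapter->GetDesc1(&desc)) && desc.VendorId == kVendorNvidia)
        {
            // The driver reported async compute as functional, but using it hurt
            // performance, so the same work is issued on the graphics queue instead.
            settings.useAsyncCompute = false;
        }
    }

Everything else in the engine, per Kollock, keys off the capability bits the driver itself reports rather than the vendor.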

 

 

Kollock responds in the same thread: http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1210#post_24357053

AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than to not. Whether or not Async Compute is better is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to its scheduler, is hard to say.

Kollock's comment that "whether or not Async Compute is better is subjective" carries more weight than I realized at first. His mention of Nvidia's scheduler hints at why Nvidia is struggling with async compute, and at the extra work they need to do in their drivers.

 

 

Kollock, again in the same thread: http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1400#post_24360916

I think you are confusing a few issues. Tier 2 vs Tier 3 binding is a completely separate issue from Async Compute. It has to do with the number of root level descriptors we can pass. In Tier 3, it turns out we can basically never update a descriptor during a frame, but in Tier 2 we sometimes have to build a few. I don't think it's a significant performance issue though, just a technical detail.
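For context, the binding tier Kollock is referring to is something an application can query from the driver once at startup. A minimal C++ sketch (illustrative only, not Oxide's code); per the posts above, Maxwell reports Tier 2 and GCN reports Tier 3:

    #include <windows.h>
    #include <d3d12.h>

    // Ask the device which resource binding tier it supports (Tier 1, 2 or 3).
    D3D12_RESOURCE_BINDING_TIER QueryBindingTier(ID3D12Device* device)
    {
        D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
        if (SUCCEEDED(device->CheckFeatureSupport(
                D3D12_FEATURE_D3D12_OPTIONS, &options, sizeof(options))))
        {
            return options.ResourceBindingTier;
        }
        return D3D12_RESOURCE_BINDING_TIER_1; // conservative fallback
    }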

Compute shaders executed in parallel and out of sync with CPU draw calls may very well be where developers are headed for gaming, or at least an option they are considering, if there is any merit to what Kollock says. Given the need for asynchronous compute in VR to reduce latency, it makes sense for now.
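In D3D12 terms, running compute "out of sync" with the draw calls means submitting dispatches on a dedicated compute queue and synchronizing with a fence only where the results are actually consumed. A minimal, hypothetical C++ sketch (error handling omitted; this is not code from Ashes):

    #include <windows.h>
    #include <d3d12.h>

    // Create a dedicated compute queue so dispatches can be scheduled by the GPU
    // alongside (asynchronously with) work on the graphics queue.
    ID3D12CommandQueue* CreateComputeQueue(ID3D12Device* device)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute-only queue
        ID3D12CommandQueue* queue = nullptr;
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
        return queue;
    }

    // Submit a recorded compute command list and signal a fence; the graphics
    // queue waits on that fence only at the point it consumes the results.
    void SubmitAsyncCompute(ID3D12CommandQueue* computeQueue,
                            ID3D12GraphicsCommandList* computeList,
                            ID3D12Fence* fence, UINT64 fenceValue)
    {
        ID3D12CommandList* lists[] = { computeList };
        computeQueue->ExecuteCommandLists(1, lists);
        computeQueue->Signal(fence, fenceValue);
        // Elsewhere: graphicsQueue->Wait(fence, fenceValue) before using the
        // output, instead of stalling the whole frame.
    }

On GCN the ACEs can pick this work up during idle GPU cycles; Kollock's later comments suggest that on Maxwell the same submission path ended up being emulated in software, at a heavy CPU cost.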

 

 

Kollock has responded regarding Nvidia and async compute support: http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/2130#post_24379702

A good synopsis of how Nvidia handles async compute can be found in this post:

 

https://www.reddit.com/r/pcgaming/comments/3j1916/get_your_popcorn_ready_nv_gpus_do_not_support/

ELI5 (Explain Like I'm 5) version https://www.reddit.com/r/pcgaming/comments/3j1916/get_your_popcorn_ready_nv_gpus_do_not_support/cullj3d

http://gearnuke.com/amd-employee-claims-nvidias-maxwell-gpus-utterly-incapable-performing-async-compute/#

http://thegametechnician.com/2015/08/31/analysis-amds-long-game-realization/

http://wccftech.com/preemption-context-switching-allegedly-best-amd-pretty-good-intel-catastrophic-nvidia/

http://wccftech.com/nvidia-async-compute-directx-12-oxide-games/

http://www.pcper.com/reviews/Graphics-Cards/DX12-GPU-and-CPU-Performance-Tested-Ashes-Singularity-Benchmark

http://www.extremetech.com/gaming/212314-directx-12-arrives-at-last-with-ashes-of-the-singularity-amd-and-nvidia-go-head-to-head

http://www.eurogamer.net/articles/digitalfoundry-2015-ashes-of-the-singularity-dx12-benchmark-tested

http://www.legitreviews.com/ashes-of-the-singularity-directx-12-vs-directx-11-benchmark-performance_170787

http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks

http://www.overclock.net/t/1572716/directx-12-asynchronous-compute-an-exercise-in-crowd-sourcing#post_24385652

 

 

TL;DR - Nvidia handles async compute through a mix of hardware and software that relies on slow context switching, which can introduce latency.

 

AMD_Robert dives into the brawl: https://www.reddit.com/r/AdvancedMicroDevices/comments/3iwn74/kollock_oxide_games_made_a_post_discussing_dx12/


AMD_Robert (AMD employee):

 
Oxide effectively summarized my thoughts on the matter. NVIDIA claims "full support" for DX12, but conveniently ignores that Maxwell is utterly incapable of performing asynchronous compute without heavy reliance on slow context switching.
 
GCN has supported async shading since its inception, and it did so because we hoped and expected that gaming would lean into these workloads heavily. Mantle, Vulkan and DX12 all do. The consoles do (with gusto). PC games are chock full of compute-driven effects.
 
If memory serves, GCN has higher FLOPS/mm2 than any other architecture, and GCN is once again showing its prowess when utilized with common-sense workloads that are appropriate for the design of the architecture.

It's probably important not to take Robert out of context here. His comment "Maxwell is utterly incapable of performing asynchronous compute" should not be separated from the rest of the sentence: "without heavy reliance on slow context switching." Reading between the lines, he is saying that Maxwell can do asynchronous compute just fine, but with a heavy latency penalty when graphics shaders and compute shaders/calculations are thrown into the mix together ("context" in this case = the type of task in the render pipeline).

 

 

AMD_Robert then clarifies that GCN 1.2 is indeed fully DX12_0 feature compliant, but missing DX12_1 features:

What about Fury? What aspect(s) of DX12 is it missing?

 

Raster Ordered Views and Conservative Raster. Thankfully, the techniques that these enable (like global illumination) can already be done in other ways at high framerates (see: DiRT Showdown).

No official statement has been released by Nvidia (to my knowledge) specifying which DX12 features Maxwell 2.0 does not support.

 

 

On a recent TechReport podcast, David Kanter (a very well versed expert in silicon chips) and Scott Wasson really get to the heart of the matter:


Scott: So there are three things with asynchronous compute that I want to understand the difference between. Like, ah, like say Fiji or Hawaii, the latest big GCNs, and Maxwell. Right? There are three separate things, I think, that I would put into important categories here: One of them is the ability to queue multiple different types of work, um, and uh, Nvidia has had...

This discussion exposes, or at least gives us a peek at, what is happening behind the scenes in the gaming industry, and where Nvidia, AMD, and most surprisingly Intel stand right now for next-gen graphics technology. Discussion about this podcast is ongoing in this thread: http://linustechtips.com/main/topic/444217-d-kanter-on-as-oculus-preemption-for-context-switches-best-on-amd-good-on-intel-possibly-catastrophic-for-nvidia/

 

 

http://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far

 

In the two graphs below, the yellow line represents the ideal latency for asynchronous compute execution, while the red line shows the actual latency of each card (R9 290 and GTX 980 Ti, respectively). It would appear that the 290's latency is consistent regardless of workload, while the 980 Ti's latency becomes erratic past 31 threads.
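The graphs appear to come from a small test that submits an increasing number of compute kernels alongside graphics work and times how long they take to complete. A rough, hypothetical C++ sketch of that kind of timing probe (this is not the actual benchmark code behind the graphs):

    #include <windows.h>
    #include <d3d12.h>

    // Time how long a batch of compute work takes from submission until its
    // fence signals, while the graphics queue is kept busy elsewhere.
    double MeasureComputeLatencyMs(ID3D12CommandQueue* computeQueue,
                                   ID3D12GraphicsCommandList* dispatches,
                                   ID3D12Fence* fence, UINT64 value,
                                   HANDLE completionEvent)
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);

        ID3D12CommandList* lists[] = { dispatches };
        computeQueue->ExecuteCommandLists(1, lists);
        computeQueue->Signal(fence, value);
        fence->SetEventOnCompletion(value, completionEvent); // wake when the GPU is done
        WaitForSingleObject(completionEvent, INFINITE);

        QueryPerformanceCounter(&t1);
        return 1000.0 * double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    }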

 

[Graph: R9 290 async compute latency vs. ideal]

 

[Graph: GTX 980 Ti async compute latency vs. ideal]

 

 

For a good synopsis of asynchronous compute and how Nvidia likely supports it, check out this post by Mahigan.

 

Nvidia has not yet issued an official response. If Nvidia does respond, I will include it in this post.

 
The podcast transcript continues:

David: Not just queue, but also, you can queue it, but you wanna get it into the shader array, you need to dispatch it, right?
 
Scott: So you wanna have that, Nvidia has had the ability to do that since Fermi, I believe? They've had this Hyper-Q feature? Oh, no, it was in big GK110, 32, uh, different queued items in hardware.
 
David: Right, but I think they have to be of the same type. I don't think...
 
Scott: That's the other one, is can you run kernels of different types on the GPU concurrently. And, and the last one is preemption, um, right?
 
David: Yep
 
Scott: You gotta be able to, especially if you can't, uh, run different things concurrently, like you need to stop and start something else... what does that look like? umm, so...
 
David: So I've been told by, uh, folks at Oculus that the preemption (and this is prior to the Skylake Gen9 architecture, which has better preemption), but that the best preemption support for context switches was with AMD, by far. Intel was pretty good, and Nvidia was possibly catastrophic. Um, like, what they, so the real issue is: if you have a shader running (a graphics shader), you need to let it finish. And, you know, it could take you a long time, it could take you over 16 milliseconds to finish. I have heard, and this is from engineers who are working on, like, really crazy stuff in the driver, of graphics shaders that took a couple of seconds to run. Obviously something there is totally busted.
 
Scott:  Completely unworkable for "real-time anything."
 
David: Right, yeah, I mean this is really great for your PowerPoint slide show, maybe, umm, but the point is: and Nvidia's, you know, to their credit they're open and honest about this in how you tune for Oculus Rift, is that you have to be super careful because you could miss a frame boundary, because the preemption is not particularly low latency. And again, it's not like this is a bad decision on the part of Nvidia, it's, you know, that's just what made sense, and preemption wasn't something that was super important when the chip was designed, and the API support was, eh uhh, there wasn't as much bang for your buck. And so now I'm sure they are going to improve that in Pascal, right? Nvidia is full of good, sharp architects, and they'll probably fix it in Pascal.
 
Scott: And that is one of the things, they kinda projected better preemption from Maxwell, but then they didn't build a Maxwell for GPU Compute, like HPC type scenarios. I mean, they don't have a big one for that. 
 
David: I mean, Maxwell's not really a great fit for compute, uh, because I think the way it got more power efficient is they threw out a lot of the scheduling hardware, er uh, almost all of it. And, again, Nvidia hasn't talked about this, but you know, that's a fine thing to do, it increases your power efficiency, especially for graphics where you can compile ahead of time and you generally know they are going to be doing the same stuff. The problem is, uh, like in compute land, you can never be sure what your crazy users are gonna do, so you, you know, the value, I mean look, Intel has a full blown out-of-order core in Knights Landing. You know they didn't do it because they're dumb, right, they did it because it's a good idea. ...snip... The point is compute loads are inherently unpredictable, and so some of the tricks that work great in graphics are just not amenable to compute, and we've seen this historically, right, where there's a different core, with more shared memory, and a different double precision and single precision balance for Nvidia's HPC parts. And so I think they just looked at Maxwell, and they said 'look, you know, these are the resources we have, here are our competitive pressures, you know, just do Pascal.'
 
Scott: And it makes sense probably, I can see that. The GPUs that we have right now are... retargeted because 20nm didn't work out, so yeah, they (the GPUs) are what they had to do at the time.
 
Kollock, in a later post in the same thread:

Wow, lots more posts here, there are just too many things to respond to, so I'll try to answer what I can.
 
/inconvenient things I'm required to ask or they won't let me post anymore
Regarding screenshots and other info from our game, we appreciate your support but please refrain from disclosing these until after we hit early access. It won't be long now.
/end
 
Regarding batches, we use the term batches just because we are counting both draw calls and dispatch calls. Dispatch calls are compute shaders, draw calls are normal graphics shaders. Though sometimes everyone calls dispatches draw calls, they are different, so we thought we'd avoid the confusion by not calling everything a draw call.
 
Regarding CPU load balancing on D3D12, that's entirely the application's responsibility. So if you see a case where it's not load balancing, it's probably the application, not the driver/API. We've done some additional tuning to the engine even in the last month and can clearly see usage cases where we can load 8 cores at maybe 90-95% load. Getting to 90% on an 8 core machine makes us really happy. Keeping our application tuned to scale like this is definitely an ongoing effort.
 
Additionally, hitches and stalls are largely the application's responsibility under D3D12. In D3D12, essentially everything that could cause a stall has been removed from the API. For example, the pipeline objects are designed such that the dreaded shader recompiles won't ever have to happen. We also have precise control over how long a graphics command is queued up. This is pretty important for VR applications.
 
Also keep in mind that the memory model for D3D12 is completely different from D3D11, at an OS level. I'm not sure if you can honestly compare things like memory load against each other. In D3D12 we have more control over residency and we may, for example, intentionally keep something unused resident so that there is no chance of a micro-stutter if that resource is needed. There is no reliable way to do this in D3D11. Thus, comparing memory residency between the two APIs may not be meaningful, at least not until everyone's had a chance to really tune things for the new paradigm.
 
Regarding SLI and CrossFire situations, yes, support is coming. However, those options in the ini file probably do not do what you think they do, just FYI. Some posters here have been remarkably perceptive on the different multi-GPU modes that are coming, and let me just say that we are looking beyond just the standard CrossFire and SLI configurations of today. We think that multi-GPU situations are an area where D3D12 will really shine (once we get all the kinks ironed out, of course). I can't promise when this support will be unveiled, but we are committed to doing it right.
 
Regarding Async Compute, a couple of points on this. First, though we are the first D3D12 title, I wouldn't hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic, and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn't hold Ashes up as the premier example of this feature.
 
We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.
 
Also, we are pleased that D3D12 support on Ashes should be functional on Intel hardware relatively soon (actually, it's functional now; it's just a matter of getting the right driver out to the public).
 
Thanks!
 
In regards to the purpose of Async compute, there are really 2 main reasons for it:
 
1) It allows jobs to be cycled into the GPU during dormant phases. It can vaguely be thought of as the GPU equivalent of hyper-threading. Like hyper-threading, it really depends on the workload and GPU architecture as to how important this is. In this case, it is used for performance. I can't divulge too many details, but GCN can cycle in work from an ACE incredibly efficiently. Maxwell's scheduler has no analog, just as a non hyper-threaded CPU has no analog feature to a hyper-threaded one.
 
2) It allows jobs to be cycled in completely out of band with the rendering loop. This is potentially the more interesting case, since it can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced. I'm not sure of the background of Async Compute, but it's quite possible that it is intended for use on a console as sort of a replacement for the Cell processors on a PS3. In a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks.
 
It didn't look like there was a hardware defect to me on Maxwell, just some unfortunate complex interaction with software scheduling trying to emulate it, which appeared to incur some heavy CPU costs. Since we were trying to use it for #1, not #2, it made little sense to bother. I don't believe there is any specific requirement that Async Compute be supported for D3D12, but perhaps I misread the spec.
 
Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores to them, especially for graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle limited, which is why you see greater performance at 4K, where the average triangle size is 4x the triangle size at 1080p.
 
I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post-process. Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% on the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders. For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was one of the main motivating factors behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast, it was that they failed to map DX9 games to it well enough for it to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline will be 100% implemented via compute shaders.
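Two of Kollock's points above map directly onto D3D12 API usage. First, CPU load balancing being "entirely the application's responsibility" comes down to each worker thread recording its own command list; a minimal, hypothetical C++ sketch (thread handling simplified; the names are mine, not Oxide's):

    #include <windows.h>
    #include <d3d12.h>
    #include <thread>
    #include <vector>

    // Each worker thread records into its own allocator + command list; D3D12
    // leaves the distribution of this work across CPU cores entirely to the app.
    void RecordFrameInParallel(std::vector<ID3D12GraphicsCommandList*>& lists,
                               std::vector<ID3D12CommandAllocator*>& allocators,
                               ID3D12CommandQueue* graphicsQueue)
    {
        std::vector<std::thread> workers;
        for (size_t i = 0; i < lists.size(); ++i)
        {
            workers.emplace_back([&, i] {
                allocators[i]->Reset();
                lists[i]->Reset(allocators[i], nullptr);
                // ... record this thread's slice of draw/dispatch calls ...
                lists[i]->Close();
            });
        }
        for (auto& w : workers) w.join();

        // Submission itself is a single cheap call once recording is done.
        std::vector<ID3D12CommandList*> submit(lists.begin(), lists.end());
        graphicsQueue->ExecuteCommandLists(static_cast<UINT>(submit.size()),
                                           submit.data());
    }

Second, the residency control he mentions is likewise explicit API: a heap can be pinned so that touching it later can never trigger a paging stall (again, purely illustrative):

    // Keep a heap resident even while unused, trading memory for the guarantee
    // that using it later cannot cause a mid-frame hitch.
    void PinHeap(ID3D12Device* device, ID3D12Heap* heap)
    {
        ID3D12Pageable* pageables[] = { heap };
        device->MakeResident(1, pageables);
        // device->Evict(1, pageables); // when the memory is better spent elsewhere
    }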


Are people really this butthurt over this? Very sad to see.


When you see a 290X almost reaching a 980 Ti, of course peeps will be butthurt ;)

 

Who are the individuals that are complaining? Even Nvidia stopped complaining and took it like a man. From the looks of it, Nvidia crazies have their tits in a bunch. It's just one freaking bench. The PC world is very toxic.


Who are the individuals that are complaining? Even Nvidia stopped complaining and took it like a man. From the looks of it, Nvidia crazies have their tits in a bunch. It's just one freaking bench. The PC world is very toxic.

 

It's not just Nvidia fanboys, everyone is going crazy. The media is just adding more fuel to the fire ;)


It's not just Nvidia fanboys, everyone is going crazy. The media is just adding more fuel to the fire ;)

 

 

It's only one freaking bench; currently there isn't enough conclusive data. Why can't people just relax and wait for more DATA? But instead they talk out of their ass.


lol, the hypocrisy: when Nvidia was doing GameWorks all the fanboys were defending them, and now this comes along, where AMD isn't even handing the developers code and telling them to add it to the game, and fanboys are freaking out about vendor bias.


And I'm just waiting here for DX12 on Ark...

Shame they delayed it, hoping they get it out next week as promised.


Actually, why not have benchmarks done on publicly available code? *cough* 3DMark for DX12 is jarringly missing; it should have been available at the Windows 10 launch (and should have been a priority project) *cough*


lol, the hypocrisy: when Nvidia was doing GameWorks all the fanboys were defending them, and now this comes along, where AMD isn't even handing the developers code and telling them to add it to the game, and fanboys are freaking out about vendor bias.

This. I find this hilarious.

Kepler and Maxwell were just great DX11 architectures. Too bad that AMD just focused on parallelization and it's only now finally bearing fruit for them.


And here I am sitting in my chair eating lunch and finding it funny that people took those benchmarks very seriously. 

Although my initial reaction was the same, I didn't care after 3 minutes of seeing the article.

 

All in all, unless AMD comes out with a GPU which stomps the next gen by Nvidia, I am gonna be happy with my fanboyism towards Nvidia.


A few things to conclude/speculate from this:

 

  1. NVidia has excellent driver performance in DX11 due to multithreading and lower CPU overhead.
  2. NVidia hardware is not at all excellent; it seems to be weaker and more obsolete than even a Hawaii GCN chip.
  3. If any vendor bias is present in Ashes of the Singularity, it's pro-NVidia, due to the NVidia-specific code implemented.
  4. NVidia's PR department reacted in a way that supports NVidia's reputation as the graphics card mafia.

Most of this is already known in this thread: http://linustechtips.com/main/topic/432063-first-directx-12-game-benchmarked-update-2-more-benchmarks/?view=findpost&p=5792302

 

However, hearing that some console engines are so effective on GCN that they gain up to a 30% performance increase will make this very exciting to see in a PC game. Unless the graphics card mafia throws a hissy fit again.

 

Hearing that Maxwell's async compute is either extremely poor or outright defective is very surprising news. I think as DX12 gets used, we will see AMD get more and more ahead with their GCN architecture. It does worry me that Pascal might not be very good if NVidia cannot make async compute work properly.


A few things to conclude/speculate from this:

 

  1. NVidia has excellent driver performance in DX11 due to multithreading and lower CPU overhead.
  2. NVidia hardware is not at all excellent; it seems to be weaker and more obsolete than even a Hawaii GCN chip.
  3. If any vendor bias is present in Ashes of the Singularity, it's pro-NVidia, due to the NVidia-specific code implemented.
  4. NVidia's PR department reacted in a way that supports NVidia's reputation as the graphics card mafia.

Most of this is already known in this thread: http://linustechtips.com/main/topic/432063-first-directx-12-game-benchmarked-update-2-more-benchmarks/?view=findpost&p=5792302

 

However, hearing that some console engines are so effective on GCN that they gain up to a 30% performance increase will make this very exciting to see in a PC game. Unless the graphics card mafia throws a hissy fit again.

 

Hearing that Maxwell's async compute is either extremely poor or outright defective is very surprising news. I think as DX12 gets used, we will see AMD get more and more ahead with their GCN architecture. It does worry me that Pascal might not be very good if NVidia cannot make async compute work properly.

 

I was very surprised myself to see Maxwell async was not being used. It was an eye opener, and if it had been used, I believe Maxwell would have performed far below Hawaii.


I was very surprised myself to see Maxwell async was not being used. It was an eye opener, and if it had been used, I believe Maxwell would have performed far below Hawaii.

 

Well it sounds like it performed worse than not using it at all. The interesting part is whether it is a hardware or software issue. The latter could be fixed by NVidia. If not, then too bad.


Is async compute double-precision compute? If so, then of course Maxwell will get wrecked.

People seem to forget 24/7 that in order to keep up with Nvidia on DX11, AMD made cards with LITERALLY 40% more SP compute performance.

Any situation where the Nvidia card isn't getting wrecked ought to be a huge slap in the face to AMD.

On a side note, stop all the whiney bullshit about the 290 vs 980 Ti.

The Fury X vs 980 Ti put them basically at an exact dead heat in DX12, so who cares if the performance gain was negligible, as long as it competes with its competitors. (Yes, the Fury X had up to a 1-3 fps lead in some cases; who cares. OC takes care of that with ease.)

If you have a GM204 card, well, Hawaii seems to have better straight performance on this game in DX12, but you still got the best performance-per-watt card in the world (unless the Nano breaks that, which I have to hope it will).


Is async compute double-precision compute? If so, then of course Maxwell will get wrecked.

People seem to forget 24/7 that in order to keep up with Nvidia on DX11, AMD made cards with LITERALLY 40% more SP compute performance.

Any situation where the Nvidia card isn't getting wrecked ought to be a huge slap in the face to AMD.

On a side note, stop all the whiney bullshit about the 290 vs 980 Ti.

The Fury X vs 980 Ti put them basically at an exact dead heat in DX12, so who cares if the performance gain was negligible, as long as it competes with its competitors. (Yes, the Fury X had up to a 1-3 fps lead in some cases; who cares. OC takes care of that with ease.)

If you have a GM204 card, well, Hawaii seems to have better straight performance on this game in DX12, but you still got the best performance-per-watt card in the world (unless the Nano breaks that, which I have to hope it will).

 

It's not double precision. You can see more in this video:

 


A few things to conclude/speculate from this:

  • NVidia has excellent driver performance in DX11 due to multithreading and lower CPU overhead.
  • NVidia hardware is not at all excellent; it seems to be weaker and more obsolete than even a Hawaii GCN chip.
  • If any vendor bias is present in Ashes of the Singularity, it's pro-NVidia, due to the NVidia-specific code implemented.
  • NVidia's PR department reacted in a way that supports NVidia's reputation as the graphics card mafia.
Most of this is already known in this thread: http://linustechtips.com/main/topic/432063-first-directx-12-game-benchmarked-update-2-more-benchmarks/?view=findpost&p=5792302

However, hearing that some console engines are so effective on GCN that they gain up to a 30% performance increase will make this very exciting to see in a PC game. Unless the graphics card mafia throws a hissy fit again.

Hearing that Maxwell's async compute is either extremely poor or outright defective is very surprising news. I think as DX12 gets used, we will see AMD get more and more ahead with their GCN architecture. It does worry me that Pascal might not be very good if NVidia cannot make async compute work properly.

If you think Pascal won't knock AMD on its ass, you don't know how this game is played. Nvidia uses planned obsolescence brilliantly. Maxwell destroyed AMD in DX 11, and now Pascal will come out with better HBM 2 configs and earlier than Greenland thanks to Samsung stepping in. Further, Pascal is just Maxwell with native 64-bit support, mixed precision, and the remainder of the DX 12 bells and whistles, not to mention DX 12.1 full support which Maxwell already had. AMD has been making the same mistakes in GPUs that it was making in CPUs: building beyond the use and demands of the market and leaving no room for improvement. Nvidia will come out on top with Pascal like it did with Kepler and Maxwell.


If you think Pascal won't knock AMD on its ass, you don't know how this game is played. Nvidia uses planned obsolescence brilliantly. Maxwell destroyed AMD in DX 11, and now Pascal will come out with better HBM 2 configs and earlier than Greenland thanks to Samsung stepping in. Further, Pascal is just Maxwell with native 64-bit support, mixed precision, and the remainder of the DX 12 bells and whistles, not to mention DX 12.1 full support which Maxwell already had. AMD has been making the same mistakes in GPUs that it was making in CPUs: building beyond the use and demands of the market and leaving no room for improvement. Nvidia will come out on top with Pascal like it did with Kepler and Maxwell.

Again, these are all assumptions. You don't know if Pascal will beat out Arctic Islands.

So far it looks like AMD is winning DX12.


If you think Pascal won't knock AMD on its ass, you don't know how this game is played. Nvidia uses planned obsolescence brilliantly. Maxwell destroyed AMD in DX 11, and now Pascal will come out with better HBM 2 configs and earlier than Greenland thanks to Samsung stepping in. Further, Pascal is just Maxwell with native 64-bit support, mixed precision, and the remainder of the DX 12 bells and whistles, not to mention DX 12.1 full support which Maxwell already had. AMD has been making the same mistakes in GPUs that it was making in CPUs: building beyond the use and demands of the market and leaving no room for improvement. Nvidia will come out on top with Pascal like it did with Kepler and Maxwell.

 

Planned obsolescence is yet another reason not to buy NVidia, as it is very anti-consumer. Maxwell is not even one year old and it's already becoming obsolete. I can't imagine people being so extremely fanboyish as to accept this in the end.

 

So far only AMD has provided a functional GPU with HBM. We have no idea how far NVidia has come. Also, define "better HBM 2 configs"? Because that is just guesswork on your part. If Pascal is "just Maxwell", Pascal will suck at async shaders too, based on what we've seen so far.

 

As for DX 12.1, the .1 never seems to be utilized in games anyways. 10.1, 11.1, not very popular. And if NVidia cannot even make async shaders work in basic DX12, how would they ever be better than AMD in DX12?

 

Your entire post is a mix of guesses and speculation, and has nothing to do with facts or the empirical data we've seen so far.


Again, these are all assumptions. You don't know if Pascal will beat out Arctic Islands.

So far it looks like AMD is winning DX12.

AMD's old cards which were over-engineered for the markets in which they were being deployed are winning in a more appropriate market. Nvidia wins when it deploys cards for the current market. I'm not assuming anything. I know.


When you see a 290X almost reaching a 980 Ti, of course peeps will be butthurt ;)

but when you look at the compute POWER in AMD cards, it's not really hard to imagine...

 

R9 290X

Pixel Rate: 64.0 GPixel/s

Texture Rate: 176 GTexel/s

Floating-point performance: 5,632 GFLOPS

 

R9 Fury X

Pixel Rate: 67.2 GPixel/s

Texture Rate: 269 GTexel/s

Floating-point performance: 8,602 GFLOPS

 

GTX 980

Pixel Rate: 72.1 GPixel/s

Texture Rate: 144 GTexel/s

Floating-point performance: 4,616 GFLOPS

 

GTX 980Ti

Pixel Rate: 96.0 GPixel/s

Texture Rate: 176 GTexel/s

Floating-point performance: 5,632 GFLOPS

 

GTX TitanX

Pixel Rate: 96.0 GPixel/s

Texture Rate: 192 GTexel/s

Floating-point performance: 6,144 GFLOPS

 

data taken from  https://www.techpowerup.com/gpudb/

 

 

 

We can clearly see that in terms of RAW POWER, AMD does have a massive advantage that isn't being used, at all...


AMD's old cards which were over-engineered for the markets in which they were being deployed are winning in a more appropriate market. Nvidia wins when it deploys cards for the current market. I'm not assuming anything. I know.

An educated guess doesn't mean it's accurate. You are basing your 'facts' on the past. There's no evidence so far that shows Pascal is superior in performance to Arctic Islands, mainly because it hasn't been fully developed yet.

What we do know is that AMD's two-year-old card is contesting, and in some cases beating, NVIDIA's flagship model in a DX12 game as of now. That, my friend, is a fact.


It's not double precision. You can see more in this video:

 

Ok, cool. If you check out the Maxwell vs Pascal slides, it doesn't look like the current shader process would handle asynchronous shading well.

Thanks for clearing that up.


An educated guess doesn't mean it's accurate. You are basing your 'facts' on the past. There's no evidence so far that shows Pascal is superior in performance to Arctic Islands, mainly because it hasn't been fully developed yet.

What we do know is that AMD's two-year-old card is contesting, and in some cases beating, NVIDIA's flagship model in a DX12 game as of now. That, my friend, is a fact.

Just ignore him. He always spouts out stuff without even a single link to some unknown website to back him up... BUT on the other hand, he can look at these numbers and try to explain his theory, though...

 

 

Nvidia is going to catch up to or surpass AT LEAST the 8.6 TFLOPS computational power of the Fury X in ONE generation?

 

Let's see how far they got from Kepler to Maxwell, shall we?

GTX 680

Pixel Rate: 32.2 GPixel/s Texture Rate: 129 GTexel/s Floating-point performance: 3,090 GFLOPS

 

GTX 780Ti

Pixel Rate: 52.5 GPixel/s Texture Rate: 210 GTexel/s Floating-point performance: 5,040 GFLOPS

 

current gen 980Ti

Pixel Rate: 96.0 GPixel/s Texture Rate: 176 GTexel/s Floating-point performance: 5,632 GFLOPS

 

So from the 680 to the 780 Ti there were massive gains, not surprisingly. But from Kepler to Maxwell... well... they barely gained anything. Pixel fill rate was increased A LOT (probably due to Maxwell NOT being so heavy on double precision but rather focusing on gaming performance), the texel rate goes down, and incredibly enough, they gained just ~600 GFLOPS of computation...

 

If we look at AMD from GCN 1.0 to 1.2

7970 GHz edt

Pixel Rate: 32.0 GPixel/s Texture Rate: 128 GTexel/s Floating-point performance: 4,096 GFLOPS

 

290X

Pixel Rate: 64.0 GPixel/s Texture Rate: 176 GTexel/s Floating-point performance: 5,632 GFLOPS

 

Fury X

Pixel Rate: 67.2 GPixel/s Texture Rate: 269 GTexel/s Floating-point performance: 8,602 GFLOPS

 

 

In AMD's case we just see a steady improvement, with the Fury X topping out at nearly double the computational power of the Tahiti-based 7970 GHz Edition.

