Search the Community
Showing results for tags 'asynchronous compute'.
-
Here's the story: I want to upgrade my PC's graphics card from a GTX 770 2GB to a GTX 1060 6GB to play some new titles. The titles I am interested in are Gears of War 4, Forza Horizon 3, and Battlefield 1. After Microsoft announced that certain games would be under the "Xbox Play Anywhere" initiative, I have been contemplating upgrading my graphics card and playing more Xbox games on my PC. The only problem is that all the games I wanted to play needed more power than I have. I had already planned to upgrade my whole rig soon after the GTX 1080 Ti was released, but between unexpected financial issues and this asynchronous compute debate around Nvidia's Pascal and beyond, I was just going to sit out the whole 1000 series and wait until Volta dropped with the 1100 series. The plan was to continue to play certain games on my PC and other games on my Xbox.

Then competition in the graphics card market brought forth $200 graphics cards that can run games with high fidelity and frame rates. I started thinking, "Now I don't have to pay for Xbox Live, and I can play my Xbox games." Not to mention that a lot of the problems I have with playing on consoles, like sub-par resolution and frame rates, NAT type issues (I am in college), and Xbox Live issues, would all be gone. To be fair, free games every month is a loss, but most of those games I never would have purchased, and even though they're free, I still most likely would not have played them. All in all, losing Xbox Live is a trade-off I am willing to accept.

However, I started to ponder: is it worth it to upgrade my PC when I plan to upgrade it again soon after graduating in 2019? I know that sounds crazy because that is about two and a half years from now, but my PC so far has mostly played indie titles and a couple of AA and AAA titles at 1440p 60 fps; it was not until I tried to play Doom (2016) that I saw real gameplay issues. Plus, the next time I wanted to pay over $1,000 for a PC was for a new standard of game resolution and frame rate: 4K at a solid 60 fps on most titles. My thought process was to save my money and wait it out. However, great games are out now, and I want to play them on my PC.

After some research I have determined that my 8 GB of 1600 MHz DDR3 RAM and i7-4820K 3.7 GHz CPU will not bottleneck my performance for at least one to one and a half years, maybe two. Next I needed to see if this card would run at my screen's native resolution of 1440p, and in most cases I think I will be fine. Therefore, I will see a performance boost from just adding a graphics card. Or, I could pay $60 for Xbox Live, buy my games, and know they will at least run, even if they're not at a favorable resolution or frame rate.

So for 3 more years of Xbox Live I am looking at $180, and for upgrading my PC I am looking at $250. Now, there are some things I can do to save money on both sides, but right now it is more expensive to play on my PC. Plus, is this upgrade going to hold me over until I get a serious upgrade, or am I paying $250 to play games from this year, and then next year my CPU bottlenecks me, or games run better on DDR4 RAM, or the new AAA game I want to play can't run at my screen's resolution? I do not want to pay $250 only to turn around and find out my computer won't run a game well, so that I have to go purchase Xbox Live again and waste money to play a game that runs, but isn't smooth and looks sub-par. Or I could just continue to deal with the sub-par console and slowly decaying PC.
To put it in perspective, I could purchase Xbox Live and all of the games listed above right now for less money than it would take to upgrade my graphics card and buy NO games. In the end I would just like honest opinions and advice from you ladies and gentlemen on whether or not you believe it is worth it. Thank you, and have a nice day.
-
http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1200#post_24356995

Oxide developer "Kollack" made a lengthy response on the Overclock.net forums last night, dispelling some of the myths surrounding Ashes of the Singularity and GPU vendor bias. I have now divided this post up into sections, so it's somewhat readable.

Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher than we thought. The primary purpose of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.

Certainly I could see how one might think that we are working more closely with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from Nvidia, 3 from AMD, and 2 from Intel (and 0 from Microsoft, but they never come visit anyone ;( ). Nvidia was actually a far more active collaborator over the summer than AMD was. If you judged from email traffic and code check-ins, you'd draw the conclusion we were working more closely with Nvidia rather than AMD.

As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) and AMD for Ashes. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with Nvidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles, as they have also lined up a few other D3D12 games. If you use this metric, however, given Nvidia's promotions with Unreal (and integration with GameWorks), you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark, since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely different topic.)

Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only vendor-specific code is for Nvidia, where we had to shut down async compute. By vendor-specific, I mean a case where we look at the vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have async compute, so I don't know why their driver was trying to expose it. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor-specific path, as it's responding to capabilities the driver reports.
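To make the "capabilities the driver reports" distinction concrete, here is a minimal sketch of my own (not Oxide's code; all names come straight from the public D3D12/DXGI headers) showing how an application reads the adapter's vendor ID and the resource binding tier Kollack mentions:

// Editor's sketch (not Oxide code): querying the capabilities an engine
// might branch on. Error handling omitted; link with d3d12.lib and dxgi.lib.
#include <windows.h>
#include <d3d12.h>
#include <dxgi1_4.h>
#include <cstdio>

int main() {
    IDXGIFactory4* factory = nullptr;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    IDXGIAdapter1* adapter = nullptr;
    factory->EnumAdapters1(0, &adapter);

    // The PCI vendor ID: 0x10DE = Nvidia, 0x1002 = AMD, 0x8086 = Intel.
    // Branching on this is what Kollack calls a vendor-specific path.
    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);

    ID3D12Device* device = nullptr;
    D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // The resource binding tier (Tier 2 on Maxwell, Tier 3 on GCN) is a
    // capability the driver reports, so handling it is not vendor-specific.
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));

    printf("VendorId 0x%04X, ResourceBindingTier %d\n",
           desc.VendorId, (int)options.ResourceBindingTier);

    device->Release();
    adapter->Release();
    factory->Release();
    return 0;
}

Handling the reported binding tier is normal capability code; shutting async compute off for one vendor means branching on VendorId itself, which is what makes it a vendor-specific path.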
Kollack continues:

From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 performance is. But that's a very recent development, with huge CPU performance improvements over the last month. Still, DX12 CPU overhead is far, far better on Nvidia, and we haven't even tuned it as much as DX11.

The other surprise is the minimum frame times, with the 290X beating out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application-controlled feature, so I was expecting them to be close to identical. This would appear to be GPU-side variance rather than software variance. We'll have to dig into this one.

I suspect that one thing that is helping AMD on GPU performance is that D3D12 exposes async compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic: we just took a few compute tasks we were already doing and made them asynchronous. Ashes really isn't a poster child for advanced GCN features. Our use of async compute, however, pales in comparison to some of the things the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using async compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and -optimized engines start coming to the PC. I don't think Unreal titles will show this very much, though, so likely we'll have to wait to see. Has anyone profiled Ark yet?

In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their rights to do so. (Complain, anyway; we would have still done it.)

P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.

TL;DR version:
- Oxide has been working openly with Nvidia, AMD, and Intel, but more with Nvidia in recent months than AMD or Intel.
- If Ashes running on the Nitrous engine is AMD-biased, then any Unreal 4 game must be Nvidia-biased (exposing a logical fallacy).
- Ashes is not the only game being developed with the Nitrous engine, so it is not some benchmark outlier.
- Nvidia DX11 performance improved heavily over the past month; DX12 overhead is still better for Nvidia than DX11 and has not been fully tweaked yet.
- A separate rendering path was made at Nvidia's request to disable async compute on Nvidia hardware.
- Enabling DX12 async compute for Maxwell was an unmitigated disaster.
- Maxwell does not appear to support async compute, at least not natively for DX12 gaming.
- Ashes uses a modest amount of DX12 async compute; Ashes is not a poster child for advanced GCN features.
- Some newer console games in development are seeing up to 30% performance gains by turning on async compute for gaming effects.
- Minimum frame times in DX12 should be the same for AMD and Nvidia, but the 290X benched slightly better minimum frame times than the 980 Ti, which appears to be variance in GPU hardware and not the application.
- AMD has not complained to Oxide about working with Nvidia, even though AMD is in a marketing agreement with Stardock.
- Oxide is not in a fight with Nvidia. Nvidia is not disputing what Oxide said in their blog.
- Nvidia had a knee-jerk reaction to Oxide not giving in to Nvidia PR pressure to disable certain features in the benchmark.

So, from the horse's mouth (again), there is no GPU vendor bias in regards to the AotS benchmark results.

Kollack, from the same thread, responds: http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1210#post_24357053

Kollack's comment "Whether or not Async Compute is better or not is subjective" carries more weight behind it than even I realized at first. His mention of Nvidia's scheduler reveals the reason why Nvidia is struggling with async compute, and the extra work they need to do in their drivers.

Kollack, again in the same thread: http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1400#post_24360916

It's probably important not to take Robert out of context here. His comment "Maxwell is utterly incapable of performing asynchronous compute" should not be separated from the rest of the sentence: "without heavy reliance on slow context switching." Reading between the lines, he is saying that Maxwell can do asynchronous compute just fine, but with a heavy latency penalty when graphics shaders and compute shaders/calculations are thrown into the mix together ("context" in this case = type of task in the render pipeline). AMD_Robert then clarifies that GCN 1.2 is indeed fully DX12_0 feature compliant, but missing DX12_1 features; no official statement has been released from Nvidia (to my knowledge) specifying what DX12 features Maxwell 2.0 does not support.

David: Not just queue it; you can queue it, but if you want to get it into the shader array, you need to dispatch it, right?

Scott: So you want to have that. Nvidia has had the ability to do that since Fermi, I believe? They've had this Hyper-Q feature? Oh, no, it was in big GK110: 32 different queued items in hardware.

David: Right, but I think they have to be of the same type. I don't think...

Scott: That's the other one: can you run kernels of different types on the GPU concurrently? And the last one is preemption, right?

David: Yep.

Scott: You've got to be able to, especially if you can't run different things concurrently; you need to stop and start something else... what does that look like?

David: So I've been told by folks at Oculus that the preemption (and this is prior to the Skylake Gen9 architecture, which has better preemption), that the best preemption support for context switches was with AMD, by far. Intel was pretty good, and Nvidia was possibly catastrophic. The real issue is: if you have a shader running, a graphics shader, you need to let it finish. And it could take a long time; it could take over 16 milliseconds to finish. I have heard, and this is from engineers who are working on really crazy stuff in the driver, of graphics shaders that took a couple of seconds to run. Obviously something there is totally busted.

Scott: Completely unworkable for "real-time anything."

David: Right, yeah. I mean, this is really great for your PowerPoint slide show, maybe, but the point is, and Nvidia, to their credit, is very open and honest about this in how you tune for Oculus Rift: you have to be super careful because you could miss a frame boundary, because the preemption is not particularly low latency.
David (continuing): And again, it's not like this was a bad decision on the part of Nvidia; that's just what made sense. Preemption wasn't something that was super important when the chip was designed, and with the API support at the time, there wasn't as much bang for your buck. So now I'm sure they are going to improve that in Pascal, right? Nvidia is full of good, sharp architects, and they'll probably fix it in Pascal.

Scott: And that is one of the things: they kind of projected better preemption for Maxwell, but then they didn't build a Maxwell for GPU compute, like HPC-type scenarios. I mean, they don't have a big one for that.

David: I mean, Maxwell's not really a great fit for compute, because I think the way it got more power efficient is that they threw out a lot of the scheduling hardware, almost all of it. And again, Nvidia hasn't talked about this, but that's a fine thing to do: it increases your power efficiency, especially for graphics, where you can compile ahead of time and you generally know it is going to be doing the same stuff. The problem is, in compute land you can never be sure what your crazy users are going to do. I mean, look, Intel has a full-blown out-of-order core in Knights Landing. They didn't do it because they're dumb, right? They did it because it's a good idea... [snip] ...the point is, compute loads are inherently unpredictable, and so some of the tricks that work great in graphics just aren't amenable to compute. And we've seen this historically, right, where there's a different core, with more shared memory and a different double precision to single precision balance, for Nvidia's HPC parts. And so I think they just looked at Maxwell and said, "Look, these are the resources we have, here are our competitive pressures; just do Pascal."

Scott: And it makes sense, probably; I can see that. The GPUs that we have right now were retargeted because 20nm didn't work out, so yeah, they (the GPUs) are what they had to do at the time.

Kollack, in a later follow-up post:

Wow, lots more posts here. There are just too many things to respond to, so I'll try to answer what I can.

/inconvenient things I'm required to ask or they won't let me post anymore
Regarding screenshots and other info from our game, we appreciate your support, but please refrain from disclosing these until after we hit early access. It won't be long now.
/end

Regarding batches, we use the term batches just because we are counting both draw calls and dispatch calls. Dispatch calls are compute shaders; draw calls are normal graphics shaders. Though sometimes everyone calls dispatches draw calls, they are different, so we thought we'd avoid the confusion by not calling everything a draw call.

Regarding CPU load balancing on D3D12, that's entirely the application's responsibility. So if you see a case where it's not load balancing, it's probably the application, not the driver/API. We've done some additional tuning to the engine even in the last month, and we can clearly see use cases where we can load 8 cores at maybe 90-95% load. Getting to 90% on an 8-core machine makes us really happy. Keeping our application tuned to scale like this is definitely an ongoing effort.

Additionally, hitches and stalls are largely the application's responsibility under D3D12. In D3D12, essentially everything that could cause a stall has been removed from the API.
For example, the pipeline objects are designed such that the dreaded shader recompiles won't ever have to happen. We also have precise control over how long a graphics command is queued up. This is pretty important for VR applications.

Also keep in mind that the memory model for D3D12 is completely different from D3D11, at an OS level. I'm not sure you can honestly compare things like memory load against each other. In D3D12 we have more control over residency, and we may, for example, intentionally keep something unused resident so that there is no chance of a micro-stutter if that resource is needed. There is no reliable way to do this in D3D11. Thus, comparing memory residency between the two APIs may not be meaningful, at least not until everyone's had a chance to really tune things for the new paradigm.

Regarding SLI and Crossfire situations, yes, support is coming. However, those options in the ini file probably do not do what you think they do, just FYI. Some posters here have been remarkably perceptive about the different multi-GPU modes that are coming, and let me just say that we are looking beyond just the standard Crossfire and SLI configurations of today. We think that multi-GPU situations are an area where D3D12 will really shine (once we get all the kinks ironed out, of course). I can't promise when this support will be unveiled, but we are committed to doing it right.

Regarding async compute, a couple of points on this. First, though we are the first D3D12 title, I wouldn't hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic, and fully understanding it requires significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn't hold Ashes up as the premier example of this feature. We actually just chatted with Nvidia about async compute; indeed, the driver hasn't fully implemented it yet, but it appeared as if it had. We are working closely with them as they fully implement async compute. We'll keep everyone posted as we learn more. Also, we are pleased that D3D12 support in Ashes should be functional on Intel hardware relatively soon (actually, it's functional now; it's just a matter of getting the right driver out to the public). Thanks!

In regards to the purpose of async compute, there are really two main reasons for it:

1) It allows jobs to be cycled into the GPU during dormant phases. It can vaguely be thought of as the GPU equivalent of hyper-threading. Like hyper-threading, how important this is really depends on the workload and GPU architecture. In this case, it is used for performance. I can't divulge too many details, but GCN can cycle in work from an ACE incredibly efficiently. Maxwell's scheduler has no analog, just as a non-hyper-threaded CPU has no analog to a hyper-threaded one.

2) It allows jobs to be cycled in completely out of band with the rendering loop. This is potentially the more interesting case, since it can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced. I'm not sure of the background of async compute, but it's quite possible that it was intended for use on consoles as a sort of replacement for the Cell processors on a PS3. In a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks.
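For anyone who hasn't seen what this looks like at the API level, here is a minimal sketch of my own (not Oxide's code; the function and parameter names are illustrative): work submitted on a second D3D12 command queue of type COMPUTE, which the GPU is free to overlap with the graphics queue, plus a fence so the graphics queue only waits where it actually depends on the compute results:

// Editor's sketch (not Oxide code): async compute at the D3D12 API level.
// The compute PSO and root signature are assumed to exist already; error
// handling and object lifetime are omitted for brevity.
#include <windows.h>
#include <d3d12.h>

void SubmitAsyncCompute(ID3D12Device* device,
                        ID3D12CommandQueue* graphicsQueue,
                        ID3D12PipelineState* computePso,
                        ID3D12RootSignature* computeRootSig) {
    // A second queue of type COMPUTE; the GPU may overlap its work with
    // the graphics queue (on GCN, this is the path that feeds the ACEs).
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ID3D12CommandQueue* computeQueue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    ID3D12CommandAllocator* alloc = nullptr;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE,
                                   IID_PPV_ARGS(&alloc));
    ID3D12GraphicsCommandList* list = nullptr;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE, alloc,
                              computePso, IID_PPV_ARGS(&list));

    // A "dispatch call" in Oxide's batch terminology.
    list->SetComputeRootSignature(computeRootSig);
    list->Dispatch(64, 1, 1);
    list->Close();

    ID3D12CommandList* lists[] = { list };
    computeQueue->ExecuteCommandLists(1, lists);

    // Fence: the graphics queue waits only where it actually depends on
    // the compute results, instead of serializing the two queues.
    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    computeQueue->Signal(fence, 1);
    graphicsQueue->Wait(fence, 1);
}

Whether the two queues actually run concurrently is up to the hardware and driver, which is exactly the GCN-versus-Maxwell difference being discussed here.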
Kollack continues:

It didn't look like a hardware defect to me on Maxwell, just some unfortunate, complex interaction with software scheduling trying to emulate it, which appeared to incur some heavy CPU costs. Since we were trying to use it for #1, not #2, it made little sense to bother. I don't believe there is any specific requirement that async compute be supported for D3D12, but perhaps I misread the spec.

Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores, especially on graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle-limited, which is why you see greater relative performance at 4K, where the average triangle size is 4x the triangle size at 1080p.

I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post-processing. Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% in the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders. For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was one of the main motivating factors behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast; it was that they failed to map DX9 games to it well enough for it to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline is 100% implemented via compute shaders.
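To illustrate the "compute inside the graphics pipeline" idea, as opposed to the async case sketched above, here is one more rough sketch of my own (again, not Oxide's code; names are illustrative): a compute pass recorded inline on the ordinary direct queue, the way a lighting or post-process step might run as a compute shader, with a UAV barrier so later passes see its writes:

// Editor's sketch (not Oxide code): a compute pass recorded inline on the
// direct (graphics) queue. The PSO, root signature, and UAV target are
// assumed to exist; resource bindings are omitted for brevity.
#include <windows.h>
#include <d3d12.h>

void RecordComputePass(ID3D12GraphicsCommandList* cmdList,
                       ID3D12PipelineState* computePso,
                       ID3D12RootSignature* rootSig,
                       ID3D12Resource* uavTarget,
                       UINT width, UINT height) {
    cmdList->SetPipelineState(computePso);
    cmdList->SetComputeRootSignature(rootSig);

    // One thread per pixel; the 8x8 group size must match the shader's
    // [numthreads(8, 8, 1)] declaration.
    cmdList->Dispatch((width + 7) / 8, (height + 7) / 8, 1);

    // UAV barrier: later draws that read uavTarget must see the
    // dispatch's writes before they run.
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
    barrier.UAV.pResource = uavTarget;
    cmdList->ResourceBarrier(1, &barrier);
}

Nothing here overlaps with other GPU work; the point is simply that a chunk of the frame can be a Dispatch rather than a Draw, which is what the 20%-going-on-50% figure refers to.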
468 replies

Tagged with: async compute, maxwell (and 8 more)