
Star Swarm, DirectX 12 AMD APU Performance Preview

Opcode

Source

 


 

After several requests and a week’s break from our initial DirectX 12 article, we’re back again with an investigation into Star Swarm DirectX 12 performance scaling on AMD APUs. As our initial article was run on various Intel CPU configurations, this time we’re going to take a look at how performance scales on AMD’s Kaveri APUs, including whether DX12 is much help for the iGPU, and if it can help equalize the single-threaded performance gap between Kaveri and Intel’s Core i3 family.

 

To keep things simple, this time we’re running everything on either the iGPU or a GeForce GTX 770. Last week we saw how quickly the GPU becomes the bottleneck under Star Swarm when using the DirectX 12 rendering path, and how difficult it is to shift that back to the CPU. And as a reminder, this is an early driver on an early OS running an early DirectX 12 application, so everything here is subject to change.

 

[benchmark charts]

 

To get right down to business then, are AMD’s APUs able to shift the performance bottleneck onto the GPU under DirectX 12? The short answer is yes. Highlighting just how bad the single-threaded performance disparity between Intel and AMD can be under DirectX 11, what is a clear 50%+ lead for the Core i3 at Extreme and Mid qualities becomes a dead heat as all three CPUs are able to keep the GPU fully fed. DirectX 12 provides just the kick that the AMD APU setups need to overcome DirectX 11’s CPU submission bottleneck and push it onto the GPU. Consequently, at Extreme quality we see a 64% performance increase for the Core i3, but a 170%+ performance increase for the AMD APUs.
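The dead-heat result has a simple shape: the frame rate is capped by the slower of the GPU and the CPU's draw-call submission. Here's a minimal sketch of that idea with made-up numbers (not the article's data), assuming DX11 submission is effectively single-threaded while DX12 submission scales across cores:

```python
# Toy model: frame rate is limited either by GPU throughput or by how fast
# the CPU can submit draw calls. All figures below are hypothetical.

def fps(gpu_cap, per_core_submit, cores, threaded):
    """FPS limited by GPU cap or by CPU submission rate.

    Under a DX11-style path only one core submits; under a DX12-style
    path all cores contribute to submission.
    """
    cpu_cap = per_core_submit * (cores if threaded else 1)
    return min(gpu_cap, cpu_cap)

# A fast dual-core (i3-like) vs. a slower quad-core (APU-like), GPU capped at 40 fps:
i3_dx11  = fps(40, per_core_submit=26, cores=2, threaded=False)  # CPU-bound at 26
apu_dx11 = fps(40, per_core_submit=16, cores=4, threaded=False)  # CPU-bound at 16
i3_dx12  = fps(40, per_core_submit=26, cores=2, threaded=True)   # GPU-bound at 40
apu_dx12 = fps(40, per_core_submit=16, cores=4, threaded=True)   # GPU-bound at 40
```

With these illustrative numbers the i3 leads by 60%+ under the serial path, yet both land on the same GPU cap once submission is threaded, and the slower chip sees the far larger relative gain, which is the pattern the benchmarks show.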

 

It's nice to see DX12 working in AMD's favor in some cases. By these numbers the $80 Athlon X4 860K will be in a dead heat with the $130 Core i3-4330.

 

iGPU performance gain is really negligible, with unplayable frame rates. Though this is good news for budget-oriented gamers as DX12 games roll out (and scale as well).


It's nice to see DX12 working in AMD's favor in some cases. By these numbers the $80 Athlon X4 860K will be in a dead heat with the $130 Core i3-4330.

iGPU performance gain is really negligible, with unplayable frame rates. Though this is good news for budget-oriented gamers as DX12 games roll out (and scale as well).

Until software catches up to newer instruction sets; then AMD's current-gen chips will fall back again due to weaker ALU and FPU scheduling.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Until software catches up to newer instruction sets; then AMD will fall back again due to weaker ALU and FPU scheduling.

Then you don't have much faith in Jim Keller. AMD chips already support all of the same important instructions that Intel offers. Without evidence of how Zen operates and performs, your opinion doesn't have a leg to stand on. From what can be seen, DX12 games built today show that single-threaded performance is becoming less relevant for gaming.


Then you don't have much faith in Jim Keller. AMD chips already support all of the same important instructions that Intel offers. Without evidence of how Zen operates and performs, your opinion doesn't have a leg to stand on. From what can be seen, DX12 games built today show that single-threaded performance is becoming less relevant for gaming.

Clarification: OLDER AMD chips (current APUs) will drag behind. I have high expectations of Keller, though I don't think he'll really eclipse everything up to Haswell in one shot.

 

Support of instructions is not the issue. Everyone knows AMD's FPU especially is its weakest link, at almost 11 cycles for a 32-bit multiply due to having to check whether the other half of the module scheduled a 256-bit float instruction. Also, Haswell came out with a couple not found in Kaveri (and I don't know whether Carrizo has them or not). I'm just saying that once software starts to take advantage of instructions where AMD has a sore clock disadvantage, such as the MMX integer vector instructions, you'll see the performance gap widen, even if both progress forward to tolerable levels, unless of course all CPU-side bottlenecks are eliminated, leaving only CPU-GPU I/O and the GPU itself.

 

Therefore, I wouldn't say single-threaded performance is irrelevant just yet. Also, where are the FX 63xx and 83xx tests? It would be interesting to see how DX12 really scales with cores now, and Vishera is only a small step back in IPC from Steamroller.



Clarification: OLDER AMD chips (current APUs) will drag behind. I have high expectations of Keller, though I don't think he'll really eclipse everything up to Haswell in one shot.

 

Support of instructions is not the issue. Everyone knows AMD's FPU especially is its weakest link, at almost 11 cycles for a 32-bit multiply due to having to check whether the other half of the module scheduled a 256-bit float instruction. Also, Haswell came out with a couple not found in Kaveri (and I don't know whether Carrizo has them or not). I'm just saying that once software starts to take advantage of instructions where AMD has a sore clock disadvantage, such as the MMX integer vector instructions, you'll see the performance gap widen, even if both progress forward to tolerable levels, unless of course all CPU-side bottlenecks are eliminated, leaving only CPU-GPU I/O and the GPU itself.

 

Therefore, I wouldn't say single-threaded performance is irrelevant just yet. Also, where are the FX 63xx and 83xx tests? It would be interesting to see how DX12 really scales with cores now, and Vishera is only a small step back in IPC from Steamroller.

In that context so will the current Core i3 offerings.

 

The Bulldozer implementation of sharing two 128-bit FMACs isn't exactly optimal, though as shown they don't really play a huge role in game performance, especially in a game such as this, which relies heavily on floating-point operations for hundreds of objects. For an A8-7600 not to bottleneck what would still be considered a somewhat higher-end discrete GPU goes to show that even with sub-par core performance, AMD's offerings will likely spring to new life shortly with the launch of Windows 10 (Microsoft plans on launching a few DX12 games the same day). The days of buying Intel's offerings for a budget gaming rig (due to DirectX 9-11) are coming to an end. The tests conducted even show the APU having a much lower batch latency than the Core i3. It would have been nice for them to include CPU utilization numbers so we can get an idea of how far these cheap APUs can go in terms of pushing higher-end cards like the GTX 980 to their full potential.

 

Single-threaded performance isn't irrelevant, though it is becoming less relevant due to the multi-threaded nature of DirectX 12. Their main DirectX 12 review tests, I believe, a higher-end Intel chip (6 cores). Though it will be interesting to see where the FX-8350 stands in DirectX 12 testing.


In that context so will the current Core i3 offerings.

The Bulldozer implementation of sharing two 128-bit FMACs isn't exactly optimal, though as shown they don't really play a huge role in game performance, especially in a game such as this, which relies heavily on floating-point operations for hundreds of objects. For an A8-7600 not to bottleneck what would still be considered a somewhat higher-end discrete GPU goes to show that even with sub-par core performance, AMD's offerings will likely spring to new life shortly with the launch of Windows 10 (Microsoft plans on launching a few DX12 games the same day). The days of buying Intel's offerings for a budget gaming rig (due to DirectX 9-11) are coming to an end. The tests conducted even show the APU having a much lower batch latency than the Core i3. It would have been nice for them to include CPU utilization numbers so we can get an idea of how far these cheap APUs can go in terms of pushing higher-end cards like the GTX 980 to their full potential.

Single-threaded performance isn't irrelevant, though it is becoming less relevant due to the multi-threaded nature of DirectX 12. Their main DirectX 12 review tests, I believe, a higher-end Intel chip (6 cores). Though it will be interesting to see where the FX-8350 stands in DirectX 12 testing.

Bear in mind the batch latency could be driver-related. Also, remember the Haswell i3s have 4 ALUs per core vs. 2 (3?) in Steamroller, Intel's being the faster of the two. I think benchmarks just need to be updated so we don't let developers get away with staying behind.



What about the dual-core Pentiums?

I really hope that they get a performance boost as well, having recommended my friend a G3220 + R7 265 for a budget/light gaming/family computer. He's been happy with it for the past 3 months. I want that setup, combined with 8 gigs of RAM, to survive at least 2-3 years in most games at medium (eh, maybe low in some games) settings.

I am excited for DirectX 12. It looks like it's going to benefit everyone if used right by developers :)

(And guys, we need some more love for people who can't afford a decent gaming system. For most people with these kinds of systems, it wouldn't be a deal breaker to play at reduced settings as long as they play without stutter/lag. So I would love to see games run great on all ranges of hardware, from a $400 machine to a $2000 beast.)


What about the dual-core Pentiums?

I really hope that they get a performance boost as well, having recommended my friend a G3220 + R7 265 for a budget/light gaming/family computer. He's been happy with it for the past 3 months. I want that setup, combined with 8 gigs of RAM, to survive at least 2-3 years in most games at medium (eh, maybe low in some games) settings.

I am excited for DirectX 12. It looks like it's going to benefit everyone if used right by developers :)

(And guys, we need some more love for people who can't afford a decent gaming system. For most people with these kinds of systems, it wouldn't be a deal breaker to play at reduced settings as long as they play without stutter/lag. So I would love to see games run great on all ranges of hardware, from a $400 machine to a $2000 beast.)

You'll hit a GPU wall long before the CPU becomes an issue. Basically what you see when using Mantle is what you can come to expect out of DirectX 12 titles.


Nice, does anyone know what the DX12 implications are for the cheaper dual-core Pentiums?

Grammar and spelling is not indicative of intelligence/knowledge.  Not having the same opinion does not always mean lack of understanding.  


What about the dual-core Pentiums?

I really hope that they get a performance boost as well, having recommended my friend a G3220 + R7 265 for a budget/light gaming/family computer. He's been happy with it for the past 3 months. I want that setup, combined with 8 gigs of RAM, to survive at least 2-3 years in most games at medium (eh, maybe low in some games) settings.

I am excited for DirectX 12. It looks like it's going to benefit everyone if used right by developers :)

(And guys, we need some more love for people who can't afford a decent gaming system. For most people with these kinds of systems, it wouldn't be a deal breaker to play at reduced settings as long as they play without stutter/lag. So I would love to see games run great on all ranges of hardware, from a $400 machine to a $2000 beast.)

snap :)



Nice, does anyone know what the DX12 implications are for the cheaper dual-core Pentiums?

As stated above, if you can dig up Mantle numbers in comparison to DirectX 11 using a dual-core Pentium, then that's probably your best-case scenario.

 

 

With the API now being threaded, the paradigm has shifted to four slow cores > two fast cores. Though you could expect some extra performance with CPU cycles being cut down.
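The "four slow cores > two fast cores" shift can be sketched with Amdahl's law: the serial part of a frame runs on one core, the parallel part on all of them. The per-core speeds and parallel fractions below are made-up illustrative figures, not measurements:

```python
def throughput(per_core_speed, cores, parallel_fraction):
    """Amdahl-style throughput: the serial fraction runs on one core,
    the parallel fraction is split across all cores."""
    serial = (1 - parallel_fraction) / per_core_speed
    parallel = parallel_fraction / (per_core_speed * cores)
    return 1.0 / (serial + parallel)

# A mostly-serial DX11-style workload favors two fast cores...
fast2_dx11 = throughput(1.5, 2, parallel_fraction=0.3)
slow4_dx11 = throughput(1.0, 4, parallel_fraction=0.3)

# ...while a heavily threaded DX12-style workload favors four slow cores.
fast2_dx12 = throughput(1.5, 2, parallel_fraction=0.95)
slow4_dx12 = throughput(1.0, 4, parallel_fraction=0.95)
```

The crossover depends entirely on how much of the frame the API lets developers parallelize, which is exactly the knob DX12's threaded submission turns.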


The funny thing is that he hasn't got any Mantle-supported titles. :D He's new to "proper" PC gaming really; his old machine (the one he was using before I built this one for him) had a Pentium 4, something like a GeForce 5200, and 1 gig of RAM, LOL.

It also sounded like a jet engine. Also, I am thankful that the old machine's PSU didn't blow up and set his house on fire :)

And yes, he seriously used that machine until Q4 2014. Just saying so you can guess his standards. :)

And, thanks, good to know that nothing will bottleneck and it will be similar to Mantle performance. I guess DX12 will be much more common, considering Mantle is AMD-only and DX12 supports both AMD and Nvidia. (Actually the GPUs support the API, but never mind.)

And... Damn! Finally we may get builds like these to hit the performance level of consoles, with the openness of the PC. I think an R7 265 is about the performance of the APU inside the PS4, and a G3220 should be close to, if not better than, the APU's CPU performance. Also, 8 GB DDR3 and 2 GB GDDR5 means more memory than the consoles. HOORAY!

Of course, it's just my theory that all the upcoming games will be perfectly optimized for PC. If that happens, that'll mean functionality other than gaming at a similar price, and cheaper games through Steam, Humble Bundle, etc.

HOORAY! OPTIMIZE YOUR GAMES YOU LAZY DEVELOPERS!

TL;DR I like DirectX 12 and I am sorry for posting walls of text.


And, thanks, good to know that nothing will bottleneck and it will be similar to Mantle performance. I guess DX12 will be much more common, considering Mantle is AMD-only and DX12 supports both AMD and Nvidia. (Actually the GPUs support the API, but never mind.)

APUs are dead. The iGP won't hold out for long, and there's no point going with APUs from a budget perspective, considering you can get i3s/the G3258 for cheaper.

 

 

Bear in mind the batch latency could be driver-related. Also, remember the Haswell i3s have 4 ALUs per core vs. 2 (3?) in Steamroller, Intel's being the faster of the two. I think benchmarks just need to be updated so we don't let developers get away with staying behind.

That's just within margin of error.

[benchmark chart]


The funny thing is that he hasn't got any Mantle-supported titles. :D He's new to "proper" PC gaming really; his old machine (the one he was using before I built this one for him) had a Pentium 4, something like a GeForce 5200, and 1 gig of RAM, LOL.

It also sounded like a jet engine. Also, I am thankful that the old machine's PSU didn't blow up and set his house on fire :)

And yes, he seriously used that machine until Q4 2014. Just saying so you can guess his standards. :)

And, thanks, good to know that nothing will bottleneck and it will be similar to Mantle performance. I guess DX12 will be much more common, considering Mantle is AMD-only and DX12 supports both AMD and Nvidia. (Actually the GPUs support the API, but never mind.)

And... Damn! Finally we may get builds like these to hit the performance level of consoles, with the openness of the PC. I think an R7 265 is about the performance of the APU inside the PS4, and a G3220 should be close to, if not better than, the APU's CPU performance. Also, 8 GB DDR3 and 2 GB GDDR5 means more memory than the consoles. HOORAY!

Of course, it's just my theory that all the upcoming games will be perfectly optimized for PC. If that happens, that'll mean functionality other than gaming at a similar price, and cheaper games through Steam, Humble Bundle, etc.

HOORAY! OPTIMIZE YOUR GAMES YOU LAZY DEVELOPERS!

TL;DR I like DirectX 12 and I am sorry for posting walls of text.

The PS4 has 1152 shaders, so a bit more GPU grunt behind it (depending on clock frequencies). The G3220 will be better in the desktop segment because of DirectX 11. Though once you move to DirectX 12, the 860K will run rings around the Pentium (including a massively overclocked G3258). If you plan on building a rig like his, one thing I would recommend is investing a little extra in the G3258 over the G3220, as the overclocking potential is worth the extra $10. That, or spend the extra $10 and get an 860K for "future proofing", as these early tests show that the 860K can handle much stronger cards such as the GTX 770 in DirectX 12 titles. The G3258 chokes in modern games because of its lack of threads. So if you're building a machine to last 3+ years on a budget, the 860K is the better option at a tiny bit of added cost. The R7 265 isn't exactly a massive card to begin with, so even the 860K will push it to its performance wall in DirectX 11 titles.

 

APUs are dead. The iGP won't hold out for long, and there's no point going with APUs from a budget perspective, considering you can get i3s/the G3258 for cheaper.

He's planning on running a similar setup to his buddy's (discrete R7 265 with a G3220).


Until software catches up to newer instruction sets; then AMD's current-gen chips will fall back again due to weaker ALU and FPU scheduling.

What instruction sets are you referring to?

 

 

Clarification: OLDER AMD chips (current APUs) will drag behind. I have high expectations of Keller, though I don't think he'll really eclipse everything up to Haswell in one shot.

 

Support of instructions is not the issue. *Everyone knows AMD's FPU especially is its weakest link, at almost 11 cycles for a 32-bit multiply due to having to check whether the other half of the module scheduled a 256-bit float instruction. Also, Haswell came out with a couple not found in Kaveri (and I don't know whether Carrizo has them or not). **I'm just saying that once software starts to take advantage of instructions where AMD has a sore clock disadvantage, such as the MMX integer vector instructions, you'll see the performance gap widen, even if both progress forward to tolerable levels, unless of course all CPU-side bottlenecks are eliminated, leaving only CPU-GPU I/O and the GPU itself.

 

Therefore, I wouldn't say single-threaded performance is irrelevant just yet. Also, where are the FX 63xx and 83xx tests? It would be interesting to see how DX12 really scales with cores now, and ***Vishera is only a small step back in IPC from Steamroller.

* The shared FPU has never been the Bulldozer design's biggest issue.

Remember, the CPU works ahead, so this latency can quickly become irrelevant unless you are working with very tight execution.
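The latency-hiding point can be made concrete with a toy cycle count, assuming a fully pipelined unit that can start one multiply per cycle (an illustrative simplification, not a model of any specific chip):

```python
def cycles(n_ops, latency, dependent):
    """Cycle count for n_ops multiplies on a single-issue, fully pipelined unit.

    Independent ops can start every cycle, so latency is paid only once at the
    end; a dependent chain must wait out the full latency of each op.
    """
    return n_ops * latency if dependent else n_ops + latency - 1

# 1000 multiplies at the 11-cycle latency discussed above:
chain = cycles(1000, 11, dependent=True)         # 11000 cycles
independent = cycles(1000, 11, dependent=False)  # 1010 cycles
```

So the raw latency figure dominates only when results feed directly into the next operation; with enough independent work in flight, throughput, not latency, sets the pace.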

 

**Adoption of newer instructions has been rather slow. Why would one use MMX over something like SSE?

***Well, they actually made quite the IPC improvement with Steamroller; however, it came with lower frequencies, which negated the IPC benefit.

 

 

Bear in mind the batch latency could be driver related. Also, remember the Haswell I3 have 4 ALUs per core vs. 2(3?) in Steamroller, Intel's being the faster of the two. I think Benchmarks just need to be updated and not let developers get away with staying behind.

The number of ALUs can be irrelevant. What matters is how many they can actively use by exploiting the superscalar architecture.


What I see in these threads:

 

Don't worry game developers, DX12 and Mantle will optimize your games for you.



OP, you get an up vote for bravery.

The Internet is the first thing that humanity has built that humanity doesn't understand, the largest experiment in anarchy that we have ever had.


Wait a minute...hasn't this already been posted multiple times?

And already been on the WAN show?

Specs: 4790k | Asus Z-97 Pro Wifi | MX100 512GB SSD | NZXT H440 Plastidipped Black | Dark Rock 3 CPU Cooler | MSI 290x Lightning | EVGA 850 G2 | 3x Noctua Industrial NF-F12's

Bought a powermac G5, expect a mod log sometime in 2015

Corsair is overrated, and Anime is ruined by the people who watch it


 

-snip- (I guess you shouldn't quote huge posts in this forum)

I wasn't talking about APUs. APUs for gaming don't really make sense to me when there are combinations like a G3220 + R7 260X (which costs about $20 more) that perform better (I guess; I can't really dig up how many shaders the R7 260X has right now). But well, I agree with you in the end. :D If you want to go AMD, you take an 860K, I guess.

-snip again-

Thanks for your detailed comment, bud. :) I have already built the machine; he's using it right now. It was built before the end of 2014, around 3-4 months ago.

And yeah, the G3258 is worth the 10 bucks. Also, the 860K is probably going to be better, as you said, because of multithreading. But when I built the PC for him, this new boy the 860K wasn't out yet (I guess its CPU cores use a newer architecture than the 760K's), and the G3258 was not $10 but $20 more expensive. Also, it was my first ever build (I mean, the first build that I have built PHYSICALLY; I have put builds together in the past for people on the tomshardware.com forums :) ), and I wasn't really sure whether the motherboard I selected (the Asus H81M-D, a great budget board by the way, at least for the price I bought it at, and easy to work with, except they send you the I/O shield in a weird shape so you need to flatten it out YOURSELF, LOL) would support overclocking for the G3258, so I said "Damn it, let's play it safe. This guy has an ancient PC, so it won't matter much to him anyway."

So yeah, I have some (many) excuses for selecting the G3220 instead of the G3258 or the 860K. Enough of my friend's ultra-budget stories anyway :lol: I think he's gonna be happy for at least 2-3 years if game devs don't go completely crazy and minimum specs skyrocket.

And, for myself, I would say I wouldn't put together a system with a Pentium if possible (I have a Core 2 Duo E7500 right now, basically old stuff; also yeah, I plan to build for myself too :) You guessed right :D ). I would aim for at least a Haswell i3 (or one of those fancy AMD "quad-core" Athlons, if they fare well enough in benchmarks). But, having used a $120 CPU for the last 5 years (the E7500 came out in Q1 2009, but I bought the system around the end of 2009, then upgraded the GPU, HDD, and stuff anyway; if you want to see my ancient dragon [more like a baby dragon], go check out my profile page), I want to get a "locked" i5. Right now, I am waiting for Broadwell/Skylake.

The thing I heard about Broadwell is that it's the usual "efficiency" tick. Performance won't improve more than 5%, as some people say. Skylake, on the other hand, will feature completely new chipsets and a new architecture, and possibly big performance gains. But my hunger for a new PC grows every day, so I might just go Broadwell i5/H97 if I get the money around that time. Skylake for desktop is coming around in 2016, as I heard.

TL;DR Thanks for your post. I have built the PC already. He's happy with it, and I think he will be happy for at least 2 more years, considering he's coming from a 10-year-old PC. His most demanding game right now is BF3, and he plays at 1440x900 anyway. And for myself, I want to get at least an i3, or more favorably a locked i5.

(Sorry for the humongous amount of text)


What instruction sets are you referring to?

 

 

* The shared FPU has never been the Bulldozer design's biggest issue.

Remember, the CPU works ahead, so this latency can quickly become irrelevant unless you are working with very tight execution.

 

** Adoption of newer instructions has been rather slow. Why would one use MMX over something like SSE?

*** Well, they actually made quite the IPC improvement with Steamroller; however, it came with lower frequencies, which negated the IPC benefit.

 

 

The number of ALUs can be irrelevant. What matters is how many they can actively use by exploiting the superscalar architecture.

 

Mainly the vector and multi-precision instructions, which include SSE.

 

The CPU can only work so far ahead and predict so many branches correctly. AMD's branch predictor is also quite inferior to Intel's, at about 72% accuracy, and its pipelines aren't as deep. Those latencies only disappear in situations of long linearity (a lack of branching for long periods).
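The cost of predictor accuracy can be sketched with the standard average-CPI formula; the branch frequency and flush penalty below are assumed round numbers for illustration, and only the 72% figure comes from the post:

```python
def avg_cpi(base_cpi, branch_freq, predictor_accuracy, miss_penalty):
    """Average cycles per instruction once branch-miss flush stalls are charged."""
    return base_cpi + branch_freq * (1 - predictor_accuracy) * miss_penalty

# One branch per 5 instructions, 15-cycle flush penalty (assumed figures):
weak   = avg_cpi(1.0, 0.2, 0.72, 15)  # roughly 1.84 CPI
strong = avg_cpi(1.0, 0.2, 0.95, 15)  # roughly 1.15 CPI
```

Even modest accuracy differences compound quickly in branchy code, which is why the predictor matters more than raw ALU count in many workloads.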

 

It is among the top 5 design problems and is the reason AVX sucks on Bulldozer and its derivative architectures.

 

The adoption is literally an update of compilers and the use of an optimization flag. This slow adoption rate is senseless; it's purpose-dependent.

 

Eh, not really. Even with normalized clocks the difference is ~12%, nothing astounding. Carrizo looks to be a larger clock reduction, but HDL supposedly is bringing them a 20% IPC increase. It'll be interesting to see if it's true.
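The IPC-versus-clock wash is simple arithmetic, since single-thread throughput is proportional to IPC times clock. The clock figures below are hypothetical; only the ~12% IPC delta comes from the discussion:

```python
def perf(ipc, clock_ghz):
    """Single-thread throughput, proportional to IPC x clock frequency."""
    return ipc * clock_ghz

# A ~12% IPC gain paired with a ~10% clock drop is close to a wash
# (hypothetical clocks, not measured parts):
older = perf(1.00, 4.1)
newer = perf(1.12, 3.7)
```

This is why an IPC improvement on its own says little: if the process or design forces clocks down by a similar ratio, the product of the two barely moves.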

 

They wouldn't just dump ALUs onto their cores if they couldn't use them actively, hence Intel having 4 to AMD's 2 (3?).



Mainly the vector and multi-precision instructions, which include SSE.

 

*The CPU can only work so far ahead and predict so many branches correctly. AMD's branch predictor is also quite inferior to Intel's, at about 72% accuracy, and its pipelines aren't as deep. Those latencies only disappear in situations of long linearity (a lack of branching for long periods).

 

**It is among the top 5 design problems and is the reason AVX sucks on Bulldozer and its derivative architectures.

 

***The adoption is literally an update of compilers and the use of an optimization flag. This slow adoption rate is senseless; it's purpose-dependent.

 

****Eh, not really. Even with normalized clocks the difference is ~12%, nothing astounding. Carrizo looks to be a larger clock reduction, but HDL supposedly is bringing them a 20% IPC increase. It'll be interesting to see if it's true.

 

*****They wouldn't just dump ALUs onto their cores if they couldn't use them actively, hence Intel having 4 to AMD's 2 (3?).

* We were discussing a latency of 11 cycles. This latency can be at least halved by prefetch and pre-execution. Only in complex branches will AMD's branch predictor fall behind. In most cases, it will predict the branch right and continue prefetching instructions.

So the latency would only become apparent in complex branches, and only if you are running tight executions would it have a negative effect.

** Design problems? No. There are far worse design problems which have a worse or equal effect on performance with no actual gains.

For 90% of workloads it really doesn't matter.

*** I should have clarified: most people don't see any benefit past SSE2.

 

**** You have to look at it per module, not per "core".

 

***** K10 featured 3 ALUs; the third was almost never used. Intel uses SMT, which is the only reason for adding the fourth ALU to their architecture. AMD has 4 ALUs per module.

