AMD Zen Architecture Deep Dive

Bananasplit_00 · February 19, 2017

1 minute ago, N3v3r3nding_N3wb said:

Yeah, I got it when it was only like $240 (It is up to $260 now on Newegg). I have no idea. I have heard that before, but I've never explored it. I have precisely 0 experience with building computers, just a whole lot of research, so I probably am not going to try anything advanced like that.

Understandable. I'm going to try it once I get my Fury; they're like $230 now on Amazon, lol, but it might be a different version. Either way, it's going to be a while until I can afford it if I'm getting a new CPU and motherboard soon. From what I've seen of Swedish pricing, I might not be able to get a 1700X and a decent B370 motherboard, but hey, if Ryzen sucks I'll be able to get a 1070 instead, lol.

randomhkkid · February 19, 2017

Gosh, that's heavy stuff. I've only got partway through, but essentially AMD's micro-op implementation and cache hierarchy now closely resemble Intel's 'Core' lineup of chips.

Micro-ops are like instructions that the CPU can carry out without wasting time 'fetching' full instructions from higher cache levels. Unlike Intel, AMD seems to be keeping the integer and floating-point parts of the CPU separate, possibly meaning more throughput with mixed instructions.

Branch prediction means that the next instructions are predicted and fetched ahead of time, speeding things up if the prediction is correct but incurring losses if it is wrong.

At a basic level, AMD is going for a 'wide' core. This means betting on instruction-level parallelism rather than relying on high frequency. Despite this, they seem to have fairly high clock speeds even on their bigger chips, which is fairly surprising, as efficiency may suffer.
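To make the 'wide core' idea concrete: a wide out-of-order core can only use its extra execution units when instructions don't depend on each other. Here's a minimal C sketch of that (my own illustration, nothing AMD published): summing through one accumulator forms a single dependency chain, while four independent accumulators give the core four chains it can work on side by side. Both return the same total; how much faster the second runs depends on the core and compiler (which may rearrange integer sums itself), so no timings are claimed.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the previous add,
   so these additions must execute one after another. */
uint64_t sum_one_chain(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: four independent dependency chains that a
   wide core can execute in parallel on separate ALUs. */
uint64_t sum_four_chains(const uint64_t *a, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```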

The L3 cache on Zen is actually split, unlike Intel's. This means only 4 cores will 'see' what each other are doing at a time: in effect, an 8-core CPU has 2 sets of 4 cores, each set sharing its own cache. Despite AMD marketing it as 2MB per core, this is a disadvantage compared to the Intel architecture. At the same time, they are putting more, and faster, lower-level cache closer to the cores, meaning improved performance in single-core workloads.
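A toy model of that layout, assuming (as above) 4 cores per complex and 2 MB of L3 per core, so an 8-core part ends up with two separate 8 MB L3 slices. The consecutive core numbering (cores 0-3 in the first complex, 4-7 in the second) is an assumption for illustration; the OS may number logical CPUs differently, especially with SMT.

```c
#include <stdbool.h>

#define CORES_PER_CCX 4    /* assumption from the thread: 4 cores share one L3 slice */
#define L3_MB_PER_CORE 2   /* the "2 MB per core" figure */

/* Which core complex (CCX) a core belongs to, assuming consecutive
   numbering: cores 0-3 -> CCX 0, cores 4-7 -> CCX 1, ... */
int ccx_of(int core) {
    return core / CORES_PER_CCX;
}

/* Two cores share an L3 slice only if they sit in the same CCX. */
bool shares_l3(int core_a, int core_b) {
    return ccx_of(core_a) == ccx_of(core_b);
}

/* L3 local to any one core: its own CCX's slice. Data in the other
   slice means going across to the other complex. */
int l3_mb_local_to_core(void) {
    return CORES_PER_CCX * L3_MB_PER_CORE;
}

/* Total L3 on the chip, summed over all complexes. */
int l3_mb_total(int cores) {
    return cores * L3_MB_PER_CORE;
}
```

So with these assumptions, an 8-core chip has 16 MB of L3 in total, but only 8 MB of it is local to any given core.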

Just a general overview of what I skim-read. I may have got things wrong, so please feel free to point them out.

Another great article from August '16 by the PhD-holding expert Ian Cutress of AnandTech

leadeater · February 20, 2017

1 hour ago, randomhkkid said:

-snip-

Regarding the L3 cache, the article you linked from AnandTech has been updated and gives a good reason why the L3 cache isn't unified.

Quote

This would mean 2 MB/core, but it also implies that there is no last-level unified cache in silicon across all cores, which Intel has. The reasons behind something like this are typically to do with modularity, and being able to scale a core design from low core counts to high core counts. But it would still leave a Zen core with the same L3 cache per core as Intel.

So pluses and minuses, like with most things.

I'll be interested to see specific tests to try and find the impact of this and by how much.

The other takeaway is that, compared to Intel, AMD has scaled back certain instruction sets that are less applicable to the desktop market, to save cost and simplify the design.
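Since instruction-set support can differ between vendors and generations, software generally checks at runtime instead of assuming. A minimal sketch using the GCC/Clang builtin `__builtin_cpu_supports` (x86 builds only); the three feature names are just examples, and this snippet doesn't claim which of them any particular Zen chip lacks.

```c
#include <stdio.h>

/* Print one feature's status; return 1 if supported, 0 otherwise. */
static int report(const char *name, int supported) {
    printf("%-8s %s\n", name, supported ? "yes" : "no");
    return supported ? 1 : 0;
}

/* Count how many of a few example x86 SIMD extensions the running CPU
   reports, so code can choose a fast path only when it really exists. */
int simd_features_supported(void) {
    int count = 0;
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  /* harmless here; only required in code that runs before constructors */
    count += report("sse4.2", __builtin_cpu_supports("sse4.2"));
    count += report("avx2", __builtin_cpu_supports("avx2"));
    count += report("avx512f", __builtin_cpu_supports("avx512f"));
#else
    puts("not an x86 build; these checks do not apply");
#endif
    return count;
}
```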

Tomsen · February 21, 2017

On 2/19/2017 at 3:26 PM, DocSwag said:

I've found it quite interesting, even if I didn't understand all of it. It seems, though, that Zen won't be very suitable for HPC; perhaps they did this to save die space in order to maximize profits on consumer and server CPUs. As well, it seems Zen's IPC would be between Ivy Bridge and Haswell. I'm honestly not surprised: there have only been a limited number of official benchmarks from AMD, and since they were the ones showing them off, I wouldn't be surprised if they were cherry-picked. However, even if it's only Ivy Bridge IPC, Zen could still very well shake up the whole CPU market.

It actually makes sense why AMD didn't go all in on HPC with their CPU. That is NOT their plan. They are working towards an HPC APU. There's no reason to have extremely wide SIMD paths in their CPU architecture if that work is meant to be done on the integrated GPU. Remember, a GPU is *basically* just an array of SIMD clusters. HSA exascale APU incoming!

Also, IPC is not a static value. It might have Haswell IPC in some workloads, but lower or higher in others.

On 2/19/2017 at 7:58 PM, DocSwag said:

The thing to remember, though, is that they've said Zen is bad for HPC, where I think integer performance is important, and they're comparing integer IPC, so perhaps FPU performance or something else will make up for it. But then again, I'm just guessing.

HPC is all about those wide SIMD paths.
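To illustrate what a 'SIMD path' means: one instruction operates on several values at once, and a wider path handles more values per instruction. A minimal sketch using GCC/Clang vector extensions (a compiler feature, not standard C) with a 4-float, 128-bit vector; 256-bit AVX registers would hold 8 floats. The width chosen here is arbitrary, just for illustration.

```c
#include <stddef.h>
#include <string.h>

/* 16 bytes = 4 floats: one vector add performs 4 float adds. */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar version: one addition per element. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version: 4 elements per vector addition, scalar tail for the rest. */
void add_simd4(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4f va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* memcpy avoids alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                    /* element-wise add across 4 lanes */
        memcpy(out + i, &vr, sizeof vr);
    }
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```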

On 2/19/2017 at 11:39 PM, randomhkkid said:

-snip-

Micro-operations are the internal instructions that the back end of the CPU runs. The front end (fetch and decode) takes the x86 instruction and translates it into micro-operations that the back end understands (since the back end doesn't understand x86 directly).
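A toy illustration of that translation step, heavily simplified and hypothetical: a read-modify-write x86 instruction such as `add [rbx], eax` is commonly described as splitting into a load, an ALU add, and a store. The enum and struct names are made up for the example; real decoders (Zen's included) use their own internal formats and may keep some of these steps fused.

```c
#include <stddef.h>

/* Hypothetical micro-op kinds for this toy model. */
typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;

typedef struct {
    uop_kind kind;
    const char *desc;
} uop;

/* Toy "decoder" for a single instruction form: add [mem], reg.
   Writes the micro-ops into out[] and returns how many it produced,
   or 0 if the caller's buffer is too small. */
size_t decode_add_mem_reg(uop *out, size_t cap) {
    static const uop seq[] = {
        { UOP_LOAD,  "tmp <- load [mem]"  },
        { UOP_ADD,   "tmp <- tmp + reg"   },
        { UOP_STORE, "store [mem] <- tmp" },
    };
    size_t n = sizeof seq / sizeof seq[0];
    if (cap < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = seq[i];
    return n;
}
```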

Branch prediction is actually rather simple. It tries to predict which way the instruction flow will go before the condition is resolved.

For example, if you have:

if (x < y) {
    // do something
} else {
    // do something else
}

So before the processor actually resolves whether x < y, it will already fetch AND execute one of the branches. This is called speculative execution. If it ends up mispredicting, of course, it has to discard that work and start on the other branch. The point is to avoid stalling while the condition is being resolved.
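To show the 'predict first, check later' idea, here's a textbook 2-bit saturating-counter predictor for a single branch. This is a classic teaching scheme, not Zen's actual predictor (real ones, Zen's included, are far more elaborate). It counts how often a guess made before the condition is resolved turns out wrong; each miss is where the speculative work would be thrown away.

```c
#include <stdbool.h>
#include <stddef.h>

/* States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
   It takes two wrong outcomes in a row to flip a strong prediction. */
typedef struct { unsigned char state; } predictor2;

bool predict(const predictor2 *p) {
    return p->state >= 2;
}

void update(predictor2 *p, bool taken) {
    if (taken && p->state < 3)
        p->state++;
    else if (!taken && p->state > 0)
        p->state--;
}

/* Run the predictor over the real outcomes of a branch (e.g. whether
   x < y was true each time) and count mispredictions. */
size_t count_mispredicts(const bool *outcomes, size_t n) {
    predictor2 p = { 1 };  /* start weakly "not taken" */
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (predict(&p) != outcomes[i])
            misses++;      /* wrong guess: speculative work would be discarded */
        update(&p, outcomes[i]);
    }
    return misses;
}
```

A branch that almost always goes the same way costs only a miss or two, while a strictly alternating branch defeats this simple scheme on every iteration.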

With regard to AMD's CPU complex, it might also have some benefits. Lookup times can be greatly reduced if the data is within the complex; this is really going to show with the 32-core monster. I wonder what penalties there are for cross-complex lookups. They probably have some sort of OS support to avoid these kinds of scenarios:

[Complex 0] (cores 0, 1, 2) is in use and you launch a new application that runs 2 threads. The threads get placed on [Complex 0] (core 3) and [Complex 1] (core 0). I imagine there must be some penalty there.

Or you launch an application that runs 5+ threads: how exactly will they avoid the potential cross-communication penalty?
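One way software could handle the scenario above: before placing a new application's threads, look for a single complex with enough free cores, and only spill across complexes if none has room. A minimal sketch assuming 4 cores per complex and global consecutive numbering (so "Complex 1, core 0" above is core 4 here). The function and its policy are hypothetical, not how Windows or Linux actually schedule Zen; on Linux, the chosen cores could then be applied with `sched_setaffinity` or `pthread_setaffinity_np`.

```c
#include <stdbool.h>
#include <stddef.h>

#define CORES_PER_CCX 4   /* assumption: 4 cores per complex, numbered consecutively */

/* busy[c] is true if core c is already in use. Picks `want` free cores
   for a new application and writes their indices into out[].
   Prefers one CCX that can hold all the threads; if none can, it takes
   the first free cores in order and accepts cross-complex traffic.
   Returns the number of cores chosen. */
size_t place_threads(const bool *busy, int ncores, int want, int *out) {
    int nccx = ncores / CORES_PER_CCX;
    for (int x = 0; x < nccx; x++) {
        int first = x * CORES_PER_CCX, last = first + CORES_PER_CCX;
        int free_here = 0;
        for (int c = first; c < last; c++)
            if (!busy[c])
                free_here++;
        if (free_here >= want) {
            size_t k = 0;
            for (int c = first; c < last && (int)k < want; c++)
                if (!busy[c])
                    out[k++] = c;
            return k;
        }
    }
    /* No single complex fits: fall back to any free cores. */
    size_t k = 0;
    for (int c = 0; c < ncores && (int)k < want; c++)
        if (!busy[c])
            out[k++] = c;
    return k;
}
```

With cores 0-2 busy, a 2-thread app lands on cores 4 and 5 (one complex) instead of core 3 plus core 4; a 5-thread app can't fit in any 4-core complex, so it spills across both no matter what.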

DocSwag · February 21, 2017

1 hour ago, Tomsen said:
-snip-

Also, IPC is not a static value. It might have Haswell IPC in some workloads, but lower or higher in others.

-snip-

Yeah, I know IPC isn't static. I was just referring to what the IPC would be around on average across workloads.

Tomsen · February 22, 2017

5 hours ago, DocSwag said:

Yeah, I know IPC isn't static. I was just referring to what the IPC would be around on average across workloads.

Oh OK, as long as you don't claim IPC can't exceed 1, I'm happy.
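Why IPC can exceed 1: it's just retired instructions divided by clock cycles, and a superscalar core can retire several instructions per cycle. A tiny helper; the counter values in any example are made up, and real ones would come from hardware performance counters (e.g. via a profiler).

```c
#include <stdint.h>

/* Instructions per cycle = instructions retired / core clock cycles.
   Returns 0.0 if no cycles were counted, to avoid dividing by zero. */
double ipc(uint64_t instructions, uint64_t cycles) {
    return cycles ? (double)instructions / (double)cycles : 0.0;
}
```

For example, 3,000,000 instructions retired in 1,000,000 cycles is an IPC of 3.0.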

Castdeath97 · February 22, 2017

On 19/02/2017 at 5:00 PM, MageTank said:

The biggest hurdle Zen has at the moment is the extremely high overclocking headroom Kaby Lake offers. People seem overly obsessed with the "5 GHz" number, and Kaby seems quite capable of achieving it. If Zen faces a 5-10% IPC deficit as you say, then another 10% clock deficit on top of that might not bode too well for gamers. Though, you are right when it comes to potential cost: if Zen is cheap enough, that performance deficit won't really matter. Besides, CPU clock speed shouldn't be that big of a deal for gamers these days. Get a better monitor and make your GPU the bottleneck, lol.

Well, since the i3 7350K at 4.8 GHz is at times getting the same performance as an i5 7400, even though the i5 runs at significantly lower clocks, I wouldn't worry too much about overclocking or clock speeds if AMD can give us a good core count. After all, most Intel CPUs under 170 USD can't even overclock in the first place, so maybe AMD can nail it, particularly in the sub-200 USD market.
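Rough arithmetic for the quoted worry, assuming single-thread performance scales as IPC × clock speed (a simplification that ignores memory, turbo behavior and everything else): a 5-10% IPC deficit combined with a 10% clock deficit compounds to roughly a 14.5-19% single-thread gap. The percentages are the ones from the quote, not measurements.

```c
/* Relative single-thread performance of chip A vs chip B, assuming
   performance = IPC * clock. Inputs are A/B ratios, so 0.9 means
   "10% lower". */
double relative_perf(double ipc_ratio, double clock_ratio) {
    return ipc_ratio * clock_ratio;
}

/* Percentage deficit of A relative to B (positive = A is slower). */
double deficit_percent(double ipc_ratio, double clock_ratio) {
    return (1.0 - relative_perf(ipc_ratio, clock_ratio)) * 100.0;
}
```

For well-threaded workloads, core count also enters the picture, which is the point of the reply about core counts.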
