
AMD Zen Architecture Deep Dive

DocSwag
1 minute ago, N3v3r3nding_N3wb said:

Yeah, I got it when it was only like $240 (it's up to $260 now on Newegg). I have no idea. I have heard that before, but I've never explored it. I have precisely zero experience with building computers, just a whole lot of research, so I probably am not going to try anything advanced like that.

Understandable. I'm going to try it once I get my Fury; they're like $230 now on Amazon lol, but it might be a different version. Either way, it's going to be a while until I can afford it if I'm getting a new CPU and motherboard soon. From what I've seen of Swedish pricing, I might not be able to get a 1700X and a decent B370 motherboard, but hey, if Ryzen sucks I'll be able to get a 1070 instead lol

I spent $2500 building my PC, and all I do with it is play no games atm, watch anime at 1080p (finally), watch YT and write essays... nothing, it just sits there collecting dust...

Builds:

The Toaster Project! Northern Bee!

 

The original LAN PC build log! (Old, dead and replaced by The Toaster Project & 5.0)


"Here is some advice that might have gotten lost somewhere along the way in your life. 

 

#1. Treat others as you would like to be treated.

#2. It's best to keep your mouth shut; and appear to be stupid, rather than open it and remove all doubt.

#3. There is nothing "wrong" with being wrong. Learning from a mistake can be more valuable than not making one in the first place.

 

Follow these simple rules in life, and I promise you, things magically get easier. " - MageTank 31-10-2016

 

 


Gosh, that's heavy stuff. I've only got partway through, but essentially AMD's micro-op implementation and cache hierarchy now closely resemble Intel's 'Core' lineup of chips.

 

Micro-ops are like instructions that the CPU can carry out without wasting time 'fetching' full instructions from higher cache levels. Unlike Intel, AMD seem to be keeping the integer and floating-point parts of the CPU separate, possibly meaning more throughput with mixed instructions.

 

Branch prediction means that the next instructions are predicted and fetched ahead of time, speeding up operations if the prediction is correct but incurring losses if it is wrong.

 

At a basic level, AMD is going for a 'wide' core: this means betting on instruction-level parallelism rather than relying on high frequency. Despite this, they seem to have fairly high clock speeds even on their bigger chips, which is fairly surprising, as efficiency may suffer.
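A rough way to see the 'wide core vs. high frequency' trade-off: single-thread throughput scales roughly with IPC × clock speed. A minimal sketch in C; the function names and the IPC/clock figures below are made up for illustration, not measured Zen or Intel numbers.

```c
#include <assert.h>

/* Rough single-thread throughput model:
 * instructions per second = IPC (instructions per cycle) * clock (cycles per second).
 * Deliberately ignores memory stalls, turbo behaviour, etc. */
double instructions_per_second(double ipc, double freq_ghz)
{
    assert(ipc > 0.0 && freq_ghz > 0.0);
    return ipc * freq_ghz * 1e9;
}

/* Performance of design A relative to design B under the same simple model. */
double relative_perf(double ipc_a, double ghz_a, double ipc_b, double ghz_b)
{
    return instructions_per_second(ipc_a, ghz_a) /
           instructions_per_second(ipc_b, ghz_b);
}
```

With these hypothetical numbers, a wide core at IPC 2.0 and 3.6 GHz ties a narrower one at IPC 1.6 and 4.5 GHz (`relative_perf(2.0, 3.6, 1.6, 4.5)` is about 1.0), while a 7% IPC deficit stacked on a 10% clock deficit compounds to roughly 0.84 (`relative_perf(0.93, 0.90, 1.0, 1.0)`).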

 

The L3 cache on Zen is actually split, unlike Intel's, which means only 4 cores will 'see' what each other are doing at a time. In effect, an 8-core CPU will have 2 sets of 4 cores, each set sharing its own cache. Despite AMD marketing it as 2 MB per core, this is a disadvantage compared to the Intel architecture. At the same time, they are putting more, faster lower-level cache closer to the cores, meaning improved performance in single-core workloads.
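To make the split concrete, here is a small C sketch of the layout described above. It assumes (hypothetically) an 8-core part with 2 complexes of 4 cores, cores numbered contiguously, and 8 MB of L3 per complex; real core numbering, especially with SMT, depends on the OS and firmware.

```c
#include <assert.h>

/* Assumed topology, for illustration only: 2 complexes x 4 cores,
 * contiguous numbering (0-3 in complex 0, 4-7 in complex 1), 8 MB L3 each. */
enum { CORES_PER_CCX = 4, NUM_CCX = 2, L3_MB_PER_CCX = 8 };

/* Which complex a core belongs to. */
int ccx_of(int core)
{
    assert(core >= 0 && core < CORES_PER_CCX * NUM_CCX);
    return core / CORES_PER_CCX;
}

/* Two cores share an L3 only if they sit in the same complex. */
int shares_l3(int core_a, int core_b)
{
    return ccx_of(core_a) == ccx_of(core_b);
}

/* The "per core" figure: total L3 divided by total cores (2 MB here). */
int l3_mb_per_core(void)
{
    return (L3_MB_PER_CCX * NUM_CCX) / (CORES_PER_CCX * NUM_CCX);
}

/* L3 local to a core's own complex (8 MB here, not the full 16 MB). */
int l3_mb_local_to_core(void)
{
    return L3_MB_PER_CCX;
}
```

So under these assumptions cores 0 and 3 share an L3 while cores 3 and 4 do not, and the "2 MB per core" average coexists with each core having only 8 MB of L3 in its own complex.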

 

Just a general overview of what I skim-read. I may have got things wrong, so please feel free to point them out.

 

Another great article from August '16 by the PhD-holding expert Ian Cutress of AnandTech

 

 

Data Scientist - MSc in Advanced CS, B.Eng in Computer Engineering


1 hour ago, randomhkkid said:

-snip-

Regarding the L3 cache, the article you linked from AnandTech has been updated and gives a good reason why the L3 cache isn't unified.

Quote

This would mean 2 MB/core, but it also implies that there is no last-level unified cache in silicon across all cores, which Intel has. The reason behind something like this is typically to do with modularity, and being able to scale a core design from low core counts to high core counts. But it would still leave a Zen core with the same L3 cache per core as Intel.

So pluses and minuses, like with most things.

 

I'll be interested to see specific tests to try and find the impact of this and by how much.

 

The other takeaway is that, compared to Intel, AMD has scaled back support for certain instruction sets that are less applicable to the desktop market, to save cost and simplify the design.


On 2/19/2017 at 3:26 PM, DocSwag said:

I've found it quite interesting, even if I didn't understand all of it. It seems, though, that Zen won't be very suitable for HPC; perhaps they did this to save die space and maximize profits on consumer and server CPUs. As well, it seems Zen IPC would be between Ivy Bridge and Haswell. I'm honestly not surprised: there have only been a limited number of official benchmarks from AMD, and since they were the ones showing them off, I wouldn't be surprised if they were cherry-picked. However, even if it's only Ivy Bridge IPC, Zen still could very well shake up the whole CPU market.

It actually makes sense that AMD didn't go all in on HPC with their CPU. That is NOT their plan. They are working towards an HPC APU. There's no reason to have extremely wide SIMD paths in their CPU architecture if that work is meant to be done on the integrated GPU. Remember, a GPU is *basically* just an array of SIMD clusters. HSA exascale APU incoming!

 

Also, IPC is not a static value. It might have Haswell IPC in some workloads, but lower/higher in others.

 

On 2/19/2017 at 7:58 PM, DocSwag said:

The thing to remember, though, is that they've said Zen is bad for HPC, where I think integer performance is important, and they're comparing integer IPC, so perhaps the FPU performance or something else will make up for it. But then again, I'm just guessing :D 

HPC is all about those wide SIMD paths.
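To make "wide SIMD paths" concrete: one SIMD instruction applies the same operation to several values at once. A minimal sketch using the GCC/Clang vector extension (a compiler extension, not standard C); `muladd_arrays` is a hypothetical helper, and the 4-float (128-bit) width is an arbitrary choice. Wider AVX or AVX-512 style paths would process 8 or 16 floats per instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: one v4sf holds 4 floats, and arithmetic on it
 * is applied lane-wise, ideally compiled to single SIMD instructions. */
typedef float v4sf __attribute__((vector_size(16)));

/* out[i] = a[i] * b[i] + c[i], processed 4 lanes at a time. */
void muladd_arrays(const float *a, const float *b, const float *c,
                   float *out, size_t n)
{
    assert(a && b && c && out);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf va, vb, vc, vr;
        memcpy(&va, a + i, sizeof va);   /* memcpy avoids alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        vr = va * vb + vc;               /* 4 multiply-adds in one expression */
        memcpy(out + i, &vr, sizeof vr);
    }
    for (; i < n; i++)                   /* scalar tail for leftover elements */
        out[i] = a[i] * b[i] + c[i];
}
```

The scalar loop at the end is the "narrow" path: it does one element per operation, which is exactly the work the vector loop batches up.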

 

On 2/19/2017 at 11:39 PM, randomhkkid said:

-snip-

Micro-operations are the internal instruction set that the CPU's back-end runs. The front-end (fetch and decode) takes the x86 instructions and translates them into micro-operations that the back-end understands (since the back-end doesn't understand x86 directly).
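A toy model of that translation step, as a sketch only: the three instruction forms and micro-op kinds below are hypothetical simplifications, and real decoders (Zen included) often keep something like a load+add together as one fused op. Zen also has an op cache that holds already-decoded ops, so hot loops can skip this step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical x86-style instruction forms and back-end micro-op kinds. */
typedef enum { ADD_REG_REG, ADD_REG_MEM, ADD_MEM_REG } X86Form;
typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } UopKind;

/* Writes the micro-ops for `form` into `out` (capacity >= 3) and
 * returns how many were produced. */
size_t decode(X86Form form, UopKind out[3])
{
    assert(out != NULL);
    switch (form) {
    case ADD_REG_REG:            /* add eax, ebx   -> add               */
        out[0] = UOP_ADD;
        return 1;
    case ADD_REG_MEM:            /* add eax, [mem] -> load, add         */
        out[0] = UOP_LOAD;
        out[1] = UOP_ADD;
        return 2;
    case ADD_MEM_REG:            /* add [mem], eax -> load, add, store  */
        out[0] = UOP_LOAD;
        out[1] = UOP_ADD;
        out[2] = UOP_STORE;
        return 3;
    }
    return 0;                    /* unknown form */
}
```

The point of the split is that the back-end only needs simple, uniform operations (load, ALU op, store) it can schedule independently, while the messy x86 encoding stays in the front-end.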

 

Branch prediction is actually rather simple. It tries to predict which way the instruction flow will go at a branch before the condition has been resolved.

Like if you have:

if (x < y) {
    // do something
}
else {
    // do something else
}

So before the processor has actually worked out whether x < y, it will already fetch AND execute one of the branches. This is called speculative execution. If it ends up mispredicting, of course, it has to discard that work and start on the other branch. The point is to avoid stalling while the condition is being resolved.
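For a feel of how a predictor "guesses", here is the classic textbook 2-bit saturating counter in C. This is a sketch of a much simpler scheme than whatever AMD actually uses in Zen; `count_mispredicts` is a hypothetical helper that replays a recorded branch history and counts how often speculative work would have been thrown away.

```c
#include <assert.h>
#include <stddef.h>

/* 2-bit saturating counter: states 0-1 predict "not taken",
 * states 2-3 predict "taken"; each real outcome nudges it one step. */
typedef struct { unsigned char state; } TwoBitPredictor;

int predict_taken(const TwoBitPredictor *p)
{
    return p->state >= 2;
}

void train(TwoBitPredictor *p, int taken)
{
    if (taken && p->state < 3)
        p->state++;
    else if (!taken && p->state > 0)
        p->state--;
}

/* Replays a branch history (nonzero = taken) and returns the number of
 * mispredictions, i.e. how often speculative work would be discarded. */
size_t count_mispredicts(const int *outcomes, size_t n)
{
    assert(outcomes != NULL || n == 0);
    TwoBitPredictor p = { 1 };          /* start weakly not-taken */
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        int taken = outcomes[i] != 0;
        if (predict_taken(&p) != taken)
            misses++;
        train(&p, taken);
    }
    return misses;
}
```

A loop-style branch (taken 7 times, then not taken once) only mispredicts twice on the first pass, while a strictly alternating pattern fools this simple counter every single time, which is why real predictors also track branch history.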

 

With regard to AMD's CPU complex, it might also have some benefits. Lookup times can be greatly reduced if the data is within the complex, and this is really going to show on the 32-core monster. I wonder what penalties there are for cross-complex lookups. They probably have some sort of OS support to avoid these kinds of scenarios:

 

[Complex 0] (cores 0, 1, 2) is in use and you launch a new application that runs 2 threads. The threads get placed on [Complex 0] (core 3) and [Complex 1] (core 0). I imagine there must be some penalty there.
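One way to reason about that scenario is to count how many thread pairs end up in different complexes. A sketch under stated assumptions: 4 cores per complex and contiguous global numbering, so "[Complex 1] (core 0)" becomes global core 4. `cross_complex_pairs` is a hypothetical helper, and the actual cost per pair is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: 4 cores per complex, numbered contiguously,
 * so global core c lives in complex c / 4. */
enum { CORES_PER_COMPLEX = 4 };

/* Given the cores an application's threads were placed on, count the
 * thread pairs that sit in different complexes; each such pair would pay
 * the cross-complex penalty whenever those threads share data. */
size_t cross_complex_pairs(const int *cores, size_t n)
{
    assert(cores != NULL || n == 0);
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (cores[i] / CORES_PER_COMPLEX != cores[j] / CORES_PER_COMPLEX)
                pairs++;
    return pairs;
}
```

The 2-thread placement above (cores 3 and 4) gives one crossing pair, while placing both threads in complex 1 (cores 4 and 5) gives none.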

 

Or you launch an application that will run 5+ threads: how exactly will they avoid a potential cross-communication penalty?
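Whatever the OS scheduler ends up doing automatically, one knob an application has on Linux is CPU affinity: a thread can restrict itself to the cores of one complex via `sched_setaffinity`. A sketch, assuming 4 cores per complex with contiguous numbering and no SMT siblings interleaved (real numbering should be read from the OS, e.g. Linux typically exposes L3 sharing in `/sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list`). With 5+ threads, an app could pin groups of threads that share data to the same complex.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Assumed layout: 4 cores per complex, numbered contiguously. */
enum { CORES_PER_COMPLEX = 4 };

/* Build a CPU mask containing every core of one complex. */
void complex_mask(int complex_id, cpu_set_t *set)
{
    assert(complex_id >= 0 && set != NULL);
    CPU_ZERO(set);
    for (int c = 0; c < CORES_PER_COMPLEX; c++)
        CPU_SET(complex_id * CORES_PER_COMPLEX + c, set);
}

/* Restrict the calling thread to one complex (Linux-specific).
 * Returns 0 on success, -1 with errno set on failure. */
int pin_to_complex(int complex_id)
{
    cpu_set_t set;
    complex_mask(complex_id, &set);
    return sched_setaffinity(0, sizeof set, &set);  /* pid 0 = calling thread */
}
```

Pinning trades flexibility for locality: the scheduler can no longer move the thread to an idle core in the other complex, so it only pays off when the threads really do share a lot of data.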

Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO


1 hour ago, Tomsen said:

-snip-

Yeah, I know IPC isn't static. I was just referring to roughly what the IPC would be on average across workloads :) 

Make sure to quote me or tag me when responding to me, or I might not know you replied! Examples:

 

Do this:

Quote

And make sure you do it by hitting the quote button at the bottom left of my post, and not the one inside the editor!

Or this:

@DocSwag

 

Buy whatever product is best for you, not what product is "best" for the market.

 

Interested in computer architecture? Still in middle or high school? P.M. me!

 

I love computer hardware, so feel free to ask me anything about that (or phones). I especially like SSDs. But please do not ask me anything about networking, programming, command-line stuff, or any relatively hard software stuff. I know next to nothing about that.

 

Compooters:


Desktop:


CPU: i7 6700k, CPU Cooler: be quiet! Dark Rock Pro 3, Motherboard: MSI Z170a KRAIT GAMING, RAM: G.Skill Ripjaws 4 Series 4x4gb DDR4-2666 MHz, Storage: SanDisk SSD Plus 240gb + OCZ Vertex 180 480 GB + Western Digital Caviar Blue 1 TB 7200 RPM, Video Card: EVGA GTX 970 SSC, Case: Fractal Design Define S, Power Supply: Seasonic Focus+ Gold 650w Yay, Keyboard: Logitech G710+, Mouse: Logitech G502 Proteus Spectrum, Headphones: B&O H9i, Monitor: LG 29um67 (2560x1080 75hz freesync)

Home Server:


CPU: Pentium G4400, CPU Cooler: Stock, Motherboard: MSI h110l Pro Mini AC, RAM: Hyper X Fury DDR4 1x8gb 2133 MHz, Storage: PNY CS1311 120gb SSD + two Seagate 4tb HDDs in RAID 1, Video Card: Does Intel Integrated Graphics count?, Case: Fractal Design Node 304, Power Supply: Seasonic 360w 80+ Gold, Keyboard+Mouse+Monitor: Does it matter?

Laptop (I use it for school):


Surface Book 2 13" with an i7 8650U, 8 GB RAM, 256 GB storage, and a GTX 1050

And if you're curious (or a stalker) I have a Just Black Pixel 2 XL 64gb

 


5 hours ago, DocSwag said:

Yeah I know IPC isn't static. I was just referring in general on average in workloads what the IPC would be around :) 

Oh ok, just as long as you don't claim IPC can't exceed 1, I'm happy ;) 
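For reference, IPC is just instructions retired divided by core clock cycles, and a superscalar core that can retire several instructions per cycle can push it well above 1 (Linux `perf stat`, for example, reports it as "insn per cycle"). A trivial sketch; the counter values in the usage below are hypothetical.

```c
#include <assert.h>

/* IPC = instructions retired / core clock cycles elapsed. */
double ipc(unsigned long long instructions, unsigned long long cycles)
{
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}
```

Hypothetical counters of 3,000,000 instructions over 1,000,000 cycles give an IPC of 3.0, while a stall-heavy stretch with 500 instructions over 1,000 cycles gives 0.5, which is also why a single IPC figure only ever describes a particular workload.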


On 19/02/2017 at 5:00 PM, MageTank said:

The biggest hurdle Zen has at the moment is the extremely high overclocking headroom Kaby Lake offers. People seem overly obsessed with the "5 GHz" number, and Kaby seems quite capable of achieving it. If Zen faces a 5-10% IPC deficit as you say, then another 10% core clock deficit on top of that might not bode too well for gamers. Though you are right when it comes to potential cost: if Zen is cheap enough, that performance deficit won't really matter. Besides, CPU clock speed shouldn't be that big of a deal for gamers these days. Get a better monitor and make your GPU the bottleneck, lol. 

Well, since the i3 7350K at 4.8 GHz sometimes only matches an i5 7400, even though the i5 runs at significantly lower clocks, I wouldn't worry too much about overclocking/clock speed if AMD can give us a good core count. After all, most Intel CPUs under 170 USD can't even be overclocked in the first place, so maybe AMD can nail it, particularly in the sub-200 USD market.
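The cores-vs-clocks trade-off can be sketched with Amdahl's law with clock speed folded in. This assumes equal IPC, perfect scaling of the parallel part, and ignores Hyper-Threading; the 2-core @ 4.8 GHz vs 4-core @ 3.5 GHz figures and the parallel fractions are hypothetical, not benchmark results.

```c
#include <assert.h>

/* Amdahl's law with clock speed: runtime of a job whose fraction `p`
 * runs in parallel, on `cores` cores at `ghz`, relative to 1 core at 1 GHz.
 * Assumes equal IPC and perfect scaling of the parallel fraction. */
double relative_runtime(double p, int cores, double ghz)
{
    assert(p >= 0.0 && p <= 1.0 && cores > 0 && ghz > 0.0);
    return ((1.0 - p) + (p / cores)) / ghz;
}
```

With 80% parallel work the 4-core chip finishes first (about 0.114 vs 0.125), but with only 20% parallel work the 2-core chip's clock advantage wins (0.1875 vs about 0.243), which is why the answer depends so heavily on how well a given game or app is threaded.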

If you want to reply back to me or someone else USE THE QUOTE BUTTON!                                                      
Pascal laptops guide
