Zen Engineering Samples, Specs Spotted

patrickjp93 · July 20, 2016

http://www.guru3d.com/news-story/ams-zen-engineering-sample-specs-leaked.html

To start, keep in mind these are A0 engineering samples. Cache sizes, clock speed, and TDP can all change between now and retail launch.

Sample CPUs with 4, 8, 24, and 32 cores have been spotted making the rounds. Of these, only the 4- and 8-core variants were made for the AM4 socket.

The 4-core variant is limited to 8MB of L3 cache and holds a 65W TDP at clock speeds of 2.8GHz base to 3.2GHz boost.

The 8-core has 16MB of L3 cache and holds a 95W TDP with the same clock speeds.

Further, AMD has chosen to give each core 512KB of L2 cache, double the 256KB per core in Intel's current offerings. Since cache latency rises as cache size grows, this is intriguing. We'll have to come up with some cache-thrashing benchmarks to see whose approach is better.

The most interesting part of the article, imho, is that the 4- and 8-core chips idle at 550MHz and consume just 2.5W and 5W respectively. This is in stark contrast to the FX series, where an FX-8350 still consumes almost 30W at idle.

The 24- and 32-core chips have 160W and 180W TDPs with 2.75GHz and 2.9GHz boost clocks respectively, and a 2x32MB L3 cache configuration. Idle speed for these chips is even lower at 400MHz, not that it matters, since big-iron server chips should NEVER be idling. Remember this is a 2-die solution, so there will be a price to pay in cache coherency, but it's still quite impressive. The article doesn't say whether the boost clock applies to all cores or just one, but based on the TDP scaling of the smaller parts, I'm going to guess it's single-core only.

Opinion:

I suspect AMD will be fudging its TDP just a bit or have lower cache clock speeds to keep its TDPs under Intel's, but so far these clock speeds are right around what I expected. Had AMD pulled off the clock speeds Bulldozer and Vishera enjoyed AND stayed inside a healthy 95W while being performance-competitive with Intel core for core, I'd have had to give AMD very high praise.

Remember, these are A0 samples, so there's room for clocks to tick upward just a bit, which is why I expect final clock speeds to land right in line with Haswell-E.

Edit: You guys are getting slow or lazy. For the last 24 hours I've been the one posting all the Intel and AMD news, even though the articles were up hours before I found them.

AlTech · July 20, 2016

I'm confused. @patrickjp93 Why does the quad-core have 2MB L3 cache when the eight-core has 8MB?

Shouldn't the quad core have 6MB L3 cache or 8MB like Intel's quad cores do?

alexyy · July 20, 2016

Not too concerned with TDP myself, but it seems AMD is this time around. I wonder if we'll see any 9590 madness again. Let's hope Zen truly lives up to the hype.

Master Disaster · July 20, 2016

AMD, please don't fuck this up...

AresKrieger · July 20, 2016

2 minutes ago, CUDA_Cores said:

32 cores! I hope they aren't 32 sh*t cores like AMD pulled with Bulldozer.

Doesn't really matter since it's a server chip, not AM4

1 minute ago, AluminiumTech said:

I'm confused. Why does the quad core have 2MB L3 cache when the Eight core has 8MB?

Shouldn't the quad core have 4MB L3 cache or 6MB like Intel's quad cores do?

If they want optimal workflow, then yes (unless the chip is weaker than we think).

patrickjp93 said:

I suspect AMD will be fudging its TDP just a bit or have lower cache clock speeds to keep its TDPs under Intel's,

Sure seems that way at least based on this info

patrickjp93 · July 20, 2016

1 minute ago, AluminiumTech said:

I'm confused. @patrickjp93 Why does the quad core have 2MB L3 cache when the Eight core has 8MB?

Shouldn't the quad core have 4MB L3 cache or 6MB like Intel's quad cores do?

Partly defective sample is my guess.

Intel's quads have 8MB, though on i5s 2MB of it is disabled. It's the same die as a mainstream i7.

AlTech · July 20, 2016

Just now, patrickjp93 said:

Partly defective sample is my guess.

Intel's quads have 8. On I5s 2MB is disabled though. It's the same die as a mainstream I7.

Yeah I realized my mistake. But then does that mean they might ship the quad core with 4MB L3 cache?

Also, it's a shame AMD didn't put any high-performance 100GB/s eDRAM L4 cache on there.

Master Disaster · July 20, 2016

Could it be that they're doing 2 tiers of consumer chips, one with low cache (like an HTPC variant) and one with big cache? Perhaps the chip mentioned in this article is merely the low-cache model?

Just spitballing.

patrickjp93 · July 20, 2016

Just now, AluminiumTech said:

Yeah I realized my mistake. But then does that mean they might ship the quad core with 4MB L3 cache?

Also it's a shame AMD didn't put any high performance 100GB/s eDRAM L4 cache on there .

4MB is too little for a quad core imho. I think AMD is just shipping out what samples it can.

eDRAM is expensive to make and is more useful for an iGPU. Yes, it can help some CPU tasks, but most of what consumers do is either so big you'd need a gigabyte of cache before the miss ratio would drop (browsers) or so small it fits in cache anyway (word processing).

patrickjp93 · July 20, 2016

Just now, Master Disaster said:

Could it be that they're doing 2 tiers of consumer chips, one with low low cache (like a HTPC variant) and one with big cache? Perhaps the chip mentioned in this article is merely the low cache model?

Just spitballing.

I hadn't considered the HTPC angle, but APUs are still a better fit for that. I suppose AMD could also provide a custom board of its own with a tiny iGPU embedded on it, but I think that would blow up the price of the one-off chip.

AlTech · July 20, 2016

Just now, patrickjp93 said:

4MB is too little for a quad core imho. I think AMD is just shipping out what samples it can.

eDRAM is expensive to make and is more useful for an iGPU. Yes, it can help some CPU tasks, but most of what consumers do is either so big you'd need a Gig of cache before the miss ratio would drop (browsers) or is so small it fits in cache anyway (word processing).

Hopefully we see 8MB on the quad core and 15/16MB for the eight core.

And I'd like a 3GHz base clock.

Yeah. I know eDRAM is expensive. But Intel is giving it out like candy in their mobile i5s and i7s.

patrickjp93 · July 20, 2016

Just now, AluminiumTech said:

Hopefully we see 8MB on the quad core and 15/16MB for the eight core.

Yeah. I know eDRAM is expensive. But Intel is giving it out like candy in their mobile i5s and i7s.

Those CPUs sell for $600 or more. The quad-core i7s start at $800. I wouldn't call it giving out eDRAM like candy.

The Benjamins · July 20, 2016

Do you think they may throw HBM on a high end APU?

patrickjp93 · July 20, 2016

Just now, The Benjamins said:

Do you think they may throw HBM on a high end APU?

I only know about the HPC APU using it for now, but that's also a mid to late 2017 product at the earliest.

patrickjp93 · July 20, 2016

2 minutes ago, Tedny said:

The i7-6700K has 256KB of L2 cache. Interesting, will Windows make use of Zen's 512KB L2 cache?!

7 minutes ago, Master Disaster said:

Could it be that they're doing 2 tiers of consumer chips, one with low low cache (like a HTPC variant) and one with big cache? Perhaps the chip mentioned in this article is merely the low cache model?

Just spitballing.

9 minutes ago, AluminiumTech said:

Yeah I realized my mistake. But then does that mean they might ship the quad core with 4MB L3 cache?

Also it's a shame AMD didn't put any high performance 100GB/s eDRAM L4 cache on there .

I was reading back over it to double check, and it seems I may have made a mistake in saying 2MB L3. 1) It's early in the morning. 2) The author of the article could use a revision or two himself to clarify.

It seems the 8MB figure is the L3 cache for the quad-core, and the 2MB is its L2. I never see L2 cache listed as a collective figure in MB; I always see it per core in kilobytes, since cores don't share L2 and a total doesn't make sense. The article says the 8-core will get double the L3, or 16MB.

AlTech · July 20, 2016

Just now, patrickjp93 said:

I was reading back over it to double check, and it seems I may have made a mistake in saying 2MB L3. 1) It's early in the morning. 2) The author of the article could use a revision or two himself to clarify.

It seems the 8MB cache is for the quad-core. When I see L2 cache, I never see it listed as a collective unit in MB. I always see it in kilobytes since cores don't share L2 and it doesn't make sense. The article says the 8-core will get double this, or 16MB of L3 cache.

Oh cool. So they will have 8MB for the quad core and 16MB for the eight core.

And the 2MB and 8MB were for L2 cache? That seems like way too much.

The Benjamins · July 20, 2016

1 minute ago, patrickjp93 said:

I only know about the HPC APU using it for now, but that's also a mid to late 2017 product at the earliest.

I think it would be awesome if they could pair an RX 480/470-class GPU and 1-4GB of HBM with a 4- or 8-core Zen CPU. It would make for an awesome HTPC/gaming APU.

patrickjp93 · July 20, 2016

Just now, AluminiumTech said:

Oh cool. So they will have 8MB for the quad core and 16MB for the eight core.

And the 2MB and 8MB was for L2 cache? That seems like way too much.

Well, it's 2MB across 4 cores, or 512KB per core, which is double what Intel uses. And since cache latency has to go up as cache size increases (universal truth btw), I'm wondering if AMD figured having more data close by at the second level was more worthwhile.

Dabombinable · July 20, 2016

2 minutes ago, patrickjp93 said:

Well, it's 2MB across 4 cores, or 512KB for each core, which is double what Intel uses. And since cache timings have to loosen as cache size increases (universal truth btw), I'm wondering if AMD figured having more data closer at the 2nd level was more worthwhile.

Well, AMD did use 512KB of L2 cache with K10 back then; however, that was it: 512KB of L2 cache per core with no L3 cache. Having 2MB of L3 cache per core should help as well, since it's not shared (my 4790K, for example, has 2MB of L3 cache per core).

patrickjp93 · July 20, 2016

Just now, Dabombinable said:

Well, AMD did use 512KB of L2 cache with K10-back then however that was it, 512KB L2 cache per core with no L3 cache. Having 2MB L3 cache per core should help as well since its not shared (with my 4790K for example, it has 2MB of L3 cache per core).

L3 cache is shared. It only becomes a problem if you have a noisy neighbor.

Dabombinable · July 20, 2016

Just now, patrickjp93 said:

L3 cache is shared. It only becomes a problem if you have a noisy neighbor.

Oh... just like it was on their CMT architectures? Well, I could see some issues with all 8 or 16 threads under load. It seems to be one of those trade-offs between multi-threaded and single-threaded performance. With the way CMT turned out, they should know by now that once the cache starts getting shared, the performance of both cores and all 4 threads will be affected. I was hoping it wouldn't be like the cache in Wolfdale (L2 shared between the cores).

patrickjp93 · July 20, 2016

Just now, Dabombinable said:

Oh....just like it was on their CMT architectures? Well....I could see some issues with all 8 and 16 threads under load. Its one of those trade off it seems between multi threaded performance and single threaded performance. They should know by now with the way CMT is that if it starts getting shared, the performance of both cores and all 4 threads will be affected. I was hoping that it wouldn't be like the cache in Wolfdale (L2 split between cores).

It's shared on Intel's architectures too, you know... There are benefits and drawbacks to both private and shared caches. Everyone from ARM to IBM uses a mix of them.

Also, you're conflating two separate issues: cache exclusivity and false sharing. It's not yet clear whether AMD implemented an exclusive cache hierarchy, where data is not kept in all 3 levels of cache (if the data is in one core's L1, it's not in that same core's L2 and not in the shared L3), or an inclusive one, where it is. In an exclusive hierarchy, a miss in L3 doesn't rule out another core holding the line, so snooping has to go back through the other cores' private caches, and that's one enormous performance killer for Bulldozer and the entire Construction Core family. With an inclusive cache, as long as a cache line hasn't been modified, sharing it is actually more efficient, because every core can read it and pull it up from L3 into its own L2 and L1 without having to worry.

False sharing is when at least 2 different cores each hold a copy of the same cache line (a line is usually 64 bytes) and modify different data that happens to sit on that line. To maintain cache coherency, once the first core writes, the other copies of the line are invalidated, and the second core has to fetch the line again and take ownership of it before its own write can take place. This stalls the second core's pipeline, and if both keep writing, the line bounces back and forth between them. That is also a performance killer.

Having a shared L3 cache is not a bad thing at all. The programmer would have to diligently avoid the false sharing problem even without a shared L3 cache, so there's no real downside.

Dabombinable · July 20, 2016

10 minutes ago, patrickjp93 said:

Having a shared L3 cache is not a bad thing at all. The programmer would have to diligently avoid the false sharing problem even without a shared L3 cache, so there's no real downside.

One of the problems is that some programmers, specifically in the AAA games industry, are far from diligent.

patrickjp93 · July 20, 2016

Just now, Dabombinable said:

One of the problems is that some programmers-specifically in the AAA games industry, are far from diligent.

Oh, I know. I had a discussion about performance tuning at Epic, and two of the more senior developers told me that tuning for Hyper-Threading was so hard they literally instituted a ban on it, because it could cause performance losses in other areas. I showed them a few tricks with OpenMP, setting core affinity and putting mixed workloads on each pair of logical cores, and saw a 20% boost using good old OpenGL and CPU-based cloth physics. In short, what I could do in 600 lines of code left them stunned.

At least engine developers like Mike Acton have finally started paying attention to cache lines, though just from looking at Unreal, good Lord, is there a long way to go...

Dabombinable · July 20, 2016

Just now, patrickjp93 said:

Oh I know. I had a discussion on performance tuning at Epic, and two of the more senior developers said to me that tuning performance for hyper threading was so hard they literally instilled a ban on it because it could cause performance loss in other areas. I showed them a few tricks with OpenMP to demonstrate setting core affinity and putting mixed workloads on each pair of logical cores and saw a 20% boost using good old OpenGL and CPU-based cloth physics. To put it shortly, what I could do in 600 lines of code left them stunned.

At least the engine developers like Mike Acton have finally started paying attention to cache lines, though just from looking at Unreal, good Lord is there are long way to go...

So it's more than likely that the programmers, in all reality, need retraining? I take it you came out of university/college a lot more recently.
