
AMD Zen 4 Ryzen 7000 Series Update: 8–10% IPC Uplift, 25% More Perf-Per-Watt, More than 35% Overall Performance Gain, 5.5GHz & V-Cache Chips Coming

Summary

As part of AMD’s 2022 Financial Analyst Day today, the company is offering a short update on its forthcoming Zen 4 CPU architecture. Addressing some post-Computex questions around IPC expectations, AMD is revealing that they expect Zen 4 to offer an 8-10% IPC uplift over Zen 3. AMD is projecting a >25% increase in performance-per-watt with Zen 4 over Zen 3 (based on desktop 16C chips running CineBench). Meanwhile the overall performance improvement stands at >35%, no doubt taking advantage of both the greater performance of the architecture per-thread, and AMD’s previously disclosed higher TDPs. Lastly, Zen 4 V-Cache chips are expected to come in the future too.

 

[Images: four presentation slide thumbnails, including one on 5.5GHz clocks]

 

Quotes

Quote

During today’s Financial Analyst Day 2022, AMD clarified that it is targeting an 8 to 10% increase in IPC for the Zen 4 processors and that the company is targeting larger gains in single-threaded performance in some types of workloads.

 

AMD also clarified that Zen 4 processors would have >25% performance-per-watt and >35% overall performance improvements. The company says the Zen 4 chips will also have significant clock frequency improvements...

 

AMD’s Zen 4 disclosures today help clarify the company’s performance targets after the initial reveal. AMD’s clarification that the IPC gain will range from 8 to 10%, dependent upon workload, is a bit more encouraging.

 

Whip in the company’s claims of significant frequency improvements for the 5nm Zen 4 processors, and we should see much larger gains than the baseline 15% gain in single-threaded performance...

 

AMD shared a slide showing a greater than 25% performance-per-watt and greater than 35% gain in overall performance in a multi-threaded Cinebench benchmark.

 

The Zen 4 processors will also support up to 25% more memory bandwidth per core, a marked increase that comes from both the step up to DDR5 and likely from widened pathways in the chip to deliver additional bandwidth to the cores.

 

Finally, AMD is confirming that there will be V-Cache equipped Zen 4 SKUs within their processor lineup. No specific SKUs are being announced today, but AMD is reiterating that V-Cache was not just a one-off experiment for the company, and that they will be employing the die stacked L3 cache on some Zen 4 chips as well.

 

My thoughts

I think it's great for AMD to clear the air on the post-Computex qualms. While an 8-10% increase in IPC sounds underwhelming, a greater than 35% overall performance improvement does not, IMO. Combine that with future V-Cache chips and Zen 4 will be a highly competitive product against Intel.

 

Sources

https://www.anandtech.com/show/17441/amd-zen-4-update-8-to-10-ipc-uplift-25-more-perfperwatt-vcache-chips-coming

https://www.tomshardware.com/news/amd-zen-4-ryzen-7000-has-810-ipc-uplift-more-than-35-overall-performance-gain

https://www.pcgamer.com/amd-provides-new-zen-4-details-and-touts-a-greater-than-25-performance-per-watt-gain/


Question. 

middle picture at the bottom, reference to Cinebench "NT". 

 

What is the NT?


1 hour ago, Guest 5150 said:

Question. 

middle picture at the bottom, reference to Cinebench "NT"

 

What is the NT?

 

From what I can gather doing research, NT (also nT) is another term for the Multi-Core test score (while 1T denotes Single Core score).


1 hour ago, leadeater said:

So actual AVX-512 is part of Zen4, that's actually really big news.

Why is this big news? Is AVX-512 that high in demand? I don't think it was really used much for desktop applications?? (honestly asking, not sarcasm.)

 

13 minutes ago, BiG StroOnZ said:

 

From what I can gather doing research, NT (also nT) is another term for the Multi-Core test score (while 1T denotes Single Core score).

Well, I would assume the T is maybe "threads" or "threading". Just threw me off, thought there was some new CB that was released I didn't know about yet 🤣


45 minutes ago, Guest 5150 said:

Why is this big news? Is AVX-512 that high in demand? I don't think it was really used much for desktop applications?? (honestly asking, not sarcasm.)

Right now not all that useful on the desktop but since the architecture is common across Ryzen and EPYC having AVX-512 in Zen4 is a big deal for EPYC.

 

With Intel backing out a little on the consumer desktop side it's not so important or a big deal here as Intel still drives the industry, and software development targets. So if Intel doesn't have AVX-512 then few are going to want to build for it. However AVX-512 in server/HPC has been around for a while now and it's kind of a sore point that EPYC doesn't have it.


13 minutes ago, Guest 5150 said:

Well, I would assume the T is maybe "threads" or "threading". Just threw me off, thought there was some new CB that was released I didn't know about yet 🤣

 

Yeah, it took a while of digging to find out what it meant. It wasn't until I started looking at how they were wording it with their scores beside "nT" and "1T" that it started to make sense. Don't worry, it took excessive digging to find out the answer, my first instinct was also that it was a new CB that was released that I didn't know about. 😂


6 minutes ago, leadeater said:

Right now not all that useful on the desktop but since the architecture is common across Ryzen and EPYC having AVX-512 in Zen4 is a big deal for EPYC.

 

With Intel backing out a little on the consumer desktop side it's not so important or a big deal here as Intel still drives the industry, and software development targets. So if Intel doesn't have AVX-512 then few are going to want to build for it. However AVX-512 in server/HPC has been around for a while now and it's kind of a sore point that EPYC doesn't have it.

Ah I see. It's the EPYC challenge for a workstation to have all the proper codecs. (see what I did there?!) Kidding aside, I see the gripe. There are a lot of games that use AVX, but I'm not sure whether AVX, AVX2 and AVX-512 are backwards compatible in some way. Can you create a game engine on AVX-512 but just use AVX during actual gameplay (if that makes sense)?

 

7 minutes ago, BiG StroOnZ said:

 

Yeah, it took a while of digging to find out what it meant. It wasn't until I started looking at how they were wording it with their scores beside "nT" and "1T" that it started to make sense. Don't worry, it took excessive digging to find out the answer, my first instinct was also that it was a new CB that was released that I didn't know about. 😂

Ah gotcha. I love the curve balls. Now understanding "why" is my next big challenge, but I'd need a drink to go down that road honestly lol. Why can't they just say "Used Cinebench R23 multithread". Do they save money if someone uses abbreviations nobody gets???? lol. 


This is a bit closer to what I had expected. I saw some places reporting 0 improvement in IPC, I saw others showing 20% and I expected the reality to be somewhere in the middle, especially given that you could pick up a few percent from faster RAM (I'm expecting FAST DDR5 to be a lot better than the stuff we saw around a year ago).

Overall this looks like a nice jump. Compared against Zen2 this is about 30% faster in terms of IPC and another 30% faster in terms of clock speed, so roughly 70% more performance (1.3 x 1.3 ≈ 1.69) in the span of around 3 years, and overall around 2x the per-core performance of the original Zen from 5 years ago. This is before considering the extra cache.

I still remember 2011-2016, where we basically got a 25% uplift from Intel and nothing more. With AMD (also to be fair RaptorLake is competitive too) we're at ~2x the per-core performance and overall around 4x the MT performance on consumer platforms.
 

2 hours ago, Guest 5150 said:

middle picture at the bottom, reference to Cinebench "NT". 

What is the NT?

N is a count, T stands for threads.

So arbitrary number of threads.

If you're thinking about it in math terms you often have loops or iterators that go from 1 to N or 0 to N with N often just being the last number in a sequence.

 

 

3900x | 32GB RAM | RTX 2080

1.5TB Optane P4800X | 2TB Micron 1100 SSD | 16TB NAS w/ 10Gbe
QN90A | Polk R200, ELAC OW4.2, PB12-NSD, SB1000, HD800
 


1 minute ago, cmndr said:

N is a count, T stands for threads.

So arbitrary number of threads.

If you're thinking about it in math terms you often have loops or iterators that go from 1 to N or 0 to N with N often just being the last number in a sequence.

Who'd ever known?! 

 

My NT rig only has a single GPU, but it games really good!

 

(seriously thanks for that. Spot on explanation)


57 minutes ago, Guest 5150 said:

Ah gotcha. I love the curve balls. Now understanding "why" is my next big challenge, but I'd need a drink to go down that road honestly lol. Why can't they just say "Used Cinebench R23 multithread". Do they save money if someone uses abbreviations nobody gets???? lol. 

 

They saved exactly "N" dollars using abbreviations nobody gets. 😅

 

39 minutes ago, cmndr said:

N is a count, T stands for threads.

So arbitrary number of threads.

If you're thinking about it in math terms you often have loops or iterators that go from 1 to N or 0 to N with N often just being the last number in a sequence.

 

Thanks for the clarification, I figured I was on the right track.  👍


https://i.pinimg.com/originals/f2/ed/fd/f2edfd9b2803d32cc7ebf96ec758c324.jpg

 

If you do shorthand you might see the Sigma symbol with just an n under it (go across all entries).

This pops up in statistics a lot, where you have an arbitrary (and potentially unknown up front) number of data points to go through.


\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

 

The above is the formula for taking the average (Sigma means SUM up). So from the first entry to the Nth entry (1, 2, 3, ... , N), take all of your data points (x_i is the data point in the ith position, so the first data point is x_1, the second x_2, the Nth is x_N), sum them up and divide by the number of data points, N.
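
For anyone who prefers code to notation, the same average can be sketched as a loop over the N data points. This is only an illustration: the function name `mean` and the sample values are made up for the example, and it assumes a non-empty list.

```python
def mean(points):
    """Average of a non-empty list: sum x_1..x_N, then divide by N."""
    n = len(points)          # N, the number of data points
    total = 0.0
    for x_i in points:       # i runs from the first entry to the Nth entry
        total += x_i
    return total / n

scores = [10.0, 20.0, 30.0, 40.0]   # made-up data points
print(mean(scores))                  # 25.0
```

The loop plays the role of the Sigma: it does not need to know N up front, it just keeps adding until it runs out of entries.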

 


6 hours ago, Guest 5150 said:

Why is this big news? Is AVX-512 that high in demand? I don't think it was really used much for desktop applications?? (honestly asking, not sarcasm.)

 

Well, I would assume the T is maybe "threads" or "threading". Just threw me off, thought there was some new CB that was released I didn't know about yet 🤣

Currently, the biggest, if not the only, consumer use case for AVX-512 that I can actually use right now is PS3 emulation (primarily to brute force difficult SPE code that would otherwise be hard to recompile to run on a GPU). The performance boost is pretty huge. Depending on how SPE-heavy the game is, AVX-512 can double performance.
 

The x265 encoder can use it if enabled; however, it provides an underwhelming performance increase, possibly because floating point performance is not the bottleneck for encoding performance.
 

That’s about it. 
 

edit after some contemplation: Judging by how effective AVX-512 is for emulation, if I had to guess, it’s excellent for loads requiring high throughput, and very low latency. Compute bound loads such as encoding and rendering are not appreciably slowed by trips over PCI-e, and so would probably benefit from being offloaded to the GPU if possible.
 

Emulation is quite different however. Usable accuracy requires precise timings, which prioritizes low latency. However, the PS3’s architecture (namely the SPEs) requires a high degree of compute performance so as to not be compute bound. Offloading the SPE code to the GPU would be no good, as the time it takes to cross the PCI-e bus is quite significant in this use case and would affect timing. AVX-512, in this case, can offer the number crunching ability to do the job, while being able to provide the CPU with the results with minimal delay.
 

If this holds true, AVX-512 is probably very good for taking on brief chunks of performance-sensitive code that requires low latency. A combination that appears to be quite rare. 

My eyes see the past…

My camera lens sees the present…


So they claimed 15% at Computex, and now they claim 35%.

As a potential Zen4 buyer I very much hope it's the latter, but quite honestly, not sure what to believe anymore (beyond the basic "never trust a company's numbers").

 

After the surprising performance of the 5800X3D, I'm happy to see them confirming V-Cache Zen4 will be a thing. Very interesting. 


9 hours ago, leadeater said:

So actual AVX-512 is part of Zen4, that's actually really big news.

Let's see what they do. If it does appear in consumer CPUs, it makes them much more interesting to me again.

 

8 hours ago, Guest 5150 said:

Why is this big news? Is AVX-512 that high in demand? I don't think it was really used much for desktop applications?? (honestly asking, not sarcasm.)

So far it has been a bit chicken and egg. Software devs might not look at it until there is a sufficient installed base. AMD may be doing their classic of letting someone else do the hard work on adoption before jumping in. Intel is doing an Intel. It didn't appear in Intel consumer desktop CPUs until Rocket Lake, then officially disappeared again with Alder Lake. On the mobile side it has been included in the previous two generations: Ice Lake and Tiger Lake. If AMD really do implement AVX-512 we could see more uptake of the instruction set.

 

As for what it does, note AVX-512 is a bit of an umbrella for a whole bunch of mandatory and optional stuff. The core part improves upon AVX2. If software can make good use of AVX2, it could potentially see decent gains from implementing AVX-512. For Prime95 like workloads I've seen +40% IPC on one unit implementations (as used in consumer tier CPUs) and +80% IPC on two unit implementations (as seen in Intel HEDT and some server versions).

 

A side problem also has been that since AVX-512 can do a LOT of work fast, it also uses a lot of power while doing so. Again for Prime95 like workloads, perf/W seems about constant relative to not using it, so more work is more power. In the past Intel have managed this by significantly dropping the clock when AVX-512 code runs. It seems like on newer CPUs this is better managed. Strongly using AVX-512 will still see benefits. If you only make light use of it, the clock drop might hurt non-AVX-512 work going through at the same time. AMD are not immune to this either if you look at their AVX2 behaviour. When running under a fixed power limit (stock) that too drops clocks more than other workloads which don't use it.

 

1 hour ago, Rauten said:

So they claimed 15% at Computex, and now they claim 35%.

Without going back to check, wasn't the 15% claim for single thread? This 35% claim is multi-thread.

 

 

On the claims here: we get >25% perf/W but >35% overall. The implication is that the last 10% is from higher power. Since scaling is always worse as you go up the curve, expect power to go up more than that. Taking the 142W PPT of the 5950X: 142/1.25*1.35=153W as a minimum. Expect higher, but it could be lower depending on what the >25%/>35% numbers actually are.
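
A quick sketch of that estimate, assuming power is simply performance divided by performance-per-watt and taking AMD's ">25%" and ">35%" at exactly their floor values. The function name is just illustrative, and the 142W baseline is the 5950X PPT mentioned above.

```python
def implied_power(base_watts, perf_gain, perf_per_watt_gain):
    """Power implied by a performance gain and a perf/W gain over a baseline.

    new_power = base_power * (1 + perf_gain) / (1 + perf_per_watt_gain)
    """
    return base_watts * (1 + perf_gain) / (1 + perf_per_watt_gain)

# 142 W baseline, +35% performance, +25% perf/W (floor values of the claims)
watts = implied_power(142, 0.35, 0.25)
print(round(watts, 1))               # 153.4
print(f"{watts / 142 - 1:.0%}")      # 8%
```

So at the floor values the power increase works out to about 8% rather than a full 10%, since the two ratios divide rather than subtract.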

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


3 hours ago, Zodiark1593 said:

Usable accuracy requires precise timings,

3 hours ago, Zodiark1593 said:

However, the PS3’s architecture (namely the SPEs) requires a high degree of compute performance so as to not be compute bound.

Would Fermi 2.0 be a good candidate for this type of work?


14 minutes ago, porina said:

Without going back to check, wasn't the 15% claim for single thread? This 35% claim is multi-thread.

Went back to check and you're absolutely right, their 15% claim was ST.

Which is still rather disappointing.


16 minutes ago, Rauten said:

Went back to check and you're absolutely right, their 15% claim was ST.

Which is still rather disappointing.

What are you even saying. It's MORE than 15%, always, on everything. In what universe is that disappointing? Also if you do the math it will be more. 8% IPC and the clock speeds are at least 10% higher, because they are saying OVER 5.5GHz boost.
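
Spelling out that math as a rough sketch: IPC and clock gains multiply rather than add, so the floor figures quoted here (8% IPC, 10% clocks) already land above 15%. The helper name is illustrative, and the inputs are the claimed minimums, not measurements.

```python
def combined_gain(ipc_gain, clock_gain):
    """Single-thread gain when IPC and clock speed improvements multiply."""
    return (1 + ipc_gain) * (1 + clock_gain) - 1

# Floor values from the post: +8% IPC, +10% clock speed
print(f"{combined_gain(0.08, 0.10):.1%}")   # 18.8%
```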

Edited by ZetZet

Location: Kaunas, Lithuania, Europe, Earth, Solar System, Local Interstellar Cloud, Local Bubble, Gould Belt, Orion Arm, Milky Way, Milky Way subgroup, Local Group, Virgo Supercluster, Laniakea, Pisces–Cetus Supercluster Complex, Observable universe, Universe.

Spoiler

12700, B660M Mortar DDR4, 32GB 3200C16 Viper Steel, 2TB SN570, EVGA Supernova G6 850W, be quiet! 500FX, EVGA 3070Ti FTW3 Ultra.

 


Just now, ZetZet said:

What are you even saying. It's MORE than 15%, always, on everything. In what universe is that disappointing? Also if you do the math it will be more. 8% IPC and the clock speeds are at least 10% higher, because they are saying OVER 5.5GHz boost.

Because it's a corporate presentation and we must therefore assume they're going to present the best numbers they can pull out of their ass that won't get them called out later for lying.

And considering that it's 15% ST for a chip that brings:

  • New architecture
  • Smaller process node
  • Faster clock speeds

It does seem rather disappointing. One would expect bigger gains due to the jump in multiple areas. Even more so when their clock speed claim alone can be responsible for 10% of the uplift, meaning the architecture and node change did almost nothing performance-wise, and a good chunk of what they claim will come at the cost of higher power usage (which they already hinted at, with their increased TDP figures).
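
To put a number on that argument, here is a back-of-the-envelope sketch that divides the claimed single-thread gain by the claimed clock gain. It assumes the two multiply and uses the floor figures from this thread (15% ST, 10% clocks); the function name is illustrative.

```python
def residual_gain(total_gain, clock_gain):
    """Gain left over for architecture/node once the clock gain is factored out."""
    return (1 + total_gain) / (1 + clock_gain) - 1

# Floor values from the thread: +15% single-thread, +10% clock speed
print(f"{residual_gain(0.15, 0.10):.1%}")   # 4.5%
```

Plugging in a total above the 15% floor raises the residual accordingly.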


15 minutes ago, Rauten said:

And considering that it's 15% ST for a chip that brings:

Once again, that's minimum 15%. Zen 3 was around that and everyone lost their mind. Idk what exactly you're expecting. 

15 minutes ago, Rauten said:

meaning the architecture and node change did almost nothing performance-wise

8% IPC is nothing? 

 

 

With higher clocks and more cache Zen 4 will be leading the way in games, unless Raptor Lake does some magic, which I don't believe Intel can do, their focus seems to be on adding more small cores.


13 minutes ago, ZetZet said:

Once again, that's minimum 15%. Zen 3 was around that and everyone lost their mind. Idk what exactly you're expecting. 

Probably because unless it's 25%+ average it doesn't even beat Alder Lake. Also, according to the footnotes, the >15% comes from Cinebench R23 ST alone, not from multiple workloads. That is at least a 15% improvement in Cinebench R23 ST, which is well below the 12900K's ~23% or the 12900KS's ~30% improvement over Zen 3. Other workloads could come in below that, though I would hope CB ST is on the lower side.


2 minutes ago, KaitouX said:

Probably because unless it's 25%+ average it doesn't even beat Alder Lake. 

Which doesn't exactly matter, does it? Even if it's a bit slower it can be priced to be extremely competitive. Unless you're only looking at the top SKU and maximum performance, which to me isn't the most important. If they can match Alder Lake and push the cost down while also pushing wattage down that's a huge win.


23 minutes ago, ZetZet said:

Once again, that's minimum 15%.

Once again, corporate bullshittery. I'll believe it when I see it. Until then, it's a super-golden sample under the best conditions they could possibly create.

23 minutes ago, ZetZet said:

Zen 3 was around that and everyone lost their mind. Idk what exactly you're expecting. 

Well, more?

Zen3 managed that uplift mostly on architectural improvements. It was using a 7nm node that Zen2 already used, and the clock speed bump wasn't really that impressive.

 

Now with Zen4, which is supposed to be a brand new architecture, it's like barely any improvement has been achieved. Most of the gains could easily be put down to the speed increase and the node reduction.

 

I hoped they'd be able to reach similar architectural improvements as they had between Zen2<->Zen3 plus whatever they could squeeze out of the node change. And the clock increases would be the cherry on top. Instead, the clock is the saviour and main force behind the improvements...

 

11 minutes ago, ZetZet said:

Which doesn't exactly matter, does it? Even if it's a bit slower it can be priced to be extremely competitive. Unless you're only looking at the top SKU and maximum performance, which to me isn't the most important. If they can match Alder Lake and push the cost down while also pushing wattage down that's a huge win.

IF they can match, which right now points to a big fat "NO". Mind you, I dearly hope I'm wrong. I'm not an Intel fanboy, and I really want Zen4 to be good and be my next platform (and run the same mobo for multiple generations again!), but right now it's just not looking too great.


2 minutes ago, Rauten said:

IF they can match, which right now points to a big fat "NO". Mind you, I dearly hope I'm wrong. I'm not an Intel fanboy, and I really want Zen4 to be good and be my next platform (and run the same mobo for multiple generations again!), but right now it's just not looking too great.

They already absolutely smash Alder Lake in MT, we already saw that. I don't see why ST wouldn't be close. I think you're expecting Raptor Lake to be a huge improvement, but so far they haven't given us anything at all.


3 minutes ago, ZetZet said:

Which doesn't exactly matter, does it? Even if it's a bit slower it can be priced to be extremely competitive. Unless you're only looking at the top SKU and maximum performance, which to me isn't the most important. If they can match Alder Lake and push the cost down while also pushing wattage down that's a huge win.

It's DDR5 only, that alone makes it pretty much impossible to be priced well compared to Alder Lake.

And again, due to DDR5, lower SKUs are probably going to be horrible value, plus motherboards are probably going to be similarly priced or more expensive than B660 and Z690. Alder Lake also has a significant thread count advantage over AMD currently in the lower SKUs, which might increase with Raptor Lake.

Intel should also be releasing Raptor Lake around the same time as Zen 4, so Zen 4 failing to match Alder Lake is pretty bad when you consider that too.
