More Ryzen 3000 info - 4.5GHZ boost & +10-15% IPC

ouroesa
3 hours ago, yolosnail said:

That's very generous calling Intel's current lineup 9th generation, considering it's a refresh of 8th gen, which was a refresh of 7th gen, which was itself a refresh of 6th gen. 

Sure a few more cores have been tacked on here and there, but they're essentially the same with a few tweaks

LOL that's basically my point.

Make sure to quote or tag me (@JoostinOnline) or I won't see your response!

PSU Tier List  |  The Real Reason Delidding Improves Temperatures  |  "2K" does not mean 2560×1440

10 minutes ago, Princess Luna said:

Perfectly honest it's not about core frequency, it's about per core latency, Ring Bus still trashes Infinity Fabric.

It's not actually the IF that is slow or causing the latency increase; it's the communication path and the load/store steps when you go across CCXs. There is no direct path between cores on one CCX and another: it goes through L3 cache, and that requires clock cycles to load and complete and then also to access, which is why there is such a huge jump from intra-CCX to inter-CCX latency.

 

But it's not that bad, nor necessarily the root cause of low gaming performance: current Intel HEDT processors use Mesh, which has double the latency of Ring, and CCX to CCX is another 60% on top of that, so there is still a significant difference. The performance difference between Ring and Mesh isn't that big even though there is twice the latency; there is a difference, but much of that is clocks.
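To put the rough figures above side by side, here is a minimal sketch that normalises everything to Ring = 1.0. The 2x and +60% factors are taken straight from the post, not from measurements, and reading "another 60% on top of that" as 60% on top of the Mesh figure is an assumption.

```python
# Relative core-to-core latency, normalised so that Ring = 1.0.
# The factors are the rough figures quoted in the post, not measured values.
RING = 1.0
MESH_FACTOR = 2.0   # "double the latency of Ring"
CCX_EXTRA = 0.60    # "another 60% on top of that" (assumed: on top of Mesh)


def relative_latencies(ring=RING, mesh_factor=MESH_FACTOR, ccx_extra=CCX_EXTRA):
    """Return ring, mesh and inter-CCX latency relative to ring."""
    mesh = ring * mesh_factor
    inter_ccx = mesh * (1.0 + ccx_extra)
    return {"ring": ring, "mesh": mesh, "inter_ccx": inter_ccx}


if __name__ == "__main__":
    for name, value in relative_latencies().items():
        print(f"{name:>9}: {value:.1f}x ring")
```

Under those assumptions inter-CCX lands at a bit over three times Ring, so the gap to Mesh is noticeable but smaller than the gap between Ring and Mesh.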

9 minutes ago, leadeater said:

It's not actually the IF that is slow or causing the latency increase; it's the communication path and the load/store steps when you go across CCXs. There is no direct path between cores on one CCX and another: it goes through L3 cache, and that requires clock cycles to load and complete and then also to access, which is why there is such a huge jump from intra-CCX to inter-CCX latency.

Do games need much inter-core communication? Thinking in general, why are games seemingly more sensitive to this? Are there other workloads known to strongly prefer ring over other cache strategies? I also wonder if the inclusive cache strategy with ring also happens to work better than the exclusive in AMD or non-inclusive of current Intel HEDT. While the latter two are more capacity efficient, I have to wonder if for some workloads it results in excessive data movement.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

2 minutes ago, leadeater said:

The performance difference between Ring and Mesh isn't that big even though there is twice the latency; there is a difference, but much of that is clocks.

People did complain about the i7 6900K keeping a lead on the i7 7820X when matching clock for clock in games back on release, if I recall correctly... Mesh was the main thing blamed, but it also looks like you can manually overclock the frequency Mesh is set to and get around that limitation, right?

 

3 minutes ago, leadeater said:

There is no direct path between cores on one CCX and another: it goes through L3 cache, and that requires clock cycles to load and complete and then also to access, which is why there is such a huge jump from intra-CCX to inter-CCX latency.

That's interesting mechanics. I recall people saying that increasing L3 or maybe even adding L4 cache could be ways to minimize the issues. If anything, could they at least unbind the Infinity Fabric frequency from half the DRAM data rate and allow it to be overclockable like Mesh?

 

I'm excited to see what Zen2 brings to the table. If they can improve latency I'm sure this will become an extremely compelling product across the board. I know some people who gave up on using Threadrippers for server purposes because of high latency issues, although I don't understand this specific subject well.

Personal Desktop:

CPU: Intel Core i7 10700K @5ghz |~| Cooling: bq! Dark Rock Pro 4 |~| MOBO: Gigabyte Z490UD ATX|~| RAM: 16gb DDR4 3333mhzCL16 G.Skill Trident Z |~| GPU: RX 6900XT Sapphire Nitro+ |~| PSU: Corsair TX650M 80Plus Gold |~| Boot:  SSD WD Green M.2 2280 240GB |~| Storage: 1x3TB HDD 7200rpm Seagate Barracuda + SanDisk Ultra 3D 1TB |~| Case: Fractal Design Meshify C Mini |~| Display: Toshiba UL7A 4K/60hz |~| OS: Windows 10 Pro.

Luna, the temporary Desktop:

CPU: AMD R9 7950XT  |~| Cooling: bq! Dark Rock 4 Pro |~| MOBO: Gigabyte Aorus Master |~| RAM: 32G Kingston HyperX |~| GPU: AMD Radeon RX 7900XTX (Reference) |~| PSU: Corsair HX1000 80+ Platinum |~| Windows Boot Drive: 2x 512GB (1TB total) Plextor SATA SSD (RAID0 volume) |~| Linux Boot Drive: 500GB Kingston A2000 |~| Storage: 4TB WD Black HDD |~| Case: Cooler Master Silencio S600 |~| Display 1 (leftmost): Eizo (unknown model) 1920x1080 IPS @ 60Hz|~| Display 2 (center): BenQ ZOWIE XL2540 1920x1080 TN @ 240Hz |~| Display 3 (rightmost): Wacom Cintiq Pro 24 3840x2160 IPS @ 60Hz 10-bit |~| OS: Windows 10 Pro (games / art) + Linux (distro: NixOS; programming and daily driver)

I wonder whether they will increase the CCX to 8 cores. I'm curious how much that could complicate the design. Would that take it from 4-way core-to-core cache communication to 16-way, with all of them having a direct path to each other? It would be great for games, since 8 cores would be in one CCX pool.
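As a rough way to size that question: if every core really had a direct path to every other core (a fully connected layout, which is an assumption here and not something the thread confirms about the actual CCX design), the number of core-to-core paths grows with the number of core pairs. A minimal sketch:

```python
from math import comb


def direct_links(cores: int) -> int:
    """Number of distinct core pairs, i.e. direct paths in a fully connected layout."""
    return comb(cores, 2)


if __name__ == "__main__":
    # Hypothetical comparison: current 4-core CCX vs. a speculated 8-core CCX.
    for cores in (4, 8):
        print(f"{cores} cores -> {direct_links(cores)} direct core-to-core paths")
```

Under that assumption, doubling the cores goes from 6 paths to 28, which gives a feel for why a bigger CCX is not a free change.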

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lanccool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver) | Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |

10-15% IPC improvements. Bring on those 1080p gaming benchmarks! :D

If you want my attention, quote meh! D: or just stick an @samcool55 in your post :3

Spying on everyone to fight against terrorism is like shooting a mosquito with a cannon

33 minutes ago, Doobeedoo said:

I wonder whether they will increase the CCX to 8 cores. I'm curious how much that could complicate the design. Would that take it from 4-way core-to-core cache communication to 16-way, with all of them having a direct path to each other? It would be great for games, since 8 cores would be in one CCX pool.

from the leaks we have seen on the epyc side the ccx config is the same, and i am skeptical that moving to 8 core ccx is a good idea

55 minutes ago, Princess Luna said:

People did complain about the i7 6900K keeping a lead on the i7 7820X when matching clock for clock in games back on release, if I recall correctly... Mesh was the main thing blamed, but it also looks like you can manually overclock the frequency Mesh is set to and get around that limitation, right?

Correct, anything built and optimized during the 4 core era really likes being on Ring. Even though current desktop 8 core processors are on Ring, the difference compared to HEDT is a little less now but still there. Upping the Mesh ratio gains some, but it kinda depends on the game. AC:AO doesn't care much about the difference in architecture, whereas Far Cry 5 does:

 

9980xe-benchmark-fc5-1080p.png

Note the i9-7980XE 30x Mesh result.

 

Analysis is a little hard because the stock clocks of the 8700k and 9900k etc are just so much higher than HEDT, even when those are OC'd. The 8700k stock vs 7980XE 4.6Ghz x30 is the most interesting for this discussion though since it's 4.7Ghz vs 4.6Ghz. It's a real shame there isn't a 7980XE 4.6Ghz result without the Mesh bump, the 7960X 4.6Ghz is the next best we have on that graph.

 

8700k stock: 141

7980XE 4.6 x30: 146.3

7960X 4.6: 135.9

7960X stock: 117.6 

7980XE stock: 112.3

 

Roughly an 18 FPS gain for the clock bump to 4.6 and another 10.4 FPS for the Mesh bump. The clock increase gives the bigger improvement and gets to within 4% of the 8700k. At 4%, is Mesh really hurting performance compared to Ring, or is frequency the main factor? To me it's frequency, because best case it's a 17% difference. Also, is that 4% just the 100Mhz difference and not even Mesh at all?
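For anyone who wants to check the arithmetic, here is a small sketch using only the five Far Cry 5 results listed above. The helper names are made up for illustration, and note that the "Mesh bump" figure compares a 7980XE against a 7960X, since (as the post says) there is no 7980XE 4.6Ghz result without the Mesh change.

```python
# Far Cry 5 1080p results quoted in the post (FPS).
fps = {
    "8700K stock": 141.0,
    "7980XE 4.6 x30": 146.3,
    "7960X 4.6": 135.9,
    "7960X stock": 117.6,
    "7980XE stock": 112.3,
}


def gain(after: str, before: str) -> float:
    """Absolute FPS difference between two results."""
    return round(fps[after] - fps[before], 1)


def deficit_pct(chip: str, reference: str = "8700K stock") -> float:
    """How far a result sits below the reference, as a percentage of the reference."""
    return round((fps[reference] - fps[chip]) / fps[reference] * 100, 1)


if __name__ == "__main__":
    print("Clock bump (7960X stock -> 4.6):", gain("7960X 4.6", "7960X stock"), "FPS")
    # Different chips on each side, so this is only a proxy for the Mesh change.
    print("Mesh bump (7960X 4.6 -> 7980XE 4.6 x30):", gain("7980XE 4.6 x30", "7960X 4.6"), "FPS")
    print("7960X 4.6 below 8700K stock:", deficit_pct("7960X 4.6"), "%")
    print("7960X stock below 8700K stock:", deficit_pct("7960X stock"), "%")
```

The percentages are taken relative to the 8700k result, which appears to be how the 4% and 17% figures in the post were derived.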

 

Basically there's not enough data to call it from just that one game, and without diving into like 20 games with all the OC and Mesh results to analyse the situation I put it in the inconclusive category, unlike frequency, which we do have conclusive evidence of across any generation and architecture. I do wish the data was there though, that's the stuff I like to look at.

 

55 minutes ago, Princess Luna said:

That's interesting mechanics. I recall people saying that increasing L3 or maybe even adding L4 cache could be ways to minimize the issues. If anything, could they at least unbind the Infinity Fabric frequency from half the DRAM data rate and allow it to be overclockable like Mesh?

In a way adding L4 cache would hurt more than help in regards to core latency, but that's only if L4 cache was actually in that communication path which it wouldn't/shouldn't be. As long as it's purely data cache used between system memory it won't hurt and will help.

 

56 minutes ago, porina said:

Do games need much inter-core communication? Thinking in general, why are games seemingly more sensitive to this? Are there other workloads known to strongly prefer ring over other cache strategies? I also wonder if the inclusive cache strategy with ring also happens to work better than the exclusive in AMD or non-inclusive of current Intel HEDT. While the latter two are more capacity efficient, I have to wonder if for some workloads it results in excessive data movement.

See above ramblings; atm no idea, not enough information.

So much speculation, we've basically seen everything from 5ghz base clock claims to low single digit ipc gains with no clock speed bump. 4.5ghz means absolutely nothing if you don't know what it is: it could be 4.5 base, boost or max oc. The only thing we do know is that the 7nm used is a high performance node, not a low power one, and according to the amd OFFICIAL demo, the new amd 8 core is faster than the 9900k at less power, which people have speculated to mean that they have a lot more oc headroom.

 

Hopefully it is the case but let's not get our hopes up just yet.

@leadeater @Princess Luna The CCX penalty is a Windows problem, mostly, because the scheduler is like an ADHD child after drinking a 2L of soda. When it comes to the Ring Bus or Mesh, it's really about reaching a sufficient level of latency more than just "faster = better". This is why, after people put in the work, Mesh overclocking has solved most of the Skylake-X gaming issues. What you're really doing is not increasing bandwidth or throughput, but the actual value (for gaming) is eating up the Wait Cycles faster. It's the reason why ultra-low latency RAM timings can break some Game Engines and you see some wild results, especially in the 1% Lows.

3 minutes ago, Taf the Ghost said:

This is why, after people put in the work, Mesh overclocking has solved most of the Skylake-X gaming issues.

I haven't actually seen much evidence that it's actually Mesh that is the problem and not just frequency itself. Upping the Mesh does gain performance, but it does so above that of Ring in many instances. So it's like, is Mesh really a problem?

 

5 minutes ago, Taf the Ghost said:

The CCX penalty is a Windows problem, mostly, because the scheduler is like an ADHD child after drinking a 2L of soda.

Isn't that only an issue over a certain number of threads/cores? I thought anything equal to or below, I think 16?, didn't trigger the issue. Not everything has the issue either, so it's not even directly the Windows Scheduler but the compiled libraries some software uses, which just weren't compiled to handle a large number of cores, and things get... weird if you go over.

2 minutes ago, leadeater said:

I haven't actually seen much evidence that it's actually Mesh that is the problem and not just frequency itself. Upping the Mesh does gain performance, but it does so above that of Ring in many instances. So it's like, is Mesh really a problem?

 

Isn't that only an issue over a certain number of threads/cores? I thought anything equal to or below, I think 16?, didn't trigger the issue. Not everything has the issue either, so it's not even directly the Windows Scheduler but the compiled libraries some software uses, which just weren't compiled to handle a large number of cores, and things get... weird if you go over.

Mesh definitely was a problem in a number of games in the early going, but both Skylake-X and Ryzen have benefited from a lot of subtle fixes & tweaks that went with both Windows. So, at this point, it's hard to say exactly which interactions were causing issues. The other, likely aspect, is that a lot of updates have just gone out because a lot of game engines were actually out of spec for what they were supposed to be doing but it didn't cause any noticeable issues on Intel, thus no one actually cared to address problems.

 

The other part of this discussion has to be the L3 Cache changes. Skylake-X and Ryzen are similar in that regard, which is partially why gaming performance was down until patches & adjustments came out.

9 hours ago, Derangel said:

I would love that to be true, but its a huge change from the current limits of any Ryzen or Threadripper chip.

Heads up, Ryzen is built on a lower-power process that was initially made by, iirc, Samsung for GloFo. So everything was done as a "power saving thing". Now if they were to build it on a proper process, that is where 3000 comes in and much higher clocks come in.

 

Someone correct me if I got a name or two wrong here. 

The ability to google properly is a skill of its own. 

3 minutes ago, Bouzoo said:

Heads up, Ryzen is built on a lower-power process that was initially made by, iirc, Samsung for GloFo. So everything was done as a "power saving thing". Now if they were to build it on a proper process, that is where 3000 comes in and much higher clocks come in.

 

Someone correct me if I got a name or two wrong here. 

Samsung/GloFo 14nm isn't a "low power" node, in the Mobile SoC sense. It's in the middle and the max efficiency range is right around 2.2 to 2.4 Ghz. There is a 14nm HPC variant that IBM uses that clocks really high, but Ryzen was designed around the 14nm node it's on.

 

For 7nm, it's TSMC's 7nm HPC node, which we haven't seen any products off of yet. The numbers we have come from what AMD has told us about Rome. Compared to the 14nm node, you can either take the 1/2 power savings or go up between 25-35% in frequency. This is how Rome can have 64 cores at what will be similar clocks to 32 core Epyc parts. For Desktop, we're expecting a "middle of the road" approach. So double the cores at about 15% faster frequency.

 

However, and here's the big catch, GloFo's 14nm/12nm node has some harsh voltage walls. I believe, with Ryzen first gen, going from 3.9 Ghz to 4.1 Ghz doubled the power usage. If TSMC's node doesn't have that property, AMD can put some boost clocks pretty close to 5 Ghz. Top-end clocks are going to be pretty high, but what we'll really care about is Sustained Clocks. The 2700X does 3.7 Ghz All-core. An 8c 3rd Gen should, at the same power, do around 4.6 Ghz all-core. At the same power the 9900k needs to do 4.7 Ghz, Ryzen could potentially do 5 Ghz. Without the node dynamics, we don't know yet.
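The all-core estimate above looks like the 2700X figure scaled by the quoted 25-35% same-power frequency uplift. A quick sketch of that projection, assuming the full uplift carries over to desktop parts (which is exactly the part nobody knows yet):

```python
BASE_ALL_CORE_GHZ = 3.7  # 2700X all-core clock, as quoted in the post


def projected_clock(base_ghz: float, uplift: float) -> float:
    """Scale a base clock by a fractional frequency uplift (0.25 means +25%)."""
    return base_ghz * (1.0 + uplift)


if __name__ == "__main__":
    # 25% and 35% are the bounds of the uplift range quoted for the 7nm node.
    for uplift in (0.25, 0.35):
        clock = projected_clock(BASE_ALL_CORE_GHZ, uplift)
        print(f"+{uplift:.0%} at the same power -> about {clock:.2f} GHz all-core")
```

The low end of the range gives roughly the 4.6 Ghz figure and the high end gets close to 5 Ghz, so both numbers in the post are consistent with simple scaling. It says nothing about where the voltage wall sits on the new node.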

19 minutes ago, Bouzoo said:

Heads up, Ryzen is built on a lower-power process that was initially made by, iirc, Samsung for GloFo. So everything was done as a "power saving thing". Now if they were to build it on a proper process, that is where 3000 comes in and much higher clocks come in.

 

Someone correct me if I got a name or two wrong here. 

For News Relevant discussions, Tesla's new Self Driving Module is built on Samsung's 14nm node. That's the type of product that goes well with it. (Both Samsung's & GloFo's 14nm nodes are incredibly popular, as a note. That efficiency curve is great for non-mobile GPUs and ARM CPUs.)

14 hours ago, dgsddfgdfhgs said:

4.5Ghz boost... ok, maybe 4.8Ghz overclocked

anyone think there are some hidden agreements that amd can't make chips faster than intel?

https://www.amd.com/en/products/cpu/fx-9590

 

Amd had 5ghz stock before Intel.

CPU: Ryzen 5 2600 4ghz @ 1.35v  CPU Cooler: Mugen 5 Rev b  Motherboard: MSI B450 Gaming Pro Carbon  GPU: Zotac RTX 2060 +150/+1000 Memory: 16GB Viper 4 @ 3200 CL14 Samsung B-die  Storage: 1TB Patriot VPN100 NVMe; 500GB 860evo; 128gb 840pro Case: Cooler Master Q500L  PSU: CX750M V2 Operating System: Windows 10 Pro Other: 6 Corsair LL Fans; 2 aRGB Strips

12 minutes ago, fluxdeity said:

That's not really relevant.

 

Bulldozer also had horrible IPC, to the point where I can downclock my 5930K and still beat the 5GHz 9590.

Come Bloody Angel

Break off your chains

And look what I've found in the dirt.

 

Pale battered body

Seems she was struggling

Something is wrong with this world.

 

Fierce Bloody Angel

The blood is on your hands

Why did you come to this world?

 

Everybody turns to dust.

 

Everybody turns to dust.

 

The blood is on your hands.

 

The blood is on your hands!

 

Pyo.

7 minutes ago, Drak3 said:

That's not really relevant.

 

Bulldozer also had horrible IPC, to the point where I can downclock my 5930K and still beat the 5GHz 9590.

5 bulldozer jiggies were equivalent to around 3-3.5 intel jiggies afaik.

 

Intel had a node advantage, but the IPC wasn't very good on the narrow cores on the fx line
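That rule of thumb can be turned into a relative IPC figure if you treat performance as simply IPC times clock, which is a simplification. A minimal sketch using only the numbers in the post:

```python
def relative_ipc(equiv_intel_ghz: float, bulldozer_ghz: float = 5.0) -> float:
    """Bulldozer IPC relative to Intel, assuming performance = IPC x clock."""
    return equiv_intel_ghz / bulldozer_ghz


if __name__ == "__main__":
    # 3.0 and 3.5 GHz are the rough Intel-equivalent clocks quoted in the post.
    for intel_ghz in (3.0, 3.5):
        ratio = relative_ipc(intel_ghz)
        print(f"5 GHz Bulldozer ~ {intel_ghz} GHz Intel -> relative IPC about {ratio:.0%}")
```

So by that estimate the FX cores were doing something like 60-70% of the work per clock.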

44 minutes ago, Taf the Ghost said:

However, and here's the big catch, GloFo's 14nm/12nm node has some harsh voltage walls. I believe, with Ryzen first gen, going from 3.9 Ghz to 4.1 Ghz doubled the power usage.

The wall was tough for overclockers. I had a 1700 and from memory I don't remember getting past 4.0 under load. Even at 4.0, I do recall one occasion seeing 180W reported by the CPU itself (probably running Cinebench R15). Personally when overclocking for 24/7 usage I didn't put all core past 3.6 in order to maintain use of low voltages as the power goes up massively for the tiny increases in performance above that. Zen+ helped a lot. With air cooling, 4.25 all cores was attainable on my 2600. Note all my Ryzen CPUs were non-X versions, so potentially those could have got you a little more.

 

The interesting thing here is, user overclocks on Ryzen so far without exotic cooling seem to be reaching roughly where the fastest turbo clocks are for the generation. There isn't much headroom beyond that. The benefit of OC was that all cores were reaching it. It will be interesting to see if this pattern continues with Zen2.

14 hours ago, JoostinOnline said:

Intel's 3rd generation of the Core processors stayed under 4GHz.

Sure, on paper they stayed below 4GHz indeed.  However Intel was much more conservative with clock speeds back then, which left tons of headroom for enthusiasts.  Sandy and Ivy Bridge CPUs were hitting 4.8 to 5GHz like it was nothing, some even on air. 

 

It's nice to see AMD close the gap though.  I'm hoping to get a few more years out of my 5930K, but my next rig will probably be Ryzen.

1 minute ago, Captain Chaos said:

Sandy and Ivy Bridge CPUs were hitting 4.8 to 5GHz like it was nothing, some even on air. 

Lol no they weren't.  Not at all.  Reaching 4.5GHz on a 3770k was really good for an air cooler.  It wasn't until Coffee Lake that hitting 5GHz was simple on air.

1 minute ago, JoostinOnline said:

Lol no they weren't.  Not at all.  Reaching 4.5GHz on a 3770k was really good for an air cooler.  It wasn't until Coffee Lake that hitting 5GHz was simple on air.

I've got a 4930k on a stupid high end water cooling loop and getting mine past 4.3Ghz is a right pain. Sure, it's likely a dud, but from what I know about Ivy-E, 4.8Ghz was never common nor that easy even with a good chip.

2 minutes ago, leadeater said:

I've got a 4930k on a stupid high end water cooling loop and getting mine past 4.3Ghz is a right pain. Sure, it's likely a dud, but from what I know about Ivy-E, 4.8Ghz was never common nor that easy even with a good chip.

Yeah, my 4790k has the biggest air cooler I could fit strapped to it, delidded with liquid metal, and my 4.7GHz overclock isn't super stable. It runs games fine, but stress tests either cause overheating or crashing.

2 hours ago, Taf the Ghost said:

Samsung/GloFo 14nm isn't a "low power" node, in the Mobile SoC sense. It's in the middle and the max efficiency range is right around 2.2 to 2.4 Ghz. There is a 14nm HPC variant that IBM uses that clocks really high, but Ryzen was designed around the 14nm node it's on.

 

For 7nm, it's TSMC's 7nm HPC node, which we haven't seen any products off of yet. The numbers we have come from what AMD has told us about Rome. Compared to the 14nm node, you can either take the 1/2 power savings or go up between 25-35% in frequency. This is how Rome can have 64 cores at what will be similar clocks to 32 core Epyc parts. For Desktop, we're expecting a "middle of the road" approach. So double the cores at about 15% faster frequency.

 

However, and here's the big catch, GloFo's 14nm/12nm node has some harsh voltage walls. I believe, with Ryzen first gen, going from 3.9 Ghz to 4.1 Ghz doubled the power usage. If TSMC's node doesn't have that property, AMD can put some boost clocks pretty close to 5 Ghz. Top-end clocks are going to be pretty high, but what we'll really care about is Sustained Clocks. The 2700X does 3.7 Ghz All-core. An 8c 3rd Gen should, at the same power, do around 4.6 Ghz all-core. At the same power the 9900k needs to do 4.7 Ghz, Ryzen could potentially do 5 Ghz. Without the node dynamics, we don't know yet.

did you forget about vega 20?

26 minutes ago, JoostinOnline said:

Yeah, my 4790k has the biggest air cooler I could fit strapped to it, delidded with liquid metal, and my 4.7GHz overclock isn't super stable. It runs games fine, but stress tests either cause overheating or crashing.

i have a 4690k myself and getting over 4.5 is hard. tried to get 5ghz in a bench session once; it needed over 1.5v to enter windows, with "maybe lasts long enough to post a cpuz screenshot" levels of stability
