
More Ryzen 3000 info - 4.5GHz boost & +10-15% IPC

ouroesa
3 hours ago, fluxdeity said:

Ah yes, the FurnaceX-9590.

 

Don't forget that Bulldozer had the IPC of a potato salad. Even at stock I'm 1000% sure my 8086K will run circles around the FX-9590.


36 minutes ago, JoostinOnline said:

Lol no they weren't.  Not at all.  Reaching 4.5GHz on a 3770k was really good for an air cooler.  It wasn't until Coffee Lake that hitting 5GHz was simple on air.

If you only did 4.5 on air, you were very unlucky with the silicon lottery.  I saw machines do 4.7 and higher on air on a regular basis. 

Ivy often did require delidding due to bad TIM (as did almost every generation after it).  Maybe that's why you thought/think it didn't do those speeds. 

 

I didn't even have to dig far for this :

[Image: LTT CPU benches, LGA1155]

 

Source :

 

Most of the Sandy and Ivy results were from 2014 or earlier, when the forum was a lot smaller than it is now. 

I highlighted the ones on air.  Right away I see three 3570Ks on air at 5GHz, two of them even on a mere Hyper 212.  There's also a 3770K doing 4.9 on a Thermalright Silver Arrow, and another one doing 4.8 on an NH-D14.  If 4.5 is "really good", 4.8 and higher is probably not bad. 

 

I doubt that Jumper118 was actually running a stock cooler, but I wouldn't mind if someone can prove me wrong on that.

 

As for the water-cooled ones, I only see 4 custom loops, the rest are AIOs.

 


16 minutes ago, Captain Chaos said:

I didn't even have to dig far for this :

[Image: LTT CPU benches, LGA1155]

Half of these are running essentially lethal voltage...

 

Also, we don't know the stability of these systems.


6 hours ago, leadeater said:

It's not actually the IF that is slow or causing the latency increase; it's the communication path and the load/store steps when you go across CCXs. There is no direct path from cores on one CCX to cores on another. Traffic goes through the L3 cache, and loading, completing and then accessing it takes clock cycles, which is why there is such a huge jump from intra-CCX to inter-CCX latency.

 

But it's not that bad, nor necessarily the root cause of low gaming performance. Current Intel HEDT processors use Mesh, which has double the latency of Ring, and CCX to CCX is another 60% on top of that, so there is still a significant difference. The performance difference between Ring and Mesh isn't that big even though Mesh has twice the latency; there is a difference, but much of it is down to clocks.

 

It's a bit more complicated than that. I did a bit of digging on this a few weeks ago, and basically Intel has a few different advantages just on bandwidth. First, their ring bus separates traffic into 4 categories and each gets a full-speed ring to itself, which means far fewer issues with one type of transmission getting clogged by other traffic.  Second, their ring bus operates at a much higher frequency, not only giving each ring roughly 4 times the bandwidth of AMD's IF links (though the bidirectional nature of IF links makes it closer to 2 times in practice), but also making the transmission delay contribution to latency over 4 times smaller.

 

Zen 2 promises several advancements that, if AMD fully utilizes what we know they can do on the chiplets, should let them more or less reverse all the negatives save the minimum latency, which will still be slashed overall.
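For anyone who wants to see the transmission delay point as numbers, here is a minimal sketch. The bandwidth figures are made-up placeholders, not measured ring bus or IF values; the only thing it illustrates is that the time to serialize a payload onto a link shrinks in proportion to the link's bandwidth.

```python
# Serialization (transmission) delay for one cache line over a link.
# All bandwidth numbers are hypothetical placeholders, NOT measurements.

CACHE_LINE_BYTES = 64


def transmission_delay_ns(payload_bytes: int, bandwidth_gb_per_s: float) -> float:
    """Time to push payload_bytes onto a link, in nanoseconds."""
    # 1 GB/s equals 1 byte per nanosecond, so bytes / (GB/s) is already in ns.
    return payload_bytes / bandwidth_gb_per_s


slow_link = 25.0           # hypothetical link bandwidth in GB/s
fast_link = 4 * slow_link  # a link with 4x the bandwidth

slow_ns = transmission_delay_ns(CACHE_LINE_BYTES, slow_link)
fast_ns = transmission_delay_ns(CACHE_LINE_BYTES, fast_link)
print(f"slow link: {slow_ns:.2f} ns, fast link: {fast_ns:.2f} ns")
```

This only covers the time to put the data on the wire. Queuing, hops and cache access come on top of it and are not modelled here.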


30 minutes ago, CarlBar said:

It's a bit more complicated than that. I did a bit of digging on this a few weeks ago, and basically Intel has a few different advantages just on bandwidth. First, their ring bus separates traffic into 4 categories and each gets a full-speed ring to itself, which means far fewer issues with one type of transmission getting clogged by other traffic.  Second, their ring bus operates at a much higher frequency, not only giving each ring roughly 4 times the bandwidth of AMD's IF links (though the bidirectional nature of IF links makes it closer to 2 times in practice), but also making the transmission delay contribution to latency over 4 times smaller.

Bandwidth and latency are still a problem with Ring, which is why Intel moved to Mesh; Ring just doesn't scale. The inherent problem with Ring is the fact that it's a ring.

 

[Image]

 

More cores means higher latency with Ring. You can't see it very well here, but Mesh does win out easily at the higher core counts. You also get much higher total bandwidth with Mesh, but significantly lower single-core bandwidth. There's a good breakdown of that in AnandTech's EPYC and Skylake-SP review.

 

[Image]

 

Latency within a Zen CCX is actually very good though, and bandwidth is better than Ring or Mesh in most situations.
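The "Ring doesn't scale" point can be illustrated with a toy hop-count model. This is a sketch under loud assumptions: it only counts hops between stops, treats every hop as equal, and ignores clocks, per-hop cost and the non-core agents that sit on a real interconnect. The function names and the 4x7 mesh layout are made up for illustration.

```python
from itertools import product


def ring_avg_hops(stops: int) -> float:
    """Average shortest-path hop count between distinct stops on a bidirectional ring."""
    total = sum(min(d, stops - d) for d in range(1, stops))
    return total / (stops - 1)


def mesh_avg_hops(rows: int, cols: int) -> float:
    """Average Manhattan distance between distinct nodes on a rows x cols mesh."""
    nodes = list(product(range(rows), range(cols)))
    total = 0
    pairs = 0
    for (r1, c1), (r2, c2) in product(nodes, nodes):
        if (r1, c1) != (r2, c2):
            total += abs(r1 - r2) + abs(c1 - c2)
            pairs += 1
    return total / pairs


# Ring hop count grows roughly linearly with the number of stops.
for stops in (4, 8, 16, 28):
    print(f"ring with {stops:2d} stops: {ring_avg_hops(stops):.2f} average hops")

# A hypothetical 4x7 mesh with the same 28 nodes needs far fewer hops on average.
print(f"4x7 mesh: {mesh_avg_hops(4, 7):.2f} average hops")
```

Hop count is not latency. A mesh hop can cost more than a ring hop and the mesh may run at a lower clock, which fits with Ring still doing well at low core counts.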


56 minutes ago, GoldenLag said:

Also, we don't know the stability of these systems.

I quickly went through the original spreadsheet again :

 

Banttu (i5-2500K, 5.0GHz, custom loop) : "At least 40-50 passes with Prime95"

TheSLSAMG (i5-2500K, 5.0GHz on H100i) : 6+ hours of Intel's CPU stress test

Jumper118 (i5-2550K, 4.926GHz on "stock cooler") : Used Cinebench, listed in spreadsheet as "1-3 hours"

Located (i5-2500K, 4.8GHz on H100) : 404 error on the image, spreadsheet lists it as "1-3 hours, or at least 40-50 passes with Prime95"

Artem (i7-3770K, 4.8GHz on Noctua NH-D14) : 6+ hours on Prime95

terminashunator (i7-2600K, 4.8GHz on H100) : 6+ hours of Intel's CPU stress test

arodrake (i7-2700K, 4.7 on H100i) : imageshack doesn't show the picture, spreadsheet lists it as "1-3 hours, or at least 40-50 passes with Prime95"

SpyRosL (i5-3570K, 4.7 on CM Hyper TX3) : 6+ hours on Prime95

Gofspar (i7-3770K, 4.7 on Kraken X61) : 6+ hours on Prime95

HappyChubbs (i7-3770K, 4.7 on H100i) : 404 error on the image, spreadsheet lists it as "6+ hours"
 

The others were either less than an hour or did a single pass just to get the result. 
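The list above is small enough to filter by eye, but as a sketch, here it is as data so the "long stress test" and "on air" subsets can be pulled out directly. The `cooling` labels ("air", "aio", "custom") are my own classification of the named coolers and should be read as an assumption, and Jumper118's "stock cooler" is recorded as listed even though it is doubted in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    user: str
    cpu: str
    ghz: float
    cooler: str
    cooling: str          # "air", "aio" or "custom" (assumed classification)
    six_plus_hours: bool  # True if listed as 6+ hours of stress testing


RESULTS = [
    Result("Banttu", "i5-2500K", 5.0, "custom loop", "custom", False),
    Result("TheSLSAMG", "i5-2500K", 5.0, "H100i", "aio", True),
    Result("Jumper118", "i5-2550K", 4.926, "stock cooler", "air", False),  # cooler doubted in the thread
    Result("Located", "i5-2500K", 4.8, "H100", "aio", False),
    Result("Artem", "i7-3770K", 4.8, "Noctua NH-D14", "air", True),
    Result("terminashunator", "i7-2600K", 4.8, "H100", "aio", True),
    Result("arodrake", "i7-2700K", 4.7, "H100i", "aio", False),
    Result("SpyRosL", "i5-3570K", 4.7, "CM Hyper TX3", "air", True),
    Result("Gofspar", "i7-3770K", 4.7, "Kraken X61", "aio", True),
    Result("HappyChubbs", "i7-3770K", 4.7, "H100i", "aio", True),
]

# Entries with 6+ hours of stress testing, then the air-cooled subset of those.
long_tested = [r for r in RESULTS if r.six_plus_hours]
long_tested_on_air = [r for r in long_tested if r.cooling == "air"]

for r in long_tested_on_air:
    print(f"{r.user}: {r.cpu} @ {r.ghz} GHz on {r.cooler}")
```

Voltages are not part of the list, so this says nothing about whether any of these overclocks were run at a sensible voltage.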


1 hour ago, Captain Chaos said:

If you only did 4.5 on air, you were very unlucky with the silicon lottery.  I saw machines do 4.7 and higher on air on a regular basis. 

Ivy often did require delidding due to bad TIM (as did almost every generation after it).  Maybe that's why you thought/think it didn't do those speeds. 

Sandy-E actually clocked higher than Ivy-E could, none of those in that picture are Ivy-E.


3 minutes ago, Captain Chaos said:

I quickly went through the original spreadsheet again :

 

Banttu (i5-2500K, 5.0GHz, custom loop) : "At least 40-50 passes with Prime95"

TheSLSAMG (i5-2500K, 5.0GHz on H100i) : 6+ hours of Intel's CPU stress test

Jumper118 (i5-2550K, 4.926GHz on "stock cooler") : Used Cinebench, listed in spreadsheet as "1-3 hours"

Located (i5-2500K, 4.8GHz on H100) : 404 error on the image, spreadsheet lists it as "1-3 hours, or at least 40-50 passes with Prime95"

Artem (i7-3770K, 4.8GHz on Noctua NH-D14) : 6+ hours on Prime95

terminashunator (i7-2600K, 4.8GHz on H100) : 6+ hours of Intel's CPU stress test

arodrake (i7-2700K, 4.7 on H100i) : imageshack doesn't show the picture, spreadsheet lists it as "1-3 hours, or at least 40-50 passes with Prime95"

SpyRosL (i5-3570K, 4.7 on CM Hyper TX3) : 6+ hours on Prime95

Gofspar (i7-3770K, 4.7 on Kraken X61) : 6+ hours on Prime95

HappyChubbs (i7-3770K, 4.7 on H100i) : 404 error on the image, spreadsheet lists it as "6+ hours"
 

The others were either less than an hour or did a single pass just to get the result. 

Even then, a lot of them are still using lethal voltage. They aren't the normal sort of overclocks you would see people run in the wild.


1 minute ago, leadeater said:

Sandy-E actually clocked higher than Ivy-E could, none of those in that picture are Ivy-E.

Hang on, we were talking 2nd and 3rd gen.  I made it easier by excluding all results that weren't on LGA1155.  Ivy-E wasn't really in my picture, otherwise I would have included 2011. 

I'm not going through the entire spreadsheet again though.


19 hours ago, Drak3 said:

Someone flunked middle school math...

 

Because doing the math, we got 520 PCIe lanes and 516 SATA ports according to this.

 

 

 

 

 

 

 

 

 

And I can't afford that many NVMe drives and GPUs.

Well, if you want a mining machine :D

LTT's Resident Porsche fanboy and nutjob Audiophile.

 

Main speaker setup is now;

 

Mini DSP SHD Studio -> 2x Mola Mola Tambaqui DACs (fed by AES/EBU, one feeds the left sub and main, the other feeds the right side) -> 2x Neumann KH420 + 2x Neumann KH870

 

(Having a totally separate DAC for each channel is game changing for sound quality)


18 minutes ago, leadeater said:

Sandy-E actually clocked higher than Ivy-E could, none of those in that picture are Ivy-E.

Yet here I am with a Sandy-E that can't get above 4.4 lol

 

I've literally tried 1.45V on the thing... 4.4 tops

 

I hate losing the silicon lottery lol. Thing would still be a good chip if it would run at 4.6-4.8.

"If a Lobster is a fish because it moves by jumping, then a kangaroo is a bird" - Admiral Paulo de Castro Moreira da Silva

"There is nothing more difficult than fixing something that isn't all the way broken yet." - Author Unknown

Spoiler

Intel Core i7-3960X @ 4.6 GHz - Asus P9X79WS/IPMI - 12GB DDR3-1600 quad-channel - EVGA GTX 1080ti SC - Fractal Design Define R5 - 500GB Crucial MX200 - NH-D15 - Logitech G710+ - Mionix Naos 7000 - Sennheiser PC350 w/Topping VX-1


13 hours ago, leadeater said:

Yep, just pointing out TSMC 7nm isn't a magic holy grail that will do everything.

True, but from what we know from the Apple A9, the TSMC 16nm process is a good amount better than the GF/Samsung 14nm process.

Though only in terms of temperature/power consumption.

IIRC there aren't any clock rate comparisons between those two processes, and the A9 is all we have...

 

The next best thing we kind of have is Vega 10 vs. Vega 20, and that doesn't look too bad either, but it's not that comparable as Vega 20 has some changes.

IIRC 1:2 SP:DP, 2 additional 1024-bit memory controllers and possibly other things as well.

And that's without reworking the core, which is what AMD did for Zen 2...

"Hell is full of good meanings, but Heaven is full of good works"


7 hours ago, Jito463 said:

Not really, IPC for Ryzen is already pretty close to Intel's.  Where AMD has been lacking the most is in clock speeds, though an IPC increase as well wouldn't hurt.

Nope, not for gaming. 

The real issue is the rather high latency, which for the most part prevents Ryzen from doing well in high-fps gaming, which in reality is low-latency gaming.

 

And the "Single Core Performance" w/o SMT is also something people claim is important (for Starcraft2 for example)...


57 minutes ago, Captain Chaos said:

If you only did 4.5 on air, you were very unlucky with the silicon lottery.  I saw machines do 4.7 and higher on air on a regular basis. 

Ivy often did require delidding due to bad TIM (as did almost every generation after it).  Maybe that's why you thought/think it didn't do those speeds. 

Yes, but how many CPUs were overclocked, and what percentage of those reached that? 

There's a ton of information we don't know.

Including how many CPUs those people originally bought and sent back to get one that good.

 

Also, we do NOT know how stable they are. Some applications are pretty allergic to overclocking. The last one I remember was one of the Aquanox games, but there are others around as well.

 

What I'm saying is that we don't have enough information to claim anything. We could just as well claim that later Intel products are more consistent and have less variation.


1 hour ago, Captain Chaos said:

I quickly went through the original spreadsheet again :

 

Banttu (i5-2500K, 5.0GHz, custom loop) : "At least 40-50 passes with Prime95"

TheSLSAMG (i5-2500K, 5.0GHz on H100i) : 6+ hours of Intel's CPU stress test

Jumper118 (i5-2550K, 4.926GHz on "stock cooler") : Used Cinebench, listed in spreadsheet as "1-3 hours"

Located (i5-2500K, 4.8GHz on H100) : 404 error on the image, spreadsheet lists it as "1-3 hours, or at least 40-50 passes with Prime95"

Artem (i7-3770K, 4.8GHz on Noctua NH-D14) : 6+ hours on Prime95

terminashunator (i7-2600K, 4.8GHz on H100) : 6+ hours of Intel's CPU stress test

arodrake (i7-2700K, 4.7 on H100i) : imageshack doesn't show the picture, spreadsheet lists it as "1-3 hours, or at least 40-50 passes with Prime95"

SpyRosL (i5-3570K, 4.7 on CM Hyper TX3) : 6+ hours on Prime95

Gofspar (i7-3770K, 4.7 on Kraken X61) : 6+ hours on Prime95

HappyChubbs (i7-3770K, 4.7 on H100i) : 404 error on the image, spreadsheet lists it as "6+ hours"
 

The others were either less than an hour or did a single pass just to get the result. 

Your original claim was that they "were hitting 4.8 to 5GHz like it was nothing, some even on air", which is what we're all calling bullshit on.  Nobody is saying it was impossible to hit 5GHz, it's just that it was rare to get a stable overclock on a reasonable voltage.  Most of those voltages are dangerously high, even if they're stable, and thus are more for showing off than 24/7 usage.

 

Intel and AMD are using two different architectures, with the latter being much newer.  Intel has been refining theirs for a decade now, and while they seem to be hitting some roadblocks (such as with scalability), they've got an advantage on clock speeds.  It's silly to attribute it to a conspiracy theory.

Make sure to quote or tag me (@JoostinOnline) or I won't see your response!

PSU Tier List  |  The Real Reason Delidding Improves Temperatures  |  "2K" does not mean 2560×1440


10-15% IPC boost is huge if true, given that the R7 2700X actually has IPC identical to the newest Intel offerings. But people continue to mix up clocks with IPC, thinking Intel's is higher. IPC is compared at the same clocks, not at different clocks. So the R7 2700X has identical IPC to the 9900K at 4GHz, which both achieve easily. The rest of the "IPC" on Intel's side is gained purely from higher clocks, not from actual IPC enhancement.
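To make the "compare at the same clocks" point concrete, here is a minimal sketch that divides a benchmark score by the clock it was run at. The chip names, scores and clocks are invented placeholders, not real 2700X or 9900K results, and the model assumes the score scales linearly with clock, which real workloads only approximate.

```python
def per_clock_score(score: float, clock_ghz: float) -> float:
    """Benchmark score per GHz, a rough stand-in for IPC on one specific workload."""
    return score / clock_ghz


# Hypothetical single-thread results for the same workload (placeholder values).
chip_a = {"score": 400.0, "clock_ghz": 4.0}
chip_b = {"score": 500.0, "clock_ghz": 5.0}

a_per_clock = per_clock_score(chip_a["score"], chip_a["clock_ghz"])
b_per_clock = per_clock_score(chip_b["score"], chip_b["clock_ghz"])

# chip_b posts the higher raw score, but per clock the two come out the same,
# so in this made-up case the whole gap is explained by frequency.
print(f"chip A: {a_per_clock:.1f} per GHz, chip B: {b_per_clock:.1f} per GHz")
```

A per-clock figure like this only holds for the workload it was measured on, since the instruction mix changes the result.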


So, is this news about the mid-range or the high-end chip? 


57 minutes ago, RejZoR said:

10-15% IPC boost is huge if true, given that the R7 2700X actually has IPC identical to the newest Intel offerings. But people continue to mix up clocks with IPC, thinking Intel's is higher. IPC is compared at the same clocks, not at different clocks. So the R7 2700X has identical IPC to the 9900K at 4GHz, which both achieve easily. The rest of the "IPC" on Intel's side is gained purely from higher clocks, not from actual IPC enhancement.

You also have to compare the same instructions, as performance can vary widely from instruction to instruction between the two. The "instructions" in "instructions per clock" are a huge factor. Many misuse IPC to mean single-core performance.

 

Just a heads up for people browsing through who may not understand it fully.


18 minutes ago, Dylanc1500 said:

You also have to compare the same instructions, as performance can vary widely from instruction to instruction between the two. The "instructions" in "instructions per clock" are a huge factor. Many misuse IPC to mean single-core performance.

 

Just a heads up for people browsing through who may not understand it fully.

Which, for the most part (other than AVX2), means Ryzen is down on single-thread performance due to clocks, not IPC.


22 hours ago, ouroesa said:

They also added more PCIe lanes for a total of 40 PCIe Gen 4 lanes

Nobody is talking about this. The gen 4 thing is cool, but I still find myself asking "why bother?" particularly on the consumer end. That's an insane number of PCIe lanes for most people. It makes me wonder if it's cheaper to just give all the chiplets 40 lanes, rather than separating by tiers.


56 minutes ago, JoostinOnline said:

It makes me wonder if it's cheaper to just give all the chiplets 40 lanes, rather than separating by tiers.

Keep in mind:
Ryzen CPUs already have 32 PCIe lanes. 

How many can you use? 20... (+4 for the chipset, so technically 24).

 

The problem is that it's limited by the socket.

 

In theory you could modify the socket, though I haven't found a pinout for it yet to see whether that might be possible.

 

But I highly doubt that...
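The lane arithmetic in this post, written out as a quick sketch. The numbers are the ones stated above; the variable names are just for illustration.

```python
# PCIe lane budget as described in the post above.
DIE_LANES = 32      # lanes present on the CPU die
USABLE_LANES = 20   # lanes available to slots and storage
CHIPSET_LANES = 4   # lanes reserved for the chipset link

# Lanes the socket actually routes out, and lanes on the die left unconnected.
socket_lanes = USABLE_LANES + CHIPSET_LANES
unrouted_lanes = DIE_LANES - socket_lanes

print(f"routed through the socket: {socket_lanes}, left unrouted: {unrouted_lanes}")
```

So by these figures a quarter of the lanes on the die never reach the board, which is the socket limit being described.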

 


8 hours ago, fluxdeity said:

Guys, I have the answer for AMD that they've been searching for.  Hitting 10GHz on <1 volt.  Yes, you read that right, and the technology already exists.

 

Netburst

[Image: killinme.gif]


4 hours ago, leadeater said:

Bandwidth and latency are still a problem with Ring, which is why Intel moved to Mesh; Ring just doesn't scale. The inherent problem with Ring is the fact that it's a ring.

 

[Image]

 

More cores means higher latency with Ring. You can't see it very well here, but Mesh does win out easily at the higher core counts. You also get much higher total bandwidth with Mesh, but significantly lower single-core bandwidth. There's a good breakdown of that in AnandTech's EPYC and Skylake-SP review.

 

[Image]

 

Latency within a Zen CCX is actually very good though, and bandwidth is better than Ring or Mesh in most situations.

 

Um, I think you completely misunderstood me there. I was commenting purely on Zen 1/Zen+ IF links compared to the ring bus. I wasn't really talking about Mesh at all, or about in-CCX latency. I was just pointing out how the higher bandwidth and lower transmission time (as well as separate rings for each group of things) give Intel a big advantage in latency ATM, above and beyond the write/read/write/read stuff you were talking about. And how Zen 2 looks set to really close up that aspect regardless of any other changes.

 

Sorry if I was a bit unclear.


2 hours ago, CarlBar said:

give Intel a big advantage in latency ATM, above and beyond the write/read/write/read stuff you were talking about.

Where though? IF is not the thing adding the latency between the CCXs; that's the L3 cache access layer. If there is any advantage at all, it's very minimal compared to other factors. All the current information you can find on IF in relation to latency involves the L3 cache, so we don't actually have a measure of IF latency itself.

 

The other factor is that for Ring Bus the L3 cache is clocked to the core clocks, yes, but that also means it downclocks with the core clocks: if the core is in a lower power state at, say, 2.6GHz, then so is the L3 cache. At the heart of it, all these buses are about the L3 cache, so if we're talking about Ring Bus then we're talking about L3 cache.

 

Regardless, IF is actually an umbrella term and more of a protocol. It's transport layer agnostic, so the transport layer latency is whatever that medium is, e.g. PCIe. We're really talking about the Scalable Data Fabric (SDF), and between dies and between sockets it is different again.
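As a rough sketch of why the clock matters here: if an L3 access costs a fixed number of cycles, the wall-clock latency depends on the frequency the cache is running at. The 40-cycle figure below is a hypothetical placeholder, not a measured value for any CPU, and the 4.5GHz comparison point is likewise just an example.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at a given clock (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


L3_ACCESS_CYCLES = 40  # hypothetical cycle cost, for illustration only

for clock_ghz in (2.6, 4.5):
    latency_ns = cycles_to_ns(L3_ACCESS_CYCLES, clock_ghz)
    print(f"{clock_ghz} GHz: {latency_ns:.1f} ns for {L3_ACCESS_CYCLES} cycles")
```

The same cycle count takes noticeably longer in the lower power state, which is the downside of tying the cache clock to the core clock.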

 

Edit:

Also the 4 rings in Ring Bus aren't for data, only 1 is so it's not 4x the bandwidth. The 4 rings are: Data, Request, Acknowledge and Snoop.


10 hours ago, leadeater said:

Edit:

Also the 4 rings in Ring Bus aren't for data, only 1 is so it's not 4x the bandwidth. The 4 rings are: Data, Request, Acknowledge and Snoop.

 

Each of those rings has (last time I did the math; I need to dig my articles up again though, as it's been a few weeks) 4 times the bandwidth, albeit unidirectional compared to AMD's bidirectional. But that's kind of one of the things I was getting at. AMD has to put all the traffic that's distributed across 4 buses (though I couldn't find a good explanation of what the snoop bus is for) through a single, much lower speed bus.

 

Also, saying we don't know what effect the IF links have on latency is like saying we don't know what effect making everyone use 30mph mopeds will have on traffic congestion. You just have to look at side data and use a bit of common sense on said data.

 

We know that, despite splitting transmission types across 4 different higher-speed buses, the ring bus itself causes increased latency with more than 8 cores. If the ring bus itself is the issue, that already tells us the bus is delaying things for some reason. And the only reason the ring bus would cause that kind of issue is if it's becoming overloaded with traffic.

 

Now obviously AMD and Intel processors aren't going to use identical intercore bandwidth. That's a "duh" point. But they're absolutely going to be in the same general ballpark, assuming Intel hasn't designed their CPUs to be horribly inefficient in this regard (and if they had, they could just fix the ring bus to work at bigger core counts by working on that). Given the massive bandwidth disparity between the Intel and AMD solutions, the only way the 8-core parts couldn't be bandwidth starved is actual magic. We know from the limitations of Ring Bus roughly what kind of bandwidth is required for an Intel-speed latency solution, and we know AMD literally isn't even remotely close to that currently. So unless AMD has actual magic to keep intercore communication restricted almost entirely to within a CCX, there's no way they cannot be bandwidth starved on higher core count parts.

 

Obviously on HEDT and server parts, where Intel clocks are lower and they're using a mesh, that advantage starts to fall off as Intel's bandwidth drops (because of the frequency point you mentioned), and their mesh just isn't quite as blazingly fast for a given bandwidth in the first place. AMD's issues get smaller there, helped by the fact that the lower clocks also mean each core is doing less work, reducing the bandwidth needed for the communications. Not to mention cross-die communications are, as you noted, different in Zen 1 and Zen+ from inter-CCX.

 

 
