
So Lion Cove is:

a) inefficient

b) barely holds the ST and IPC crown while drawing nearly 2x the power (vs Zen 5)

c) Skymont has 1.02x the IPC of Raptor Cove and can reach decent clock speeds

d) a Skymont core takes up less than 3 mm² of die space, while Lion Cove is almost double that (Zen 5 is 5 mm², I think) (I read all this a while ago and can't find the info again, so I may be wrong)

 

So, in a hypothetical situation where Intel never made Lion Cove and just had a 32-core Skymont CPU for the top end, that CPU would have been more efficient, had better MT than the 285K and 9950X, been cheaper to make (less die space), had ~13600K ST perf (assuming Intel clocks it to 4.8-5 GHz), and actually had Alder Lake+ gaming perf. This CPU would have lower gaming and ST perf than current ARL, but it would have been a better fit for the lower-performance, higher-efficiency thing ARL/LNL have going on. So, what am I missing? This would be a fantastic desktop CPU, topping the efficiency, cost per frame (with a 12/16C version, since it's cheaper to make), MT and laptop (again, efficient and a lot more cores/threads) charts. That's more wins than the entirety of Intel has right now. At the very least this is a fantastic fit for laptops, where a 32C CPU would be able to reach almost desktop levels of performance in both MT and ST.
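To make the arithmetic behind that hypothetical explicit, here is a rough sketch. It models single-thread performance as IPC times clock and sums core area only. The 5.1 GHz reference clock for the 13600K, the 6 mm² figure for Lion Cove ("almost double" of 3 mm²), the 8P+16E layout used for comparison, and all function names are assumptions for illustration, not measurements.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Every number is a placeholder taken from the post or assumed; none is measured.

def relative_st_perf(ipc_ratio, clock_ghz, ref_clock_ghz):
    """Single-thread perf relative to a reference core, modelled as IPC x clock."""
    return ipc_ratio * clock_ghz / ref_clock_ghz

def total_core_area_mm2(core_counts, area_per_core_mm2):
    """Sum core area over core types; ignores cache, fabric, and uncore."""
    return sum(n * area_per_core_mm2[name] for name, n in core_counts.items())

SKYMONT_IPC_VS_RAPTOR_COVE = 1.02  # claim (c) in the post
REF_CLOCK_GHZ = 5.1                # assumed 13600K P-core boost clock
# Upper bounds read off claim (d): "<3 mm^2" and "almost double that".
AREA_MM2 = {"skymont": 3.0, "lion_cove": 6.0}

if __name__ == "__main__":
    for clock in (4.8, 5.0):
        rel = relative_st_perf(SKYMONT_IPC_VS_RAPTOR_COVE, clock, REF_CLOCK_GHZ)
        print(f"Skymont @ {clock} GHz: {rel:.2f}x of the reference ST perf")

    all_skymont = total_core_area_mm2({"skymont": 32}, AREA_MM2)
    hybrid = total_core_area_mm2({"lion_cove": 8, "skymont": 16}, AREA_MM2)
    print(f"32x Skymont core area: {all_skymont:.0f} mm^2")
    print(f"8P + 16E core area:    {hybrid:.0f} mm^2")
```

This only checks whether the quoted ratios are self-consistent. It says nothing about cache, ring/fabric latency, or how IPC holds up under real workloads, which is where the estimate is most likely to break down.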

 

To anyone who says Skymont is not that efficient at higher clocks, just look at the SD865 vs SD870 perf-to-power curve: both use the same silicon, just tuned differently. Currently Skymont is tuned for efficiency, but if it were the only core, Intel could improve its high-end efficiency with more optimizations to the node and/or the architecture itself, plus better tuning.

 

To be clear, I know I am wrong, but why am I wrong? We know this works because Sierra Forest exists and Clearwater Forest will exist.

https://linustechtips.com/topic/1588543-why-does-lion-cove-exist/

I think it's because Intel's P-cores and E-cores come from different tech trees. P-cores originally descend from the Intel Core series, while E-cores build on the Atom series. They have different architectures, which leads to problems. For example, on early 12th-gen parts the P-cores had hidden AVX-512 support and the E-cores didn't have AVX-512 at all, so Intel first hid it via microcode and subsequently removed the AVX-512 instruction set entirely. You can enable it via a modded BIOS on 12th-gen processors with a circular Intel logo.

Also, I think it has something to do with the shared L2 and decode/launch microarchitecture issues, but I don't really have data on that.


31 minutes ago, Penpilot said:

to anyone who says skymont is not that efficient at higher clocks, just look at the sd865 vs sd870 perf to power curve, both have the same silicon just tuned differently, currently skymont is tuned for efficiency but if it was the only core intel could improve its high-end efficiency with more optimizations to the node and/or the architecture itself.

Many good questions. I'll pick this bit for now because it is easier. Look at AMD's full-size vs compact (c) cores for a close parallel to what you describe. The process of tuning smaller cores for efficiency prevents them from clocking up. To make them clock up, they get big again and lose efficiency. Make your choice which you optimise for.

 

IPC is also a bit of an idealised measure. I think if you were to make a CPU from a lot of Skymont cores, as you load them up with consumer style workloads they'll fall off much faster from the ideal peak. The forest based CPUs target use cases not affected so much by that.

 

31 minutes ago, Penpilot said:

b) barely holds the ST and IPC crown while drawing near 2x the power (vs zen5)

Do you know of anyone doing perf vs power curves for them? Is your statement for a particular operating point?

 

Edit: in general, where you made claims, please can you link to a reference demonstrating those claims so we at least are looking at the same info.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, MSI Ventus 3x OC RTX 5070 Ti, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 4070 FE, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


8 minutes ago, porina said:

Do you know of anyone doing perf vs power curves for them? Is your statement for a particular operating point?

I'm just talking about the max ST perf they can achieve, and the power they consume in that state.

 

8 minutes ago, porina said:

as you load them up with consumer style workloads they'll fall off much faster from the ideal peak

Has anyone tested that? Since they come in clusters of 4, I feel Intel could pull a Zen 3 and make an 8-core cluster. Even if the clusters stayed at 4 cores, only 4 clusters would have to "talk" in gaming scenarios for a total of 16 threads, which doesn't sound that bad.

 

1 minute ago, Tridefender said:

Basically they are not, as i said, ecores comes from atom series, and P cores comes from intel cores

I was referring to the SD865 vs SD870 in that particular line.


3 minutes ago, Tridefender said:

They share l2 cache, thats what makes them that bad

Well, if they were the only cores, the L2 cache could have been made larger and several other optimizations could have been made, like the reverse of Zen 4 to Zen 4c, but for Skymont.


2 minutes ago, Penpilot said:

well, if they were the only cores, the l2 cache could have been made larger and several other optimizations could have been made. like the reverse of zen4 to zen4c for skymont.

Then there are still launch and decode differences. You can try to do more fact-checking on that.


23 minutes ago, Tridefender said:

I didnt find much information on that, but they should be continued on such basis, which means e cores will always have less decoder width

Just google it: Skymont has a 9-wide decode (three 3-wide clusters), while Golden Cove is 6-wide and Zen 3 is 4-wide. I don't know exactly what is going on with this, but that's the data. I don't even know why this is relevant. If anything, won't fetch cycles be more important?

 


1 hour ago, Penpilot said:

I'm just talking about the max ST perf they can achieve, and the power they consume in that state

That's a product-level choice of operating point and doesn't tell you about the underlying efficiency.
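As a toy illustration of why a single peak operating point says little about the curve, here is a minimal sketch using the textbook dynamic-power relation P ≈ C·V²·f. The voltage/frequency curve, its constants, and the function names are entirely hypothetical and not taken from any real core.

```python
def voltage_at(freq_ghz, v_min=0.70, f_knee_ghz=3.0, volts_per_ghz=0.25):
    """Hypothetical V/f curve: flat up to a knee, then rising linearly."""
    return v_min + max(0.0, freq_ghz - f_knee_ghz) * volts_per_ghz

def dynamic_power(freq_ghz, capacitance=1.0):
    """Dynamic power ~ C * V^2 * f, in arbitrary units (leakage ignored)."""
    v = voltage_at(freq_ghz)
    return capacitance * v * v * freq_ghz

def perf_per_watt(freq_ghz, ipc=1.0):
    """Performance modelled as ipc * f, divided by dynamic power."""
    return (ipc * freq_ghz) / dynamic_power(freq_ghz)

if __name__ == "__main__":
    # The same core looks very different depending on where it is run.
    for f in (2.0, 3.0, 4.0, 5.0):
        print(f"{f:.1f} GHz: power={dynamic_power(f):.2f}, "
              f"perf/W={perf_per_watt(f):.2f}")
```

In this model the same core is far less efficient at the top of its clock range than near the knee, so comparing two designs only at their shipped peak mostly compares how hard each vendor chose to push them.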

 

The following are perf/W curves for Lunar Lake. We need the same for Arrow Lake vs desktop Zen 5, since both differ from their mobile versions, but the same channel didn't repeat this in their Arrow Lake coverage.

[Image: perf/W curves for Lunar Lake]

Source: https://youtu.be/ymoiWv9BF7Q?si=_2Srhf17pg7O1kBM&t=517

 


31 minutes ago, porina said:

That's a product level decision operating point and doesn't tell you about the underlying efficiency. 

I wasn't talking about mobile, but it is interesting to see that LNL actually lives up to its claims. Geekerwan's tests are usually correct. I stopped watching when I stopped understanding what he was saying.


12 minutes ago, Penpilot said:

i wasn't talking about mobile

That was just an example. I did say we ideally need similar for Arrow Lake.

 

12 minutes ago, Penpilot said:

geekerwans tests are usually correct. stopped watching when i stopped understanding what he was saying

He's still new to me but that sort of testing got my attention. If I'm not mistaken there was an English channel in the past, but it seems long inactive. I mainly pick out the charts and if needed Google Lens translate things on it.


2 hours ago, Penpilot said:

inefficient

We don't truly know. It might even be TSMC's node not behaving well with their architecture. In the end, it's definitely more efficient, especially Lion Cove in Lunar Lake. It seems that Intel maybe didn't try too hard on Lion Cove, and didn't overspend on something they knew wouldn't perform (source: Moore's Law is Dead, talking to an Intel engineer).

 

2 hours ago, Penpilot said:

barely holds the ST and IPC crown while drawing near 2x the power (vs zen5)

Why compare with AMD? Intel at least beat themselves (kind of). IPC is better, but I think that mostly shows in heavily threaded loads rather than lightly threaded ones, where things like the wider decoder and the better single-threaded performance from dropping HT wouldn't help, and the lower clocks will definitely set them back there. But the increased cache and better branch prediction help in gaming.

 

2 hours ago, Penpilot said:

c) skymont has 1.02x the ipc of raptor cove and can reach decent clock speeds 

2 hours ago, Penpilot said:

would have ~13600k ST perf (assuming intel clocks it to 4.8-5)

Sources? Mathematically that would make sense, but I really doubt that Skymont can match Raptor Cove, and the E-cores still clock much lower. If it is actually correct, clock speed must be what's holding them back. Either way, that is the reason Intel had to put in P-cores, or else it would just not compete in single-threaded at all (it would fall terribly behind last gen).

 

edit - Watch this - 

edit 2 - And this as well - 

 

PLEASE MARK COMMENTS AS SOLUTION IF SATISFIED!!

bigger number better, makes me look cooler.

