
Intel Lunar Lake announced - low power mobile and a hint to the future

porina

yxcdE9aKEE4ytevGs3EAnW-970-80.jpg.webp

 

Summary

Intel's Lunar Lake has been announced. It is targeted at the premium low-power mobile segment, so it will be Intel's offering against the also recently announced Ryzen AI and Snapdragon X Elite. It brings new P and E core architectures, uses two active tiles made on TSMC's N3B and N6 process nodes, and comes with on-package DRAM.

 

Intel_Tech%20Tour%20TW_AI%20on%20Client%20PCs-51.png

Let's get AI out of the way. Lunar Lake's NPU claims 48 TOPS, slightly behind Ryzen AI's 50 TOPS but ahead of Snapdragon X Elite's 45 TOPS. While AMD will lead the numbers game here, they're all in the same ballpark. Adding the CPU and iGPU, Intel claims a combined total of 120 TOPS available on package.
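To put the quoted figures side by side, here is a small arithmetic sketch. It only uses the vendor claims given above (they are marketing numbers, not measurements), and the variable names are made up for illustration.

```python
# Claimed peak NPU TOPS quoted in the post (vendor claims, not measurements).
npu_tops = {"Lunar Lake": 48, "Ryzen AI": 50, "Snapdragon X Elite": 45}

# Intel's claimed CPU + iGPU + NPU total for Lunar Lake.
lunar_lake_platform_tops = 120

# What is left for CPU + iGPU once the NPU share is subtracted.
cpu_plus_igpu_tops = lunar_lake_platform_tops - npu_tops["Lunar Lake"]

# Gap between the highest and lowest NPU claim, relative to the highest.
spread = (max(npu_tops.values()) - min(npu_tops.values())) / max(npu_tops.values())

print(cpu_plus_igpu_tops)  # 72
print(f"{spread:.0%}")     # 10%
```

So the three NPU claims sit within about 10% of each other, and more than half of Intel's 120 TOPS headline comes from the CPU and iGPU rather than the NPU.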

 

twyPx5WAxAZR4YCQVUiiWb.jpg

The P-cores get updated, with a claimed 14% IPC uplift over Redwood Cove as used in Meteor Lake. It is confirmed that, in Lunar Lake at least, hyperthreading has been removed. This enables better efficiency in the P-cores, and the E-cores will make up for it in workloads that scale with thread count.

 

Intel_Tech%20Tour%20TW_Next%20Gen%20E-core%20The%20Skymont%20Architecture-17.png

E-cores also get a significant uplift, with claimed IPC averaging 2% ahead of Raptor Cove, the P-core used in Raptor Lake. Clocks are still likely to be lower, so don't expect these to perform like for like. This could bring the perf delta between P and E cores much closer than before, much like AMD's full-size and compact "c" cores.

 

Intel_Tech%20Tour%20TW_Xe2%20and%20Lunar%20Lakes%20GPU-59.png

Graphics moves to the 2nd generation, Xe2, with a claimed 1.5x performance compared to Meteor Lake.

 

image.thumb.png.6378948906074c87a36ef7f7347036e0.png

This shows the claimed improvements on their Xe2 graphics compared to Xe1, normalised for configuration/clocks. We'll have to wait for testing to see how this impacts gaming overall.

 

There are also other updates around connectivity which I won't go into here. More info is in the sources for those interested.

 

Quotes

Quote

Intel pulled the covers back on its Lunar Lake architecture during its Intel Tech Tour 2024, delivering deep dive architectural details in Taipei, Taiwan in advance of the company’s Computex 2024 keynote as its newest chips race to a Q3 launch. Intel’s Lunar Lake will have significant improvements in every facet of its design. Lunar Lake will primarily target mobile designs, powering some of the best laptops, though many of the fundamental changes will likely carry over to Arrow Lake and will be in some of the best CPUs for gaming.

 

My thoughts

I know most on this forum won't be interested in low power mobile offerings, but this does give a hint at what we could expect with Arrow Lake later this year. The updated P cores should help Intel fight against Zen 5. While the new E cores look very promising, it isn't a given they'll be used on Arrow Lake, which may get stuck with the older ones. Of interest is that this will be all TSMC-made silicon. A reason given was that it was the best available at the time Intel started the design. Intel's fab improvement plans are very aggressive, and going TSMC could have been a de-risking move. So we still have unknowns about Intel's attempts to move to process leadership and will have to wait for the expected 20A Arrow Lake to see how that is going.

 

Sources

https://www.anandtech.com/show/21425/intel-lunar-lake-architecture-deep-dive-lion-cove-xe2-and-npu4

https://www.tomshardware.com/pc-components/cpus/intel-unwraps-lunar-lake-architecture-up-to-68-ipc-gain-for-e-cores-16-ipc-gain-for-p-cores

 

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


I find it weird they didn't talk more about Battlemage or about 15th gen. Even if they keep Battlemage more under wraps (which they shouldn't, if I'm honest, because both AMD and Nvidia are dropping bombshells on the GPU market and Intel is in the midst of it all), 15th gen IMO is something they must lean on more, especially since AM4 got new CPUs and AM5 support is extended to 2027 and beyond. Intel can capitalize on it in the sense that we do not know if the Ryzen 10000 series (if it will be named that) will be on AM5, while 15th gen will probably share its socket with 16th and 17th gen, at least if we go by 12th to 14th gen.


2 minutes ago, Millios said:

I find it weird they didn't talk more about Battlemage or about 15th gen.

There is no 15th gen as that naming system is over. It'll be Core Ultra 2nd gen (1st gen being Meteor Lake). We've long known it is Arrow Lake and expected by the end of the year. In a similar parallel to people wondering where Nvidia 50 series are, the answer is a simple "too early". You don't want to talk too much about details until release is closer. Look for something another quarter out perhaps.

 

As for Battlemage dGPUs, who knows what's going on at Intel right now. It does feel like they are doing a LOT in many areas and this isn't a priority for them to push right now. As we have seen before, when things get tight at Intel, mobile gets first dibs over desktop offerings.


1 minute ago, porina said:

There is no 15th gen as that naming system is over. It'll be Core Ultra 2nd gen (1st gen being Meteor Lake). We've long known it is Arrow Lake and expected by the end of the year. In a similar parallel to people wondering where Nvidia 50 series are, the answer is a simple "too early". You don't want to talk too much about details until release is closer. Look for something another quarter out perhaps.

Oh, I didn't get the new naming scheme, so thanks for that. I do agree that some details should be left out, and AMD gave a lot because we'll get the 9000 series next month. I don't know, it just feels like a little too little. Some might think otherwise, but personally I feel like they should give us some bigger expectations than that.

 

4 minutes ago, porina said:

As for Battlemage dGPUs, who knows what's going on at Intel right now. It does feel like they are doing a LOT in many areas and this isn't a priority for them to push right now. As we have seen before, when things get tight at Intel, mobile gets first dibs over desktop offerings.

I mean, with the speed their drivers are getting better and how much more of a promising option they are becoming for the future, I think they should at least announce something about it, or say that there will be new and improved software for Arc, or something like the smart memory AMD has. Yes, mobile is their first priority, but to be fair not many will pay it much mind. It's like Nvidia's AI: yes, many care, but more want to know about the gaming end of stuff, and there... I don't know, I might be wrong and others might disagree, but it feels lackluster at best to me on that end.


2 hours ago, porina said:

The P-cores get updated, with a claimed 14% IPC uplift over Redwood Cove as used in Meteor Lake. It is confirmed that, in Lunar Lake at least, hyperthreading has been removed. This enables better efficiency in the P-cores, and the E-cores will make up for it in workloads that scale with thread count.

That's a very interesting choice. Also, isn't it more energy efficient to load up a single core with two hardware threads than two actual cores? If the core isn't completely utilized by one thread, then adding a bit more on is surely more power efficient than running this extra thread on a whole other core.

 

I guess this is more in the realms of scheduling and overall resource utilization, processes being allocated to P core threads when they could run sufficiently on E cores but don't get put there because there is a high number of read P core threads 🤷‍♂️


36 minutes ago, leadeater said:

I guess this is more in the realms of scheduling and overall resource utilization, processes being allocated to P core threads when they could run sufficiently on E cores but don't get put there because there is a high number of read P core threads 🤷‍♂️

In one of the source links it described that on Meteor Lake and older, the typical scheduling order was to fill thread one per P core until exhausted. Then fill E cores. Only when they are also exhausted, start filling the 2nd thread on each P core. So unless you're maxing out all the threads, the 2nd thread on P cores largely went unused.

 

With Lunar Lake they want to shift to filling the E cores first. If higher performance is determined to be necessary then promote to P cores. It's efficiency first, not performance first.
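The two fill orders described in this post can be sketched as a toy model. This is only an illustration of the ordering as worded here, not Intel Thread Director or the Windows scheduler; the function names and thread labels are hypothetical.

```python
def meteor_lake_style_order(p_cores, e_cores):
    """Toy fill order as described above for Meteor Lake and older:
    first thread of each P core, then E cores, then the P cores' second threads."""
    p_first = [f"P{i}.t0" for i in range(p_cores)]
    e_threads = [f"E{i}" for i in range(e_cores)]
    p_second = [f"P{i}.t1" for i in range(p_cores)]
    return p_first + e_threads + p_second


def lunar_lake_style_order(p_cores, e_cores):
    """Toy fill order as described above for Lunar Lake:
    E cores first, then P cores (one thread each, since HT is removed)."""
    e_threads = [f"E{i}" for i in range(e_cores)]
    p_threads = [f"P{i}" for i in range(p_cores)]
    return e_threads + p_threads


# Example with a 4P + 4E part: the n-th runnable thread lands on the n-th slot.
print(meteor_lake_style_order(4, 4))
print(lunar_lake_style_order(4, 4))
```

In the first model the second P-core threads sit at the very end of the list, which is why they would rarely be used unless everything else is already busy.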

 

Keep in mind this might be optimised for mobile uses. If you're doing big compute where all threads will be loaded, that's a different consideration. It was mentioned that server versions of these cores could keep HT.


24 minutes ago, porina said:

Keep in mind this might be optimised for mobile uses. If you're doing big compute where all threads will be loaded, that's a different consideration. It was mentioned that server versions of these cores could keep HT.

Yep, but that's where 2 threads on one core should be better for power compared to servers. It actually doesn't make logical sense to power up an entire extra core to service two workloads that could be done on one. It does depend on whether it could have been done on one, of course, but I don't see most general workloads on mobile platforms as being highly demanding; see the change to E-core-first allocation with P-core spillover for these mobile parts.

 

24 minutes ago, porina said:

In one of the source links it described that on Meteor Lake and older, the typical scheduling order was to fill thread one per P core until exhausted. Then fill E cores. Only when they are also exhausted, start filling the 2nd thread on each P core. So unless you're maxing out all the threads, the 2nd thread on P cores largely went unused.

 

That is true it just doesn't make sense in a 2 vs 1 active core comparison in regards to power alone.

 

24 minutes ago, porina said:

With Lunar Lake they want to shift to filling the E cores first. If higher performance is determined to be necessary then promote to P cores. It's efficiency first, not performance first.

That is already how it works. Lunar Lake is different, it's E core only until E core is exhausted, there is no explicit 'performance' moving up to P cores like there was before.

 

Intel_Tech%20Tour%20TW_Lunar%20Lake%20Po

 

Meteor Lake does the higher demand move to P Core, Lunar Lake does not. I expect in reality it still will but to a much lesser degree than before. From what Intel is saying here it is a 'Fill then Spill' allocation strategy with I assume some smarts around what it will move to P cores when it needs to Spill rather than Fill.

 

But that is where it doesn't make a whole lot of sense: why not Spill to P core Thread 2 rather than P core 2? Intel is probably right, they would have tested it to know, but it still doesn't actually make sense. It's like saying frying two steaks is more energy efficient in two pans than just using one pan that is big enough; both could possibly be true, but 2 steaks in 1 pan should be more 'efficient'.


6 minutes ago, leadeater said:

But that is where it doesn't make a whole lot of sense: why not Spill to P core Thread 2 rather than P core 2? Intel is probably right, they would have tested it to know, but it still doesn't actually make sense.

I've said it before, but IMO the HT/SMT 2nd thread per core is often over-valued. It just doesn't add that much more throughput outside of some outliers like Cinebench, and you're taking away from ST perf of the existing thread. Not a problem if it is part of the same MT work, may be a problem if not. The optimisation target isn't instantaneous power usage, but the energy used to complete a task. A dedicated P core could perhaps do the work with less energy than running it as a 2nd thread on an already loaded P core.
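The "energy to complete a task" point can be made concrete with energy = power x time. The sketch below compares three ways of running two equal tasks. Every wattage and throughput figure is a made-up assumption chosen for illustration; none of them come from Intel or from the sources.

```python
def energy_joules(power_watts, work_units, throughput_per_second):
    """Energy to finish a fixed amount of work: power multiplied by time taken."""
    seconds = work_units / throughput_per_second
    return power_watts * seconds


WORK = 2.0  # two equal tasks, one unit of work each

# Hypothetical: one P core running both tasks via SMT.
# Assumed 25% more total throughput than one thread, at 15% more power.
smt_shared = energy_joules(power_watts=11.5, work_units=WORK, throughput_per_second=1.25)

# Hypothetical: two cores at the same clocks, one task each.
two_cores_full = energy_joules(power_watts=20.0, work_units=WORK, throughput_per_second=2.0)

# Hypothetical: two cores clocked lower, at a more efficient operating point.
two_cores_slow = energy_joules(power_watts=12.0, work_units=WORK, throughput_per_second=1.6)

for name, joules in [("SMT on one core", smt_shared),
                     ("two cores, full clocks", two_cores_full),
                     ("two cores, lower clocks", two_cores_slow)]:
    print(f"{name}: {joules:.1f} J")
```

With these assumed numbers, SMT beats two cores at full clocks, but two cores at a lower operating point beat both. Which case applies on real hardware depends entirely on the actual power and throughput curves, which only testing can show.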


17 minutes ago, porina said:

I've said it before but IMO HT/SMT 2nd thread per core is often over-valued

I agree performance wise sure, but if you just need something to run then core vs thread doesn't often make a lot of difference. A lot of things don't need "performance", they just need a place to run. Very few workloads actually stress all the cache, all the execution units etc, not even video editing in Premiere would be doing that.

 

17 minutes ago, porina said:

It just doesn't add that much more throughput outside of some outliers like Cinebench, and you're taking away from ST perf of the existing thread. Not a problem if it is part of the same MT work, may be a problem if not.

The OS and Thread Director can know when to not do that anyway. Windows already groups process threads/handles to physical hardware threads where it thinks it makes most sense. In fact I'm pretty sure the Windows default is to load primary threads across cores first before secondary (HT) threads, something I'm sure Thread Director can change or give OS guidance on.

 

https://devblogs.microsoft.com/oldnewthing/20230620-00/?p=108358

https://devblogs.microsoft.com/oldnewthing/20040913-00/?p=37883

 

Is Intel just aligning with what Windows does already anyway? 🤷‍♂️

More active threads = more power, could it actually just be that simple?


1 hour ago, leadeater said:

I agree performance wise sure, but if you just need something to run then core vs thread doesn't often make a lot of difference. A lot of things don't need "performance", they just need a place to run. Very few workloads actually stress all the cache, all the execution units etc, not even video editing in Premiere would be doing that.

 

The OS and Thread Director can know when to not do that anyway. Windows already groups process threads/handles to physical hardware threads where it thinks it makes most sense. In fact I'm pretty sure the Windows default is to load primary threads across cores first before secondary (HT) threads, something I'm sure Thread Director can change or give OS guidance on.

 

https://devblogs.microsoft.com/oldnewthing/20230620-00/?p=108358

https://devblogs.microsoft.com/oldnewthing/20040913-00/?p=37883

 

Is Intel just aligning with what Windows does already anyway? 🤷‍♂️

More active threads = more power, could it actually just be that simple?

It probably is, tbh.

 

But anyway... 14% is an interesting IPC uplift. It doesn't look very good for Intel at first glance competing with Zen 5, since gen-on-gen performance uplift gets smaller as the power goes higher, but we'll see I suppose. It might be closer to competitive in server use, where more cores sit near optimal speed/power levels.

LINK-> Kurald Galain:  The Night Eternal 

Top 5820k, 980ti SLI Build in the World*

CPU: i7-5820k // GPU: SLI MSI 980ti Gaming 6G // Cooling: Full Custom WC //  Mobo: ASUS X99 Sabertooth // Ram: 32GB Crucial Ballistic Sport // Boot SSD: Samsung 850 EVO 500GB

Mass SSD: Crucial M500 960GB  // PSU: EVGA Supernova 850G2 // Case: Fractal Design Define S Windowed // OS: Windows 10 // Mouse: Razer Naga Chroma // Keyboard: Corsair k70 Cherry MX Reds

Headset: Senn RS185 // Monitor: ASUS PG348Q // Devices: Note 10+ - Surface Book 2 15"

LINK-> Ainulindale: Music of the Ainur 

Prosumer DYI FreeNAS

CPU: Xeon E3-1231v3  // Cooling: Noctua L9x65 //  Mobo: AsRock E3C224D2I // Ram: 16GB Kingston ECC DDR3-1333

HDDs: 4x HGST Deskstar NAS 3TB  // PSU: EVGA 650GQ // Case: Fractal Design Node 304 // OS: FreeNAS

 

 

 


I have a question. Is this supposed to compete with AMD's Zen5 mobile CPUs? Because clearly, those weak 4 P cores ain't going to compete at all. And the highest model tops out at 4P + 4E config?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


1 hour ago, Gat Pelsinger said:

I have a question. Is this supposed to compete with AMD's Zen5 mobile CPUs? Because clearly, those weak 4 P cores ain't going to compete at all. And the highest model tops out at 4P + 4E config?

We don't have specific Lunar Lake models listed yet so we can't tell exactly what the scope of the product is. It is meant to give great performance at low power. It may not necessarily be targeted to go against the higher end AMD mobile offerings. Higher performance at higher power will come with Arrow Lake.

 

Edit: in case it isn't clear, Intel is a bit further out with Lunar Lake products than AMD is with Strix Point. While we might start to see Strix Point offerings next month, Intel's previous guidance was Q3, so it could be September.


5 hours ago, leadeater said:

if you just need something to run then core vs thread doesn't often make a lot of difference. A lot of things don't need "performance", they just need a place to run

Those things get grouped to the same core, regardless of whether it has SMT or not.


3 minutes ago, LAwLz said:

Those things get grouped to the same core, regardless of whether it has SMT or not.

huh same core regardless? What do you mean because without SMT there is only one thread so you can't put both on one core at the same time, it would go to another thread aka core. Windows will, or should, only be telling a process to wait if there are no CPU threads available. 


2 hours ago, porina said:

We don't have specific Lunar Lake models listed yet

4+4, 5.1 GHz + 3.7 GHz, 30 W.

Workstation:  14700nonK || Asus Z790 ProArt Creator || MSI Gaming Trio 4090 Shunt || Crucial Pro Overclocking 32GB @ 5600 || Corsair AX1600i@240V || whole-house loop.

LANRig/GuestGamingBox: 13700K @ Stock || MSI Z690 DDR4 || ASUS TUF 3090 650W shunt || Corsair SF600 || CPU+GPU watercooled 280 rad pull only || whole-house loop.

Server Router (Untangle): 13600k @ Stock || ASRock Z690 ITX || All 10Gbe || 2x8GB 3200 || PicoPSU 150W 24pin + AX1200i on CPU|| whole-house loop

Server Compute/Storage: 10850K @ 5.1Ghz || Gigabyte Z490 Ultra || EVGA FTW3 3090 1000W || LSI 9280i-24 port || 4TB Samsung 860 Evo, 5x10TB Seagate Enterprise Raid 6, 4x8TB Seagate Archive Backup ||  whole-house loop.

Laptop: HP Elitebook 840 G8 (Intel 1185G7) + 3060 RTX Thunderbolt Dock, Razer Blade Stealth 13" 2017 (Intel 8550U)


1 hour ago, leadeater said:

huh same core regardless? What do you mean because without SMT there is only one thread so you can't put both on one core at the same time, it would go to another thread aka core. Windows will, or should, only be telling a process to wait if there are no CPU threads available. 

You can schedule multiple threads to a single core, even if that core doesn't support SMT. It will just do time-slicing.

The CPU in collaboration with the OS will quite often decide to park cores and shift the load over to fewer cores (and thus do time-slicing) because it saves power.

 

When a core should or shouldn't be parked is determined by several parameters, but in this case it is the CPHeadroom parameter that is the most important. It is the parameter that determines if a core should be activated (unparked) and start executing tasks. It is determined by measuring the load on the CPU core with the least amount of load.

 

Windows (in collaboration with CPUs post Skylake) will try to avoid having multiple cores active and will prefer scheduling multiple threads to the same few cores if it determines that to be optimal for power.
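The headroom rule as worded in this post can be sketched as a single comparison. This is a deliberately simplified toy, not the actual Windows core parking engine: the function name, the percentage scale and the threshold of 50 are all assumptions for illustration.

```python
def should_unpark_another_core(active_core_loads, headroom_threshold):
    """Toy version of the rule described above.

    active_core_loads: utilization (0-100) of each currently unparked core.
    headroom_threshold: hypothetical stand-in for a CPHeadroom-style setting.
    Returns True when even the least loaded active core is above the threshold,
    meaning there is no headroom left on the cores that are already awake.
    """
    if not active_core_loads:
        return True  # nothing is awake, so something has to be unparked
    return min(active_core_loads) > headroom_threshold


# Least loaded active core is at 65%, above the assumed threshold of 50.
print(should_unpark_another_core([80, 65], headroom_threshold=50))  # True

# One active core is only at 20%, so new work can be time-sliced onto it instead.
print(should_unpark_another_core([80, 20], headroom_threshold=50))  # False
```

A lower threshold makes the toy unpark cores sooner, a higher one packs more threads onto the cores that are already awake, which is the trade-off between responsiveness and power being argued about here.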


41 minutes ago, LAwLz said:

You can schedule multiple threads to a single core, even if that core doesn't support SMT. It will just do time-slicing.

Well yes you can but that's not just going to happen, it really depends on a lot of things but if there is an idle thread and process is asking for CPU time then it's going to be given that unless there is a better reason to make it wait on an already utilized thread. That would be something like a thread as part of the same process and it's wanting data in the same L2 or L3 slice, something like that.

 

41 minutes ago, LAwLz said:

When a core should or shouldn't be parked is determined by several parameters, but in this case it is the CPHeadroom parameter that is the most important. It is the parameter that determines if a core should be activated (unparked) and start executing tasks. It is determined by measuring the load on the CPU core with the least amount of load.

 

Windows (in collaboration with CPUs post Skylake) will try to avoid having multiple cores active and will prefer scheduling multiple threads to the same few cores if it determines that to be optimal for power.

Only if the power plan you have chosen both enables this and sets it aggressively enough that it will actually happen, as opposed to quite minor increases in core loading causing utilization of another core/thread.

 

It's not correct to just say what you did because you can't know or guarantee that will actually happen, so it's not 'regardless' when there is an 'it depends'. In the most generic terms any OS will schedule a process asking for CPU time to a free CPU thread unless something makes that not happen, something you have pointed out which is quite interesting.

 

41 minutes ago, LAwLz said:

Windows (in collaboration with CPUs post Skylake) will try to avoid having multiple cores active and will prefer scheduling multiple threads to the same few cores if it determines that to be optimal for power.

Where do you get that information from? That's not anything I have ever seen on a desktop platform. That might be mobile/laptop platforms, and again only based on the actual power plan chosen. The laptops I've used that were Intel based didn't do that, probably because I always go in and choose the high performance power plan.

 

Edit:

Oh, also I'm not sure how this changes my pondering. If Windows has parked other P cores and something needs CPU time while the E cores are busy or whatever else, then an SMT thread on an already active core would use less power than un-parking another P core, if that were to happen, and be (potentially) more responsive than making it wait for CPU time. Either way Intel is most likely right here; it just seems odd to have disabled SMT and say it's more power efficient. I guess it's mostly a difference between micro scale analysis and macro scale.


Do you think with the Xe2 launch Xe1 GPUs are going to fall more in price?
Or do you think Intel still wants to recoup as much cost as possible (and keep the price)?

Since they are already selling cards at a loss.


On 6/4/2024 at 11:55 PM, leadeater said:

Well yes you can but that's not just going to happen

Time slicing and context switching happen thousands upon thousands of times whenever your PC is running.

You can check this by opening up process monitor and checking how many times a given process on your PC has done context switching. It will be hundreds or thousands of times for each process.
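For anyone who would rather read the counters from a script, here is a minimal sketch. Note the assumption: it uses Python's standard `resource` module, which only exists on Unix-like systems, so it is an analogue of the Windows check described above rather than the same tool.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process.

    Unix-only: relies on getrusage(), which is not available on Windows.
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


before = context_switches()

# Sleeping gives up the CPU, which normally counts as a voluntary switch.
for _ in range(50):
    time.sleep(0.001)

after = context_switches()

print("voluntary switches during the loop:  ", after[0] - before[0])
print("involuntary switches during the loop:", after[1] - before[1])
```

The exact counts vary by OS and by what else the machine is doing, so no expected output is shown.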

 

Since loading and executing multiple threads on the same core happens so often during normal operations, and doing it could save power, I don't see why they wouldn't take advantage of it. 

 

On 6/4/2024 at 11:55 PM, leadeater said:

if there is an idle thread and process is asking for CPU time then it's going to be given that unless there is a better reason to make it wait on an already utilized thread.

Yes, and that reason might also be "because it's less power efficient to fire up another core". 

 

 

On 6/4/2024 at 11:55 PM, leadeater said:

Only if the power plan you have chosen both enables this and sets it aggressively enough that it will actually happen, as opposed to quite minor increases in core loading causing utilization of another core/thread.

Well if we are talking about power efficiency then surely we should also assume that sane power efficiency settings are used. Most if not all power plans in Windows today will do core parking to some extent (again, why is Microsoft's documentation about this so bad?). How aggressive it is with that, and how long it waits before firing up another core will depend on the specific power plan and even the CPU model itself (since with Intel Speed Shift the CPU also takes part in making that decision). 

 

On 6/4/2024 at 11:55 PM, leadeater said:

Where do you get that information from? That's not anything I have ever seen on a desktop platform. That might be mobile/laptop platforms, and again only based on the actual power plan chosen. The laptops I've used that were Intel based didn't do that, probably because I always go in and choose the high performance power plan.

I can't really tell you since I don't remember.

Part of it has been reading about various schedulers and how they work and context switching.

Part of it has been researching and troubleshooting Windows Server (Windows Server has a longer default quantum and thus does less context switching than Windows Pro).

Part of it has been reading about Speed Shift.

 

 

It's just info I have obtained over many years and articles. It is true though, since you can look at, for example, the CPHeadroom variable and see that's exactly what it does. Its entire purpose is to govern at what load the scheduler will decide to unpark another core.

 

 

I will meet Intel at one of their research centers later this year. I am not quite sure which people I will meet, but maybe I can ask them some more info about this if you want. 

 

 

On 6/4/2024 at 11:55 PM, leadeater said:

The laptops I've used that were Intel based didn't do that, probably because I always go in and choose the high performance power plan.

Both my Intel desktop and Intel laptop will park cores despite having more active threads than their core counts. I am fairly sure (I might be able to check) even on high performance mode. 

I will try and find a program (or write one myself) that can spawn let's say 10 or so threads with very little load on each and see if it actually assigns each thread to its own core. The hard part will be to make sure the threads actually do something, but not enough to go over whichever threshold exists.
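A rough sketch of such a test program might look like the following. It is not the program the poster ended up using: the function names and the default timings (1 ms of work, 20 ms of sleep) are assumptions to be tuned. It also does not report which core each thread lands on, so per-core activity would have to be watched in Task Manager or a similar tool while it runs.

```python
import threading
import time


def light_worker(stop_at, busy_seconds, idle_seconds, counts, index):
    """Alternate a very short busy burst with a longer sleep until stop_at."""
    iterations = 0
    while time.monotonic() < stop_at:
        burst_end = time.monotonic() + busy_seconds
        while time.monotonic() < burst_end:
            pass  # spin briefly so the thread does a small amount of real work
        time.sleep(idle_seconds)
        iterations += 1
    counts[index] = iterations


def run_light_threads(thread_count=10, duration=10.0,
                      busy_seconds=0.001, idle_seconds=0.02):
    """Start thread_count lightly loaded threads and return their loop counts."""
    counts = [0] * thread_count
    stop_at = time.monotonic() + duration
    threads = [
        threading.Thread(target=light_worker,
                         args=(stop_at, busy_seconds, idle_seconds, counts, i))
        for i in range(thread_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counts
```

Calling `run_light_threads(thread_count=10, duration=60)` keeps ten threads at a duty cycle of roughly 5% each for a minute. One caveat: CPython threads share the interpreter lock, so the busy bursts never truly overlap. That is fine for a light-load test like this one, but a compiled language would be a better fit for anything heavier.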


7 hours ago, LAwLz said:

Time slicing and context switching happen thousands upon thousands of times whenever your PC is running.

You can check this by opening up process monitor and checking how many times a given process on your PC has done context switching. It will be hundreds or thousands of times for each process.

 

Since loading and executing multiple threads on the same core happens so often during normal operations, and doing it could save power, I don't see why they wouldn't take advantage of it. 

Yes, I am very aware of that; it's really not the point at all. Obviously sharing of CPU threads happens.

 

7 hours ago, LAwLz said:

Yes, and that reason might also be "because it's less power efficient to fire up another core". 

I feel you are missing the point here, because I'm literally asking why it would be less power efficient to have HT enabled so you can utilize that thread on an already powered up and utilized core. I'm asking why disable HT, which means you HAVE to power up another P core to utilize another P core thread rather than allocate to an existing, already un-parked P core.

 

The ponderance is why it would be MORE power efficient to NOT have HT enabled on the P cores. Using 2 threads on 1 core does not require un-parking the core, it's already un-parked.

 

Nothing about what you have posted actually goes towards my questioning of why it's more power efficient to have HT disabled, core parking makes my question even more valid.

 

7 hours ago, LAwLz said:

Both my Intel desktop and Intel laptop will park cores despite having more active threads than their core counts. I am fairly sure (I might be able to check) even on high performance mode. 

I will try and find a program (or write one myself) that can spawn let's say 10 or so threads with very little load on each and see if it actually assigns each thread to its own core. The hard part will be to make sure the threads actually do something, but not enough to go over whichever threshold exists.

Doing that is most likely going to hit the same issue I am likely having with my pondering. You're doing an isolated micro-scale assessment that doesn't factor in the total system load and processes; your processes might not end up on P cores at all, but other things might. Something you do might cause another process to be moved to a P core and that core being un-parked.

 

What I was saying was that when using the high performance power plan on my laptops I have already seen all cores with active utilization, even if the percentage is low. They might get parked, but not all too often. I would suspect much more so on a lower power plan, but for the most part my laptop is always plugged in, so I see no reason to run it on anything other than high performance.
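The light-load thread test discussed above could be sketched roughly like this. It is only a sketch: the function names, thread count, and timings are made up for illustration, and which core each thread lands on still has to be watched with an external tool such as the per-core graphs in Task Manager. Note that CPython's GIL lets only one thread run Python code at a time, so this only makes sense for very light, mostly sleeping threads.

```python
import threading
import time


def light_worker(stop_at, busy_s, idle_s, counts, idx):
    """Do a tiny burst of work, then sleep, until the deadline passes."""
    iterations = 0
    while time.monotonic() < stop_at:
        burst_end = time.monotonic() + busy_s
        x = 0
        while time.monotonic() < burst_end:
            x += 1  # trivial busy work so the thread is not purely idle
        time.sleep(idle_s)  # sleeping keeps the average load per thread low
        iterations += 1
    counts[idx] = iterations


def run_light_threads(n_threads=10, duration_s=10.0, busy_s=0.001, idle_s=0.02):
    """Spawn n_threads lightly loaded threads; return iterations per thread."""
    counts = [0] * n_threads
    stop_at = time.monotonic() + duration_s
    threads = [
        threading.Thread(
            target=light_worker, args=(stop_at, busy_s, idle_s, counts, i)
        )
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

Calling `run_light_threads(duration_s=30)` while watching per-core utilization would show whether the load spreads out or stays on a few cores. As noted, anything else running on the system can change the result.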


@LAwLz Interesting, Windows seems to be SMT aware even for process threads and parking.

 

image.thumb.png.8b12c41231174c6d4885e37b1eccd7bc.png

 

This is my Windows 11 desktop btw

image.thumb.png.d0d0f1f28728c5badb1a7495b4d88d31.png

 

At least when it comes to my desktop no cores will be parked.


44 minutes ago, leadeater said:

The question is why it would be MORE power efficient to NOT have HT enabled on the P cores. Using 2 threads on 1 core does not require un-parking the core, it's already un-parked.

Cores are not so binary. A core isn't just on or off; it can be clocked up and down its efficiency curve. I think I haven't mentioned it so far, but Intel are moving to 16.6 MHz clock steps, down from the 100 MHz steps it feels like they've had since practically forever. This allows finer control of the operating point. AMD have been on 25 MHz steps for quite a while, at least since Zen 2. I don't recall if the earlier Ryzen had 25 MHz steps also.
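To put a rough number on what finer steps buy, here is a small sketch. The target clock is made up, and I am assuming the fine step is exactly 100/6 MHz (about 16.67 MHz); `snap_up` is just an illustrative helper, not anything from Intel.

```python
import math
from fractions import Fraction


def snap_up(target_mhz, step_mhz):
    """Lowest clock on the step grid that is >= target_mhz."""
    step = Fraction(step_mhz)
    steps_needed = math.ceil(Fraction(target_mhz) / step)
    return float(steps_needed * step)


COARSE = 100             # 100 MHz steps
FINE = Fraction(100, 6)  # assumed exact value of the ~16.6 MHz step

target = 2130  # hypothetical "just enough" clock in MHz
overshoot_coarse = snap_up(target, COARSE) - target  # MHz above what was needed
overshoot_fine = snap_up(target, FINE) - target
```

With coarse steps the core has to run up to a full step faster than the load needs, and with fine steps that overshoot shrinks accordingly.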

 

So basically the question then becomes whether two cores operating at a low-power, higher-efficiency point are better than running one core with HT at a higher-power, lower-efficiency point. I'd further speculate that HT benefits more when cores are under heavy load. Under light loads, its existence may decrease efficiency. So this becomes a case of choose your optimisation point. The use case of these low power laptops is intermittent burst loads, not all cores blazing all the time.
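A toy model of that trade-off, purely as a sketch. Everything here is an assumption for illustration: dynamic power is taken to scale with the cube of clock (voltage rising linearly with frequency), each active core pays a fixed static cost, and HT is given a made-up 25% throughput uplift. It also ignores any power cost of the HT logic itself. None of these numbers come from Intel.

```python
def core_power(freq, static=0.3):
    """Toy per-core power: fixed static cost plus cubic dynamic term.

    freq is relative (1.0 = some reference clock); units are arbitrary.
    """
    return static + freq ** 3


def compare(target_throughput, ht_uplift=0.25, static=0.3):
    """Return (power of two cores without HT, power of one core with HT)
    when both options deliver the same total throughput."""
    # Option A: two cores, one thread each, splitting the work evenly.
    f_two = target_throughput / 2
    p_two = 2 * core_power(f_two, static)
    # Option B: one core running two threads; HT yields (1 + uplift) x one thread,
    # so the core must clock higher to reach the same throughput.
    f_ht = target_throughput / (1 + ht_uplift)
    p_ht = core_power(f_ht, static)
    return p_two, p_ht
```

In this model, with `static=0` the two-core option always uses less power, because the cubic term punishes the higher clock the HT core needs. With a sizeable static cost, the single HT core can come out ahead at low throughput. So the answer depends on exactly where on the curve you sit, which is the "choose your optimisation point" part.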



17 minutes ago, porina said:

So basically the question then becomes whether two cores operating at a low-power, higher-efficiency point are better than running one core with HT at a higher-power, lower-efficiency point. I'd further speculate that HT benefits more when cores are under heavy load. Under light loads, its existence may decrease efficiency. So this becomes a case of choose your optimisation point. The use case of these low power laptops is intermittent burst loads, not all cores blazing all the time.

True, that is probably quite important for Intel considering their power usage curve has historically been quite sharp. The other factor is what @LAwLz mentioned: just making a process wait if it's not latency sensitive. Part of Windows power management and core parking looks at process latency as well, and it'll un-park a core based on that, not solely on % utilization. There seem to be a lot of factors when it comes to Windows 11; Windows 10 has fewer. Windows 11 is even CCD/CCX aware for AMD and can choose not to power up any cores in one to stay more power efficient when warranted.


Can Intel make CUDA level drivers for their NPU?

 

Can Intel contribute to open source frameworks like PyTorch so that they use NPU acceleration and unified memory? I dug quite a bit into why I couldn't ROCm-accelerate Stable Diffusion on my AMD framework APU, and it's because of an arbitrary memory limit: it just refuses to try to actually allocate the VRAM in the unified memory.

 

Say what you will about Nvidia, but their CUDA drivers never let you down. The acceleration just works. I want a credible competitor to CUDA.


3 hours ago, 05032-Mendicant-Bias said:

Can Intel make CUDA level drivers for their NPU?

 

Can Intel contribute to open source frameworks like PyTorch so that they use NPU acceleration and unified memory? I dug quite a bit into why I couldn't ROCm-accelerate Stable Diffusion on my AMD framework APU, and it's because of an arbitrary memory limit: it just refuses to try to actually allocate the VRAM in the unified memory.

 

Say what you will about Nvidia, but their CUDA drivers never let you down. The acceleration just works. I want a credible competitor to CUDA.

They have "oneAPI" which I believe will use the NPU if possible. Not sure how widely supported it is though. 

oneAPI supports AMD and Nvidia products too, so it would be great if we saw widespread adoption of it.

