
Snapdragon Summit 2023 - Qualcomm Announces a Slew of New Processors - Updated with Day 2

LAwLz

Summary

Yesterday Qualcomm held their Snapdragon Summit 2023 event. For those who don't know, Snapdragon Summit is Qualcomm's annual event where they present new products that usually end up in devices the following year.

This year they had a lot of announcements and products to show.

 

The big announcements were the Snapdragon X Elite, a new SoC aimed at laptops that uses the new Oryon CPU cores, as well as the Snapdragon 8 Gen 3, which will be next year's flagship smartphone SoC.

 

 

 

 

 

The announcements

 

 

AI

Qualcomm is betting big on generative AI, saying it will be the next step in the evolution of computing.

Right now, most AI workloads run in the cloud. Qualcomm wants to move some of this computing on-device, to make it more responsive (as in fast), personal (access to data on the device), and private (data doesn't need to leave the device).

 

They showed some demos, including a 7-billion-parameter Llama 2 model (Meta's answer to ChatGPT) running on-device (they didn't say which device, but probably a laptop). They also showed Stable Diffusion running on a phone, and it could generate an image in less than 1 second (0.6 seconds to be precise). This is down from the 15 seconds it took to generate an image the last time they ran the demo (on a Snapdragon 8 Gen 2).

 

Over 16 different AI models, such as OpenAI's Whisper, Meta's Llama 2, Google's foundational AI models, and Stability AI's Stable Diffusion, have been ported over to work on Snapdragon SoCs.

 

[Image: Industry-leading support for on-device models]

 

 

 

 

Qualcomm Oryon - Qualcomm's first fully custom CPU core since 2015

Oryon is designed by the team from NUVIA, a startup founded in 2019 by engineers from Apple and Google. Qualcomm bought NUVIA in 2021.

 

Quote

The new CPU leader in mobile computing

 

Quote

Usually, in our history of developing chips, and we develop flagship chips every year, you do a lot of simulation work. Then when you actually get to your product you hope that your product is going to meet what you expected to see in the simulations. This is a different thing... It exceeded everything we expected to do in the simulation.

I think this is Cristiano trying to say that the actual hardware outperforms what the simulations predicted.

 

 

Qualcomm confirmed that this is not a one-off thing. This is just "phase 1" in their CPU architecture design. My guess is that they will come out with a server CPU in the coming year(s), as well as a phone CPU core.

This first core, however, will be aimed at laptops. Qualcomm said during the presentation that they will show an Oryon-based mobile platform in 2024.

 

If only they had used some of their AI capabilities to check the spelling of "Orion" before making all these presentations...

 

 

Some benchmarks:

 

GeekBench 6.2:

 

Single core:

Oryon - 3,227

i9-13980HX - 3,192

M2 Max - 2,841

 

 

A ~13.5% lead in single-core performance over the M2 Max.
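That lead follows from the scores above; here's a quick check of the arithmetic (just restating Qualcomm's numbers, nothing more):

```python
# Relative single-core lead of Oryon over the M2 Max in Geekbench 6.2.
oryon, m2_max = 3227, 2841
print(f"{(oryon / m2_max - 1) * 100:.1f}%")  # ~13.6%, i.e. the ~13.5% lead quoted above
```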

 

If you cut back the clock speed of the Oryon so that it matches the single-core performance of the M2 Max, the Oryon core uses 30% less power.

If you cut back the clock speed of the Oryon so that it matches the performance of the i9, it delivers the same performance at 70% less power.

 

 


 

 

The multi-core performance claims were a bit more vague:

 

Same performance at 18 watts as the i7-1360P gives at 50 watts, or twice as fast at the same power consumption.

At 30 watts it gives the same performance as the i7-13800H does at 90 watts.

"50% faster vs M2" - This sounds a bit low to me, considering it's (spoiler) a 12-core chip (/spoiler) competing against a 4+4 core chip.

 

 

 

 

 

 

 

Snapdragon X Elite - Qualcomm's laptop processor featuring Oryon cores

 

[Image: Snapdragon X Elite hero image]

 

[Image: Snapdragon X Elite summary]

 

 

Some numbers:

12 Oryon cores clocked at 3.8GHz.

It has a 2-core boost clock of up to 4.3GHz.

 

The cache is 42MB in total, although we don't know how it's distributed (how much is L1, L2 and L3).

 

 

"Adreno SD X Elite" GPU with up to 4.6 TFLOPs of compute performance. "Best in-class" according to Qualcomm. Supports DirectX 12.

Supports up to 3 external 4K HDR10 displays at 60Hz, or two 5K displays at 60Hz. The primary monitor (eDP) can be 4K 120Hz.

 

A massive Hexagon NPU with up to 46 TOPS (INT4). This is over 3 times more than the Snapdragon 8cx Gen 3 (their previous Windows chip) and a 100x increase over their chip from 2017.

NPU + CPU + GPU totals 75 TOPs.

30 tokens per second in the Llama 2 model. For comparison, Qualcomm estimates that 5-7 tokens per second is needed to output as fast as the average person reads. In other words, their NPU can output AI-generated text roughly 4-6 times faster than the average person can read.

 

An 8x 16-bit LPDDR5X memory controller that supports 8533MT/s RAM, totaling ~136GB/s of bandwidth.
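That bandwidth figure follows directly from the bus width and transfer rate; a quick sanity check of the arithmetic:

```python
# Rough sanity check of the quoted memory bandwidth figure.
channels = 8            # 8 memory channels
width_bits = 16         # each channel is 16 bits wide
transfer_rate = 8533e6  # 8533 MT/s

bytes_per_transfer = channels * width_bits / 8   # 16 bytes moved per transfer across the full bus
bandwidth = bytes_per_transfer * transfer_rate   # bytes per second

print(f"{bandwidth / 1e9:.1f} GB/s")  # ~136.5 GB/s, matching the ~136 GB/s Qualcomm quotes
```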

Supports up to 64GB of memory.

 

Manufactured on an undisclosed "4nm" node, probably TSMC N4P.

 

Can be paired with a separate modem and Wi-Fi chipset (FastConnect 7800) to support Wi-Fi 7, Bluetooth 5.4, and 5G.

 

Supports PCIe 4.0 or UFS 4.0 for connecting to storage.

PCIe 3.0 is used for connecting to other things such as the modem.

 

Support for up to 3 USB4 ports, 2 USB 3 ports (3.2 Gen2), and one USB 2 port.

 

In a first for Qualcomm, this chip supports AV1 encoding as well as decoding. This means all major GPU vendors (except Apple) now have products that can do AV1 encoding.

 

 

 

Some benchmarks from 3DMark WildLife Extreme:

2x faster GPU performance than the i7-13800H at the same power, or same performance at 74% lower power.

80% faster GPU performance than the Ryzen 9 7940HS at the same power, or same performance at 80% less power.

[Images: GPU performance and efficiency comparison charts vs the Intel and AMD parts]

 

 

Launching in devices during the middle of 2024.

This means that while it might look very favorable compared to devices on the market right now, it remains to be seen how it stacks up against the products from Apple, Intel, and AMD that will launch between now and when Snapdragon X Elite devices launch.

 

 

 

Snapdragon 8 Gen 3 - 2024's flagship smartphone chip

It seems to me like the Snapdragon 8 Gen 3 will be a pretty good upgrade to the Gen 2, which was already a great chip to begin with.

Some highlights compared to the previous generation are:

  • 30% faster CPU performance. 20% higher efficiency.
  • 25% faster GPU performance. 25% higher efficiency and 40% better Ray Tracing.
  • 98% faster NPU performance. Capable of running an LLM like the 7B Llama 2 model at 20 tokens per second. Supports up to 10B parameter models on-device. 40% higher performance per watt compared to the s8g2.
  • Supports up to 8.5Gbps LPDDR5x.

 

Qualcomm has also changed up the core layout compared to their previous chips.

Instead of a 1+3+4 design, they are going for a 1+5+2 design.


 

The prime core is a Cortex-X4 running at 3.3GHz.

They didn't disclose what the other cores were, but it is a fairly safe bet to say the "performance" cores will be Cortex-A720 at various clock speeds and the efficiency cores will be Cortex-A520.

It's been a long-known fact that the "efficiency" cores from ARM haven't actually been all that efficient. They use very little power, but they don't really do a lot of work either. As a result, the middle cores have oftentimes been able to get more work done per watt of power.

 

It is worth noting that Cortex-A720 cores come in two flavors. One is the full "high performance" version, and then there is a cut-down version which Arm calls "entry-tier". Since Qualcomm didn't even want to confirm if they used A720 cores to begin with, we don't know which version we might see in the s8g3, but it might be the case that not all "performance cores" are equal.

 

 

Qualcomm put a lot of emphasis on using generative AI to enhance photos taken with cameras.

One of the features they showed was taking a picture, and then using generative AI (like Stable Diffusion) to generate content on the sides of the picture, as if the picture had been taken with a wide-angle lens. Not sure how I feel about this, but we'll see if it catches on.

 

The camera hardware also supports "Truepic with C2PA". This is a digital cryptographic signature that is added to pictures to validate how they were taken and if they have been modified. This is an open standard that has gained quite a lot of momentum in recent times. It's headed by a consortium that includes companies like Adobe, Arm, BBC, Intel, Microsoft, Twitter, Akamai and many more. The goal is to try and curb manipulated images from being spread, be it through traditional means or AI-generated ones. 

 

AV1 decoding is supported, but not AV1 encoding. Same as the gen 2.

 

 

The first device with a Snapdragon 8 Gen 3 processor will be the Xiaomi 14 series, which will launch tomorrow.

 

[Image: Snapdragon 8 Gen 3 summary]

(Sorry for the low-quality image, it's what Qualcomm provided)

 

 

 

 

Qualcomm S7 Gen 1 and S7 Pro Gen 1

New chipsets for headphones and earphones.

 

[Image: Qualcomm Sound platform badges]

 

The big focus is on upgrading the processing power on the headphones and earbuds themselves. They offer almost 100 times higher AI-compute performance than the previous generation, which Qualcomm hopes will lead to better active noise cancellation, hearing loss compensation, and other such features.

 

The difference between the Pro and the non-Pro model is what Qualcomm calls its "XPAN technology". What this is, in simple terms, is the ability to use Wi-Fi to send audio between your device and the audio equipment. The example Qualcomm gave on stage was that Bluetooth might be used when your phone is close to you, but then if you walk into another room and leave your phone behind, the devices will seamlessly switch to sending over Wi-Fi for its longer-range capabilities.

The Wi-Fi connection also allows for up to 192kHz lossless audio.
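For a sense of why Wi-Fi helps here: uncompressed stereo PCM at that sample rate needs far more bandwidth than Bluetooth audio links typically carry. A rough estimate is below; the 24-bit depth and stereo channel count are my assumptions, since Qualcomm didn't specify them.

```python
# Rough bitrate estimate for lossless stereo PCM at 192 kHz.
sample_rate = 192_000  # samples per second, per channel
bit_depth = 24         # assumed bits per sample (not stated by Qualcomm)
channels = 2           # assumed stereo

bitrate = sample_rate * bit_depth * channels  # bits per second
print(f"{bitrate / 1e6:.1f} Mbps")  # ~9.2 Mbps before any packing/overhead
```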

 

Both chipsets include support for Bluetooth LE Audio as well as Auracast (basically broadcast audio).

 

 

 

 

 

 

 

 

My thoughts

Lots of good announcements this year. I am personally very excited for the Snapdragon X Elite, which will use CPU cores developed by the NUVIA team.

 

A quick recap of why this is so exciting:

In 2019, a company called NUVIA was founded. The founders were:

  • Gerard Williams, who used to be Senior Director and Chief CPU Architect at Apple.
  • Manu Gulati, who used to be lead SoC architect at Apple and then Google.
  • John Bruno, who used to be a system architect at Apple and then Google (and, further back, worked on ATI GPUs).

 

The goal of NUVIA was to build a custom arm CPU core targeted at servers. Their goal (and projected targets) was a CPU core that was far beyond the x86 server processors of the time.

Their estimates in 2020 were 40-50% higher IPC than Zen 2, while using 1/3 the power. 

 

Then in 2021, Qualcomm bought NUVIA and told them to design a CPU core (probably by modifying their Phoenix architecture) for laptops instead. This is what resulted in what we now know as Oryon.

 

I hope that this will finally result in a good Windows on Arm laptop. Qualcomm's previous attempts at an SoC for Windows have been quite frankly pathetic. At first they were just the same SoCs as they used for phones, then they were slightly upscaled versions of the phone SoCs but nothing special (not enough to compete with x86-based laptop chips), and then they kind of gave up, used outdated CPU cores, and stopped updating their SoCs. If you buy a Windows on Arm laptop today, you will end up with a 3-year-old CPU architecture.

 

 

Sources

You can watch the announcement here:

 

And the press kit can be found here:

https://www.qualcomm.com/news/media-center/press-kits/snapdragon-summit-2023-press-kit

 

 

 

 

 

 

Update:

Here are the day 2 announcements that I thought were noteworthy:

 

Qualcomm has a technology called "Snapdragon Seamless" which will let you connect your PC and your phone in a "seamless" way. More info here. The examples I have seen have been things like:

  • Let your phone and PC share keyboard and mouse. So you can use your keyboard that's connected to your PC to type on your phone.
  • Access files that are stored on your phone, on your PC (or vice versa).
  • Share audio sources between your phone and PC, so for example your headphones plugged into your PC can play audio that's coming from your phone or the other way around too.

 

 

Snapdragon X Elite stuff:

The Snapdragon X Elite and its 42MB of cache has "optimizations for virtualization and memory address translation".

The SoC is designed to be scalable, from fanless ultra-portables to large performance laptops.

The GPU has "upgradeable drivers" (not sure what that means. Can't you upgrade the drivers on the current Snapdragon Windows PCs?)

 

An Arm native version of DaVinci Resolve will come out sometime next year.

 

The Snapdragon X Elite's NPU is on its own separate power distribution system, which means its power can be controlled independently of the other components on the SoC.

 

The NPU will show up in the Task Manager, just like your GPU, CPU, memory, etc. This will probably come to other processors as well in the future, since both AMD and Intel are adding, or planning to add, NPUs to their processors.

Here is a picture:

[Image: Windows Task Manager showing the NPU]

 

 

The Snapdragon X Elite was ~25% faster at compiling Notepad++ (for Win32) in Visual Studio compared to the Intel i7-1360P.

That's pretty damn impressive, and the type of benchmark that gets me excited.

[Image: Visual Studio compile-time benchmark slide]

 

 

 

Snapdragon 8 gen 3 stuff:

Overall the Snapdragon 8 gen 3 will get about a 10% power saving compared to the s8g2.

They are adding support for PyTorch ExecuTorch, which is an open-source framework for ML.

 

Qualcomm is once again bringing up the possibility of integrating Stable Diffusion or something similar into the camera app itself, since it can run so fast.

The Snapdragon 8 Gen 3 will have significant improvements to the camera processing. Here are some things they are introducing:

1) They will be able to do far better captures of depth data by leveraging AI. This can help with for example portrait shots where you want the background to be blurred.

2) A new framerate conversion engine that can generate new frames to convert 30 FPS video to 60 FPS (see the rough sketch after this list).

3) Support for Samsung's new ISOCELL HP3 sensor, which is a 200 megapixel monster. Not sure how useful that really is, but it's supported.

4) They are working with some camera sensor manufacturers to fine-tune their software to work better with Snapdragon processors, in order to help with for example low-light pictures.

5) It now supports Dolby HDR for photo captures. Apparently, the images are backwards compatible with JPEG, so I presume this is based on Google's JPEG_R format. Not sure why everyone is so against JPEG XL but whatever...
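Qualcomm didn't explain how their framerate conversion engine actually works, so here's only a naive illustration of the general idea of frame doubling: inserting a synthesized frame between every pair of real frames. A real engine would use motion estimation or AI models rather than simple blending.

```python
import numpy as np

def naive_double_framerate(frames: list[np.ndarray]) -> list[np.ndarray]:
    """Insert a blended frame between each pair of frames (30 FPS -> ~60 FPS).

    Toy illustration only; a real framerate conversion engine synthesizes
    intermediate frames with motion estimation / AI, not simple averaging.
    """
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        blended = (a.astype(np.float32) + b.astype(np.float32)) / 2
        out.append(blended.astype(a.dtype))
    out.append(frames[-1])
    return out

# Example: 30 dummy 1080p frames become 59 frames.
frames = [np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(30)]
print(len(naive_double_framerate(frames)))  # 59
```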

 

Qualcomm has their own Frame motion interpolation (think Nvidia DLSS 3) and Game Super Resolution (think DLSS 2), which have both been updated for the s8g3.


How would Snapdragon break into the laptop market with ARM CPUs? Doesn't Windows need the x86 instruction set to operate? If Microsoft has some ARM version of Win 11 or the rumored Win 12 then maybe. What does that leave them with, Linux? Apple won't allow a competitor to use their hardware.


 


Just now, venomtail said:

How would Snapdragon break into the laptop market with ARM CPUs? Doesn't Windows need the x86 instruction set to operate? If Microsoft has some ARM version of Win 11 or the rumored Win 12 then maybe. What does that leave them with, Linux? Apple won't allow a competitor to use their hardware.

Microsoft made a version of Windows called Windows RT back in 2012 and it supported arm processors. It never took off because it could only run "metro apps", and barely anyone wanted to rewrite their programs.

 

Then with Windows 10 they also released an arm version, but this time they added x86 emulation capabilities. Both Windows 10 and Windows 11 can run on arm processors. In fact, there are several devices out there, including the Surface Pro X, Surface Pro 9 and ThinkPad X13s G1, that run Windows on arm processors.

The problem is that they have sucked pretty hard. The arm processors designed for Windows just haven't been good enough to be worth considering. With this Snapdragon X Elite, that hopefully changes.


46 minutes ago, LAwLz said:

The arm processors designed for Windows just haven't been good enough to be worth considering.

It's not that they lack theoretical power per se. They are beefed-up versions of the mobile top end, after all; it's just that Windows kinda sucks, whereas Apple did a massive undertaking to make macOS work properly on their silicon + made a fully fledged emulation layer, compared to the quarter-assed one Windows on Arm had at the start.

 

Apple is strongarming arm whilst Microsoft is treating it as some gimmick and the approach and result show the outcomes clearly 😛


57 minutes ago, jaslion said:

It's not that they lack theoretical power per se. They are beefed-up versions of the mobile top end, after all; it's just that Windows kinda sucks, whereas Apple did a massive undertaking to make macOS work properly on their silicon + made a fully fledged emulation layer, compared to the quarter-assed one Windows on Arm had at the start.

 

Apple is strongarming arm whilst Microsoft is treating it as some gimmick and the approach and result show the outcomes clearly 😛

It seems to me like it's a bit of both, but Qualcomm currently shares the majority of blame. 

 

Apparently, Microsoft has made a lot of strides in their emulation software. It was awful in the beginning, but has since improved a lot. Support for various instructions that help with x86 emulation has been quite lacking in Qualcomm's chips.

 

The impression I get right now is that it's Qualcomm that's behind. Like I said in my previous posts, the current Windows on Arm chip uses X1 cores. That's not exactly great. It's actually pretty awful.

 

Apple absolutely did a way better job at having everything ready on day one. Microsoft and Qualcomm both released products I'd not even call half baked, but both of them have improved them a lot since then (if this lives up to the hype, which I think it will).

I think they are both treating it with respect now, which is something I couldn't say a year or two ago. 


Those relative performance / power usage graphs for the GPU gave me a good chuckle. Apple got hauled over the coals by tech YouTubers (and rightly so) for the exact same type of graphs.

Core performance numbers look good. Will be interesting to see if Apple engages with the horsepower race or just continues doing their own thing, performance charts be damned.


3 hours ago, LAwLz said:

"50% faster vs M2" - This sounds a bit low to me, considering it's (spoiler) a 12-core chip (/spoiler) competing against a 4+4 core chip.

 

3 hours ago, LAwLz said:

Manufactured on an undisclosed "4nm" node, probably TSMC N4P.

That probably has a bit to do with it, but also the number of cores often doesn't matter as much as the power limits and the performance efficiency of the cores. 12 cores doesn't mean a lot; the Ryzen 7900X also has 12 cores, so does an i9-10920X 🙃

 

Given the technology acquired/company to make this product I'm more hopeful than normal, but it's still Qualcomm, so it probably will not live up to the marketing.


This is looking great, a big move from the current stuff that was rather weak for the laptop form factor. Personally I'd really like to see a hybrid laptop with this SoC alongside a regular CPU & GPU or an APU. There is software that simply won't work on an ARM chip, but its efficiency for general laptop use would give huge battery life, and the regular laptop hardware alongside it could be used for games, for example. That would be the ultimate laptop really. Anything prior to the NUVIA cores, with the Windows on ARM limits on top, is just rubbish.

Also, it will be interesting to see their phone SoC with the new cores; that will be quite a jump for Android phones.



35 minutes ago, Paul Thexton said:

Those relative performance / power usage graphs for the GPU gave me a good chuckle. Apple got hauled over the coals by tech YouTubers (and rightly so) for the exact same type of graphs.

I mean, that is why there were complaints #LearnFromTheBest lol.

 

They have obviously copied Apple exactly here, which is exactly the problem, argh. They make it seem like a lot of detailed information, but it's actually useless.

 

Marketing will always be crap so whatever, the souls of mathematicians and statisticians will be sacrificed every time haha.


1 hour ago, leadeater said:

They have obviously copied Apple exactly here, which is exactly the problem, argh. They make it seem like a lot of detailed information, but it's actually useless.

I wouldn't go as far as to say it's useless.

There is still a lot of information that can be seen in the graphs even if one axis isn't labeled. For example, we can see in the graph how the power-to-performance curve looks on Oryon, AMD, and Intel processors.

Since we know which software they used and even which laptop model they compared against, we could in theory run the same benchmark and figure out some of the numbers, and from there we can extrapolate the other numbers.

 

But I think doing so would miss the point Qualcomm was trying to make. The focus wasn't on which score it got in some benchmark. The point was how much power was necessary to reach various performance goals, and I think the graph has enough information to accurately see that.

On the charts where they talked about performance, they did give us specific numbers. On the charts where they talked about efficiency, they gave us numbers for the power used. Of course it would have been better if they had labeled the y-axis as well, but I think that would have distracted from the point they were trying to make, which was "up to 2x faster GPU performance at the same power consumption" and "Intel needs 50 watts to get the performance we get at ~10 watts, so about 74% less".

 

Now, the i7-13800H's GPU operates quite far beyond the peak of its efficiency curve in those scenarios, and it certainly wouldn't be 74% less power usage for the same work if the i7 was restricted to let's say 20 watts. In that case the Snapdragon X Elite (I am getting tired of writing that name) would "only" be ~50% more efficient.

But that's not really an issue with the graphs being unlabeled. That's just the same kind of cherry-picking we see in pretty much all first-party benchmarks. They find the sweet spots where their product looks the best and present those numbers. Everyone does that, and it's why you should not blindly trust first-party benchmarks, because they typically don't paint the full picture.

 

But I have to say this looks very, very promising, and I don't think Qualcomm is lying with any of their numbers. I think the things to keep in mind are:

1) They are comparing themselves against products that are on the market right now, and their competitors will probably launch a new generation of products before this reaches the market.

2) Their numbers are cherry-picked to show Qualcomm in the best light possible. You won't get an average of 74% power savings or twice the performance. That will be the best-case scenario and the typical scenario might be half of that if not even lower.

 

But even keeping those two things in mind, this still looks very promising.

Something else to keep in mind is that this chip doesn't just have to match the x86 alternatives, it has to actually outperform them by a fairly significant margin.

It's not enough to match the performance of an AMD or Intel chip, because once we start adding the overhead of emulating x86 (which you will have to do a lot of on Windows), you end up with a slower device anyway.

It's also not enough to just slightly outperform AMD or Intel either because then we end up in a situation where performance might be the same, but you still have the drawbacks of some things just not working in Windows on Arm.

 

This chip has to be so good that it is still better even with translation overhead factored in, and the remaining performance (or battery) advantage has to be big enough that you're willing to sacrifice some compatibility to get it (how much of a sacrifice that is varies from person to person).


2 hours ago, LAwLz said:

emulation software

Emulation is where the trouble already starts. As long as they only do that, i.e., emulating x86 instructions with an emulation layer at runtime, performance will always be piss poor.

 

What made the transition to ARM so successful with Apple is exactly the fact that they did NOT emulate x86, but translated everything at install time - once. So it costs a lot of power and execution time once, during which you can optimize the translated version for ARM as well as automated processes can. Additionally, there was/is some hardware support for x86, like direct support for the x86 memory model.

 

So as long as Qualcomm+Microsoft don't manage to pull off something comparable, Windows on ARM will continue to suck, at least that's what I'm foreseeing.

 

If 13% faster than M2 holds true (where we compare a 5nm against a 4nm chip, probably both from TSMC), well good job, at least until we see what M3 can do - next Monday.


20 minutes ago, LAwLz said:

There is still a lot of information that can be seen in the graphs even if one axis isn't labeled. For example, we can see in the graph how the power-to-performance curve looks on Oryon, AMD, and Intel processors.

You can't scale it to anything so it's actually useless. All the information is just in the text which is fine, just that the graph is of no real use at all which doesn't serve the purpose of a graph.

 

Without proper scales, units and labels graphs actually do tell you nothing, it's a picture not a graph.


32 minutes ago, Dracarris said:

Emulation is where the trouble already starts. As long as they only do that, i.e., emulating x86 instructions with an emulation layer at runtime, performance will always be piss poor.

It depends. If the ARM processor is 50% faster and loses 30% of its performance to emulation, it's still faster and still has the efficiency gains. They could just brute-force it if the jump is big enough. How this all translates to reality remains to be seen. I think it's too early to form any kind of conclusion on how "revolutionary" these processors are.

 

And remember, ALWAYS take first-party graphs (pictures) and marketing material with a grain of salt. Obviously they want to portray themselves in the best way possible.



16 minutes ago, Dracarris said:

Emulation is where the trouble already starts. As long as they only do that, i.e., emulating x86 instructions with an emulation layer at runtime, performance will always be piss poor.

That's not even how x86 emulation in Windows for Arm works though. The historic problem was always that the Arm SoCs in the Windows devices absolutely sucked natively, and then also didn't have the optional in-hardware support implemented to help with x86 emulation. Some parts were present, but not everything that was possible to make it as good as possible.

 

Quote

The WOW64 layer of Windows allows x86 code to run on the Arm64 version of Windows. x86 emulation works by compiling blocks of x86 instructions into Arm64 instructions with optimizations to improve performance. A service caches these translated blocks of code to reduce the overhead of instruction translation and allow for optimization when the code runs again. The caches are produced for each module so that other apps can make use of them on first launch

Instruction translation at run time isn't how it is done, or even how it was done on much older iterations of Windows on Arm. One of the big changes is x86-64 support, because previously you could only run 32-bit x86 apps.
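As a toy illustration of the translate-once-and-cache idea the quoted docs describe (purely my own sketch, not Microsoft's actual implementation; translate_block is a made-up stand-in for the real x86-to-Arm64 compiler):

```python
# Toy sketch of a translate-and-cache layer: each block of x86 code is
# compiled to Arm64 once, and every later execution reuses the cached result.
translation_cache: dict[int, bytes] = {}  # keyed by the block's start address

def translate_block(x86_block: bytes) -> bytes:
    """Hypothetical stand-in for the real x86 -> Arm64 block compiler."""
    return b"arm64:" + x86_block

def run_block(address: int, x86_block: bytes) -> bytes:
    # Only the first execution of a block pays the translation cost.
    if address not in translation_cache:
        translation_cache[address] = translate_block(x86_block)
    return translation_cache[address]

run_block(0x1000, b"\x55\x48\x89\xe5")  # translated, then cached
run_block(0x1000, b"\x55\x48\x89\xe5")  # served straight from the cache
```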

 

But above all, the SoCs were to blame for the performance, not the Windows translation implementation, which may or may not be worse (I mean to a degree that matters) than Apple's. Microsoft was given a water pistol to fight against Apple's M134 Minigun.


15 minutes ago, Stahlmann said:

it depends. If the ARM processor is 50% faster and loses 30% of it's performance to emulation it's still 20% faster and still has the efficiency gains. They could just brute-force it if the jump is big enough. How this all translates in reality remains to be seen. I think it's too early to form any kind of opinion.

Well, first of all, that's not really how percentages work. Losing 30% of overall performance results in 105% of perf, aka 5% faster compared to the 100% reference point.
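Just to restate that arithmetic (a minimal sketch of the numbers in this exchange):

```python
# 50% faster natively, then losing 30% of that performance to emulation.
native = 1.50                         # relative to the x86 baseline of 1.00
after_emulation = native * (1 - 0.30)
print(after_emulation)                # 1.05 -> about 5% faster than the baseline, not 20%
```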

 

Next, this may apply to single-core performance, but certainly not to multi-core and everyday workloads universally. Even if it were in theory 20% faster, needing that extra horsepower all the time would be detrimental to energy efficiency. Not exactly a good match for mobile devices.

 

Betting on regular emulation together with raw performance is more of a brute-force approach that will probably never be able to deliver the various aspects of user experience that current Macbooks can offer and are loved for.

But well okay, if it actually was the SoC's primary fault like @leadeater said, then the end results might be much better. Let's wait and see.


5 minutes ago, leadeater said:

That's not even how x86 emulation in Windows for Arm works though. The historic problem was always the the Arm SoCs in the Windows devices absolutely sucked natively and then also didn't have optional in hardware support implemented to help with x86 emulation. Some parts were present but not everything that was possible to make it as best as possible.

Well bummer. Then the delivered performance is even more ridiculous. Given how much WoA sucked, I was simply assuming they were doing regular emulation like console emulators do for example.


25 minutes ago, Dracarris said:

Well bummer. Then the delivered performance is even more ridiculous. Given how much WoA sucked, I was simply assuming they were doing regular emulation like console emulators do for example.

I mean, the first-time translation of a code block is done at run time, but everything is reused so much that in reality it's all coming from cache, even the first time you use an app. I still don't know exactly how much performance is lost to the emulation layer itself (we know Apple's is 20%), but we've not really had a good set of standardized hardware with known-good hardware x86 capabilities to measure against. At least we will get that soon.

 

I'd be happy if it's like a 30-40% performance loss; given the actually good native hardware performance being presented here, the end result is the performance of a high-end laptop from ~3 years ago with the power efficiency of today. Not a bad proposition for Windows on Arm at that point.

 

The issue is that all the benefits of Arm and power efficiency are typically greatly exaggerated, so for all the good this is for Arm, it's going to be an inferior product in performance and performance per watt compared to the latest and greatest Intel/AMD ~30W mobile products.

 

Right now it's competing with mobile products like the AMD Z1 Extreme and other product families based on the same technology, but it's not out yet, so a Z2 Extreme or whatever else may or may not be out by then.

[Image: AMD Ryzen Z1 announcement slide]

 

Just remember Qualcomm is comparing against the most favorable products possible, and also the one(s) they are obligated to. They absolutely are not picking the best available x86 options, because the picture would be vastly different if they did.


3 hours ago, leadeater said:

I mean, the first-time translation of a code block is done at run time, but everything is reused so much that in reality it's all coming from cache, even the first time you use an app. I still don't know exactly how much performance is lost to the emulation layer itself (we know Apple's is 20%), but we've not really had a good set of standardized hardware with known-good hardware x86 capabilities to measure against. At least we will get that soon.

 

A non-trivial part of the perf loss of Rosetta 2 is purely down to 4kb page sizes; running benchmarks on Linux with Apple silicon, testing 4kb and 16kb kernels, there is a 10% to 15% perf hit in some tasks (native arm tasks just running with 4kb pages compared to 16kb).

The biggest issue MS has had is that, unlike Apple's SoCs, the chips MS has been dealing with do not have the option of running in an x86 memory-access-ordering mode, so almost all reads and writes in the translated binary need to be wrapped in atomic semaphores to ensure memory ordering is as the compiler originally expected. This has a big perf impact compared to Apple's chips, where the core running the task switches to a memory access model that behaves the same as x86.


6 hours ago, Dracarris said:

Emulation is where the trouble already starts. As long as they only do that, i.e., emulating x86 instructions with an emulation layer at runtime, performance will always be piss poor.

I think you got too caught up with a specific word.

The emulation/translation is a bit more complicated than that but I see that leadeater has already posted about it so I won't bother.

 

 

6 hours ago, leadeater said:

You can't scale it to anything so it's actually useless. All the information is just in the text which is fine, just that the graph is of no real use at all which doesn't serve the purpose of a graph.

 

Without proper scales, units and labels graphs actually do tell you nothing, it's a picture not a graph.

I disagree.

The chart tells us that the GPU in the Snapdragon X Elite gets roughly 80% higher performance than the GPU in the Ryzen 7940HS when running at ~30 watts. It also tells us that at around 15 watts the Snapdragon X Elite gets the same performance as the Ryzen at around 60 watts. It also tells us that those numbers are gathered from 3DMark WildLife. That's far from "useless" in my eyes.

What the precise number is for "performance" is kind of irrelevant for the point Qualcomm is making, which is about efficiency.

 

 

I guess you could say the lines on the graph itself are somewhat useless, but they are there for illustrative purposes. They are more of a visual aid than anything else. And it is actually possible to gather the information I mentioned above from just the graph itself.

 

Again, I agree that the graph would have been better if it had a labeled Y-axis, but I won't call it useless just because of that. It serves its purpose and does show some valuable information.

 

 

 

5 hours ago, leadeater said:

The historic problem was always that the Arm SoCs in the Windows devices absolutely sucked natively, and then also didn't have the optional in-hardware support implemented to help with x86 emulation. Some parts were present, but not everything that was possible to make it as good as possible.

I had this debate with someone else on this forum a while ago. I am trying to find it right now, but there were a lot of interesting things in that discussion.

If I remember correctly, Microsoft did update their translation at some point, and it really was awful to begin with. But they have made improvements to it, not just to increase performance but also add support for 64bit applications.

 

I would say Windows on arm has had three major issues.

1) The hardware hasn't been good. It just hasn't.

2) The translation and support for x86 software was slow and bad. Just take the lack of 64-bit support as an example. Such a massive oversight would never have happened with Apple.

3) Nobody seems to want to write arm native applications for Windows. And by "nobody" I include Microsoft themselves. Despite Windows on Arm launching in 2017, they didn't have an arm native variant of for example Visual Studio until very late 2022. It shouldn't take ~5 years for Microsoft to port one of their most widely used applications (one used by developers to make applications) to arm. That should have been available on day 1. 

 

 

With this chip, I hope that issue 1 is fixed.

It seems like Microsoft has already fixed issue 2.

Issue 3 probably won't get fixed until we have compelling arm devices on the market, which I hope will happen next year.

 

With a bit of luck, Windows on arm might be worth using by ~2025. I won't hold my breath, but I am optimistic. I would love to see Windows on arm take off.


10 hours ago, LAwLz said:

I disagree.

The chart tells us that the GPU in the Snapdragon X Elite gets roughly 80% higher performance than the GPU in the Ryzen 7940HS when running at ~30 watts. It also tells us that at around 15 watts the Snapdragon X Elite gets the same performance as the Ryzen at around 60 watts. It also tells us that those numbers are gathered from 3DMark WildLife. That's far from "useless" in my eyes.

What the precise number is for "performance" is kind of irrelevant for the point Qualcomm is making, which is about efficiency.

No, the graph does not tell us that; the text on the right does and the text in the media release does. Remove that text, which is actually not part of "the graph", and tell me what it shows now? Nothing, right? Something is higher than something else, but there's no way to actually discern by how much, which, as I said, makes the graph useless.

 

You could present that information any other way and come out with the same result. Like I agree it's a nice pictorial to visualize the information but as a graph it's useless.

 

There is no way to argue against that. A graph is a graph; this is not one, and I think it's quite a horrible trend for a billion-dollar company to pretend and present something as a graph when it's not actually a graph. Because we both know there aren't values for the other data points on the "graph" to draw those lines as-is. They could have some, but they aren't giving that information to us, and you can't get it from the graph properly either, because the Y-axis is utterly worthless for achieving that.

 

10 hours ago, LAwLz said:

I guess you could say the lines on the graph itself are somewhat useless,

Yes, literally and exactly the point, so what are we discussing? It's an utter garbage graph and anyone that made it should be ashamed. This is stuff you learn in school and there is actually no excuse. It's being done to hide and distort information to protect commercially sensitive information, which is fair, so just don't make a graph and then there is no problem.

 

These graphs in a mathematics class would be a fail, literally end of story. That's as objective as it gets heh. If a child can do better then well... ya know 😉


2 hours ago, LAwLz said:

I had this debate with someone else on this forum a while ago. I am trying to find it right now, but there were a lot of interesting things in that discussion.

If I remember correctly, Microsoft did update their translation at some point, and it really was awful to begin with. But they have made improvements to it, not just to increase performance but also add support for 64bit applications.

 

I would say Windows on arm has had three major issues.

1) The hardware hasn't been good. It just hasn't.

2) The translation and support for x86 software was slow and bad. Just take the lack of 64-bit support as an example. Such a massive oversight would never have happened with Apple.

3) Nobody seems to want to write arm native applications for Windows. And by "nobody" I include Microsoft themselves. Despite Windows on Arm launching in 2017, they didn't have an arm native variant of for example Visual Studio until very late 2022. It shouldn't take ~5 years for Microsoft to port one of their most widely used applications (one used by developers to make applications) to arm. That should have been available on day 1. 

 

 

With this chip, I hope that issue 1 is fixed.

It seems like Microsoft has already fixed issue 2.

Issue 3 probably won't get fixed until we have compelling arm devices on the market, which I hope will happen next year.

 

With a bit of luck, Windows on arm might be worth using by ~2025. I won't hold my breath, but I am optimistic. I would love to see Windows on arm take off.

The problem is pretty much that Microsoft wasn't commercially invested in ARM and to be honest I still don't think they are. It's a very typical doing just enough to not get criticized and affect company and stock values. I'm very skeptical of it since if Microsoft really wanted to make a fully working Windows ARM transition effort and plan they could and the fact they aren't is very telling to be honest.

 

As much as I hate their front-end development choices for a lot of their applications, they are well more than capable as a company of writing good core software, kernels, drivers, etc.

 

The problem is Windows ARM won't make money, not like investing company resources in to Azure and Microsoft 365 does/will and both of those don't even require users/customers to be running Windows at all.  


I sure hope they put this in the next Nintendo Switch.



1 hour ago, leadeater said:

The problem is pretty much that Microsoft wasn't commercially invested in ARM and to be honest I still don't think they are. It's a very typical doing just enough to not get criticized and affect company and stock values. I'm very skeptical of it since if Microsoft really wanted to make a fully working Windows ARM transition effort and plan they could and the fact they aren't is very telling to be honest.

 

MS are almost never willing to go all in with anything. They always do 10%, just enough to look like they are doing stuff.

 

 

1 hour ago, leadeater said:

The problem is Windows ARM won't make money, not like investing company resources in to Azure and Microsoft 365 does/will and both of those don't even require users/customers to be running Windows at all.  

I could see MS welcoming ARM in the PC space if they could use this to push more users (and companies) toward cloud-based rental of high-power machines, e.g. Windows 365 and Xbox Cloud streaming. But in this model MS is not interested in good local ARM software support; in fact, they sort of want it to be only 'ok' locally so that you $$ up for a subscription to a powerful machine remotely.

It is well within MS's power to make future Windows versions have strict restrictions on OEM vendors that would push them hard to line up with MS requirements, shipping platforms that strongly encourage users to subscribe.


10 hours ago, LAwLz said:

2) The translation and support for x86 software was slow and bad.

Except for the lack of 64-bit support - what exactly was "slow and bad" about it? You two just lectured me that there indeed was sophisticated and buffered install-time translation already in place, which left me quite puzzled about the actual performance, or lack  thereof, that we saw. So now I'm again puzzled.

10 hours ago, LAwLz said:

disagree.

The chart tells us that the GPU in the Snapdragon X Elite gets roughly 80% higher performance than the GPU in the Ryzen 7940HS when running at ~30 watts.

For that you have to assume that the y-axis starts at 0 at the axis intersection point on the bottom left, which is information we do not have (only the x-axis is labeled). I also highly doubt that, outside of the single data point they labeled, the shape of the curves is anything more than made up or roughly approximated.

This keynote is almost literal copy-pasta from Apple, but in blue and orange. Really cheap. And if someone copies bad habits from Apple, they're the only one to blame. Literally one of the first things you learn as a kid: if your friend or an older kid does something stupid, there is absolutely no reason or justification for you to repeat it.


5 hours ago, Dracarris said:

Except for the lack of 64-bit support - what exactly was "slow and bad" about it? You two just lectured me that there indeed was sophisticated and buffered install-time translation already in place, which left me quite puzzled about the actual performance, or lack  thereof, that we saw. So now I'm again puzzled.

Install-time translation is not a silver bullet that fixes all performance issues that can arise when translating x86 instructions to arm instructions. 

Which instructions you choose to translate to also matters a lot.

I haven't found the other thread where someone who seemed very knowledgeable about this commented, but from what I remember they gave some pretty good examples of how things have changed. For example, there were some instructions (which probably didn't exist when Microsoft wrote the original translation function) that sped up unaligned memory access by a lot. Since there are a lot of unaligned memory accesses in your typical Windows program, it mattered hugely. 

Arm has been adding more and more instructions to their instruction set that make translating x86 code faster and easier, but since Microsoft originally wrote their translation layer for the Snapdragon 835, many of those instructions weren't used (because they didn't exist).
The updates to the translation layer have been to use newer and faster arm instructions. 

 

Since the translation layer isn't open source we don't know for sure what changes they are making, but we do know that the dotnet team has been working on improving arm performance a lot. For example, here is a pull request to implement the LDAPR instruction. That's an instruction that didn't exist when the original translation layer was written, and the performance gain is quite dramatic for certain situations (twice as fast). If the dotnet team is implementing it, then I don't see why the translation team wouldn't be implementing it.

 

There was also the issue that a lot of Microsoft's own code was pretty terribly optimized for arm. So even applications that were written in native code were slow. So the translation was rather poor and not optimized (using outdated instructions instead of more efficient ones), and then when Windows ran the arm code that was generated, it was slow at running that code too.

 

 

 

14 hours ago, leadeater said:

No, the graph does not tell us that; the text on the right does and the text in the media release does.

The graph does in fact show that, even without the text on the side or at the bottom.

 

 

6 hours ago, Dracarris said:

For that you have to assume that the y-axis starts at 0 at the axis intersection point on the bottom left.

and @leadeater

For that particular claim ("80% higher performance when running at 30 watts"), yes we have to assume that the y-axis is linear. But I think that's a fairly safe and logical assumption to make. If we want to be like that ("you're just assuming and we shouldn't do that!"), we could also say that we make an assumption that "W" stands for "watts" and not "whatever-qualcomm-wants-to-measure-power-consumption-in so therefore them saying 10W might mean 80 watts".

 

But we don't need to make any assumptions in order to draw a straight line between various points in the graph and see where they line up. For example, if we draw a horizontal line anywhere from the y-axis, we can see where on the x-axis it intersects with the Snapdragon and Intel/AMD lines, and get accurate numbers from there. Whatever they may have done to the y-axis (not starting at 0, making it logarithmic, etc.), it does not matter. We will always get accurate and comparable numbers by drawing a line from the y-axis and seeing where it intersects, because the x-axis is labeled.

 

We do not need the y-axis to be labeled in order to say "The Snapdragon gets the same performance at ~8 watts as the Ryzen gets at ~15 watts" (in 3DMark WildLife Extreme).

Since the X-axis is labeled we can pick any arbitrary point on the y-axis (which isn't labeled) and then use that as the reference.

We will always be able to infer "the Snapdragon gets the same performance at X watts as the Ryzen gets at Y watts" from the graph using that methodology, and therefore the graph isn't useless or meaningless.
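As a minimal sketch of that methodology (the curve data below is made up purely for illustration; performance is in arbitrary, unlabeled units and only the watt values matter):

```python
import numpy as np

# Hypothetical power-vs-performance curves read off a chart with an unlabeled y-axis.
# Performance values are arbitrary units; only the watts are labeled.
snapdragon_watts = np.array([5, 10, 15, 20, 30])
snapdragon_perf  = np.array([40, 70, 90, 100, 110])

ryzen_watts = np.array([15, 30, 45, 60, 80])
ryzen_perf  = np.array([40, 65, 80, 90, 100])

# Pick any performance level both curves reach and read off the watts each needs.
target = 90.0
snapdragon_needed = np.interp(target, snapdragon_perf, snapdragon_watts)
ryzen_needed = np.interp(target, ryzen_perf, ryzen_watts)

# The unknown performance scale cancels out; the watt comparison is still valid.
print(snapdragon_needed, ryzen_needed)  # 15.0 vs 60.0 watts with this made-up data
```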

 

The graph would be a lot better if they labeled the y-axis as well, and I wonder why they didn't do that, but calling it useless is just not true. 

