
Is ARM Really More Efficient than x86-64, Or Is It Much More About Optimization From Top to Bottom?

Go to solution: Solved by igormp (full reply below).
(This is definitely a bit long but please bear with me till the end; jump to the question in bigger font near the end of the post if you don't want to read everything)

What inspired me to create this post is some thoughts that occurred to me recently about Nvidia's and AMD's latest GPU releases.  Stock shortages aside, we can't deny that computer parts, especially graphics cards, have become increasingly power hungry due to Dennard scaling (not quite but related to Moore's Law) breaking down around 2006 stemming from current leakages caused by quantum effects (e.g. quantum tunneling) at ever shrinking nodes.  Therefore, one of the ways that this problem has been combated so far is through the use of alternatives to x86-64, the biggest perhaps being the ARM architecture family.  This has actually been pretty successful already, as a quick glance at https://gs.statcounter.com/os-market-share tells us that when looking at the desktop AND mobile OS market as a whole (despite a few obvious problems with that approach, such as the difference in nature of desktop vs mobile devices), Android had already surpassed Windows in market share.
 

Given that electricity does not come cheap for some, especially those in developing countries, not to mention the increased environmental impact that increasingly power-hungry GPUs and even CPUs will have (see https://www.tomsguide.com/news/ps5-vs-xbox-series-x-with-great-power-comes-greater-electric-bills), this trend of desktop parts needing greater and greater power draw is pretty worrying (at least to me).  Furthermore, I'm sure that no one wants to have to get 800+ or even 1000+ watt power supplies just to make sure that their computer doesn't randomly shut down in the middle of a gaming session, and even then still end up tripping their breaker (which isn't all that implausible given how common 10 amp breakers on 120V outlets are, especially in apartments, at least in the U.S.).  And while "technically" a driver update can solve the issue, depending on the TDP of the GPU/CPU itself it could drastically reduce its performance (one needs to look no further than rumors about how RTX 3070+ GPUs in laptops will have up to a 40% performance deficit due to power constraints, especially with the Max-Q variants).
 

Furthermore, Apple just demonstrated this past year that there's lots to be gained in both performance and power efficiency by switching over to a custom ARM architecture (although by how much is still disputable as Apple throttled pretty hard the Intel CPUs that they were putting onto their Mac-Minis and laptops).  Furthermore, ARM has at least a reputation of being much more efficient than x86-64, especially with their widespread-use in high-performance smartphones such as the latest Samsung Android flagships and iPhone flagships.  But the deeper I dove into the debate of x86-64 vs ARM efficiency, the more confused I got.  For example, this webcodr.io website (https://webcodr.io/2020/11/ryzen-vs-apple-silicon-and-why-zen-3-is-not-so-bad-as-you-may-think/) and even the following post on this forum (at least the OP one - https://linustechtips.com/topic/1214401-apple-and-arm-a-quasi-insiders-thoughts/?tab=comments#comment-13758213) both emphasized that ARM is more efficient than x86-64.  However, here's also another 3 different articles/posts (including another post on this forum) that emphasize the specific micro-architectural design of the chips themselves rather than whether it's simply ARM vs x86-64 when it comes to efficiency, and even outright state that beyond a certain wattage limit both x86-64 and ARM exhibit very similar levels of efficiencies (even the webcodr.io website stated earlier somewhat acknowledges this as well):

 

      1. https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/the-final-isa-showdown-is-arm-x86-or-mips-intrinsically-more-power-efficient

      2. https://www.extremetech.com/mobile/312076-what-kind-of-performance-should-we-expect-from-arm-based-macs

      3. https://linustechtips.com/topic/1157141-how-come-pcs-use-so-little-power-compared-to-other-machines/?tab=comments#comment-13310148

 

So as the title states, my question really is - when it comes to maximizing performance/watt, is it much more about basically optimizing every layer of your "ecosystem" all the way from the hardware microarchitecture to the APIs, system applications, and even user applications themselves (kind of like how Apple has always done it especially with iPhones), OR does using ARM in general truly have a performance/watt advantage over x86-64 which can be capitalized without sacrificing too much performance?

 

P.S. Honestly, as a computer science student looking to do web development but also looking to graduate with a specialization in computer systems (e.g. networking, computer architecture, etc.), I'm not sure if I will like the answer either way.  Microsoft so far has seemingly been dragging their feet on Windows on ARM (which is not the same as Windows 10X; Windows 10X is more for competing against ChromeOS than anything else), and if it wasn't an x86-64 vs ARM issue, then I would absolutely hate having to learn all of the quirks and features of a dozen+ different microarchitectures for anything I develop (especially if I decide to switch to systems programming) just to get an idea of how I would go about optimizing the user experience for my program (e.g. "oh crap, I forgot that ARM Cortex XYZ can only support 2GB of RAM, better utilize storage more" or "oh no, these custom ARM instructions can't carry over to x86-64, so I'd better use a whole 'nother library", etc.). I know that modern compilers and interpreters have made it much less of an issue, but then I also don't want to go towards the other end of the spectrum and spend the rest of my life coding nothing but iOS apps; plus I see web dev job ads all the time calling for "full-stack" developers or knowledge of a whole slew of programming languages/ecosystems, like knowing development for both iOS and Android.

Anyway hope that wasn't too long.

 

 

Edited by linuxChips2600
Added a "TLDR"

Boils down to ARM having a less complicated instruction set, really. 
 

Apple also doesn’t throttle their Intel chips... They run at the same power limits as every other machine.


16 minutes ago, linuxChips2600 said:

Anyway hope that wasn't too long.

It was, too long, didn't read.

From the title alone : Because X86/x64 has a bunch of legacy instruction sets it has to support, from not only the hardware, but also the software, making it less efficient overall than the newer ARM architecture and the OSes built upon it...

It's actually a bit more complicated than this, but that's a part of the reason if I remember.


13 minutes ago, Vitamanic said:

Apple also doesn’t throttle their Intel chips... They run at the same power limits as every other machine.

I think that's the case if you compared it with other Macs of similar price, but I'm pretty sure NOT if you compared it with Intel's original TDP of the chip (sorry my fault I didn't specify), e.g.: https://youtu.be/MlOPPuNv4Ec?t=118
 


2 minutes ago, linuxChips2600 said:

I think that's the case if you compared it with other Macs of similar price, but I'm pretty sure NOT if you compared it with Intel's original TDP of the chip (sorry my fault I didn't specify), e.g.: https://youtu.be/MlOPPuNv4Ec?t=118
 

That’s thermal throttling implemented by Intel. Ultrabooks in the PC world do the same.


1 hour ago, linuxChips2600 said:

Apple throttled pretty hard the Intel CPUs that they were putting onto their Mac-Minis and laptops

The throttling of Intel CPUs in Macs is due to the thermal constraints of the chassis and Intel chips being inefficient in power management, thanks to their failure to shrink their transistors beyond 14 nm.
 

Intel is trying to solve it with Alder Lake on 10 nm, but only time and reviews will tell if it can match Apple Silicon's performance per watt. Remember that the M1 MacBook Air beats the 16" MacBook Pro with the 9th-gen i9, and even Tiger Lake chips, when it comes to video editing and code compiling tasks.


2 hours ago, linuxChips2600 said:

(This is definitely a bit long but please bear with me till the end; jump to the question in bigger font near the end of the post if you don't want to read everything)

[…]

Anyway hope that wasn't too long.

 

 

ummmm it was too long


7 hours ago, TetraSky said:

It was, too long, didn't read.

From the title alone : Because X86/x64 has a bunch of legacy instruction sets it has to support, from not only the hardware, but also the software, making it less efficient overall than the newer ARM architecture and the OSes built upon it...

It's actually a bit more complicated than this, but that's a part of the reason if I remember.

I think you misunderstand: there are no "legacy instruction sets", and the ARM architecture is not a new type of instruction set at all. x86 came out in 1978 and ARM in 1985, so they're roughly the same age; the difference is that x86 is CISC and ARM is RISC-based. New is not always better; have a look at Itanium, which came out in 2001.

 

It's also unfair to compare an x86-64 CPU on 14 nm to a 5 nm CPU. You need to compare 14 nm to 14 nm to see how efficient they are relative to each other. Also, note that Intel and AMD aim for performance over efficiency; if they aimed for efficiency over performance, you might find that x86-64 can be as efficient as or more efficient than ARM, with more performance in different areas. Since CISC has more instructions covering many different areas, you can find that it is high performance: some instructions can do the same thing in one go that would take 3 or more instructions on RISC.

 

RISC uses many small, simple instructions to do a job, where CISC may use just one instruction to do the same job.

 

Also note that PowerPC was a RISC-based CPU like ARM, but was not as efficient compared to x86-64.

 

A good example of CISC instructions is here: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

 

Looking at it over time, ARM may well move over toward CISC. Do look up Itanium, which was VLIW and was supposed to have the benefits of both CISC and RISC. It came out in 2001! https://en.wikipedia.org/wiki/Itanium

 

 

 


It's absolutely less bloated since it has less legacy baggage; however, older software or stuff using older standards will run worse on ARM for sure... UwU


1 minute ago, Nena Trinity said:

It's absolutely less bloated since it has less legacy baggage; however, older software or stuff using older standards will run worse on ARM for sure... UwU

I do not understand why people think old stuff or legacy stuff is bad; that's not how software works. There isn't really any such thing as "old standards"; there are no standards for software at all.

 

Having legacy stuff can be great for users and programs; many times the new thing is worse, e.g. the Windows 7 GUI vs the Windows 10 GUI. I like the Windows 7 GUI far more.

 

Having legacy stuff means I do not need to buy all new software for every new OS.

 

A program made in COBOL could easily run as well as, or maybe better than, something made in a newer language like Java, as it has less overhead; that has nothing to do with its age, on ARM or anywhere else.

 

You build on top of what came before instead of reinventing the wheel.

 

New != Good 

 

Old != Bad

 

Legacy != Bad

 


17 hours ago, linuxChips2600 said:

we can't deny that computer parts, especially graphics cards, have become increasingly power hungry due to Dennard scaling (not quite but related to Moore's Law) breaking down around 2006 stemming from current leakages caused by quantum effects (e.g. quantum tunneling) at ever shrinking nodes. 

No, not really. Dennard scaling refers to the performance per area of a single unit of computing. When talking about modern CPUs, and especially GPUs, we're trying to push for more parallel compute performance. GPUs in particular have been relying on increased transistor density in order to fit more streaming units, and that's where most of the performance from each generation comes from.
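For reference, classic Dennard scaling says that if you shrink transistor dimensions and supply voltage by 1/k, switching gets faster while power density stays constant. A minimal sketch of those textbook factors (my own illustration, using the usual dynamic-power approximation P ~ C·V²·f and ignoring leakage):

```python
# Toy illustration of classic Dennard scaling (textbook factors, no leakage).
# Scale linear dimensions and supply voltage by 1/k and see what changes.

def dennard_scale(k: float) -> dict:
    """How key quantities change when feature size scales by 1/k."""
    voltage = 1 / k           # supply voltage scales down with feature size
    capacitance = 1 / k       # gate capacitance scales down with dimensions
    frequency = k             # shorter channels switch faster
    area = 1 / k ** 2         # transistor area shrinks quadratically
    power = capacitance * voltage ** 2 * frequency   # dynamic power ~ C*V^2*f
    return {
        "frequency": frequency,            # goes up by k
        "power_per_transistor": power,     # goes down to 1/k^2
        "power_density": power / area,     # stays at 1.0 -- the key promise
    }

print(dennard_scale(1.4))   # roughly one full node shrink (~0.7x linear)
```

Once leakage stopped voltage from scaling (the breakdown around 2006 that the OP mentions), the voltage = 1/k line above no longer holds, power density climbs, and the extra transistors get spent on more cores and units instead of higher clocks.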

 

The problem with power is that both AMD and Nvidia (but especially Nvidia) are pushing really hard for higher clocks, way past the point where the relation between clock and power scaling becomes exponential (example image below). You could decrease the clocks and voltages in order to lose ~5% perf and shave 1/3~1/4 of the power usage.

[Image: example curve of clock speed vs. power consumption]
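To put rough numbers on that (purely illustrative values, not measurements of any specific card), using the usual dynamic-power approximation P ~ V²·f:

```python
# Back-of-the-envelope: dynamic power scales roughly with V^2 * f, so a small
# clock reduction that also allows a voltage reduction saves power quickly.

def relative_power(freq_scale: float, volt_scale: float) -> float:
    """Power relative to stock, using the P ~ V^2 * f approximation."""
    return volt_scale ** 2 * freq_scale

stock = relative_power(1.00, 1.00)
tuned = relative_power(0.95, 0.90)   # ~5% lower clock, ~10% lower voltage (made-up numbers)

print("performance hit : ~5% (roughly tracks the clock drop)")
print(f"power vs stock  : {tuned / stock:.0%}")   # ~77%, i.e. about a quarter shaved off
```

The exact split depends on where the card sits on its voltage/frequency curve, which is why the savings in practice land anywhere between a quarter and a third.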

 

17 hours ago, linuxChips2600 said:

Therefore, one of the ways that this problem has been combated so far is through the use of alternatives to x86-64, the biggest perhaps being the ARM architecture family.

No, the ISA has nothing to do with the process node, clock scaling or power consumption. The thing about an ARM vs. x86 design is how efficient you can make a specific use case, and how well you can make use of the available transistor count for your µArch. Remember that the ISA is just a facade over your actual µArch, and that modern high-end processors, be they ARM or x86, share some really similar design choices.

 

17 hours ago, linuxChips2600 said:

Furthermore, Apple just demonstrated this past year that there's lots to be gained in both performance and power efficiency by switching over to a custom ARM architecture (although by how much is still disputable as Apple throttled pretty hard the Intel CPUs that they were putting onto their Mac-Minis and laptops).

It's hecking efficient due to the fact that it's built on a 5nm process, has tons of coprocessors (heterogeneous computing) in order to offload tasks from the CPU in a more efficient and faster way, and every component is tightly coupled. The CPU being based on an ARM ISA is just a pretty small detail when you consider everything, and I bet they would have achieved the same results (or even better!) with any other ISA.

 

17 hours ago, linuxChips2600 said:

Furthermore, ARM has at least a reputation of being much more efficient than x86-64, especially with their widespread-use in high-performance smartphones such as the latest Samsung Android flagships and iPhone flagships.  But the deeper I dove into the debate of x86-64 vs ARM efficiency, the more confused I got. 

Efficiency is directly related to the task you need to do. In a phone, with basic media consumption and lots of hardware accelerators (reminder that a phone has an SoC with tons of other peripherals apart from the CPU), having a CPU that doesn't carry tons of legacy hardware and specialized extensions (such as AVX2) surely helps, since those aren't really used. When you get to anything more demanding/high performance, both ARM and x86 deliver pretty much the same performance and have similar power consumption.

 

17 hours ago, linuxChips2600 said:

However, here's also another 3 different articles/posts (including another post on this forum) that emphasize the specific micro-architectural design of the chips themselves rather than whether it's simply ARM vs x86-64 when it comes to efficiency, and even outright state that beyond a certain wattage limit both x86-64 and ARM exhibit very similar levels of efficiencies (even the webcodr.io website stated earlier somewhat acknowledges this as well):

YES. I totally agree with this point.

 

17 hours ago, linuxChips2600 said:

So as the title states, my question really is - when it comes to maximizing performance/watt, is it much more about basically optimizing every layer of your "ecosystem" all the way from the hardware microarchitecture to the APIs, system applications, and even user applications themselves (kind of like how Apple has always done it especially with iPhones)

Yes!

 

17 hours ago, linuxChips2600 said:

OR does using ARM in general truly have a performance/watt advantage over x86-64 which can be capitalized without sacrificing too much performance?

No, it doesn't. You're sacrificing lots of things, and the actual µArch implementation is what matters. You mostly don't notice such sacrifices, and think that ARM is pretty close to x86 in performance while having amazing power consumption, because the µArch was tailored for the simple use cases you commonly see most ARM CPUs being used for (as in, browsing Facebook and watching YouTube).

 

17 hours ago, linuxChips2600 said:

computer architecture

Make sure to take a class on that, and read Patterson's book; it's amazing.


19 hours ago, linuxChips2600 said:

(This is definitely a bit long but please bear with me till the end; jump to the question in bigger font near the end of the post if you don't want to read everything)
 

So as the title states, my question really is - when it comes to maximizing performance/watt, is it much more about basically optimizing every layer of your "ecosystem" all the way from the hardware microarchitecture to the APIs, system applications, and even user applications themselves (kind of like how Apple has always done it especially with iPhones), OR does using ARM in general truly have a performance/watt advantage over x86-64 which can be capitalized without sacrificing too much performance?


Anyway hope that wasn't too long.

 

It was a good read, although I did not follow up with the links.

 

1st, you're asking for an opinion or a fact, which makes it difficult to answer.

 

2nd, there are so many variables to this, you won't actually get a good straight forward meaningful answer. 

 

 

WHY??????

 

Because people have a hard enough time comparing AMD to Intel, which is already apples to oranges, and they're both x86 processors.

 

Then, aside from that, the efficiency of a CPU does not make up the efficiency of the entire rig. For example, my video card is 250W, but my CPU is only 65W.

 

On the facts side, they are both (ARM and x86) VERY efficient for their specific uses and designs.

 

ARM, being mostly a BGA product, will not really "take off" on most gamers' desktops. You can't upgrade the damn chip, so why buy into it? It will come to EOL and that's that.


3 hours ago, igormp said:

The problem with power is that both AMD and Nvidia (but especially Nvidia) are pushing really hard for higher clocks, way past the point where the relation between clock and power scaling becomes exponential (example image below). You could decrease the clocks and voltages in order to lose ~5% perf and shave 1/3~1/4 of the power usage.

I can confirm this to be true just from personal experience with overclocking (oh boi did my triple-fan Vega 56 get toasty trying to shoot for 2 GHz under full load), and Linus himself has even said that with any processor, it takes a lot of power and voltage (and hence energy) to squeeze the very last 5%-10% of performance available from any chip (now I'm not quite sure then how exactly Nvidia gets to have their chips scale so freaking low in temps; like have you seen JayzTwoCents show that table where a high end Nvidia card can clock-scale down to -200 C?!?! Custom power delivery and cooling aside ofc)

3 hours ago, igormp said:

Make sure to take a class on that, and read Patterson's book; it's amazing.

I was literally going to take a class on computer architecture at my university this semester, and the computer science department recommended that every professor use that exact book as the course textbook.  But alas the professor teaching that class right now is forcing all students to come to campus to take the course, and COVID rates and even crime rates (e.g. violent robbery in broad daylight) are basically at an all time high in the city around the campus area (not to mention that many hospitals in the vicinity are maxed out in their capacity).  So I didn't want to risk it and am taking something different in the meantime.

2 hours ago, ShrimpBrime said:

2nd, there are so many variables to this, you won't actually get a good straight forward meaningful answer.

I honestly figured that, but even from this forum I still get some of that "ARM is more efficient than x86-64" vibe, but I also know there's many very knowledgeable people who hang around this forum so I wanted at least to get some direct input from them as well.  Now that I think about it, didn't Linus himself even try to dispel some of the ARM vs x86-64 myths in a WAN show or a video or something?  If someone could link me to that video that'd be great (although I don't want people spending a lot of time just to hunt for the vid as LMG has released a ton of videos since their beginnings more than a decade ago).


@igormp Since it seems that you're saying Dennard scaling still applies at least somewhat (or maybe I read your reply wrong, please let me know if so), then what's the deal with this portion of the Wikipedia article on Dennard scaling - https://en.wikipedia.org/wiki/Dennard_scaling#Breakdown_of_Dennard_scaling_around_2006 ?

 

9 hours ago, LogicalDrm said:

-> Moved to CPUs, Motherboards and Memory

Thank you; I wasn't sure where I would put this post since I'm fairly new to this forum, so thanks for the help 🙂


4 hours ago, igormp said:

It's hecking efficient due to the fact that it's built on a 5nm process, has tons of coprocessors (heterogeneous computing) in order to offload tasks from the CPU in a more efficient and faster way, and every component is tightly coupled. The CPU being based on an ARM ISA is just a pretty small detail when you consider everything, and I bet they would have achieved the same results (or even better!) with any other ISA.

Btw would that help explain why Intel worked with Altera for a long time before just buying them outright (and now same goes for AMD & Xilinx), as both AMD and Intel know how big of a role ISA/integrated circuit design can play in chip manufacturing?  Same could go for the Nvidia + ARM deal, but of course Nvidia has always been a bit... complicated for me imo.


25 minutes ago, linuxChips2600 said:

But alas the professor teaching that class right now is forcing all students to come to campus to take the course, and COVID rates and even crime rates (e.g. violent robbery in broad daylight) are basically at an all time high in the city around the campus area (not to mention that many hospitals in the vicinity are maxed out in their capacity).

Oh man, that's a bummer 😕

Still, better late than never; see if you can manage it in an upcoming semester! If you want to get started with some silly game that actually gives you a nice insight, I recommend trying out Nandgame.

 

28 minutes ago, linuxChips2600 said:

Since it seems that you're saying Dennard scaling still applies at least somewhat (or maybe I read your reply wrong, please let me know if so), then what's the deal with this portion of the Wikipedia article on Dennard scaling - https://en.wikipedia.org/wiki/Dennard_scaling#Breakdown_of_Dennard_scaling_around_2006 ?

No, I agreed that it's going south. But that wiki link summarizes what I tried to say in a pretty nice way: "The breakdown of Dennard scaling and resulting inability to increase clock frequencies significantly has caused most CPU manufacturers to focus on multicore processors as an alternative way to improve performance." We're simply circumventing it in other ways 🙂 

As I said (or tried to, not so sure if I managed to get the idea across), this applies especially to GPUs, just take a look at their core counts generation after generation.

 

26 minutes ago, linuxChips2600 said:

Btw would that help explain why Intel worked with Altera for a long time before just buying them outright (and now same goes for AMD & Xilinx), as both AMD and Intel know how big of a role ISA/integrated circuit design can play in chip manufacturing? 

Those FPGA acquisitions are more related to special-purpose chips, which is a growing market since your general-purpose CPU can't handle everything you may want to throw at it, especially when it comes to power efficiency. It's better to have a small piece of hardware that knows perfectly well how to do something, and only that thing, than a huge piece of silicon that will go mostly unused and take much longer to accomplish the same task. Remember what I said about Apple, their coprocessors and heterogeneous computing? It goes along that same line.


May I add that for years now (and to my knowledge this is true for all mainstream CPUs), x86-64 has just been the exposed CISC instruction set. Internally, Intel Core (and most likely the same for AMD Zen) is RISC-like: the more complex CISC instructions are broken down into (optimized) RISC-like micro-operations and then processed.
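As a purely conceptual sketch of that decode step (nothing like how a real decoder is implemented, just the idea), a single memory-operand CISC-style instruction can be thought of as splitting into simpler RISC-like micro-ops:

```python
# Conceptual sketch: one CISC-style instruction with a memory operand being
# decoded into RISC-like micro-ops (load / ALU / store), heavily simplified.

def decode(instruction: str) -> list:
    """Map a CISC-style instruction to a micro-op sequence."""
    if instruction == "ADD [mem], EAX":        # read-modify-write on memory
        return [
            "LOAD  tmp   <- [mem]",            # memory read as its own micro-op
            "ADD   tmp   <- tmp + EAX",        # plain register ALU micro-op
            "STORE [mem] <- tmp",              # write-back micro-op
        ]
    return [instruction]                       # simple instructions map 1:1

for uop in decode("ADD [mem], EAX"):
    print(uop)
```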

 

At the end of the day it all comes down to the specific chip design and the specific application. ARM heavily benefits from parallelization, but when a library isn't very well optimized, single-core performance in those tasks just tanks (see e.g. the benchmark session from a few years ago at Cloudflare). ARM can also shine in terms of power savings. Under full load with all cores engaged you won't save much in many cases (though it is possible if the load is just right), but the fact that most ARM-based chips have lower-power, slower cores on the same die that they can switch to means less power draw. Perfect for mobile devices. Also perfect in specific desktop and server applications, but not as all-round yet. Of course, that's just a matter of optimization. Apple will certainly give ARM a heavy push with lots of work going into optimizations.
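The big/little point is easy to see with a toy energy model (made-up numbers, and real schedulers weigh far more than this): energy is power times time, so a slower core that sips power can still finish background work for less total energy.

```python
# Toy energy model for big/little core selection: energy = power * (work / perf).
# The numbers below are invented purely for illustration.

CORES = {
    "big":    {"power_w": 2.0, "relative_perf": 3.0},
    "little": {"power_w": 0.3, "relative_perf": 1.0},
}

def energy_joules(core: str, work_units: float) -> float:
    spec = CORES[core]
    runtime_s = work_units / spec["relative_perf"]
    return spec["power_w"] * runtime_s

for name in CORES:
    print(f"{name:6s}: {energy_joules(name, work_units=10):.1f} J")
# big   : 6.7 J  (finishes in a third of the time)
# little: 3.0 J  (takes longer but burns far less energy -> ideal for background tasks)
```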


24 minutes ago, bowrilla said:

Apple will certainly give ARM a heavy push with lots of work going into optimizations.

But the real question is: given that Apple has (afaik) traditionally been a much bigger supporter of open-source development, at least on the software side (compared to, say, a major proprietary OS vendor like Microsoft, although I understand that Microsoft isn't quite like Apple in a lot of ways), what are the chances, if any, that once Apple has matured their desktop/laptop ARM CPUs/GPUs/processors to the point where they can truly compete with x86-64 processors in basically all use cases (except for maybe a few edge cases here and there), they (Apple) will actually release details of how they optimized ARM, or even part or all of their new custom ARM ISA, whether through an open-source license or a paid-for license (like how Intel licensed x86 to others such as Cyrix and AMD)?  Or would it put Apple at risk of others having a much easier time reverse engineering their entire ecosystem?


1 hour ago, linuxChips2600 said:

But the real question is: [...] what are the chances, if any, that once Apple has matured their desktop/laptop ARM CPUs/GPUs/processors to the point where they can truly compete with x86-64 processors in basically all use cases, they (Apple) will actually release details of how they optimized ARM, or even part or all of their new custom ARM ISA, whether through an open-source license or a paid-for license? Or would it put Apple at risk of others having a much easier time reverse engineering their entire ecosystem?

First, MS has actually been a huge contributor to the open source community for years now. And the optimizations I'm talking about are at the library and compiler level. You don't just rewrite all of it to avoid being forced to share it, and you don't just fork it, because that would potentially exclude you from future updates. There'll be research done on how to optimize one single algorithm, how to optimize math problems, and so on. It's not just Apple working on it; it's a big community. Apple is just providing a base platform.

 

As always, Apple isn't even the first to try this. MS had some Surface devices with ARM chips, and Windows 10 on ARM has been out for a while. Unlike Apple though, MS never invested the time and money to push the platform, to invest tens of thousands of developer hours to offer a good first experience. But Apple still needs 3rd-party stuff to work well. You'll need the Adobe Creative Cloud to work well, for example. There's so much software that needs to be adapted and optimized. If Apple kept all their knowledge about optimization to themselves and did not offer the proper build tools, this project would flop and Apple would have a huge problem.

 

So again, Apple is taking established ideas and technology and putting them in an attractive package that sells. Good for Apple, good for the customers, good for the industry. They did it with the iPhone, the iPod, the iPad and iTunes, just to name 4 of their biggest hits. Nothing was new technology; all of it was already available in some way. They just made it work for the consumer. They'll do the same with ARM chips in laptops and small PCs. Don't expect your gaming rig or your heavy-lifting workstation to make the jump any time soon (if at all).

 

 


5 hours ago, linuxChips2600 said:

 

I honestly figured that, but even from this forum I still get some of that "ARM is more efficient than x86-64" vibe, but I also know there's many very knowledgeable people who hang around this forum so I wanted at least to get some direct input from them as well.  Now that I think about it, didn't Linus himself even try to dispel some of the ARM vs x86-64 myths in a WAN show or a video or something?  If someone could link me to that video that'd be great (although I don't want people spending a lot of time just to hunt for the vid as LMG has released a ton of videos since their beginnings more than a decade ago).

 

Right not digging up old videos, gotcha.

 

ARM is a simple, non-complex instruction set.

x86 is a complex instruction set.

ARM is not capable of running complex instruction sets, and therefore cannot be compared to x86 processors in this way.

 

So you compare the two CPUs by using basic instruction sets on a calculation, perhaps pi, and see which one consumes less power. That would determine the efficiency between the two, I suppose.
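If someone actually wanted to run that comparison, a minimal sketch (my own, under the assumption that you run the identical, equally-optimized workload on both machines and pair the timing with a power measurement from a wall meter or platform energy counters) could look like this:

```python
# Minimal cross-ISA comparison sketch: same workload, fixed amount of work, measure time.
# Energy would come from a wall-power meter or platform counters (not shown here).
import time

def leibniz_pi(terms: int) -> float:
    """Approximate pi with the Leibniz series (deliberately simple and portable)."""
    total = 0.0
    for k in range(terms):
        total += (-1.0) ** k / (2 * k + 1)
    return 4.0 * total

start = time.perf_counter()
estimate = leibniz_pi(5_000_000)
elapsed = time.perf_counter() - start

print(f"pi ~= {estimate:.6f}, computed in {elapsed:.2f}s")
# perf/watt = (1 / elapsed) / average_watts, measured the same way on each machine.
```

In practice a pure-Python loop mostly benchmarks the interpreter, so you'd really want the same compiled binary (or at least the same compiler and flags) on both machines; the point here is only that the workload and the measurement method have to be identical on both sides.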

 

So W10 and x86 do tablet mode, therefore the need for ARM is small, and it could actually get smaller.

 

So here's a Ryzen 4000 chip in a handheld gaming console. Tom's Hardware did a review on it last November.

 

https://www.tomshardware.com/news/amd-ryzen-4000-apu-handheld-gaming-console

 

Could x86 actually be the future even in your hands? It looks like it could very well be a good contender vs ARM, while having the flexibility of complex instruction sets.

 

 


  • 2 months later...

ARM and RISC architectures aren't inherently "better" than X86 and CISC architectures, nor is it the other way around.

The topic is far more nuanced than what might first meet the eye.

 

From a pure computer science standpoint, a RISC architecture is always less power efficient than a CISC architecture made for the application. This is though not something we typically have in practice. (Sometimes this is the case in some applications, but in general we do not have application specific CPUs.)

 

Now, for this post, I won't talk specifically about ARM nor X86, but rather about architecture design in general.

 

First off, we need to look at what RISC and CISC architectures are.

RISC is Reduced Instruction Set Computing, and CISC is Complex Instruction Set Computing. But what do "Reduced" and "Complex" actually mean?

 

Complex doesn't actually state anything about the complexity of any given instruction, but rather the overall complexity of the larger implementation.

Reduced is generally just the opposite of this.

 

As a rule of thumb one can say that all architectures with more than 32 instructions are CISC, and that all architectures with fewer than 50 instructions are RISC. (Yes, I know, they overlap.) But the difference is also typically swayed by memory access behavior, how instruction calls are formulated, and whether we need to explicitly handle state flags in various situations, among other details. In short, the more stuff we have to take into consideration, the more CISC the architecture is in general. It is though a large sliding scale without a clear line in the sand.

 

The next thing to consider is the other features on top of the Instruction Set Architecture (ISA) itself, like how prefetching is handled, caches, and even out-of-order execution, multithreading, SMT, etc. Now, these things can be part of the ISA itself, since an ISA will place more or fewer restrictions on how they can be implemented in practice.

 

From a performance and power efficiency perspective, out-of-order execution tends to be a fairly central point, since this is generally what everything else revolves around.

 

The out-of-order system will generally need to handle our incoming decoded instruction calls and figure out a logical order to process them in. (How that is determined is a debatable topic in itself.) And only after figuring out a good order can it push them to execution.
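As a massively simplified sketch of that "find a workable order" step (a toy single-issue scheduler that ignores renaming, ports and speculation; my own illustration): an instruction may issue once all the registers it reads have been produced, so independent work can slip ahead of something stuck waiting on a slow load.

```python
# Toy out-of-order issue: each cycle, issue the oldest instruction whose source
# registers are ready; results become visible only after the producer's latency.

program = [
    # (text, source regs, destination reg, latency in cycles)
    ("load r1 <- [mem]",   set(),         "r1", 3),
    ("add  r2 <- r1 + 1",  {"r1"},        "r2", 1),   # stuck behind the load
    ("mul  r3 <- r5 * r6", {"r5", "r6"},  "r3", 1),   # independent -> can overtake
    ("add  r4 <- r2 + r3", {"r2", "r3"},  "r4", 1),
]

ready_at = {"r5": 0, "r6": 0}      # architectural registers that are already valid
pending = list(program)
cycle = 0

while pending:
    for instr in pending:
        text, sources, dest, latency = instr
        if all(ready_at.get(r, float("inf")) <= cycle for r in sources):
            print(f"cycle {cycle}: issue {text}")
            ready_at[dest] = cycle + latency
            pending.remove(instr)
            break
    cycle += 1
# The independent mul issues on cycle 1, before the add that is waiting on the load.
```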

 

Now, in most architecture implementations we don't just give each and every single instruction its own dedicated lane to the out-of-order system. We will tend to aggregate them into ports (something that can create fun side-channel attack vectors if one isn't careful...). Each port will have a handful of different instructions on it that we can interact with.

 

But here is the important aspect of architecture design, and that is our budget, be it physical space, power, or just transistors, or even production yield in some cases.

 

In a CISC architecture we have a lot of different hardware features that we need to implement. This means that we will spend X portion of our budget just to implement one of each instruction. Then we can have duplicates of certain instructions we expect to need more of. But how many duplicates, and of which instructions, is debatable and can vary depending on the target market.

 

In a RISC architecture, on the other hand, we can make more copies of each instruction. And then we can use microcode to implement more advanced features through what is effectively software.

But the downside with microcode is that for every sub-call it creates, it needs to run our control logic as well as the instruction for the sub-call it uses. In a CISC architecture, we would simply have one piece of hardware for the whole task, saving us the need to run more generalized control logic. This greatly improves the execution speed and also the power efficiency of our CISC implementation.
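A crude way to see that trade-off in numbers (completely made-up cost units, just to show the shape of the argument): the microcoded path pays the control-logic cost once per sub-call, while the dedicated unit pays it once per task.

```python
# Crude cost model for the microcode trade-off: arbitrary "cost units" only.

CONTROL_OVERHEAD = 1.0   # decode/sequencing cost paid per issued operation
SIMPLE_OP_COST   = 1.0   # cost of one simple ALU/load/store operation
FUSED_OP_COST    = 2.5   # one dedicated CISC-style unit doing the whole task

def microcoded_cost(num_sub_ops: int) -> float:
    """Complex task emulated as a sequence of simple micro-ops."""
    return num_sub_ops * (SIMPLE_OP_COST + CONTROL_OVERHEAD)

def dedicated_cost() -> float:
    """Same task handled by one dedicated hardware instruction."""
    return FUSED_OP_COST + CONTROL_OVERHEAD   # control logic runs only once

print("microcoded (4 sub-ops):", microcoded_cost(4))   # 8.0
print("dedicated hardware    :", dedicated_cost())     # 3.5
# ...but the dedicated unit still costs area and power even if software never uses it,
# which is exactly the budget problem described above.
```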

 

On the other hand, this power efficiency and performance advantage of CISC is flushed down the drain if our application of choice doesn't use the instructions that do the thing it needs (in short, poorly optimized software).

 

And in some edge cases we can have an application that needs simpler instructions for which there is no equivalent in our CISC architecture; then our RISC one will tend to perform better. CISC architectures also tend to have more control logic in their core compared to RISC ones, making the impact of "lacking an equivalent instruction" even more painful...

 

A RISC architecture is more or less following the idea of a minimum viable product from the architecture design perspective. It has most of the features needed for the applications it targets, but straying outside of that scope tends to give lackluster performance on a RISC architecture. Then again, most applications need fairly little to get by adequately. And RISC generally also has less control logic, making it easier for it to reach lower idle power consumption.

 

CISC tends to aim towards ensuring that one has an instruction suitable for the task at hand, speeding up the task and increasing power efficiency for that task, though at the downside of often needing more chip real estate.


To answer the question, "Is ARM really more efficient than X86-64?"
Then the answer is simply: It depends.

It is hard to make a proper comparison, since we would need two chips with similar design constraints targeting the same general scope of applications to actually answer the question correctly. And in the end, it will depend more on the workload at hand. Not to mention that our workload would need to be equally optimized for both systems. (The CISC one, for example, would be abhorrent if one doesn't use the applicable instructions; it would be an unfair comparison.)

 

To a degree, the question is like asking, "Is a Fiat 500 more efficient than a Scania K EB?" (one is a tiny city car, the other is a bus taking 50+ people. Fuel efficiency varies greatly depending on how many people we need to move.)

 

In the end, I don't know if this answer is satisfactory since I don't actually answer the question.
But having studied and designed computer architectures for over a decade, I don't find the "crowning" of architectures all that interesting. From the architecture development point of view, a CPU can be interesting and worthwhile even if it is abhorrent in 70+% of all applications. Between ARM and X86, even PowerPC, they are all kings of their own respective hills.

Though, in regards to the "or is it much more about optimization from top to bottom?"
Yes, software optimization and picking a platform suitable for the application, and then optimizing for that platform, is where the majority of the difference between architectures lies. If one doesn't use an architecture's applicable strengths for one's application, then it won't perform well. And just because some other application performs well on platform X doesn't mean that the platform is inherently a good choice for other applications, and even a platform performing badly doesn't really say much either.

From the architecture design standpoint, one can do a lot to bring forth good features and performance improvements in a processor. But in the end, it is the software developers that make or break a platform.

I will though have to end this post with the reservation that I have oversimplified things rather greatly here. This is already a wall of text most people won't care about reading fully and going into more detail would in general be boring.

