Dumb Question that could 2x CPU Perf.

Poet129
Solved by trag1c.


Put simply, could a 64-bit CPU perform two 32-bit calculations in a single CPU cycle? My thinking is yes, although they would have to be the same instruction with the current layout of CPUs (to my knowledge). Would/could this be a reasonable reason for Intel/AMD to start making 128-bit or higher CPUs? Not to make 128-bit calculations, but to make two 64-bit calculations in the same CPU cycle (effectively doubling CPU speed). If this worked, there shouldn't need to be a huge change to programs, unlike if we moved directly to 128-bit; I think the main change would be to the scheduler. I don't know exactly how CPUs work, but I think this might work and might be worth looking into, which is why I've posted here...


The primary reason for 64-bit is address space, not 64-bit calculations.

 

A 32-bit CPU can address at most 2^32 bytes = 4 GB of RAM. A 64-bit CPU can address 2^64 bytes = 16 exabytes. In reality, current 64-bit CPUs have fewer address lines than they could, because no one needs that much memory (at least for now 😛). That's not a problem, because even the "more limited" 256 TB of RAM that some CPUs can address is far more than we realistically need for the foreseeable future.

 

OK, but could you perform two 32-bit calculations on a 64-bit CPU? Sure, if you happen to have two numbers that need processing at the same time, and you're certain the result of e.g. a multiplication will not actually require 64 bits. So yes, a theoretical 128-bit CPU could perform four 32-bit calculations at the same time. But you would increase complexity and heat output by a large margin for something usable only in a few edge cases. Unless you need four numbers calculated simultaneously all the time, those additional bits are a liability more than a strength. You're better off with a less complex CPU that can run at higher clock speeds.

 

 


What you're thinking of is called SIMD, or Single Instruction, Multiple Data. This is a dedicated set of hardware registers on the CPU exposed through instruction set extensions. For x86-based processors, this has been around since the first version of SSE, which came out in 1999 (IIRC). It allowed developers to apply the same instruction to 4x 32-bit single-precision floating-point numbers. Subsequent versions expanded this with more instructions to allow for 2x 64-bit double-precision floating-point numbers, 2x 64-bit integers, 8x 16-bit short integers, or 16x 8-bit characters.

 

Other SIMD extensions exist for other instruction set architectures, such as NEON for ARM-based devices. Additionally, x86 has gained even more SIMD extensions, such as AVX-512, which has 512-bit registers.

 

SIMD has been the bread and butter of game developers for quite some time now. Every bit of vector or matrix math is performed using these registers, since the performance increase is absolutely massive (10x over regular scalar math in some cases). That is not always the case, though, since performance is purely implementation- and task-specific. You could just as easily end up with half the performance of scalar math if you use SIMD where it doesn't make sense or implement it poorly.


3 minutes ago, trag1c said:

This allowed developers to apply the same instruction to 4x 32-bit single-precision floating-point numbers. Subsequent versions expanded this with more instructions to allow for 2x 64-bit double-precision floating-point numbers, 2x 64-bit integers, 8x 16-bit short integers, or 16x 8-bit characters.

And if you need more than that, you can move to the GPU, which can often do thousands of such operations in parallel. Having an extension to take care of the cases where you do need/profit from simultaneous calculations is a better idea than scaling up the whole CPU.


57 minutes ago, Poet129 said:

Put simply could a 64 bit cpu perform two 32 bit calculations in a single cpu cycle?

If you are thinking about it the way I suspect you are, then no, it doesn't work like that in general. [Like 1,2 + 1,2 = 2,4... 0001 0010 + 0001 0010 = 0010 0100, a 4-bit-in-8-bit example, where you send it through an 8-bit full adder and then split up the two 4-bit halves... in this specific example it works.]

 

Addition/subtraction would get messed up due to two's complement:

-1,-2 + -1,-2 should be -2,-4... but 1111 1110 + 1111 1110 = 1111 1100, which is [-1,-4]... notice that -4 is right, but because of two's complement the carry from the low half rolls over into the other one.

 

Multiplication/division gets messed up just the same (so you wouldn't be able to reuse the 64-bit circuits for it).

 

Then you have floats/doubles, where this doesn't work at all.

 

Actually, if it were a 128-bit CPU... I'd imagine it would in general perform slower, since each add operation would be 128 bits wide: the full adder would be bigger, so it would take more transistors and possibly more cycles. In general there is also more overhead in moving 128 bits around than 64 bits.

 

I might not be thinking correctly though, so if someone wants to correct me I'm more than welcome.

