
Aside from AVX and SIMD registers, how can a CPU execute multiple instructions per clock? One normal register will read one normal operation and execute one normal task and update the clock, right?


37 minutes ago, Gat Pelsinger said:

Aside from AVX and SIMD registers, how can a CPU execute multiple instructions per clock? One normal register will read one normal operation and execute one normal task and update the clock, right?

There are certain techniques that allow a CPU to execute multiple instructions per clock cycle. One is superscalar processing. In contrast to a scalar processor, which can execute at most one instruction per clock cycle, a superscalar processor can execute more than one by simultaneously dispatching multiple instructions to different execution units on the processor. Another technique is instruction pipelining, which breaks each instruction down into smaller steps so that several instructions can be in progress at once, each at a different step. Modern CPUs use both together: pipelining keeps every stage busy, and superscalar dispatch is what pushes throughput above one instruction per clock.
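To put rough numbers on that, here is a minimal sketch of an idealized pipeline. It assumes every instruction is independent and nothing ever stalls, which real code never manages, and the function names `cycles_needed` and `ipc` are made up for this example.

```python
import math

def cycles_needed(n_instructions, depth, width):
    """Clocks to finish n independent instructions on an idealized pipeline.

    depth = number of pipeline stages, width = instructions issued per clock.
    The first group needs `depth` clocks to drain; every later group
    finishes one clock after the previous one.
    """
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / width)
    return depth + groups - 1

def ipc(n_instructions, depth, width):
    """Average instructions per clock for the same idealized model."""
    return n_instructions / cycles_needed(n_instructions, depth, width)

if __name__ == "__main__":
    for width in (1, 2, 4):
        print(f"width {width}: {cycles_needed(1000, 5, width)} clocks, "
              f"IPC {ipc(1000, 5, width):.2f}")
```

With a width of 1 the IPC approaches but never reaches 1, no matter how deep the pipeline is. Only a width above 1 gets you past one instruction per clock.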


1 hour ago, Gat Pelsinger said:

One normal register will read one normal operation and execute one normal task and update the clock, right?

Not always. Sometimes it can do more. For example, if you have instructions like

xor eax,eax

xor ebx,ebx

it can do these two at once because they're independent of each other: neither one reads a register that the other writes. (Both write the flags, but "xor reg,reg" is recognized as a zeroing idiom that doesn't depend on the register's old value, and the core renames registers and flags, so the second xor doesn't have to wait for the first.)
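Here is a rough sketch of that independence check in Python. It is a toy model, not how any real decoder works: `Instr`, `independent`, and `issue_groups` are hypothetical names, flags are left out on the assumption that the core renames them, and the xor zeroing idiom is modeled as reading nothing.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str
    writes: frozenset
    reads: frozenset

def independent(first, second):
    """True if `second` does not read anything `first` writes (no RAW hazard).

    Write-after-write and write-after-read conflicts are ignored here,
    assuming register renaming removes them.
    """
    return not (first.writes & second.reads)

def issue_groups(instrs, width=2):
    """Greedy in-order grouping: instructions in one group issue in the same clock."""
    groups, current = [], []
    for ins in instrs:
        fits = len(current) < width and all(independent(p, ins) for p in current)
        if current and not fits:
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups

# The zeroing idiom does not depend on the old register value, so reads is empty.
xor_eax = Instr("xor eax,eax", frozenset({"eax"}), frozenset())
xor_ebx = Instr("xor ebx,ebx", frozenset({"ebx"}), frozenset())
mov_eax = Instr("mov eax,1", frozenset({"eax"}), frozenset())
add_ebx = Instr("add ebx,eax", frozenset({"ebx"}), frozenset({"ebx", "eax"}))

if __name__ == "__main__":
    print(len(issue_groups([xor_eax, xor_ebx])))  # both fit in one clock
    print(len(issue_groups([mov_eax, add_ebx])))  # add must wait for mov
```

The two xors land in one group, while "add ebx,eax" has to go a clock later because it reads the eax that "mov eax,1" writes.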


2 hours ago, Gat Pelsinger said:

Aside from AVX and SIMD registers, how can a CPU execute multiple instructions per clock? One normal register will read one normal operation and execute one normal task and update the clock, right?

The clock ticks regardless.

But a single pipeline doesn't really finish more than one instruction per clock. It pipelines.

Think of washing and drying your clothing:


You set up your clothes to wash
You put them into the wash
You dry them
You fold them

Each step is done in one "clock".
You can build different parts of the CPU to do each of those tasks,
so while you are setting up one load, another load is washing, another is drying, and a second person is folding what just came out of the dryer.
You have 4 "instructions" in flight in one clock cycle, and one load finishes every clock. (This is oversimplified; fetching isn't really an instruction.)
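The laundry picture can be written out as a small timeline. This is only a sketch of the analogy, assuming every stage takes exactly one clock and nothing ever stalls; `STAGES` and `timeline` are names invented for the example.

```python
STAGES = ["set up", "wash", "dry", "fold"]

def timeline(n_loads, stages=STAGES):
    """For each clock, return {stage: load number} for every busy stage.

    Load i enters the first stage in clock i and moves one stage per clock.
    """
    total_clocks = n_loads + len(stages) - 1
    table = []
    for clock in range(total_clocks):
        busy = {}
        for position, stage in enumerate(stages):
            load = clock - position
            if 0 <= load < n_loads:
                busy[stage] = load
        table.append(busy)
    return table

if __name__ == "__main__":
    for clock, busy in enumerate(timeline(6)):
        print(clock, busy)
```

Six loads take 9 clocks instead of 24, and from clock 3 onward all four stages are busy at once until the loads start running out.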

https://en.wikipedia.org/wiki/Instruction_pipelining

https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)

The problem comes when an upcoming instruction depends on the result of one you just sent in. The pipeline stalls and creates bubbles: you are clocking parts of your chip but not doing anything with that clock.
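A toy model makes the bubbles visible. This sketch assumes a single-issue, in-order machine where an instruction cannot issue until its source registers are ready; the register names, the latencies, and the function `run_in_order` are all made up for illustration.

```python
def run_in_order(program):
    """Issue one instruction per clock, in order, waiting on source registers.

    program: list of (dest, sources, latency) tuples.
    Returns (clocks_used_for_issue, bubble_clocks).
    """
    ready_at = {}   # register -> clock at which its value becomes available
    clock = 0
    bubbles = 0
    for dest, sources, latency in program:
        earliest = max([ready_at.get(reg, 0) for reg in sources], default=0)
        if earliest > clock:
            bubbles += earliest - clock   # nothing issues during these clocks
            clock = earliest
        ready_at[dest] = clock + latency
        clock += 1
    return clock, bubbles

# A slow load into r1, a use of r1, and an unrelated instruction.
dependent_first = [("r1", [], 4), ("r2", ["r1"], 1), ("r3", ["r4"], 1)]
reordered = [("r1", [], 4), ("r3", ["r4"], 1), ("r2", ["r1"], 1)]

if __name__ == "__main__":
    print(run_in_order(dependent_first))
    print(run_in_order(reordered))
```

Moving the unrelated instruction into the gap behind the load removes one bubble clock, and the whole sequence finishes a clock sooner.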

That's where the out-of-order control logic goes "oh shit, a bubble" and shoves in an instruction from further back that isn't dependent. For branches, the CPU guesses which way the branch will go and keeps shoving instructions in anyway (branch prediction). If the branch prediction is wrong, that work is all wasted and thrown out.
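The cost of wrong guesses is usually estimated with a simple average. The sketch below uses the common textbook approximation; the function name `effective_cpi` and every number in it are illustrative assumptions, not measurements of any real CPU.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_clocks):
    """Average clocks per instruction once misprediction flushes are included.

    Each mispredicted branch throws away roughly `penalty_clocks` of work.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_clocks

if __name__ == "__main__":
    # Assumed: 20% of instructions are branches, 5% of those are mispredicted,
    # and each flush costs 15 clocks.
    print(effective_cpi(1.0, 0.20, 0.05, 15))
```

The penalty grows with pipeline depth, since a longer pipeline has more in-flight work to throw away on each wrong guess.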

Modern processors often have pipelines around 20 stages deep. Skylake's was 14 to 19 stages long.
This is also why hyperthreading works: if your out-of-order logic can't find enough independent work, just send a second thread down the pipeline instead. You know it's not dependent.


Some instructions use parts of the pipeline that others don't. For example, multiplying or dividing an unsigned value by 2 is just a one-bit shift, so it never needs to touch the multiplier or divider.

It should also be noted that the deeper the pipeline, the more latency you have. Some instructions can shortcut out early, but in a hypothetical 60-stage pipeline others would take 60 clocks to get out, and if anything needed that result, that's a 60-clock stall.
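As a worst-case sketch of that latency point, compare independent instructions with a chain where each one needs the previous result. This assumes no forwarding or early exit at all, so every result is only available after the full pipeline depth; the function names are made up for the example.

```python
def independent_clocks(n, depth):
    """n independent instructions: fill the pipeline once, then one finishes per clock."""
    return depth + n - 1 if n else 0

def dependent_chain_clocks(n, depth):
    """n instructions that each wait for the previous result to leave the pipeline."""
    return n * depth

if __name__ == "__main__":
    for depth in (5, 20, 60):
        print(depth, independent_clocks(10, depth), dependent_chain_clocks(10, depth))
```

With independent work the depth is paid once; with a dependent chain it is paid on every instruction, which is why deeper pipelines hurt more when the code is full of dependencies.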

Increasing IPC is mostly about decreasing stalls and bubbles and parallelizing as much as possible.

Modern x86 cores are superscalar: they have multiple inject and eject points in the pipeline (several execution ports), so more than one instruction can enter and leave per clock.


I want to say, some of my details are probably wrong, I'm going off of memory from a class I took a decade ago. 
