How can a CPU execute "multiple" instructions per cycle?

Haswellx86 · December 15, 2023

Aside from AVX and SIMD registers, how can a CPU execute multiple instructions per clock? One normal register will read one normal operation and execute one normal task and update the clock, right?

emosun · December 15, 2023

an x86 cpu or just any theoretical cpu?

FI Fheonix · December 15, 2023

37 minutes ago, Gat Pelsinger said:

Aside from AVX and SIMD registers, how can a CPU execute multiple instructions per clock? One normal register will read one normal operation and execute one normal task and update the clock, right?

There are certain techniques that allow a CPU to execute multiple instructions per clock cycle. One is superscalar processing. In contrast to a scalar processor, which can execute at most one instruction per clock cycle, a superscalar processor can execute more than one by dispatching multiple instructions to different execution units at the same time. Another technique is instruction pipelining, which breaks each instruction into smaller steps so that several instructions are in flight at once, each at a different step. Pipelining on its own overlaps instructions rather than finishing more than one per clock; superscalar issue is what pushes throughput above one per clock. It is not a matter of which is used more often: modern CPUs combine both.
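As a rough illustration of the scalar versus superscalar difference, here is a minimal Python sketch of an in-order machine that can issue up to `width` instructions per clock. The `Instr` class, the register names, and the rule that every result is usable one clock later are assumptions made for the example, not a model of any real CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str            # register written by the instruction
    srcs: tuple = ()     # registers read by the instruction


def cycles_to_issue(program, width):
    """Clocks needed to issue `program` in order, at most `width` per clock.

    An instruction cannot issue in the same clock as an earlier instruction
    whose destination it reads or also writes.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()   # destinations of instructions issued this clock
        issued = 0
        while i < len(program) and issued < width:
            ins = program[i]
            if ins.dest in written or any(s in written for s in ins.srcs):
                break     # must wait for the next clock
            written.add(ins.dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Four instructions that do not depend on each other.
independent = [Instr("eax"), Instr("ebx"), Instr("ecx"), Instr("edx")]
# Four instructions where each one reads the previous result.
chain = [Instr("eax"), Instr("ebx", ("eax",)),
         Instr("ecx", ("ebx",)), Instr("edx", ("ecx",))]

print(cycles_to_issue(independent, width=1))  # scalar
print(cycles_to_issue(independent, width=2))  # 2-wide superscalar
print(cycles_to_issue(chain, width=2))        # extra width does not help a chain
```

In this toy model the extra issue width only pays off when neighbouring instructions are independent, which is why the dependency chain takes as long on the 2-wide machine as on the scalar one.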

noname8365 · December 15, 2023

1 hour ago, Gat Pelsinger said:

One normal register will read one normal operation and execute one normal task and update the clock, right?

Not always, sometimes it can do more, for example if you have instructions like

xor eax,eax

xor ebx,ebx

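it can do these two at once because they're independent of each other: neither one reads a value the other produces. ("xor eax,eax" is also a recognized zeroing idiom, so it doesn't depend on the old value of eax or on earlier flags. Both instructions write the flags, but neither reads them, so that doesn't force an order either.)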
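To make "independent" a bit more concrete, here is a small Python sketch that classifies the hazards between two instructions in program order. The dictionaries describing each instruction are hand-written assumptions for the example (the xor instructions are modelled as reading nothing, since they are zeroing idioms), and `can_run_together` assumes register renaming removes the write-after-read and write-after-write cases.

```python
def hazards(first, second):
    """Return the hazard kinds between two instructions in program order.

    Each instruction is a dict with 'reads' and 'writes' sets of register names.
    """
    found = set()
    if first["writes"] & second["reads"]:
        found.add("RAW")   # second needs a value first produces (true dependency)
    if first["reads"] & second["writes"]:
        found.add("WAR")   # second overwrites something first still reads
    if first["writes"] & second["writes"]:
        found.add("WAW")   # both write the same register
    return found


def can_run_together(first, second):
    # Assumption: renaming removes WAR and WAW, so only RAW forces an order.
    return "RAW" not in hazards(first, second)


xor_eax = {"reads": set(), "writes": {"eax", "flags"}}        # xor eax,eax
xor_ebx = {"reads": set(), "writes": {"ebx", "flags"}}        # xor ebx,ebx
add_ebx_eax = {"reads": {"eax", "ebx"}, "writes": {"ebx", "flags"}}  # add ebx,eax

print(sorted(hazards(xor_eax, xor_ebx)), can_run_together(xor_eax, xor_ebx))
print(sorted(hazards(xor_eax, add_ebx_eax)), can_run_together(xor_eax, add_ebx_eax))
```

The two xor instructions only share a write to the flags, which is a false dependency in this model, while `add ebx,eax` genuinely has to wait for the new value of eax.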

starsmine · December 15, 2023

2 hours ago, Gat Pelsinger said:

Aside from AVX and SIMD registers, how can a CPU execute multiple instructions per clock? One normal register will read one normal operation and execute one normal task and update the clock, right?

clock updates regardless.

But a single pipeline doesn't really do more than one instruction per clock. It pipelines.

Think of washing and drying your clothing:

You set up your clothes to wash
you put them into the wash
you dry them
you fold them

Each one is done in a "clock".
You can build different parts of the CPU to do each of those tasks,
so while you are setting up your clothes, another set is washing, another set is drying, and you have a 2nd person folding what just got taken out of the dryer.
You are working on 4 "instructions" in one clock cycle. (This is oversimplified; fetching isn't really an instruction.)
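The laundry analogy can be sketched in a few lines of Python. The stage names and the `schedule` helper are made up for the example, and each stage is assumed to take exactly one clock.

```python
STAGES = ["setup", "wash", "dry", "fold"]  # hypothetical stage names


def cycles_unpipelined(n_loads, n_stages):
    # Each load goes through every stage before the next load starts.
    return n_loads * n_stages


def cycles_pipelined(n_loads, n_stages):
    # The first load takes n_stages clocks; every later load finishes one clock after.
    return n_stages + n_loads - 1 if n_loads else 0


def schedule(n_loads, stages=STAGES):
    """Return one dict per clock mapping stage name -> index of the load in it."""
    table = []
    for clock in range(cycles_pipelined(n_loads, len(stages))):
        row = {}
        for position, name in enumerate(stages):
            load = clock - position   # load that entered the pipeline `position` clocks ago
            if 0 <= load < n_loads:
                row[name] = load
        table.append(row)
    return table


for clock, row in enumerate(schedule(4)):
    print(clock, row)
print(cycles_unpipelined(4, 4), "clocks without pipelining")
print(cycles_pipelined(4, 4), "clocks with pipelining")
```

At clock 3 every stage is busy with a different load, which is the "4 instructions in one clock" situation. Each individual load still takes 4 clocks from start to finish; only the throughput improves.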

https://en.wikipedia.org/wiki/Instruction_pipelining

https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)

The problem comes in when an upcoming instruction depends on the result of one you just sent in. It stalls. It creates bubbles. You are clocking parts of your chip but not doing anything with that clock.

That's where you have control logic that goes, "oh shit, a bubble", and shoves in work from behind it that was not dependent (out-of-order execution), or makes a guess at which way an upcoming branch will go and shoves that in anyway (branch prediction). If the branch prediction is wrong, that's all wasted and thrown out.
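Here is a minimal Python sketch of bubbles and of filling them with independent work. The instruction names, the one-issue-per-clock limit, and the fixed `latency` are assumptions for the example; real schedulers are far more involved, and branch prediction is not modelled at all.

```python
def run(program, latency, out_of_order):
    """Return the number of clocks until every instruction has issued.

    program: list of (name, deps), where deps names the instructions it waits on.
    At most one instruction issues per clock; a result becomes usable
    `latency` clocks after its instruction issues.
    """
    ready_at = {}            # name -> clock at which the result is available
    pending = list(program)
    clock = 0
    while pending:
        # In order: only the oldest instruction may issue.
        # Out of order: the oldest instruction whose inputs are ready may issue.
        candidates = pending if out_of_order else pending[:1]
        for ins in candidates:
            name, deps = ins
            if all(d in ready_at and ready_at[d] <= clock for d in deps):
                ready_at[name] = clock + latency
                pending.remove(ins)
                break
        # If nothing issued, this clock was a bubble.
        clock += 1
    return clock


# "b" needs the result of "a"; "c" and "d" are independent of everything.
program = [("a", ()), ("b", ("a",)), ("c", ()), ("d", ())]

print(run(program, latency=3, out_of_order=False))  # stalls behind "b"
print(run(program, latency=3, out_of_order=True))   # "c" and "d" fill the bubbles
```

In the in-order run, the two clocks spent waiting on "a" are pure bubbles. In the out-of-order run, the same two clocks are used for "c" and "d", so nothing is wasted.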

Modern processors often have pipelines around 20 stages deep; Skylake's was 14 to 19 stages long.
This is also why hyperthreading works: if your out-of-order logic can't find enough independent work, just send a second thread down the pipeline instead. You know it's not dependent.

Some instructions use parts of the pipeline that others don't. For example, multiplying or dividing an unsigned register by 2 is just a one-bit shift, so it never needs the multiplier or divider.

It should also be noted that the deeper the pipeline, the more latency you have. Some instructions can shortcut out early, but others, if you made a pipeline 60 deep, will take 60 clocks to get out, and if anything needed that data, that's a 60 clock stall.
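A back-of-the-envelope way to see why depth hurts is to charge every full stall the whole pipeline depth. The sketch below does that in Python; the function name, the stall fractions, and the assumption that a stall always costs exactly `depth` clocks are illustrative simplifications, not measurements.

```python
def average_cpi(stall_fraction, depth):
    """Average clocks per instruction for a 1-wide pipeline.

    Assumes an ideal one instruction per clock, plus `depth` wasted clocks
    for the fraction of instructions that cause a full stall.
    """
    return 1.0 + stall_fraction * depth


for depth in (5, 20, 60):
    # Same hypothetical stall rate (2% of instructions), different depths.
    print(depth, average_cpi(0.02, depth))
```

With the same stall rate, the deeper pipeline loses more clocks per stall, which is the trade-off being described: depth helps clock speed but makes every bubble more expensive.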

Increasing IPC is mostly about decreasing stalls and bubbles and parallelizing as much as possible.

Modern x86 goes superscalar by having multiple execution ports, so several instructions can enter and leave the execution stage in the same clock.

I want to say, some of my details are probably wrong, I'm going off of memory from a class I took a decade ago.
