
[Educational] What is SSE/AVX? (SIMD)

Poll: Next Topic? (9 members have voted)

  • [Computer Architecture] Out-of-Order Execution (OOO): 1
  • [Computer Architecture & Software] How to Optimize a Program for Specific CPUs (with example) (up to ~100x performance gain!): 4
  • [Software] How Do Compilers Work: 1
  • [Computer Architecture] Instruction Pipelining: 2
  • Make it a Video: 0
  • Turnip: 1


It's been a while since I did my writeup on memory hierarchy back in... 2017? Wow, time flies. A lot has changed since then: I've gotten a Master's degree, gotten a job as an R&D compiler engineer, and learned a lot more about hardware and software in general. And I'm still bored, so here I am.

 

Alright, onwards to the topic at hand. Some of us have probably heard of, or seen CPU marketing mention, features like AVX (Advanced Vector Extensions), SSE (Streaming SIMD Extensions), MMX (commonly expanded as MultiMedia eXtension) and such. These can all be categorized as SIMD extensions.

Extension in this context refers to extra instructions and features added to the CPU that are separate from the base instruction set (x86, ARM, etc.).

 

SIMD: What is it?

It stands for Single Instruction, Multiple Data, and the units inside a processor that do this are sometimes referred to as vector processors or vector co-processors.

 

SIMD instructions operate on vector registers (see below for an explanation if you don't know what a register is), which can hold multiple pieces of data at once. Think of a vector as just a one-dimensional array of data. For example, a 128-bit vector can hold 4x single-precision float values (fp32), 2x double-precision float values (fp64), 16x int8 values, etc. A SIMD instruction takes these vector registers and operates on all of their elements in a single instruction. (Single-value registers and instructions are referred to as scalar registers and instructions.)
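To make the lane arithmetic concrete, here is a minimal C sketch. The names `vec128`, `lanes` and `add_f32` are made up for illustration, and the union is only a model of how the same 128 bits can be viewed as different element types; it is not how a compiler actually represents a vector register. It also assumes the usual 32-bit `float` and 64-bit `double`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register: the same 16 bytes
   can be viewed as 4 floats, 2 doubles, or 16 8-bit integers. */
typedef union {
    float  f32[4];
    double f64[2];
    int8_t i8[16];
} vec128;

/* Number of elements ("lanes") that fit in a register of the given width. */
static size_t lanes(size_t register_bits, size_t element_bits)
{
    assert(element_bits != 0);  /* guard against division by zero */
    return register_bits / element_bits;
}

/* What a single SIMD add does conceptually: one operation, every lane.
   Written as a scalar loop here, so this only models the behaviour. */
static vec128 add_f32(vec128 a, vec128 b)
{
    vec128 r;
    for (size_t i = 0; i < 4; ++i)
        r.f32[i] = a.f32[i] + b.f32[i];
    return r;
}
```

With these definitions, `lanes(128, 32)` gives 4 and `lanes(128, 8)` gives 16, matching the numbers above, and the same function gives 8 fp32 lanes for a 256-bit register and 16 for a 512-bit one.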

 

Vector register sizes vary by extension: ARM SVE (Scalable Vector Extension) allows anything from 128 bits up to 2048 bits (although nobody that I know of has been crazy enough to implement anything that big... yet), AVX2 uses 256-bit vector registers, and AVX-512 uses 512-bit registers. So, the more data you process at once, the faster a program can run!

 

Registers:

Spoiler

Registers are where data is stored within the CPU after it has been read from memory (the memory hierarchy). Most instructions operate on registers; for example, an "add" instruction takes the data from two registers and stores the result into another register.

Each register only stores a single value, and the size is typically 64 bits for modern x86-64 processors (that's where the 64 comes from!). (Side note: engineers, whether they work on software or hardware, really like sizes that are powers of 2. Hence 8/16/32/64/128/256/512/1024. It's also why a KiB is 1024 bytes: it's 2^10!)

 

Did you know?

Spoiler

Technically GPUs are also SIMD co-processors, as each "kernel" (essentially a program to be run on a GPU) has to be run on multiple "CUDA Cores" or "Stream Processors" (depending on who you ask), each operating on a different piece of data, but running the same instructions simultaneously. Therefore: single instruction, multiple data. They just have a lot more cores...

HOWEVER

Before you get too excited: many, many, many programs do not use SIMD at all. Most of the programs we use daily either do not use SIMD instructions or do not gain a noticeable speedup from them.

 

Why? Well, as always, it comes down to software.

  • For one, using SIMD instructions often comes with restrictions on how the data is laid out in memory. Sometimes, by the time you have gotten the data into a format the vector units can handle, you would have been better off using scalar operations anyway.
  • Secondly, not all programs can benefit from vectorization. Sometimes you just need to do calculations with a single value. Sometimes your values need different operations (e.g. adding for one, subtracting for another). If you can't consistently fill up your vector registers, they're not worth the trouble.
  • Finally, in order to use SIMD instructions, the programmer often needs to embed them directly into the program they're writing. This is similar to embedding assembly code (basically "human readable" machine code) directly into C. It's also very hardware specific: older processors may only support MMX, or SSE... or maybe AVX2. And typically only server parts have support for AVX-512.
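As a rough illustration of the last point, here is a minimal sketch of what embedding SIMD instructions in C can look like, using SSE intrinsics from `<xmmintrin.h>`. The function name `add_arrays` is made up for this example. The `#if defined(__SSE__)` guard relies on the macro that GCC and Clang define when SSE is enabled, so other compilers or non-x86 targets simply fall through to the scalar loop. Note how the code has to deal with both hardware support and leftover elements that do not fill a whole register.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Adds two float arrays element-wise: out[i] = a[i] + b[i].
   add_arrays is a hypothetical name used only for this sketch. */
static void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* Vector part: 4 floats per iteration in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar part: leftover elements, or everything when SSE is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Calling `add_arrays(a, b, out, 6)` on two 6-element arrays would process the first four elements with one vector add and the remaining two with the scalar loop. An AVX2 or AVX-512 version would need different intrinsics and a different guard, which is the hardware-specific part.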

 

So what are they used for?

On the consumer side, SIMD is typically used for multimedia: software video encoding/decoding (hence the Multi Media eXtension). Also, in recent years, AI. Some neural networks, for example, are not large enough to warrant transferring data to the GPU for processing (remember, getting data to and from the GPU takes time), so they are often handled by the CPU using SIMD instructions.

 

By the way:

Spoiler

There is a new generation of SIMD extensions coming out, and all of them have Matrix in their name for some reason... Intel has the Advanced Matrix Extensions (AMX), IBM has the Matrix-Multiply Assist (MMA)... It's not like AI is just a bunch of matrix multiply operations... Right?

In all seriousness, matrix multiply can be very slow at large sizes, which makes it a perfect example for optimization: you can see how fast it runs before and after. Cough, cough.
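For reference, a plain scalar matrix multiply might look like the sketch below. The name `matmul_naive` and the flat row-major layout are assumptions made for this example, and this is only the unoptimized baseline, not a tuned implementation. For n x n matrices it does n * n * n multiply-adds, so doubling n means eight times the work.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for square n x n matrices stored row-major in flat arrays.
   matmul_naive is a hypothetical name; this is the unoptimized baseline. */
static void matmul_naive(const float *A, const float *B, float *C, size_t n)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            /* Inner loop: n multiply-adds, so n * n * n in total. */
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

The inner loop is a long run of independent multiplies feeding one sum, which is the kind of work vector and matrix units are built for.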

Also, SIMD is an essential part of HPC (high-performance computing, basically supercomputers) applications. For example, the processor powering the fastest supercomputer in the world (last I checked), Japan's Fugaku, is the A64FX. These are ARM processors with SVE, with a vector length of 512 bits.

 

 

So that's about it. That's all I can come up with regarding SIMD without going too much into the details... This lockdown is making me spend my time on things like this :v

 

Me: Computer Engineer. Geek. Nerd.

[Educational] Computer Architecture: Computer Memory Hierarchy

[Educational] Computer Architecture:  What is SSE/AVX? (SIMD)


 

1 hour ago, Wander Away said:

So that's about it. That's all I can come up with regarding SIMD without going too much into the details... This lockdown is making me spend my time on things like this :v

If you want to go into details in status updates, I'd be interested in reading that.

 

Also, I voted for one option, but I really wanted to vote for all the options except the video one. I personally prefer text over video by a vast margin.


 


I want an engineering briefing of how a CPU works!

Now I mean, I know that transistors form gates that are interconnected to perform operations.

But a transistor is basically only an on/off switch. It still needs to be fed power from somewhere, plus a control signal to switch it from the off state to the on state, so it only amplifies the signal that is already flowing into it. In a gate setup this makes more sense further down the line, since one transistor might feed into another one after it.

But the first transistor in the line: why does it matter that it gets a weak control signal to switch on if it is only amplifying that signal? How can that turn into something useful for computation? Isn't all the information already present? Why amplify? WHY?!


10 hours ago, Wander Away said:

Technically GPUs are also SIMD co-processors, as each "kernel" (essentially a program to be run on a GPU) has to be run on multiple "CUDA Cores" or "Stream Processors" (depending on who you ask), each operating on a different piece of data, but running the same instructions simultaneously. Therefore: single instruction, multiple data. They just have a lot more cores...

You could go on with that and make a new post on how GPUs work 🙂

 

8 hours ago, gabrielcarvfer said:

Now seriously: very informative, but I'd say SIMD is reasonably well known, understood, etc...

 

On a forum targeting "gamers" and LTT's audience? I really doubt 90% of the users even know what SIMD is lol

 

7 hours ago, Spindel said:

I want an engineering briefing of how a CPU works!

 


I recommend this game: https://nandgame.com/

 

It's really nice and goes through the bottom-up process of building a CPU.

If you already know how to make some basic gates from transistors, then this is a nice next step, something akin to an ELI5 intro to computer architecture.

