Why do most CPUs only have 2 threads per core?

Most CPUs have one or two threads per core, but I was wondering why there aren't more CPUs with 4c/16t or 8c/32t. Some CPUs, like the IBM POWER9, do this. That CPU has 4c/16t, and while I'd expect that to be expensive, why don't AMD or Intel make something like a 32c/128t server CPU?

Side note: the IBM POWER9 seems to be the only CPU with 4 threads per core.


Because why have more mouths when your stomach can only digest so fast?

It would also make task scheduling more complicated than it already is.


Well, there are x86 CPUs that have 4 threads per core: Intel's Xeon Phi series. The thing is, most workloads can't use those threads very well because they are weak.


Architectural differences. If it were feasible, they would have done it. The CPU you listed uses the PowerPC instruction set (which is RISC), while x86 is CISC. That most likely has a major effect, but I am not technically informed enough to comment further.


Just now, mahyar said:

Well, there are x86 CPUs that have 4 threads per core: Intel's Xeon Phi series. The thing is, most workloads can't use those threads very well because they are weak.

By the way, der8auer and Linus himself did videos on Xeon Phi!


Just now, Levent said:

Architectural differences. If it were feasible, they would have done it. The CPU you listed uses the PowerPC instruction set (which is RISC), while x86 is CISC. That most likely has a major effect, but I am not technically informed enough to comment further.

Isn't PowerPC more optimized? Like, even if it had been written for x86, SMG would not have been able to run like it does on the Wii.


2 minutes ago, Levent said:

Architectural differences. If it were feasible, they would have done it. The CPU you listed uses the PowerPC instruction set (which is RISC), while x86 is CISC. That most likely has a major effect, but I am not technically informed enough to comment further.

https://en.wikipedia.org/wiki/Xeon_Phi


1 minute ago, TheTechWizardThatNeedsHelp said:

Isn't PowerPC more optimized? Like, even if it had been written for x86, SMG would not have been able to run like it does on the Wii.

Optimisation usually happens on the software side rather than in hardware. Hardware is only optimised that way when it is designed to run specific software. So no.


Well, I guess it's mostly for historical reasons ... and because Windows is "optimized"/"programmed" for two threads per core. The operating system's kernel would have to be tweaked and tuned, and cleverer scheduling algorithms would have to be written, to properly support more than 2 threads per core.

It can take years for a new thing to be well supported... you can see with Ryzen and Threadripper how long it took for Windows to understand the concept of a CCX (core complex, a cluster of 4 cores / 8 threads sharing an L3 cache) and of using multiple dies to make a CPU...

There are rumors that AMD wants the next AMD EPYC CPU (for servers) to have 4 threads per core, so you'd have up to 64 cores / 256 threads ... but I'm not sure it's going to happen.


10 hours ago, TheTechWizardThatNeedsHelp said:

with 4c/16t or 8c/32t?

Two threads per core do allow the core to do more than one task at once, but with 4 threads per core there will probably be some software problems with scheduling.


2 minutes ago, TheTechWizardThatNeedsHelp said:

Isn't PowerPC more optimized? Like, even if it had been written for x86, SMG would not have been able to run like it does on the Wii.

"More optimized" isn't really a thing, at least not for instruction sets. x86 has more instructions, but it does individual ones slower. PowerPC and its replacement, Power ISA, have fewer instructions, but they do individual ones faster. Whether the reduced instruction set or the greater per-instruction capabilities are more valuable depends on your program. With modern compilers, the same code generally ends up being similarly fast on x86 and RISC, outside of extreme edge cases.


4 minutes ago, Benji said:

Well, the IBM chips, IIRC, are more for database management and interconnect systems, which can handle multitasking better than the usual end-user stuff. That's most likely why they have such high SMT counts. Also, the POWER9 not only has 24c SMT4 core options but also 12c SMT8 ones, with POWER10 supposedly bringing 30-core SMT8 CPUs (30C/240T).

@mahyar They aren't stand-alone CPUs that make sense as CPUs; they are intended as accelerator cards. Because they are x86 with a few modern extensions, they do run Windows, but horribly.

Yeah, earlier ones were coprocessors that connected to the system over a PCIe bus, but newer ones work like normal CPUs and use the LGA 3647 socket.


I think the answer becomes somewhat obvious if you know how SMT works.

 

Contrary to popular belief, SMT does not behave like having two cores would. A processor is composed of several different units. To simplify things, imagine if your processor had a dedicated "addition" unit and a separate "subtraction" unit. If you have two threads that both need to do addition, SMT will flat out not work. You do not gain anything from having SMT in such a situation. In fact, you might even see a net negative performance effect because of the added complexity and the power it uses.

However, if you just so happen to have one thread that needs the addition unit and one thread that needs the subtraction unit, an SMT-enabled core will be able to schedule both tasks simultaneously on a single core, and SMT will help out.
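To make the addition/subtraction picture concrete, here is a minimal Python sketch of that toy core. Everything in it is an assumption for illustration only: the name `run_core`, one execution unit per operation type, in-order issue, and at most one instruction per thread per cycle. Real cores are far more complicated than this.

```python
from collections import deque


def run_core(threads, units=("add", "sub")):
    """Count cycles for a toy SMT core to finish all hardware threads.

    threads: list of instruction lists, e.g. [["add", "add"], ["sub"]].
    units:   operation types the core has exactly one execution unit for.
    Each cycle, every unit runs at most one instruction, and each thread
    may issue only the instruction at the head of its queue.
    """
    for ops in threads:
        for op in ops:
            if op not in units:
                raise ValueError(f"no execution unit for {op!r}")

    queues = [deque(ops) for ops in threads]
    cycles = 0
    while any(queues):
        free = set(units)  # all units are free at the start of a cycle
        for q in queues:
            # Issue the thread's next instruction only if its unit is free.
            if q and q[0] in free:
                free.remove(q[0])
                q.popleft()
        cycles += 1
    return cycles


if __name__ == "__main__":
    same_unit = run_core([["add"] * 4, ["add"] * 4])
    mixed = run_core([["add"] * 4, ["sub"] * 4])
    print("two addition threads:", same_unit, "cycles")
    print("addition + subtraction threads:", mixed, "cycles")
```

In this model, two addition-only threads of 4 instructions each take 8 cycles, the same as running them one after the other, while an addition thread paired with a subtraction thread finishes in 4 cycles.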

 

But SMT stops being efficient once you can reliably feed all the different execution units all the time. I think Intel has determined that, for their CPU designs, going above 2 threads per core doesn't actually provide much benefit, since they can already keep all their execution units fed reliably. Going above 2 threads in such a situation adds die area, which in turn makes the chips more expensive and hotter, and puts more stress on the cache (which might harm performance even more if you're already maxing out the cache), since each core's cache now has to hold data and instructions for more threads. So rather than having, let's say, 1 MB of cache dedicated to the program running addition instructions, you might now only have 0.5 MB for that thread, with the other 0.5 MB used by the subtraction thread.
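A rough back-of-the-envelope sketch of that trade-off, in Python. The model is my own assumption, not anything Intel has published: it treats threads as independent, so if one thread keeps the execution units busy a fraction `u` of the time, `n` threads keep them busy about `1 - (1 - u)**n` of the time, and it splits the cache evenly between threads. The function name `smt_estimate` and the 0.7 figure are hypothetical.

```python
def smt_estimate(single_thread_utilization, threads, cache_mb=1.0):
    """Crude estimate of unit utilization and per-thread cache under SMT.

    Assumes independent threads and an evenly split cache; both are
    simplifications made only for illustration.
    """
    if not 0.0 <= single_thread_utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if threads < 1:
        raise ValueError("need at least one thread")

    idle = (1.0 - single_thread_utilization) ** threads
    return {
        "utilization": 1.0 - idle,
        "cache_per_thread_mb": cache_mb / threads,
    }


if __name__ == "__main__":
    # 0.7 is a made-up single-thread utilization, not a measured value.
    for n in (1, 2, 4):
        print(n, "thread(s):", smt_estimate(0.7, n))
```

Under these assumptions, the step from 1 to 2 threads recovers far more idle execution time than the step from 2 to 4, while the cache available to each thread keeps halving.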

 

Also, Intel shares their server designs with their consumer designs. They most likely save more money by designing one core that "works well enough everywhere" than the small profit they could potentially make by designing some 4-way SMT monster core that would not be useful anywhere outside of highly specialized tasks.

That's why Xeon Phi had 4-way SMT, by the way: it was designed for highly parallel workloads and was not suitable for what most people use their computers for. They trashed that idea because it wasn't seen as profitable (enough).


As explained above, threads aren't physical cores. They basically feed the cores jobs. In certain workloads there is an appreciable gain from having more threads, but in most day-to-day applications more physical cores can do more work than fewer cores with more threads.
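If you want to see the difference between threads and physical cores on your own machine, here is a small Python sketch. It assumes Linux, where the kernel exposes which logical CPUs share a core in sysfs; the helper names `parse_cpu_list` and `threads_per_core` are my own, and on other systems the second one simply returns `None`.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,4" or "0-3" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def threads_per_core(cpu=0):
    """Return how many logical CPUs share a core with `cpu`, or None.

    Reads the Linux sysfs topology file; returns None where that file
    is missing (for example on Windows or macOS).
    """
    path = Path(
        f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    )
    try:
        return len(parse_cpu_list(path.read_text()))
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("logical CPUs seen by the OS:", os.cpu_count())
    print("threads per core (cpu0):", threads_per_core())
```

On a typical desktop chip with SMT enabled, the second number would be 2, and the logical CPU count is the number of cores times that.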

 

In science and research applications, though, where you need to track many simultaneous data points across a model (protein folding, or the impact of chemicals on DNA, for example), CPUs like Intel's Xeon Phi 7230 with 64C/256T can run this kind of simulation very well because of the special type of workload.

Meanwhile, your desktop will do better with more real cores for its typical workloads, or with fewer, faster cores in the case of gaming.

The type of workload determines which hardware will do the job best.


16 minutes ago, TheTechWizardThatNeedsHelp said:

Most CPUs have one or two threads per core, but I was wondering why there aren't more CPUs with 4c/16t or 8c/32t. Some CPUs, like the IBM POWER9, do this. That CPU has 4c/16t, and while I'd expect that to be expensive, why don't AMD or Intel make something like a 32c/128t server CPU?

Side note: the IBM POWER9 seems to be the only CPU with 4 threads per core.

IIRC, POWER (POWER8 in particular) and SPARC were the only mainstream ISAs with SMT8 (8 threads per core). Going for more threads per core usually requires more execution units, which ends up being a waste of space and resources for regular user demands.

@LAwLz beat me to it and already gave a pretty good basic explanation on why that's the case.


7 minutes ago, Levent said:

Optimisation usually happens on the software side rather than in hardware. Hardware is only optimised that way when it is designed to run specific software. So no.

4 minutes ago, BobVonBob said:

"More optimized" isn't really a thing, at least not for instruction sets. x86 has more instructions, but it does individual ones slower.

Worth noting that the POWER10 chips are probably designed to run IBM software, so I wouldn't be surprised if IBM's software has played a major role in the development and design of POWER10.

IBM is currently VERY focused on storage and AI. Can you guess what the main features of POWER10 are? Better I/O architecture and performance in AI workloads!

 

 

8 minutes ago, mariushm said:

Well, I guess it's mostly for historical reasons ... and because Windows is "optimized"/"programmed" for two threads per core.

Even if it was the case that Windows was "optimized" for two threads per core (which it isn't), adding support for more would be rather trivial.


Isn't that kinda how CELL worked also?

Imagine the PPE is the "core" and the SPEs are the "threads".

And now imagine the true power you would get if you had 8 or 16 or more of these "cores" on one chip...

[Image: architecture.png]

IBM were truly ahead of their time.

(I could be wrong lol)


3 hours ago, Mark Kaine said:

Isn't that kinda how CELL worked also? 

 

Imagine the PPE is the "core" and the SPEs are the "threads".

 

And now imagine the true power you would get if you had 8 or 16 or more of these "cores" on one chip... 

 

IBM were truly ahead of their time. 

 

(I could be wrong lol)

No, the SPE units in CELL are more akin to SIMD units with no proper branching, but capable of doing operations on tons of data at once (basically a shitty GPU implementation).

Sony planned to use it as the main graphics driver, but once they saw how shitty it was, they had to ask Nvidia for a GPU. In the end it was kinda OK and helped the GPU somewhat, since the actual GPU itself wasn't that good either.
