
Hyperthreading in optimized vs. non-optimized applications

Yasashii

Hello.

 

I've got a certain conundrum and met with very varying opinions on the subject.

 

If I have a 6-core CPU that supports hyperthreading, and an application that can only parallelize its work across 6 threads, will it make any difference if I turn hyperthreading off so the application gets each physical core to itself? Or will its performance decrease because, with hyperthreading left on, the application will only use 6 of the 12 available threads?

 

Some say that leaving it on will hurt performance because the L1 and L2 caches of each core get split between its two threads. Others say it will cut performance in half because the application would only be using half of each core in this scenario. And there are those who say there is no appreciable difference.

 

Can anyone shine some more detailed light on the subject?

 

Thank you.


If you turn off hyperthreading, the CPU will (usually) turbo to a higher frequency, so you get better single-core performance.

As for the cache, I don't know.

If it was useful, give it a like :) BTW, if you're into Linux, pay a visit here

 


It depends on the software and OS.

 

I run compute tasks that generally show no benefit from SMT; optimal performance is reached running one thread per core. If, between the software and Windows, the threads do end up running one per physical core even with SMT on, you get a good result. Sometimes Windows is stupid and puts two threads on one core while leaving another core idle; that is not optimal and will cost performance. The workarounds are either to manually set affinity or to turn off SMT.
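
Setting affinity can also be done from code rather than Task Manager. As a rough sketch (not from the post above, assuming the cross-platform psutil package and the common layout where logical CPUs 0, 2, 4, ... are the first SMT thread of each physical core; verify that mapping on your own machine):

```python
# Rough sketch: restrict the current process to one logical CPU per physical
# core, so the scheduler cannot put two of its threads on the same core.
# Assumes the psutil package and the 0,2,4,... = first-thread-of-each-core
# enumeration common on Intel HT systems - check your own topology first.
import psutil

p = psutil.Process()                        # the current process
logical = psutil.cpu_count(logical=True)    # e.g. 12 on a 6c/12t CPU
physical = psutil.cpu_count(logical=False)  # e.g. 6

if logical and physical and logical == 2 * physical:
    p.cpu_affinity(list(range(0, logical, 2)))  # keep every second logical CPU

print("running on logical CPUs:", p.cpu_affinity())
```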

 

I have a 12-core CPU I'm using for gaming, and something silly was going on there too. Since games don't scale to 24 threads, I turned off SMT on it, so it is now a 12-core, 12-thread CPU, and games run better than before. Each thread now gets the full resources of a core and doesn't have to fight the others for shared resources.

 

Note that in general SMT gets more things done in a given amount of time, but it doesn't necessarily get a single thing done faster. If that single thread is the critical path, the whole job can end up slower overall.
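
If you control the software yourself, the usual way to act on this is to size the worker pool to the physical core count rather than the logical one. A minimal sketch, assuming psutil is available (crunch() is just a made-up stand-in for a compute-bound task):

```python
# Minimal sketch: one worker per physical core instead of per logical (SMT)
# thread. Assumes the psutil package; crunch() is a hypothetical workload.
from concurrent.futures import ProcessPoolExecutor
import psutil

def crunch(n):
    # placeholder compute-bound work
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    physical = psutil.cpu_count(logical=False) or 1   # e.g. 6 on a 6c/12t part
    with ProcessPoolExecutor(max_workers=physical) as pool:
        results = list(pool.map(crunch, [2_000_000] * physical))
    print(f"{len(results)} chunks done on {physical} workers")
```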

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


Core binding is the key here. If you can pin each thread of your program to its own physical core, do it. If you can't, you will be relying on the OS scheduler. Check whether every physical core is carrying a load: it doesn't matter which of a core's two logical CPUs a thread lands on, as long as no physical core is left idle while another runs two threads.
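
For the "check if each core has a load" step, something like this gives a quick per-core view (again assuming psutil; the pairing of logical CPUs 0/1, 2/3, ... into physical cores is the typical Intel HT layout and should be confirmed for your CPU):

```python
# Quick sketch: report utilisation per logical CPU, grouped by physical core.
# Assumes psutil and the usual 0/1, 2/3, ... sibling pairing - confirm the
# actual topology on your system before relying on it.
import psutil

per_cpu = psutil.cpu_percent(interval=1.0, percpu=True)  # one value per logical CPU
for first in range(0, len(per_cpu), 2):
    siblings = per_cpu[first:first + 2]
    print(f"physical core {first // 2}: logical CPU loads {siblings}")
```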
