I'm going to admit to skimming through most of this thread (I did read a bit of the OP), but I feel like either the subject was not explained thoroughly enough or there is some misinformation still being spread about.

A background on processors

To understand how HyperThreading and even Clustered Multithreading work (both are similar to an extent), a basic understanding of processors is needed.

A processor is divided up into three major sections:
- The front end, or control unit. This fetches instructions, decodes them, and schedules them for execution.
- The back end, or execution unit. This does the actual work of the instruction.
- The memory interface, which connects the processor to RAM and handles caching.

Originally, processors were designed to do one thing from start to finish in a fetch, decode, execute, write-back pipeline. In simpler processors, this could just be fetch, execute, write back. Over time, several features were developed to increase the throughput of the processor.

What exactly is HyperThreading?

HyperThreading arose from a side effect of a feature in processors known as superscalar pipelining. A superscalar pipeline has two or more pathways for instructions to follow in order to increase instruction throughput, and superscalar designs often duplicate execution resources. What Intel found was that a lot of the time these duplicated resources would go unused. To understand this better, here's a block diagram for a single core of a Nehalem-based Intel processor:

[block diagram of a Nehalem core]

Toward the bottom are the execution units. Notice there are multiples of the same thing (AGU, ALU) as well as components that are separated by function (FP ADD, FP MUL).

Another thing HyperThreading adds is a duplicate set of the processor's registers. Registers are small bits of memory that hold the current execution state, i.e., what the processor is working on right now. For example, if you were working out what 1 + 2 is, registers hold those two numbers, and the result is written into another register.

So HyperThreading allows two CPU states to exist in a single core without creating an actual second core. If there's something ready to run on the secondary state and there are execution units available to it, the processor will use those execution units to do the task. So if you have one thread doing basic integer math (like 1+1, 2*3) and another doing floating-point math, then because there are separate resources available for those instructions, HyperThreading will run both at the same time.

The only catch is I'm not sure how the processor does load balancing or how many execution units a thread actually gets, since HyperThreading has been shown to improve performance by only about 20% at best.

How is this related to Clustered Multithreading (CMT)?

For that, let's look at the block diagram for AMD's Bulldozer:

[block diagram of a Bulldozer module]

Essentially, CMT works almost identically to HyperThreading, but it's arranged differently. HyperThreading has two processor states sharing the exact same resources, which means one thread can hog all the resources and leave the other starved. CMT instead gives each processor state its own resources, making separate execution cores. What HyperThreading and CMT have in common is that both processor states share the same front end.

Where CMT falters is that if a single thread needs extra execution resources, it can't take the other execution core's resources; it's stuck with what it has. This is why Bulldozer was lackluster in single-threaded performance: each integer core had fewer resources than a single K10 core.
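To make the integer-plus-floating-point point concrete, here's a minimal sketch in C. It pins an integer-heavy thread and a FP-heavy thread to two logical CPUs so that, on a hyperthreaded core, they can draw on different execution units at the same time. This is my own illustration, not anything from Intel: it assumes Linux with glibc's pthread_setaffinity_np, and it assumes logical CPUs 0 and 1 are HT siblings, which varies by system (check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list to be sure). Build with gcc -pthread.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to one logical CPU. */
    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    /* Integer-heavy work: exercises the ALUs. */
    static void *int_work(void *arg) {
        (void)arg;
        pin_to_cpu(0);
        volatile long acc = 0;
        for (long i = 0; i < 200000000L; i++) acc += i * 3;
        return NULL;
    }

    /* FP-heavy work: exercises the FP add/multiply units. */
    static void *fp_work(void *arg) {
        (void)arg;
        pin_to_cpu(1);  /* assumed HT sibling of CPU 0 */
        volatile double acc = 0.0;
        for (long i = 0; i < 200000000L; i++) acc += (double)i * 1.5;
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, int_work, NULL);
        pthread_create(&b, NULL, fp_work, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("done");
        return 0;
    }

If 0 and 1 really are siblings on your machine, timing this against running each thread alone gives a rough feel for how much (or how little) the shared core slows the pair down.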
Let's talk about threads real quick

Switching gears, let's talk about threads. As the OP said, a thread is a task in an application. One thread can handle the graphics, another the input, another some processing. Regardless, all threads share the following life cycle:
- Running: the thread is currently being served by the processor.
- Waiting/Sleeping: the thread currently has no task to do or is waiting on something to be freed up. An example would be a thread waiting to use the hard drive.
- Ready: the thread is ready to run.
- Done: the thread is no longer needed and its resources are cleaned up.

Okay, so about Windows, scheduling, and all that

One of the biggest misinformed areas I've seen on this topic is how operating systems schedule their workloads. Here's how Windows and Mac OS schedule their tasks:
- Every application breaks its work up into threads, as many as it needs.
- The OS looks for threads that are in the ready state.
- Ready threads are queued up on a first-come, first-served basis. Threads may also have priority over others.
- When it comes time to schedule a new thread to run, the OS picks based on priority first, then by who's next in line.

In other words, operating systems don't care about the application they're running. They only care about the threads that are available. Think of an application like building a car: the boss in charge of manufacturing doesn't care what car is being built. All they care about is what components need to be assembled and who can do it.

Linux takes a different approach, called the "Completely Fair Scheduler" (CFS). Tasks are still broken up into threads, but the scheduler picks the ready thread that has spent the least amount of time executing.
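Here's a toy model in C of the rule I just described: priority first, then first come, first served. It's a simulation of the idea, not how Windows actually implements it, and the thread names and numbers are made up. As a bonus, it walks each fake thread through the Ready, Running, and Done states from the life cycle list above.

    #include <stdio.h>

    /* Toy model of the described scheduler: pick the highest-priority
       READY thread; break ties by who has waited longest (FIFO). */

    enum state { READY, RUNNING, DONE };

    struct thread {
        const char *name;
        int priority;   /* higher number = higher priority */
        int arrival;    /* position in the ready queue (FIFO order) */
        enum state st;
    };

    static struct thread *pick_next(struct thread *t, int n) {
        struct thread *best = NULL;
        for (int i = 0; i < n; i++) {
            if (t[i].st != READY) continue;
            if (!best || t[i].priority > best->priority ||
                (t[i].priority == best->priority && t[i].arrival < best->arrival))
                best = &t[i];
        }
        return best;
    }

    int main(void) {
        struct thread threads[] = {
            { "ui",      2, 0, READY },
            { "disk_io", 1, 1, READY },
            { "audio",   2, 2, READY },
        };
        int n = sizeof threads / sizeof threads[0];

        struct thread *next;
        while ((next = pick_next(threads, n)) != NULL) {
            next->st = RUNNING;
            printf("running %s (priority %d)\n", next->name, next->priority);
            next->st = DONE;   /* pretend it ran to completion */
        }
        return 0;
    }

It prints ui, then audio, then disk_io: the two priority-2 threads go first in arrival order, then the priority-1 thread. A CFS-style variant would swap the comparison in pick_next so the ready thread with the smallest accumulated runtime wins instead.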
The Misconception of "Dual threaded" or "Is only good up to x amount of cores"

Applications come in two flavors of execution style: serial or parallel (though different forms of parallelism exist). Serial means all tasks are executed one after another, with the next task in line waiting for the current one to finish. Parallel means tasks are broken up so they don't wait on each other.

In programming, you have what is called a "main loop". This is basically a forever loop that keeps the program "running" until something tells it to exit. A serial program will look something like this (we're making a car):

    for (;;) {
        make_frame();
        make_engine();
        make_instruments();
        put_panels_on_frame();
        mount_engine();
        install_instruments();
    }

And the program will continue doing this, in order, forever. However, some of these tasks can be done in parallel. The first three aren't dependent on each other, while the last three do depend on a frame being made first; otherwise those could run in parallel to some degree as well. (I've put a sketch of a parallel version at the bottom of this post.)

So regarding the first part, "dual threaded" or whatever flavor it is: it's a misconception that the program literally has only two threads. Pop open Task Manager to the "Details" tab, show the "Threads" column, and you'll find that a lot of tasks have more than two threads. My Firefox instance at the time of writing this has 88 threads running.

However, I've also heard the term used to mean "it can only run on two cores, period," which is a misconception based on how operating systems schedule tasks. Applications do not schedule themselves. Applications also don't have to know how many cores are in the PC; an app may only ever have two threads ready to run at any moment throughout its life. But there's also another stick in the mud.

Operating systems these days have gotten even smarter about scheduling and using their resources. For example, if you have a quad-core processor but your task only uses a little under 50% of the CPU, the OS may not fire up all four cores; to be more efficient, it may only fire up two. That doesn't mean the application can only throw out two threads. It may be that some of its threads wait often enough that others can slip in and do work, or the program has 10 threads to run but they finish so quickly they don't utilize the CPU all that much.

The other factor is just how good the processor is at doing things. Sure, an application may stop gaining performance after four cores, but only on that architecture. If you hit that limit on a Skylake processor, a 6-core Nehalem will still see a much better benefit than a 4-core Nehalem, because each Nehalem core isn't as fast as a Skylake core, so the work spreads further. Otherwise, shouldn't PS4 and XB1 games ported to the PC require 8 cores?

The tl;dr version is this: scheduling is a complicated subject, and it can't be reduced to "programs are incapable of doing better due to some hard cap," because there is no hard cap.
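As promised, here's a rough sketch of the car example split across threads, using C with pthreads. It runs a single iteration rather than the forever loop (a real main loop would wrap this), and the function bodies are stubs I made up just to show the structure. Build with gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    /* Stub tasks; a real program would do actual work here. */
    static void *make_frame(void *arg)       { (void)arg; puts("frame made");       return NULL; }
    static void *make_engine(void *arg)      { (void)arg; puts("engine made");      return NULL; }
    static void *make_instruments(void *arg) { (void)arg; puts("instruments made"); return NULL; }

    static void put_panels_on_frame(void) { puts("panels on"); }
    static void mount_engine(void)        { puts("engine mounted"); }
    static void install_instruments(void) { puts("instruments installed"); }

    int main(void) {
        /* The first three tasks don't depend on each other,
           so each gets its own thread. */
        pthread_t t[3];
        pthread_create(&t[0], NULL, make_frame, NULL);
        pthread_create(&t[1], NULL, make_engine, NULL);
        pthread_create(&t[2], NULL, make_instruments, NULL);

        /* The last three need the parts above to exist first,
           so wait for all three threads before continuing. */
        for (int i = 0; i < 3; i++)
            pthread_join(t[i], NULL);

        put_panels_on_frame();
        mount_engine();
        install_instruments();
        return 0;
    }

Note that the OS still decides where those three threads actually run; the program only says what can happen concurrently.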