This seems to be a commonly misunderstood thing so I'll do my best to shed some light on it. Keep in mind the following however:
The purpose of this guide is to define some key terms that are often misused, and to explain how threads and hyper-threading work in the context of "Set Affinity" in task manager: which checkboxes mean what, how to use them, why it does what it does, etc.
Explaining how a CPU works is not the point of this guide; this information is just provided as background to get everyone on the same page. It is a simplified explanation of how CPUs work. To the best of my knowledge it's correct, but it is simplified. If you want more detail, go to page 3, where there is some good content.
Before we can do anything, some terminology is in order.
Core: This refers to a physical core on your CPU. You might have 2, or 4, or 6, or some other number. They are real and physical, and with the right equipment, they are clearly visible:
Logical Core: This refers to the device exposed to the operating system. It may or may not represent a physical construct 1:1, but it is something Windows can schedule tasks to execute on. In a modern i7, for example, you have 4 cores (4 physical cores) but 8 logical cores, because - due to hyper-threading - each physical core presents 2 logical cores to the system.
Thread: This is perhaps the most commonly misunderstood, or at least misused, term with regard to this subject. A thread is not a physical thing; it is not part of the hardware, and it does not arise from hardware in any way. A thread is a software concept: a single consecutive series of tasks. Every running program on your computer consists of one or more threads. Software can launch or terminate threads as needed, just as you open and close applications. Generally speaking, any time a program is doing 2 or more things at once (ie, has a dialog box open waiting for input while the main program keeps working in the background) it is using 2 or more threads. A program can launch virtually any number of threads, and you can see how many any given program is using in task manager by enabling the appropriate column. You will find that most programs have more than 10, and some have over 100.
There is no limit to how many threads any CPU can run, but there is a limit to how many can run simultaneously, and this is where multiple cores and hyper-threading come in. It would be incorrect to say a CPU has a certain number of threads. What you really mean is that it has that many logical cores, each of which can be entirely occupied by one sufficiently demanding thread.
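To make the "threads are software" point concrete, here is a minimal Python sketch (the function and variable names are mine, purely illustrative) that launches a few threads and counts them - the same count Task Manager's "Threads" column reports for a process:

```python
import threading
import time

def background_task():
    # Stand-in for work like a dialog box waiting for input while the
    # main program keeps running in the background.
    time.sleep(0.2)

# The main program is itself one thread; launch three more.
workers = [threading.Thread(target=background_task) for _ in range(3)]
for t in workers:
    t.start()

# active_count() includes the main thread, so this reports at least 4 --
# the same kind of figure Task Manager's "Threads" column shows.
n_threads = threading.active_count()
print(n_threads)

for t in workers:
    t.join()
```

Nothing about the hardware limits how many of these you can create; the CPU only limits how many can make progress at the same instant.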
The vast majority of those threads exist because it makes logical sense for the program design. Some, however, exist to perform computationally intensive tasks, and these are generally the only threads we care about. When we say "a program can't use more than 4 cores", it is because the intensive workload is not split up onto more than 4 threads, and since 1 thread can only run on one core at a time, the program cannot make effective use of more than 4 cores. Think of it this way - if I tell you to add 3 to 4, and then multiply the result by 7, you can't start multiplying until the addition is done. This is a very basic example of a serial task - something that cannot be parallelized - something that cannot be split up onto more than 1 thread, and therefore onto more than 1 core. Every thread consists of a great many of these tasks in order, and if the programmer did their job correctly, it won't bundle up things that could be done simultaneously by another core. For example, if I had two employees, I could tell one to do this math problem and then fetch the dry cleaning, or I could be smart and tell one to work on the math while the other goes to the cleaner's. The tasks are unrelated and can be done in parallel, so they should be put on separate threads so they may be executed simultaneously by separate cores. One core can run multiple threads (though not simultaneously [*1]), but one thread cannot run on multiple cores. It may appear to, depending on how Windows schedules it, but if it is loading all cores on the CPU, you will notice that it only loads 1 logical core's worth of percentage across all of them (ie, a single-threaded application will load all 4 cores of a quad core to 25%, or thereabouts).
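The add-then-multiply example and the two-employee example can both be sketched in Python (the function names here are hypothetical, just matching the analogy):

```python
import threading

# Serial: the multiply cannot start until the add has finished,
# so both steps must live on a single thread (and thus one core at a time).
serial_result = (3 + 4) * 7

# Parallel: the math problem and the dry-cleaning errand are unrelated,
# so they can go on separate threads and run on separate cores.
results = {}

def math_problem():
    results["math"] = (3 + 4) * 7

def fetch_dry_cleaning():
    results["errand"] = "picked up"

t1 = threading.Thread(target=math_problem)
t2 = threading.Thread(target=fetch_dry_cleaning)
t1.start()
t2.start()
t1.join()
t2.join()

print(serial_result, results["math"], results["errand"])
# prints: 49 49 picked up
```

The answers are identical either way; the only difference is that the parallel version lets the operating system hand the two jobs to two cores at once.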
The thing about an intensive task is it may not actually take effort from the CPU 100% of the time. Consider the following analogy:
Imagine a store with many customers, and several cashiers that each have one till. Each customer checking out is like a thread, and each cashier is like a core. Now what happens when an old lady is rummaging for change? That cashier - that core - is still occupied with that customer - that thread - but it's not really doing anything; it's just waiting. This happens in real programs as well. Sometimes a task gets to the CPU and then realizes, "oh wait, I need something from memory". In the nanoseconds that it is fetching that, the CPU is occupied but not actually accomplishing anything. Imagine if that cashier had a second till - the cashier is still only able to work so fast, but at least he or she can make use of that "spare time" more effectively. Now, with a store full of customers, several cashiers, and 2 tills per cashier, each cashier can work on checking out a customer, unless they are held up for some reason, at which point that cashier can use the other till to start checking out another customer. This is hyper-threading in a nutshell. It is not another core, or anything like that - it is just a way that the existing hardware can be used more effectively.[*2] In theory, 1 core with hyper-threading and 1 core without will perform an identical task in exactly the same amount of time (assuming the cores are the same in every other way), but if faced with 2 tasks, the hyper-threaded core will be faster. Probably not twice as fast, but faster for sure - maybe ~50%, depending on the task, though in theory it could be anywhere from no better to twice as good.
This raises an interesting question though. When you open task manager and go to "Set Affinity", how do you know which of those checkboxes maps to which physical core? That's what I will now explain/prove.
(Oh, and as an aside, we now can see why Intel shows an i7 for example as having 4 cores and 8 threads - it is because it can execute 8 intensive threads "simultaneously")
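You can see this count for yourself from Python's standard library - what it reports is logical cores, not physical ones:

```python
import os

# os.cpu_count() reports logical cores -- the number Intel markets as
# "threads". On a 4-core/8-thread i7 this prints 8, not 4.
print(os.cpu_count())
```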
Now that we understand how the CPU and the processes it runs work, we can get into how this relates to Windows. From my very first i7 back in 2011, I had my own theories, but to be honest I never actually tested them until today. Luckily I was right about everything I had thought all along.
So how do they map? Are those 8 checkboxes just saying how many core-equivalents of power you want the task to take up, but have no correlation to an actual core? Does each one actually refer to a specific core? And if so, in what order or pattern? I performed some tests, running a process on just one of those logical cores at a time, and observing which actual core got hotter using HWMonitor. The results were conclusive and indicated the following mapping is correct. (Note that I count from 0 not 1, as per how task manager does it)
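As an aside, those checkboxes are just a GUI over an operating-system scheduling call. As a hedged sketch: on Linux the standard-library equivalent is os.sched_setaffinity (on Windows you would use Task Manager itself, or a third-party module such as psutil); pinning a process to a single logical core - ticking one checkbox - looks like this:

```python
import os

# Guard: these calls exist on Linux. On Windows/macOS this block is a
# no-op and you would use Task Manager or psutil instead.
if hasattr(os, "sched_setaffinity"):
    original = os.sched_getaffinity(0)  # pid 0 means "this process"

    # Tick only the first available checkbox: pin to one logical core.
    one_core = {min(original)}
    os.sched_setaffinity(0, one_core)
    assert os.sched_getaffinity(0) == one_core

    # Untick back to the original set of logical cores.
    os.sched_setaffinity(0, original)
```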
Well that's all fine and good, but how do they work (or interfere) with each other? To determine this, I took a predictable, repeatable and CPU-intensive task consisting of 8 threads and ran it on the logical cores indicated in the chart below. I ran 4 trials in each case and averaged the run times (in seconds). From this we should be able to gain additional insight.
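I don't have the original workload to share, but the shape of the experiment can be sketched as follows. This is my own illustrative version, not the one that produced the numbers here: the worker hashes a large buffer (CPython releases the GIL while hashing big inputs, so 8 of these threads really can load multiple cores), and on Linux os.sched_setaffinity plays the role of Task Manager's Set Affinity:

```python
import hashlib
import os
import threading
import time

DATA = b"x" * (1 << 20)  # 1 MiB buffer to hash

def spin(rounds):
    # CPU-intensive stand-in workload: sha256 over a large buffer
    # releases the GIL, so these threads can run on separate cores.
    for _ in range(rounds):
        hashlib.sha256(DATA).digest()

def run_trial(logical_cores, rounds=50, n_threads=8):
    """Time the 8-thread workload while restricted to the given logical cores."""
    if hasattr(os, "sched_setaffinity"):
        original = os.sched_getaffinity(0)
        os.sched_setaffinity(0, logical_cores)
    start = time.perf_counter()
    threads = [threading.Thread(target=spin, args=(rounds,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, original)  # undo the restriction
    return elapsed

# Example: average 4 trials on the first two available logical cores
# (the "0+1" case in the chart).
allowed = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else {0, 1}
pair = set(sorted(allowed)[:2])
times = [run_trial(pair, rounds=10) for _ in range(4)]
print(sum(times) / len(times))
```

Repeating this for each set of checkboxes is all the experiment below amounts to; only the workload and the OS differ.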
I believe these results are also quite conclusive and verify the mapping provided above. Allow me to explain:
We notice that on 0, 1, 2, and 3 that the run time is essentially the same (within the margin of error). This makes sense, since all of these are 1 logical core tests. I could have continued to test 4, 5, etc but they would all have been the same, since in each case, all it means is the task is allowed to execute on one logical core (ie, execute on one physical core and not take advantage of hyper-threading).
We notice that in the tests on 0+1 and 2+3, the run times are essentially the same. We also notice that tests 0+2, 1+2, and 1+3 are the same as each other, but faster than the 0+1 & 2+3 tests. This would seem to confirm the mapping above. If my interpretation is correct, it means that the 0+1 test was running on core 0 with HT, and the 2+3 test was running on core 1 with HT (both effectively 1 hyper-threaded core), while the 0+2, 1+2, and 1+3 tests were all running on cores 0 and 1 without HT (effectively a physical dual core). This is backed up by the run times: it makes sense that the actual dual core should outperform the hyper-threaded single core.
To continue, I tried a test on 0+1+2+3. This is effectively an i3: a dual core with hyper-threading. We see it get outperformed again by the test on 0+2+4+6, which is effectively an i5: a quad core without hyper-threading.
Finally, I tested on all cores just to show the full power of the i7 for reference.
What does it all mean?
It means that the options in task manager do map directly to certain cores, and parts of cores, and there is no mystery. It means you can basically simulate any kind of processor by setting the affinity in task manager.[*3] As Luke said, it means there is no difference between task manager's "core 0" and "core 1" - there is no hyper-threading "minor core" and "parent core" or any of that; both core 0 and core 1 just mean "run on your actual physical core 0". But there is a difference between setting something to 0+1 and to 0+2: 0+1 makes it use only one physical core, taking advantage of hyper-threading when possible, while 0+2 puts the load on two physically separate cores.
[*1] CPUs are complex things. A single instruction, and even a series of instructions don't just march one after the next through the chip like the customers in my example. This is a good analogy in my opinion but it does hide (ignore) much of the complexity. There are many parts to a CPU - some perform integer math, some do other things, etc. and depending on the CPU, and the code you are running, and how it was compiled, there are various ways in which things are automatically parallelized. For example, if it appears that an upcoming instruction is unrelated to other things currently being run and the parts of the CPU that would handle it are free, it will jump ahead and run that while using other bits of the CPU for other instructions. It is because of this complexity that the idea of "simultaneous" execution becomes a bit muddy, but for the purpose of understanding what is going on here, just imagine the example I gave and it should represent what's going on relatively well.
[*2] I must confess, I do not know the exact bit-level "play by play" of what is going on, and I have heard both that a core with HT and one without are not physically different, and that they are. But again, for the sake of a general understanding, just consider HT a way to use what's already there more effectively, and not additional hardware, since additional hardware becomes hard to distinguish from simply having an additional core.
[*3] OK, obviously there's more to it - IPC, cache size and speed, other differences in architecture, etc. But what I mean is you could realistically simulate the performance, give or take, of an i3 6100 by disabling some things on an i7 6700k.
I hope someone finds this useful, and I want this of course to be entirely accurate and correct so if you believe there is a mistake, please let me know, but to the best of my knowledge this is all valid.
Thanks for reading!