[Guide] Hyper-threading and Windows, explained for real

vanished

This seems to be a commonly misunderstood thing so I'll do my best to shed some light on it.  Keep in mind the following however:

  • The purpose of this guide is to define some key terms that are often misused, and explain how threads and hyper-threading work in the context of "Set Affinity" in task manager.  Which checkboxes mean what, how to use it, why it does things, etc.
  • Explaining how a CPU works is not the point of this guide; this information is just provided as background to get everyone on the same page.  It is a simplified explanation of how CPUs work.  To the best of my knowledge it's correct, but it is simplified.  If you want more detail, go to page 3 where there is some good content.

 

Terminology

Before we can do anything, some terminology is in order.

  • Core: This refers to the physical core on your CPU.  You might have 2, or 4, or 6 or some other number.  They are real, physical, and they are there, and with the right equipment, they are clearly visible:
[Image: 4ZZQL.jpg]

  • Logical Core: This refers to the device exposed to the operating system.  It may or may not represent a physical construct 1:1, but it is something Windows can schedule tasks to execute on.  In a modern i7 for example, you have 4 Cores (4 physical cores), but 8 logical cores, because - due to hyper-threading - each actual core presents 2 logical cores to the system.
  • Thread: This is perhaps the most commonly misunderstood, or at least misused, term with regard to this subject.  A thread is not a physical thing, it is not part of hardware, and it does not arise from hardware in any way.  A thread is a software concept: a single sequence of tasks executed in order.  Every running program on your computer consists of one or more threads.  Software can launch or terminate threads as needed, no differently than you open and close applications.  Generally speaking, any time a program is doing 2 or more things at once (i.e., has a dialog box open waiting for input, while the main program remains functioning in the background) it is using 2 or more threads.  A program can launch virtually any number of threads, and you can see how many any given program is using in task manager by showing the "Threads" column.  You will find that most programs have more than 10, and some have over 100.  There is no hardware limit to how many threads a CPU can run over time, but there is a limit to how many can run simultaneously, and this is where multiple cores and hyper-threading come in.  It would be incorrect to say a CPU has a certain number of threads.  What you really mean is it has that many logical cores, each one of which can be entirely occupied by one sufficiently demanding thread.
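To make the logical core vs. thread distinction concrete, here is a minimal Python sketch.  It is only an illustration: the helper names (`worker`, `spawn_idle_threads`) and the count of 32 are made up for this example.  It asks the OS how many logical cores it exposes, then launches more threads than that, which works fine because threads are a software construct.

```python
import os
import threading


def worker(stop_event):
    # This thread exists but does almost no work: it just waits for a signal.
    stop_event.wait()


def spawn_idle_threads(count):
    # Hypothetical helper: start `count` threads that all sit idle.
    stop_event = threading.Event()
    threads = [threading.Thread(target=worker, args=(stop_event,))
               for _ in range(count)]
    for t in threads:
        t.start()
    return stop_event, threads


# Logical cores exposed by the operating system (can be None if unknown).
logical_cores = os.cpu_count()

# Launch far more threads than a typical desktop CPU has logical cores.
stop_event, threads = spawn_idle_threads(32)
alive = sum(t.is_alive() for t in threads)
print(f"logical cores: {logical_cores}, threads alive in this program: {alive}")

# Tell every thread to finish, then wait for them to exit.
stop_event.set()
for t in threads:
    t.join()
```

The number of logical cores stays fixed while the number of threads is whatever the program asks for; only the number that can execute at the same instant is bounded by the hardware.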

 

Background

The vast majority of those threads exist because it makes logical sense to do so for the sake of the program design.  Some, however, exist to perform computationally intensive tasks, and these are generally the only threads we care about.  When we say "a program can't use more than 4 cores", it is because the intensive workload is not split up onto more than 4 threads, and since 1 thread can only run on one core at a time, this means the program cannot use more than 4 cores effectively.  Think of it this way - if I tell you to add 3 to 4, and then multiply the result by 7, you can't start multiplying until the addition is done.  This is a very basic example of a serial task - something that cannot be parallelized, and so cannot be split up onto more than 1 thread or more than 1 core.  Every thread consists of a great many of these tasks in order, and if the programmer did his or her job correctly, a thread won't bundle up things that could be done simultaneously by another core.  For example, if I had two employees, I could tell one to do this math problem and then fetch the dry cleaning, or I could be smart and tell one to work on the math while the other goes to the cleaner's.  The tasks are unrelated and can be done in parallel, and so they should be put on separate threads so they may be executed simultaneously by separate cores.  One core can run multiple threads (though not simultaneously [*1]), but one thread cannot run on multiple cores at the same time.  It may appear to, because Windows can move a thread from core to core over time, but if it is loading all cores on the CPU, you will notice that it only spreads 1 logical core's worth of load across all of them (i.e., a single-threaded application will load all 4 cores in a quad core to 25%, or thereabouts).
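The two-employee example can be sketched in Python like this.  The function names are invented for illustration.  One caveat: in standard CPython builds, threads doing pure Python computation do not actually execute on two cores at the same instant, so treat this as a picture of how the work is structured rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def math_problem():
    # Serial task: the multiplication cannot start until the addition is done.
    total = 3 + 4
    return total * 7


def fetch_dry_cleaning():
    # Unrelated task: it does not depend on the math problem in any way.
    return "dry cleaning"


# Two "employees" (worker threads), each given one independent task.
with ThreadPoolExecutor(max_workers=2) as pool:
    math_future = pool.submit(math_problem)
    errand_future = pool.submit(fetch_dry_cleaning)
    results = (math_future.result(), errand_future.result())

print(results)  # (49, 'dry cleaning')
```

The steps inside `math_problem` stay in order on one thread, while the two unrelated tasks are handed to separate threads so the scheduler is free to run them on separate cores.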

 

The thing about an intensive task is it may not actually take effort from the CPU 100% of the time.  Consider the following analogy:

 

Imagine a store with many customers, and several cashiers that each have one till.  Each customer checking out is like a thread, and each cashier is like a core.  Now what happens when an old lady is rummaging for change?  That cashier - that core - is still occupied with that customer - that thread - but it's not really doing anything, just waiting.  This happens in real programs as well.  Sometimes a task gets to the CPU, and then realizes, "oh wait, I need something from memory".  In the nanoseconds that it is fetching that, the CPU is occupied but not actually accomplishing anything.  Imagine if that cashier had a second till - the cashier is still only able to work so fast, but at least he or she can make use of that "spare time" more effectively.  Now, with a store of customers, several cashiers, and 2 tills per cashier, each cashier can work on checking out a customer, unless they are held up for some reason, at which point that cashier can use his or her other till to start checking out another customer.  This is hyper-threading in a nutshell.  It is not another core, or anything like that - it is just a way that the existing hardware can be used more effectively.[*2]  In theory, 1 core with hyper-threading and 1 core without hyper-threading will perform a single identical task in exactly the same amount of time (assuming the cores are the same in every other way), but if faced with 2 tasks, the hyper-threaded core will be faster.  Probably not twice as fast, but faster for sure - maybe ~50%, depending on the task, though in theory it could be anywhere from no better to twice as good.

 

This creates an interesting question though.  When you open task manager and go to "Set Affinity", how do you know which of those checkboxes map to which physical core?  That's what I will now explain/prove.

 

(Oh, and as an aside, we now can see why Intel shows an i7 for example as having 4 cores and 8 threads - it is because it can execute 8 intensive threads "simultaneously")

 

Experiments

Now that we understand how the CPU and the processes it runs work, we can get into how this relates to Windows.  From my very first i7 back in 2011, I had my own theories, but to be honest I never actually tested them until today.  Luckily I was right about everything I had thought all along.

 

So how do they map?  Are those 8 checkboxes just saying how many core-equivalents of power you want the task to take up, but have no correlation to an actual core?  Does each one actually refer to a specific core?  And if so, in what order or pattern?  I performed some tests, running a process on just one of those logical cores at a time, and observing which actual core got hotter using HWMonitor.  The results were conclusive and indicated the following mapping is correct. (Note that I count from 0 not 1, as per how task manager does it)

 

Capture.PNG
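The mapping I found can be written as a tiny rule.  This is only a sketch of what I observed on my own i7 (consecutive pairs of checkboxes share one physical core); the function names are made up, and other CPUs or operating systems may number things differently.

```python
def physical_core(logical_index, threads_per_core=2):
    # Assumption from my tests: logical cores 0 and 1 sit on physical core 0,
    # 2 and 3 on physical core 1, and so on.
    return logical_index // threads_per_core


def shares_physical_core(a, b, threads_per_core=2):
    # True when two task manager checkboxes land on the same physical core.
    return physical_core(a, threads_per_core) == physical_core(b, threads_per_core)


# Logical core -> physical core for a 4 core / 8 logical core CPU.
mapping = {logical: physical_core(logical) for logical in range(8)}
print(mapping)
print(shares_physical_core(0, 1))  # same physical core
print(shares_physical_core(0, 2))  # different physical cores
```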

 

Well that's all fine and good, but how do they work (or interfere) with each other?  To determine this, I took a predictable, repeatable and CPU intensive task consisting of 8 threads and ran it on the logical cores indicated in the chart below.  I ran 4 trials in each case and averaged the run times (in seconds).  From this we should be able to gain additional insight.

 

Capture2.PNG

 

I believe these results are also quite conclusive and verify the mapping provided above.  Allow me to explain:

  • We notice that on 0, 1, 2, and 3 that the run time is essentially the same (within the margin of error).  This makes sense, since all of these are 1 logical core tests.  I could have continued to test 4, 5, etc but they would all have been the same, since in each case, all it means is the task is allowed to execute on one logical core (ie, execute on one physical core and not take advantage of hyper-threading).
  • We notice that for the tests on 0+1 and 2+3 the run times are essentially the same.  We also notice that tests 0+2, 1+2, and 1+3 are the same as each other, but faster than the 0+1 & 2+3 tests.  This would seem to confirm the mapping above.  If my interpretation is correct, it means that 0+1 was running on core 0 with HT, and the 2+3 test was running on core 1 with HT (both effectively 1 hyper-threaded core), while tests 0+2, 1+2, and 1+3 were all running on cores 0 and 1 without HT (effectively a physical dual core).  This is backed up by the run times.  It makes sense that the actual dual core should outperform the hyper-threaded single core.
  • To continue, I tried a test on 0+1+2+3.  This is effectively an i3: a dual core with hyper-threading.  We see it get outperformed again by the test on 0+2+4+6, which is effectively an i5: a quad core without hyper-threading.
  • Finally, I tested on all cores just to show the full power of the i7 for reference.

 

What does it all mean?

It means that the options in task manager do map directly to certain cores, and parts of cores, and there is no mystery.  It means you can basically simulate any kind of processor by setting the affinity in task manager.[*3]  As Luke said, it means there is no difference between task manager's "core 0" and "core 1" - there is no hyper-threading minor core and a parent core or any of that; both core 0 and core 1 just mean run on your actual physical core 0.  But, there is a difference between setting something to 0+1 and to 0+2: 0+1 will make it only use one physical core, taking advantage of hyper-threading when possible, while 0+2 puts the load on two physically separate cores.
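If you want to set affinity from a script or command line rather than by ticking boxes, the same selections are usually expressed as a bitmask, with bit N standing for logical core N (the Windows `start` command, for instance, has an `/affinity` option that takes such a mask in hexadecimal).  Here is a small sketch of the conversion; `affinity_mask` is just a name I picked for this example.

```python
def affinity_mask(logical_cores):
    # Set bit N of the mask for every selected logical core N.
    mask = 0
    for index in logical_cores:
        mask |= 1 << index
    return mask


# "i3-like": 2 physical cores with hyper-threading (logical cores 0-3).
i3_like = affinity_mask([0, 1, 2, 3])

# "i5-like": 4 physical cores, one logical core each (0, 2, 4, 6).
i5_like = affinity_mask([0, 2, 4, 6])

# All 8 logical cores of the i7.
full_i7 = affinity_mask(range(8))

print(hex(i3_like), hex(i5_like), hex(full_i7))  # 0xf 0x55 0xff
```

This assumes the pairing described above, where logical cores 0 and 1 share physical core 0, and so on.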

 

Footnotes

[*1] CPUs are complex things.  A single instruction, or even a series of instructions, doesn't just march one after the next through the chip like the customers in my example.  This is a good analogy in my opinion but it does hide (ignore) much of the complexity.  There are many parts to a CPU - some perform integer math, some do other things, etc. - and depending on the CPU, the code you are running, and how it was compiled, there are various ways in which things are automatically parallelized.  For example, if it appears that an upcoming instruction is unrelated to other things currently being run and the parts of the CPU that would handle it are free, the CPU will jump ahead and run that while using other bits of the CPU for other instructions.  It is because of this complexity that the idea of "simultaneous" execution becomes a bit muddy, but for the purpose of understanding what is going on here, just imagine the example I gave and it should represent what's going on relatively well.

 

[*2] I must confess, I do not know exactly the bit level "play by play" of what is going on, and I have heard that a core with HT and one without are not physically different, and that they are physically different, but again, for the sake of a general understanding, just consider it a way to use what's already there more effectively, and not as additional hardware, since that becomes hard to distinguish from simply having an additional core.

 

[*3] OK, obviously there's more to it - IPC, cache size and speed, other differences in architecture, etc.  But what I mean is you could realistically simulate the performance, give or take, of an i3 6100 by disabling some stuff in an i7 6700k.

 

I hope someone finds this useful, and I want this of course to be entirely accurate and correct so if you believe there is a mistake, please let me know, but to the best of my knowledge this is all valid.

 

Thanks for reading! :) 

Solve your own audio issues  |  First Steps with RPi 3  |  Humidity & Condensation  |  Sleep & Hibernation  |  Overclocking RAM  |  Making Backups  |  Displays  |  4K / 8K / 16K / etc.  |  Do I need 80+ Platinum?


Just now, awesomeness10120 said:

Awesome job. Perhaps elaborate on CMT multithreading like AMD's Bulldozer?

I'd like to, but frankly, I'm not really familiar enough with it to do it justice


"logical cores" are called threads

http://ark.intel.com/products/88195/Intel-Core-i7-6700K-Processor-8M-Cache-up-to-4_20-GHz

Quote
Performance
# of Cores: 4
# of Threads: 8
Processor Base Frequency: 4 GHz

 

NEW PC build: Blank Heaven   minimalist white and black PC     Old S340 build log "White Heaven"        The "LIGHTCANON" flashlight build log        Project AntiRoll (prototype)        Custom speaker project

Spoiler

Ryzen 3950X | AMD Vega Frontier Edition | ASUS X570 Pro WS | Corsair Vengeance LPX 64GB | NZXT H500 | Seasonic Prime Fanless TX-700 | Custom loop | Coolermaster SK630 White | Logitech MX Master 2S | Samsung 980 Pro 1TB + 970 Pro 512GB | Samsung 58" 4k TV | Scarlett 2i4 | 2x AT2020

 


9 minutes ago, Enderman said:

Did you actually click on the row and read Intel's description?  I think they actually agree with me ;) 


11 minutes ago, Ryan_Vickers said:

Did you actually click on the row and read Intel's description?  I think they actually agree with me ;) 

its still called a "thread"

you know words can have more than one meaning right?

 


Just now, Enderman said:

its still called a "thread"

you know words can have more than one meaning right?

 

I've never seen it used like that anywhere ever.  Can you point me to any reputable source explicitly calling a logical core a thread?  Because it sounds to me like calling the representation of a printer in device manager, and a print job submitted to it, the same thing.  Even that Intel site doesn't explicitly call it that - it just seems (and weakly to be honest) to imply it.  It even goes on to specify that a thread is a series of instructions.


5 minutes ago, Enderman said:

its still called a "thread"

you know words can have more than one meaning right?

 

You know arguing an official meaning of a word when the intended meaning and explanation is explicitly described is neither helpful nor useful.

 

It might as well be the definition of pedantry...

 

On topic, nice work @Ryan_Vickers, my only comment is the reminder that desktop mainstream i7s are currently 4/8, mobile i7s can be 2/4 or 4/8 (and iirc there is one insanely annoying case of a 4/4 now with skylake).

LINK-> Kurald Galain:  The Night Eternal 

Top 5820k, 980ti SLI Build in the World*

CPU: i7-5820k // GPU: SLI MSI 980ti Gaming 6G // Cooling: Full Custom WC //  Mobo: ASUS X99 Sabertooth // Ram: 32GB Crucial Ballistic Sport // Boot SSD: Samsung 850 EVO 500GB

Mass SSD: Crucial M500 960GB  // PSU: EVGA Supernova 850G2 // Case: Fractal Design Define S Windowed // OS: Windows 10 // Mouse: Razer Naga Chroma // Keyboard: Corsair k70 Cherry MX Reds

Headset: Senn RS185 // Monitor: ASUS PG348Q // Devices: Note 10+ - Surface Book 2 15"

LINK-> Ainulindale: Music of the Ainur 

Prosumer DYI FreeNAS

CPU: Xeon E3-1231v3  // Cooling: Noctua L9x65 //  Mobo: AsRock E3C224D2I // Ram: 16GB Kingston ECC DDR3-1333

HDDs: 4x HGST Deskstar NAS 3TB  // PSU: EVGA 650GQ // Case: Fractal Design Node 304 // OS: FreeNAS

 

 

 


4 minutes ago, Curufinwe_wins said:

On topic, nice work @Ryan_Vickers, my only comment is the reminder that desktop mainstream i7s are currently 4/8, mobile i7s can be 2/4 or 4/8 (and iirc there is one insanely annoying case of a 4/4 now with skylake).

Yeah good point, I was thinking only about the desktop side, but the general concepts represented here apply to them all (at least until Intel makes some crazy change :))


6 minutes ago, Ryan_Vickers said:

I've never seen it used like that anywhere ever.  Can you point me to any reputable source explicitly calling a logical core a thread?  Because it sounds to me like calling the representation of a printer in device manager, and a print job submitted to it, the same thing.  Even that Intel site doesn't explicitly call it that - it just seems (and weakly to be honest) to imply it.  It even goes on to specify that a thread is a series of instructions.

have you never seen a CPU review?

literally everyone says a "4 core 8 thread" or "2 core 4 thread" processor

nobody says "logical cores"

 

yeah the proper meaning of that word is about the processes running, but "hyperthreading" has made the words "logical core" simply into "cpu threads" or "threads"

this is one example of where a word is commonly used for something other than its intended meaning, becoming a new definition of that word


2 minutes ago, Ryan_Vickers said:

Yeah good point, I was thinking only about the desktop side, but the general concepts represented here apply to them all (at least until Intel makes some crazy change :))

ugh no it was an i5... http://ark.intel.com/products/88962/Intel-Core-i5-6440HQ-Processor-6M-Cache-up-to-3_50-GHz

 

Mobile pentium/celeron 2c/2t w/o turbo boost

Mobile i3 2c/4t w/o turbo boost

Mobile i5 2c/4t (-U moniker) , 4c/4t (-HQ moniker) w/ turbo boost

Mobile i7 2c/4t (-U moniker), 4c/8t (-HQ/HK) w/ turbo boost

 

I know this isn't the scope of the guide but this is why -U i7s are very rarely worth the upgrade cost over -U i5s. (There are some fringe cases with significantly better graphics.)


Just now, Enderman said:

have you never seen a CPU review?

literally everyone says a "4 core 8 thread" or "2 core 4 thread" processor

nobody says "logical cores"

 

yeah the proper meaning of that word is about the processes running, but "hyperthreading" has made the words "logical core" simply into "cpu threads" or "threads"

this is one example of where a word is commonly used for something other than its intended meaning, becoming a new definition of that word

I know it gets used like that a lot, hence why I mentioned that it is a commonly misused term :)

I realize it's easier to say, and that to a lot of people, that's the meaning it has, and often that is how languages change and evolve, but I think it has obscured the truth and led to misunderstandings about how the whole system actually works.  I have an issue with someone just saying "the i7 has 8 threads" because - and let's accept your dual meaning of "threads" for the sake of this - they may actually be correct or incorrect, and only further discussion would reveal their true idea or understanding of the system.  If the "right" terms were the only ones ever used, it would remove a lot of ambiguity.


5 minutes ago, Enderman said:

have you never seen a CPU review?

literally everyone says a "4 core 8 thread" or "2 core 4 thread" processor

nobody says "logical cores"

 

yeah the proper meaning of that word is about the processes running, but "hyperthreading" has made the words "logical core" simply into "cpu threads" or "threads"

this is one example of where a word is commonly used for something other than its intended meaning, becoming a new definition of that word

Actually when needing to compare between different levels of SMT and the bullshit Bulldozer does, the term "logical core" comes up quite a bit. But I do understand where you are going with this.

 

Alternatively "logical threads" comes up as well (ie in dolphin's runtime).


18 minutes ago, Enderman said:

its still called a "thread"

you know words can have more than one meaning right?

 

It's just confusing to have a word have two different meanings in the same context. Which is why it's important to make the distinction. A thread then is a line of commands and a logical core is capable of executing up to one thread at the same time.

We have a NEW and GLORIOUSER-ER-ER PSU Tier List Now. (dammit @LukeSavenije stop coming up with new ones)

You can check out the old one that gave joy to so many across the land here

 

Computer having a hard time powering on? Troubleshoot it with this guide. (Currently looking for suggestions to update it into the context of <current year> and make it its own thread)

Computer Specs:

Spoiler

Mathresolvermajig: Intel Xeon E3 1240 (Sandy Bridge i7 equivalent)

Chillinmachine: Noctua NH-C14S
Framepainting-inator: EVGA GTX 1080 Ti SC2 Hybrid

Attachcorethingy: Gigabyte H61M-S2V-B3

Infoholdstick: Corsair 2x4GB DDR3 1333

Computerarmor: Silverstone RL06 "Lookalike"

Rememberdoogle: 1TB HDD + 120GB TR150 + 240 SSD Plus + 1TB MX500

AdditionalPylons: Phanteks AMP! 550W (based on Seasonic GX-550)

Letterpad: Rosewill Apollo 9100 (Cherry MX Red)

Buttonrodent: Razer Viper Mini + Huion H430P drawing Tablet

Auralnterface: Sennheiser HD 6xx

Liquidrectangles: LG 27UK850-W 4K HDR

 


Just now, Curufinwe_wins said:

Actually when needing to compare between different levels of SMT and the bullshit Bulldozer does, the term "logical core" comes up quite a bit. But I do understand where you are going with this.

I thought Bulldozer had half as many Floating Point Units as it had Integer processors?


6 minutes ago, Energycore said:

I thought Bulldozer had half as many Floating Point Units as it had Integer processors?

That depends on what you mean when you say "integer processors"

AMD_Bulldozer_block_diagram_%28CPU_core_

But yes Bulldozer uses CMT instead of SMT. Power8 uses an 8-way SMT, while Intel uses 2-way SMT.


3 minutes ago, Curufinwe_wins said:

That depends on what you mean when you say "integer processors"

AMD_Bulldozer_block_diagram_%28CPU_core_

But yes Bulldozer uses CMT instead of SMT. Power8 uses an 8-way SMT, while Intel uses 2-way SMT.

That diagram clarifies it, thanks. Looking forward to SMT chips from AMD.


I just ran an experiment a little bit ago, to attempt to test performance of hyperthreading vs a real core for myself.

 

My CPU is an Intel Core i7-4790K, running at stock settings, cooled by the Cooler Master Hyper 212 Evo.  Motherboard is an ASRock Z97 Extreme6, with 32GB RAM installed.  I'm running Windows 10 Pro 64-bit, and have no discrete graphics card.  (I'm waiting for Pascal & Polaris.  I hope to buy a ~$350 GPU later this summer, but I don't have the $ right now.)

 

The programs I used were Prime95 26.6, and Cinebench R15.  Also, Windows Task Manager was used to set process affinity to specific logical cores.

 

I ran a total of 8 tests, but in hindsight I only needed to run 4.  Tests 1, 3, 5 and 7 were with Cinebench in normal mode; 2, 4, 6, 8 were single-threaded Cinebench.  Prime95 was running for tests 1-4, and idle for tests 5-8.

 

I set Cinebench in the options to only run on 1 thread for these tests.  (I did run a couple tests prior, but had to quickly run over to Windows Task Manager to reset process affinity.  Seems that every time I'd start Cinebench, even in single thread mode, it'd set the affinity to use all logical cores.  This was in spite of me manually setting it in Task Manager previously.  Starting the CB run, THEN setting affinity in WTM did seem to work, though.)

 

Prime95 was set to run on 1 worker thread, and was started before the first Cinebench run, and stopped after the 2nd Cinebench run listed below.

 

I'll give settings and results for the relevant tests below.

 

Test 2

  • Prime95 status: Running
  • Prime95 Affinity: Core 6 (I presume it's the 4th Physical Core)
  • Cinebench R15 test: Single Thread
  • Cinebench R15 Affinity: Core 7 (I presume it's the hyperthread on the 4th physical core)
  • Cinebench Score: 89

 

Test 4

  • Prime95 status: Running
  • Prime95 Affinity: Core 6
  • Cinebench R15 test: Single Thread
  • Cinebench R15 Affinity: Core 6 (same as Prime95)
  • Cinebench Score: 134

 

Test 6

  • Prime95 status: Idle
  • Cinebench R15 test: Single Thread
  • Cinebench R15 Affinity: Core 7
  • Cinebench Score: 136

 

Test 8

  • Prime95 status: Idle
  • Cinebench R15 test: Single Thread
  • Cinebench R15 Affinity: Core 6
  • Cinebench Score: 136

 

 

I found it quite interesting that Cinebench actually scored higher when running on the same logical core as Prime95 (134) than when running on the other logical core of the same physical core (89).  I would have expected test 4 to score the same as test 2, or possibly lower.  It surprised me to see it end up scoring almost the same as when Prime95 was idle (136).

 

When Cinebench was essentially the only main process running on the 4th physical core (whether I ran it on the hyperthread (2nd logical core) or the real core), the scores were identical.

 

I guess that means a "hyperthread" uses the entire physical core when nothing else is using it.  When another program is using the physical core, the hyperthread doesn't perform as well as it otherwise could.

 

Of course there were other semi-idle processes in the background, for example a Linux VM with Audacity and Firefox open, Chrome with a few hundred tabs, etc. but CPU usage with both Prime95 and Cinebench idle was hovering at around 7-15% or so.  Memory usage is currently around 19.4 GB.  (I forgot to check it earlier during the test, but right now my system is in a similar state to what it was before I opened the programs for the test.)  While I didn't monitor CPU temperatures during the test in HWMonitor, I think it might have been around 45-50°C or so when I started Prime95.

 

 

 

I was also going to run this test on my laptop (Clevo P750DM-G which has an i3-6100 which also has HT), but right now it seems to be limited to 2 GHz instead of 3.7 GHz.  I know it's a user setting in some program somewhere, but I can't seem to open the program to change it.  (It's one that has presets for Power Saving, Performance, Media, Gaming, etc; and if you call up the advanced options, it brings up Intel XTU.)  Looks like I need to restart the laptop, but I have some other stuff running on it right now so I can't do it until later.

Also in the Windows power options, it's set to use 100% of CPU when on AC power, yet it's limiting itself to 2 GHz.  It's not thermal throttling, as it's only hitting about 50°C or so.

I haven't tested on battery recently, but sometimes I've had it get better performance on battery power than on AC.  Seems that either I have some settings wrong somewhere, or I need a better power adapter.  I had gotten the 230W one from RJTech - didn't think I needed the 330W one as I didn't pick a K CPU or the GTX 980M - just the i3 and the 970M.  Also speaking of the GPU, I still need to figure out some tweaks there.  In gaming and video editing I'd like to have the 970M's full potential, but when just doing generic tasks in Windows or watching movies that aren't too demanding, I'd like the power usage to be similar to an iGPU.  That's a topic for another post, though. :)


Interesting test.  Here's my take on it :) 

 

In tests 6 and 8, we just see a process running 1 thread on 1 physical core.  Your tests confirm, as mine did, and honestly as Luke mentioned, that there is no "main core" and a subordinate minor "HT core" - both checkboxes just mean "run on that core".  But this was no surprise to anyone, I'm sure.

 

In test 2, we see both the cinebench task and the "dead load task" (prime 95) running on the same physical core, but each being scheduled in the spare time left by the other, due to being set to run on alternate logical cores presented by the same physical core.  Hence, as expected, the performance of both tasks (presumably - we can only measure the one) drops, but not quite in half - in fact, it is about 30% more than half (89, where half of 136 would be 68).  This shows the power of hyperthreading.
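To make the arithmetic explicit, here is a small sketch that only uses the three Cinebench scores reported in the tests (136 alone, 89 with Prime95 on the sibling logical core, 134 with Prime95 on the same logical core). The helper name `percent_of` is just for illustration.

```python
SOLO = 136           # tests 6 and 8: Cinebench alone on the physical core
SIBLING_BUSY = 89    # test 2: Prime95 on the other logical core
SAME_LOGICAL = 134   # test 4: Prime95 on the same logical core


def percent_of(score, baseline):
    """Score expressed as a percentage of a baseline score."""
    return 100 * score / baseline


# Share of solo performance Cinebench kept in test 2.
print(round(percent_of(SIBLING_BUSY, SOLO), 1))            # 65.4
# How far above "exactly half" that is.
print(round(percent_of(SIBLING_BUSY, SOLO / 2) - 100, 1))  # 30.9
# Share of solo performance Cinebench kept in test 4.
print(round(percent_of(SAME_LOGICAL, SOLO), 1))            # 98.5
```

Note that this says nothing about the combined throughput of the core in test 2, because Prime95's own progress was not measured.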

 

Then comes test 4 and messes everything up :D  One would expect that in this scenario - with both cinebench and prime95 scheduled on the same logical core - the performance would, assuming equally shared resources, be half of what was found when running cinebench alone.  Perhaps there are other interpretations of this result, but I believe it is a result of priorities coming into play.  I believe Windows was scheduling the prime95 task at a lower priority, and while this is often unpredictable, it (in this case) worked perfectly and let cinebench take all of the resources of that core, leaving only what was left (nothing in this case, since cinebench maxes out the core) to the lower priority task.  Honestly, I'd love to see this tried again with different priority settings in task manager :) 
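The priority explanation can be sketched as a toy model. It assumes a strict-priority scheduler on a single logical core, where the highest-priority ready thread always runs and equal priorities take turns, and it assumes both threads are always ready to run. It deliberately ignores everything else the real Windows scheduler does (priority boosts, quantum lengths, and so on), and the priority numbers are illustrative, not measured values.

```python
def run_ticks(threads, ticks):
    """Share `ticks` time slices of one logical core among threads.

    threads maps a thread name to its priority (higher wins). Every thread
    is assumed to be always ready, so only the threads at the highest
    priority ever get the core, and they take turns round-robin.
    """
    executed = {name: 0 for name in threads}
    top = max(threads.values())
    runnable = sorted(name for name, prio in threads.items() if prio == top)
    for tick in range(ticks):
        executed[runnable[tick % len(runnable)]] += 1
    return executed


# Prime95 at a lower priority: Cinebench gets the whole logical core.
print(run_ticks({"cinebench": 8, "prime95": 1}, 1000))
# {'cinebench': 1000, 'prime95': 0}

# Equal priorities: the logical core is split evenly.
print(run_ticks({"cinebench": 8, "prime95": 8}, 1000))
# {'cinebench': 500, 'prime95': 500}
```

Under these assumptions the first case matches test 4 (Cinebench barely slowed down), and the second is what one would expect to see if both were rerun at the same priority.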

 

Now you're probably thinking, "wait, if cinebench is a higher priority, why did it not dominate p95 in test 2?"  Here's the trick: even though the tasks were sharing the same physical core, they were on separate logical cores, and thus, regardless of priority (assuming there was no other intense task on either of their logical cores), each was free to take its entire logical core, which, physically, works out to taking all the power of the physical core that wasn't used by the other task.  Assuming both processes behave similarly in terms of time spent in the CPU doing nothing, this will approximate having them both on the same logical core with equal priorities (with the exception that on separate logical cores overall performance will be higher, since HT can come into play).  Since we see the cinebench task get more than 50% of the power of that physical core (89 of 136, or about 65%), I presume p95 was spending more time in the CPU waiting for memory or something like that.

 

That's my interpretation :) 

Solve your own audio issues  |  First Steps with RPi 3  |  Humidity & Condensation  |  Sleep & Hibernation  |  Overclocking RAM  |  Making Backups  |  Displays  |  4K / 8K / 16K / etc.  |  Do I need 80+ Platinum?

If you can read this you're using the wrong theme.  You can change it at the bottom.


Ahh, thanks for your comments. :) I had a couple thoughts.  (Was going to post earlier but got "delayed" by my future $379 GPU…)

 

"presumably - we can only measure the 1" - Another idea I had ... maybe I should run TWO CPU benchmarks at the same time, so I could get actual scores.  Maybe I could run Cinebench R15 as one task, and the CPU portion of FireStrike (I bought 3DMark on Steam when it was on sale a couple months ago) as the other task.  Or should I maybe use Cinebench 11.5 or another benchmark?  (I don't have Aida64.)

 

"Scheduling the prime95 task at a lower priority" ... now I realize I forgot to manually set the priority for both tasks.  Maybe I should re-run the tests and set both to high or realtime? (Or whatever the highest setting is.)

 

And actually I would have expected, on test 4, for the Cinebench performance to be considerably worse, on account of Prime95 - like, at best, significantly less than half of the solo performance. "Presume P95 was spending more time in the CPU waiting" - Maybe I should have changed the settings in P95 to be a bit more demanding?  I ran on large FFT, maybe I should have run small FFT?  Or maybe set a custom FFT size of 32 or 256 (not K) or something like that, or near the lowest it lets me set it?  (I was running an older version of P95 so it wouldn't cook my 4790K.)
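For a re-run with explicit priorities, one option is to launch each program with both its priority and its affinity fixed from the start, using the `start` command built into cmd.exe, which accepts priority switches such as `/high` and an `/affinity` switch that takes a hex mask. The sketch below only builds the argument list; the helper name `start_command` and the executable name are placeholders, and a program that resets its own affinity on launch could still override the mask.

```python
PRIORITY_SWITCHES = {
    "low", "belownormal", "normal", "abovenormal", "high", "realtime",
}


def start_command(executable, logical_cores, priority="normal"):
    """Build a cmd.exe `start` invocation with priority and affinity set."""
    if priority not in PRIORITY_SWITCHES:
        raise ValueError(f"unknown priority class: {priority}")
    mask = 0
    for core in logical_cores:
        mask |= 1 << core           # one bit per logical core
    # The empty string is the window title that `start` expects first.
    return ["cmd", "/c", "start", "", f"/{priority}",
            "/affinity", format(mask, "x"), executable]


# Placeholder executable name; logical core 6 at high priority.
print(start_command("Cinebench.exe", [6], "high"))
# ['cmd', '/c', 'start', '', '/high', '/affinity', '40', 'Cinebench.exe']
```

On Windows the list could be passed to `subprocess.run`. Setting both programs to "high" rather than "realtime" is probably the safer choice, since realtime threads can starve the rest of the system.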


  • I would suggest running two copies of the same benchmark, unless there is some reason I am unaware of that would prevent such a thing
  • Yeah I would be interested to see what they both do at the same priority (ideally a high one)
  • If my understanding is correct, the more time it spends just crunching numbers and the less time it spends interfacing with the memory or other parts of the system, the better.  To that end, I'd imagine smaller FFTs and turning down any option that taxes RAM would help.  Just be careful running that on any Haswell chip :) 


Well, I tried running two copies of Cinebench, but Windows 10 wouldn't let me. :( (Unless I did it on 2 computers, but that doesn't really count, does it? :P)  I thought about installing another copy of Windows 10 (from the downloaded ISO) in a VM, but decided not to do it this time.

 

I was, however, able to do THIS: :) 

 

 

Yes, the piano is my computer desk, and yes, the laptop has a desktop CPU in it.  (My desktop's tower is not visible in the shot; it's standing on the floor to the left of the piano.)  Hopefully the annotations in the video should explain what's going on. :)

 

I did forget to set the priority of the tasks, but in this case I think it didn't matter.  Cinebench's snail pace when running on the same logical core as FireStrike was what I had expected.

 

Also I chose not to use Prime95, as I wanted something that would give a score.  Also it's the newer version of P95 that cooks Haswell chips (and I would presume Skylake but I haven't tested).  Someone over on Tom's Hardware forums says don't run P95 newer than 26.6 on Haswell.  (He's posted a pretty extensive guide on the subject there.)

 

FireStrike wouldn't give me a score when I was looping the test, but I figured if I did a time-lapse video, you could see the FPS counters. :)  The one on the laptop may be a little harder to see, but it's typically around 4-6 fps or so.  I guess if you could watch in 4K it should be more visible.  (I had limited both of them to 1 logical core.)  As of the time of this posting, only 1080p was available on YT, although it was uploaded in 4K so hopefully that'll eventually work.  (Unless the youtube video editor caps at 1080p, I hope not though.)


  • 1 month later...
On 5/5/2016 at 9:17 PM, Ryan_Vickers said:

Imagine a store with many customers, and several cashiers that each have one till.  Each customer checking out is like a thread, and each cashier is like a core.  Now what happens when an old lady is rummaging for change?  That cashier - that core - is still occupied with that customer - that thread - but it's not really doing anything, just waiting.  This happens in real programs as well.  Sometimes a task gets to the CPU, and then realizes, "oh wait, I need something from memory".  In the nanoseconds that it is fetching that, the CPU is occupied but not actually accomplishing anything.  Imagine if that cashier had a second till - the cashier is still only able to work so fast, but at least he or she can make use of his or her "spare time" more effectively.  Now, with a store of customers, several cashiers, and 2 tills per cashier, each cashier can work on checking out a customer, unless they are held up for some reason, at which point that cashier can use his or her other till to start checking out another customer.  This is hyper-threading in a nutshell.  It is not another core, or anything like that - it is just a way that the existing hardware can be used more effectively.  In theory, 1 core with hyper-threading and 1 core without hyper-threading will perform an identical task in exactly the same amount of time (assuming the cores are the same in every other way), but if faced with 2 tasks, the hyper-threaded core will be faster. Probably not twice as fast, but faster for sure - maybe ~50%, depending on the task, though in theory it could be anywhere from no better to twice as good.  Now, Luke did explain this right at the end, but the rest of those tests just seemed... pointless.  You are about to see my attempt to do better.
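The cashier analogy can be turned into a toy simulation. This is a sketch of the analogy only, not of how a real superscalar core works: each task is a string of steps, where `C` is one cycle of computing and `W` is one cycle of waiting (for example on memory), the core can compute for only one task per cycle, and `hardware_threads` is the number of "tills" (1 without hyper-threading, 2 with). The function name and the task strings are invented for illustration.

```python
def run_core(tasks, hardware_threads):
    """Return the cycles one core needs to finish all tasks.

    Waiting steps cost the core nothing, so several resident tasks can
    wait at once, but only one compute step can execute per cycle.
    """
    pending = [list(task) for task in tasks if task]
    resident = []
    cycles = 0
    while pending or resident:
        # Put waiting customers on any free till.
        while pending and len(resident) < hardware_threads:
            resident.append(pending.pop(0))
        cycles += 1
        computed = False
        for task in resident:
            if task[0] == "W":
                task.pop(0)        # waiting proceeds on its own
            elif not computed:
                task.pop(0)        # the single execution slot is used
                computed = True
            # otherwise this task stalls until the cashier is free
        resident = [task for task in resident if task]
    return cycles


mixed = "CCWWCCWW"                    # half computing, half waiting
print(run_core([mixed], 1))           # 8
print(run_core([mixed], 2))           # 8   one task: no benefit
print(run_core([mixed, mixed], 1))    # 16
print(run_core([mixed, mixed], 2))    # 10  two tasks: waits overlap
print(run_core(["CCCCCCCC"] * 2, 2))  # 16  pure compute: no benefit
```

In this model a single task takes the same time either way, two tasks that wait a lot finish in 10 cycles instead of 16, and two tasks that never wait gain nothing, which lines up with the "anywhere from no better to twice as good" range described above.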

So what you are saying is that hyper-threading's only purpose in life is to speed up context switching?

ENCRYPTION IS NOT A CRIME


Not quite, although I suppose the concept is similar.  I think of context switching as doing one complete little instruction*, then a different one, etc.

 

Thing is, that instruction* may consist of a combination of actually performing calculations, and waiting on memory or something else during which time the CPU is idle.  Hyper-threading would allow the CPU to work on a different instruction during that time, creating the illusion that both are actually running simultaneously.


I thought a context switch had to occur to allow "multitasking" on a single core?

Warning: This is a question, this may not be a correct description of how context switching works:

We have two programs, 0 and 1. Each program has two instructions A(n) and B(n). First we are running instruction A0. At the end of A0, we save all of our stuff for 0 as well as the program counter. Next, we run a subroutine that loads program 1 into memory and sets the program counter accordingly. Now we start execution at A1. Halfway through executing A1, the OS interrupts us (there's lots of complex stuff that allows that to happen that I don't fully understand yet) and tells us that we need to go back to program 0 immediately. So we roll back 1 to the state it was in before we started executing A1, and save its state and the program counter. We next run the context switch subroutine to load program 0 and its saved program counter state. Execution of program 0 now begins at instruction B0. 
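A minimal sketch of that save-and-restore idea, under simplifying assumptions that are mine rather than the post's: both programs are already in memory, the only saved state is a program counter, and a switch happens only between instructions, so nothing is ever rolled back. The names `Context` and `run_time_sliced` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Saved state of one program: its code and where it stopped."""
    name: str
    instructions: list
    program_counter: int = 0

    def finished(self):
        return self.program_counter >= len(self.instructions)


def run_time_sliced(contexts, slice_length):
    """Run programs on one core, switching every slice_length instructions."""
    executed = []
    while any(not ctx.finished() for ctx in contexts):
        for ctx in contexts:
            # "Restore": resume from this program's saved program counter.
            for _ in range(slice_length):
                if ctx.finished():
                    break
                instruction = ctx.instructions[ctx.program_counter]
                executed.append(f"{instruction}{ctx.name}")
                ctx.program_counter += 1
            # "Save": the updated counter stays in ctx until its next turn.
    return executed


programs = [Context("0", ["A", "B"]), Context("1", ["A", "B"])]
print(run_time_sliced(programs, 1))   # ['A0', 'A1', 'B0', 'B1']
```

The point of the sketch is that with context switching only one program's instructions are on the core at any moment, and the switch itself is work done by software between them.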

So my question is, isn't hyper-threading just really optimized context switching? 
