Note: I posted this as a status update, but it got long enough that I wanted to preserve it as a blog post.
This popped up in my news feed: Netflix's senior software architect says Windows' CPU utilization meter is wrong. He has some good points, in that it doesn't measure the time a thread spends actually doing useful work as opposed to waiting on something, like data from RAM. He also points out that there is a performance gap between RAM and the CPU, but that's been a known problem for the past 30+ years.
In any case, while I like what he presents, I don't think Task Manager's CPU utilization is wrong, just misleading. All Task Manager's CPU utilization graph is measuring is the percentage of time in the sampling period (usually 1 second) that a logical processor spent running something other than the System Idle Process. Nothing more.
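Put as arithmetic: take the idle thread's share of the sampling period, and the utilization figure is whatever is left over. Here's a rough sketch of that calculation, assuming you have two snapshots of a logical processor's cumulative idle time and cumulative total time. The function name and the sample numbers are made up for illustration; this is not Task Manager's actual code.

```python
def cpu_utilization(idle_before, idle_after, total_before, total_after):
    """Percent of the sampling period NOT spent in the idle thread.

    All arguments are cumulative times in the same unit (e.g. milliseconds).
    """
    total_delta = total_after - total_before
    if total_delta <= 0:
        raise ValueError("sampling period must be positive")
    idle_delta = idle_after - idle_before
    return 100.0 * (1.0 - idle_delta / total_delta)


# Hypothetical 1-second sample: the idle thread ran for 750 ms of it.
print(cpu_utilization(idle_before=10_000, idle_after=10_750,
                      total_before=50_000, total_after=51_000))  # 25.0
```

Notice that nothing in there knows or cares what the non-idle threads were doing with their time, which is the whole point.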
The other thing is Windows can't tell if a thread is doing useful work or not unless it somehow observes what the thread is doing. The problem is that observing requires CPU time too, so it has to interrupt a thread (not necessarily the one it's observing) to do this. And how often do you observe what a thread is doing? Adding this feature just to get a more accurate representation of CPU utilization is likely going to decrease general performance due to overhead.
If anything, I think it should be the app's responsibility to go "hey OS, I can't do anything else at the moment, so I'm going to sleep" when it actually can't do any more useful work. But a lot of developers like to think their app is the most important app ever and that any time they get on the CPU is precious, so they'll use up all of the time slice they can get.
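To show the difference between hogging the time slice and telling the OS you have nothing to do, here's a small sketch that waits the same amount of wall-clock time two ways and compares the CPU time each one burned. The function names and the 0.2 second duration are my own choices for the example; exact numbers will vary by machine.

```python
import time


def spin_until(deadline):
    """Busy-wait: keep checking the clock, burning CPU the whole time."""
    while time.monotonic() < deadline:
        pass


def sleep_until(deadline):
    """Block: tell the OS we have nothing to do until the deadline."""
    remaining = deadline - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)


def cpu_seconds_used(wait, duration=0.2):
    """CPU time this process consumed while waiting `duration` seconds."""
    start = time.process_time()
    wait(time.monotonic() + duration)
    return time.process_time() - start


if __name__ == "__main__":
    print(f"spin:  {cpu_seconds_used(spin_until):.3f} s of CPU")
    print(f"sleep: {cpu_seconds_used(sleep_until):.3f} s of CPU")
```

Both versions accomplish the same thing (nothing), but the spinning one shows up as a busy logical processor for the whole wait, while the sleeping one lets the idle thread, or someone else's thread, run instead.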
Backup: About the System Idle Process
Just so people are informed about the System Idle Process (taken from https://en.wikipedia.org/wiki/System_Idle_Process):
The primary purpose of the idle process and its threads is to eliminate what would otherwise be a special case in the scheduler. Without the idle threads, there could be cases when no threads were runnable (or "Ready" in terms of Windows scheduling states). Since the idle threads are always in a Ready state (if not already Running), this can never happen. Thus whenever the scheduler is called due to the current thread leaving its CPU, another thread can always be found to run on that CPU, even if it is only the CPU's idle thread. The CPU time attributed to the idle process is therefore indicative of the amount of CPU time that is not needed or wanted by any other threads in the system.
The scheduler treats the idle threads as special cases in terms of thread scheduling priority. The idle threads are scheduled as if they each had a priority lower than can be set for any ordinary thread.
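Stepping outside the quote for a moment, the "no special case" idea can be sketched as a toy scheduler. This is only an illustration of the concept, not how the Windows scheduler is implemented; the `Thread` class, the priority numbers, and `pick_next` are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int
    ready: bool = True


# Assumption for this sketch: ordinary priorities are >= 0,
# so -1 sits below anything an ordinary thread can have.
IDLE = Thread("idle", priority=-1)


def pick_next(threads):
    """Return the highest-priority ready thread; idle is always a candidate."""
    candidates = [t for t in threads if t.ready] + [IDLE]
    return max(candidates, key=lambda t: t.priority)


threads = [Thread("worker", priority=8), Thread("ui", priority=10)]
print(pick_next(threads).name)   # ui

for t in threads:
    t.ready = False              # everyone is now waiting on something
print(pick_next(threads).name)   # idle
```

Because the idle thread is always in the candidate list, `pick_next` never has to handle an empty list, and the idle thread only wins when nothing else wants the CPU.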
Now you might go "but why bother with an idle process at all?" There are a few reasons described in https://embeddedgurus.com/stack-overflow/2013/04/idling-along/. A few of these are:
- "Petting the watchdog": a watchdog is a hardware timer that resets the system if it overflows. This is for reliability reasons in case the system hangs. Petting the watchdog resets the timer so that it doesn't overflow.
- Power saving: if the CPU is running the idle process, you know the CPU has nothing else to do. Depending on how aggressive you want the power saving to be, you can put the CPU to sleep as soon as it enters the idle process or only after other conditions have been met.
- Debug/logging tasks (not mentioned in the article, but in a comment): if your tasks are time sensitive, you may not want them to dump stuff for logging or debugging, since that work can take an unpredictable amount of time. So you shove it to the idle task.
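Putting those three jobs together, an idle task might look roughly like the toy model below. Everything here is a stand-in: `ToySystem` and its attributes are invented for the sketch, and on real hardware the watchdog pet and the sleep would be register writes or CPU instructions rather than Python attributes.

```python
from collections import deque


class ToySystem:
    """Stand-in for hardware; every attribute here is invented for the sketch."""

    def __init__(self):
        self.watchdog_pets = 0
        self.pending_logs = deque()
        self.written_logs = []
        self.sleeping = False

    def log_later(self, message):
        """Called by time-sensitive tasks: queue the message, don't write it now."""
        self.pending_logs.append(message)

    def idle_task(self):
        """One pass of the idle loop, run only when nothing else is ready."""
        self.watchdog_pets += 1                 # pet the watchdog
        if self.pending_logs:                   # do deferred, non-urgent work
            self.written_logs.append(self.pending_logs.popleft())
            self.sleeping = False
        else:                                   # truly nothing to do: save power
            self.sleeping = True


system = ToySystem()
system.log_later("sensor read took 3 ms")
system.idle_task()   # flushes the queued log line
system.idle_task()   # nothing left, so the CPU would be put to sleep
print(system.written_logs, system.sleeping, system.watchdog_pets)
```

The time-sensitive task only pays for a quick queue append; the slow, unpredictable part of logging happens when the system had nothing better to do anyway.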