
[Guide] Hyper-threading and Windows, explained for real

4 minutes ago, straight_stewie said:

I thought a context switch had to occur to allow "multitasking" on a single core?

Warning: this is a question, and it may not be a correct description of how context switching works:

We have two programs, 0 and 1. Each program has two instructions, A(n) and B(n). First we run instruction A0. At the end of A0, we save all of program 0's state as well as the program counter. Next, we run a subroutine that loads program 1 into memory and sets the program counter accordingly. Now we start execution at A1. Halfway through executing A1, the OS interrupts us (there's a lot of complex machinery that allows that to happen which I don't fully understand yet) and tells us we need to go back to program 0. So we roll program 1 back to the state it was in before we started executing A1, and save its state and the program counter. We then run the context-switch subroutine to load program 0 and its saved program counter. Execution of program 0 now begins at instruction B0.

So my question is, isn't hyper-threading just really optimized context switching? 

Perhaps it is... I have to be honest, I'm not a computer engineer or anything like that so the exact goings-on at that level are unfamiliar to me, but my understanding is that it simply allows the existing cores to be active more of the time - it gets more out of them.  It's not actually additional hardware.  So, perhaps that is an apt description: that it is high-end context switching :D  I think it is a little different though.  It's sort of like having 4 arms so you can juggle two tasks at once rather than putting one down to do the other.

Solve your own audio issues  |  First Steps with RPi 3  |  Humidity & Condensation  |  Sleep & Hibernation  |  Overclocking RAM  |  Making Backups  |  Displays  |  4K / 8K / 16K / etc.  |  Do I need 80+ Platinum?

If you can read this you're using the wrong theme.  You can change it at the bottom.


Something I was just thinking as I was skimming the last few posts ...

 

How did it work in previous decades, when we still had multitasking-capable OSes but only single-core, non-hyperthreaded CPUs? Examples would be the Pentium III and earlier (and most P4s), all the way back to at least the 386. (I think the 286 and 8086 also had some limited multitasking capability; correct me if I'm wrong.) For OSes, that would be Windows XP (which I believe came out before multi-threaded CPUs existed on the consumer side), back to at least Windows 3.0, possibly 1.0, right?

 

If a user had multiple concurrent tasks on, say, a 386 running Windows 3.1 - an antivirus scan, a video render, playing Wolfenstein 3D, zipping a folder, etc., all at once - how did that work then, compared to modern hyper-threading and multi-core? (I know, of course, that modern multithreading gives much better performance.)


10 minutes ago, PianoPlayer88Key said:

How did it work in previous decades, when we still had multitasking-capable OSes but only single-core, non-hyperthreaded CPUs? Examples would be the Pentium III and earlier (and most P4s), all the way back to at least the 386. (I think the 286 and 8086 also had some limited multitasking capability; correct me if I'm wrong.) For OSes, that would be Windows XP (which I believe came out before multi-threaded CPUs existed on the consumer side), back to at least Windows 3.0, possibly 1.0, right?

 

I'm researching this from a CS-level perspective as we speak. The short answer is that old single-core processors used timeslicing only, while modern processors use either "symmetric multiprocessing" (SMP) or "simultaneous multithreading" (SMT). SMP is just having two cores with one thread each, while Hyper-threading is Intel's proprietary SMT implementation. I'll have more info and links to good articles in a while.

The thing to remember about single-core multitasking is that it is some form of timeslicing. With timeslicing, a single core never runs two things at the same instant; the idea is for the core to switch tasks quickly and regularly, so that programs running on it appear to run simultaneously even though they are not.
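To make the timeslicing idea concrete, here's a toy sketch in Python (all names and structure are mine, purely illustrative; a real kernel saves registers and the program counter on a timer interrupt, and here a generator's suspended frame stands in for that saved state):

```python
# Toy round-robin timeslicer. Each task runs one small "slice",
# then its state is saved (the generator frame) and it goes to the
# back of the run queue - the core only ever runs one task at a time.

def make_task(name, steps):
    """A task that runs in small steps, yielding after each one."""
    for i in range(steps):
        yield f"{name} step {i}"

def run_timesliced(tasks):
    """Interleave tasks one step at a time, like a single core
    context-switching on every timer tick."""
    log = []
    while tasks:
        task = tasks.pop(0)          # pick the next runnable task
        try:
            log.append(next(task))   # run one "timeslice"
            tasks.append(task)       # save state, requeue at the back
        except StopIteration:
            pass                     # task finished; drop it
    return log

log = run_timesliced([make_task("A", 2), make_task("B", 2)])
print(log)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```

The output interleaves A and B even though only one of them is ever executing, which is exactly the "appears simultaneous" illusion described above.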

ENCRYPTION IS NOT A CRIME


Below are some pertinent articles that should be read in the order presented:

  1. University of Alaska, Fairbanks: a very brief, high-level introduction to the history of multitasking methods.
  2. Simultaneous multithreading: a Wikipedia article that outlines the benefits and high-level methods of SMT.
  3. Hyper-threading: a Wikipedia article that outlines Hyper-threading.

The gist of what I have picked up is that SMT works in the following manner: each core is designed in the traditional superscalar way, but with one departure from traditional design: each physical core has two instruction pipelines and two state registers (not just the machine's flag register, but a register used to store the state the processor is in at a given point of program execution, which happens to include the flag register).

Hyper-threading (SMT) is nothing but highly optimized, hardware aided, context switching. All it does is remove the need to flush and reload the instruction pipeline and state registers every time a context switch occurs. A cache flush may have to take place during any context switch however, and context switching to new programs (programs which have not previously executed on that specific logical core) will still require a flush of the logical core that the new program is to execute on.


4 minutes ago, straight_stewie said:

Hyper-threading (SMT) is nothing but highly optimized, hardware aided, context switching. All it does is remove the need to flush and reload the instruction pipeline and state registers every time a context switch occurs. A cache flush may have to take place during any context switch however, and context switching to new programs (programs which have not previously executed on that specific logical core) will still require a flush of the logical core that the new program is to execute on.

That makes a lot of sense.  So it is quite similar, but hyper-threading "the thing" is different than context switching "the concept/act"


8 minutes ago, Ryan_Vickers said:

That makes a lot of sense.  So it is quite similar, but hyper-threading "the thing" is different than context switching "the concept/act"

Yes I suppose that's correct.

I would like to take this opportunity to point out that not all processors implement two instruction pipelines for two logical cores on a single physical core. Some instead use a single, very large, heavily optimized pipeline with out-of-order execution. This allows instructions that can run at the same time to run at the same time, even if they exist in separate logical threads. This design is much more complex (and therefore expensive), but results in higher performance, as it allows some true simultaneous execution.

Computers truly are awesome, wondrous things.


2 minutes ago, straight_stewie said:

Yes I suppose that's correct.

I would like to take this opportunity to point out that not all processors implement two instruction pipelines for two logical cores on a single physical core. Some instead use a single, very large, heavily optimized pipeline with out-of-order execution. This allows instructions that can run at the same time to run at the same time, even if they exist in separate logical threads. This design is much more complex (and therefore expensive), but results in higher performance, as it allows some true simultaneous execution.

You seem to know how this all works :)  What do you think of my tests and results?


7 minutes ago, Ryan_Vickers said:

You seem to know how this all works

Barely. I couldn't design one at all (I've tried to build TTL machines, but I can never get them to be what I want them to be), but it is a topic of much self-directed learning for me. 

 

7 minutes ago, Ryan_Vickers said:

What do you think of my tests and results?

I think they were logical and scientifically sound tests, and the results are accurate and make sense. I also think they prove some interesting points about how much of a performance hit a context switch creates, and how it's beneficial to run non-interdependent threads from the same program on separate physical cores as much as possible.


16 minutes ago, straight_stewie said:

Hyper-threading (SMT) is nothing but highly optimized, hardware aided, context switching. All it does is remove the need to flush and reload the instruction pipeline and state registers every time a context switch occurs. A cache flush may have to take place during any context switch however, and context switching to new programs (programs which have not previously executed on that specific logical core) will still require a flush of the logical core that the new program is to execute on.

In fact, going back and re-reading my other two descriptions with this in mind is satisfying, because I think my analogies were actually spot on (but feel free to disagree :P)

On 5/5/2016 at 8:17 PM, Ryan_Vickers said:

Imagine a store with many customers, and several cashiers that each have one till.  Each customer checking out is like a thread, and each cashier is like a core.  Now what happens when an old lady is rummaging for change?  That cashier - that core - is still occupied with that customer - that thread - but it's not really doing anything, just waiting.  This happens in real programs as well.  Sometimes a task gets to the CPU and then realizes, "oh wait, I need something from memory".  In the nanoseconds it spends fetching that, the CPU is occupied but not actually accomplishing anything.  Imagine if that cashier had a second till - the cashier is still only able to work so fast, but at least they can make use of their "spare time" more effectively.  Now, with a store full of customers, several cashiers, and 2 tills per cashier, each cashier can work on checking out a customer, and if they are held up for some reason, they can use their other till to start checking out another customer.  This is hyper-threading in a nutshell.  It is not another core, or anything like that - it is just a way for the existing hardware to be used more effectively.  In theory, 1 core with hyper-threading and 1 core without will perform an identical task in exactly the same amount of time (assuming the cores are the same in every other way), but if faced with 2 tasks, the hyper-threaded core will be faster.  Probably not twice as fast, but faster for sure - maybe ~50%, depending on the task, though in theory it could be anywhere from no better to twice as good.

43 minutes ago, Ryan_Vickers said:

...my understanding is that it simply allows the existing cores to be active more of the time - it gets more out of them.

...

It's sort of like having 4 arms so you can juggle two tasks at once rather than putting one down to do the other.

 

===================

 

3 minutes ago, straight_stewie said:

I think they were logical and scientifically correct tests. The results of which are accurate and make sense. I also think that they prove some interesting points about how much of a performance hit a context switch creates, and how it's beneficial to run threads from the same program on separate physical cores as much as possible.

Thanks :) I was just thinking that the whole thing about different pipeline implementations could have some implications for the applicability of my results but that's good to hear :D


2 minutes ago, Ryan_Vickers said:

I was just thinking that the whole thing about different pipeline implementations could have some implications for the applicability of my results

It shouldn't affect your results at all. Your goal was only to see whether logical cores mapped to specific physical cores or were assigned arbitrarily. Goal accomplished. Your performance-oriented test wouldn't be affected either: the trends would remain the same.

The pipelining thing is way more complicated than it sounds. I don't know all the details, but I do understand that it only lets you simultaneously run instructions that do not use the same resources. So you cannot run two add instructions simultaneously on the same adder. You could, however, have program 0 fetch something from memory while program 1 does an add. This also starts to get into how instructions work at the hardware level, and the different stages of instruction execution (in other words: can we offset instructions by a certain number of stages so that they never interfere with each other, but execute in less time than running the two instructions sequentially?). Modern control units are more complex than entire processors of yesteryear.


I'm about to go to bed (yes I know it's 5:13am in my time zone), but as I was skimming the posts (hope to have more time to look at the links later), I was remembering things I've heard about some of the AMD CPUs.  For example, ones like the FX-8xxx series that claim to be 8 cores, but have shared FPUs, like 2 per core.  I wonder if that'd be another twist on hyperthreading?  Like, effectively 4 cores with HT?


4 hours ago, PianoPlayer88Key said:

For example, ones like the FX-8xxx series that claim to be 8 cores, but have shared FPUs, like 2 per core.  I wonder if that'd be another twist on hyperthreading?  Like, effectively 4 cores with HT?

FPU stands for Floating Point Unit. It works much like an Arithmetic Logic Unit (ALU), and is usually part of the ALU, but the FPU specifically handles only floating-point operations. Assuming that SMT is enabled with two state registers and two instruction pipelines, having 2 FPUs and 1 integer ALU per physical core would essentially turn each physical core into two physical cores with a shared integer ALU.

The benefit of this only exists for people who do a lot of floating-point math on their machines. For the average user who just surfs the web and games, there will actually be no performance benefit over a design without the extra FPU.


Another thing I'd really like to see, is some kind of inverse hyperthreading, or whatever it'd be called.  With that, if you have a multi-core/thread CPU, if you're running a single-threaded workload, the CPU will split it among multiple cores.  Operating system / software support wouldn't be needed, it'd all be done in the CPU, I'm thinking.  For example, if you ran CineBench R15 under Windows 7 on a CPU that supported that, like ... well, if the 6700K supported it ... if it gets 805 cb in the multi-threaded score, it'd also get 805 cb in the single-thread score.  (Now, where software WOULD come in would be if you wanted to force a thread to run on fewer cores.)

 

I just feel like we haven't had nearly as much progress in single-threaded performance year-by-year between, say, 2006 and today (2016), as we had between 1996 and 2006 or 1986 to 1996.


2 minutes ago, PianoPlayer88Key said:

Another thing I'd really like to see, is some kind of inverse hyperthreading, or whatever it'd be called.  With that, if you have a multi-core/thread CPU, if you're running a single-threaded workload, the CPU will split it among multiple cores.  Operating system / software support wouldn't be needed, it'd all be done in the CPU, I'm thinking.  For example, if you ran CineBench R15 under Windows 7 on a CPU that supported that, like ... well, if the 6700K supported it ... if it gets 805 cb in the multi-threaded score, it'd also get 805 cb in the single-thread score.  (Now, where software WOULD come in would be if you wanted to force a thread to run on fewer cores.)

 

I just feel like we haven't had nearly as much progress in single-threaded performance year-by-year between, say, 2006 and today (2016), as we had between 1996 and 2006 or 1986 to 1996.

That would be the holy grail of computing but sadly I feel it is probably impossible.


3 minutes ago, PianoPlayer88Key said:

Another thing I'd really like to see, is some kind of inverse hyperthreading, or whatever it'd be called.  With that, if you have a multi-core/thread CPU, if you're running a single-threaded workload, the CPU will split it among multiple cores.

What? The point of threads is that each thread should be the smallest part of a program that cannot be executed in parallel. Splitting one thread across more cores would do no good: by design, each thread is a portion of a program that can only execute sequentially, and multiple threads are spawned to run more than one sequential task at a time.
 

I see where you are trying to go. I actually thought about this last night. The answer is better compilers/developers.


1 minute ago, straight_stewie said:

What? The point of threads is that each thread should be the smallest part of a program that cannot be executed in parallel. Splitting one thread across more cores would do no good: by design, each thread is a portion of a program that can only execute sequentially, and multiple threads are spawned to run more than one sequential task at a time.
 

I see where you are trying to go. I actually thought about this last night. The answer is better compilers/developers.

I think what he's saying is using the power of multiple cores to all work together on a single thread to boost performance by a factor of however many more cores are thrown at it


@Ryan_Vickers maybe so. :( I still would like the single-threaded performance to catch up to where we would be if the pace hadn't slowed down.  Also, improved IPC from one generation to the next would be good.  5% isn't NEARLY good enough for me.  I was reading recently that the 286 was 2x or more faster than the 8086 at the same clock speed.

 

And, I like to skip a generation or two when I upgrade.  When I do upgrade, I would prefer to get at least 2-3x, preferably 5x better performance at the same price.  A measly little 5%, 10%, even 25%, to me isn't worth upgrading.  (Exceptions can be made, albeit for a much lower budget, if my system dies catastrophically before I was otherwise ready to upgrade.)

 

Also with the trend toward better power efficiency ... I wonder if we'll ever see an enthusiast-class CPU and GPU ($350 to $1000 range) that at stock settings doesn't need a fan or even a heatsink, even with ambient temps around 40°C?  I think 15-20 years ago, even the highest-end consumer products didn't have any coolers on them, right?  Although they weren't nearly as capable then, I realize. :)

Also, for overclocking, I think only 1 or 2 bins on the multiplier above stock turbo isn't really worth it.  I prefer to have a more significant overclock, like in terms of percentages, what the i5-2500K, Pentium G3258 can do, one of the 3D Mark achievements on steam (50% overclock), or the turbo boost on some low-power mobile CPUs (like ones that are like 1.0 GHz stock and boost up to like 2.8 or whatever they do).

 

And yes, that's exactly what I'm saying re: what @straight_stewie said - using multiple cores to boost a single thread's performance by however many cores are thrown at the thread.  Now, one possible caveat, is I'm guessing that a CPU that supports conventional hyperthreading, you wouldn't get the extra hyperthreads in performance.  For example, my 4790K is 4 GHz at stock, not counting turbo boost.  It's 4 cores, 8 threads.  A single-threaded workload would be effectively working at like 16 GHz in the example I'm thinking of, not 32 GHz.  (Now if you could get some of that extra performance, like effectively 20 or even 24 GHz, that'd be nice.)

 

Also, "the answer is better compilers/developers" - but that would require new programs, or existing programs to be recoded.  My idea would accommodate already-existing software that wasn't optimized to handle it.  This was something I think we really needed back when multi-core CPUs first came out, when almost all software wasn't optimized for them.

 

Although, another thing I thought of too ... or am I the only one who does this?  Running multiple programs at once, even to the point of having more things running simultaneously than I have CPU threads, like running a few VMs, several actively-refreshing pages open in a web browser, playing a game, rendering a video, doing an audio edit, applying an effect in a batch of photos, archiving/zipping a folder, running 3dmark firestrike, prime95, cinebench, and several other things.  Sometimes I've had my 4790K pegged at 100% usage. :o 


14 minutes ago, Ryan_Vickers said:

I think what he's saying is using the power of multiple cores to all work together on a single thread to boost performance by a factor of however many more cores are thrown at it

The only benefit of multiple cores is simultaneous execution. This means your problem must be expressible as a parallel problem to receive any benefit from running on multiple threads. I will give three examples:

  1. Do 3+5=x; Do 3+x=y; This is a purely sequential problem. The second "Do" statement cannot complete until the first has completed. There is no way to run this on multiple threads and receive a performance benefit.
  2. Do 3+5=x; Do 4+6=y; This is a purely parallel problem. Each "Do" statement can run in its own thread, which would theoretically halve the time the task takes. Parallel processing is of huge benefit here.
  3. Do 3+5=x; Do 4+6=y; Do y+x=z; This is a sequential/parallel problem, and it is where most optimizations come into play. The first two "Do" statements can run simultaneously, but the third cannot run until both of the first two have completed. To accomplish this, you would run the first two statements in threads 0 and 1, and the third "Do" statement in thread 0 once both threads have completed.

The way to increase and fully utilize the benefits of parallel processing is to: 1) write software more intelligently with regard to what can and cannot be executed concurrently, and 2) write compilers that better catch where developers fall short on step 1 and apply optimizations to increase the use of parallel processing where applicable.
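The three "Do" examples can be sketched with Python threads (purely illustrative; with additions this trivial, thread overhead swamps any real speedup, so this shows the structure of the dependencies, not the performance):

```python
from concurrent.futures import ThreadPoolExecutor

# 1. Purely sequential: the second step needs the first's result,
#    so there is nothing to run in parallel.
x = 3 + 5
y = 3 + x           # cannot start until x exists

# 2. Purely parallel: the two steps are independent,
#    so each can run in its own thread.
with ThreadPoolExecutor() as pool:
    fx = pool.submit(lambda: 3 + 5)
    fy = pool.submit(lambda: 4 + 6)
    x2, y2 = fx.result(), fy.result()

# 3. Mixed: run the independent steps in parallel, then join
#    before the step that depends on both.
with ThreadPoolExecutor() as pool:
    fx = pool.submit(lambda: 3 + 5)
    fy = pool.submit(lambda: 4 + 6)
    z = fx.result() + fy.result()   # waits for both, then combines

print(x, y, x2, y2, z)  # 8 11 8 10 18
```

The `result()` calls in case 3 are the "wait for both threads to complete" step from the example above.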


9 minutes ago, straight_stewie said:

The only benefit to multiple cores is simultaneous execution. This means that your problem must be able to be turned into a parallel problem for it to receive any benefit from running on multiple threads. I will give three examples:

  1. Do 3+5=x; Do 3+x=y; This is purely a sequential problem. The second "Do" statement cannot be successfully completed until the first "Do" statement has completed. There is no possible way to make this run on multiple threads and receive a performance benefit.
  2. Do 3+5=x; Do 4+6=y; This is a purely parallel problem. Each "Do" statement can be run inside of an individual thread, which would theoretically halve the time it takes to complete the task. Parallel processing is of huge benefit here.
  3. Do 3+5=x; Do 4+6=y; Do y+x=z; This is a sequential/parallel problem. This is where most optimizations come into play. The first two "Do" statements can be run simultaneously, however, the third "Do" statement cannot be run until both of the first two statements have completed. To accomplish this, you should run the first two statements in threads 0 and 1, and the third do statement in thread 0 upon completion of both threads.

The way to increase and fully utilize the benefits of parallel processing is to: 1) Write software more intelligently in regards to what can be concurrently executed and what cannot, and 2) Write compilers that can better catch the failures of developers to do step #1 and apply better optimizations to increase the use of parallel processing when applicable.

Yes, I'm well aware of this.  In fact IIRC there's a bit in my OP about this ;)  But in a magical world, being able to use multiple cores to work on one thread would be amazing! :D


Okay, then, so how would you improve the performance of a sequential thread by clock speed * number of threads?


6 minutes ago, PianoPlayer88Key said:

Okay, then, so how would you improve the performance of a sequential thread by clock speed * number of threads?

Number of threads has no bearing on sequential problems. The only ways to improve their performance are to increase clock speed and remove inefficiencies like slow memory. RAM is the largest performance bottleneck in modern computers; for some reason, memory speed hasn't kept pace with improvements in all other areas.
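To put rough numbers on the "slow memory" point (the clock speed and latency figures below are assumed round numbers for illustration, not measurements of any particular chip):

```python
# How many CPU cycles does one main-memory access cost?
clock_hz = 3e9            # assumed 3 GHz core clock
dram_latency_s = 100e-9   # assumed ~100 ns main-memory access latency

# Cycles the core spends stalled waiting on a single memory access:
stalled_cycles = dram_latency_s * clock_hz
print(round(stalled_cycles))  # ~300 cycles that could have been spent computing
```

Hundreds of wasted cycles per miss is exactly the kind of idle time that the latency-hiding tricks discussed earlier (SMT, out-of-order execution) try to fill.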


10 minutes ago, PianoPlayer88Key said:

Okay, then, so how would you improve the performance of a sequential thread by clock speed * number of threads?

You can't.  Not now, and (I think) not ever.  It just doesn't work that way.  That's why I said that if you could, it would be the "holy grail" of computing.  People could just write sequential tasks and let the hardware magically tackle them with 16 cores :D


@Ryan_Vickers Ahh, so I guess we just have to do parallel computing then.

@straight_stewie: I thought storage speed was a major bottleneck?  (Especially in the case of my dad's Dell D830 laptop - sure, the Core 2 Duo T7250 isn't all that fast, and having only 2 GB of DDR2-667 RAM isn't as fast as, say, 128 GB of the new G.Skill or Corsair DDR4-4266 RAM, but they're using a spinning hard drive.)  I'm guessing you didn't mention storage as a bottleneck because modern computers are expected to have an SSD, right?  But even some SSDs are faster than others.  An Intel 750 would run circles around, say, a Crucial BX100, wouldn't it?

 

Another thing I'd like to see improved is bootup times from a cold start.  For example, a TI-30 calculator is ready to go within a fraction of a second after hitting the power button.  I'd like to see the same on a PC - from hitting the power button to the Windows desktop being fully ready to go, including post, etc, would be like 0.2 seconds. :)

 

I mentioned Cinebench earlier, and I was just remembering another thing.  Back in the early 1990s, my brother would do fractals on my dad's '286.  On many of them, he'd let it run overnight and part of the next day, cause that's how long it would take to do one fractal.  And, zooming in on it would require just as long to render the new view, iirc.  OTOH, on a more modern CPU like my i7-4790K, it seems like you can do like 15-20, maybe 30fps, instead of 1 frame every 24 hours.

How long might it be before cinebench is fast enough so you can draw the entire scene at like 30fps, instead of one frame every 10 minutes or however long it takes cb to run currently? :)  Would it take about as long as in the Fractals example I gave?


2 minutes ago, PianoPlayer88Key said:

Another thing I'd like to see improved is bootup times from a cold start.  For example, a TI-30 calculator is ready to go within a fraction of a second after hitting the power button.  I'd like to see the same on a PC - from hitting the power button to the Windows desktop being fully ready to go, including post, etc, would be like 0.2 seconds. :)

Well, there are two parts to booting.  There's the POST/BIOS stage, where the motherboard is going through all its devices and doing its thing, and then booting the OS, which AFAIK is bottlenecked primarily by your storage speed.  It would be interesting to try to make the former faster, but remember that a RAMPAGE EXTREME or something like that is more advanced than a TI-30 ;)  As for the latter, like I said, faster boot drives will get us there eventually.

2 minutes ago, PianoPlayer88Key said:

I mentioned Cinebench earlier, and I was just remembering another thing.  Back in the early 1990s, my brother would do fractals on my dad's '286.  On many of them, he'd let it run overnight and part of the next day, cause that's how long it would take to do one fractal.  And, zooming in on it would require just as long to render the new view, iirc.  OTOH, on a more modern CPU like my i7-4790K, it seems like you can do like 15-20, maybe 30fps, instead of 1 frame every 24 hours.

How long might it be before cinebench is fast enough so you can draw the entire scene at like 30fps, instead of one frame every 10 minutes or however long it takes cb to run currently? :)  Would it take about as long as in the Fractals example I gave?

If the improvements continue at the rate of doubling performance every 2 years (which at this point is an unlikely best-case scenario), we can calculate this rather easily:

 

1 frame in 10 minutes = 1/600 fps ≈ 0.00167 fps

30 fps = 30 fps (duh :P)

 

30 ÷ (1/600) = 18000

 

So the CPUs will need to be 18000 times faster.  2^x = 18000 → x = log2(18000) ≈ 14.14

So they will have to double that many times, which means it should take ~28.3 years :)
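The arithmetic above can be checked in a few lines (same numbers as the post, nothing new assumed):

```python
import math

# One Cinebench frame in 10 minutes, target 30 fps, doubling every 2 years.
current_fps = 1 / 600
target_fps = 30

speedup = target_fps / current_fps  # how many times faster we need to be
doublings = math.log2(speedup)      # solve 2**x = speedup
years = 2 * doublings               # 2 years per doubling

print(round(speedup), round(doublings, 2), round(years, 1))  # 18000 14.14 28.3
```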



7 minutes ago, PianoPlayer88Key said:

I thought storage speed was a major bottleneck?  (Especially in the case of my dad's Dell D830 laptop - sure, the Core 2 Duo T7250 isn't all that fast, and having only 2 GB of DDR2-667 RAM isn't as fast as, say, 128 GB of the new G.Skill or Corsair DDR4-4266 RAM, but they're using a spinning hard drive.)  I'm guessing you didn't mention storage as a bottleneck because modern computers are expected to have an SSD, right?  But even some SSDs can be faster than others.  An Intel 750 would run circles around, say, a Crucial BX100, wouldn't it?

 

I was ignoring non-volatile storage.  Even so, most execution comes from things stored in RAM.  Normally, the processor will delegate the task of loading programs from long-term storage into RAM to the memory controller, and then continue doing other tasks while it waits.
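That "keep working while the load completes" idea can be sketched with a background thread standing in for the delegated transfer (a simplified software analogy of my own, not how a memory controller actually signals the CPU):

```python
import threading
import time

# Simplified analogy: a background "load" (stand-in for a delegated
# transfer from storage into RAM) completes while the foreground
# keeps doing useful work instead of idling.

loaded = {}

def slow_load(name: str) -> None:
    """Pretend to fetch a program from storage into memory."""
    time.sleep(0.1)  # stand-in for storage latency
    loaded[name] = "ready"

t = threading.Thread(target=slow_load, args=("program1",))
t.start()

# Foreground work proceeds instead of blocking on the load.
total = sum(i * i for i in range(100_000))

t.join()  # wait only at the point the loaded data is actually needed
print(loaded["program1"], total > 0)
```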

 

11 minutes ago, PianoPlayer88Key said:

Another thing I'd like to see improved is bootup times from a cold start.  For example, a TI-30 calculator is ready to go within a fraction of a second after hitting the power button.  I'd like to see the same on a PC - from hitting the power button to the Windows desktop being fully ready to go, including post, etc, would be like 0.2 seconds. :)

 

I mentioned Cinebench earlier, and I was just remembering another thing.  Back in the early 1990s, my brother would do fractals on my dad's '286.  On many of them, he'd let it run overnight and part of the next day, cause that's how long it would take to do one fractal.  And, zooming in on it would require just as long to render the new view, iirc.  OTOH, on a more modern CPU like my i7-4790K, it seems like you can do like 15-20, maybe 30fps, instead of 1 frame every 24 hours.

How long might it be before cinebench is fast enough so you can draw the entire scene at like 30fps, instead of one frame every 10 minutes or however long it takes cb to run currently? :)  Would it take about as long as in the Fractals example I gave?


I really don't know.  I have yet to successfully thought-experiment my way through, or design, a computer that requires a multitasking operating system to function.  I have yet to grasp exactly how the boot process works, although I understand the basic steps at face value.

All that stuff about CineBench and fractals, I have no clue. TBH I'm unclear as to what exactly CineBench is.

ENCRYPTION IS NOT A CRIME

