Mira Yurizaki

Reputation Activity

  1. Informative
    Mira Yurizaki got a reaction from DGBuildsPCs for a blog entry, Does Making Windows 10 Lean Do Anything For Gaming Performance?   
    There's been some talk about Microsoft implementing a "gaming mode" for Windows, which should help improve the performance of games, or at least minimize the impact of the OS on gaming performance. What exactly that means is up in the air. However, I decided to take a stab at something that could be like it and create a lean build of Windows 10. That is, a build where a lot of Windows' components are either disabled, deleted, or otherwise no longer a factor.
     
    The Setup
    I'm using my computer for this test, which has the following specs:
     
    Core i7-6700
    16GB of RAM running at DDR4-2133
    EVGA GeForce GTX 1080 SC ACX 3.0
    256GB Samsung 950 Pro and 1TB Samsung 850 EVO
     
    I created a 128GB partition on my 850 EVO for this, as I wasn't going to mess with my current build. Then I installed Windows 10 Pro with all of the settings left at their defaults, using a Windows 10 build from around December 2016. After that, I installed only the following:
     
    Intel's chipset drivers
    NVIDIA's drivers, and only the drivers.
    The drivers for the Realtek ALC1150 chip
    3DMark
    Unigine Heaven 4.0
    Steam
     
    The following tests will be used:
     
    3DMark
      - Sky Diver (default settings)
      - Fire Strike Extreme (default settings)
      - Time Spy (default settings)
    Unigine Heaven
      - Extreme preset, with the resolution bumped up to 2560x1440
    Final Fantasy XIV Heavensward Benchmark
      - Default settings on the "High (Desktop)" quality preset
    GTA V
      - 2560x1440 resolution
      - No MSAA (including reflection MSAA, which was left off)
      - All settings on their highest
      - All the advanced graphics settings set to their highest
      - Only Pass 4 (the final one) is reported, for brevity
    Deus Ex: Mankind Divided
      - 2560x1440 resolution
      - No MSAA
      - DX11
      - Very High preset
    F1 2016
      - 2560x1440 resolution
      - Ultra High preset
    All tests were run three times and their results averaged.
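    Since every test is just three runs averaged, and the "lean vs. vanilla" comparison at the end is each Vanilla result expressed as a percentage of the corresponding Lean result, the bookkeeping behind the tables below is simple. Here's a minimal sketch of that arithmetic (Python purely for illustration), using the Sky Diver overall scores from the results tables:

```python
# Average three benchmark runs and express one build as a percentage of the other.
def average(runs):
    return sum(runs) / len(runs)

vanilla_runs = [35045, 35046, 34835]  # Sky Diver overall score, vanilla build
lean_runs = [35089, 35177, 34987]     # Sky Diver overall score, "lean" build

v_avg, l_avg = average(vanilla_runs), average(lean_runs)
print(f"Vanilla: {v_avg:.2f}  Lean: {l_avg:.2f}  Vanilla as % of Lean: {v_avg / l_avg * 100:.2f}%")
```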

     
    The Vanilla Build Results
    While I forgot to get a screen cap of RAM usage and processes going on, here's what it's like on my current build:
    Memory Usage:

     
    Processes:


     
    And here are the results:
     
     
    3DMark - Sky Diver | Score | Graphics | Physics | Combined
    Run 1   | 35045 | 70351 | 11034 | 23368
    Run 2   | 35046 | 71077 | 11003 | 22992
    Run 3   | 34835 | 70469 | 10904 | 23218
    Average | 34975.33333 | 70632.33333 | 10980.33333 | 23192.66667

    3DMark - Fire Strike Extreme | Score | Graphics | Physics | Combined
    Run 1   | 9588 | 10643 | 11682 | 4765
    Run 2   | 9567 | 10619 | 11632 | 4762
    Run 3   | 9553 | 10595 | 11662 | 4756
    Average | 9569.333333 | 10619 | 11658.66667 | 4761

    3DMark - Time Spy | Score | Graphics | Physics
    Run 1   | 6702 | 7229 | 4627
    Run 2   | 6699 | 7275 | 4626
    Run 3   | 6719 | 7286 | 4664
    Average | 6706.666667 | 7263.333333 | 4639

    Unigine Heaven | Average FPS | Score | Min FPS | Max FPS
    Run 1   | 64.7 | 1629 | 29.8 | 143.8
    Run 2   | 65.4 | 1648 | 8.5 | 144.9
    Run 3   | 64.8 | 1633 | 28.8 | 143.7
    Average | 64.96666667 | 1636.666667 | 22.36666667 | 144.1333333

    FFXIV Heavensward | Score | Average FPS | Total Load Time (s)
    Run 1   | 14591 | 112.85 | 18.64
    Run 2   | 14506 | 112.336 | 18.773
    Run 3   | 14549 | 112.808 | 18.927
    Average | 14548.66667 | 112.6646667 | 18.78

    GTA V | Pass 4 Min FPS | Pass 4 Max FPS | Pass 4 Avg FPS
    Run 1   | 41.248425 | 153.773926 | 82.296112
    Run 2   | 36.866974 | 156.470566 | 81.178818
    Run 3   | 40.984291 | 145.447479 | 75.742538
    Average | 39.69989667 | 151.8973237 | 79.739156

    Deus Ex: Mankind Divided | Avg FPS | Min FPS | Max FPS
    Run 1   | 62.4 | 50.5 | 77.3
    Run 2   | 62.1 | 50.5 | 76.6
    Run 3   | 62.2 | 50.5 | 76.9
    Average | 62.23333333 | 50.5 | 76.93333333

    F1 2016 | Avg FPS | Min FPS | Max FPS
    Run 1   | 92.3871 | 73.214142 | 111.866035
    Run 2   | 95.232292 | 79.891083 | 118.127655
    Run 3   | 94.716011 | 79.444923 | 116.410423
    Average | 94.111801 | 77.516716 | 115.4680377
    As a curiosity, I ran F1 2016 again, this time with its priority set to "Realtime". Though it did have better performance, it was only up by 4-5 FPS at most. The other games saw no change so I didn't bother with them.
     
    Making Windows 10 Lean
    One of the first things I did was go to Control Panel\Programs\Programs and Features > Turn Windows features on or off, where I removed the following:
    - IE11
    - Media Features
    - Microsoft Print to PDF
    - Print and Document Services
    - Remote Differential Compression API Support
    - SMB 1.0/CIFS File Sharing Support
    - Windows PowerShell 2.0
    - Work Folders Client
    - XPS Services
    - XPS Viewer
    After a reboot, I did the following:
    - Applied a registry hack pack that, while it does nothing for actual performance, reduces a lot of built-in wait times, so it gives the impression that performance improved (an example of this kind of tweak is sketched after this list).
    - In Control Panel\System and Security\System > Advanced System Settings, under the Performance section, set it to "Adjust for best performance." I did not touch the page file, as disabling it may make things worse.
    - In Control Panel\System and Security\Security and Maintenance, disabled SmartScreen.
    - Disabled hibernation.
    - In the Settings app, did a blanket disable on everything I didn't need.
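    I won't list every tweak in that hack pack, but to give one concrete example of the kind of wait-time setting involved (this is an illustration of the category, not necessarily one of the exact tweaks I applied), here's how you could zero out the submenu hover delay with Python's built-in winreg module:

```python
# Illustrative example only: one wait-time registry tweak of the sort such hack packs apply.
# MenuShowDelay is the hover delay (in milliseconds) before Windows expands a submenu.
import winreg

with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                    r"Control Panel\Desktop",
                    0, winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "MenuShowDelay", 0, winreg.REG_SZ, "0")  # default is "400"
```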
    Now came the question of all those Universal Windows Apps (UWAs), especially the system ones that run in the background all the time. I went to find where they lived, and well, they're spread out quite a bit. The first set is in C:\Program Files\WindowsApps under a super hidden folder that you need to take ownership of before you can even access it. I also had to use WinDirStat to even see the folder. It won't show up in Explorer, even if you disable "Hide system protected files and folders."
     
    The first thing I tried was selecting everything and deleting it, but half the folders wouldn't delete for some reason. Rather than fight it, I found out I could rename them instead. So I did:

     
    Except this isn't where all of them live. Some live in C:\Windows\SystemApps. This is a doozy, because if they live here, they must be vital right? Well, good thing they really aren't. Most of them anyway. However, I didn't delete these, I just renamed them.

     
    A few things to note here though:
    - ContactSupport_cw5n1h2txyewy is the Contact Support app
    - Microsoft.MicrosoftEdge_8wekyb3d8bbwe is Edge
    - Microsoft.Windows.Cortana_cw5n1h2txyewy is Cortana (if you disable Cortana, you also lose Start Menu search)
    - Microsoft.LockApp_cw5n1h2txyewy is the lock screen
    - ShellExperienceHost_cw5n1h2txyewy is the Start Menu
    - Microsoft.XboxGameCallableUI_cw5n1h2txyewy is the Xbox Live app
    In here I disabled everything but LockApp and ShellExperienceHost, because the lock screen can be disabled in other ways and not having a Start Menu is just odd (the taskbar will still work, though). Before you do this, make sure to unpin any apps from this list; otherwise they'll be stuck in limbo, still pinned but pointing at nothing.
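    For what it's worth, the "rename instead of delete" trick is easy to script. A rough sketch, assuming you've already taken ownership of the folder and are running elevated (the "_disabled" suffix and the keep-list here are my own illustration, not part of any Microsoft tooling):

```python
# Rough sketch of the "rename instead of delete" workaround for SystemApps.
# Assumes ownership has been taken and the script runs elevated.
import os

SYSTEM_APPS = r"C:\Windows\SystemApps"
KEEP = ("Microsoft.LockApp", "ShellExperienceHost")  # lock screen and Start Menu

for name in os.listdir(SYSTEM_APPS):
    if name.startswith(KEEP):
        continue  # leave the lock screen and Start Menu alone
    path = os.path.join(SYSTEM_APPS, name)
    if os.path.isdir(path):
        try:
            os.rename(path, path + "_disabled")
        except OSError as err:
            print(f"Could not rename {name}: {err}")  # e.g. the app is currently running
```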
     
    And lastly, yes there's more, there's one more place to disable UWAs, otherwise they'll automatically run. These live in C:\Users\[username]\AppData\Local\Packages:

     
    Now that the UWAs are out of the way, I went looking for services and the like to disable. To find which ones were safe to disable, I went to Black Viper's website. Anything listed as safe to disable was disabled.
     
    And just for kicks, I set the power profile to "High performance."
     
    So What Kind of Performance Does This Get Me Now?
    After letting Windows settle for a bit, I got it down to this:

     


     
    This shaved off nearly 600MB of RAM usage and got the process list down quite a bit. Note that the "Non-paged pool" is about half (~75MB) of what it is on my current build (~150MB); this pool holds core operating system data that always stays in RAM. So I must've done something good there! Right?
     
    Enough of that, what kind of performance are we getting now?
     
     
    3DMark - Sky Diver | Score | Graphics | Physics | Combined
    Run 1   | 35089 | 70950 | 10993 | 23352
    Run 2   | 35177 | 71106 | 11099 | 22911
    Run 3   | 34987 | 70397 | 10983 | 23418
    Average | 35084.33333 | 70817.66667 | 11025 | 23227

    3DMark - Fire Strike Extreme | Score | Graphics | Physics | Combined
    Run 1   | 9566 | 10603 | 11697 | 4768
    Run 2   | 9577 | 10634 | 11696 | 4750
    Run 3   | 9567 | 10609 | 11645 | 4775
    Average | 9570 | 10615.33333 | 11679.33333 | 4764.333333

    3DMark - Time Spy | Score | Graphics | Physics
    Run 1   | 6705 | 7287 | 4618
    Run 2   | 6689 | 7267 | 4614
    Run 3   | 6715 | 7282 | 4661
    Average | 6703 | 7278.666667 | 4631

    Unigine Heaven | Average FPS | Score | Min FPS | Max FPS
    Run 1   | 66.2 | 1667 | 27.1 | 144.9
    Run 2   | 66 | 1663 | 25 | 145.9
    Run 3   | 66 | 1663 | 29.7 | 146.1
    Average | 66.06666667 | 1664.333333 | 27.26666667 | 145.6333333

    FFXIV Heavensward | Score | Average FPS | Total Load Time (s)
    Run 1   | 14497 | 112.431 | 18.796
    Run 2   | 14550 | 112.619 | 18.573
    Run 3   | 14610 | 113.03 | 18.791
    Average | 14552.33333 | 112.6933333 | 18.72

    GTA V | Pass 4 Min FPS | Pass 4 Max FPS | Pass 4 Avg FPS
    Run 1   | 20.244783 | 154.143906 | 81.087631
    Run 2   | 39.342747 | 154.573441 | 82.079002
    Run 3   | 20.863869 | 115.898499 | 81.30619
    Average | 26.817133 | 141.5386153 | 81.490941

    Deus Ex: Mankind Divided | Avg FPS | Min FPS | Max FPS
    Run 1   | 62.4 | 50.2 | 76.9
    Run 2   | 62.1 | 50.5 | 76.9
    Run 3   | 62.1 | 49.8 | 76.6
    Average | 62.2 | 50.16666667 | 76.8

    F1 2016 | Avg FPS | Min FPS | Max FPS
    Run 1   | 97.294609 | 80.587448 | 119.962158
    Run 2   | 97.303444 | 81.302322 | 118.235237
    Run 3   | 95.821739 | 80.525665 | 118.570518
    Average | 96.80659733 | 80.805145 | 118.9226377
    Er... almost no change at all. With all of these system resources freed up, shouldn't performance also go up? Well, not really. Most of what I disabled and turned off either wasn't actively being used to begin with, or was mostly sitting around waiting for something to happen, with maybe some low-priority background activity kicking in when it detects you're not doing anything. And this makes sense: if the OS had components actively eating CPU time, something would be wrong. One of the OS's main jobs is to provide services to user applications when they need them, and those services should run either on demand or at a very low rate.
     
    But I do want to make a note on the user experience: I think it actually improved. Although I suspect that had more to do with the registry hack pack I applied and the removal of animations than with anything actually freeing up resources. Boot times were the same, if not slightly worse (perhaps due to Windows trying to find the applications whose folders I renamed). Shutting down, however, is practically instant.
     
    Conclusions
    If you have a high-end machine, you can expect almost no performance improvement from making a "lean" Windows build. For lower-end machines it might help, but I think what really helps is simply disabling a lot of the GUI fluff. Most of what Windows runs sits in the background, idling most of the time.
     
    I didn't bother reading, I just want the results of Vanilla Vs "Lean" Windows
    (The % Diff columns are the Vanilla result expressed as a percentage of the Lean result.)

    3DMark - Sky Diver (Score / Graphics / Physics / Combined)
    Run 1   | Vanilla: 35045 / 70351 / 11034 / 23368 | Lean: 35089 / 70950 / 10993 / 23352 | % Diff: 99.87460458 / 99.15574348 / 100.3729646 / 100.0685166
    Run 2   | Vanilla: 35046 / 71077 / 11003 / 22992 | Lean: 35177 / 71106 / 11099 / 22911 | % Diff: 99.62759758 / 99.95921582 / 99.13505721 / 100.353542
    Run 3   | Vanilla: 34835 / 70469 / 10904 / 23218 | Lean: 34987 / 70397 / 10983 / 23418 | % Diff: 99.56555292 / 100.1022771 / 99.28070655 / 99.1459561
    Average | Vanilla: 34975.33333 / 70632.33333 / 10980.33333 / 23192.66667 | Lean: 35084.33333 / 70817.66667 / 11025 / 23227 | % Diff: 99.68925169 / 99.7390788 / 99.59624279 / 99.85600489

    3DMark - Fire Strike Extreme (Score / Graphics / Physics / Combined)
    Run 1   | Vanilla: 9588 / 10643 / 11682 / 4765 | Lean: 9566 / 10603 / 11697 / 4768 | % Diff: 100.2299812 / 100.3772517 / 99.87176199 / 99.93708054
    Run 2   | Vanilla: 9567 / 10619 / 11632 / 4762 | Lean: 9577 / 10634 / 11696 / 4750 | % Diff: 99.89558317 / 99.85894301 / 99.45280438 / 100.2526316
    Run 3   | Vanilla: 9553 / 10595 / 11662 / 4756 | Lean: 9567 / 10609 / 11645 / 4775 | % Diff: 99.85366364 / 99.86803657 / 100.1459854 / 99.60209424
    Average | Vanilla: 9569.333333 / 10619 / 11658.66667 / 4761 | Lean: 9570 / 10615.33333 / 11679.33333 / 4764.333333 | % Diff: 99.993076 / 100.0347438 / 99.82351726 / 99.93060212

    3DMark - Time Spy (Score / Graphics / Physics)
    Run 1   | Vanilla: 6702 / 7229 / 4627 | Lean: 6705 / 7287 / 4618 | % Diff: 99.95525727 / 99.20406203 / 100.1948896
    Run 2   | Vanilla: 6699 / 7275 / 4626 | Lean: 6689 / 7267 / 4614 | % Diff: 100.1494992 / 100.1100867 / 100.260078
    Run 3   | Vanilla: 6719 / 7286 / 4664 | Lean: 6715 / 7282 / 4661 | % Diff: 100.0595681 / 100.05493 / 100.0643639
    Average | Vanilla: 6706.666667 / 7263.333333 / 4639 | Lean: 6703 / 7278.666667 / 4631 | % Diff: 100.0547749 / 99.7896929 / 100.1731105

    Unigine Heaven (Average FPS / Score / Min FPS / Max FPS)
    Run 1   | Vanilla: 64.7 / 1629 / 29.8 / 143.8 | Lean: 66.2 / 1667 / 27.1 / 144.9 | % Diff: 97.73413897 / 97.72045591 / 109.9630996 / 99.24085576
    Run 2   | Vanilla: 65.4 / 1648 / 8.5 / 144.9 | Lean: 66 / 1663 / 25 / 145.9 | % Diff: 99.09090909 / 99.09801563 / 34 / 99.31459904
    Run 3   | Vanilla: 64.8 / 1633 / 28.8 / 143.7 | Lean: 66 / 1663 / 29.7 / 146.1 | % Diff: 98.18181818 / 98.19603127 / 96.96969697 / 98.35728953
    Average | Vanilla: 64.96666667 / 1636.666667 / 22.36666667 / 144.1333333 | Lean: 66.06666667 / 1664.333333 / 27.26666667 / 145.6333333 | % Diff: 98.33562208 / 98.3381676 / 80.3109322 / 98.97091478

    FFXIV Heavensward (Score / Average FPS / Total Load Time)
    Run 1   | Vanilla: 14591 / 112.85 / 18.64 | Lean: 14497 / 112.431 / 18.796 | % Diff: 100.64841 / 100.372673 / 99.17003618
    Run 2   | Vanilla: 14506 / 112.336 / 18.773 | Lean: 14550 / 112.619 / 18.573 | % Diff: 99.6975945 / 99.74871025 / 101.076832
    Run 3   | Vanilla: 14549 / 112.808 / 18.927 | Lean: 14610 / 113.03 / 18.791 | % Diff: 99.58247775 / 99.80359197 / 100.7237507
    Average | Vanilla: 14548.66667 / 112.6646667 / 18.78 | Lean: 14552.33333 / 112.6933333 / 18.72 | % Diff: 99.97616076 / 99.97499175 / 100.3235396

    GTA V (Pass 4 Min / Pass 4 Max / Pass 4 Avg)
    Run 1   | Vanilla: 41.248425 / 153.773926 / 82.296112 | Lean: 20.244783 / 154.143906 / 81.087631 | % Diff: 203.7484176 / 99.75997754 / 101.4903395
    Run 2   | Vanilla: 36.866974 / 156.470566 / 81.178818 | Lean: 39.342747 / 154.573441 / 82.079002 | % Diff: 93.70716793 / 101.2273292 / 98.90327127
    Run 3   | Vanilla: 40.984291 / 145.447479 / 75.742538 | Lean: 20.863869 / 115.898499 / 81.30619 | % Diff: 196.4366772 / 125.4955675 / 93.1571606
    Average | Vanilla: 39.69989667 / 151.8973237 / 79.739156 | Lean: 26.817133 / 141.5386153 / 81.490941 | % Diff: 164.6307542 / 108.8276247 / 97.85025713

    Deus Ex: Mankind Divided (Avg / Min / Max)
    Run 1   | Vanilla: 62.4 / 50.5 / 77.3 | Lean: 62.4 / 50.2 / 76.9 | % Diff: 100 / 100.5976096 / 100.520156
    Run 2   | Vanilla: 62.1 / 50.5 / 76.6 | Lean: 62.1 / 50.5 / 76.9 | % Diff: 100 / 100 / 99.60988296
    Run 3   | Vanilla: 62.2 / 50.5 / 76.9 | Lean: 62.1 / 49.8 / 76.6 | % Diff: 100.1610306 / 101.4056225 / 100.3916449
    Average | Vanilla: 62.23333333 / 50.5 / 76.93333333 | Lean: 62.2 / 50.16666667 / 76.8 | % Diff: 100.0536769 / 100.667744 / 100.1738946

    F1 2016 (Avg / Min / Max)
    Run 1   | Vanilla: 92.3871 / 73.214142 / 111.866035 | Lean: 97.294609 / 80.587448 / 119.962158 | % Diff: 94.95603194 / 90.85055281 / 93.25110257
    Run 2   | Vanilla: 95.232292 / 79.891083 / 118.127655 | Lean: 97.303444 / 81.302322 / 118.235237 | % Diff: 97.87145047 / 98.26420825 / 99.9090102
    Run 3   | Vanilla: 94.716011 / 79.444923 / 116.410423 | Lean: 95.821739 / 80.525665 / 118.570518 | % Diff: 98.84605726 / 98.65789124 / 98.17821914
    Average | Vanilla: 94.111801 / 77.516716 / 115.4680377 | Lean: 96.80659733 / 80.805145 / 118.9226377 | % Diff: 97.22451322 / 95.92421743 / 97.11277731
  2. Like
    Mira Yurizaki got a reaction from revdv for a blog entry, The actual reason why communication standards measure in bits per second, probably   
    When you look at the bandwidth of a communication bus or interface such as USB or SATA, or the speed your ISP advertises, you'll notice that it's often measured in bits per second instead of bytes per second, the figure we're more used to. The common assumption is that companies advertise bits per second because it's a larger number, and bigger obviously looks better to the average consumer. Confusingly, the shorthand for the two looks similar enough (Mbps vs. MB/s) that they're easy to mix up.
     
    Except there's a more likely reason why you see bits per second: at the physical level of communication, data isn't always moved 8 bits at a time.
     
    Let's take for instance every embedded system's favorite communication interface: the humble UART (universal asynchronous receiver/transmitter). The physical interface itself is super simple; at minimum all you need is two wires (data and ground), though a system may have three (transmit, receive, ground). However, there are three issues:
    - How do you know when a data frame (a byte in this case) has started? What if you were sending a binary 0000 0000? If you were using 0V as binary 0, the line would look flat the entire time, so how would you know whether you're actually getting data or not?
    - How do you know when to stop receiving data? A UART can be set up to accept a certain number of data bits per "character," so it needs to know when to stop.
    - Do you want some sort of error detection mechanism?
    To resolve these:
    - A start bit signals the start of a transmission by being the opposite of the level the UART 'rests' at. So if the UART rests at a value of 0, the start bit is whatever the value of 1 is.
    - A stop bit (or more than one) signals the end of a transmission. This is often the opposite value of the start bit, in order to guarantee that at least one voltage transition takes place.
    - A parity bit can be added, which is 0 or 1 depending on whether the number of data bits set to 1 is even or odd. Error detection mechanisms like this are optional.
    A common UART setting is 8-N-1: 8 data bits, no parity, 1 stop bit. This means at minimum there are 10 bits on the wire per 8 data bits (the start bit is implied). It can be as high as 13 bits per 9 data bits, such as in 9-Y-2 (9 data bits, parity on, 2 stop bits). So if we had a UART in an 8-N-1 configuration transmitting at a rate of 1,000 bits per second, the system is only capable of transferring 800 data bits per second, an 80% efficiency rating.
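    To put numbers on that framing overhead, here's a quick sketch of the math (Python purely for illustration):

```python
# Framing overhead for a UART: start bit + data bits + optional parity + stop bits.
def uart_efficiency(data_bits=8, parity=False, stop_bits=1):
    frame_bits = 1 + data_bits + (1 if parity else 0) + stop_bits  # the 1 is the start bit
    return data_bits / frame_bits

baud = 1000  # with one bit per symbol, this is 1000 bits/s on the wire
for name, cfg in [("8-N-1", dict(data_bits=8, parity=False, stop_bits=1)),
                  ("9-Y-2", dict(data_bits=9, parity=True, stop_bits=2))]:
    eff = uart_efficiency(**cfg)
    print(f"{name}: {eff:.0%} efficient -> {baud * eff:.0f} data bits/s at {baud} baud")
```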
     
    Note: Technically it's not proper to express the transmission rate of a UART in "bits per second" but in "baud", which is how many times per second the line can change its voltage level. In some cases you may want to use more than one voltage transition to encode a bit, such as when embedding a clock signal, which is what encodings like Manchester code do. But often each line transition carries one bit, so baud = bits per second.
     
    Another example is PCIe (before 3.0) and SATA. These use another encoding method known as 8b/10b encoding. In this, 8 bits are encoded over a 10-bit sequence. The main reason for doing this is to achieve something called DC-balance. That is, over time, the average voltage of the signal is 0V. This is important because communication lines often have a capacitor to act as a filter. If the average voltage is higher than 0V over time, it can charge this capacitor to the point where the communication line reaches a voltage that causes issues such as a 0 bit looking like a 1 bit.
     
    In any case, like the UART setting 8-N-1, 8b/10b encoding is 80% efficient.
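    The same arithmetic is why, for example, SATA's 6 Gb/s line rate works out to roughly 600 MB/s of actual data, and a PCIe 2.0 lane's 5 GT/s to roughly 500 MB/s (a small sketch, continuing the Python examples above):

```python
# 8b/10b: every 8 data bits travel as a 10-bit symbol, so 80% of the line rate is data.
def data_rate_MBps(line_rate_bits_per_s, data_bits=8, encoded_bits=10):
    return line_rate_bits_per_s * (data_bits / encoded_bits) / 8 / 1e6  # bits -> bytes -> MB

for name, line_rate in [("SATA III (6 Gb/s)", 6e9), ("PCIe 2.0 x1 (5 GT/s)", 5e9)]:
    print(f"{name}: about {data_rate_MBps(line_rate):.0f} MB/s of actual data")
```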
     
    This is all a long explanation to say that the reason communication links are expressed in bits per second rather than bytes per second is that bits per second is almost always technically correct, whereas bytes per second often is not.
  3. Like
    Mira Yurizaki got a reaction from TopHatProductions115 for a blog entry, The actual reason why communication standards measure in bits per second, probably   
  4. Like
    Mira Yurizaki got a reaction from ImNewt for a blog entry, Pet peeves of a software developer   
    As a software developer, I've come across things people say that annoy me, because often it's not the reality:
    Software development is "easy"
    Like any other skill, it looks easy not because it actually is easy, but because the person doing it has built up the experience and skills necessary to simply do it. If it were actually easy, you, as a layperson, would be able to do it just as easily.
      Software is built from start to finish in one go, e.g.: "Day one patches are dumb"
    In software development land, this is known as the "Waterfall model." In a lot of commercially developed software, this process is almost never used as a whole. It might be used for just the software development process itself, as in, it's in the hands of the people actually churning out code, but as far as a software project goes, other models are used. After all, if you're doing a game and all the concept art and storyboarding is done, those creative people likely aren't going to be the same people coding the game. It might be the case in smaller studios, but not in a AAA studio.

    Typically what's done is some variation of incremental build model or iterative build model, which usually ends up going to Agile software development.

    This is why games have things like day-one patches or DLC. Before the game can be formally released, it has to go through a validation process. Instead of having the people working on the game sit on their thumbs, why not have them work on stuff in the meantime and release it later? And since patches and DLC often go through a less stringent validation process, they can get out much faster.
      Software is released the moment the last line of code is written and the application is built
    Any developer worth their salt will have a process in place once the final build is made. That process involves testing the heck out of the application to make sure it works, that all the things it needs to do are done, that it doesn't break other things in horrible ways. Only once the final build has passed all these checks can it be released.

    Granted it may not feel like this in certain cases, but it's silly to release the final build without doing some sort of testing.
      "But that problem should've been seen a mile away!"
    Have you ever proofread for the hundredth time a paper you wrote and you somehow missed a simple spelling or grammar rule? Same principle applies here.

    This is also on top of some codebases being huge, up to hundreds of thousands or millions of lines of code. Chances are you're not touching every bit of it, but are laser-focused on only some parts of it. Or you're so focused on solving one problem that you don't see there's a problem in another area.

    Or basically it's a similar thing to what this video attempts to point out.
      "How can X have so many problems?"
    This is sort of an umbrella. An example I can think of is Windows updates. Yes, it's become a sort of meme that Windows updates are unreliable and can break your system, but at the same time, Microsoft has to deal with updating hundreds of millions of instances of Windows, likely with millions of different configurations, not just of hardware but of software as well. To think a 100% success rate should be expected is absurd. Also, given the install base of, say, Windows 10, if we take the figure given by Microsoft of 900 million "Windows devices" (https://news.microsoft.com/bythenumbers/en/windowsdevices), even a million people affected by a problem is less than 1% of the userbase. A million people is a lot. Less than 1% of the userbase isn't.

    Basically, the pool of users Microsoft has to deal with is so large, and each uses their device so uniquely, that the sample size is big enough that the probability of any given problem showing up somewhere is basically 100%.

    You try making software that works on nearly a billion different devices with countless combinations used uniquely in each way without a problem.  
  5. Informative
    Mira Yurizaki got a reaction from ImNewt for a blog entry, List of Guides I've Written   
    A list of guides I posted somewhere on the site, just in case I post more than the 10 URL limit for profiles (plus that'd get wild anyway)
     
     
    A guide to how to identify if you have a CPU bottleneck and see how much it can affect you.

    An explanation on HyperThreading.

    It also answers the question "Why is it bad to have no page file?"

    Not really a guide, but might be helpful.

    Not something I wrote, but I think it's useful to share in this post:
  6. Like
    Mira Yurizaki got a reaction from Results45 for a blog entry, Demystifying Ray Tracing Further   
    With NVIDIA's RTX cards out and the company pushing ray tracing, I figured I'd have a look around the graphics community, through blog posts and whatnot, at what's been written about ray tracing itself. From interacting with the community, it seems like there are some misunderstandings and perhaps a warped interpretation of what's going on. So this post is a random assortment of thoughts on what I've seen others say about the topic and my input on it.
     
    Ray tracing describes a type of algorithm, but not necessarily a specific algorithm
    The first thing I found when looking through the literature is that what's called "ray tracing" can be vague. Does it describe a specific algorithm, such as heap sort or the fast inverse square root, or does it describe a class of algorithms, like sorting or searching? Or, put another way, does ray tracing describe something like "storage device" or something like "NAND-based, SATA solid state drive"?
     
    As far as the usage of the term goes, I'm led to believe that ray tracing is describing a type of algorithm. That is, the basic algorithm is shooting some ray out that mimics how a photon is shot out, then tracing it along some path and seeing how it interacts with the world. To that end, I've found several forms of ray tracing that exist:
    - Ray Casting: This is the most basic version of ray tracing, where the first thing the ray intersects is what the final output is based on. One could argue this is the basic step of ray tracing in and of itself.
    - Ray Marching: In the most common implementation, the ray's path is generated by spheres that originate at some point. At the first point, a sphere grows until it hits something, then the next point of the ray is placed at the edge of that sphere in the ray's direction. Then another sphere is grown until it hits something, creating a new point at its edge in the ray's direction, and so on. An object is considered "hit" when the sphere gets small enough (a rough sketch of this stepping loop follows after this list).

    (Taken from http://jamie-wong.com/2016/07/15/ray-marching-signed-distance-functions/)
    - Path Tracing: When someone talks about ray tracing without any other context, this is usually the algorithm they're referring to. Path tracing attempts to trace the path of the ray from the camera to a light source. On top of this, each sample point uses a ray that's pointed in a random direction. The idea is that the more samples you use, the closer you get to the actual image. Some industry folks consider "ray tracing" itself to be the original algorithm devised by J. Turner Whitted, while "path tracing" is the algorithm described by Jim Kajiya.
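    Here's that sphere-stepping (ray marching) loop as a bare-bones sketch, assuming the scene is described by a signed distance function; the single sphere here is a hypothetical scene just for illustration:

```python
# Minimal sphere tracing (ray marching): step along the ray by the distance to the
# nearest surface, as reported by a signed distance function (SDF).
import math

def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    """Distance from point p to the surface of a single (hypothetical) sphere."""
    return math.dist(p, center) - radius

def ray_march(origin, direction, sdf, max_steps=100, hit_eps=1e-3, max_dist=100.0):
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)        # radius of the largest "empty" sphere around p
        if d < hit_eps:   # the sphere got small enough: call it a hit
            return t
        t += d            # safe to advance this far without skipping past a surface
        if t > max_dist:
            break
    return None           # the ray escaped the scene

print(ray_march((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf))  # ~4.0 (camera to sphere surface)
```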
     
    Ray tracing is also solving a problem with rasterizing
    What rasterized rendering does today is throw away a ton of information about the scene before proceeding to work on it. One of the first things it throws away is all of the geometry the camera cannot see. Next, pieces of the scene are built up one by one. They are either added on top of each other right away, as in a forward renderer, or parts are assembled on the side to be combined later, as in a deferred renderer. (https://gamedevelopment.tutsplus.com/articles/forward-rendering-vs-deferred-rendering--gamedev-12342)
     
    Another issue with traditional rendering is the order in which things are rendered. This can lead to weird artifacts like light spilling onto areas where there's no obvious light source, as in these examples:

     

     
    By using ray tracing, the rays bring back information of what's visible, what isn't visible, and how light can indirectly affect other objects in a realistic manner.
     
    Real-time ray tracing isn't a relatively new thing for games
    The funny thing is, ray tracing has been used in games for some time. Some games, like Guerrilla Games' Killzone: Shadow Fall, use ray tracing to do screen-space lighting (slide 84), mostly for reflections and what appears to be ambient occlusion.
  7. Like
    Mira Yurizaki got a reaction from Tech_Dreamer for a blog entry, The Chiplet "Problem" with GPUs   
    UPDATE: I've edited this blog too many times because I always think I'm done, but then another idea comes up. *sigh* But I should be done now.
     
    With AMD's semi-recent announcement of their server processors using the so-called "Chiplet" design, I thought it'd be a good idea to talk about how this could affect other processor types. People have pointed to GPUs being the next logical step, but I've been hesitant to jump on that and this blog is to discuss why.
     
    An Explanation: What is the Chiplet Design?
    To understand the chiplet design, it's useful to understand how many processors are designed today. Typically they're designed using the so-called monolithic approach, where everything about the processor is built onto a single piece of silicon. The following is an example of a quad core design:

     
    Everything going to the processor has to go through an I/O section of the chip. Primarily this handles talking to main memory, but modern processors also have other I/O built in like PCI Express lanes or display compositors (the GPU would be considered a separate thing). From there, it goes through a typically much faster inter-processor bus where the processor's cores talk among each other and through the I/O.
     
    What the chiplet design does is separate the cores and I/O section into different chips.

    The advantage here is that if one part of the processor breaks, the entire processor doesn't have to be thrown away. But it doesn't stop there. As long as the I/O section can support more processor core chiplets, you can expand it out to however many you want. Or something like this:

    This is obviously a great design. You need more cores? Just throw on another chiplet!
     
    So what's the problem here with GPUs adopting this? It's the expectations of what each processor is designed to take care of. Their core designs reflect that.
     
    A Comparison Between a CPU Core and a GPU Core
    At the heart of a processing unit of any sort is the "core", which I will define as a processing unit containing a memory interface, a "front-end" containing an instruction decoder and scheduler, and a "back-end" containing the execution units. A CPU core tends to have a complicated front-end and a back-end with a smaller number of execution units, while a GPU tends to have a simpler or smaller front-end with a much larger back-end. To put it visually:
     

    Block Diagram of an AMD Zen 1 CPU Core
     

    Block Diagram of an AMD Fiji GPU Core. Each "ACE" is a Front-End Unit and Each "Shader Engine" is a Back-End Unit
     
    They are designed this way because of the tasks they're expected to complete. A CPU is expected to perform a randomized set of instructions in the best way it can from various tasks with a small amount of data. A GPU is expected to perform a smaller number of instructions, specifically built and ordered, on a large amount of data.
     
    From the previous section about chiplet design, you might be thinking to yourself: "Well can't the Fiji GPU core have the stuff on the left side (HBM + MC) and the right side (Multimedia Accelerators, Eyefinity, CrossFire XDMA, DMA, PCIe Bus Interface) separated into its own chip?" Well let's take a look at what the Fiji GPU die looks like (taken from https://www.guru3d.com/news-story/amd-radeon-r9-fiji-die-shot-photo.html)
     
     

     
    The big part in the middle is all of the ACEs, the Graphics Command Processor, and the Shader Engines from the block diagram. This takes up roughly, at a guess, 72% of the die itself. Not only that, aside from everything on the right side of the block diagram, this GPU core still needs everything from the left side, namely all of the HBM and MC parts. Something needs to feed the main bit of the GPU with data, and this is a hungry GPU! To put it another way, a two-chiplet design would end up very similar to the two-GPU, single-card designs of years past, like the dual-Fiji Radeon Pro Duo:

    But Wouldn't Going to 7nm Solve This Issue?
    While it's tempting to think that smaller nodes mean smaller dies, the thing with GPUs is that adding more execution units increases performance, because the work a GPU solves is what's known as embarrassingly parallel: it's trivial to split the work up across more units, since it's just more pixels per second to crunch. This isn't the case with the CPU, where instructions are almost never guaranteed to be orderly and predictable, the basic ingredient for parallel tasks. So while adding more transistors per CPU core hasn't always been viable, it has been for GPUs, and so the average die size of a GPU hasn't gone down as transistors have gotten smaller:

    Transistor count, die size, and fabrication process for the highest-end GPU of a generation for AMD GPUs (Data sourced from Wikipedia)
     
    Since AMD has had weird moments, let's take a look at its competitor, NVIDIA:

    Transistor count, die size, and fabrication process for the highest-end* GPU of a generation for NVIDIA GPUs (Data sourced from Wikipedia)
     
    Notes:
    - G92 is considered its own generation due to appearing in two video card series
    - The GTX 280 and GTX 285 were both included because they are the same GPU, just with a die shrink
    - TITANs were not included since the Ti versions are the same GPU and more recognizable
    But the trend is the same: the average die size for the GPUs has remained fairly level.
     
    Unfortunately, transistor count for CPUs isn't as straightforward as it is for GPUs. Over the years, processors have integrated more and more onto the die, so we can't really compare, say, an AMD Bulldozer transistor count to an AMD Ryzen transistor count, because Ryzen integrates more features like extra PCIe lanes and the entirety of what used to be the northbridge, among other things. With that caveat in mind, it's still nice to look at some data to see where things have gone overall:

    Transistor count, die size, and fabrication process for various processors (Data from Wikipedia)
     
    One just has to keep in mind that at various points processors started integrating features unrelated to the front-end, back-end, or memory interface, so from those points on the transistor count (and thus die size) attributable to the cores themselves is lower than the totals suggest.
     
    How about separating the front-end from the back end?
    This is a problem because the front-end needs to know how to allocate its resources, and those resources are the back-end. Separating them introduces latency due to the increased distance, plus overhead from the constant need to figure out what exactly is going on over there. To put it another way, is it more efficient to have your immediate supervisor in a building across town or in the same building you work in? Plus, the front-end doesn't take up a lot of space on the GPU anyway.
     
    What About Making Smaller GPUs?
    So instead of making large GPUs with a ton of execution units, why not build smaller GPUs and use those as the chiplets? As an example, let's take NVIDIA's GTX 1080:

     
    Compare this to the GTX 1050/1050 Ti (left) and the GT 1030 (right):
      
     
    With this, you could take away the memory and PCI Express controllers and move them to an I/O chip, and just duplicate the rest as many times as you want. Except now you have SLI, which has its problems that need to be addressed.
     
    The Problem with Multi-GPU Rendering
    The idea of multi-GPU rendering is simple: break up the work equally and have each GPU work on the scene. If it's "embarrassingly" easy to break up the rendering task, wouldn't this be a good idea? Well, it depends on what's really being worked on. For example, let's take this scene:

    Approximate difficulty to render this scene: Green = Easy, Yellow = Medium, Red = Hard
     
    The areas are color coded more or less to approximate the "difficulty" of rendering it. How would you divide this up evenly so that every GPU has an equal workload? Let's say we have four GPU chiplets.
     
    Obviously splitting this scene up into quadrants won't work, because one of the chiplets will be burdened by the large amount of red in the top right while another sits around doing almost nothing with the top left. And because you can't composite the final image until everything is done, the GPU taking care of the top-right portion becomes the bottleneck.

    Another option is to have each chiplet work on a frame in succession. This becomes an issue with more chiplets, as you can't render ahead too far, and this style of rendering is what causes microstuttering in multi-GPU systems.

    Lastly, we could have the chiplets each render the entire scene at a reduced resolution but offset a bit, or divvy the scene up by alternating pixels. This could minimize workload imbalance, but something still has to composite the final image, and there could be a lot of data passing back and forth between the chiplets, possibly increasing bandwidth requirements more than necessary. This is also not counting another kind of work GPUs have taken on lately, general compute tasks, and then there's the question of VR, which is sensitive to latency. A toy model of the workload-splitting problem follows below.
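    To make the imbalance concrete, here's a toy sketch comparing a quadrant split against interleaving pixels across four chiplets, using a made-up per-pixel cost map (the costs are illustrative, not measured from any real scene):

```python
# Toy model: per-chiplet workload for a quadrant split vs. interleaved pixels.
# Costs are made up: the right half of the frame is 10x more expensive to shade.
W, H, CHIPLETS = 64, 64, 4

def pixel_cost(x, y):
    return 10.0 if x >= W // 2 else 1.0  # "red" region on the right, "green" on the left

def workload(assign):
    loads = [0.0] * CHIPLETS
    for y in range(H):
        for x in range(W):
            loads[assign(x, y)] += pixel_cost(x, y)
    return loads

def quadrants(x, y):
    return (1 if x >= W // 2 else 0) + (2 if y >= H // 2 else 0)  # four screen tiles

def interleaved(x, y):
    return (x + y * W) % CHIPLETS  # hand out pixels round-robin

for name, assign in (("quadrants", quadrants), ("interleaved", interleaved)):
    loads = workload(assign)
    print(f"{name:12s} busiest/idlest chiplet load ratio: {max(loads) / min(loads):.1f}")
```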
     
    Ultimately the problem with graphics rendering is that it's time sensitive. Whereas tasks for CPUs often have the luxury of "it's done when it's done" and the pieces of data they're working on are independent from beginning to end, graphics rendering doesn't enjoy the same luxuries. Graphics rendering is "the sooner you get it done, the better" and "everyone's writing to the same frame buffer"
     
    What about DirectX 12 and Vulkan's multi-GPU support?
    With the advent of DirectX 12 and (possibly) Vulkan adding effective multi-GPU support, we may be able to overcome the issues described above. However, that requires developer support, and not everyone is on board with either API. You may want them to be, but a lot of game developers would probably rather worry about getting their game done than about optimizing it for multi-GPU performance, sad to say.
     
    Plus it would present issues for backwards compatibility. Up until this point, we've had games designed around the idea of a single GPU and only sometimes more than one. And while some games may perform well enough on multiple GPUs, many others won't, and running those older games on a chiplet design may result in terrible performance. You could relieve this issue perhaps by using tools like NVIDIA Inspector to create a custom SLI profile. But to do this for every game would get old fast. Technology is supposed to help make our lives better, and that certainly won't.
     
    But who knows? Maybe We'll Get Something Yet
    Only time will tell though if this design will work with GPUs, but I'm not entirely hopeful given the issues.
  8. Like
    Mira Yurizaki got a reaction from Levent for a blog entry, Demystifying Ray Tracing Further   
  9. Like
    Mira Yurizaki got a reaction from Zando Bob for a blog entry, Demystifying Ray Tracing Further   
  10. Like
    Mira Yurizaki got a reaction from Arika S for a blog entry, Yet another AMA   
    I've been stewing on this for a while (and I kind of didn't want to stomp on @Arika S's) but I figured... why not go for it? So here's the AMA if you want to ask me a question, any question! Yes you can ask anything and I will answer. But just so you're aware of the "rules" about this:
    - The answer that I give may not be the answer you want.
    - Until it gets added to this post, I'll accept the question. But once the question is added to this post, I will ignore future repeats of that question.
    If you want to ask me something, you have the following options:
    - Via PM
    - Commenting directly on this blog post
    Happy asking away!
  11. Like
    Mira Yurizaki got a reaction from LukeSavenije for a blog entry, How does the CPU/GPU bottleneck work?   
    The title might seem a little strange to anyone who's remotely familiar with performance bottlenecks. But rather than explain things at a higher level, where all of the CPU and GPU usage comparisons are done, this explains it at a lower level. That is, not only what is going on, but why it happens.
    How Do Performance Bottlenecks work?
    To understand how performance bottlenecks work, particularly for games, it's important to understand the general flow of a game from a programming standpoint. Taken in its simplest form, the steps to running a game are:
    Process game logic
    Render image
    Of course, we can expand this out to be more detailed:
    Process inputs
    Update the game's state (like weather or something)
    Process AI
    Process physics
    Process audio
    Render image
    The notable thing is that rendering the image is one of the last steps. Why? Because the output image represents the current state of the game world, and it doesn't make much sense to display an older state of the game world. However, to get to the rendering step, the CPU needs to process all of the previous steps (though physics may be offloaded to the GPU). This means that if that portion of processing takes too much time, it limits the maximum number of frames that can be rendered in a second. For example, if these steps take 2ms to complete, the expected maximum frame rate is 500 FPS. But if these steps take 30ms to complete, the expected maximum frame rate is about 33 FPS.
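    In other words, the CPU-side cost of a game tick puts a hard cap on frame rate no matter how fast the GPU is. A tiny sketch of that math:

```python
# The CPU time spent per game tick caps the frame rate, no matter how fast the GPU is.
def cpu_fps_cap(cpu_ms_per_tick):
    return 1000.0 / cpu_ms_per_tick

for ms in (2, 30):
    print(f"{ms} ms of CPU work per tick -> at most {cpu_fps_cap(ms):.0f} FPS")
```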
     
    The Issue of Game Flow: Game time vs. Real time
    If a developer plans on having a game run on multiple systems, there's a fundamental problem: how do you make sure the game appears to run at the same speed no matter the hardware? That is, how do you make game time match real time regardless of hardware? If you design a game such that each loop step represents 10ms of real time, then you need to make sure the hardware runs the loop in 10ms or less, otherwise game time will creep away from real time. Likewise, if the hardware can process the loop in less than 10ms, you need to make sure the processor doesn't immediately start working on the next state of the game world, otherwise game time will run faster than real time.
    To do that, developers find ways of syncing the game so it matches up with real time.
     
    Unrestricted Processing (i.e., no syncing)
    This runs the game without pausing to sync up to real time. While it's simple, this means if the game isn't running on a system it was designed for, game time will never match up to real time. Early DOS games used this method.

     
    This chart shows a comparison of an "ideal timeline" where the designer wanted 1 frame to be 100ms in real time. The next timeline is when the game is run on a faster system, and so it completes intervals faster. This results in more frames being pushed out and now game time is faster than real time. That is, in the unrestricted timeline, 1.7 seconds of game time has passed, but is being squeezed into 1 second of real time. The result is that the game runs faster than real-time
     
    Fixed Interval Processing, With an Image Rendered Every Tick
    The loop is run at a fixed interval. If the CPU is done early, the CPU idles for the rest of the interval. However, if the CPU takes too long, processing spills into the next interval to be completed, and then it idles. Note that the image is not rendered until the CPU is done with its work. If the CPU is late, the GPU simply displays the last frame it rendered.

    In this chart, we have a scenario where the CPU took too long to process game logic and so it spills into the next interval. If a frame is meant to represent 100ms of game time, this scenario completed 8 frames, resulting in a game time of 0.8s over a real-time period of 1s. The result is the game runs slower in real-time. Note: this is not how V-Sync works. V-Sync is a forced synchronization on the GPU. That is, the GPU will render the frame anyway, but will wait until the display is done showing the previous frame before presenting it.
     
    Chances are for 8-bit and 16-bit systems, if it isn't using unrestricted time syncing, it's using this. A convenient source of a time interval is the screen's refresh rate. Modern game consoles and other fixed-configuration hardware may also still use this because it's still easy to implement. If such a game gets ported to the PC and its time syncing wasn't updated, this can cause issues if a 60FPS patch is applied.
     
    Here's a video showing how the SNES used this method of syncing:
     
    Variable Intervals
Instead of demanding a fixed interval, why not have an interval that's determined by the performance of the CPU? While this can guarantee that game time matches real time, it presents a problem: the game's physics is no longer consistent. The thing with physics is that a lot of formulas examine a change over time. For example, velocity is the change in position over time. This means that if you have two different intervals where things are updated, you'll have two different outcomes.


     
Say for example we have an object traveling at 10m/s. If we have two intervals, one 100ms (or 1m per tick) and the other 50ms (or 0.5m per tick), the object will be in the same place at any given time as long as nothing happens to it. But let's say the object is about to impact a wall, and the collision detection assumes that if the object either touches or is "inside" the wall by the next interval, it's a collision. Depending on how far the object is from the wall and how thick the wall is, the object in the game with the longer interval may appear to have traveled right through the wall, because by the next interval it has already ended up on the other side.
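To put numbers on that, here's a small sketch (my own example values, not from the blog's sources) of an object moving at 10m/s toward a 0.1m-thick wall, stepped at 50ms versus 100ms:

```python
def hits_wall(dt, speed=10.0, wall=(0.45, 0.55), steps=4):
    # Advance the object from x = 0 in fixed steps of dt seconds and apply
    # the naive rule: "collision if the object is touching or inside the
    # wall at the end of a step."
    x = 0.0
    for _ in range(steps):
        x += speed * dt
        if wall[0] <= x <= wall[1]:
            return True  # collision detected at this step
    return False  # the object skipped right over the wall

print(hits_wall(dt=0.05))  # True  -> 0.5m lands inside the 0.45-0.55m wall
print(hits_wall(dt=0.10))  # False -> the 1.0m step jumps clean over the wall
```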
     
Another issue is that because physics is calculated using floating point numbers, their inherent rounding errors compound with more calculations. This means the game with the faster interval may arrive at a different result, because it performs more calculations and therefore accumulates more error.
     
    Essentially, the physics and interaction of the game are no longer predictable. This has obvious issues in multiplayer, though it can also change how single player game mechanics work.
     
    Fixed Intervals, but Drop Frames if the CPU Needs to Catch Up
The game is run at a fixed interval, but instead of requiring the GPU to render an image after every game tick, if the CPU is late, it skips telling the GPU to render a frame and uses the freed-up time to catch up. Once the CPU has caught up, it allows the GPU to render the next image. The idea is that because the game's load varies, the CPU should be able to catch up at some point rather than being stuck permanently in the "needs to catch up" phase. This allows systems of varying performance to run the game while keeping up with real time.

Modern game engines use this method since it provides stability and determinism while allowing flexibility in rendering times. For the most part, the interval comes from a timer, so that when it expires, servicing it becomes a high priority.
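A common way to implement this (a rough sketch in the spirit of the game loop articles linked under Further Reading, with hypothetical process_input/update/render functions) is to accumulate unprocessed real time and spend it in fixed-size game ticks:

```python
import time

TICK = 0.010  # each game tick represents 10ms of game time

def run_fixed_tick_drop_frames(process_input, update, render):
    # Fixed game ticks, but rendering is skipped while the CPU is behind.
    previous = time.perf_counter()
    lag = 0.0  # real time that hasn't been turned into game ticks yet
    while process_input():
        now = time.perf_counter()
        lag += now - previous
        previous = now
        while lag >= TICK:   # catch up: run as many ticks as we owe
            update()
            lag -= TICK
        render()             # only render once the game state is current
```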
     
    Looking at a few scenarios: What happens when the GPU can complete its task faster or slower relative to when the CPU does?
The following looks at a few scenarios of when game ticks are processed and when frames are rendered. One thing to keep in mind is that a rendering queue is used, so that if the CPU does take a while, the GPU at least has something to render until the CPU is less busy. You might know this option as "render ahead" or similar. With a longer render queue, rendering can be smooth at the expense of latency. With a shorter queue, latency is much lower but the game may stutter if the CPU can't keep up.
     
    With that in mind, the charts used have the following characteristics:
    Processing a game tick or rendering a frame is represented by a color. The colors are supposed to match up with each other. If the CPU or GPU is not doing anything, this time is represented by a gray area.
    Assume the render queue can hold 3 commands.
    For the queue representation, frames enter from the right and go to the left. They exit to the left as well:

    The render queue will be shown at each interval, rather than when the CPU finishes up processing a game tick.
    CPU can complete its work within an interval, GPU is slower
    This is a case where the CPU can complete its work without spilling over into the next interval, but the GPU takes much longer to generate a frame.

     
     

Note that 10 game ticks were generated, but only 5 frames were rendered. The queue also filled up toward the end, and since the GPU couldn't get to the next frame in time, a frame had to be dropped. In this case, the second purple frame was queued up but ultimately dropped because the GPU could not get to it fast enough.
     
This is also why the GPU cannot bottleneck the CPU: the CPU keeps processing subsequent game ticks without waiting for the GPU to finish. However, if fixed interval processing with an image rendered every tick is used, then the GPU can bottleneck the CPU. But since most PC games don't use that method, we can assume it's not an issue.
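As a rough illustration of this scenario, here's a toy simulation (entirely my own numbers: a 100ms tick, a 200ms GPU frame time, and a 3-entry queue that drops new frames when full) that reproduces the 10-ticks-but-5-frames outcome described above:

```python
def simulate(duration_ms=1000, tick_ms=100, gpu_frame_ms=200, queue_max=3):
    # Discrete sketch: the CPU finishes every tick on time and submits one
    # render command per tick; the GPU drains the queue at its own pace.
    # New commands are dropped when the queue is full.
    queue, ticks, rendered, dropped = [], 0, 0, 0
    gpu_busy_until = 0
    for t in range(0, duration_ms, tick_ms):
        if t >= gpu_busy_until and queue:      # GPU is free: start the next frame
            queue.pop(0)
            gpu_busy_until = t + gpu_frame_ms
            rendered += 1
        ticks += 1                              # CPU finishes this game tick
        if len(queue) < queue_max:
            queue.append(ticks)                 # queue its render command
        else:
            dropped += 1                        # queue full: frame is dropped
    return ticks, rendered, dropped

print(simulate())  # (10, 5, 2) -> 10 game ticks, only 5 frames rendered, 2 dropped
```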
     
    CPU can complete its work within an interval, GPU is faster
    In this case, the GPU can easily render a frame, so the GPU is ready to render a new frame as soon as the current game tick is processed.

    (Note: technically the queue would be empty the entire time)
     
This is not a CPU bottleneck condition, however, as the game is designed around a fixed interval. Some developers may design their game loop to run at a much faster interval than 60Hz so that high-end GPUs don't sit idle like this. But if the GPU can keep up and the interval is a lower frequency, performance can be either smooth or stuttery, depending on the CPU's processing times.
     
Some games may allow the CPU to generate GPU commands to render "in-between" frames, using the time elapsed between the render command and the last game tick to decide how far to move objects. Note that these extra frames are purely visual, meaning your actions during them have no impact on the state of the game itself until the next game tick.
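A hedged sketch of that idea (in the spirit of the interpolation approach described in the Game Programming Patterns article linked under Further Reading) blends between the last two game ticks using the leftover real time:

```python
TICK = 0.010  # 10ms of game time per tick

def interpolate(prev_pos, curr_pos, lag):
    # 'lag' is the real time accumulated since the last processed game tick.
    # alpha = 0 draws the previous tick's position, alpha = 1 the current
    # one; anything in between produces an "in-between" frame. These frames
    # are purely visual: the game state only changes on the next tick.
    alpha = lag / TICK
    return prev_pos + (curr_pos - prev_pos) * alpha

# Example: an object moved from x = 1.0 to x = 1.1 on the last tick, and
# 4ms of real time have passed since then.
print(interpolate(1.0, 1.1, 0.004))  # ~1.04
```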
     

    CPU cannot complete its work within an interval, GPU is faster
    In this scenario, the CPU has issues keeping up with processing the game logic, so it drops the frame in favor of catching up. This has the effect of not queuing up a frame in the first place and the GPU is stuck repeating the last frame it rendered. However, when the CPU catches up and queues up a frame, it's representing the last game tick.

     

    (Note: technically the queue would be empty the entire time)
    In this case, because the first green frame took too long, it doesn't get queued and so the GPU continues to show the yellow frame. The CPU catches up on the first red frame which the GPU will render. A similar thing happens on the game tick of the second yellow frame.
     
Of all the scenarios, this one is probably the least favorable. Notice how frames can be clumped together with longer pauses between them. This is what is felt as the game stuttering.
     
    CPU cannot complete its work within an interval, GPU is slower
    This is a case where the CPU has trouble completing every tick within an interval and the GPU has issues rendering a frame within an interval as well:


     
    This scenario, depending on how fast the frame rate actually is, may not be as bad as it looks, as it spreads the frames out over time.
     
    Further Reading
    For further reading, I picked up most of this information from:
    https://bell0bytes.eu/the-game-loop/
    https://gameprogrammingpatterns.com/game-loop.html
    https://docs.microsoft.com/en-us/windows/desktop/api/dxgi/nf-dxgi-idxgidevice1-setmaximumframelatency
     
    https://www.reddit.com/r/nvidia/comments/821n66/maximum_prerendered_frames_what_to_set_it_to/?depth=2
  12. Informative
    Mira Yurizaki got a reaction from revdv for a blog entry, How does the CPU/GPU bottleneck work?   
  13. Like
    Mira Yurizaki got a reaction from r2724r16 for a blog entry, List of Guides I've Written   
    A list of guides I posted somewhere on the site, just in case I post more than the 10 URL limit for profiles (plus that'd get wild anyway)
     
     
    A guide to how to identify if you have a CPU bottleneck and see how much it can affect you.
     
    An explanation on HyperThreading.
     
    It also answers the question "Why is it bad to have no page file?"
     
     
     
     
     
     
     
     
     
     
    Not really a guide, but might be helpful
     
     
    Not something I wrote, but I think it's useful to share in this post:
  14. Informative
    Mira Yurizaki got a reaction from De-Wohli for a blog entry, "What programming language should I start off with?"   
    This is a frequently asked question from people who are curious about programming. So here's the short answer: it depends.
     
In my experience with various programming languages such as assembly, C, C++, C#, Java, JavaScript, Python, TI BASIC, Visual Basic, Ruby, Bash scripting, Windows Batch scripting, and even Brainfuck (though this was a curiosity), the language itself doesn't really matter. Over time you learn a lot of things that carry over to other languages, and you find that most languages share the same basic characteristics. There are other characteristics that can aid in making applications, but there's nothing that, without anything else taken into consideration, makes one "better" than the other. I'd call a programming language an implementation detail. Meaning, no matter what language you use, you can probably realize what you want.
     
    But you've shown an interest in programming, and obviously you need a programming language to start this journey of learning how to code! So for the sake of putting down a language, what should you learn? Well to ask another question: what are you interested in doing? This will help narrow down what you should focus on because certain categories of applications prefer one language over another for arbitrary reasons. For example, want to get into web app development? Start learning HTML, CSS, and JavaScript. You may not have to use the last two, but it certainly will help. Want to get into Android app programming? Start with Java. iOS app programming? Swift. Windows app programming? C#. Don't know? Just pick a language and go from there.
     
    However, if you're fresh to programming, I would argue not to care so much about the nuances of the language. I'd argue that any language worth its salt will allow you to do the following:
Create symbols (or names) to represent data
Freely operate on that data using basic math and bit-wise operations
Allow for conditional control, using things like (or similar to) if-statements and loops
Allow for controlling where the program can go, using things like function calls or jumps
And many widely used programming languages have these features, as the short sketch after this list shows.
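As a quick illustration (my own sketch, not tied to any particular tutorial), all four of those basics fit into a few lines of Python:

```python
# Symbols representing data
score = 10
name = "player one"

# Basic math and bit-wise operations on that data
score = score * 2 + 1
flags = 0b0101 & 0b0011

# Conditional control with if-statements and loops
for _ in range(3):
    if score > 15:
        print(name, "is doing well")

# Controlling where the program goes with function calls
def double(value):
    return value * 2

print(double(score))
```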
     
Okay, maybe you're still wracked with decision paralysis. If you were to ask me which one to use to start off your journey into the world of programming, and I'm sure I'll draw the ire of a few people, I would have to go with Python. Why? A couple of reasons.
     
The first is the tool chain, so to speak. I don't believe the journey into programming should start with the entire process of building the program. It's nice to know, but anything that gets in the way of the person jumping right into coding adds resistance. They likely just want to code and see results right away, which is encouraging and can build even more curiosity. While you can select another language that has an IDE, those can be quite intimidating to tread through. You could argue "if they get into programming, they should be using industry standard tools to leverage the experience if they want to make this into a job", okay. But that's like telling a kid who's interested in cinematography to start with a $30,000 RED camera to get experience on the industry standard, rather than their smartphone, because "who makes serious professional films using a smartphone?"
     
    I digress though. So what makes Python's tool chain great for beginners? To start, it has an interpreter. This makes it much quicker to jump into programming than say using C. If all you want to do is print the standard "Hello World!", this is literally what you have to do in Python:
Open a command line
Type in python to start the interpreter
Type in print("Hello world!") in the prompt
Doing the same thing in C would easily take twice as many steps. Whether you think so or not, this can be intimidating for some people. And if you go "well, if this intimidates them, then they shouldn't be programming," well, going back to the cinematography example: if buying a $30,000 RED camera is intimidating, or even getting something like a beefy computer with one of the widely used video editing suites, should they stop pursuing their dreams?
     
    And when you're ready to move onto making Python files, you don't need to do anything different. It's just invoking python and the file in question.
     
Secondly, Python's multi-paradigm flexibility allows you to adjust how you want to code. You can start off writing procedural code, which I argue is quite intuitive. If you want to do object oriented programming (OOP), Python supports that. You can group functionality into modules. There's no memory management to think about. Data types, while important to know, don't have to be explicitly defined. When I started working with Python, I was surprised how easy it was to work with and get something done.
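For example (a small sketch of my own), the same tiny task can be written procedurally or with a class, and Python is happy with either:

```python
# Procedural style: plain data and a function
def describe(name, hp):
    return f"{name} has {hp} HP"

print(describe("Slime", 20))

# Object-oriented style: the same idea wrapped in a class
class Monster:
    def __init__(self, name, hp):
        self.name = name
        self.hp = hp

    def describe(self):
        return f"{self.name} has {self.hp} HP"

print(Monster("Slime", 20).describe())
```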
     
However, Python isn't a perfect language. No language is. It has its downsides as well:
Python is an interpreted language, so if performance is your goal, Python isn't for you. However, I'd argue that while the pursuit of performance is fine, prove your app works before making performance your goal.
While Python doesn't require you to explicitly say what type a variable is (known as dynamic typing), this can cause some trip-ups if you're not careful at best, and at worst you may not even know what type of data a variable is supposed to hold (though if you need to, just have Python spit out the variable's type). And since Python doesn't check data types until the script is running, you may run into issues where you try to do something with two incompatible types, like adding a number to a string, and the script throws an error and stops as a result. The short example after this list shows what that looks like.
The way Python handles certain OOP concepts is not intuitive, but I'd argue you shouldn't be touching OOP until you've done some reasonably complex apps.
But to get the basics down, Python offers a fairly low bar of entry. And once you have the basics down, you can move onto more advanced topics and other languages.
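Here's a small example of that dynamic typing trip-up (my own sketch); the mistake isn't caught until the offending line actually runs:

```python
count = 5
label = "items: "

print(type(count))   # <class 'int'>  -- Python can tell you what a variable holds
print(type(label))   # <class 'str'>

try:
    total = label + count   # adding a string and a number
except TypeError as err:
    # Only discovered at runtime, not when the script was written
    print("TypeError:", err)  # e.g. can only concatenate str (not "int") to str
```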
     
    At the end of the day though: The language doesn't matter, what's important is to know the basics that programming in general requires. However, if you have a goal in mind of what applications you want to do, it might be better to start learning the languages used in that field first.
  15. Informative
    Mira Yurizaki got a reaction from revdv for a blog entry, "What programming language should I start off with?"   
  16. Like
    Mira Yurizaki got a reaction from TopHatProductions115 for a blog entry, Process for gathering video card data when analyzing application behavior, pt. 2   
    There was one bit I should add from this last blog:
Once you've gathered up the data, how do you use it? The biggest trouble with the way PerfMon gathers GPU data is that while it can do per-process data gathering, it doesn't actually capture the process you're interested in unless it's already running. While that's fine for observing things without logging, it makes creating a Data Collector Set that captures just that process impossible: PerfMon uses Process IDs, or PIDs, and the game's PID will change every time it's run. So in PerfMon you're forced to capture data from every process using the GPU at the start of the capture.
     
This presents a problem, because what you end up getting is something that looks like this:

     
    Only some of this data is actually useful, so we have to figure out which ones aren't. However in this case, there's a semi-obvious choice: the solid green line going up and down during the benchmark. If you hover over it with the mouse, it tells you what it is:

     
This helps narrow down the PID of the game and how to filter the results. In this case, the PID is 3892. If we filter the "Instance" column down to entries containing 3892, we get:

     
And you can double-check that it's the game by looking at its GPU memory usage. Now remove every other "Instance" that isn't for PID 3892 to clean up the data. Once you're done with that, you can right click on the graph, select "Save Data As...", select "Text File (Comma Delimited)" as the file type, and save. Now you can use your favorite spreadsheet application to process this data.
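If you'd rather script the filtering instead of doing it in a spreadsheet, something like the following works. This is only a sketch: it assumes pandas is installed and that the game's PID shows up as "pid_3892" inside the counter names of the exported CSV, so adjust the pattern and file name to your own capture:

```python
import pandas as pd

# Load the CSV exported from PerfMon ("Text File (Comma Delimited)").
# The first column is the timestamp; the rest are one column per counter instance.
data = pd.read_csv("perfmon_export.csv")

# Keep the timestamp plus any column whose name mentions the game's PID.
pid = "pid_3892"
columns = [data.columns[0]] + [c for c in data.columns if pid in c]
game_only = data[columns]

game_only.to_csv("perfmon_pid_3892_only.csv", index=False)
print(f"Kept {len(columns) - 1} counters for {pid}")
```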
  17. Informative
    Mira Yurizaki got a reaction from CommandMan7 for a blog entry, Process for gathering video card data when analyzing application behavior   
    Since I've been doing some tests lately involving how applications use the video card, I thought I'd write down the process of gathering this data and presenting it. After all, any sufficiently "scientific" test should be repeatable by others, and being repeatable means knowing what to do!
     
    What data am I gathering?
CPU Utilization
This is to see how the CPU is being used by the application. The higher the usage overall, the more likely it is to bottleneck the GPU. I may omit this altogether if I'm not interested in CPU utilization.
GPU engine usage
A "GPU engine" is something that Microsoft calls a part of the GPU that handles a certain task. Which engines are available depends on the GPU manufacturer. The two I'm primarily interested in are the graphics and compute engines, because these two will show how the execution portions of the GPU are being used by the application. This can only be used for informational purposes, i.e., there is no "lower/higher is better" value.
VRAM usage
Since Windows Vista, Microsoft implemented virtual memory on a system level. This allows me to look at three elements: Committed VRAM (how much was requested to be reserved), Dedicated VRAM usage (how much is used on the video card itself), and Shared VRAM usage (which is VRAM usage in system memory). Like GPU engine usage, this can only be used for informational purposes.
Frame Time
This is the amount of time between frames. As long as VSync or frame limiting is not used, this should represent how long it took to render the frame. The inverse of this is what I call "instantaneous FPS," which is the time between the current and last frame normalized over a second. I call this "instantaneous" since FPS would require counting all of the frames in a second.
What data am I not gathering?
    I'm not looking at temperatures, clock speeds, and fan speeds. These are aspects of hardware that don't reflect how the application is using it.
     
    What tools am I using?
Performance Monitor (PerfMon)
PerfMon gathers CPU utilization, GPU engine usage, and VRAM usage. Other tools like GPU-Z and MSI Afterburner cannot gather this data, at least with respect to the specific aspects I'm looking for. The other thing is that PerfMon can gather data per-application, meaning the data I gather is specifically from the application in question, rather than on a system wide level.
FRAPS
While FRAPS is old (the last update was in 2013) and the overlay no longer seems to work in DX12 applications, its benchmark functionality still works. This allows me to gather data about frame times. Note that FRAPS counts a frame as when one of the display buffers flips. This poses a limitation when VSync is enabled but the application is not triple buffered, or when frame rate limiting is used.
How do I use these tools?
    PerfMon takes some setting up:
1. Open it by going to Control Panel -> All items -> Administrative Tools -> Performance Monitor. Open it as an Administrator, otherwise you won't be able to do the other steps.
2. Select "Data Collector Sets" in the left pane.
3. Right click "User Defined" in the right pane and select New -> Data Collector Set.
4. In the wizard that pops up, name the Data Collector Set and choose "Create manually (Advanced)".
5. In the next page, select "Create data logs" and check off "Performance counter".
6. In the next page, click on the "Add..." button, then select the following:
   - GPU Engine -> Utilization for All Instances
   - GPU Memory -> Committed, Dedicated, and Shared memory for All Instances
   - If doing CPU utilization, select Processor -> "% Processor Time" for All Instances
7. The next page will ask where you want to save these logs.
8. When you want to start the data collection, select the one you created and press the green triangle in the toolbar at the top. To stop collecting data, press the black square. Note: PerfMon gathers GPU data for the apps using the GPU that are running when the collection starts. If the app isn't running when you start collecting, it won't gather data for that app.
9. To open the log, go to where you said to save the data and double click on it.
10. The data collected for each app is organized by process ID. Unless you figured this out ahead of time, the best way I've found to find it is to plot all of the 3D or graphics engines and see which one looks like the process ID of the app. Then I sort by name and remove the data from the other process IDs.
11. Once the data has been filtered, right click on the graph, select "Save Data", and save it as a "Text File - Comma Separated Values (CSV)".
Once you have the data in a CSV format, you should be able to manipulate this data using spreadsheet apps like Microsoft Excel or Open/Libre Office Calc.
     
FRAPS requires pressing F11, or whatever the benchmark hotkey is, to start, then pressing it again to stop. FRAPS saves the data as CSV. The items of interest are the frame times and the MinMaxAvg data. Frame times do require additional work, as FRAPS records the timestamp in milliseconds from the start of the run rather than the time between frames.
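For reference, here's roughly how that conversion could be scripted (a sketch only; it assumes the FRAPS frametimes CSV has the frame number in the first column and the timestamp in milliseconds in the second, and the file name is a placeholder):

```python
import csv

# "frametimes.csv" is a placeholder for the frametimes file FRAPS writes out.
with open("frametimes.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    timestamps = [float(row[1]) for row in reader if len(row) >= 2]

# FRAPS logs each frame's timestamp in milliseconds from the start of the
# run, so the time spent on a frame is the difference between neighbours.
per_frame = []
for previous, current in zip(timestamps, timestamps[1:]):
    frame_time = current - previous                      # in ms
    instantaneous_fps = 1000.0 / frame_time if frame_time > 0 else 0.0
    per_frame.append((frame_time, instantaneous_fps))

print(per_frame[:5])  # first few (frame time in ms, instantaneous FPS) pairs
```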
     
    What other tools did I consider and why weren't they used?
EVGA Precision X
Polls system wide stats. Also, while it has a frame rate counter, it samples it over a period, which can mask hiccups (and it's likely based on the inverse of FPS). While higher sampling rates can be used, I noticed this adds a significant load to the GPU.
GPU-Z
Polls system wide stats.
MSI Afterburner
Polls system wide stats. May also have the same issues as EVGA Precision X.
  18. Informative
    Mira Yurizaki got a reaction from aiden_mcmeme for a blog entry, "What programming language should I start off with?"   
  19. Like
    Mira Yurizaki got a reaction from [REDACTED] for a blog entry, The Chiplet "Problem" with GPUs   
    UPDATE: I've edited this blog too many times because I always think I'm done, but then another idea comes up. *sigh* But I should be done now.
     
    With AMD's semi-recent announcement of their server processors using the so-called "Chiplet" design, I thought it'd be a good idea to talk about how this could affect other processor types. People have pointed to GPUs being the next logical step, but I've been hesitant to jump on that and this blog is to discuss why.
     
    An Explanation: What is the Chiplet Design?
    To understand the chiplet design, it's useful to understand how many processors are designed today. Typically they're designed using the so-called monolithic approach, where everything about the processor is built onto a single piece of silicon. The following is an example of a quad core design:

     
    Everything going to the processor has to go through an I/O section of the chip. Primarily this handles talking to main memory, but modern processors also have other I/O built in like PCI Express lanes or display compositors (the GPU would be considered a separate thing). From there, it goes through a typically much faster inter-processor bus where the processor's cores talk among each other and through the I/O.
     
    What the chiplet design does is separate the cores and I/O section into different chips.

The advantage here is that if one part of the processor breaks, the entire processor doesn't have to be thrown away. But it doesn't stop there. As long as the I/O section can support more processor core chiplets, you can expand it out with however many you want. Or something like this:

    This is obviously a great design. You need more cores? Just throw on another chiplet!
     
    So what's the problem here with GPUs adopting this? It's the expectations of what each processor is designed to take care of. Their core designs reflect that.
     
    A Comparison Between a CPU Core and a GPU Core
    At the heart of a processing unit of any sort is the "core", which I will define as a processing unit containing a memory interface, a "front-end" containing an instruction decoder and scheduler, and a "back-end" containing the execution units. A CPU core tends to have a complicated front-end and a back-end with a smaller number of execution units, while a GPU tends to have a simpler or smaller front-end with a much larger back-end. To put it visually:
     

    Block Diagram of an AMD Zen 1 CPU Core
     

    Block Diagram of an AMD Fiji GPU Core. Each "ACE" is a Front-End Unit and Each "Shader Engine" is a Back-End Unit
     
    They are designed this way because of the tasks they're expected to complete. A CPU is expected to perform a randomized set of instructions in the best way it can from various tasks with a small amount of data. A GPU is expected to perform a smaller number of instructions, specifically built and ordered, on a large amount of data.
     
    From the previous section about chiplet design, you might be thinking to yourself: "Well can't the Fiji GPU core have the stuff on the left side (HBM + MC) and the right side (Multimedia Accelerators, Eyefinity, CrossFire XDMA, DMA, PCIe Bus Interface) separated into its own chip?" Well let's take a look at what the Fiji GPU die looks like (taken from https://www.guru3d.com/news-story/amd-radeon-r9-fiji-die-shot-photo.html)
     
     

     
The big part in the middle is all of the ACEs, the Graphics Command Processor, and the Shader Engines from the block diagram. This takes up roughly 72% of the die itself, if I had to guess. Not only that, aside from everything on the right side of the block diagram, this GPU core still needs everything from the left side, or all of the HBM and MC parts. Something needs to feed the main bit of the GPU with data, and this is a hungry GPU! To put it another way, a two-chiplet design would be very similar to the dual-GPU, single-card designs of years past, like the Fiji-based Radeon Pro Duo:

    But Wouldn't Going to 7nm Solve This Issue?
While it's tempting to think that smaller nodes mean smaller dies, the thing with GPUs is that adding more execution units increases performance, because the work they solve is what is known as embarrassingly parallel: it's trivial to split the work up across more units. It's just more pixels per second to crunch. This isn't the case with the CPU, where instructions are almost never guaranteed to be orderly and predictable, the basic ingredient for parallel tasks. So while adding more transistors per CPU core hasn't always been viable, it has been for GPUs, and so the average die size of a GPU hasn't gone down as transistors get smaller:

    Transistor count, die size, and fabrication process for the highest-end GPU of a generation for AMD GPUs (Data sourced from Wikipedia)
     
    Since AMD has had weird moments, let's take a look at its competitor, NVIDIA:

    Transistor count, die size, and fabrication process for the highest-end* GPU of a generation for NVIDIA GPUs (Data sourced from Wikipedia)
     
    Notes:
G92 is considered its own generation due to being in two video card series.
The GTX 280 and GTX 285 were included due to being the same GPU but with a die shrink.
TITANs were not included since the Ti versions are more recognizable and are the same GPU.
    But the trend is the same: the average die size for the GPUs has remained fairly level.
     
Unfortunately, transistor count for processors isn't as straightforward as it is for GPUs. Over the years, processors have integrated more and more things into them. So we can't even compare, say, an AMD Bulldozer transistor count to an AMD Ryzen transistor count, due to Ryzen integrating more features like extra PCIe lanes and the entirety of what used to be the "Northbridge", among other things. With that in mind, it's still nice to have some data to see where things have gone overall:

    Transistor count, die size, and fabrication process for various processors (Data from Wikipedia)
     
One just has to keep in mind that at various points, processors started to integrate more features that aren't related to the front-end, back-end, or memory interface, so from that point on the transistor count and die size devoted to the cores themselves may actually be lower than the totals suggest.
     
    How about separating the front-end from the back end?
    This is a problem because the front-end needs to know how to allocate its resources, which is the back end. This introduces latency due to the increased distance and overhead because of the constant need to figure out what exactly is going on. To put it in another way, is it more efficient to have your immediate supervisor in a building across town or in the same building as you work in? Plus the front-end doesn't take up a lot of space on the GPU anyway.
     
    What About Making Smaller GPUs?
    So instead of making large GPUs with a ton of execution units, why not build smaller GPUs and use those as the chiplets? As an example, let's take NVIDIA's GTX 1080:

     
    Compare this to the GTX 1050/1050 Ti (left) and the GT 1030 (right):
      
     
    With this, you could take away the memory and PCI Express controllers and move them to an I/O chip, and just duplicate the rest as many times as you want. Except now you have SLI, which has its problems that need to be addressed.
     
    The Problem with Multi-GPU Rendering
    The idea of multi-GPU rendering is simple: break up the work equally and have each GPU work on the scene. If it's "embarrassingly" easy to break up the rendering task, wouldn't this be a good idea? Well, it depends on what's really being worked on. For example, let's take this scene:

    Approximate difficulty to render this scene: Green = Easy, Yellow = Medium, Red = Hard
     
    The areas are color coded more or less to approximate the "difficulty" of rendering it. How would you divide this up evenly so that every GPU has an equal workload? Let's say we have four GPU chiplets.
     
Obviously splitting this scene up into quadrants won't work, because one of the chiplets will be burdened by the large amount of red in the top right while another sits around doing almost nothing taking care of the top left. And because you can't composite the entire image until everything is done, the chiplet taking care of the top right portion becomes the bottleneck.
Another option may be to have each chiplet work on a frame in succession. Though this may be an issue with more chiplets, as you can't exactly render ahead too far, and this sort of rendering is what causes microstuttering in multi-GPU systems.
Lastly, we could have the chiplets render the entire scene at a reduced resolution but offset a bit, or divvy the scene up by, say, alternating pixels. This could minimize the workload imbalance (the rough sketch below illustrates the difference), but someone still has to composite the final image, and there could be a lot of data passing back and forth between the chiplets, possibly increasing bandwidth requirements more than necessary.
This is also not counting another aspect that GPUs have taken on lately, general compute tasks. And then there's the question of VR, which is sensitive to latency.
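To put some rough numbers on that, here's a toy sketch (entirely my own, using a made-up per-tile cost map standing in for the scene above) comparing a quadrant split against an alternating split across four chiplets, where the slowest chiplet sets the frame time:

```python
# Toy cost map for a 4x4-tile frame: higher numbers = harder to render.
# The "hard" tiles are clustered in the top right, like the scene above.
frame = [
    [1, 1, 5, 9],
    [1, 1, 5, 9],
    [1, 2, 3, 4],
    [1, 1, 2, 2],
]

def quadrant_split(frame):
    # Chiplet 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
    loads = [0, 0, 0, 0]
    half = len(frame) // 2
    for y, row in enumerate(frame):
        for x, cost in enumerate(row):
            loads[(y >= half) * 2 + (x >= half)] += cost
    return loads

def alternating_split(frame):
    # Alternate tiles in a 2x2 pattern so hard and easy areas get spread out
    loads = [0, 0, 0, 0]
    for y, row in enumerate(frame):
        for x, cost in enumerate(row):
            loads[(y % 2) * 2 + (x % 2)] += cost
    return loads

print(quadrant_split(frame))     # [4, 28, 5, 11] -> one chiplet does most of the work
print(alternating_split(frame))  # [10, 16, 9, 13] -> far more even, but now every
                                 # chiplet touches the whole frame and the results
                                 # still have to be composited together
```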
     
Ultimately, the problem with graphics rendering is that it's time-sensitive. Tasks for CPUs often have the luxury of "it's done when it's done," and the pieces of data they work on are independent from beginning to end; graphics rendering doesn't enjoy the same luxuries. Graphics rendering is "the sooner you get it done, the better" and "everyone's writing to the same frame buffer."
     
    What about DirectX 12 and Vulkan's multi-GPU support?
With the advent of DirectX 12 and (possibly) Vulkan adding effective multi-GPU support, we may be able to overcome the issues described above. However, that requires developer support, and not everyone is on board with either API. You may want them to be, but a lot of game developers would rather focus on getting their game done than on optimizing it for performance, sad to say.
     
Plus it would present issues for backwards compatibility. Up until this point, we've had games designed around the idea of a single GPU and only sometimes more than one. While some games may perform well enough on multiple GPUs, many others won't, and running those older games on a chiplet design may result in terrible performance. You could perhaps work around this with tools like NVIDIA Inspector to create a custom SLI profile, but doing that for every game would get old fast. Technology is supposed to make our lives better, and that certainly wouldn't.
     
    But who knows? Maybe We'll Get Something Yet
Only time will tell whether this design will work for GPUs, but I'm not entirely hopeful given the issues above.
  20. Like
    Mira Yurizaki got a reaction from [REDACTED] for a blog entry, [RQB] VRAM usage may not be what you think it is   
    Note: This is a copypasta of a reply I did to a topic.
     
I think the VRAM thing is more complicated than "[Game] uses X amount of VRAM, therefore you need more than X amount of VRAM these days" for performance. I've read around the internet that games will request more VRAM than they actually need and may never use all of it, much like how applications can reserve more memory than they end up touching (yes, this is actually a thing: https://blogs.msdn.microsoft.com/oldnewthing/20091002-00/?p=16513). In a lot of the cases where I've looked at VRAM usage, the game tends to use the same amount regardless of how much VRAM is available, e.g., a game uses around 4.5GB whether the card has 6GB, 8GB, or 16GB.
     
    So what about the case where a game uses roughly the same amount of VRAM regardless and there isn't enough? I'm not convinced there's a huge issue here. So here's an example (from https://www.techspot.com/article/1600-far-cry-5-benchmarks/ )
     

     
Given that Far Cry 5 uses around 3GB of VRAM at 1080p, this may not be a particularly interesting result to look at, but for the record:
     
Now the 1440p and 4K benchmarks should be more interesting, since Far Cry 5 will clearly use more VRAM than the GTX 1060 3GB has.
Yet strangely enough, performance, even the minimum FPS results, isn't tanking hard; it stays in line with the expected performance delta from the GTX 1060 6GB. In fact, even the GT 1030 only sees a roughly linear drop-off in performance despite having 2GB of VRAM (1440p has about 1.8x the pixels of 1080p, and 2160p has 4x).
     
    And for all the research I'm willing to do, I came across a PcPer article interviewing one of NVIDIA's VPs of engineering, with the most interesting bit being:
     
tl;dr: reported VRAM usage may not be indicative of what a game actually requires.
  21. Like
    Mira Yurizaki got a reaction from TopHatProductions115 for a blog entry, Why are physics engines tied to frame rates?   
I came across the news that Fallout 76 has a problem: you can edit a value in one of the .ini files that affects the frame rate, and because the physics engine is tied to that value, the end result is that players are able to move faster simply by rendering more frames. Obviously this is a problem, but why do developers design physics engines around a frame rate? Because you need a rate of change, and the frame rate is a convenient source for that rate.
     
A lot of equations in physics are over time, meaning they measure something that happens between two points in time. Take velocity: it's the distance between two points in space divided by the time it took to travel that distance. Acceleration is the rate at which an object changes its velocity between two points in time. The problem with physics simulations in games (and perhaps in general) is that everything is being calculated for an instant of time. You might know the input parameters, like how fast an object is going, but you won't know what its velocity will be in the future, even if you knew all of the factors affecting it, because you need one more thing: how long is the object going to be subjected to those effects? To put it another way, it's like asking: I have an object going 5 meters per second and it's experiencing an acceleration of -1 m/s^2, what's its velocity going to be? I don't know, because I don't know how long the object will be experiencing said acceleration.
     
What Bethesda likely does is pick a reasonably frequent rate at which the physics simulation runs, and then cap the frame rate to that same rate. This may also cap the input sampling rate. Why would they do this? Because rendering faster than the rate at which the physics are simulated would mean rendering extra frames that don't contain any new information. This may have been a deliberate design choice, because other developers don't seem to care, like Ubisoft with Assassin's Creed:
The cloth runs at "half frame rate" because the physics engine for it isn't run as fast as the graphics rendering, so you get this weird effect where the cloth seems to move at a disjointed rate.
     
So when you uncap the frame rate in a Bethesda game, you're effectively telling the game to run the physics engine at a faster rate, which is great!... except that everything else about the game was designed around this one value, and changing it affects how everything else behaves. Really, the solution to Bethesda's problem is to not make this value changeable in the first place. Or, you know, design a physics engine that isn't dependent on the frame rate, for example by decoupling a fixed physics timestep from rendering, as sketched below.
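To illustrate that last point, here's a minimal sketch of the common fixed-timestep pattern in Python. The simulate_physics and render functions are placeholders I made up; the point is that the physics always advances in constant steps no matter how fast frames are being rendered:

import time

PHYSICS_DT = 1.0 / 60.0  # physics always advances in fixed 60 Hz steps

def simulate_physics(dt):
    """Placeholder: advance positions/velocities by dt seconds."""
    pass

def render():
    """Placeholder: draw the current state of the world."""
    pass

def game_loop():
    accumulator = 0.0
    previous = time.perf_counter()
    while True:
        now = time.perf_counter()
        accumulator += now - previous
        previous = now

        # Run as many fixed physics steps as the elapsed real time calls for.
        # Rendering at 30 FPS or 300 FPS changes how often render() is called,
        # but the simulation still advances by the same amount per real second.
        while accumulator >= PHYSICS_DT:
            simulate_physics(PHYSICS_DT)
            accumulator -= PHYSICS_DT

        render()

# game_loop() would be called from the game's entry point; movement speed no
# longer depends on how many frames get rendered.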
  22. Like
    Mira Yurizaki got a reaction from TopHatProductions115 for a blog entry, My general software development axioms   
An assortment of random axioms I try to keep in mind when doing software development work. The reasons I hold to them usually fall into one of these categories:
- Making the most efficient use of my time
- Minimal need to maintain the code (hopefully, maybe)
- Ease of maintaining the code
- Ease of reading and understanding what the code does
- Finding the best way to make efficient software, without trampling on the above

Regarding the software development process
    The priorities of development
    I stick to these priorities:
1. Making the app easy to maintain: If the app isn't easy to maintain, then everything else becomes harder to do. This is a very broad category, covering things like what tools I use, design, and implementation details.
2. Making the app do what it's supposed to: If the app doesn't do what it's supposed to do, why bother using it?
3. Making the app perform as best as it can.

Now some people who aren't developers may look at this and go, "How come performance isn't your top priority? Shouldn't you always make the app perform fast?" In particular, say, for games. To me, it doesn't matter if the game can get 300 FPS while looking good on potato hardware; if it crashes all the time, has bugs, or isn't, you know, fun, then I couldn't care less, because it's not doing what it's supposed to do. And if it's a nightmare to maintain, then good luck fixing those problems.
     
    Find the tools that you can work with, but only find as many as necessary
Having an excellent set of tools does wonders for productive software development. The tools need to be ones you can work with, depending on your preference. But only get the ones you need and no more; anything beyond that just creates noise in your toolchain, and whenever you set up your development environment, the fewer tools you need to install and configure, the faster you can get to the actual work. Before you add a tool to your toolbox in a more "permanent" fashion, think about why you need it, how it has helped you, and how often it helps you.
     
For example, the bare minimum toolset I would like for a given project is:
- A text editor with syntax highlighting and line numbers on the side. Bonus points if it checks syntax and flags issues, but that's not necessary.
- A diff tool to see the differences between versions of source code and possibly merge them.
- A version control tool of some sort.

Some people may want more. Some people may want less. And some people get by with the bare minimum (if they use these at all). For example, The 8-Bit Guy, for Planet X3 development, uses a basic text editor (akin to Notepad) for coding, and his "version control" is copying and pasting the project folder once in a while. He doesn't mention a diff tool, but he likely hasn't had a need for one.
     
    Regarding software design
    Figure out what needs to be done first
The app has a purpose: figure out what that purpose is and what needs to be done to achieve it before anything else. If you don't know what the purpose of the app is, then what's the point in cranking out code for it?
     
    Design from the top-down, but build from the bottom-up
Do you design a skyscraper by first figuring out what kind of foundation you want before working on the shape, or do you design from the shape you want and then work your way down to the foundation? Or, to put it in geekier terms: if you want to build a smartphone, do you first design the phone, its features, and what you'd like it to do, then work your way down to what kind of hardware it should have? Or do you decide on the hardware first, then work on the design of the phone, its features, and so on?
     
Granted, you could go either way, but the problem with designing from the bottom up is that the bottom houses details that aren't oriented toward solving the problem of designing the thing in question; those details are oriented toward implementing it. Using the phone example, if you find out too early that the hardware doesn't actually work the way you thought it did, you'll be back at square one. If you design from the top down first, you end up with a concept that is independent of the implementation, so if you find out the hardware doesn't actually work, the amount of work thrown away is relatively small.
     
So where does the "build from the bottom-up" part come in? In a lot of cases, applications need a foundation and infrastructure to work with. By building from the bottom up, you create a platform that makes developing the rest of the application easier. Using the smartphone example, once you're done designing and selecting the hardware, you first need to build the hardware, then, generally speaking, the firmware, OS, and drivers, before you can start working on the application itself.
     
Another way of saying this, from, say, a web app point of view: build from the back end to the front end.
     
    Design principles are fine to learn, but should be understood before putting them to practice
A design principle in software is always good to learn, but you shouldn't rush out and put it into practice before understanding what it does and how it would fit in your application.
     
    Design (and coding) Principles
These are the principles I like to follow:
- Keep it simple, stupid (KISS): The simpler the pieces of code are, the easier they are to work with.
- Don't repeat yourself (DRY): If you are copying and pasting ANY amount of code, you should probably see if it fits better in its own function (see the sketch after this list).
- You ain't gonna need it (YAGNI): Don't do things you don't need until you actually need them.
- Loose coupling: Don't have one file, class, etc. heavily depend on another. That is, I should be able to remove a file, class, etc. without needing extensive modifications in other places in the code.
- Single responsibility principle: A feature or a small subset of a feature should be contained within a single file, class, etc.
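For instance, here's a minimal before/after sketch of DRY in Python (the pricing functions and discount rates are made up for the example):

# Before: the same discount logic is copied and pasted for each customer type.
def price_for_member(base_price):
    discounted = base_price * 0.90
    return round(discounted, 2)

def price_for_employee(base_price):
    discounted = base_price * 0.70
    return round(discounted, 2)

# After: the repeated logic lives in one place; each caller only supplies
# the part that actually differs (the discount rate).
def apply_discount(base_price, rate):
    return round(base_price * (1.0 - rate), 2)

def price_for_member_dry(base_price):
    return apply_discount(base_price, 0.10)

def price_for_employee_dry(base_price):
    return apply_discount(base_price, 0.30)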
    Regarding software implementation
    Code is written for humans, not machines
It may be tempting to believe that the whole point of source code is to describe to a computer what you want it to do (and it will do exactly that). However, that's not 100% true. Source code describes what a computer should do in a way that lets humans understand what the computer is doing.
     
    Code should always be consistent, even when you don't agree with the style
    Everyone has their reasons for styling the code. But what's important is not the style of code, but that it's the same style across the board.
     
Code should be self-documenting
Every function, variable, class, etc. name needs to be meaningful, and not just meaningful by description, but by context. For example, all variable names should be nouns, because they represent things, and all function names should start with a verb, because they do things.
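A tiny sketch of what that looks like in practice (the names here are invented for the example):

# Variables are nouns, because they name things.
unsent_messages = []
retry_count = 0

# Functions start with a verb, because they do things.
def queue_message(message):
    """Add a message to the list of messages waiting to be sent."""
    unsent_messages.append(message)

def count_pending_messages():
    """Return how many messages are still waiting."""
    return len(unsent_messages)

# Compare with names like 'data', 'stuff', or 'handle_it', which say nothing
# about what the thing is or what the function does.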
     
Comments should be used sparingly, and the only times I think it's appropriate to use comments with no questions asked are:
- Separating sections of code
- Describing the file at its header
- At the top of function definitions, to describe them
- When something non-obvious needs to be communicated, like why an innocuous piece of code needs to be where it is

Integrating someone else's solution is fine, if you understand what it does
Let's face it, a lot of coding is basically copying and pasting someone else's code. But there's a difference between sticking a piece of code in your software, seeing it work, and going on your merry way, and taking that piece of code, analyzing it, and seeing what it's doing and how it affects your code.
     
    Build and test often
I always feel there's this stigma that if you have to constantly build and test your application, you're probably not a good developer (this is probably self-inflicted; I've never actually read anything claiming it). But to me, building and testing often means catching bugs and figuring out problems earlier rather than later. And when you change large swathes of code and there's a problem... well, good luck finding what caused it.
     
    Work on one feature before moving onto another
    If the application has a lot of features or the work is updating a lot of features, focus on one feature before going to another. This will reduce mental load and allow concentrating on getting that one feature more or less perfected. If the feature is dependent on another one that isn't developed yet, provide dummy data or something.
     
    Break things up if possible, but don't break them down too far
    Breaking things up helps understand the code piecemeal, but don't break it up so far that you're always jumping around in the code.
     
    Reinventing the wheel is fine for simple things, but don't reinvent it again
    Basically, if you can hammer out a library or set of utilities for something simple, do it. And reuse it.
     
For example, while I'm sure Python has CSV reading libraries, building a simple one from scratch is easy, and I'm not hunting one down just to get the features I want. And when I do build it, I should keep it around so I'm not building it again.
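In that spirit, here's roughly what such a simple, reusable CSV reader might look like in Python; it's a sketch that deliberately ignores quoting and escaping, and the file name in the usage comment is hypothetical:

def read_csv(path, delimiter=",", has_header=True):
    """Minimal CSV reader: returns rows as dicts (if there's a header row)
    or as lists of strings. Deliberately ignores quoting and escaping."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]

    rows = [line.split(delimiter) for line in lines]
    if not has_header:
        return rows

    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]

# Usage (hypothetical file):
# for row in read_csv("benchmarks.csv"):
#     print(row["test"], row["score"])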
     
    Make the application predictable
One of the biggest time savers for source code, both for the computer running it and the person reading it, is making the application as predictable as possible. You can do this by eliminating conditionals as much as possible.
What I mean by a conditional is any if-statement or any loop. The fewer of these you have, the easier the code is not only for the computer to process (predictable code is happy code) but also for developers to read and understand. To put it in another perspective: every conditional you add roughly doubles the number of paths through the code. Examples of eliminating conditionals:
- If you are initializing a variable and it has to be within certain bounds, find a way to make it impossible to go out of bounds in the first place. For example, if you are generating a random number that needs to be within certain bounds, constrain how the random number is generated so you don't have to check it after the fact.
- Use a single variable to maintain state, rather than checking a bunch of variables to determine what to do next. If you catch yourself writing code that's always checking the same set of variables, it's faster to have any change to those variables update a single "state" variable and then decide what to do next based on that state. When it comes to debugging a problem, you'll only have one thing to check: what was the last state it was in? And it's much easier to search for when the state changes to a particular value than to watch a dozen variables and find the exact combination that blows things up (I've had to deal with this before).
- Use switch-case statements when possible. If you need guidance on when to use them, this answer on StackExchange helps. The compiler or interpreter will then find the best way to turn this into a cleanly executable bit of code; https://www.eventhelix.com/RealtimeMantra/Basics/CToAssemblyTranslation3.htm is an example of how switch-case statements are handled in C.
- Avoid if-statements in loops if possible. An example: say you need to print the prime factorization of a number, with an * between factors but no trailing * after the last one. You could, inside the factorization loop, check whether the current factor is the last one and only print an * if it isn't. Or you can print the first factor before the loop and then, inside the loop, always print an * before each remaining factor (see the sketch after this list). It boils down to this: make your code predictable. Deterministic. For any input, strive to make it run the same exact code.
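Here's a minimal Python sketch of that factorization example, comparing the version with a conditional inside the loop to the version that hoists the special case out of it:

def prime_factors(n):
    """Return the prime factorization of n as a list, e.g. 12 -> [2, 2, 3]."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def print_with_branch(n):
    # Conditional inside the loop: every iteration asks "am I the last factor?"
    factors = prime_factors(n)
    for i, f in enumerate(factors):
        if i < len(factors) - 1:
            print(f, end=" * ")
        else:
            print(f)

def print_without_branch(n):
    # Special case handled once, outside the loop: the loop body never branches.
    factors = prime_factors(n)
    print(factors[0], end="")
    for f in factors[1:]:
        print(f" * {f}", end="")
    print()

print_with_branch(360)     # 2 * 2 * 2 * 3 * 3 * 5
print_without_branch(360)  # 2 * 2 * 2 * 3 * 3 * 5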
     
    Don't prematurely optimize
Premature optimization is trying to be "smart" with code for the sake of performance, at the cost of readability. It also just adds extra work. Don't worry about performance unless something is actually too slow. And when you do need to care about performance, benchmark and profile the application to see where the heaviest cost is, then optimize that. This keeps the amount of optimization work to a minimum.
     
    It's okay to do the naive thing to prove the design works
    This ties in to "Don't prematurely optimize", but if you want to make sure the design works, do the naive solution just to prove what you're doing works. Once it does work, give it a once over for improvements if it doesn't take much effort.
  23. Like
    Mira Yurizaki got a reaction from TopHatProductions115 for a blog entry, Why multi-video card setups can't combine VRAM   
    ( I need a name for blog posts like these, but all the good ones are taken )
     
While I don't think it's brought up often, an idea floats around that when using multiple video cards, such as in SLI or CrossFire, their VRAM combines. So if you have two 8GB cards, you effectively get the same thing as a 16GB card. However, this isn't the case. You might be asking: but why? If these setups combine GPU power, how come VRAM doesn't combine?
     
On a broader view, the video cards are given the same data so that they can work on the same thing. After all, they're generating frames from the same scene in the application. But wouldn't it be cool if you weren't limited to just the amount of VRAM on the card and could expand beyond it? There are just a few problems with that:
- How is the data going to be transferred? If we look at PCI Express, it's a relatively slow interface compared to VRAM. PCIe 3.0 x16 caps out at about 15.75 GB/s, while the VRAM on NVIDIA's Titan V has a bandwidth of a mind-boggling 652 GB/s (imagine having that for your internet speed). Transferring data to and from the cards would be an incredibly slow affair that would introduce stalls; to put it in perspective, this speed difference is larger than the one between SATA SSDs and DDR4-2133. A rough calculation follows after this list.
- VRAM works basically like a huge RAID-0 array. Each chip only provides a fraction of the card's bandwidth, and it's the combined total of all of the chips working together that gives the full figure. So in order to transfer data to other cards as fast as it's consumed locally, you would need a huge number of lines, and I don't think connecting, say, 200-pin cables would be fun (nor would manufacturing them).
- Data transfers would have to be over a parallel bus. I've talked in some detail about why high-speed parallel buses stopped being a thing outside of relatively short distances. Aside from the bulky cabling, there's also the issue of signal timing: it's going to be very hard to ensure that all the bits of a 7GHz signal reach their destination at the same time, even if it's only, say, six inches end to end.
- A similar issue exists in systems with multiple physical processors. In that case, since all of the interconnects are on the motherboard itself, there's little issue with huge cables or signal propagation. However, even then the system has to be aware of how to schedule tasks: since there's still a significant amount of latency when accessing another processor's memory, some tasks will perform worse if scheduling doesn't take that into account.
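To put rough numbers on that first point, here's a quick back-of-the-envelope calculation in Python, using the bandwidth figures above and an arbitrary 1GB chunk of texture data as the example:

# Time to move 1 GB of texture data, using the bandwidth figures quoted above.
chunk_gb = 1.0

pcie_3_x16_gbps = 15.75    # GB/s, PCIe 3.0 x16
titan_v_vram_gbps = 652.0  # GB/s, Titan V local VRAM

print(f"Over PCIe 3.0 x16: {chunk_gb / pcie_3_x16_gbps * 1000:.1f} ms")   # ~63.5 ms
print(f"From local VRAM:   {chunk_gb / titan_v_vram_gbps * 1000:.2f} ms") # ~1.53 ms

At 60 FPS a frame lasts only about 16.7 ms, so stalling for tens of milliseconds while waiting on another card's memory simply isn't workable.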
  24. Like
    Mira Yurizaki got a reaction from TopHatProductions115 for a blog entry, When something seemingly designed well still has a problem.   
With yet another security bug found in processors, one has to wonder how anyone could have let it slip through for this long. People like to think there's incompetent engineering out there, and sure, it exists, but what people don't see are the designs that seem perfectly sound, even to someone with all the knowledge and experience in the world, right up until they meet the real world. So I have an example of such a design. I love sharing this one, partly out of pride (I was a junior developer who found a bug in code designed by senior developers, showing that even people with 5-10 years of experience make mistakes), and partly because it illustrates the point well.
     
    A description of the system
I was working on a system that consisted of a main controller unit and several wireless sensors. We had a rule with wireless communication: we had to assume it's unreliable, even if 99.9999% of the time it appears reliable. This required that if a device transmits something, the recipient has to acknowledge it by sending an ACK. If the transmitter doesn't receive an ACK within some time, it retries sending the message. If three retries fail, the device gives up on sending the message.
     
    To handle this in software, we used a state machine. I forget the exact details, but this is what it looks like more or less on the transmitter side.

This particular style of state machine is called a hierarchical state machine. The lines with arrows represent state transitions along with the events that trigger them.
- The default state is "Tx Idle."
- When it gets a "send message" trigger, it transitions to the "Tx Busy" state.
- After the message is sent, it goes to the "Waiting for ACK" state. This is a sub-state of "Tx Busy" because until the last request has been ACK'd, the transmitter won't transmit another message.
- If another message request comes in while in the "Tx Busy" state or its substates, it gets queued.
- If an ACK wasn't received in time, it moves back up to the "Tx Busy" state as the message is sent again.
- If an ACK was received, or the message has been retried enough times, it goes back to the "Tx Idle" state.
- If the system needs to send an ACK for any reason, it immediately moves to the "Tx Busy" state. I forget the exact details of this mechanism, but sending an ACK basically had priority over anything else this thing was doing.

A buffer was included to queue up any messages (except ACKs) that needed to be sent while the hardware transmitter was busy sending something.
     
    The problem: The message queue gets too full and breaks
The problem surfaced when a project manager working on the system with us was doing an ad hoc test of his own. The system had 8 nodes that transmitted and received data back and forth with the main unit. He invoked all of the nodes at once, causing them to flood the main unit with messages that needed to be handled. If he did this long enough, the system would basically stop and "hang." There was a queue for requests in the state machine, and if another request came in while the queue was full, it would trigger this behavior. It wasn't a silent bug (i.e., hitting some overflow case); it failed an assertion check.
     
    My investigation led to the cause being that the number of requests coming into the transmission queue was outpacing how fast this state machine could go through it.
     
While I'll go over what happened, I want you to think about what the solution would be. You don't have to comment, but stew on it. Just so you're not going in blind, here are the parameters you'll be working with:
- The hardware this ran on at the time was an OMAP 3430. For those who don't know their SoCs, this is the same one that powered the Motorola Droid.
- The devices connect through a ZigBee wireless network. Unlike, say, Wi-Fi, ZigBee uses a mesh topology, which allows a device to send data only to the closest node, which then forwards it to the next closest one until it reaches the ZigBee coordinator (the equivalent of a router in Wi-Fi).
- The ZigBee coordinator sits inside the main controller unit and communicates with the main board over a serial line at 115200 baud (or about 115.2 kbps). A quick throughput calculation follows after this list.
- The messages were at most 300 or so bytes in length.
- The retry time is 100 milliseconds.
- At the time the problem happens, the system appears more or less fine (i.e., retries aren't piling up).
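As a quick sanity check on those numbers, here's the back-of-the-envelope math, assuming standard 8N1 framing (10 bits on the wire per byte), which is an assumption on my part:

# Rough throughput of the coordinator's serial link, assuming 8N1 framing
# (10 bits per byte on the wire).
baud = 115200
bytes_per_second = baud / 10          # ~11,520 bytes/s
message_size = 300                    # worst-case message size in bytes
seconds_per_message = message_size / bytes_per_second

print(f"{seconds_per_message * 1000:.1f} ms per 300-byte message")  # ~26 ms
print(f"{bytes_per_second / message_size:.0f} messages/s at most")  # ~38/s

So the serial line can move at most a few dozen worst-case messages per second. With eight nodes chattering away, plus an ACK for every exchange, it doesn't take much for requests to pile up faster than the state machine can push them out.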
     
     
    The root cause: There's an issue with the ACKing system
The problem lies with the priority given to transmitting ACKs. The reason for having a "Tx Busy" state in the first place isn't really a courtesy; it's that the serial line is asynchronous. That is, once we feed the serial line some data and tell it how much there is, it takes care of the rest and the application is free to do other things. The state machine waits for the serial line to say "okay, I'm done" before moving to the next state. However, whenever a "send ACK" request comes in, it gets sent regardless of what's going on.
     
Because of the way ACKs shortcut the process, they constantly keep the serial line busy. This can unintentionally stall the state machine so that it never gets to the "Waiting for ACK" state. Or rather, it gets there, but it's constantly pulled away from it. To put it another way: say you need to talk to someone, but there are other people with higher priority than you who are allowed to interrupt whenever they like. Whenever one of these higher-priority people comes in, they butt in, speak to the person, and leave. And there are a ton of these people, so eventually your request never gets served (and you'll feel like punching one of them).
     
(Note: I don't recall exactly how the serial line behaved on the main unit, so there are some holes in this explanation that I can't fill.)
     
The solution is to defer all transmission requests until the current transmission is completed. So now the state machine looks something like this:

     
The fun part was that the original state machine was also used in a few other places where some sort of communication with another device was happening. As you can imagine, this fix had to be propagated to various other parts of the system. And not only that, but we already had documentation with these state machine diagrams and such, so those had to be updated as well.
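For the curious, here's a heavily simplified Python sketch of the difference between the two designs. The class, method, and queue details are my own inventions for illustration; the real code looked nothing like this:

from collections import deque

class Transmitter:
    """Toy model of the transmit state machine described above."""

    def __init__(self, defer_acks):
        self.defer_acks = defer_acks
        self.queue = deque()
        self.busy = False            # True while the serial line is sending
        self.current_message = None

    def request_send(self, message, is_ack=False):
        if is_ack and not self.defer_acks:
            # Original design: ACK requests bypass the queue and are handed to
            # the serial line immediately, regardless of what the state machine
            # was in the middle of doing.
            self._start_transmission(message)
        else:
            # Fixed design: everything, ACKs included, waits its turn.
            self.queue.append(message)
            if not self.busy:
                self._start_transmission(self.queue.popleft())

    def _start_transmission(self, message):
        self.busy = True
        self.current_message = message  # the serial driver would send this

    def on_transmission_complete(self):
        # Serial driver callback: only now do we pull the next queued message.
        self.busy = False
        if self.queue:
            self._start_transmission(self.queue.popleft())

The fix boils down to the last method: nothing new goes out until on_transmission_complete() says the line is free, ACKs included.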
     
So remember: just because something looks sound doesn't mean it's bulletproof. If you want to critique a huge issue when it crops up, you're free to do so, as long as you understand that most of the time these things get overlooked because they're not readily visible.
  25. Like
    Mira Yurizaki got a reaction from TopHatProductions115 for a blog entry, Project Dolly: A look into cloning Windows (Conclusions)   
This is earlier than I said I would report back, but for various reasons I'm choosing to wrap up this experiment now.
     
In day-to-day usage, I still haven't run into any problems. Granted, I did not play any games on the laptop, but I did run 3DMark after the cloning, and supposedly people have issues regardless of whether they game or not. I also may not be exactly representative of the typical use case, since I didn't clone after, say, a year of use. Though I can't think of anything that would cause issues, since the only things that should grow, barring the installation of other programs, are the Users folder and possibly ProgramData.
     
There was one other problem I forgot to mention in my last entry since it slipped my mind: the hard drive was offline when I first booted into the SSD. This means Windows knew the drive existed, but it didn't mount it, i.e., it wasn't accessible. I found an article explaining why: http://www.multibooters.com/tutorials/resolve-a-disk-signature-collision.html. Windows uses what are called disk signatures, which were originally meant to help the OS configure software RAID setups. Disk signatures are supposed to be unique, but sometimes cloning software copies them over verbatim. When that happens, Windows takes the other drive(s) with the same signature offline. You can put them back online, but Windows will then assign a new signature to that drive. The new signature is only a problem for that disk: if you planned on booting into it again, you'll have issues.
     
The article claims that cloning tools used to give you a choice between keeping the original disk signature and assigning a new one, but now they automatically assign new signatures and adjust the configuration so Windows doesn't freak out.
     
So yes, there is a grain of truth to the claim that Windows will run into issues after cloning because it's expecting one set of disk signatures and you've changed them. However, this appears to come down to a combination of the cloning tool and what the user did. For example, in my case, since Samsung's cloning tool copied the hard drive basically verbatim, if I had accessed the SSD right after cloning (which is very likely if someone wants to do a soft verification of the clone), Windows would have changed its disk signature and I would've had issues booting from it. But since I booted into the SSD first instead, nothing changed. I'd imagine that because few people know about disk signatures, if this was the root cause of their problems, it's also why they think the act of cloning itself causes issues.
     
I used to clone by creating a bootloader partition, cloning C:\ onto the new drive with Partition Wizard (or Magic, I forget which), then making a bootloader on the new drive, so disk signatures weren't really an issue for me with that method. However, it hasn't been working for some reason or another, and since I don't really have a reason to clone drives these days, I never figured out why.
     
    My conclusions on the matter:
- Cloning is more or less a safe thing to do.
- Use the tool from the manufacturer of the drive you're cloning to, if they have one.
- If the manufacturer doesn't have a tool, get a program that is advertised to do cloning, like Macrium Reflect or Acronis True Image.
- After cloning, do not try to access the new drive from the old installation. Verify it by booting into it instead.
- If you do not see the original drive after booting into the new one, do not try to access it until you're satisfied with the cloned drive.