
[GUIDE] Bottlenecking Guide Ver. 2 [Updated 8/6/17]

You want to upgrade your video card, but your system isn't quite up to spec. You're worried that the rest of the system will limit the new card's performance, and obviously you don't want to spend all this money just to not be able to get the most out of your hardware. This guide is meant to help you figure out where there might be performance holes in your system and what you can do to fix them.

 

This is an update to a previous guide I made about bottlenecking, since the original one I wrote focused a little too much on one aspect of it.

 

(Link to that post)

Recap: What is a bottleneck?

On the surface, a bottleneck is an imbalance or limitation of hardware. For example, if you have a really strong graphics card but not a strong processor, the processor could be spending so much time working on game logic that it can't get around to sending render commands to the GPU fast enough. A telltale sign is that you're running an application and it's not performing as fast as it could be, though "as fast as it could be" is hard to state objectively. If you're playing a game, for instance, and you want to make sure the video card is working as fast as it can, you can check review sites for benchmarks of that video card in the game you're playing. They usually run on top-end hardware, lowering the potential for a bottleneck.

 

Bottleneck issues usually show themselves in two main ways:

  • The average performance isn't as good as it should be.
  • The frame rate isn't consistent most of the time.

For the purposes of this article, I'm assuming your main goal is to have the GPU load close to 100% as much as possible.
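To make that goal a bit more measurable, here is a minimal Python sketch that takes a list of logged GPU load percentages and reports how often the GPU was at or near full load. The function name, the 95% threshold, and the sample numbers are all made up for illustration; you would need to pull the real values out of whatever log your monitoring tool produces.

```python
def gpu_bound_fraction(gpu_load_samples, threshold=95.0):
    """Return the fraction of samples with GPU load at or above threshold (percent)."""
    if not gpu_load_samples:
        raise ValueError("need at least one GPU load sample")
    busy = sum(1 for load in gpu_load_samples if load >= threshold)
    return busy / len(gpu_load_samples)


# Hypothetical GPU load readings (percent), one per logging interval.
samples = [99, 98, 97, 72, 65, 99, 98, 60, 99, 97]
fraction = gpu_bound_fraction(samples)
print(f"GPU at or above 95% load in {fraction:.0%} of samples")
```

If that fraction is low while the frame rate is below what reviews show for your card, something other than the GPU is probably holding things back.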

 

Tools of the trade to figure out bottlenecks

GPU-z (https://www.techpowerup.com/gpuz/): This is useful for obtaining information about what's going on in the GPU including clock speed, GPU load, and for some GPUs a performance cap reason. GPU-z also allows you to log data.

Task Manager: You can check out CPU load and system RAM usage, and tweak how many logical cores the application can use (its processor affinity) to test further. As of the Windows 10 Fall Creators Update (1709), Task Manager includes a GPU usage section that can work as a substitute for GPU-z, although it's less informative.

Performance Monitor (https://technet.microsoft.com/en-us/library/cc749249(v=ws.11).aspx): This is a tool built into Windows that you can use to log data. I usually use this to log CPU usage, since you can't do this in Task Manager.

FRAPS (http://www.fraps.com/): While its primary use is to monitor frame rate, it can also log frame times, which can be useful for determining where performance drops.
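Once you have frame times logged, a few lines of Python can turn them into the two numbers that matter for the symptoms above: the average frame rate and how bad the slowest frames are. This is only a sketch. It assumes you already have a list of cumulative frame timestamps in milliseconds (how you get them out of your logging tool's file is up to you), and the "1% low" here is just one common way to summarize the slowest frames, not a standard definition.

```python
def frame_stats(timestamps_ms):
    """Summarize a run from cumulative per-frame timestamps (milliseconds, ascending)."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    # Time each frame took is the gap between consecutive timestamps.
    frame_times = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    avg_ms = sum(frame_times) / len(frame_times)
    # Average the slowest 1% of frames (at least one frame).
    slowest_first = sorted(frame_times, reverse=True)
    count = max(1, len(slowest_first) // 100)
    low_ms = sum(slowest_first[:count]) / count
    return {
        "avg_fps": 1000.0 / avg_ms,
        "one_percent_low_fps": 1000.0 / low_ms,
        "worst_frame_ms": slowest_first[0],
    }


# Hypothetical log: steady 16 ms frames with a single 50 ms hitch.
stats = frame_stats([0, 16, 32, 48, 98, 114])
print(stats)
```

A big gap between the average and the 1% low figure points at the "frame rate isn't consistent" symptom rather than low average performance.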

 

The kinds of bottlenecking

Since bottlenecks are usually the result of some issue with hardware, let's go over the various root causes. If you suspect you're having a bottleneck issue, consider the application you're running or the hardware setup you have to determine the root cause.

 

Note: This assumes the rest of your software setup is sound, i.e., you have the latest updates and your drivers aren't giving you problems.

 

Maximum CPU loading

Symptom: All of the processor's logical cores (or "threads") are maxed out at 100%.

How to verify: Reducing the application's processor affinity causes performance to drop roughly linearly, e.g., if the application is maxing out 4 cores, disabling 1 reduces performance by about 25%.

How to fix: Upgrade the processor to something with more cores. While overclocking can help, it will not offer a lot of wiggle room.

Description: This is caused by the application needing so much of the processor's time that it cannot send render commands to the GPU fast enough. For the most part, games only process logic every so often on a fixed timer, and logic often goes before graphics. So if the processor spends too much time processing the game logic, it doesn't have enough time left to generate render commands. An example of this can be seen back in the days of 8-bit consoles. When too much action is going on on screen, the game slows down as the CPU needs more cycles to process logic before it can get to sending rendering commands.
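The idea can be put into a toy model. The sketch below assumes a simplified pipeline where the CPU does game logic and then submits render commands for a frame while the GPU draws the previous one, so the slower of the two sets the frame time. Real engines are more complicated than this, and the function name and all the millisecond values are invented for illustration.

```python
def effective_fps(logic_ms, render_submit_ms, gpu_ms):
    """Return (fps, limiter) for a simplified CPU/GPU frame pipeline.

    The CPU needs logic_ms + render_submit_ms per frame, the GPU needs gpu_ms,
    and whichever takes longer determines the frame time.
    """
    cpu_ms = logic_ms + render_submit_ms
    frame_ms = max(cpu_ms, gpu_ms)
    limiter = "CPU" if cpu_ms > gpu_ms else "GPU"
    return 1000.0 / frame_ms, limiter


# Light game logic: the GPU's 10 ms per frame is the limit.
print(effective_fps(logic_ms=4, render_submit_ms=2, gpu_ms=10))
# Heavy game logic: the CPU now takes 16 ms and the GPU sits idle part of the time.
print(effective_fps(logic_ms=14, render_submit_ms=2, gpu_ms=10))
```

In the second case a faster GPU changes nothing, because the frame time is set entirely by the CPU side.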

 

Note: Some games, by design, max out the CPU at times and frame rate will tank while they do. This is because what they're doing at that point requires no player input. An example of this is in Civilization during the computer's turn. At this point, the player shouldn't be doing anything, so it's reasonable to expect the game to not focus on rendering graphics and spend most of its time calculating the computer players' turns.

 

Note 2: This is the definitive line for when you should upgrade your CPU if you haven't already. If you are maxing out your CPU, then gaming performance will flatline. That is, upgrading the GPU will not improve anything at all.

 

Note 3: Maximum CPU loading can be exacerbated when using NVIDIA GPUs on lower core-count systems due to the way NVIDIA's drivers work. NVIDIA implemented a method that causes DX11 multithreaded rendering to happen regardless of whether the developer implemented it in the game. This adds CPU overhead, as the primary thread of the game needs to spend more time assembling the GPU command lists. Also, since the reassembling requires cross-core coordination, any communication hiccups between the cores can cause issues as well.

 

tl;dr: for DirectX 11 games, it's best to pair a high-end NVIDIA GPU with an Intel CPU that has 8 or more threads if possible. 4-thread Intel CPUs may become too taxed, while on Ryzen, due to the nature of the CCX architecture, one of the worker threads can land on another CCX, introducing a lot of latency to the primary thread.

 

Reliance on single core performance

Symptom: Either a few logical cores have high usage, or the entire processor appears busy but total usage flatlines at a certain percentage below 100%.

How to verify: Reducing the application's processor affinity does not cause performance to tank unless you reduce it drastically, e.g., if the total processor load on, say, an 8-thread processor is only 60%, disabling one core does not affect performance at all.

How to fix: Increase single core performance by either switching to a processor with higher IPC or increasing the clock speed of the current processor.

Description: This is caused by applications that cannot issue enough threads of work on average to saturate the processor. As a result, the processor can easily finish up the work but has to wait for the application to issue more threads. This can be caused by many things, usually due to over-reliance on synchronous software design.
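The affinity test from this section and the previous one can be sketched as a small calculation: compare the frame rate drop you actually measured after removing cores with the drop you'd expect if performance scaled linearly with core count. The function name, the 75%/25% cutoffs, and the example numbers below are my own assumptions for illustration, not established thresholds.

```python
def affinity_scaling(fps_before, fps_after, cores_before, cores_after):
    """Compare a measured FPS drop with the drop expected from linear core scaling.

    Returns (expected_linear_fps, verdict).
    """
    if not 0 < cores_after < cores_before:
        raise ValueError("cores_after must be smaller than cores_before and positive")
    expected_linear_fps = fps_before * cores_after / cores_before
    expected_drop = 1.0 - cores_after / cores_before
    actual_drop = 1.0 - fps_after / fps_before
    # How much of the "linear" drop actually happened (1.0 means fully linear).
    share = actual_drop / expected_drop
    if share >= 0.75:
        verdict = "core_count_bound"       # looks like maximum CPU loading
    elif share <= 0.25:
        verdict = "not_core_count_bound"   # single core performance or another limit
    else:
        verdict = "inconclusive"
    return expected_linear_fps, verdict


# Hypothetical: 4 cores maxed out, disabling one drops 60 FPS to 46 FPS.
print(affinity_scaling(60, 46, 4, 3))
# Hypothetical: 8 threads at partial load, disabling one barely changes anything.
print(affinity_scaling(60, 59, 8, 7))
```

Treat the verdict as a hint only. Frame rates vary from run to run, so repeat the measurement a few times before drawing conclusions.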

 

Note: Even if you hit this bottleneck, this is not a sign you need to upgrade the CPU right away. Unless the single core performance is low enough that it's getting in the way of your requirements, don't rush out and get an upgrade. For example, AMD's FX-8350 lost a lot of benchmarks to Intel's Sandy Bridge and Ivy Bridge processors when it was released. Today? It can deliver comparable performance in modern games, such as in http://www.techspot.com/review/1128-rise-of-the-tomb-raider-benchmarks/page5.html

 

Note 2: Note 3 under Maximum CPU loading applies here as well.

 

Accessing storage

Symptom: Performance hiccups or outright drops for a few moments when moving around in the game's environment. This is normally a problem in open-world games.

How to verify: Go to different areas of the game, but do not backtrack (those areas are often left in RAM, so they load "instantly"). If you have a hard drive, you can hear it working while you play the game.

How to fix: This is most likely going to be an issue with the game being on slower storage media like hard drives. The easiest solution is to upgrade the storage to an SSD. If that's not feasible, then defragging and having the defragger consolidate data should help. Windows' built-in defragmenter should consolidate data.

Description: This is caused by the application needing something, but it hasn't loaded it yet so it has to go to storage to retrieve it.

 

Note: Texture pop-in, while annoying, is not a performance related issue for the purposes of this article. However, if you want to avoid texture pop-in, getting faster storage will help.

 

VRAM/RAM loading

Symptom: Performance starts off fine, but at some point it starts to stutter.

How to verify: Check if the VRAM/RAM usage is close to 100%.

How to fix: For VRAM, lower resolution based settings (texture, shadow, screen) or turn off MSAA. For system RAM, get more RAM or close applications that you're not using.

Description: Applications aren't aware of how much physical RAM there is in the system; only the OS is. As a result, when the application starts requesting more RAM and no more physical RAM is available, the OS starts evicting data from RAM onto storage. This takes a very long time in computer terms, and normally the application can't do anything else while this is happening, causing hiccups.
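Since the symptom is "fine at first, stutters later", it helps to know when memory usage got close to the limit so you can line that up with when the stuttering began. Here is a small Python sketch, assuming you have logged usage as (seconds, used MB) pairs. The function name, the 95% cutoff, and the sample log are hypothetical.

```python
def first_time_near_full(samples, total_mb, threshold=0.95):
    """Return the first timestamp (seconds) where used/total reaches threshold, or None.

    samples is a list of (seconds, used_mb) pairs in time order.
    """
    for seconds, used_mb in samples:
        if used_mb / total_mb >= threshold:
            return seconds
    return None


# Hypothetical VRAM log for a 4096 MB card: usage creeps up over three minutes.
vram_log = [(0, 2100), (60, 3200), (120, 3850), (180, 3990)]
print(first_time_near_full(vram_log, total_mb=4096))
```

If the stutter starts around the same time the function reports, running out of VRAM or RAM is a likely suspect.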

 

RAM performance

I'm throwing this in here because while I do not believe RAM performance generally affects gaming performance all that much (unless you're running an integrated GPU, then it does)... see:

There are some cases that do show RAM performance does impact gaming performance:

The only way to fix this is to get RAM that performs faster. :P

 

Establish your baselines before worrying

It might be tempting to freak out when you realize your hardware isn't giving the performance you want, but don't look at things like that. Unless you're running top-end hardware, you will always have a bottleneck somewhere. It's best to establish your requirement baselines (be reasonable!) and be flexible in your expectations when upgrading.


Appreciate the write-up. Also, I don't wish to belittle the contribution in any way, but when it comes to the B word, I think it's best to always emphasize a reference point up front. It streamlines things quite a bit.

 

Just straight up "Does your setup maintain your target framerate for a given scenario? (I.e. 1080p 60/120/144, 1440p 60/144 etc)". If not, what's the hindrance? That sort of thing.


 


Help, I've dropped frames and can't get them up!

 

You should mention texture pop-in for low VRAM/RAM and storage. There was also a small test with low-VRAM GPUs benefiting from SSD installs.

 

RAM bandwidth and latency usually affect minimum frame-rates in games as long as you're not running out of space.

 

I'm glad that you added the line about per-core usage rather than overall usage. People commonly claim that an i5 at 100% is worse than an i7 at 50% when programs only use four cores (logical or physical), when the true culprit could simply be frequency.


 


  • 5 months later...

By some fluke I ran across the graphs from an article on PCPer that I'd been trying to find for a while.

 

The takeaway I wanted to show is in these graphs:

3329-vtune-ashes.jpg

(This is for Ashes of the Singularity)

 

1636-vtune-gtav.jpg

(This is for GTA V)

 

Although GTA V does issue multiple threads at once, it only does enough to keep 3-4 logical CPUs busy on average, whereas Ashes of the Singularity does enough to keep 8 logical CPUs busy. This isn't to say that GTA V's design is flawed, because not every game provides the same workload. This is merely to show that in this case:

  • GTA V's multicore performance benefit largely drops off after 4 logical CPUs, and after that it starts to become more sensitive to single threaded performance.
  • Ashes of the Singularity's performance can be improved by increasing the number of logical CPUs.
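One way to picture why the benefit of extra logical CPUs levels off is Amdahl's law: if only part of the work per frame can run in parallel, the serial part caps the speedup no matter how many cores you add. The sketch below uses two made-up parallel fractions (0.60 and 0.95) purely for illustration. They are not measured from GTA V or Ashes of the Singularity.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when parallel_fraction of the work scales across cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: one mostly serial, one highly parallel.
for label, fraction in [("60% parallel", 0.60), ("95% parallel", 0.95)]:
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (1, 2, 4, 8)]
    print(label, speedups)
```

With the lower fraction, going from 4 to 8 cores adds very little, which is the point where faster individual cores start to matter more than more of them.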

The major problem with PCPer's method, though, is that the tool they used is a development tool that Intel likes to charge out the rear for, and at the moment I know of no way to isolate which process is using which cores with Microsoft's built-in Windows tools.

