
Rise of the Tomb Raider DX12 thread usage study

So this is my first real deep dive into a topic like this. If you find any errors or have any questions about what I put down, don't hesitate to ask or tell me. :D


This essay was inspired by a gaming session in which I noticed that Rise of the Tomb Raider was using all six cores and twelve threads of my Ryzen 5 1600. A closer look at the per-core CPU usage graphs showed that every core seemed to follow the same usage pattern, which made me wonder whether the game was actually giving all of the cores useful work or just replicating the same data across multiple cores at once. I did not see this behavior in DX11 mode, though my overall performance was also lower in DX11. To investigate, I decided to disable cores and threads and find the point at which frame rates start to suffer, which should indicate how many threads Rise of the Tomb Raider actually uses.

For testing I ran Rise of the Tomb Raider's built-in benchmark repeatedly at the medium preset with textures set to high, anisotropic filtering at 16x, depth of field off, motion blur and vignette blur off, tessellation on, and SMAA for anti-aliasing. DX12 and exclusive fullscreen were enabled at a resolution of 3440x1440. To test each core and thread configuration, I disabled the required number of cores, or disabled SMT, in the BIOS of my Gigabyte AB350N motherboard and then booted into Windows 10. I let all system processes settle down and then launched ROTTR. I used MSI Afterburner, HWiNFO64, and Windows Task Manager to monitor CPU and GPU usage along with temperatures and power draw. I first ran one "sacrificial" pass of the benchmark so the system could load whatever it needed into RAM, which reduces the stutter and random CPU usage that would otherwise make the results unrepresentative. I then ran the benchmark six more times in a row, recording the average, minimum, and maximum frame rates for each part of the test. At the end of the run I took a screenshot of all of the monitoring software for further analysis.
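As a rough sketch of the bookkeeping described above (one discarded warm-up run, then six recorded runs), something like the following Python would do. The `summarize` helper and the fps numbers are hypothetical and made up for illustration, and averaging the per-run minimums and maximums is an assumption about how the final figures were produced.

```python
from statistics import mean

def summarize(runs, warmup=1):
    """Average (avg, min, max) fps tuples after dropping warm-up runs.

    runs   -- list of (average_fps, minimum_fps, maximum_fps), in run order
    warmup -- number of leading "sacrificial" runs to discard
    """
    kept = runs[warmup:]
    return {
        "runs": len(kept),
        "avg_fps": mean(r[0] for r in kept),
        "min_fps": mean(r[1] for r in kept),
        "max_fps": mean(r[2] for r in kept),
    }

# Hypothetical numbers: one warm-up run followed by six recorded runs.
example_runs = [
    (70.0, 30.0, 110.0),  # warm-up run, discarded
    (78.0, 44.0, 120.0),
    (79.0, 45.0, 121.0),
    (78.5, 44.5, 119.0),
    (78.2, 43.0, 122.0),
    (78.8, 46.0, 120.5),
    (78.4, 45.5, 121.5),
]

if __name__ == "__main__":
    print(summarize(example_runs))
```

Dropping the first run keeps shader and asset loading from dragging down the minimums of the runs that actually get recorded.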

The system consists of an AMD Ryzen 5 1600 overclocked to 3.8 GHz at 1.325 Vcore, 16 GB of Team Group Delta RGB DDR4 running at 2666 MHz CL15, and an EVGA GTX 1070 Ti SC Black Edition with a sustained boost frequency between 2025 and 2050 MHz, all on a Gigabyte AB350N WiFi mini-ITX board. All components are custom liquid cooled on a 240 mm loop, with all fans at 100% PWM so that temperature was not a variable in performance.

Starting with the baseline of six cores and twelve threads, the average is 78.45 fps with minimums dipping down to 44.71 fps. This is a good result, and the CPU and GPU usage graphs show a general downward trend in overall CPU usage and no dips in GPU usage, which is what we are looking for. Next, simulating a 4 core CPU with SMT for 8 threads, such as a 1400 or 1500X, the average is 79.2 fps with minimums dipping down to 37.72 fps. The difference between these two averages is not statistically significant at the standard 0.05 level in a standard two-sample t-test, and anecdotally that seems true. The minimums, however, clearly favor the 6 core 12 thread CPU. Looking more closely, the maximums for the 4 core 8 thread CPU are noticeably ahead of those of the 6 core 12 thread one. The minimums of the 4 core SMT CPU are lower overall, with a narrower spread, and significantly worse than those of the 6 core SMT CPU. The usage graphs show a few more noticeable dips in GPU usage and higher overall CPU usage, though both configurations look good and would both make for a good experience.
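For anyone who wants to repeat the significance check, here is a minimal sketch of a pooled two-sample t-test at the 0.05 level using only the Python standard library. It assumes equal variances and compares against a small table of two-sided critical values. The function names and the sample fps values are hypothetical (the real per-run numbers are in the spreadsheet), and this is not necessarily the exact procedure used for the results above.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided critical t values at alpha = 0.05, keyed by degrees of freedom.
# Six runs per configuration gives df = 6 + 6 - 2 = 10.
T_CRIT_05 = {4: 2.776, 6: 2.447, 8: 2.306, 10: 2.228}

def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) for a pooled two-sample t-test.

    Assumes both samples have roughly equal variance and at least one of
    them has non-zero variance (otherwise the standard error is zero).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_err, df

def is_significant(a, b):
    """True if the two sample means differ at the 0.05 level (two-sided)."""
    t, df = pooled_t(a, b)
    return abs(t) > T_CRIT_05[df]

# Hypothetical average-fps samples, six runs per configuration.
six_core_smt = [78.1, 78.9, 78.4, 78.6, 78.2, 78.5]
four_core_smt = [79.6, 78.7, 79.3, 79.0, 79.5, 79.1]

if __name__ == "__main__":
    t, df = pooled_t(six_core_smt, four_core_smt)
    print(f"t = {t:.3f}, df = {df}, significant: {is_significant(six_core_smt, four_core_smt)}")
```

With only six runs per configuration the critical value is fairly high, so small differences in the minimums can easily fail to reach significance even when they look consistent.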

Now looking at a six core CPU with SMT disabled, so 6 cores and 6 threads, the average comes to 79.24 fps with minimums dipping down to 44.96 fps. In this case the lower thread count configuration has a measurably higher average fps than the 6 core 12 thread configuration. The minimums, however, are not significantly different and so are effectively the same. Overall the experience between these two was very similar. The usage graphs show that while CPU usage was higher overall, there appear to be a couple of times where the CPU spiked less, and GPU usage stayed about as flat as with 6 cores and 12 threads. Gameplay remained similar across all three of the configurations so far, with a couple of very slight frame skips on the SMT-enabled four core.

Moving on to an uncommon core and thread setup, I tested a 3 core, 6 thread configuration to see how SMT threads compare to real cores. The average is 77.39 fps and the minimums are 39.95 fps. This puts the average measurably below both the six core twelve thread and six core six thread configurations, though only by about two fps at most. In terms of minimums, the three core six thread configuration is not quite significantly different from the six core twelve thread configuration, though it likely would be with a larger sample size. It is measurably worse than the six core six thread configuration, which lends credence to that expectation. In gameplay there were again minor frame skips, which the usage graphs back up: CPU usage was high and there were longer dips in GPU usage than in any of the previous configurations.

Moving down to four cores and four threads, effectively emulating something like a Ryzen 3 1200 or 1300X, we get an average of 76.32 fps and a minimum of 40.17 fps. Compared to the original six core twelve thread configuration, the average is measurably worse, though the roughly two fps difference is not noticeable in gameplay. The minimums do not come out significantly worse because the sample size is low. Compared to the four core eight thread configuration, the four core configuration without SMT comes in about three fps lower on average, with minimums that are effectively the same. In this case disabling SMT nets worse performance, contradicting the six core results. The last comparison I thought made sense was with the three core six thread configuration, since SMT threads rarely equal real cores in workloads that are not highly parallelizable. Between these two there is a measurable drop of about one fps, which is not perceptible in game. The minimums are once again effectively identical between the two configurations. The usage graphs show longer periods of high CPU usage than in all other configurations, with longer but even dips in GPU usage that are slightly deeper than those of the three core six thread configuration.

Our last configuration is another odd one: three cores without SMT, for three threads total. While it is impossible to buy a modern CPU with this configuration, it is useful for comparing threads and cores. Two cores and four threads was not stable with my overclock for some reason, and by this time Rise of the Tomb Raider wanted me to wait 24 hours for it to revalidate that I had actually bought the game, so I decided to stop at three cores, since you should probably not be buying a CPU with fewer than four in today's market anyway. Moving on to the data, the average comes in at 75.21 fps, which seems surprisingly high for such a low core count, though not entirely surprising considering DX12's claim to fame has always been better use of weaker CPUs. The minimums dip down to a pretty low 30.63 fps, which begins to show how this configuration is really performing. This configuration is measurably and noticeably worse than all of the others in both minimum and average fps. While the average seems pretty high, a quick look at the usage graphs shows extremely high CPU usage and almost constant variation in GPU usage, with larger and longer dips than in the previous configurations. I also observed during the runs that foliage would suddenly pop into existence as the benchmark moved along, with other objects failing to load in at their proper times as well.
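To put the six configurations side by side, here is a small Python sketch that expresses each reported average and minimum as a percentage change from the six core twelve thread baseline. The fps figures are the ones quoted in the paragraphs above. The `RESULTS` table and the `pct_change` helper are just illustrative names.

```python
# (average fps, minimum fps) as reported for each configuration.
RESULTS = {
    "6c/12t": (78.45, 44.71),
    "4c/8t": (79.20, 37.72),
    "6c/6t": (79.24, 44.96),
    "3c/6t": (77.39, 39.95),
    "4c/4t": (76.32, 40.17),
    "3c/3t": (75.21, 30.63),
}

def pct_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0

if __name__ == "__main__":
    base_avg, base_min = RESULTS["6c/12t"]
    for name, (avg, low) in RESULTS.items():
        print(
            f"{name:>6}: avg {pct_change(base_avg, avg):+6.1f}%  "
            f"min {pct_change(base_min, low):+6.1f}%"
        )
```

On these numbers the three core result gives up only about 4% of the average but over 30% of the minimum, which fits the picture of the averages being mostly GPU limited while the minimums react to the lower core counts.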

Overall, this data gives us a look into how Rise of the Tomb Raider uses threads in DX12 mode. Running the game at 1080p rather than ultrawide 1440p would have accentuated the thread differences better, but I chose a resolution that I think is near, if slightly above, the sweet spot for gaming right now. Even though the differences are smaller, they still give insight into how this game handles each thread of a system. Of note, I would have liked to test a two core four thread configuration to match up with older i3 CPUs. It was unstable at my overclock of 3.8 GHz but would run at 3.7 GHz at stock settings. I did not want to redo all of my testing at 3.7 GHz, so I used the three core three thread configuration as a placeholder. Perhaps in the future I can take a deeper look at how a dual core with SMT fares in this game, and in others, compared to the configurations I have here. Overall it seems that this game prefers cores to threads, though additional threads can make up for missing cores. SMT does not have a purely negative impact on gaming as it did when Ryzen first launched, and while it usually trails in minimum frame rates, the difference is small enough that I would say there is no point in disabling SMT in Rise of the Tomb Raider. In practice anything over 4 cores seems to be plenty to run this game at respectable frame rates and resolutions, and in my testing scaling stops after 6 total threads. That said, I cannot rule out that an eight core CPU might perform better, though my data indicates that CPU usage is already low enough that the additional threads should have no effect other than lowering per-thread usage. In the future I may add data if I get my hands on a higher core count CPU and it shows a significant difference in performance over this six core part.


Here is a link to the Google sheets document containing all of my data: https://docs.google.com/spreadsheets/d/17NCvNtm4q08kRsa4j7uPNO0zQ_aS9oEwidojbPtGu-Y/edit?usp=sharing

 

Here are the graphs from my data:

 

[Image: 6c12t 4c8t 6c6t 3c6t 4c4t (1).png]
[Image: max average and min (1).png]

 

and here are the screenshots of the cpu and gpu usage during the runs:

 

[Screenshot: 6c12trottr.png]
[Screenshot: 22c8trottr.png]
[Screenshot: 6c6trottr.png]
[Screenshot: 3c6trottr.png]
[Screenshot: 4c4trottr.png]
[Screenshot: 3c3trottr.png]


Another takeaway is Tomb Raider is incredibly good at leveraging multiple threads, actually didn't expect that.

¯\_(ツ)_/¯

 

 

Desktop:

Intel Core i7-11700K | Noctua NH-D15S chromax.black | ASUS ROG Strix Z590-E Gaming WiFi  | 32 GB G.SKILL TridentZ 3200 MHz | ASUS TUF Gaming RTX 3080 | 1TB Samsung 980 Pro M.2 PCIe 4.0 SSD | 2TB WD Blue M.2 SATA SSD | Seasonic Focus GX-850 Fractal Design Meshify C Windows 10 Pro

 

Laptop:

HP Omen 15 | AMD Ryzen 7 5800H | 16 GB 3200 MHz | Nvidia RTX 3060 | 1 TB WD Black PCIe 3.0 SSD | 512 GB Micron PCIe 3.0 SSD | Windows 11


18 minutes ago, kevinisbeast707 said:

whether or not the game was actually using all of the cores for a useful load or if it was just replicating data to be worked on across multiple cores at the same time.

so just turn off cores and see if core usage goes up? that's a lot of work and time I'd give you that, but hardly efficient

CPU: i7-2600K 4751MHz 1.44V (software) --> 1.47V at the back of the socket Motherboard: Asrock Z77 Extreme4 (BCLK: 103.3MHz) CPU Cooler: Noctua NH-D15 RAM: Adata XPG 2x8GB DDR3 (XMP: 2133MHz 10-11-11-30 CR2, custom: 2203MHz 10-11-10-26 CR1 tRFC:230 tREFI:14000) GPU: Asus GTX 1070 Dual (Super Jetstream vbios, +70(2025-2088MHz)/+400(8.8Gbps)) SSD: Samsung 840 Pro 256GB (main boot drive), Transcend SSD370 128GB PSU: Seasonic X-660 80+ Gold Case: Antec P110 Silent, 5 intakes 1 exhaust Monitor: AOC G2460PF 1080p 144Hz (150Hz max w/ DP, 121Hz max w/ HDMI) TN panel Keyboard: Logitech G610 Orion (Cherry MX Blue) with SteelSeries Apex M260 keycaps Mouse: BenQ Zowie FK1

 

Model: HP Omen 17 17-an110ca CPU: i7-8750H (0.125V core & cache, 50mV SA undervolt) GPU: GTX 1060 6GB Mobile (+80/+450, 1650MHz~1750MHz 0.78V~0.85V) RAM: 8+8GB DDR4-2400 18-17-17-39 2T Storage: HP EX920 1TB PCIe x4 M.2 SSD + Crucial MX500 1TB 2.5" SATA SSD, 128GB Toshiba PCIe x2 M.2 SSD (KBG30ZMV128G) gone cooking externally, 1TB Seagate 7200RPM 2.5" HDD (ST1000LM049-2GH172) left outside Monitor: 1080p 126Hz IPS G-sync

 

Desktop benching:

Cinebench R15 Single thread:168 Multi-thread: 833 

SuperPi (v1.5 from Techpowerup, PI value output) 16K: 0.100s 1M: 8.255s 32M: 7m 45.93s


6 minutes ago, emosun said:

so the takeaway from this is your 1070 is a bottleneck

(i skipped the text wall and just looked at the pictures)

My 1070 Ti is somewhat of a bottleneck at that resolution, though I believe the data is still somewhat relevant since there are statistical differences between the data sets. Even at 1080p, where there would be a greater difference between the 4 core and the 6 core 12 thread, I think both would be very playable in practice. I may do another run at 1080p just to confirm.


1 minute ago, Jurrunio said:

so just turn off cores and see if core usage goes up? that's a lot of work and time I'd give you that, but hardly efficient

True, but I wanted to emulate lower tier processors as accurately as possible, since using Windows to disable cores doesn't let you choose between cores and threads.


this alone , showing little change to the average despite disabling over half the cpu , leads me to say its a pretty big gpu bottleneck

Untfvfffgfggfitled.png


25 minutes ago, BobVonBob said:

Another takeaway is Tomb Raider is incredibly good at leveraging multiple threads, actually didn't expect that.

if it was good at that, we would see big difference in fps...

anyway as someone already mentioned it seems like gpu bottleneck

MSI GX660 + i7 920XM @ 2.8GHz + GTX 970M + Samsung SSD 830 256GB


11 minutes ago, Neftex said:

if it was good at that, we would see big difference in fps...

anyway as someone already mentioned it seems like gpu bottleneck

I will be redoing this test at 1080p both at low settings and at what I would expect most people to play with to eliminate the gpu bottleneck.


3 hours ago, Neftex said:

if it was good at that, we would see big difference in fps...

anyway as someone already mentioned it seems like gpu bottleneck

Not so much because of the actual performance, but because of the impressive level of load spreading with many threads



5 hours ago, Neftex said:

if it was good at that, we would see big difference in fps...

anyway as someone already mentioned it seems like gpu bottleneck

 

5 hours ago, emosun said:

so the takeaway from this is your 1070 is a bottleneck

(i skipped the text wall and just looked at the pictures)

I redid the testing at 1080p with the exact same methods but at 3.7 GHz so that I could run 2 cores and 4 threads. I tested both the lowest settings with level of detail set to very high, which uses more CPU than normal, and everything turned all the way up except for textures at high, no motion blur, and SMAA for anti-aliasing. I skipped the 3 core configurations after noticing a general trend. From 4 cores and 4 threads on down, some models in the benchmark just straight up wouldn't render, which explains the increase in frame rate you will see. This is either a side effect of this game's DX12 implementation or specific to the way this benchmark is designed. It is worth noting that models not showing up at the lower core counts was also a problem in the previous resolution tests, so just because an fps chart shows numbers doesn't mean they don't require nuance. DON'T USE A DUAL CORE IN RISE OF THE TOMB RAIDER. QUAD CORES NEED TO BE PROPERLY OPTIMIZED THROUGH SETTINGS TO USE MORE GPU.

 

[Image: Low settings very high level of detail.png]
[Image: Very high preset smaa.png]


looks like it just doesn't like hyperthreading and prefers 1 to 1


35 minutes ago, emosun said:

looks like it just doesn't like hyperthreading and prefers 1 to 1

This is mostly true, though I did find more instances of glitchy models with 4 cores and 4 threads than with 4 cores and 8 threads, and I also found 6 cores and 6 threads better than 4 cores and 8 threads, which the graphs sadly fail to convey. In the higher resolution tests, 1:1 is the preferred core to thread ratio, though the game doesn't seem too bothered by the addition of SMT on the four and six core configs.


  • 1 year later...
On 4/4/2020 at 3:24 PM, md_rayan said:

ROTTR_DX11_vs_DX12_2.png

In this game in particular I haven't seen a compelling reason not to use dx12. Maybe it'd make a difference on a dual core 🤷‍♂️

