
I’ve wasted my life testing tech. No more!

AdamFromLTT

One feature I'd like to see is some way to automate actually "playing" a game (even if only in the most simplistic ways), or maybe even using a creative app, so that the tool can be used with any software that doesn't have a built-in benchmark, and even within games that DO have built-in benchmarks. For those wondering why this is important, I've got three reasons, at least two of which were discovered by Kyle Bennett at HardOCP something like two decades ago. Credit to Kyle - he laid many of the foundations for sound PC game benchmarking before some of our community were even born.

 

The first is that some games' benchmarks run in areas that are not representative of the average experience in the game (sometimes the area is less demanding than the rest of the game). Even when the area is representative, it often doesn't show the "worst-case scenario", i.e. a highly demanding portion of the game. Sometimes you want to see that, for the same reason game benchmarking started including 1% and 0.1% lows.
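For anyone unfamiliar with 1% and 0.1% lows, here is a rough Python sketch of one common way to compute them from a list of frame times. Definitions vary between tools; this one assumes "1% low" means the average FPS over the slowest 1% of frames. The function names and the sample capture are made up for illustration and are not taken from any LMG tooling.

```python
import math


def average_fps(frame_times_ms):
    """Average FPS over the whole capture (frames divided by total time)."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def low_fps(frame_times_ms, fraction):
    """Average FPS over the slowest `fraction` of frames (0.01 for 1% lows).

    Assumption: "X% low" is the mean of the slowest X% of frames. Some tools
    report a percentile frame time instead, which gives different numbers.
    """
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    slowest_first = sorted(frame_times_ms, reverse=True)
    # Always keep at least one frame so very short captures still work.
    count = max(1, math.ceil(len(slowest_first) * fraction))
    worst = slowest_first[:count]
    return 1000.0 / (sum(worst) / len(worst))


# Hypothetical capture: 990 smooth frames at 10 ms plus 10 stutters at 50 ms.
frames = [10.0] * 990 + [50.0] * 10

summary = {
    "avg_fps": average_fps(frames),
    "low_1pct_fps": low_fps(frames, 0.01),
    "low_0_1pct_fps": low_fps(frames, 0.001),
}
print(summary)
```

The average barely moves when a handful of stutters are added, while the lows drop sharply, which is the reason a demanding area can tell you something a gentle canned run hides.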

 

Another reason is that game benchmarks are sometimes modern versions of the old Doom/Quake "timedemo", where a recording of events is played back and game subsystems like input and AI, along with other game engine overhead, are not exercised. At that point the benchmark is only testing the renderer, so the results it spits out are better than what you get in the real game. This can make quite a difference even when testing the exact same areas that you see in the benchmark.

 

The final reason is that in open world games, many built-in benchmarks just do a slow, plodding fly-around of one small area of the game world and don't travel around. That doesn't do a good job of testing the storage subsystem of a PC, where asset streaming comes into play: players on slower storage won't see issues in a canned benchmark but will start to in game, once they travel at any speed faster than walking. That may not seem like a big deal now for anyone who isn't trying to game on spinning rust. But as game studios really stretch their legs and start making *real* use of the higher transfer speeds of SSDs (and/or start making use of DirectStorage), benchmarks need to point out when SD cards (hello Steam Deck), hard drives, SATA SSDs and eventually even PCIe Gen 3 SSDs are holding players back.

 

With all of that said, I don't know how deep this rabbit hole goes. Does one maintain a manual save and have the script click at specific coordinates to load whatever save is in the top spot (or even find the right savegame via OCR?), then do a scripted run around the world? Is there a reliable way to get into a car or similar higher-speed transport in every open world game you wanna benchmark? Can you really use this in, say, a fighting game or similar, where you are put up against an unpredictable enemy AI that skews results? Are there cheat codes or mods that would make this easier? Would said codes or mods "invalidate" results as some kind of not-quite-legit gameplay experience? It's hard to say, it depends a lot on the game, and it's probably not something that *I* know how to solve. But even just having the ability to load into a given save and either rotate the camera around the environment or stand still can answer some interesting questions that a game's canned benchmark won't answer.

 

Edit: While I was thinking this post out and typing it up, I realized a whole discussion about multiplayer was going on. IMO, what I said above only applies to single player scenarios. For many multiplayer scenarios, especially within MMOs, I just don't know what one could do to create consistent results. Even standing in a newbie area, which in a mature MMO should be mostly devoid of other players, is still unrepresentative of what it's like standing in a city with a highly variable number of players, so it's still kind of a crappy baseline to set. We all know that standing in, say, WoW's Dwarf newbie zone (sorry, it's been nearly 2 decades, I forget the name) won't offer the same frame rate as standing in a populated city. Or if there's some game event going on that you can't disable and that saps FPS, it skews results and makes them not comparable to results from a test you did just last week. It's just not reasonable to include games with dynamic events IMO, whether in MMOs or even just "live service" games. But what I lay out above should be reasonable for single player modes.
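To put a number on "consistent results", one rough approach is to repeat the same scenario several times and look at how much the average FPS varies between runs. The Python sketch below uses the coefficient of variation (sample standard deviation divided by the mean). The 3% threshold, the function names, and the sample numbers are all assumptions made up for illustration, not an established standard.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless spread)."""
    if len(values) < 2:
        raise ValueError("need at least two runs to measure variation")
    return stdev(values) / mean(values)


def is_repeatable(run_avg_fps, threshold=0.03):
    """True if run-to-run spread is within `threshold` (assumed 3% cutoff)."""
    return coefficient_of_variation(run_avg_fps) <= threshold


# Hypothetical average FPS from four repeated runs of each scenario.
quiet_zone = [100.0, 101.0, 99.0, 100.0]  # scripted single player save
busy_city = [60.0, 45.0, 72.0, 51.0]      # live area with varying player counts

for name, runs in (("quiet_zone", quiet_zone), ("busy_city", busy_city)):
    cv = coefficient_of_variation(runs)
    print(f"{name}: CV={cv:.3f} repeatable={is_repeatable(runs)}")
```

A scenario that fails a check like this on the same hardware isn't going to give comparable numbers across different hardware either.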


Hey LMG,

 

I wanted to reach out to ask a question that will lead to other questions.


Have you considered allowing non-LMG parties access to the data this program is going to generate?

 

Background:

I'm a Data Analyst, sometimes Data Engineer, and occasionally a computer gamer. (Get wrekt consoles!)

Meaning, I LOVE all things data. I believe that with enough data, you can answer just about any question. I would love to see the data that this program collects to be able to run analysis on it and answer questions. 

To give an example:

My marketing team came to me and asked a question: "How effective were our call-in programs during a specific period?" To answer this question, I went to our VOIP program, downloaded incoming call data, and built a massive (2 million+ records and counting) dataset. I then filtered the data by phone number and time period and showed results for specific phone numbers. I didn't just provide a simple spreadsheet; I built a custom web app using Python, Flask, and Plotly that let them change the display based on their inputs.

I would love to build something like this with the data you're going to collect. A flexible solution that, depending on the data collected, could allow at a glance comparison between CPUs, GPUs, settings, etc. I think this would be an awesome project to work on to provide the community with real data to make their decisions.
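As a very rough sketch of the kind of at-a-glance comparison I mean, here is some plain Python that filters a set of result records and averages a metric per component. The record layout (`gpu`, `cpu`, `preset`, `avg_fps`), the `compare` helper, and every number are invented for illustration; I have no idea what the real data will look like.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical benchmark records; field names and values are placeholders.
results = [
    {"gpu": "GPU A", "cpu": "CPU X", "preset": "ultra", "avg_fps": 120.0},
    {"gpu": "GPU A", "cpu": "CPU Y", "preset": "ultra", "avg_fps": 100.0},
    {"gpu": "GPU B", "cpu": "CPU X", "preset": "ultra", "avg_fps": 90.0},
    {"gpu": "GPU B", "cpu": "CPU X", "preset": "low", "avg_fps": 200.0},
]


def compare(records, group_by, metric="avg_fps", **filters):
    """Mean of `metric` per value of `group_by`, keeping only records
    whose fields match every key/value pair in `filters`."""
    groups = defaultdict(list)
    for record in records:
        if all(record.get(key) == value for key, value in filters.items()):
            groups[record[group_by]].append(record[metric])
    return {key: mean(values) for key, values in groups.items()}


by_gpu = compare(results, "gpu", preset="ultra")
by_cpu = compare(results, "cpu", preset="ultra", gpu="GPU A")
print(by_gpu)
print(by_cpu)
```

A web front end would mostly be a matter of wiring the `group_by` and filter choices to dropdowns and plotting the returned dictionary.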

 

Anyway, I'm just dreaming, let me know what you think.

 

Paul


I'm thinking that there are so many factors in every build that affect performance. Would it be possible to make 2 versions: one that runs in people's existing Windows installation, and one that runs in a dedicated boot mode, so as to rule out the effect of any installed software? It might be technically difficult, but it might help rule out people's random apps / software / driver issues that slow down performance somehow.


On 10/13/2022 at 5:19 PM, AdamFromLTT said:

Benchmarking takes a lot of time. Time that we don’t have. And with our renewed focus on bringing more and better data to our reviews, if we want to create a quality video in time for release, something has got to give. So we automated it. Meet the benchmarking software that our lab is developing for use for our reviews and future Labs content. This exact software was used in our Nvidia RTX 4090 review AND AMD Ryzen 7000 series reviews. Take a look!

Give us your feedback and feature requests in the corresponding thread: 

 

Will MarkBench be able to push results to openbenchmarking.org so I can import your test results into the Phoronix Test Suite and compare them against my machines?


  • 5 weeks later...

I think FPS is too simplistic a metric. For better conclusions, frame time along with separate CPU latency and GPU latency should be displayed in order to fully understand what is happening and how to "boost" things.

E.g. if one sees CPU latency > GPU latency, but not by a lot, then maybe one can buy a faster, low-latency RAM kit and increase FPS that way, and so on and so forth.
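A minimal Python sketch of that idea, assuming the capture tool can report a CPU time and a GPU time per frame in milliseconds (not every tool does). The 10% tolerance band, the function names, and the sample values are assumptions made up for illustration.

```python
def classify_frame(cpu_ms, gpu_ms, tolerance=0.10):
    """Label a frame by whichever side took longer.

    Frames where the two times differ by no more than `tolerance`
    (relative to the longer one) are called "balanced".
    """
    longer = max(cpu_ms, gpu_ms)
    if longer <= 0:
        raise ValueError("frame times must be positive")
    if abs(cpu_ms - gpu_ms) / longer <= tolerance:
        return "balanced"
    return "cpu-bound" if cpu_ms > gpu_ms else "gpu-bound"


def bottleneck_share(samples, tolerance=0.10):
    """Fraction of frames in each category for (cpu_ms, gpu_ms) pairs."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = {"cpu-bound": 0, "gpu-bound": 0, "balanced": 0}
    for cpu_ms, gpu_ms in samples:
        counts[classify_frame(cpu_ms, gpu_ms, tolerance)] += 1
    return {label: n / len(samples) for label, n in counts.items()}


# Hypothetical per-frame (CPU ms, GPU ms) pairs.
samples = [(9.0, 6.0), (8.5, 6.2), (6.0, 6.3), (5.0, 9.0)]
shares = bottleneck_share(samples)
print(shares)
```

A run that is mostly "cpu-bound" by a small margin is the case where faster RAM or a CPU tweak might help; a run that is mostly "gpu-bound" points at the graphics card or the settings instead.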


Maybe consider having an alternative to userbenchmark in the future?

 

That website is clearly unreliable these days.



  • 1 month later...

On the topic of "cheating", have you considered developers cheating on the pre-made "benchmark" that you are testing? For example, VW made its cars detect when they were being tested.


You can game Cinebench a little by changing priority in task manager.

