I’ve wasted my life testing tech. No more!

AdamFromLTT

Love the idea! If I may suggest some other tests to run:

 

  • "Esports settings" on popular esports games. Serious players would like to know how many thousands of FPS they could get in CS:GO at 800x600 with an RTX 4090 and a Ryzen 9 7950X. (Also, is it even beneficial in any way to get thousands of FPS in a game? Maybe you get lower frame times?)

 

  • Some kind of benchmark for music production. I don't know how this could be done, but Kontakt with too many high-quality virtual instruments can lag your whole PC. The devs basically recommend "buy the CPU with the highest PassMark score that you can afford". There's DAWbench, but I think there's still a lot to learn in this field.
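
On the frame-time question in the first bullet: frame time is simply the inverse of frame rate, so each extra FPS buys less and less. A minimal Python sketch (the FPS values are arbitrary examples, not measurements):

```python
def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps


# Print the per-frame time for a range of example frame rates.
for fps in (60, 144, 240, 500, 1000, 2000):
    print(f"{fps:>5} FPS -> {frame_time_ms(fps):6.2f} ms per frame")

# Time saved per frame when moving between two frame rates.
saved_low = frame_time_ms(60) - frame_time_ms(144)      # roughly 9.7 ms
saved_high = frame_time_ms(1000) - frame_time_ms(2000)  # 0.5 ms
```

So yes, frame time keeps dropping, but doubling from 1000 to 2000 FPS only saves half a millisecond per frame, while going from 60 to 144 FPS saves almost ten.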

You should definitely open source this! It would be cool to have a "My Hardware Specs" section of your profile on this forum that would automatically upload your hardware specs and performance metrics from MarkBench.


Something I've noticed that never really gets touched on in CPU and GPU reviews anymore is the real-world experience in multiplayer games with a lot of non-NPC actors on the screen.  From my understanding, having a high number of player characters on the screen at once can be taxing in a way that NPCs are not.


There was an interesting look at the Ryzen 5800X versus 5800X3D over on the FFXIV subreddit (behind the spoiler below), which looked at the impact in different game scenarios. The differences in results suggest that having a very large amount of high-speed CPU cache can have a very sizeable impact on framerates (including 1% and 0.1% lows) when more character models are on the screen at once.  The thread talks about how these results seem to translate to other MMO games, or games that have a lot of character models on the screen or in the same area at the same time.

Spoiler

 

 

 

 

 

Testing items like this would be very helpful for me and, I imagine, a portion of your audience, since many people play genres other than driving games, shooters, and cinematic story games.  This would be checking off another niche, much like simulation games such as Civ6 or Stellaris, but it is a genre that is generally ignored by all the reviewers.  I know that MMO games can have benchmarks, but my personal experience makes me question how good those benchmarks are at simulating some of the more taxing moments in the game, where the game might be stressing both the GPU and CPU simultaneously.
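
For anyone unfamiliar with the 1% and 0.1% lows mentioned above, here is a minimal Python sketch of one common way to compute them from a frame-time capture: the average FPS over the slowest 1% or 0.1% of frames. Definitions vary between tools, and the capture here is synthetic, not a real measurement.

```python
def low_fps(frame_times_ms, fraction):
    """Average FPS over the slowest `fraction` of frames (one common definition)."""
    slowest = sorted(frame_times_ms, reverse=True)
    count = max(1, int(len(slowest) * fraction))  # always use at least one frame
    worst = slowest[:count]
    return 1000.0 / (sum(worst) / len(worst))


# Synthetic capture: 990 smooth frames at 10 ms plus 10 stutters at 40 ms.
frame_times = [10.0] * 990 + [40.0] * 10

average_fps = 1000.0 / (sum(frame_times) / len(frame_times))
one_percent_low = low_fps(frame_times, 0.01)
point_one_percent_low = low_fps(frame_times, 0.001)

print(f"average: {average_fps:.1f} FPS")
print(f"1% low: {one_percent_low:.1f} FPS, 0.1% low: {point_one_percent_low:.1f} FPS")
```

The average barely moves because of ten stutter frames, while the lows drop to a quarter of it, which is why the lows say more about how a crowded scene actually feels.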


I hope some of the tests included in MarkBench are things like 3DMark and maybe one or two other benchmarks, so we have that data point to look at too.


5 hours ago, Iblade said:

Above are 3 different game styles that you currently do no benchmarking on, and I think that this is a mistake. Not everyone plays the games that you use in your benchmarks.

Having followed since the very beginning, this has always been a problem point for me, and it is only getting worse as time goes on. It's not just the lack of game diversity, but also how their focus on particular titles, and how those titles are portrayed, can come off as gatekeeping, for lack of a better word. Over time it feels like they view genres outside of competitive esports as more or less meaningless, even though I don't think that's ever been their intention.

 

While they do a decent job at including benchmarks of popular titles across the most popular genres, the focus tends to always be on "competitive" titles, or titles that make you a "real gamer". By that I mean fast-paced esports titles that really benefit from FPS, refresh rate, latency, etc., as those have a more direct impact on player performance. You'll never hear "competitive" and "strategy" or "RTS" in the same sentence in a video. This focus extends across all their content as well, and really alienates a large portion of gamers who are looking for results that are not just about large FPS numbers.

 

Now, this makes sense for their type of content and for demonstrating the performance of a product, and consistently testing select titles over and over is key to providing meaningful data across products and generations. I understand the reasoning behind this and the challenge of deciding what to include in their content given limited time. They cannot cover everything.

 

However, this leads to mostly meaningless results for the average person. Most are looking for the best experience, not another 3-minute deep dive segment on CS:GO and why X can only achieve 200 FPS yet Y can break 400 (albeit still interesting). That could be cutting the time spent waiting for turns in Civ in half so they get more gameplay out of their limited time, achieving playable frame rates in Teardown with massive explosions, or finally being able to continue their progress in Factorio without it being a slideshow. These results would be important to many people, but because they don't demonstrate performance upfront, they are not deemed important enough to include, which leads back to my initial statement.

 

I really hope they use this time to take feedback from the community, and that they now have the ability to include more popular titles that appeal to a wider audience, where the gameplay itself impacts performance.

 

Simulation, strategy, RTS, sandbox, etc. are a large portion of the active genres and only getting more popular.


1 hour ago, mynameisjuan said:

gameplay itself impacts performance

It doesn't help that some canned benchmarks are in and of themselves severely flawed. Division 2 and GTA 5 in particular are severe offenders.

 

Anyway, in terms of benchmarking:

I'd love to see an audio-specific benchmark, even if it would fall outside of MarkBench. Audio latency can be kind of a mystery meat in gaming, and it would be great to see not only a deep dive into it, but also how certain combinations of audio devices can make or break things in that department. Android-specific benchmarks would be lovely too.

Press quote to get a response from someone! | Check people's edited posts! | Be specific! | Trans Rights

I am human. I'm scared of the dark, and I get toothaches. My name is Frill. Don't pretend not to see me. I was born from the two of you.


Host the project on GitHub/GitLab/Bitbucket etc., so we can also actively participate in its development.


I probably will have a lot more to say in the future, but for now I'll mostly stick with my thoughts on publishing the "mother of all testing databases". In the screenshots so far and in the video this is shown to be Grafana, which makes a lot of sense as it is a powerful tool that can present data in a lot of neat ways.

 

It is also a tool that (sometimes, depending on the source) can wildly misrepresent data if you are not careful. For example, in IT environments it is often combined with Prometheus as an underlying data source for monitoring applications. However, the way Prometheus works means that averages can be misleading if the dataset covers too short a period of time. It is a bit too much to explain in depth here, and I fully expect the folks in the lab to be aware of this as well.
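
To give a rough feel for the short-window problem without going into Prometheus internals, here is a plain-Python sketch. It does not use or imitate any Prometheus or Grafana API; the 15-second sampling interval and the FPS samples are made up for illustration.

```python
from statistics import mean

SAMPLE_INTERVAL_S = 15  # assumed sampling interval, not taken from any real setup

# Hypothetical FPS samples, one per interval, with a short dip in the middle.
samples = [120, 121, 119, 120, 60, 62, 118, 120, 121, 119, 120, 120]


def window_mean(values, start, length):
    """Mean of `length` consecutive samples starting at index `start`."""
    return mean(values[start:start + length])


full_run = mean(samples)

# Every possible 2-sample (30 second) window over the same data.
short_windows = [window_mean(samples, i, 2) for i in range(len(samples) - 1)]

print(f"full run average: {full_run}")
print(f"30 s windows range from {min(short_windows)} to {max(short_windows)}")
```

Depending on where a 30-second window happens to land, the "average" is anywhere between 61 and 120.5, while the full run averages 110. A dashboard zoomed in on a short range can tell a very different story from one showing the whole run.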

 

However, when making it a public dataset, a lot of people will not be aware of these sorts of caveats. So my suggestion would be to think about what to expose and how. For example, if you are giving Grafana dashboards to everyone, make sure to also put a big fat explanation (or a link to one) next to them that goes into how to read the data. That way you can somewhat reduce the chances of people wildly misinterpreting your presented data in internet slapfights.

 

Having said all that, I am a big fan of using what is already there for exposing and applying all this data. It is a sensible approach that hopefully will allow a lot more than would be possible if you tried to do it all with custom solutions.  

 

In that regard, I am curious as to how things are set up in the background. Is the labs team leaning on the Floatplane development infrastructure to make sure the things they build are maintainable and such? If not, what is the approach there? To me it seems that a lot of what the labs does is effectively very similar to modern software development, and specifically to test automation and performance testing. So I'd expect that internally there could be a lot of benefit from setting things up in the same way, for example by making use of GitLab (self-hosted) for the development, hosting, and possibly even triggering of MarkBench through pipelines. To keep things consistent you of course want to make sure runs use the same version of MarkBench and the same configs for games. In a similar sense I can see a benefit in keeping a repository with sets of game configs for various types of benchmarking. In fact, with a little bit of effort I can even see a future where you don't use a GUI, or where the GUI doesn't trigger a local MarkBench but instead triggers a pipeline which fires up the benchmarks on a few test benches in parallel.
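
A very rough Python sketch of that last idea, just to make it concrete. Everything here is hypothetical: the bench names, the version string, the config path, and `run_on_bench` are placeholders, and none of this reflects how MarkBench or any real pipeline is actually built.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkJob:
    bench: str            # hypothetical test bench name
    harness_version: str  # pinned so every bench runs the same tool version
    game_config: str      # path inside a hypothetical config repository


def run_on_bench(job: BenchmarkJob) -> dict:
    """Placeholder: a real pipeline would trigger the remote run here."""
    return {
        "bench": job.bench,
        "version": job.harness_version,
        "config": job.game_config,
        "status": "queued",
    }


def trigger_pipeline(benches, harness_version, game_config):
    """Fan the same pinned version and config out to every bench in parallel."""
    jobs = [BenchmarkJob(b, harness_version, game_config) for b in benches]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(run_on_bench, jobs))  # results keep input order


results = trigger_pipeline(
    ["bench-01", "bench-02", "bench-03"],
    harness_version="1.2.3",
    game_config="configs/esports-1080p.json",
)
```

The point of the frozen dataclass is that the version and config are decided once, centrally, and every bench gets exactly the same values.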

 

As someone who works as a test automation engineer, I see a lot of potential here. Needless to say, I am excited about this 😄 In fact, if it wasn't for me being located in Europe, I honestly would have considered applying for an LTT job at some point.

There aren't many subjects that benefit from binary takes on them in a discussion.

 

 


@Linus Tech Tips

Make a custom card switcher using multiple PCIe 5 risers and a custom rotary 16-pin changeover switch (it would have to be custom built) with a stepper motor at its core, controlled by a USB or external microcontroller: a PIC, Teensy, or Arduino. The program can let the controller know to rotate the switch after the next power down (PORT check), and after that power down is complete the board turns the PC back on (power switch on the motherboard). It's not expensive to do and can be a fun project for many others to use when completed.
 
A rotary style changeover switch would ensure that only one card can be connected at a time.
A stepper motor would allow for precise control of the rotary switch.
It would be able to test cards sequentially. (5 - 10 cards or more depending on the design).
The only wildcard would be to make sure all cards are receiving power from a power supply but that's the easy part.
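
As a rough sketch of the positioning logic only (in Python rather than microcontroller code), assuming a 200-step-per-revolution stepper and evenly spaced switch positions. Both are assumptions for illustration, not details of the design above.

```python
STEPS_PER_REV = 200  # assumed: a typical 1.8-degree stepper running full steps


def steps_to_slot(current_slot, target_slot, slot_count, steps_per_rev=STEPS_PER_REV):
    """Steps needed to rotate forward from one switch position to another."""
    if steps_per_rev % slot_count != 0:
        raise ValueError("slot count must divide the steps per revolution evenly")
    steps_per_slot = steps_per_rev // slot_count
    # Modulo keeps the rotation in one direction and wraps past the last slot.
    return ((target_slot - current_slot) % slot_count) * steps_per_slot


# Stepping through 10 card positions one at a time, wrapping back to the start.
moves = [steps_to_slot(slot, slot + 1, 10) for slot in range(10)]
print(moves)
```

With 10 positions each move is 20 steps, and a full cycle through all cards adds up to exactly one revolution.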
 
The software automation of reinstalling drivers is not my forte but I'm sure that can be figured out.



26 minutes ago, AbesAbes3rd said:

@Linus Tech Tips

Make a custom card switcher using multiple PCIe 5 risers and a custom rotary 16-pin changeover switch (it would have to be custom built) with a stepper motor at its core, controlled by a USB or external microcontroller: a PIC, Teensy, or Arduino. The program can let the controller know to rotate the switch after the next power down (PORT check), and after that power down is complete the board turns the PC back on (power switch on the motherboard). It's not expensive to do and can be a fun project for many others to use when completed.
 

Oef, with the amount of trouble risers on their own can already cause, I am not sure you'd want to introduce this in the mix. You are introducing a lot of extra contact points that can fail and introduce unpredictable behavior. Not to mention the extra resistance introduced by the extra mechanical connections. 

High-speed connections like PCIe generally don't do well with that. It's the same reason, afaik, that there is no spec for USB-C extension cables. They do exist but are not officially USB and often come with a list of limitations in regard to capabilities.

 

I also feel like this might be a bit of an overengineered thing, as the cards need to be plugged in anyway. So it likely is easier to automate everything but switching cards and just have that be done by someone manually who after switching will hit a button to continue. 

 

Edit: 

To be clear, I do like the idea from a conceptual point of view and think it is a creative approach. 



I love the idea and I hope it succeeds.  Transparency is everything.

 

I'd love to see GPU renderers like Redshift and Octane in all GPU tests.  GPU renderers rely heavily on GPUs and less on CPUs to render frames.  Currently, NVIDIA CUDA cards are favored (or required) for most GPU renderers.  That may change in the future as more renderers support AMD cards.

 

Note: Cinebench does not test third-party renderers.

 


6 hours ago, Caedwyr said:

Something I've noticed that never really gets touched on in CPU and GPU reviews anymore is the real-world experience in multiplayer games with a lot of non-NPC actors on the screen.  From my understanding, having a high number of player characters on the screen at once can be taxing in a way that NPCs are not.


There was an interesting look at the Ryzen 5800X versus 5800X3D over on the FFXIV subreddit (behind the spoiler below), which looked at the impact in different game scenarios. The differences in results suggest that having a very large amount of high-speed CPU cache can have a very sizeable impact on framerates (including 1% and 0.1% lows) when more character models are on the screen at once.  The thread talks about how these results seem to translate to other MMO games, or games that have a lot of character models on the screen or in the same area at the same time.

  Reveal hidden contents

 

 

 

 

 

Testing items like this would be very helpful for me and, I imagine, a portion of your audience, since many people play genres other than driving games, shooters, and cinematic story games.  This would be checking off another niche, much like simulation games such as Civ6 or Stellaris, but it is a genre that is generally ignored by all the reviewers.  I know that MMO games can have benchmarks, but my personal experience makes me question how good those benchmarks are at simulating some of the more taxing moments in the game, where the game might be stressing both the GPU and CPU simultaneously.

 

This is already a well-known phenomenon and can be demonstrated in single-player games like Assassin's Creed, Horizon Zero Dawn and Shadow of the Tomb Raider. All of them have parts of the benchmark that go through populated areas precisely to show the behaviour you are describing. It is definitely a challenge for automated testing: a benchmark might be 2% faster overall with a specific CPU, but that improvement isn't uniform and is concentrated in a specific section of the benchmark, and communicating that is hard. Some benchmarks will output CPU and GPU data, whereas just capturing the frametime data won't necessarily pick up this information on its own.
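
To illustrate how an overall number can hide where the gain comes from, here is a small Python sketch that splits a frame-time capture into sections. The two "CPUs", the section boundaries, and all frame times are synthetic, so the percentages are only there to show the effect.

```python
def section_fps(frame_times_ms, boundaries):
    """Average FPS per section; `boundaries` are the frame indices where sections end."""
    result = []
    start = 0
    for end in boundaries:
        chunk = frame_times_ms[start:end]
        result.append(1000.0 * len(chunk) / sum(chunk))
        start = end
    return result


# Synthetic runs: 300 frames of open terrain, then 100 frames in a crowded town.
cpu_a = [8.0] * 300 + [20.0] * 100
cpu_b = [8.0] * 300 + [16.0] * 100
bounds = [300, 400]

overall_a = 1000.0 * len(cpu_a) / sum(cpu_a)
overall_b = 1000.0 * len(cpu_b) / sum(cpu_b)
sections_a = section_fps(cpu_a, bounds)
sections_b = section_fps(cpu_b, bounds)

print(f"overall: {overall_a:.1f} vs {overall_b:.1f} FPS")
print(f"per section: {sections_a} vs {sections_b}")
```

In this made-up case the overall average differs by about 10%, but the open-terrain section is identical and the entire gain (50 vs 62.5 FPS) sits in the crowded section.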

 

Civ 6 (and other simulation type games) have been a standard benchmark used by reviewers for CPU testing for a long time.

 

You are definitely correct that many of the most popular games are multiplayer and that makes it hard to create repeatable scenarios for testing, and seemingly impossible to automate. On the other hand if you can automate all of the other testing scenarios, you save time for whatever manual testing you want to do.


7 hours ago, Caedwyr said:

Something I've noticed that never really gets touched on in CPU and GPU reviews anymore is the real-world experience in multiplayer games with a lot of non-NPC actors on the screen.  From my understanding, having a high number of player characters on the screen at once can be taxing in a way that NPCs are not.


There was an interesting look at the Ryzen 5800X versus 5800X3D over on the FFXIV subreddit (behind the spoiler below), which looked at the impact in different game scenarios. The differences in results suggest that having a very large amount of high-speed CPU cache can have a very sizeable impact on framerates (including 1% and 0.1% lows) when more character models are on the screen at once.  The thread talks about how these results seem to translate to other MMO games, or games that have a lot of character models on the screen or in the same area at the same time.

Next to impossible to test with any kind of consistency. In-game benchmarks are the go-to because everything is a known factor. In multiplayer games there is too much randomness.

 

For example, let's take FFXIV and decide to do your benchmark at Limsa Lominsa. The number of players on screen at any given time could be vastly different between runs, and people teleport in and out with various different outfits, effects, pets, etc., or move to different parts of the city, therefore giving inaccurate results and showing favour to the runs that just so happened to have fewer players around.
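
One way to put a number on that randomness is the run-to-run spread. A minimal Python sketch, using made-up average-FPS figures for five runs of each scenario (none of these numbers are measurements):

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical average-FPS results from five runs each.
canned_benchmark = [101.0, 100.0, 99.0, 100.0, 100.0]
crowded_city = [95.0, 70.0, 110.0, 82.0, 63.0]

cv_canned = coefficient_of_variation(canned_benchmark)
cv_city = coefficient_of_variation(crowded_city)

print(f"canned benchmark spread: {cv_canned:.1f}%")
print(f"crowded city spread: {cv_city:.1f}%")
```

If the spread between runs of the same hardware is bigger than the difference you are trying to measure between two parts, the comparison doesn't tell you much.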

🌲🌲🌲

 

 

 

◒ ◒ 


Surely there's other benchmarking software out there that does this with real games? Why reinvent the wheel!


Test the training time of AI models, like neural networks, by complexity.


I understand LTT have an image to maintain, but they surely know that the reason they do not get review units from the likes of Apple has nothing to do with their analytical coverage of the products and everything to do with the "gamer bro" jokes ("my dick is bigger than yours", "I did your mum") that "slip" out every video like clockwork.
 

Linus has even said on WAN Show that he understands when a brand opts to not be associated with such content, so I'm sure he knows why he is considered untouchable by Apple's media relations teams.


11 hours ago, the gamer that is bad said:

Would this run on arm?

Go is able to build and run on basically every CPU architecture, and can build executable binaries for Linux, macOS, and Windows, so yeah, it should be able to.


As a developer and inadvertent DevOps guy at work, this very much pleases me. There's no better feeling than having an automation save you hours of time during the day. And I definitely feel the pain of those small tasks that are just frequent enough to demand your full attention.

 

4 hours ago, AbesAbes3rd said:

Make a custom card switcher using multiple PCIe 5 risers and a custom rotary 16-pin changeover switch (it would have to be custom built) with a stepper motor at its core, controlled by a USB or external microcontroller: a PIC, Teensy, or Arduino. The program can let the controller know to rotate the switch after the next power down (PORT check), and after that power down is complete the board turns the PC back on (power switch on the motherboard). It's not expensive to do and can be a fun project for many others to use when completed.

You know, this is a novel and kinda cool idea in a way, points for that. But it's not really something that makes sense. I mean, if you were to do this you'd probably do something like set up a VM with multiple GPUs indirectly attached. Then have some kind of way to automatically swap them around in software, possibly spinning up a new VM with the new drivers already pre-installed. Sure, you'd have an overhead with the VM, but if the only changing variable is the GPU it'd be good enough.

 

However, I think the main problem with your idea is that you kinda need to step back a bit. What's the actual goal here? Because I would imagine testing a variety of games across multiple GPUs would be only one of the goals. You'd also want to potentially test multiple CPUs on a variety of games, or multiple RAM speeds, multiple OSes, different BIOS settings. And in any case, once you have run the benchmark for a particular config you have that data saved in a database. The main pain point they'd be trying to resolve would be the crunch before the NDA lifts, when they're benchmarking one or two products across a suite of games. And the swapping of the GPU would be a very, very small part of that process.

Fools think they know everything, experts know they know nothing


Having it open source would be nice for testing hardware in VR and seeing how it stacks up. Once it’s public you can show users what to upgrade to have a better experience playing at their desired settings.


Another vote for open source.
I'd love to create harnesses for my favorite games!


Made an account just for this, great news guys! I love the initiative and would love to contribute. The scope of this project is a lot wider than I would have expected, and I'm assuming you guys are planning to go open source with this, for the simple reason that this will be a pretty heavy chunk to maintain in the long term.

 

The idea of a central/cloud server to gather test data is the way to go here. I'm curious as to the role of the server. Does the server store results and then act as the "source of truth", meaning it will generate reports accessible from anywhere? And is the testing itself orchestrated from the server? If the server manages test instances, you could even allow it to orchestrate a full test line (e.g. 10 hardware benches on a table) without having to work those stations one by one. You could even go as far as making the server provide complete system images, to fully centralize everything from deployment to processing results.

 

Having said that, the value of a local instance is important. Most of us will run this on our own system, and in that case any user should be able to run the tests without being bound to a cloud service (as those cost money and require internet access).

 

Good luck with the project! I'm curious to see more of this.


VR benchmarking would be cool. Also allowing users to search the entire database of benchmarks so they could build a computer that suits their needs.

 

Imagine being able to type in your build and know nearly exactly what your frames would be. That would be a game changer. Maybe partner with pcbuilder?

 


7 hours ago, Arika S said:

Next to impossible to test with any kind of consistency. In-game benchmarks are the go-to because everything is a known factor. In multiplayer games there is too much randomness.

 

For example, let's take FFXIV and decide to do your benchmark at Limsa Lominsa. The number of players on screen at any given time could be vastly different between runs, and people teleport in and out with various different outfits, effects, pets, etc., or move to different parts of the city, therefore giving inaccurate results and showing favour to the runs that just so happened to have fewer players around.

I agree, and understand the attraction of the benchmark in being able to produce repeatable results, but I also have experiences, seemingly shared by many others, that the benchmark does not stress systems in the same way a highly populated area does.  I've seen speculation that this is because the load placed on a system by communicating the movement, action, and appearance data of other players (which the game cannot predict) is different from the load for NPCs, whose movement, action, and appearance data can be predicted.  This will obviously vary from game to game depending on how this type of interaction is programmed. As an example, a friend was able to run the full Endwalker benchmark with a smooth 100+ FPS the whole way through, even in very character-model-dense scenes with lots of spell effects, yet they have dips down to 60 FPS in 24-man encounters that have fewer character models on screen and lower graphical settings than the most demanding benchmark scenes.

 

Stepping away from the quibbling around the details though, my larger point is that the limited genres covered by basically all reviewers (who understandably test the easy areas that can be done within a tight timeframe) mean that reviews have always been substantially less useful if you do not play those types of games.  If you play other types of games, then you are left trying to figure out which GPU/CPU/other thing performs best for your particular use case, and there seem to be a number of situations where the best-performing component is not necessarily the one that tops the charts in the commonly tested scenarios shown in all the reviews.  Because of the lack of clear testing around all of these, though, we the public rarely know what is rumour versus actual results.  It could be that the apparent differences are not real or are caused by other factors, but since there's no testing done, we just don't know.

 

I can also touch on professional situations. Which CPU/GPU performs best with software like Global Mapper (lidar and spatial data processing/management/conversion), AutoCAD (drafting), and ArcGIS is something people generally have to guess at, hoping that performance in creator software (all reviews are very heavily biased towards a small subset of professional uses) matches what they can expect from other processing-intensive software.  As anyone who has looked at the variable performance in Adobe software knows, this is not necessarily a good way to predict results for other software.  However, because it is one of the common tests that all the video-creator reviewers perform, the companies have all worked to optimize their results in these types of tests.

 

My big ask, and I know it is big, is to broaden the genres and scenarios that can stress CPU/GPUs.

 

 


4 hours ago, Caedwyr said:

I also have experiences that seem to be shared by many others that the benchmark does not seem to stress systems in the same way a highly populated area stresses systems. 

Game benchmarks are somewhat problematic as they often aren't the full game, but they can be controlled and are repeatable. In WoW, for example, I can't go to a raid and do comparison benchmarks because each run is different. You can do a lot of boss pulls to get some sort of repeatable average, but that's not sustainable for more active reviewers. I had to find a controlled environment that gives matching results to actually do accurate WoW combat benchmarks (mass mob pulls in old Karazhan). The FF14 benchmark does represent the game overall, but not what happens if you zone into a city with many players. And the benchmark on its own does not show FPS or stutter checks etc.

 

And in the case of MMOs or similar online games, the world state affects the results a lot, whereas a pre-built benchmark may not even have one. In WoW it sits on a single core of the CPU and can quickly bottleneck it (so WoW avoids really big battles and the like). FF14 doesn't really do encounters that big aside from Alliance Raids, but in cities you can still get more people and it quickly shows.


 

This could not be automated without the game having a test server that has custom "players" connected that create a world state but in a fixed, consistent way.
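
A rough Python sketch of that idea, assuming a hypothetical test server that accepts scripted "players". Seeding the random generator makes the simulated crowd identical on every run. The function name, grid size, and seed are all made up for illustration and do not correspond to any real game's tooling.

```python
import random


def scripted_crowd(seed, player_count, ticks):
    """Positions for each fake player at each tick, fully determined by the seed."""
    rng = random.Random(seed)  # fixed seed -> identical "world state" every run
    return [
        [(rng.randint(0, 99), rng.randint(0, 99)) for _ in range(player_count)]
        for _ in range(ticks)
    ]


run_1 = scripted_crowd(seed=42, player_count=50, ticks=3)
run_2 = scripted_crowd(seed=42, player_count=50, ticks=3)
different_seed = scripted_crowd(seed=7, player_count=50, ticks=3)

print("same seed gives same crowd:", run_1 == run_2)
```

Two benchmark passes driven by the same seed would see exactly the same crowd, which is the repeatability that live servers can't give you.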

 

