
MarkBench Development and Feature Requests

AdamFromLTT
8 minutes ago, ToboRobot said:

A database of system scores so users can compare and validate their system performance.

That would line up with my suggestion, it seems:

13 minutes ago, GorujoCY said:

Not gonna lie, everything is looking very promising and I'm all for it. I'm willing to contribute, even if only a little, and especially to beta test it!

My suggestion is adding a way to submit the data online for the community to see. If LTT, for example, doesn't want to test the Intel Arc A380, maybe another YouTuber or an individual runs those automated benchmarks and submits them to the website for users to see. This would generalize how we see gaming and productivity data without having to worry about bloody UserBenchmark and their manipulations, and since the data would be community-contributed, it really helps give consumers realistic expectations going forward...

An optional one would be trying to automate VR testing too, recording which headset is connected and how (e.g. direct SteamVR: Valve Index; SteamVR via Oculus Link/Air Link/Virtual Desktop: Oculus Quest 2; SteamVR via iVRy: phone VR; SteamVR via Oculus: Oculus Rift S; SteamVR via WMR: HP Reverb G2; and for "other" headsets, SteamVR via OpenXR: unknown). I can also see the appeal for VR YouTubers to use this if it gets implemented. Genuinely, guys, you've got this!


I hope this gets implemented; this is definitely going to be the best benchmark application by far!

 



20 minutes ago, Greyspectre said:

It would be nice if it also stored/uploaded a snapshot of system specs. It would be neat if, in the future when the database is more mature, you could filter results by things like CPU: for example, the performance of this set of tests across these different cards when paired with a 12900K.

This "mother of all databases", to quote Linus, would really benefit from different ways to search it. Not only would it be great to search for the performance of a product; imagine searching for the results of a specific benchmark, or pulling up the results of three cards to compare them directly.
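Purely as an illustration of what that kind of filtering could look like (the database file, table, and column names below are all made up for the sake of the example), a query along these lines would pull the results of a handful of cards on one CPU:

```python
# Hypothetical sketch: filter a results database by CPU and compare a few GPUs.
# The file, table, and column names are invented; nothing here is MarkBench's actual schema.
import sqlite3

conn = sqlite3.connect("markbench_results.db")  # placeholder database file
cards = ("RTX 3080", "RX 6800 XT", "Arc A380")  # cards to compare

query = """
    SELECT gpu, benchmark, AVG(avg_fps)
    FROM results
    WHERE cpu = ? AND gpu IN (?, ?, ?)
    GROUP BY gpu, benchmark
"""
for row in conn.execute(query, ("Core i9-12900K", *cards)):
    print(row)
```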


I'd love to see leaderboards of people's scores: not just global scores, but also scores from people with similar or identical specs to your machine, so you know whether your PC is running as expected.

 

Similarly, if you got a low score on a certain part, for example your CPU, it could point you to guides or common solutions for why you're having that issue, like thermal throttling. That way, you're not left wondering what you can do to rectify the score after spending five hours running tests.


15 minutes ago, RTX 3071 said:

There should be user-made scripts/configs for games that aren't officially in the app. That was the first thing that came to mind.

 

Another thing that would be cool is a dotted FPS chart (I don't know what these are called in English; I mean a scatter plot, the kind that uses dots to show the data) showing the average, 1% and 0.1% FPS in the same picture. It would make jumps and stutters, and especially their frequency, more noticeable than a line graph. Python already has good libraries for that, so it shouldn't be a lot of extra work either.
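Something like this would be enough with matplotlib; the frame-time data here is randomly generated just to show the idea of per-frame dots plus reference lines for the average, 1% and 0.1% lows:

```python
# Minimal sketch of the "dotted FPS chart" idea: per-frame FPS as a scatter plot
# with average, 1% low and 0.1% low drawn as reference lines.
# frame_times_ms would come from a real capture; here it is placeholder random data.
import numpy as np
import matplotlib.pyplot as plt

frame_times_ms = np.random.gamma(shape=9, scale=1.5, size=5000)  # placeholder data
fps = 1000.0 / frame_times_ms

avg = fps.mean()
low_1 = np.percentile(fps, 1)
low_01 = np.percentile(fps, 0.1)

plt.scatter(np.arange(len(fps)), fps, s=2, alpha=0.4, label="per-frame FPS")
plt.axhline(avg, color="green", label=f"average {avg:.0f}")
plt.axhline(low_1, color="orange", label=f"1% low {low_1:.0f}")
plt.axhline(low_01, color="red", label=f"0.1% low {low_01:.0f}")
plt.xlabel("frame #")
plt.ylabel("FPS")
plt.legend()
plt.show()
```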

 

This one might not be helpful for everyone, but what about a local server that uses more than one machine for the same tests at the same time? For example, imagine six PCs connected to the local server: they'd run the benchmarks, and once they're complete their data would land in the same dataset. I'm not sure how sound this approach is compared with simply merging the data afterwards, but I'm sure the Labs team knows that better 🙂

 

Edit: Overclock profile testing by connecting to MSI Afterburner would be great as well. For example, it could test all five of my profiles and output the results.

It would be great if they could handle unsupported games a bit like modding, where users can provide the code for the testing and make it public, so everyone can use it to test for themselves and maybe even upload their data.


Hot stuff, love the ideas and data you can get from this project.

 

Would you test emulation performance, and also video encoding (especially the settings used by streamers), if possible?
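For the encoding side, a rough sketch of what a streamer-style test could look like (timing an x264 veryfast 6 Mb/s encode via ffmpeg; the input clip name is a placeholder and ffmpeg must be on PATH):

```python
# Hedged sketch: time a video encode with settings in the ballpark of common
# streaming setups (x264 veryfast, 6 Mb/s CBR). Not an actual MarkBench test.
import subprocess
import time

cmd = [
    "ffmpeg", "-y", "-i", "sample_1080p60.mp4",   # placeholder input clip
    "-c:v", "libx264", "-preset", "veryfast",
    "-b:v", "6000k", "-maxrate", "6000k", "-bufsize", "12000k",
    "-an", "-f", "null", "-",                     # discard output, keep the encode work
]
start = time.perf_counter()
subprocess.run(cmd, check=True)
print(f"encode took {time.perf_counter() - start:.1f} s")
```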


Feature request: run the same game multiple times with different settings, so we can see which settings affect performance the most, or decide on the best combination of FPS and image quality.
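A trivial sketch of what such a sweep might look like; run_benchmark() is a hypothetical harness call, not anything MarkBench actually exposes:

```python
# Illustrative only: sweep a game across quality presets and resolutions.
from itertools import product

presets = ["low", "medium", "high", "ultra"]
resolutions = ["1920x1080", "2560x1440", "3840x2160"]

for preset, res in product(presets, resolutions):
    # results[(preset, res)] = run_benchmark(game="some_game", preset=preset, resolution=res)
    print(f"would run: preset={preset}, resolution={res}")
```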


I'm sure you all have a good handle on the main features, but one not-so-exciting yet valuable feature would be a robust system to flag problematic runs, detect anomalies, and optionally send some kind of alert so the issue can be addressed. These could be issues with the system state (e.g. far less RAM available than expected), high CPU usage from background processes, or even user input detected during a run that might throw off a standardized set of game actions. Ideally, the tool would be able to ensure with some level of confidence that a given run was "clean" and the result can be trusted. Someone can gauge this on their own by running the tool many times and validating manually, but that undercuts the tool's goals of saving time and improving consistency.
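A rough sketch of that kind of pre-run sanity check using psutil; the thresholds are arbitrary examples, not anything MarkBench actually uses:

```python
# Sketch of a "clean run" check: flag the run if free RAM is low or background CPU load is high.
import psutil

def run_looks_clean(min_free_ram_gb=8, max_background_cpu_pct=5):
    """Return a list of issues; an empty list means the system state looks clean."""
    issues = []
    free_gb = psutil.virtual_memory().available / 1024**3
    if free_gb < min_free_ram_gb:
        issues.append(f"only {free_gb:.1f} GB RAM available")
    cpu_pct = psutil.cpu_percent(interval=1)  # sampled before launching the game
    if cpu_pct > max_background_cpu_pct:
        issues.append(f"background CPU usage at {cpu_pct:.0f}%")
    return issues

print(run_looks_clean() or "system state looks clean")
```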


As someone who has worked on similar systems, and also had to use them at a hardware manufacturer, here are some tips/features that will make everyone's lives better:

  • A proper CLI, and NOT an interactive one. You should be able to kick it off with just flags/args and have it run, then exit when finished (see the sketch after this list).
  • Of course the automation for games is going to be janky, terrible, and platform-specific; that's just how it goes. But the tooling around it can be made cross-platform, and individual tests should be classes that can be defined for different apps and OSes/architectures.
  • Rely as much as possible on OS and vendor tools/APIs rather than random Python/Go packages to get metrics. They die or go unmaintained all the time, unless you plan to pick them up yourself.
  • Don't try to compute metrics in your automation tool. Just collect raw data and let the data people use it in its rawest form.
  • Doing things the "intense" way, where a controller system hooks into a server on the DUT, is a lot of work, but it rewards you with power-cycling abilities, crash handling, hooks into the computer-vision tools you're developing to analyze the screen, etc. Others do this for a reason. I don't know how big your team is or what your timelines are like, but long term this enables a lot more data collection than running locally only.
  • Devs probably don't get a say, but time spent making a UI is time wasted; a good CLI is all you need for automation, and in fact it's better. Since your stuff is also used by a media company, and maybe the public, I don't know how much you can afford to ignore that. All I know is I have wasted many days trying to automate proprietary tools that have fancy UIs but not thousands of people interacting with them every day to run all the tests they're used for, and I always wished that time had been better spent.
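To make the first point concrete, here's a bare-bones sketch of a non-interactive CLI using argparse; the flag names and the runner are hypothetical, not MarkBench's actual interface:

```python
# Everything comes from flags, nothing prompts, and the process exits with a status code.
import argparse
import sys

def main(argv=None):
    parser = argparse.ArgumentParser(prog="markbench")
    parser.add_argument("--test", action="append", required=True,
                        help="test id to run (repeatable)")
    parser.add_argument("--runs", type=int, default=3, help="iterations per test")
    parser.add_argument("--output", default="results.json", help="where to write raw data")
    args = parser.parse_args(argv)

    for test in args.test:
        print(f"running {test} x{args.runs}, writing to {args.output}")
        # run_test(test, args.runs, args.output)  # hypothetical runner

    return 0  # non-zero on failure so CI / orchestration tools can react

if __name__ == "__main__":
    sys.exit(main())
```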


Suggestion - would it be possible to have the MarkBench software enable or disable PCIe slots?

 

The theoretical workflow would be that someone using a test bench could load more than one GPU onto it, then have MarkBench enable the PCIe slot for the card it's testing and finish that, then disable that slot and enable the other slot with the other card. You'd be able to test multiple cards before having to physically swap another card onto the board.
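One hedged way to approximate this on Windows would be disabling the GPU device rather than the slot itself, via pnputil from an elevated prompt; the instance IDs below are placeholders you'd normally look up with pnputil /enum-devices:

```python
# Sketch only: toggle which GPU is active by disabling/enabling its device instance.
# Requires an elevated prompt and a reasonably recent Windows 10/11 build of pnputil.
import subprocess

GPU_A = r"PCI\VEN_10DE&DEV_2204\PLACEHOLDER_A"  # hypothetical instance ID
GPU_B = r"PCI\VEN_1002&DEV_73BF\PLACEHOLDER_B"  # hypothetical instance ID

def swap_active_gpu(enable_id, disable_id):
    subprocess.run(["pnputil", "/disable-device", disable_id], check=True)
    subprocess.run(["pnputil", "/enable-device", enable_id], check=True)

# swap_active_gpu(GPU_B, GPU_A)  # e.g. switch testing from card A to card B
```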

 

Another suggestion - allow MarkBench to be controlled remotely? Start/stop benches or specific tests without being at the computer/station. Beyond the obvious convenience, you could potentially use it for testing in a closed chamber/room for heat/noise control (if that's something of interest).


Very nice effort, can't wait to see it shine!
Please include error margins in your data. Presenting data without showing the margin of error is bad scientific practice, because it hides how reproducible the results actually are, and that is very important. You won't find a scientific paper that presents data without error bars.
Grafana is a great visualization tool!
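To make the error-margin point concrete, even something this small would do; the run numbers are invented, and the 1.96 factor is only a rough normal approximation rather than a proper small-sample interval:

```python
# Report mean ± a spread (standard deviation and a rough 95% interval) from repeated runs.
import statistics

runs = [143.2, 141.8, 145.0, 142.5, 144.1]   # avg FPS from repeated runs (example numbers)
mean = statistics.mean(runs)
stdev = statistics.stdev(runs)
margin = 1.96 * stdev / len(runs) ** 0.5     # rough 95% confidence half-width

print(f"{mean:.1f} ± {margin:.1f} FPS (sd {stdev:.1f}, n={len(runs)})")
```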


Food for thought. 

 

Perhaps you have multiple PCs that you want to use for benchmarking. If you add a locally hosted web app, you could designate a network by CIDR and the server could list the hosts on that internal network that are available.

 

Using something like Ansible or Terraform, you could automate the configuration of each PC, ensure MarkBench is installed, run it, collect the results, and update a dashboard with the state of each PC. Then you could theoretically walk away and have a centralized place to view the state of all machines, as well as a way to kick off different benchmarks.
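Just to sketch the CIDR part with nothing but the standard library (the agent port number is made up; a real setup would use whatever port the benchmark agent or SSH listens on):

```python
# Enumerate a subnet and note which hosts answer on a hypothetical agent port.
import ipaddress
import socket

AGENT_PORT = 8765  # placeholder port a benchmark agent might listen on

def find_agents(cidr="192.168.1.0/24", timeout=0.2):
    reachable = []
    for host in ipaddress.ip_network(cidr).hosts():
        try:
            with socket.create_connection((str(host), AGENT_PORT), timeout=timeout):
                reachable.append(str(host))
        except OSError:
            pass  # host down or port closed
    return reachable

print(find_agents())
```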


Here are some things I would like:

  • Command Line Interface
  • Integration with or into Phoronix Test Suite
  • GPL License (to make it so no other company can take it proprietary)

Some ideas with different levels of feasibility:

 

Make sure you include tests that don't require the user to own a specific game. While the MarkBench software may be free, the games on Steam required for testing are not. I saw someone else suggest including non-gaming tests (video encoding, AI, etc.) in the software as well.

 

You could potentially crowd-source data from individuals running the MarkBench tests. This would give you more data on a multitude of different configurations (both by introducing new configurations and by normalizing the existing data). The main issue is verifying whether the results are credible (e.g. an overclocked system could skew results if you are looking at just stock results). If there were a way for the test to automatically check hardware info and overclocks, you could effectively do this. (To be clear, I am not saying to discard overclocking data, but to partition it from the rest of the data.)

With this data, you could probably create predictions for how different configurations would perform. This would be helpful if someone is looking into a configuration that hasn't been tested yet. With enough data, you should be able to train a model for that purpose.
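Purely as a sketch of that idea, not a claim about how Labs would do it: a regressor over one-hot-encoded configuration fields, trained here on a few invented placeholder rows just to show the shape of the pipeline:

```python
# Toy prediction sketch: (CPU, GPU, resolution) -> avg FPS. All rows are made up.
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "cpu": ["12900K", "5950X", "12900K", "5600X"],
    "gpu": ["RTX 3080", "RTX 3080", "RX 6800 XT", "RTX 3060"],
    "resolution": ["1440p", "1440p", "2160p", "1080p"],
    "avg_fps": [144, 139, 96, 117],              # placeholder results
})

model = make_pipeline(
    make_column_transformer((OneHotEncoder(handle_unknown="ignore"),
                             ["cpu", "gpu", "resolution"])),
    RandomForestRegressor(n_estimators=200, random_state=0),
)
model.fit(df[["cpu", "gpu", "resolution"]], df["avg_fps"])

# Predict an untested combination (illustrative only).
print(model.predict(pd.DataFrame([{"cpu": "5950X", "gpu": "RTX 3060",
                                   "resolution": "1080p"}])))
```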
 

Another thing you could do with Labs, while you aren't doing your own testing, is take public recommendations for configurations to test. This might provide more precise insight for people who are looking into building their own system but don't know for sure how it will perform. (Same purpose as the predictions, but more accurate.)

 

With the data you have, it would also be important to be able to sort by performance in a specific area. Basic sorting would be by price, gaming performance, video encoding performance, bang for buck, etc. More advanced would be sorting for best in-category within a price range, best bang for buck in-category, etc. This could apply to individual components or to configurations as a whole.

 

My last suggestion is to use this data to create a PC configurator that lets you see how the PC you are building will perform. This would make it very easy to decide which components you want in your system.

 

These ideas would definitely be a bit farther down the line, but they could all work together to help people make informed decisions.


Also, a profile for every user could be useful, to show the hardware they're using and to help them upgrade when their results show a specific problem like stutters or crashes.


I'm a serial early adopter of OSS projects in enterprise environments, and I strongly suggest at least exposing your git repo and some build instructions to the public as soon as possible. Getting your tool into the hands of developers, engineers, and tinkerers is a great way to improve the product. As long as the builds are reasonably stable, don't be bashful about going public with a long list of features not yet implemented. This is coming from a guy who started using HashiCorp Vault in prod at 0.6 and Terraform at 0.10.

 

Avoid feature creep. MarkBench should be all about running the benchmarks in a reliable and repeatable manner; it should not be provisioning/building infrastructure as well, since there are already tools and stacks that do that far better. Having an accessible API so those other tools and stacks can orchestrate MarkBench would be more valuable than trying to reinvent several wheels.


It would be cool to have a visual indicator of whether the software I'm about to benchmark is known to be CPU- or GPU-bound. I think this would make MarkBench more accessible to newer people.


A list of games that could be tested both vanilla and with many mods, since these are games the community normally plays modded:

 

Satisfactory, Factorio, Kerbal Space Program, GTA V, Fallout 4, Subnautica, Don't Starve, all the Dark Souls games, Red Dead Redemption, Metal Gear, Elden Ring, XCOM 2, Stardew Valley, Skyrim, RimWorld, Terraria.

 

Yes, some of these are light games, but remember that the moment you add mods the story changes.

 

I'm asking for testing with mods because many games have a very active community of modders doing amazing things.

