MarkBench Development and Feature Requests

Senzelian · October 14, 2022

Can we get Steve to benchmark the benchmark? Seriously, independent reviews of MarkBench would be great!

Shane Morris · October 14, 2022

Alright, here's some feedback, as a software engineer, who does automation, AI, machine learning, interfaces, etc. Usually I don't throw my resume around, but I think this is somewhere I can lend some help. I also have never posted here and created an account just for this.

1. I'm glad you started with Python, and the GUI in Golang is... fine. It's fine. I mean, the GUI has to be something. You probably could have made it easier on yourselves going with something more common (React or Angular), or even something that is written in Python, like Plotly Dash. You're building dashboards with outputs from Grafana, so logically, Plotly Dash makes sense here. It's also free and open source, which is nice. I'd say, since you're still really early in the process, and you're making these harness templates, use Plotly Dash for your outputs, since it already has all the Bootstrap components, HTML components, graphs, charts, etc. There's a lot you can do with it.

2. Good call on the hardware monitoring, this all made sense to me. It has to happen with something.

3. The CSV's are a smart move. I'm not sure what the column values are, but here's an easier way to move those CSV's to Postgres without converting to a binary file, and you don't even need to move it in the cloud at first -- stick it all in one Docker container. You can use Apache Airflow to ETL your CSV's into Postgres, and that becomes your "harness" as you're calling it. The cool part is, you can create a Docker container for every game, which means anyone with Docker (which is anyone) can create a new harness easily. Alternatively, you can load lots of harnesses into this same Docker container, and then allow users to select by game. (If you ever decide to make this open source, which I would encourage.)

4. Here's an easier option for your output, and it will allow you to connect to multiple databases at the same time, while also outputting your visuals into graphics that you can style to your liking: Apache Druid for multi-database ingestion, so you have a unique Postgres DB for each harness/game series, and then use Apache Airflow as your visualizer tool. As a nice perk, Airflow has native PNG output support, which you can style to match the LTT theme for colors, or Short Circuit, or whatever.

... anyway, this is a great idea. I like that y'all are doing this, and if you need any help developing it, I'm happy to offer my expertise. You started with some good ideas, and you're definitely close to where you need to be for the final product. Software is usually something you grow into, especially when you're making something totally customized for a niche need, like testing GPU's, CPU's, RAM, and multiple titles on different synthetic benchmarks.

I'm easy to find on TikTok if anyone from your team wants to reach out. Or you can email me: info at shanemorris dot sucks -- happy to help if you get stuck on anything.

Terryv · October 14, 2022

If there is a Linux version (and I very much hope there will be)

I'd like to suggest a flatpak release. This would essentially make a universal package for all Linux variations.

mariushm · October 14, 2022

My thoughts on this ...

Any extra application running in the background on the benchmark system has the potential to skew the numbers.

We don't or probably won't have access to the whole source code of this application, so it's an unknown, we don't know how well it's written and/or how buggy it is and how much resources it uses. We don't know how much resources it uses, how much it trashes the l2 and l3 caches of a cpu, we don't know how well optimized that "capture" feature is, does that feature slow down rendering or not...

Kudos for using Golang and I guess Python is OK but Python is an interpreted language and Go is nice but it has periodic garbage collection so it may from time to time use some cpu to garbage collect, like a .net application. It's probably fine, but it's not nothing.

Ideally, the test environment would be a disk image on a system WITHOUT internet access (so that Windows Update won't start in the background to mess things up) and antivirus disabled and configured to have some application start up at boot.

When the application starts, it checks a USB stick or an external usb drive for a test profile or something, and resumes the test process. For example, reads that next game that's supposed to be tested is Shadow of Tomb raider, then it copies the game to the disk, maybe wait a minute or two for the SSD to finish moving the data from pseudo-SLC to MLC/TLC memory (to guarantee all benchmarks run will get the same disk read speed) and then run the game and log the results. Or, run the game always from the same storage only (read only) drive, with all the test games and applications preloaded.

Then, delete the game from the drive (if only the game is loaded), take the results and dump them to the USB stick and update the profile on usb so that after a reboot the benchmark utility will resume with the next game in the queue.

Then reboot the machine, because the previous game has allocated memory, windows has cached portions of the game in memory, running another game right after the previous one may result in Windows dumping stuff from memory and caching other stuff, if you don't always test the games in the same order there may be infinitely small variations between tests.

Also, with SSDs the more data you dump on the SSD, the less pseudo-SLC cache you'll have, so not a good idea to install game, test, install another game, filling the drive the games are tested on.... either have ALL the games preloaded on a big drive that's only for reading (no writes on it), or you'll want to maintain the same amount of free disk space each time you test a game, by installing game , then deleting it before you install next game.

Shane Morris · October 14, 2022

10 minutes ago, Terryv said:

If there is a Linux version (and I very much hope there will be)

I'd like to suggest a flatpak release. This would essentially make a universal package for all Linux variations.

This was my reasoning by putting it in a Docker container. You can run Docker on Windows, Mac and Linux. You don't need the GUI elements in Docker, since those could be hosted elsewhere. I was just thinking all the scripting elements would be easy to button up in Docker, and the CSV output could move elsewhere. I'm a Linux user for my daily driver, but use a Mac for work, and Windows for fun.

houtarou123 · October 14, 2022

Add some often used productivity apps to test and compare it with other apps that is most likely will work if you got an entry level rig where some offices has and cares productivity to the minimum cost.

Salmonwrath · October 14, 2022

I think after reading many of the suggestions and watching the video, one concern cropped up that was addressed in the video on the manual test side, but not much on the automated side, exception/trend testing.

What I mean, as I may not be choosing the correct verbiage here; is if the data output from a particular test(s) is exceptional (e.g. sudden 251% performance improvement over previous generation H/W in 1 game due to the game not actually changing its settings, exceptional) but also way out of trend/scope of expectations of the part. This is data output we don't necessarily want to automate, and an exception to the testing.

Could Markbench retest specific benchmarks/games if a given output is far outside a given scope or trend the H/W is presenting?
Could it simply flag that particular output data for the user to check and validate if it flagged a problem or problems during testing?
- As Another way to handle this, would be a spot check, individual retest on the user's part after checking the test output for error. Which again I could see as something of a feature in Markbench, being able to checkbox or drag/drop the specific testing you want to accomplish or re-accomplish to correct or validate potentially egregious data. The re-accomplishing would also require a means to link the new data into the previous dataset, to make edits, validation.

nharter42 · October 14, 2022

Hey guys, this is awesome. A couple of ideas would be to do Linux runs. I am trying to separate myself from Windows since I do not like the way they are going. If you could provide Linux tests you could compare to Windows ones. Will we get better results on Linux? and... ark will you take the lead because of your opendriver's? In the next few years?

The other idea I had would be to have AI test since that will become a thing for users and gamers before too long.

Player4595 · October 14, 2022

The possibility to change (Graphic) Settings automatically (best by switching the ini file probably)

(optional) The possibility to change overclocking settings (probably not easy to do might be something for the future)

Recovery from system crash (bluescreen usw) so you can check on the how many runs the system crashed.

Linux support

El Xando · October 14, 2022

15 hours ago, Justin04 said:

Great tool! Will it compare your results with other users like Userbenchmark?

I would also love this feature, I hate using Userbenchmark because of their clear bias, but the very quick results to see that your components are running as they should be is very useful. I've found it difficult in the past to find what sort of range a certain GPU should score in 3DMark, and Userbenchmark does every component.

Krumper · October 14, 2022

feature request 1: publicly available database of user benchmark results (and ofc a way to upload them directly from within the tool itself)

feature request 2: a framework that allows the creation and use of third-party harnesses

feature request 3: command line usage of the program (if this isn't already possible, which it should be)

looking forward to trying it out whenever it gets released. i strongly recommend you open-source it as soon as possible after release, which should be possible even if you intend to sell licenses for commerical use (via multi-licensing).

Tol · October 14, 2022

I dont know if there are any but a benchmark for VR-Aplications would be really appreciated. But from the looks of it it will be the great tool we all needed! Props to the devs

thexder1 · October 14, 2022

I have worked on similar things. The one piece that I envisioned for this type of thing that was not mentioned, though not necessarily something home users would likely use, is I would setup a PXE boot system to automate fresh windows installs, or more likely restores of a pre-configured image, and automated driver installs and such to start with a clean system after each run, or as often as configured. The idea being you have configuration in a database where you select what test bench is being used, what tests are to be run on it, and what products are being tested. Then you hook up the bench. install the first product, turn the test bench on and the automation takes it from there, until it is done, then it shuts down and is ready for the next one. I was considering working on that myself, and probably will still do it when I have a test bench to work with on this.

Annix_ · October 14, 2022

I’ve been doing a lot of low power / NAS builds lately, and with the price of energy going to the moon, I’m always hunting for a way to get real world usable data on power draw.

Obviously there are inherent flaws with (existing) software solutions for this, but if there were some way for Markbench to accept data from a wallplug / hardware power draw monitor - my god that would be incredible. I’m not sure if the hardware for this already exists (specifically with the ability to output data over USB or something) at a consumer level, so that’s certainly a….thing to look in to.

I do know however that having a standardised and centralised database of real world system power draw stats would be of massive benefit to the community. A growing number of enthusiasts / professionals who are conscious of their power draw would have an important new metric when choosing their components, whether that’s in the context of big beefy gaming rigs, or a network appliance / home server build.

DOG_EGG · October 14, 2022

17 hours ago, AdamFromLTT said:

Presenting... MarkBench!

This is the home of MarkBench discussion for the time being! While this post is relatively empty right now, you can look forward to occasional development updates, answers to common questions and (eventually) a link to the software!

What the heck is MarkBench?
Wouldn't you like to know? And wouldn't we like to tell you! Check out our announcement video!

When will it be released?

When it's ready!

Is it free?

When it launches, MarkBench will be completely free.

Can I contribute?

In the future? Maybe! In the now? Development will be done in-house. BUT, you can contribute by providing feedback and feature suggestions in this thread!

We want to hear from you! What do you think of MarkBench? What features would you like to see?

can you add a score so and of somthing errorers it tryes to find to fix for it

JurandM · October 14, 2022

Considering, you guys test workstation card over dummy games (3090, 3090TI, and now 4090) I suggest adding more real life things

Image processing: Topaz product and Lightroom
Video encoding and interpolation: handbrake, FrameFlow, Premiere pro and or DaVinci
3D computing: Photogrammetry like Reality Capture, texture computing - Substance Designate
Rendering: Cycles (blender), V-RAY, Unreal Engine/Twinmotion (really important this days)
AI: perhaps Stable Diffusion

most of the options named above are single time purchase or free or provide free option with limitation like watermark.

Please check that video for point of reference what professional industry expect to see in review of 4090 (instead silly games ) - more likely that pro purchase 4090 than gamer.

Cheers and good luck - if indeed such tool will be released, or at least you going to provide more data than just games and little few things, it will be gain for everybody

(and yes, I was really salty when noticed that so many channels shows just game results)

Edited October 14, 2022 by JurandM
add explanation why i call "game" tests - silly

Omni-Owl · October 14, 2022

A missed opportunity to call it "ColtonReplacer".

Jokes aside, it's an interesting time at LMG/LTT with this beast of a project. Open Sourcing it later would be a really good idea.

Ghostkwebb · October 14, 2022

19 hours ago, AdamFromLTT said:

Presenting... MarkBench!

This is the home of MarkBench discussion for the time being! While this post is relatively empty right now, you can look forward to occasional development updates, answers to common questions and (eventually) a link to the software!

What the heck is MarkBench?
Wouldn't you like to know? And wouldn't we like to tell you! Check out our announcement video!

When will it be released?

When it's ready!

Is it free?

When it launches, MarkBench will be completely free.

Can I contribute?

In the future? Maybe! In the now? Development will be done in-house. BUT, you can contribute by providing feedback and feature suggestions in this thread!

We want to hear from you! What do you think of MarkBench? What features would you like to see?

Release it on the mac!!

Slippery Anvil · October 14, 2022

Have you guys also thought of recording an xperf trace on each game that's benchmarked? Might be good if you want some really granular data for CPU testing.

LTTMobileJake · October 14, 2022

18 hours ago, sergiogd112 said:

It might be a good idea to include non-gaming related test. Such as Blender rendering, Ai training, encoding and decoding, etc. That way you may be able to test other components, such as CPUs

Heya!

We have already included a number of non-gaming workloads such as WAV-FLAC encoding, π calculations (using Y-cruncher), and 7zip compression/decompression!

Thanks for the feedback!

LTTMobileJake · October 14, 2022

1 hour ago, thexder1 said:

I have worked on similar things. The one piece that I envisioned for this type of thing that was not mentioned, though not necessarily something home users would likely use, is I would setup a PXE boot system to automate fresh windows installs, or more likely restores of a pre-configured image, and automated driver installs and such to start with a clean system after each run, or as often as configured. The idea being you have configuration in a database where you select what test bench is being used, what tests are to be run on it, and what products are being tested. Then you hook up the bench. install the first product, turn the test bench on and the automation takes it from there, until it is done, then it shuts down and is ready for the next one. I was considering working on that myself, and probably will still do it when I have a test bench to work with on this.

PXE image installation is definitely on the roadmap to better manage benches

LTTMobileJake · October 14, 2022

5 hours ago, Shane Morris said:

Alright, here's some feedback, as a software engineer, who does automation, AI, machine learning, interfaces, etc. Usually I don't throw my resume around, but I think this is somewhere I can lend some help. I also have never posted here and created an account just for this.

1. I'm glad you started with Python, and the GUI in Golang is... fine. It's fine. I mean, the GUI has to be something. You probably could have made it easier on yourselves going with something more common (React or Angular), or even something that is written in Python, like Plotly Dash. You're building dashboards with outputs from Grafana, so logically, Plotly Dash makes sense here. It's also free and open source, which is nice. I'd say, since you're still really early in the process, and you're making these harness templates, use Plotly Dash for your outputs, since it already has all the Bootstrap components, HTML components, graphs, charts, etc. There's a lot you can do with it.

2. Good call on the hardware monitoring, this all made sense to me. It has to happen with something.

3. The CSV's are a smart move. I'm not sure what the column values are, but here's an easier way to move those CSV's to Postgres without converting to a binary file, and you don't even need to move it in the cloud at first -- stick it all in one Docker container. You can use Apache Airflow to ETL your CSV's into Postgres, and that becomes your "harness" as you're calling it. The cool part is, you can create a Docker container for every game, which means anyone with Docker (which is anyone) can create a new harness easily. Alternatively, you can load lots of harnesses into this same Docker container, and then allow users to select by game. (If you ever decide to make this open source, which I would encourage.)

4. Here's an easier option for your output, and it will allow you to connect to multiple databases at the same time, while also outputting your visuals into graphics that you can style to your liking: Apache Druid for multi-database ingestion, so you have a unique Postgres DB for each harness/game series, and then use Apache Airflow as your visualizer tool. As a nice perk, Airflow has native PNG output support, which you can style to match the LTT theme for colors, or Short Circuit, or whatever.

... anyway, this is a great idea. I like that y'all are doing this, and if you need any help developing it, I'm happy to offer my expertise. You started with some good ideas, and you're definitely close to where you need to be for the final product. Software is usually something you grow into, especially when you're making something totally customized for a niche need, like testing GPU's, CPU's, RAM, and multiple titles on different synthetic benchmarks.

I'm easy to find on TikTok if anyone from your team wants to reach out. Or you can email me: info at shanemorris dot sucks -- happy to help if you get stuck on anything.

Hey Shane!

Thanks so much for the feedback (from one engineer to another)!

The data ingest server currently receives data from more than just MarkBench (there are other tools that we haven't announced yet ), and using protobuffers for data transfer made the most sense to minimise processing of multiple data types.

We are hoping to transition from Python eventually, and because our other tools also use Go for the GUI and underlying logic, it's possible that we transition purely to Go in the future.

LostS77 · October 14, 2022

Love this idea. Something I was thinking about the other day. Could/would you all make it so that your results be captured to a DB of some kind and allow people to pull export of the data via xml or something? So that we could possible make our own reports and stuff, especially for some of us who might be working on learning how to do reporting in say PowerBI and would love to have a non-sensitive source of data we can play around with and show others how we can create reports and stuff. Just a nicety for those of us who like to crunch numbers and stats and stuff.

LostS77

tkitch · October 14, 2022

19 hours ago, RTX 3071 said:

There should be user-made scripts/configs for games that aren't officially in the app. That was the first thing which got in my mind.

Another thing which would be cool is a dotted FPS map (I don't know what these are called in English, I mean the maps which use dots to show stuff) which will show the average, 1% and 0.1% FPS on the same picture. It'd make the jumps and stutters more noticeable than a line graph, especially their frequencies. Python already has good libraries for that so shouldn't be a lot of workload as well.

This one might not be helpful for everyone but what about a local server which will use more than 1 machine for the same tests at the same time? For example imagine if 6 PCs are connected to the local server, they'll run the benchmarks and once they are complete their datas will be in the same dataset. Tho I'm not sure how professional this approach is when there's the option of data merging but I'm sure labs team knows it better

Edit: Overclock profile testing with connecting to MSI afterburner would be great as well. For example it can test all of my 5 profiles and output the results.

THis is questionable due to reasons that LTT can't control:

Two different CPUs of the same SKU (IE: 2x 12900K's) will score a bit differently, just due to silicon lottery, so different GPUs may perform differently on one platform vs the other.

So running multiple benches is an iffy solution.

Meliketoast21 · October 14, 2022

UE5 is pretty important and also get nuke. Maya and cinema 4d. Also obs and streaming and recording stuff can be useful.

Sign In

MarkBench Development and Feature Requests

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites