
MarkBench Development and Feature Requests

AdamFromLTT

Since it has been mentioned that it's Python, it would be awesome if you'd run this not only on Windows but also on Linux.


I used to do performance testing for software. I don't think this is anything LTT doesn't already know, but we valued things like: 

 

Test consistently: The goal is to make it so that the ONLY thing that changes is the thing you're testing. As much as possible, anyway. In no particular order:

 

    1. Run each benchmark several times, store all the data, then do things like compute the standard deviation, but also drop the best and worst runs and use the remaining data to calculate an average. If you run it 5 times, for example, you use the middle 3 to calculate the average (see the sketch after this list).

 

    2. Idleness checking: Check whether the CPU or disk is busy before you start benchmarking. If you don't do this, you could wind up benchmarking the system at the same time that Windows has decided to download updates, clean the disk, index the disk for searching, or any number of other things. (This is also when we'd record all the machine information used to record the results.)

 

    3. Boot from LAN images that don't persist on the machine. This helps ensure you get the same starting point, truly free of any cruft from previous drivers, and so on. 

 

    4. When you record the data, record as much as possible about the system it ran on, too. BIOS versions of everything, driver versions of everything, OS versions, everything that you'd need to know during a forensic investigation into "what happened" if you need to go back later. Some people even think you should make the primary key a combination of some of that stuff, but that might be going too far. It should go without saying that you'd record the version/build # of MarkBench with the results, or the GitHub tag, or whatever. Knowing whether or not Windows is configured to use the VM features of the processor might be useful, or whether it's in "Game Mode", if that's still a thing in Windows 11, which is where I assume you'll be testing this.

 

    5. Validate the machines before each test. This is related to #2. Basically, if the machine wasn't idle enough (you get to decide what that means), abort the test (or wait 1 minute and try again). Don't ever waste time running a benchmark on a machine that is busy; you'll get invalid numbers. This may not apply to hardware benchmarking, but: we'd also have some general performance benchmarks that we'd run on each machine, with a known good range for each value. A machine returning a result outside the known good range would be disallowed from being used for testing and flagged for human intervention. This caught failing disks more than once, and stopped us from uploading invalid data.
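A minimal sketch of what point 1 could look like in Python (the function name and run counts are just for illustration, not anything MarkBench actually does):

```python
import statistics

def summarize_runs(scores):
    """Summarize repeated benchmark runs.

    Keeps all the raw data, reports the standard deviation of the full
    set, then drops the single best and worst runs and averages the
    rest (e.g. 5 runs -> the middle 3), as described in point 1.
    """
    if len(scores) < 3:
        raise ValueError("need at least 3 runs to drop the best and worst")
    trimmed = sorted(scores)[1:-1]           # drop the worst and best run
    return {
        "raw": scores,                        # store everything, always
        "stdev": statistics.stdev(scores),    # spread across all runs
        "trimmed_mean": statistics.mean(trimmed),
    }

# Example: five runs of the same benchmark, one of them clearly disturbed
print(summarize_runs([141.2, 139.8, 143.5, 140.1, 97.3]))
```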

 

I suppose if you're going to let us run the benchmark stuff at home, you could at least do the idleness checking to make sure I'm not encoding videos or compiling in the background, etc. Recording fine details of the systems involved might let you differentiate between CPUs with the same model number that were released at different times and had updates, etc. in your results DB.
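Since the tool is Python, a rough sketch of that idleness check isn't much code, assuming the psutil library is available; the thresholds and retry counts here are made-up numbers, not anything MarkBench actually uses:

```python
import time
import psutil

def wait_until_idle(cpu_limit=5.0, disk_mb_s_limit=1.0, retries=5):
    """Block until the machine looks idle enough to benchmark.

    "Idle" here means average CPU usage below cpu_limit percent and
    total disk throughput below disk_mb_s_limit MB/s over a 5 second
    window. Returns True once the machine settles, False after giving up.
    """
    for _ in range(retries):
        before = psutil.disk_io_counters()
        cpu = psutil.cpu_percent(interval=5)        # samples CPU over 5 s
        after = psutil.disk_io_counters()
        moved = (after.read_bytes + after.write_bytes
                 - before.read_bytes - before.write_bytes)
        if cpu < cpu_limit and moved / 5 / 1e6 < disk_mb_s_limit:
            return True
        time.sleep(60)                              # wait a minute, try again
    return False
```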

 

I'd love to be able to get the results in .csv format as well.

 


Use that fancy NVIDIA click-to-screen-flash tool as well. It would be good to know how many milliseconds the response time is each run.


For comparison charts you could look at relative radar charts, assuming the data lends itself to that format, like some of my attempts:

[attached image: example of a relative radar chart]

 

The faster test run is 100% and the slower one is shown as a percentage of that. AMD has started to use radar charts as well. If there are enough data points and the lines have clear separation, it can give a quick view of how 2-3 things compare to each other.
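In case it helps, a small sketch of how such a chart could be put together with matplotlib; the test names and numbers are invented purely to show the "fastest run equals 100%" normalization:

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented higher-is-better results for two parts across a few tests.
tests = ["Game A", "Game B", "Render", "Encode", "Compile"]
a = np.array([144.0, 98.0, 210.0, 61.0, 73.0])
b = np.array([131.0, 104.0, 188.0, 55.0, 70.0])

best = np.maximum(a, b)                  # the faster run defines 100%
angles = np.linspace(0, 2 * np.pi, len(tests), endpoint=False)

ax = plt.subplot(projection="polar")
for name, vals in [("Part A", a), ("Part B", b)]:
    pct = 100 * vals / best              # slower run becomes a % of the faster
    # close the polygon by repeating the first point
    ax.plot(np.append(angles, angles[0]), np.append(pct, pct[0]), label=name)
ax.set_xticks(angles)
ax.set_xticklabels(tests)
ax.legend()
plt.show()
```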


Potential for SSD testing? Program load time, file transfer time (how accurate is the MB/s figure in Windows? does a different file transfer program work better?), game load times between levels (even just differences in benchmark load times can indicate whether the end score is affected by background file calls).

 

I'd love to see latency testing for starting programs on different drives (OS drive vs non-OS drive, SATA SSD OS vs NVMe OS vs Optane OS drive). Most people won't care if a cacheless NVMe drive takes 1-2 seconds longer than a SATA SSD with cache to open a program, but when you do it many times a day on multiple computers, those seconds add up.

 

Specs just don't cover this: IOPS don't dictate latency, Gen3 vs Gen4 bandwidth doesn't make a huge difference (as far as I can tell from trying multiple drives), and even the silicon lottery may come into play, with controller speeds differing per drive. No one tests this at all.
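A crude way to start measuring this in Python; the paths and the `--version` trick are purely illustrative, and a proper test would also need to flush or account for the OS file cache between runs:

```python
import statistics
import subprocess
import time

def launch_time(cmd, runs=5):
    """Time how long a command takes to start and exit, several times over.

    This is only a rough proxy for launch latency: it works best with
    commands that do real disk I/O at startup and then quit quickly
    (e.g. `app --version`); GUI apps that keep running would need some
    other "ready" signal instead of process exit.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical: the same tool installed on two different drives.
for exe in [r"C:\apps_nvme\tool.exe", r"D:\apps_sata\tool.exe"]:
    print(exe, f"{launch_time([exe, '--version']):.3f} s")
```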

The best gaming PC is the PC you like to game on, how you like to game on it


13 hours ago, GorujoCY said:

The problem is that unless your computer hibernates or sleeps as the battery runs out, there's no way to submit battery data while the rundown is happening; you'd have to wait for it to charge back up. Sure, one way to do it would be gathering the numbers from logs of some sort, but that isn't really an option. Maybe MarkBench could detect when the computer slept or hibernated and, once the battery is back at 100%, give you the result, hopefully together with my suggestion below:

to be able to submit that data for users to see...

I think it's easier to just ask users to set the screen to never turn off and to set sleep and hibernate to never in the power settings on Windows. Also, making sure the battery is at 100% when you start benchmarking and that the laptop isn't charging would be important.
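For what it's worth, the "is it actually at 100% and unplugged" part is easy to check from Python, assuming psutil is available (the messages are just examples):

```python
import psutil

def ready_for_battery_run():
    """Only start a battery benchmark when on battery power and fully charged.

    psutil.sensors_battery() returns None on desktops without a battery."""
    batt = psutil.sensors_battery()
    if batt is None:
        return False, "no battery detected"
    if batt.power_plugged:
        return False, "unplug the charger first"
    if batt.percent < 100:
        return False, f"battery at {batt.percent:.0f}%, charge to 100% first"
    return True, "ready to start"

print(ready_for_battery_run())
```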


First off, thank you all for the countless hours put into this project. I'm excited to see the fruits of your labor. I have some thoughts, but they're oriented more toward test bench diversity than MarkBench features. I figured it would be silly not to share.

 

My thoughts:
While I love to see any and all GPUs paired with the fastest CPUs on the market to remove any potential bottlenecks and to better illustrate framerate differences, I wonder if this provides useful information for the vast majority of consumers. Many budget-oriented gamers (likely the majority) choose to allocate CPU budget towards the GPU, and these use-cases are often ignored in GPU reviews. Just how much will a user's CPU choice impact the performance of their shiny new RTX4080 or RX7800? Will a budget-tier CPU blur the performance difference between a 4070 and 4080?

Perhaps I'm off base here, and in that case, please dismiss. However, if this test scenario produced measurable differences that could better inform a consumer's pairing of CPU and GPU, then maybe it's not crazy.

 

Y'all are doing amazing work and your relentless transparency is a beacon of ethical prowess rarely found in modern business. Thanks again.


I would love it if MarkBench would benchmark creative applications; the one I would hope for is Resolve. However, practically every review I have seen benchmarks Resolve with a pretty standard edit. As someone who does a ton of work in Fusion and does a ton of complex grades, this really does not help me understand how the product will actually work for my projects. I benchmarked my GPU with the Blackmagic encoder benchmark, so my computer should be capable of rendering around 48 fps (I forget what the exact number was). But in reality, my computer's VRAM gets completely pegged (because of Fusion) and it renders at less than 0.5 fps. Some sort of benchmark that tests Fusion projects would be great.


Seems like MarkBench and Present is what you're really after.

Some feedback:

  • .json is going to be a much richer and more portable format than CSV; it will allow PNGs, screenshots, etc. all to be transmitted directly into your database, and it's already fully web compatible, therefore encapsulating a much wider, richer data set for each entry (see the sketch after this list). At the same time the data set is going to be truer to source, with no unnecessary tweaking or compressing. Keep them local or send them up, push or pull, however the DB is going to run: Prometheus, InfluxDB, or PostgreSQL.
  • Azure or AWS managed PostgreSQL are both a perfect fit for .json documents. Go with whoever has the best customer service/support for the exact config.
  • Consider having it (the local MarkBench software) packaged as a bootable system image, so that drivers are included and reset to baseline before every new GPU is installed. A few well-packaged images would cover 90% of driver configs. A small imaging tool would do, updated with drivers as needed. At this level, swapping a disk image is quicker than clicking through a driver clean and reinstall.
  • For the automation of launching and closing games and setting capture points (IMO the biggest variable in the whole goal of trying to save time), consider a mix of Microsoft Power Automate, a mouse recorder, Python scripts, and game launch options. This will need some time to think about, because every game benchmarks a bit differently; there may be at least some need to schedule and automatically run Python code in real time. Lots of little ends need to meet and be piped into the .json capsule. Again, each game will need a different set of approaches. This one got me thinking...
  • Redash is a more complete, easier, team-based, SQL-rooted visualisation tool than Grafana. More of the collaborative stuff should just happen, with less tinkering.
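To make the .json point concrete, here's a rough sketch of what one self-contained result record could look like; every field name here is invented for illustration, not MarkBench's actual schema:

```python
import base64
import json
import platform
from datetime import datetime, timezone

# Everything for one run travels in a single JSON document: machine info,
# results, and even a screenshot as base64, so nothing gets separated from
# its context on the way into the database.
result = {
    "submitted_at": datetime.now(timezone.utc).isoformat(),
    "machine": {
        "os": platform.platform(),
        "cpu": platform.processor(),
    },
    "benchmark": {"name": "example-game", "version": "1.0", "preset": "ultra"},
    "results": {"avg_fps": 143.7, "p1_low_fps": 101.2},
    "screenshot_png_b64": base64.b64encode(b"...png bytes here...").decode(),
}
print(json.dumps(result, indent=2))
```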

Just my first impressions based on what you published.

 


If you need a beta tester, I have an ASUS ROG STRIX 3090ti.........this program sounds like something that has been needed for a looooooong time!! Props to you guys!


I would love to see comparisons between DLSS 1.0, DLSS 2.0, FSR 1.0, and FSR 2.0 in multiple games.


On 10/14/2022 at 9:36 AM, nonae said:

LibreHardwareMonitor seems to be specialized for Windows too. It doesn't support Linux on ARM (Raspberry Pi) or RISC-V. If it doesn't work, look at how btop does its measurements on Linux.

I don't think they're going to be testing triple-A games on Raspberry Pis anytime soon. LibreHardwareMonitor's Linux coverage is quite substantial.


Hello, Drew from Smallformfactor.net here.

I used to compete on Xtremesystem, OC.net, and HWBOT.net. As a fellow hardware reviewer, I figured it's time I joined up and added some input, because it would be nice to have some automation.

1. Whatever you do, it needs to work on both Linux and Windows. It doesn't matter if it's Nobara, SteamOS, Manjaro, Pop_OS!, Windows 10, or Windows 11. Any operating system that works on PC hardware should be supported by the benchmarking tool. Like CPU-Z, the installer should collect all of the important system information, but unlike CPU-Z, the hardware that it identifies should be saved (see #4 and #8) to a user's profile for each machine they have run the suite on (a rough sketch of that kind of info grab follows this list).
2. Consistency. All the benchmarks or nothing. Nothing can be altered. Nothing can be changed. Otherwise the scoring won't matter. If you must, feel free to compartmentalize the suite into sections such as Productivity vs Games, each with its own score/ranking system.
3. An HWBot-like scoring/ranking system. It shouldn't be "whoever has the most money always wins"; it should be "how well is my hardware doing compared to others with the same hardware". If I want to pull out an old Intel 990X or Opteron 165 to see how it does globally, that's cool, but I should also be able to compete with other 990X or 165 owners, or take the crown until someone else shows up. This includes knowing how my air-cooled setup compares to the same setup on water, a chiller, or under LN2. It's not one-size-fits-all when it comes to benchmarking. I don't envy whoever has to create this tool.
4. Eventually there should be a central repository for simple hardware submissions, without benchmarking, similar to the Steam Hardware Survey but completely agnostic (for people who may or may not use Steam when gaming, those weird Battle.net-exclusive people). The games industry deserves to know what kind of hardware is being used, what OSes are being used, and what resolutions are being used, in addition to the information that Steam provides.
5. Consider working with or partnering with Phoronix. They have a test suite, but it could be better... a lot better. Maybe even consider working with other tech-tubers.
6. Like 3DMark, the whole suite should be able to be run at 1080p, 1440p, or 2160p regardless of the monitor being used. This helps to test/find CPU and GPU bottlenecks.
7. It needs to be maintained and supported by a dedicated team.
8. Consider offering helpful opportunities for people to upgrade, perhaps in the form of affiliate links. If users see that their hardware is underperforming compared to others with the exact same hardware, and your benchmark suite can identify that a single stick of DDR4-2666 is what's causing the drop in performance, sell them on how swapping to 2x DDR4-3200 sticks would be better bang for the buck than A. replacing the GPU or B. buying a new computer. Better yet, this platform could help identify how RAM speeds compare across processor architectures.
9. Rules. In the competitive benchmarking scene, people have found ways to use things like the Nvidia Control Panel to force certain graphical features to downgrade in order to inflate their scores. Anomalies should be identified and regulated in the upper echelon of competitive benchmarks.
10. Consider having different leagues, just like on HWBot. This helps to pit users against others with similar skills/understanding. Bonus points if you make the leagues seasonal so that users have a reason to return and compete.
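For point 1, a tiny sketch of a CPU-Z-style hardware/OS grab using Python's platform module and psutil; the field names are just illustrative, not a proposed MarkBench schema:

```python
import json
import platform
import psutil

def collect_system_info():
    """Gather a CPU-Z-like hardware/OS snapshot to attach to a user's profile."""
    return {
        "os": platform.platform(),
        "arch": platform.machine(),
        "cpu": platform.processor(),
        "cores_physical": psutil.cpu_count(logical=False),
        "cores_logical": psutil.cpu_count(logical=True),
        "ram_gb": round(psutil.virtual_memory().total / 2**30, 1),
        "python": platform.python_version(),
    }

print(json.dumps(collect_system_info(), indent=2))
```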

I'm sure I could keep going, but I've got to head to bed. Hope this helps to some degree, and I cannot wait to see what you come up with, as long as all the Steves approve.


Y'know, some of us don't have the games you have access to; I certainly don't have Shadow of the Tomb Raider or other new games, for example. Maybe the ability to streamline the process of creating new benchmarks for those of us who are less fortunate (I'm in college and a VERY diligent student) would work wonders. Also, one feature would be to make the results easy to read for those who don't use benchmarks often, or at all: make it easier for the layman to understand what the specific stats mean. That way you can stop advertising features or stats that aren't appealing to people who just wanna, I dunno, edit videos? So many issues would be solved if we KNEW what the hell we're seeing and how it impacts our usage of our computers.


Feature request.

 

Include ultrawide resolutions in the testing.

These resolutions should not be ignored, since higher FOV values like those at 5120x1440 affect performance.

A GPU doing well at 4K might not do so well at 5120x1440.

 

 

 

 

 


Game testing shouldn't be done with built-in benchmarks; they're usually super unrealistic.


Normally I would agree, but it's even more inconsistent otherwise.
As long as they use the same built-in benchmarks, at least we get apples to apples.


1 hour ago, 3lfk1ng said:

Normally I would agree, but it's even more inconsistent otherwise.
As long as they use the same built-in benchmarks, at least we get apples to apples.

Making a special tool for testing and then not testing correctly anyway? So what's the point? It of course makes sense for productivity benchmarks, but for games you should always aim at finding the most intense GPU-bound or CPU-bound scenario in the actual game, depending on whether you are testing the GPU or the CPU. Hardware Unboxed does it this way and has no problem with that.


On 10/13/2022 at 11:40 PM, AlphaS said:

Great tool! I really want it for Linux.

Same here; then you can annoy Intel even more by letting an Arc GPU fetch from system RAM instead of VRAM...

That way you can catch OS-dependent differences and bugs...

But also for people who run Linux daily (myself included)


Benchmarks are heavily focused on gaming and don't reflect true workloads that we may require.


In reality, benchmarks should be curated toward specific workloads. When we search for benchmarks we should only see the workloads we are interested in, and the benchmarks should reflect the best performance of hardware within those workloads, so we don't pay more for what we don't use. We don't "need" a quantum computer for Microsoft Word.


For example, an animator would be highly interested in 3D rendering of assets. A product designer or engineer may care about the performance of CAD tools.


A software developer wants to know compile times, load times, web service performance and virtualization capabilities.


A game developer wants to know how well a game development engine like Unity or Unreal Engine performs during development.

 

One question that remains unanswered by existing benchmarks is the relative performance of CUDA and oneAPI.

 

Some newer features that are rarely benchmarked are mixed reality / virtual reality, and the performance of DirectStorage API support in the newest version of DX12.


I would love to see some engineering-related benchmarks. A CFD benchmark and an FEM benchmark would be awesome, and I am sure there are many others who would like to see that too.


On 10/14/2022 at 1:32 PM, Shane Morris said:

This was my reasoning for putting it in a Docker container. You can run Docker on Windows, Mac, and Linux. You don't need the GUI elements in Docker, since those could be hosted elsewhere. I was just thinking all the scripting elements would be easy to button up in Docker, and the CSV output could move elsewhere. I'm a Linux user for my daily driver, but use a Mac for work, and Windows for fun.

Are you talking about the end user running Docker on the same PC they're running the benchmark on?


I've only just watched the video, but from the breakdown of how the data is processed on its way to the Postgres server, it seems like extra steps for no reason. If you're getting the data as a CSV, then converting it to a protobuf, and all of this is happening in Python... why not simply post the data to Postgres via SQLAlchemy or something similar? And perhaps some of the sensor data would be better off stored in something like InfluxDB instead of objectifying the data and converting it multiple times. Sure, this could add a bit of complexity to the system, but I don't see any downside other than a little work for the devs.
Also, please make sure it runs on Linux, or at least open-source it so I can fix it myself 🙂
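For what I mean by "simply post it", a minimal sketch assuming the results come out as a CSV; the connection string, file name, and table name are all hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table name, just to show the shape of it.
engine = create_engine("postgresql+psycopg2://markbench:secret@localhost/results")

# Read the harness's CSV output and append it straight to Postgres,
# skipping the CSV -> protobuf -> object conversion chain entirely.
df = pd.read_csv("run_results.csv")
df.to_sql("benchmark_runs", engine, if_exists="append", index=False)
```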

