Reputation Activity

  1. Agree
    nlhans got a reaction from GOTSpectrum in What do we do now?   
    To be honest, 'unreliable' is an imprecise word for a problem.
     
    And if sick days are a substitute for addressing a cause, then that's a self-perpetuating situation which is no good for anyone. A good boss should not open a conversation with a threat like 'you call in sick too often', but should also ask if there is anything they can help with. There could be a considerable bump to cross, but that does not mean things cannot be resolved.
     
    ====
     
    TBH, I view GN's comments as valid, but there are some things I could not fully agree with.
     
    I think the overall hardware testing comments could be summarized into one area: qualification of test data. How to check that a cooler is fitted properly. How to check that graphics quality settings are applied in-game. How to check that the correct numbers are published in a chart, etc.
    I hope they can reserve the time to improve their processes/automations on that. Benchmarking should not be about getting the same number every run and only retesting the outliers. It should be about repeatable test conditions.
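    As a minimal sketch of what that qualification could look like in practice (every field name and threshold below is invented for illustration, not taken from any real harness), the idea is that a run is only recorded if the conditions the harness can observe match what was intended:

```python
# Hypothetical pre-run qualification checks for a benchmark harness.
# Field names and thresholds are illustrative, not from any real tool.

from dataclasses import dataclass


@dataclass
class RunTelemetry:
    gpu_temp_idle_c: float      # idle temperature before the run starts
    vram_used_mb: int           # VRAM allocated by the game at the test scene
    vram_total_mb: int
    preset_name: str            # graphics preset actually reported by the game
    avg_core_clock_mhz: float   # average clock during the run


def qualify_run(t: RunTelemetry, expected_preset: str,
                rated_boost_mhz: float) -> list[str]:
    """Return a list of reasons the run should be rejected (empty = accepted)."""
    problems = []
    if t.gpu_temp_idle_c > 45:
        problems.append("idle temp too high: cooler may not be seated properly")
    if t.preset_name != expected_preset:
        problems.append(f"preset mismatch: got {t.preset_name!r}, "
                        f"expected {expected_preset!r}")
    if t.vram_used_mb > 0.95 * t.vram_total_mb:
        problems.append("VRAM nearly full: results may reflect overallocation")
    if t.avg_core_clock_mhz < 0.9 * rated_boost_mhz:
        problems.append("average clock well below rated boost: likely throttling")
    return problems


run = RunTelemetry(gpu_temp_idle_c=38, vram_used_mb=7100, vram_total_mb=8192,
                   preset_name="Ultra", avg_core_clock_mhz=2410)
issues = qualify_run(run, expected_preset="Ultra", rated_boost_mhz=2520)
print("accepted" if not issues else issues)
```

    The point is not these particular checks, but that a run gets rejected for a stated reason before it ever lands in a chart, instead of being retested later because the number looked odd.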
     
    In a vacuum, card A being 300% faster than card B could be just what it is. But if you know that the card's hardware should only allow e.g. 40% faster (which you can't know if it's a new architecture), that the videogame is not overallocating VRAM, that the card is not throttling, etc., you come to a point where that data needs to be looked at. GN has this data, so they are right in pointing out those outliers, but I think doing it only on the data is not the way to go. It sets you up for survivorship or selection bias. It should be about recreating the same test conditions and asserting that these have been met.
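    As a toy illustration of that distinction (all numbers and names here are made up), the check that matters is not whether a result deviates from the rest of the dataset, but whether it deviates from what the hardware itself should plausibly allow:

```python
# Illustrative only: flag a benchmark result when the measured uplift
# falls far outside what the hardware specs would plausibly allow.

def expected_uplift(spec_a: float, spec_b: float, tolerance: float = 0.25) -> tuple:
    """Rough expected speedup range of card A over card B from a spec ratio
    (e.g. shader throughput), widened by a tolerance for architectural unknowns."""
    ratio = spec_a / spec_b
    return ratio * (1 - tolerance), ratio * (1 + tolerance)


def needs_review(fps_a: float, fps_b: float, spec_a: float, spec_b: float) -> bool:
    low, high = expected_uplift(spec_a, spec_b)
    measured = fps_a / fps_b
    return not (low <= measured <= high)


# Card A measured 3x faster, but its specs suggest roughly 1.4x:
print(needs_review(fps_a=300, fps_b=100, spec_a=1.4, spec_b=1.0))  # True -> re-check conditions
```

    A flag like this should trigger a re-check of the test conditions, not a silent rerun until the number falls in line; otherwise you are back to the selection bias problem.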
     
    This extends to how same-spec parts can also have significant variation. Der8auer did a recent video on that, testing 13 Ryzen 7600s, and there was quite considerable variation between them. Parallelizing tests on multiple same-spec test benches could be another source of error. But the opposite is also true: minimizing the margin of error to something infinitesimal (by averaging multiple runs) with a hardware sample pool size of N=1 is impossible. Averaging out run-to-run data can be like polishing poo if it covers up an unknown interferer without first asserting that the statistical expected value of that interferer is zero. Still, this is common practice because, well, why exactly?
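    A toy simulation of that last point (purely illustrative numbers): averaging more runs makes the reported number look ever more precise, but it only converges on the score of the one physical sample you own, offset and all, not on the product line's true average.

```python
# Toy model: each physical part has a fixed "silicon lottery" offset,
# and each run on that part adds independent run-to-run noise.
import random

random.seed(0)

TRUE_MEAN = 100.0       # hypothetical average score of the whole product line
PART_SIGMA = 3.0        # part-to-part spread (the silicon lottery)
RUN_SIGMA = 1.0         # run-to-run noise on a single part

part_offset = random.gauss(0, PART_SIGMA)   # the one sample we happen to own

for n_runs in (1, 10, 1000):
    runs = [TRUE_MEAN + part_offset + random.gauss(0, RUN_SIGMA)
            for _ in range(n_runs)]
    avg = sum(runs) / n_runs
    print(f"{n_runs:>5} runs: average = {avg:6.2f} "
          f"(product mean = {TRUE_MEAN}, this sample's offset = {part_offset:+.2f})")

# No matter how many runs are averaged, the result converges on
# TRUE_MEAN + part_offset, not on TRUE_MEAN.
```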
     
    At this point, I think that benchmarking hardware as a user at "the best" accuracy is a fallacy, because I don't think anyone has the resources to test a large population of parts in extremely controlled conditions before a release, unless you're the manufacturer of said hardware.