
Improving LTT testing methodology

Hey there,

In the last year or so we have seen many improvements to the LTT testing methodology. Many of them are probably due to input from the new writers @LMG Ivan, @AlexTheGreatish and @GabenJr , but I am sure just having more man-hours available per review also helps :P

There are a few things though that could still be improved. Some wouldn't add much effort, some would take a lot; some would straight up increase accuracy, some would make the data more applicable. If you, the reader, have any ideas of your own, or want to tell me that I am stupid and my ideas wouldn't work, please comment; I hate it when the internet is just a bunch of people screaming into an endless abyss :D

 

The following list is just what came to my mind in the last few months, since the forum was decoupled from Floatplane (this reduced engagement in the comments so freaking much :/).

1: Power Draw (please adapt this, if anything)

2: Temperature (nice to have, especially when comparing intel to AMD chips)

3: Compute/Science Benchmarks (nice to have, but only applicable for a few chips)

4: Low End Benchmarks (not 100% serious, but could actually be great)

 

1: Power draw

This one is pretty simple: measuring power as drawn from the PSU is hopelessly inaccurate, and the first method adds nearly no work:

  • Current-clamp
    Additional work: 10s in windows calc
    How to: Just put a current clamp around the EPS or PCIe power cables, multiply by your PSU's 12V rail voltage (usually 11.9 to 12.1V), and boom.
    Accuracy: over by ~10% for consumer CPUs with decent VRMs. For GPUs it's worse: over by ~10% again, and under by up to 75W (the power delivered through the PCIe slot), so kinda bad.
    Cost: Around 75 bucks if you want a decent current clamp
    Would I recommend it? Yes, definitely.
  • Current-clamp + correction
    Additional work: lots of maths and looking stuff up; I'd guess an hour per board.
    How to: Check the data sheets of the controller and MOSFETs for efficiency (you can at least find the switching losses and the efficiency; that maths shouldn't be too hard). For GPUs, it also helps to know whether they use the PCIe slot power for the GPU itself or just for the other components; if the latter, perfect, your measurement just got more accurate with less work. Alternatively, just use boards for which the VRM losses are known (I think Buildzoid includes them in his PCB breakdowns, not sure though).
    Accuracy: My guess is that you should be able to get it down to a +/- 5W interval for pretty much every board/chip
    Cost: Maybe 30 bucks in employee time?
    Would I recommend it? No. This is way too much effort.
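For what it's worth, the clamp maths above is short enough to sketch in a few lines. Everything here is illustrative: the 90% VRM efficiency figure is a made-up example, not from any specific board, and the function names are mine.

```python
def cpu_power_from_clamp(clamp_current_a, rail_voltage_v=12.0):
    """Power delivered over the EPS cables: P = I * V."""
    return clamp_current_a * rail_voltage_v


def corrected_cpu_power(clamp_current_a, rail_voltage_v=12.0, vrm_efficiency=0.90):
    """Optionally scale by a VRM efficiency figure pulled from the
    controller/MOSFET data sheets (90% here is an assumed example)."""
    return cpu_power_from_clamp(clamp_current_a, rail_voltage_v) * vrm_efficiency


# e.g. 10 A measured on the EPS cables, 11.95 V measured on the 12V rail:
print(cpu_power_from_clamp(10, 11.95))   # ~119.5 W at the cables
print(corrected_cpu_power(10, 11.95))    # ~107.6 W actually reaching the CPU
```

So the "additional work" really is just one multiplication per reading, plus one more if you bother with the efficiency correction.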

2: CPU/GPU Temperature

Software tools are pretty good these days, but you can improve upon them significantly with decent thermocouples.

Additional work: installing the thermocouple (~30min) once per board, plus 2min to get a multimeter when you start testing

How to: on the back of the socket, you will find several VCore connection points (e.g. capacitors). Attach your thermocouple to one of those points (ideally with liquid metal or something similar) and hold it in place with silicone, Plasti Dip, or whatever you like. This gives you a connection directly to the silicon with ~200 W/mK over the entire distance.

Accuracy: you should be under by like 2K tops, probably significantly less

Cost: Less than 10 bucks for a decent thermocouple

 

3: Benchmarks for Compute/Science Hardware

Since you guys are the Top Gear of tech, and that is where the insane hardware lives now, it's no wonder you have been testing this kind of hardware in the last few months. But I am sure you will agree that your reviews didn't always test what these things are actually made for. You do include some decent synthetic workloads (since the Titan V review, I think?), but there aren't too many science/compute real-world tests you are doing (I recall one video where you called in a scientist though...). Here are a few things I'd suggest (GPU and CPU loads are mixed here):

  • Neural Net training time
    Get some LSTM TensorFlow example code and measure training time for a given data set. TensorFlow is widely used in the field, so this would be very applicable, and it means there is a ton of really good example code.
  • Linear Equations
    In theoretical physics, we often reduce problems to large linear equation systems with sparse matrices. The Pardiso solver is decently common and very easy to use. If you want to go deep, try using different types of matrices.
  • Database performance
    Probably don't need to explain this one. I suggest SQL: generate a huge database first (filled with dummy accounts, each having an RSA key if possible), then run some SELECTs.
  • Jupyter
    Jupyter is awesome for science and thus used a lot. Luckily for you, there is this awesome resource of actually useful Jupyter notebooks you can use for benchmarking. And if all universities operate like mine, you'd be surprised how much of that exact code is running on scientists' computers (or uni compute servers) right now.
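To show the database bullet isn't a big lift, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for a real SQL server. The table layout, row count, and the random "key" strings are all made up for illustration (real RSA keys would just be longer blobs).

```python
import random
import sqlite3
import string
import time


def random_key(length=64):
    """Stand-in for an RSA key: a random printable string."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


# Build a throwaway database full of dummy accounts (in memory here;
# a real benchmark would want an on-disk DB far bigger than RAM).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, pubkey TEXT)")
db.executemany(
    "INSERT INTO accounts (name, pubkey) VALUES (?, ?)",
    ((f"user{i}", random_key()) for i in range(100_000)),
)
db.commit()

# The actual benchmark: time a SELECT that has to scan the whole table.
start = time.perf_counter()
rows = db.execute("SELECT name FROM accounts WHERE pubkey LIKE 'a%'").fetchall()
elapsed = time.perf_counter() - start
print(f"{len(rows)} matches in {elapsed * 1000:.1f} ms")
```

Scale the row count up, put the file on disk, and vary the queries (indexed vs. unindexed, JOINs, etc.) and you have a repeatable storage/CPU workload.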

I would be really happy if you could include some of those benchmarks in your reviews, and not only for the high-end compute stuff but also for higher-end consumer stuff. For me and some people I know, benchmarks like that actually play a big role in purchasing decisions, and currently all we have to go by is guesswork.

 

I am (as you can probably tell) decently passionate about getting better benchmarks for those kinds of workloads into videos about that kind of hardware (the Xeon Phi video(s) made me a bit sad), so I would be willing to create some benchmarks for you guys (including the forum people), but only if anyone actually cares. For my own purposes I just run whatever projects I am currently working on, so I always have a 100% applicable benchmark to hand, but I think I am not the only one who cares. What I'd make would probably come down to a bunch of example code in a package (maybe some of my projects, though I am 70% sure I can't use the two coolest ones because of uni IP stuff) with nothing added to each one but some code to measure and output the runtime. I am sure there are people at LMG who could do this just as well though.
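To be clear about how little plumbing "measure and output the runtime" needs: it's basically one decorator wrapped around each example's entry point. The names and the dummy workload below are hypothetical, just to show the shape.

```python
import time
from functools import wraps


def timed(fn):
    """Wrap a benchmark entry point and report its wall-clock runtime."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.last_runtime = time.perf_counter() - start
        print(f"{fn.__name__}: {wrapper.last_runtime:.3f} s")
        return result
    return wrapper


@timed
def example_workload(n=200_000):
    # stand-in for whatever project/solver the package would actually ship
    return sum(i * i for i in range(n))


example_workload()
```

Each packaged example would keep its own code untouched and just gain the `@timed` line, so the runtime numbers stay comparable across machines.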

 

4: Benchmarking Low End Devices

When you are testing low end devices, you tend to focus on low end games (which is understandable), but I think you are missing a significant chunk of what people actually do with those things. Here are some really dumb ideas (that actually mean something to lay people and are fun for enthusiasts):

  • Google Chrome
    Create a folder of random, downloaded webpages and open them one by one until the first one gets unloaded
  • Power Point
    Add rectangles to a slide until lag is noticeable
  • Photoshop
    The smudge tool is notorious for causing laptop users headaches; maybe measure the largest smudge tool size that can perform in real time (standardized background, of course)
  • Excel
    On a large worksheet with tons of formulas, how long does the update take? (Maybe use one of @AlexTheGreatish 's old spreadsheets? I recall you mentioned one once)

 


This is like asking The Grand Tour to do fuel economy tests on a Kia.

 

This level of information simply won't appeal to the majority of LTT viewers, who are looking more for tech-based entertainment videos rather than detailed technical information. There are other channels out there with a more detailed and thorough testing methodology that provide a lot more information. Check out YouTube channels such as Gamers Nexus and Level1Techs for more detailed and informative content.

CPU: Intel i7 6700k  | Motherboard: Gigabyte Z170x Gaming 5 | RAM: 2x16GB 3000MHz Corsair Vengeance LPX | GPU: Gigabyte Aorus GTX 1080ti | PSU: Corsair RM750x (2018) | Case: BeQuiet SilentBase 800 | Cooler: Arctic Freezer 34 eSports | SSD: Samsung 970 Evo 500GB + Samsung 840 500GB + Crucial MX500 2TB | Monitor: Acer Predator XB271HU + Samsung BX2450


15 minutes ago, Spotty said:

This is like asking The Grand Tour to do fuel economy tests on a Kia.

 

This level of information simply won't appeal to the majority of LTT viewers, who are looking more for tech-based entertainment videos rather than detailed technical information. There are other channels out there with a more detailed and thorough testing methodology that provide a lot more information. Check out YouTube channels such as Gamers Nexus and Level1Techs for more detailed and informative content.

I'd have to agree with Spotty here.

While I love to see such detailed information Steve (GamersNexus) is where I go for that.

Wasn't the whole point of LTT for Linus to explain things in such a way we'd all understand? 

Great idea but not for LTT imho.

When the PC is acting up haunted,

who ya gonna call?
"Monotone voice" : A local computer store.

*Terrible joke I know*

 


16 minutes ago, Spotty said:

This is like asking The Grand Tour to do fuel economy tests on a Kia.

Well, I'd say it is more akin to asking them to use a proper race track instead of the Eboladrome when they test cars :)

16 minutes ago, Spotty said:

This level of information simply won't appeal to the majority of LTT viewers, who are looking more for tech-based entertainment videos rather than detailed technical information.

I am asking them to change their methodology a bit: change/add benchmarks and/or change how they take specific measurements. I'm not asking them to tell us at great length about the details of the chip they are testing.

16 minutes ago, Spotty said:

There are other channels out there which have a more detailed and thorough testing methodology that provide a lot more information. Check out YouTube channels such as Gamers Nexus and Level1Techs for more detailed and informative content.

Yup, GN has really good tests; their thermal test setup in particular is pretty decent, and I think they use current clamps instead of wall power meters. Especially with the latter, there is literally zero downside to switching from one tool to the other.


4 minutes ago, Sfekke said:

I'd have to agree with Spotty here.

While I love to see such detailed information Steve (GamersNexus) is where I go for that.

 

4 minutes ago, Sfekke said:

Wasn't the whole point of LTT for Linus to explain things in such a way we'd all understand? 

I listed several suggestions here, and I think you guys are mostly targeting 3. Number 4, for example, would be way easier for a layperson to understand than a 7-Zip or Cinebench score. 1 and 2 ask for no further information, just a change to the way they gather the information they give us, to reflect reality more accurately.

4 minutes ago, Sfekke said:

Great idea but not for LTT imho.

Why shouldn't LTT try to make their methodology more solid? It doesn't mean they have to change the way they relay their results. And in the case of 3, I am literally asking them to swap out benchmarks; do you really get a better idea of what a product does when they tell you the LINPACK gigaflops versus the time it took to solve a specific problem?


3 hours ago, ChalkChalkson said:

~snippidy snip~

     

1: Power draw

This one is pretty simple: measuring power as drawn from the PSU is hopelessly inaccurate, and the first method adds nearly no work:

• Current-clamp
  Seems doable
• Current-clamp + correction
  No. This is way too much effort. <- This

2: CPU/GPU Temperature

We have a thermocouple, so I see no reason not to try this, although given the amount of work involved, and since I don't entirely trust the thermocouple, using the software tools will probably be good enough.

3: Benchmarks for Compute/Science Hardware

For GPUs we currently use SPECviewperf, which covers a fair number of professional programs. For CPUs I don't see why we wouldn't try other things if they are free and don't take too long. This is really Anthony's court though, since he is the one that does the testing.

4: Benchmarking Low End Devices

• Google Chrome
  Create a folder of random, downloaded webpages you open one by one, until the first one gets unloaded <- Did something similar to this in the "How Much RAM Do You Need" video. It sucked hard to do that testing and I'm not doing it again.
• For the rest, we do have a PC Work test that covers the things that you mentioned, but we normally only use it for battery life tests

5 minutes ago, ChalkChalkson said:

I am asking them to change their methodology a bit, change/add benchmarks and/or change how they take specific measurements. Not asking them to tell us in great length about the details of the chip they are testing.

In this context I agree with you 100%. Absolutely nothing wrong with expecting a higher level of quality in their testing methodology.

I just think more in-depth testing would not necessarily translate to that detailed level of information actually being presented to the audience in their videos, or at least not in the same meaningful way where the findings are discussed in the detail that some of the aforementioned channels would go into. It's just not really in the theme or style of LTT's channel content.

For this reason, it may not be practical for LTT to go through the effort of performing detailed testing (for example, using thermocouples to measure temperatures on the VRMs of a GPU) if they have no intention of including the information in the video, or if using simpler and faster software-based measurements for temperatures is good enough for their purposes. (And to be honest, if they had any intention of doing a detailed & technical video as a one-off, I'd rather see them fly Steve Burke up for a collab video where Steve does the testing for LTT and in return LTT gives a shout-out to his channel... Or whoever it may be that does more detailed content for whichever particular field they are looking at, such as Wendell for Linux-related content.)

Maybe some time in the distant future we could see a spin-off channel from LMG which goes into deep analysis of products and does extremely thorough testing with more solid methodology. At the end of the normal LTT video Linus could just shout out "And don't forget to check out our other channel, where we go into much more detail on this graphics card and do a full tear-down to the PCB". It might also be interesting to see a bit more behind the scenes on how LTT tests their products, since LTT could presumably just borrow the final results from that channel and simplify them for the main channel. Not sure how successful such a channel would be, however. For now though, even if they had the time to do the more thorough testing and tear-downs, I don't think the writers and editors have time to be producing content for another channel alongside LTT, TechQuickie, TechLinked, and CSF (F to pay respects).

     

9 minutes ago, ChalkChalkson said:

Jup, GN has really good tests, especially their thermal test setup is pretty decent, and I think they use current clamps instead of wall power meters. But especially with the latter one there is literally 0 downside to switching from one tool to the other

In one of GN's RTX 2080/Ti videos they were measuring the current delivered over the PCIe power connectors and calculating power usage from that to compare with the 10-series cards, and they noted that it doesn't account for the power delivered over the PCIe slot from the motherboard. (https://youtu.be/jUM_eINGUl4?t=1361)

GN is also setting up to do detailed PSU testing soon, once their new office is in order, which is what I'm waiting to see.

     


4 minutes ago, AlexTheGreatish said:

2: CPU/GPU Temperature

We have a thermocouple, so I see no reason not to try this, although given the amount of work involved, and since I don't entirely trust the thermocouple, using the software tools will probably be good enough.

Could be a fun video in and of itself: "The Workshop: How good are software tools?", with the result probably being very workshop-like :D

4 minutes ago, AlexTheGreatish said:

3: Benchmarks for Compute/Science Hardware

For GPUs we currently use SPECviewperf

It does cover a bunch of professional programs like CATIA (no words can describe how much I hate CATIA...), but it doesn't really cover compute or typical science applications; even the medical and energy ones are rendering tests AFAIK. Python, Jupyter and TensorFlow are free, as are the Pardiso solver and SQL. The quickest way to get Pardiso running is through Mathematica (since it comes bundled), and that costs 100 bucks I think, but it is possible to get it running without Mathematica, too.

4 minutes ago, AlexTheGreatish said:

snip

But you've got to admit, it would've been funny to see a video about some cheapo laptop where Linus says "You can have a YouTube video, a few articles and an HD Netflix stream running at the same time, but if you want to play Farmville while watching 4K Netflix/YouTube, go for the 8GB model" or "It can handle around 80 graphic elements in PowerPoint, so if you are a power user you need a more powerful machine" :P


13 minutes ago, Spotty said:

I just think more in depth testing would not necessarily translate to actually seeing that detailed level of information be presented to the audience in their videos

Well, accuracy isn't the same thing as detail; the current clamp on the EPS increases accuracy while providing the same amount of detail.

13 minutes ago, Spotty said:

it doesn't account for the power delivered over the PCIe slot from the motherboard.

That's why I said in the current-clamp-plus-correction part that you'd need to check what actually happens to that power; some GPUs just use it for the minor rails. But as I said in the current clamp part, this method really isn't that great for GPUs. If you wanted to be super accurate, you could use a PCIe riser that gets power via Molex or SATA and add that to the current clamp reading (measuring one voltage at a time, of course).

13 minutes ago, Spotty said:

Not sure how successful such a channel would be, however.

You mean GN, or Actually Hardcore Overclocking with higher production values? I bet that'd flop hard :D But if LTT were to add another channel that targeted in-depth stuff, I'd hope for a channel that looks at the science of PCs in a TechQuickie style. There is a lot to explore there just in physics alone, and I don't really see that niche filled by any other channel.

13 minutes ago, Spotty said:

CSF

F for NickyV


2 hours ago, ChalkChalkson said:
Quote

CSF

F for NickyV

Still miss the guy, F!

You brought me some amazing laughs


Power draw

• Agreed with Alex. We should get a clamp meter
• That second part would take quite a while to do… So unless we're specifically looking for that data for something in-depth, probably not.

Thermal testing

• We should probably look at getting a new thermal probe, but
• I actually am not sure it would be worth the time and effort to do this every time. Possible video fodder?

Compute / Science

• Yes
• I think if I were to pick one each of GPU and CPU, I'd do TensorFlow and Jupyter.
• The main issue is finding a dataset and methodology that would be applicable and quick to run.
• The other issue is figuring out the fastest way to run the test…
• Sadly, time is often at a premium, particularly when it comes to new releases. Unlike written publications, we need to have all of our testing and conclusions done within ~2 days of launch in order to give us time to shoot and edit the video.
  • I have no idea how Steve Burke does it.
    • Probably putting in ruinously long work days in a big burst up until release. Guy works hard.

Low performance testing

• I'll defer to Alex on this one, since he's usually the one testing the low-end devices (laptops).

Emily @ LINUS MEDIA GROUP

congratulations on breaking absolutely zero stereotypes - @cs_deathmatch


1 hour ago, GabenJr said:

I have no idea how Steve Burke does it.

He rests on the seventh day.


4 weeks later...

Sorry for the long wait; since FP migrated, I don't find myself over here as regularly :/

On 9/21/2018 at 5:07 PM, GabenJr said:

Compute / Science

• Yes
• I think if I were to pick one each of GPU and CPU, I'd do TensorFlow and Jupyter.
• The main issue is finding a dataset and methodology that would be applicable and quick to run.
• The other issue is figuring out the fastest way to run the test…
• Sadly, time is often at a premium, particularly when it comes to new releases. Unlike written publications, we need to have all of our testing and conclusions done within ~2 days of launch in order to give us time to shoot and edit the video.
  • I have no idea how Steve Burke does it.
    • Probably putting in ruinously long work days in a big burst up until release. Guy works hard.

Yeah, Steve's testing is almost up to scientific standard... It must take a looong time to do.

Pretty sure Jupyter and TensorFlow would be as fair as it gets. The page I linked with the useful notebooks also has examples for those. These aren't really taxing for high end systems, but if you want that, simple Poisson solving can do the trick if you make your state space large enough; even a 10000x10000 2D grid with a quadrupole should do the trick for most things. TensorFlow could use the fantastic MNIST set, which is pretty well known and often used for first experiments with neural nets. If you want something closer to state of the art, you could set up a recurrent NN that you feed with your YT comments or something (that would also be pretty funny; I did that once, and what it thought a typical comment looks like was a bit depressing).
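The Poisson idea really is only a few lines. Here is a minimal NumPy sketch using plain Jacobi iteration rather than a proper sparse solver; the sweep count, grid size, and charge placement are all made up, and you'd crank N way up for a real benchmark.

```python
import time

import numpy as np


def jacobi_poisson(rho, sweeps=200):
    """Plain Jacobi iteration for Laplacian(phi) = -rho on a unit-spaced
    2D grid with phi = 0 on the boundary. Deliberately naive: the point
    is a scalable workload, not a production solver."""
    phi = np.zeros_like(rho)
    for _ in range(sweeps):
        # NumPy evaluates the right-hand side fully before assigning,
        # so this is a true Jacobi (not Gauss-Seidel) update.
        phi[1:-1, 1:-1] = 0.25 * (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                                  phi[1:-1, 2:] + phi[1:-1, :-2] +
                                  rho[1:-1, 1:-1])
    return phi


N = 200  # push this towards 10000 to actually stress a machine
rho = np.zeros((N, N))
# a crude quadrupole: four alternating point charges
rho[N // 3, N // 3] = rho[2 * N // 3, 2 * N // 3] = 1.0
rho[N // 3, 2 * N // 3] = rho[2 * N // 3, N // 3] = -1.0

start = time.perf_counter()
phi = jacobi_poisson(rho)
print(f"{N}x{N} grid, 200 sweeps: {time.perf_counter() - start:.2f} s")
```

The memory-bandwidth-bound stencil sweeps are what make this a decent hardware test; a Pardiso-style direct solve on the same grid would stress the machine differently, which is exactly why trying different matrix types is interesting.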

