
Improving LTT testing methodology

Hey there,

In the last year or so we have seen many improvements to the LTT testing methodology. Many of them are probably due to input from the new writers @LMG Ivan, @AlexTheGreatish and @GabenJr , but I am sure just having more man-hours available per review also helps :P

There are a few things though that could still be improved. Some wouldn't add much effort, some would take a lot; some would straight up increase accuracy, some would make the data more applicable. If you, the reader, have any ideas of your own, or want to tell me that I am stupid and my ideas wouldn't work, please comment; I hate it when the internet is just a bunch of people screaming into an endless abyss :D

 

The following list is just what came to my mind in the last few months, since the forum was decoupled from Floatplane (this reduced engagement in the comments so freaking much :/).

1: Power Draw (please adapt this, if anything)

2: Temperature (nice to have, especially when comparing intel to AMD chips)

3: Compute/Science Benchmarks (nice to have, but only applicable for a few chips)

4: Low End Benchmarks (not 100% serious, but could actually be great)

 

1: Power draw

This one is pretty simple: measuring power as drawn from the PSU is hopelessly inaccurate, and the first method adds nearly no work:

  • Current-clamp
    Additional work: 10s in windows calc
    How to: Just put a current clamp around the EPS or PCIe power cables, multiply by your PSU's 12V rail voltage (usually 11.9 to 12.1V), and boom.
    Accuracy: over by ~10% for consumer CPUs with decent VRMs. For GPUs it's worse: over by ~10% again, and under by up to 75W (the power delivered through the PCIe slot), so kinda bad.
    Cost: Around 75 bucks if you want a decent current clamp
    Would I recommend it? Yes, definitely.
  • Current-clamp + correction
    Additional work: lots of maths and looking stuff up; I'd guess an hour per board.
    How to: Check the data sheets of the controller and MOSFETs for efficiency (you can at least find the switching losses and the efficiency; that maths shouldn't be too hard). For GPUs, it also helps to know whether they use the PCIe slot power for the GPU itself or just for the other components; if the latter, perfect, your measurement just got more accurate with less work. Alternatively, just use boards for which the VRM losses are known (I think Buildzoid includes them in his PCB breakdowns, not sure though).
    Accuracy: My guess is that you should be able to get it down to a +/- 5W interval for pretty much every board/chip
    Cost: Maybe 30 bucks in employee time?
    Would I recommend it? No. This is way too much effort.
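For what it's worth, the clamp maths above is short enough to sketch in a few lines. Everything here is illustrative: the 90% VRM efficiency figure is a made-up example, not from any specific board, and the function names are mine.

```python
def cpu_power_from_clamp(clamp_current_a, rail_voltage_v=12.0):
    """Power delivered over the EPS cables: P = I * V."""
    return clamp_current_a * rail_voltage_v


def corrected_cpu_power(clamp_current_a, rail_voltage_v=12.0, vrm_efficiency=0.90):
    """Optionally scale by a VRM efficiency figure pulled from the
    controller/MOSFET data sheets (90% here is an assumed example)."""
    return cpu_power_from_clamp(clamp_current_a, rail_voltage_v) * vrm_efficiency


# e.g. 10 A measured on the EPS cables, 11.95 V measured on the 12V rail:
print(cpu_power_from_clamp(10, 11.95))   # ~119.5 W at the cables
print(corrected_cpu_power(10, 11.95))    # ~107.6 W actually reaching the CPU
```

So the "additional work" really is just one multiplication per reading, plus one more if you bother with the efficiency correction.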

2: CPU/GPU Temperature

Software tools are pretty good these days, but you can improve upon them significantly with decent thermocouples.

Additional work: installing the thermocouple (~30min) once per board, plus 2min to get a multimeter when you start testing

How to: on the back of the socket, you will find several VCore connection points (e.g. capacitors). Attach your thermocouple to one of those points (ideally with liquid metal or something similar) and hold it in place with silicone, Plasti Dip, or whatever you like. This gives you a connection directly to the silicon with ~200 W/mK over the entire distance.

Accuracy: you should be under by like 2K tops, probably significantly less

Cost: Less than 10 bucks for a decent thermocouple

 

3: Benchmarks for Compute/Science Hardware

Since you guys are the Top Gear of tech, and that is where the insane hardware lives now, it's no wonder you have been testing this kind of hardware in the last few months. But I am sure you will agree that your reviews didn't always test what these things are actually made for. You do include some decent synthetic workloads (since the Titan V review, I think?), but there aren't too many science/compute real-world tests you are doing (I recall one video where you called in a scientist though...). Here are a few things I'd suggest (GPU and CPU loads are mixed here):

  • Neural Net training time
    Get some LSTM TensorFlow example code and measure training time for a given data set. TensorFlow is widely used in the field, so this would be very applicable, and it means there is a ton of really good example code.
  • Linear Equations
    In theoretical physics, we often reduce problems to large linear equation systems with sparse matrices. The Pardiso solver is decently common and very easy to use. If you want to go deep, try using different types of matrices.
  • Database performance
    Probably don't need to explain this one. I suggest SQL: generate a huge database first (filled with dummy accounts, each having an RSA key if possible), then run some SELECTs.
  • Jupyter
    Jupyter is awesome for science and thus used a lot. Luckily for you, there is this awesome resource of actually useful Jupyter notebooks you can use for benchmarking. And if all universities operate like mine, you'd be surprised how much of that exact code is running on scientists' computers (or uni compute servers) right now.
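To show the database bullet isn't a big lift, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for a real SQL server. The table layout, row count, and the random "key" strings are all made up for illustration (real RSA keys would just be longer blobs).

```python
import random
import sqlite3
import string
import time


def random_key(length=64):
    """Stand-in for an RSA key: a random printable string."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


# Build a throwaway database full of dummy accounts (in memory here;
# a real benchmark would want an on-disk DB far bigger than RAM).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, pubkey TEXT)")
db.executemany(
    "INSERT INTO accounts (name, pubkey) VALUES (?, ?)",
    ((f"user{i}", random_key()) for i in range(100_000)),
)
db.commit()

# The actual benchmark: time a SELECT that has to scan the whole table.
start = time.perf_counter()
rows = db.execute("SELECT name FROM accounts WHERE pubkey LIKE 'a%'").fetchall()
elapsed = time.perf_counter() - start
print(f"{len(rows)} matches in {elapsed * 1000:.1f} ms")
```

Scale the row count up, put the file on disk, and vary the queries (indexed vs. unindexed, JOINs, etc.) and you have a repeatable storage/CPU workload.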

I would be really happy if you could include some of those benchmarks in your reviews, and not only for the high-end compute stuff but also for higher-end consumer stuff. For me and some people I know, benchmarks like that actually play a big role in purchasing decisions, and currently all we have to go by is guesswork.

 

I am (as you can probably tell) decently passionate about getting better benchmarks for those kinds of workloads into videos about that kind of hardware (the Xeon Phi video(s) made me a bit sad), so I would be willing to create some benchmarks for you guys (including the forum people), but only if anyone actually cares. For my own purposes I just run whatever projects I am currently working on, so I always have a 100% applicable benchmark to hand, but I think I am not the only one who cares. What I'd make would probably come down to a bunch of example code in a package (maybe some of my projects, though I am 70% sure I can't use the two coolest ones because of uni IP stuff) with nothing added to each one but some code to measure and output the runtime. I am sure there are people at LMG who could do this just as well though.
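To be clear about how little plumbing "measure and output the runtime" needs: it's basically one decorator wrapped around each example's entry point. The names and the dummy workload below are hypothetical, just to show the shape.

```python
import time
from functools import wraps


def timed(fn):
    """Wrap a benchmark entry point and report its wall-clock runtime."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.last_runtime = time.perf_counter() - start
        print(f"{fn.__name__}: {wrapper.last_runtime:.3f} s")
        return result
    return wrapper


@timed
def example_workload(n=200_000):
    # stand-in for whatever project/solver the package would actually ship
    return sum(i * i for i in range(n))


example_workload()
```

Each packaged example would keep its own code untouched and just gain the `@timed` line, so the runtime numbers stay comparable across machines.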

 

4: Benchmarking Low End Devices

When you are testing low end devices, you tend to focus on low end games (which is understandable), but I think you are missing a significant chunk of what people actually do with those things. Here are some really dumb ideas (that actually mean something to lay people and are fun for enthusiasts):

  • Google Chrome
    Create a folder of random, downloaded webpages and open them one by one until the first one gets unloaded
  • Power Point
    Add rectangles to a slide until lag is noticeable
  • Photoshop
    The smudge tool is notorious for causing laptop users headaches; maybe measure the largest smudge tool size that can perform in real time (standardized background, of course)
  • Excel
    On a large worksheet with tons of formulas, how long does the update take? (Maybe use one of @AlexTheGreatish 's old spreadsheets? I recall you mentioned one once)

 


This is like asking The Grand Tour to do fuel economy tests on a Kia.

 

This level of information simply won't appeal to the majority of LTT viewers, who are looking more for tech-based entertainment videos rather than detailed technical information. There are other channels out there with a more detailed and thorough testing methodology that provide a lot more information. Check out YouTube channels such as Gamers Nexus and Level1Techs for more detailed and informative content.

CPU: Intel i7 6700k  | Motherboard: Gigabyte Z170x Gaming 5 | RAM: 2x16GB 3000MHz Corsair Vengeance LPX | GPU: Gigabyte Aorus GTX 1080ti | PSU: Corsair RM750x (2018) | Case: BeQuiet SilentBase 800 | Cooler: Arctic Freezer 34 eSports | SSD: Samsung 970 Evo 500GB + Samsung 840 500GB + Crucial MX500 2TB | Monitor: Acer Predator XB271HU + Samsung BX2450


15 minutes ago, Spotty said:

This is like asking The Grand Tour to do fuel economy tests on a Kia.

 

This level of information simply won't appeal to the majority of LTT viewers, who are looking more for tech-based entertainment videos rather than detailed technical information. There are other channels out there with a more detailed and thorough testing methodology that provide a lot more information. Check out YouTube channels such as Gamers Nexus and Level1Techs for more detailed and informative content.

I'd have to agree with Spotty here.

While I love to see such detailed information Steve (GamersNexus) is where I go for that.

Wasn't the whole point of LTT for Linus to explain things in such a way we'd all understand? 

Great idea but not for LTT imho.

When the PC is acting up haunted,

who ya gonna call?
"Monotone voice" : A local computer store.

*Terrible joke I know*

 


16 minutes ago, Spotty said:

This is like asking The Grand Tour to do fuel economy tests on a Kia.

Well, I'd say it is more akin to asking them to use a proper race track instead of the Eboladrome when they test cars :)

16 minutes ago, Spotty said:

This level of information simply won't appeal to the majority of LTT viewers, who are looking more for tech-based entertainment videos rather than detailed technical information.

I am asking them to change their methodology a bit: change/add benchmarks and/or change how they take specific measurements. I'm not asking them to tell us at great length about the details of the chip they are testing.

16 minutes ago, Spotty said:

There are other channels out there which have a more detailed and thorough testing methodology that provide a lot more information. Check out YouTube channels such as Gamers Nexus and Level1Techs for more detailed and informative content.

Yup, GN has really good tests; their thermal test setup in particular is pretty decent, and I think they use current clamps instead of wall power meters. Especially with the latter, there is literally zero downside to switching from one tool to the other.


4 minutes ago, Sfekke said:

I'd have to agree with Spotty here.

While I love to see such detailed information Steve (GamersNexus) is where I go for that.

 

4 minutes ago, Sfekke said:

Wasn't the whole point of LTT for Linus to explain things in such a way we'd all understand? 

I listed several suggestions here, and I think you guys are mostly targeting 3. Number 4, for example, would be way easier for a layperson to understand than a 7-Zip or Cinebench score. 1 and 2 ask for no further information, just a change to the way they gather the information they give us, to reflect reality more accurately.

4 minutes ago, Sfekke said:

Great idea but not for LTT imho.

Why shouldn't LTT try to make their methodology more solid? It doesn't mean they have to change the way they relay their results. And in the case of 3, I am literally asking them to swap out benchmarks; do you really get a better idea of what a product does when they tell you the LINPACK gigaflops versus the time it took to solve a specific problem?


3 hours ago, ChalkChalkson said:

~snippidy snip~

     

1: Power draw

This one is pretty simple: measuring power as drawn from the PSU is hopelessly inaccurate, and the first method adds nearly no work:

• Current-clamp
  Seems doable
• Current-clamp + correction
  No. This is way too much effort. <- This

2: CPU/GPU Temperature

We have a thermocouple, so I see no reason not to try this, although given the amount of work involved, and since I don't entirely trust the thermocouple, using the software tools will probably be good enough.

3: Benchmarks for Compute/Science Hardware

For GPUs we currently use SPECviewperf, which covers a fair number of professional programs. For CPUs I don't see why we wouldn't try other things if they are free and don't take too long. This is really Anthony's court though, since he is the one that does the testing.

4: Benchmarking Low End Devices

• Google Chrome
  Create a folder of random, downloaded webpages you open one by one, until the first one gets unloaded <- Did something similar to this in the "How Much RAM Do You Need" video. It sucked hard to do that testing and I'm not doing it again.
• For the rest, we do have a PC Work test that covers the things that you mentioned, but we normally only use it for battery life tests

5 minutes ago, ChalkChalkson said:

I am asking them to change their methodology a bit, change/add benchmarks and/or change how they take specific measurements. Not asking them to tell us in great length about the details of the chip they are testing.

In this context I agree with you 100%. Absolutely nothing wrong with expecting a higher level of quality in their testing methodology.

I just think more in-depth testing would not necessarily translate to that detailed level of information actually being presented to the audience in their videos, or at least not in the same meaningful way where the findings are discussed in the detail that some of the aforementioned channels would go into. It's just not really in the theme or style of LTT's channel content.

For this reason, it may not be practical for LTT to go through the effort of performing detailed testing (for example, using thermocouples to measure temperatures on the VRMs of a GPU) if they have no intention of including the information in the video, or if using simpler and faster software-based measurements for temperatures is good enough for their purposes. (And to be honest, if they had any intention of doing a detailed & technical video as a one-off, I'd rather see them fly Steve Burke up for a collab video where Steve does the testing for LTT and in return LTT gives a shout-out to his channel... Or whoever it may be that does more detailed content for whichever particular field they are looking at, such as Wendell for Linux-related content.)

Maybe some time in the distant future we could see a spin-off channel from LMG which goes into deep analysis of products and does extremely thorough testing with more solid methodology. At the end of the normal LTT video Linus could just shout out "And don't forget to check out our other channel, where we go into much more detail on this graphics card and do a full tear-down to the PCB". It might also be interesting to see a bit more behind the scenes on how LTT tests their products, since LTT could presumably just borrow the final results from that channel and simplify them for the main channel. Not sure how successful such a channel would be, however. For now though, even if they had the time to do the more thorough testing and tear-downs, I don't think the writers and editors have time to be producing content for another channel alongside LTT, TechQuickie, TechLinked, and CSF (F to pay respects).

     

9 minutes ago, ChalkChalkson said:

Jup, GN has really good tests, especially their thermal test setup is pretty decent, and I think they use current clamps instead of wall power meters. But especially with the latter one there is literally 0 downside to switching from one tool to the other

In one of GN's RTX 2080/Ti videos they were measuring the current delivered over the PCIe power connectors and calculating power usage from that to compare with the 10-series cards, and they noted that it doesn't account for the power delivered over the PCIe slot from the motherboard. (https://youtu.be/jUM_eINGUl4?t=1361)

GN is also setting up to do detailed PSU testing soon, once their new office is in order, which is what I'm waiting to see.

     


4 minutes ago, AlexTheGreatish said:

2: CPU/GPU Temperature

We have a thermocouple, so I see no reason not to try this, although given the amount of work involved, and since I don't entirely trust the thermocouple, using the software tools will probably be good enough.

Could be a fun video in and of itself: "The Workshop: How good are software tools?", with the result probably being very workshop-like :D

4 minutes ago, AlexTheGreatish said:

3: Benchmarks for Compute/Science Hardware

For GPUs we currently use SPECviewperf

It does cover a bunch of professional programs like CATIA (no words can describe how much I hate CATIA...), but it doesn't really cover compute or typical science applications; even the medical and energy ones are rendering tests AFAIK. Python, Jupyter and TensorFlow are free, as are the Pardiso solver and SQL. The quickest way to get Pardiso running is through Mathematica (since it comes bundled), and that costs 100 bucks I think, but it is possible to get it running without Mathematica, too.

4 minutes ago, AlexTheGreatish said:

snip

But you've got to admit, it would've been funny to see a video about some cheapo laptop where Linus says "You can have a YouTube video, a few articles and an HD Netflix stream running at the same time, but if you want to play Farmville while watching 4K Netflix/YouTube, go for the 8GB model" or "It can handle around 80 graphic elements in PowerPoint, so if you are a power user you need a more powerful machine" :P


13 minutes ago, Spotty said:

I just think more in depth testing would not necessarily translate to actually seeing that detailed level of information be presented to the audience in their videos

Well, accuracy isn't the same thing as detail; the current clamp on the EPS increases accuracy while providing the same amount of detail.

13 minutes ago, Spotty said:

it doesn't account for the power delivered over the PCIe slot from the motherboard.

That's why I said in the current-clamp-plus-correction part that you'd need to check what actually happens to that power; some GPUs just use it for the minor rails. But as I said in the current clamp part, this method really isn't that great for GPUs. If you wanted to be super accurate, you could use a PCIe riser that gets power via Molex or SATA and add that to the current clamp reading (measuring one voltage at a time, of course).

13 minutes ago, Spotty said:

Not sure how successful such a channel would be, however.

You mean GN, or Actually Hardcore Overclocking with higher production values? I bet that'd flop hard :D But if LTT were to add another channel that targeted in-depth stuff, I'd hope for a channel that looks at the science of PCs in a TechQuickie style. There is a lot to explore there just in physics alone, and I don't really see that niche filled by any other channel.

13 minutes ago, Spotty said:

CSF

F for NickyV


2 hours ago, ChalkChalkson said:
Quote

CSF

F for NickyV

Still miss the guy, F!

You brought me some amazing laughs


Power draw

• Agreed with Alex. We should get a clamp meter
• That second part would take quite a while to do… So unless we're specifically looking for that data for something in-depth, probably not.

Thermal testing

• We should probably look at getting a new thermal probe, but
• I actually am not sure it would be worth the time and effort to do this every time. Possible video fodder?

Compute / Science

• Yes
• I think if I were to pick one each of GPU and CPU, I'd do TensorFlow and Jupyter.
• The main issue is finding a dataset and methodology that would be applicable and quick to run.
• The other issue is figuring out the fastest way to run the test…
• Sadly, time is often at a premium, particularly when it comes to new releases. Unlike written publications, we need to have all of our testing and conclusions done within ~2 days of launch in order to give us time to shoot and edit the video.
  • I have no idea how Steve Burke does it.
    • Probably putting in ruinously long work days in a big burst up until release. Guy works hard.

Low performance testing

• I'll defer to Alex on this one, since he's usually the one testing the low-end devices (laptops).

Emily @ LINUS MEDIA GROUP

congratulations on breaking absolutely zero stereotypes - @cs_deathmatch


1 hour ago, GabenJr said:

I have no idea how Steve Burke does it.

He rests on the seventh day.


4 weeks later...

Sorry for the long wait; since FP migrated, I don't find myself over here as regularly :/

On 9/21/2018 at 5:07 PM, GabenJr said:

Compute / Science

• Yes
• I think if I were to pick one each of GPU and CPU, I'd do TensorFlow and Jupyter.
• The main issue is finding a dataset and methodology that would be applicable and quick to run.
• The other issue is figuring out the fastest way to run the test…
• Sadly, time is often at a premium, particularly when it comes to new releases. Unlike written publications, we need to have all of our testing and conclusions done within ~2 days of launch in order to give us time to shoot and edit the video.
  • I have no idea how Steve Burke does it.
    • Probably putting in ruinously long work days in a big burst up until release. Guy works hard.

Yeah, Steve's testing is almost up to scientific standard... It must take a looong time to do.

Pretty sure Jupyter and TensorFlow would be as fair as it gets. The page I linked with the useful notebooks also has examples for those. These aren't really taxing for high end systems, but if you want that, simple Poisson solving can do the trick if you make your state space large enough; even a 10000x10000 2D grid with a quadrupole should do the trick for most things. TensorFlow could use the fantastic MNIST set, which is pretty well known and often used for first experiments with neural nets. If you want something closer to state of the art, you could set up a recurrent NN that you feed with your YT comments or something (that would also be pretty funny; I did that once, and what it thought a typical comment looks like was a bit depressing).
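The Poisson idea really is only a few lines. Here is a minimal NumPy sketch using plain Jacobi iteration rather than a proper sparse solver; the sweep count, grid size, and charge placement are all made up, and you'd crank N way up for a real benchmark.

```python
import time

import numpy as np


def jacobi_poisson(rho, sweeps=200):
    """Plain Jacobi iteration for Laplacian(phi) = -rho on a unit-spaced
    2D grid with phi = 0 on the boundary. Deliberately naive: the point
    is a scalable workload, not a production solver."""
    phi = np.zeros_like(rho)
    for _ in range(sweeps):
        # NumPy evaluates the right-hand side fully before assigning,
        # so this is a true Jacobi (not Gauss-Seidel) update.
        phi[1:-1, 1:-1] = 0.25 * (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                                  phi[1:-1, 2:] + phi[1:-1, :-2] +
                                  rho[1:-1, 1:-1])
    return phi


N = 200  # push this towards 10000 to actually stress a machine
rho = np.zeros((N, N))
# a crude quadrupole: four alternating point charges
rho[N // 3, N // 3] = rho[2 * N // 3, 2 * N // 3] = 1.0
rho[N // 3, 2 * N // 3] = rho[2 * N // 3, N // 3] = -1.0

start = time.perf_counter()
phi = jacobi_poisson(rho)
print(f"{N}x{N} grid, 200 sweeps: {time.perf_counter() - start:.2f} s")
```

The memory-bandwidth-bound stencil sweeps are what make this a decent hardware test; a Pardiso-style direct solve on the same grid would stress the machine differently, which is exactly why trying different matrix types is interesting.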

