
Cinebench R15 breaking, end of a benchmark?

Prysin
Quote

Cinebench has been a popular rendering benchmark for some time. Much of its popularity is due to the fact that it can be run on even the lowliest of Atom machines and scaled (until today) fairly well to dual- and quad-socket systems. Today, we are seeing evidence of Cinebench R15 simply breaking.

Last week we published a video that we did before the Skylake-SP launch called Crushing Cinebench R15 V4. We did that video prior to the official Intel Xeon Scalable Processor launch. As a result, we had old drivers on our installation.

While testing another Xeon system in the lab, we noticed there were some new chipset/BMC drivers available, so we loaded them onto the quad Intel Xeon Platinum 8180 system. Not only did the new drivers help improve performance, but we saw something very strange. Cinebench appears to be breaking.

Video: 

 

 

Quote

One thing you will note is that all of the runs fall in the 5-7 second range. That is important, as modern CPUs will typically hit an all-core turbo mode. Along those lines, we are seeing fairly massive differences, at times greater than 20% peak-to-valley on some runs. Generally, when we see that level of variance, and consistent variance (e.g. not one of 100 runs but every run moving significantly), we know it is time to take a look at a benchmark.
 

As an example, even though the c-ray "hard" setting we developed in 2012 is starting to run into the same 5-7 second window, it is still producing repeatable runs with well under 5% benchmark variance. We are still going to be introducing our 8K version soon, simply to get longer run times on large machines.

5-7 seconds is an extremely short time to run a "benchmark" on a modern CPU. We have run significantly longer workloads on this machine, the types that take days to run, and they are extremely consistent. Even tasks like doing large compile jobs in Linux are predictable, where we have sub-1% test variance over 100 runs. 20% is enormous in comparison.
 

What Can be Done?

We are making the suggestion that Maxon increase the test render scene size. From what we can see, the benchmark is pushing work to all 224 threads. At the same time, with such a short runtime and extremely inconsistent results, the workload needs to run longer to make any initialization negligible. In the professional rendering industry, people do not optimize for 6-7 second renders. It is the multi-hour and multi-day (sometimes longer) renders that creative professionals are trying to reduce.

 

 

Now this is highly interesting, and may explain why almost every time I run CB R15 on any system (AMD FX, AMD A10, Intel i7), the results vary from run to run, sometimes a lot, sometimes not. For a long time I thought it was simply background processes stealing processing time, but perhaps there is also something iffy inside the CB code and how it responds to turbo steppings/overclocking?
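If you want to put a number on that run-to-run spread yourself, the usual measure is the coefficient of variation (standard deviation over mean). A minimal Python sketch; the score lists are made up purely for illustration:

```python
import statistics

def coefficient_of_variation(scores):
    """Relative run-to-run spread of a set of benchmark scores, in percent."""
    return 100.0 * statistics.stdev(scores) / statistics.mean(scores)

# Hypothetical numbers for illustration only:
# an unstable benchmark with roughly 20% peak-to-valley swing,
# and a stable one with well under 5% spread.
unstable_runs = [4800, 5200, 5700, 4900, 5600]
stable_runs = [1010, 1005, 1012, 1008, 1007]

print(f"unstable: {coefficient_of_variation(unstable_runs):.1f}%")
print(f"stable:   {coefficient_of_variation(stable_runs):.2f}%")
```

Anything consistently above a few percent on a fixed configuration is a hint that background processes are interfering or the workload itself is too short.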

 

Source: https://www.servethehome.com/cinebench-r15-is-now-a-broken-as-a-benchmark-and-11-5k-surpassed/


WTF?! $10K for one of them. :o 

 

 

CPU: AMD Ryzen 5 5600X | CPU Cooler: Stock AMD Cooler | Motherboard: Asus ROG STRIX B550-F GAMING (WI-FI) | RAM: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-3000 CL16 | GPU: Nvidia GTX 1060 6GB Zotac Mini | Case: K280 Case | PSU: Cooler Master B600 Power supply | SSD: 1TB  | HDDs: 1x 250GB & 1x 1TB WD Blue | Monitors: 24" Acer S240HLBID + 24" Samsung  | OS: Win 10 Pro

 

Audio: Behringer Q802USB Xenyx 8 Input Mixer |  U-PHORIA UMC204HD | Behringer XM8500 Dynamic Cardioid Vocal Microphone | Sound Blaster Audigy Fx PCI-E card.

 

Home Lab:  Lenovo ThinkCenter M82 ESXi 6.7 | Lenovo M93 Tiny Exchange 2019 | TP-LINK TL-SG1024D 24-Port Gigabit | Cisco ASA 5506 firewall  | Cisco Catalyst 3750 Gigabit Switch | Cisco 2960C-LL | HP MicroServer G8 NAS | Custom built SCCM Server.

 

 


I've seen this before. The issue is that the test scores based on elapsed time, so the closer a run gets to 0 seconds, the more variance you will see.

To make it better, they need to make a new version that is harder to run, so it takes long enough to become consistent.
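To illustrate why short runs amplify variance, here is a hypothetical back-of-the-envelope model: if the score is a fixed amount of work divided by elapsed time, then a fixed chunk of untimed startup overhead or timing noise distorts a 6-second run far more than a 60-second one. All names and numbers below are invented for the sketch:

```python
def score(work_units, elapsed_s):
    """Cinebench-style score: a fixed workload divided by elapsed time."""
    return work_units / elapsed_s

def score_error_pct(true_time_s, overhead_s):
    """Relative score loss caused by a fixed chunk of extra overhead."""
    true = score(1000, true_time_s)
    measured = score(1000, true_time_s + overhead_s)
    return 100.0 * (true - measured) / true

# The same hypothetical 0.5 s of thread-spawn/initialization overhead:
print(score_error_pct(6, 0.5))   # ~7.7% swing on a 6-second run
print(score_error_pct(60, 0.5))  # ~0.8% swing on a 60-second run
```

The overhead didn't change between the two cases; only the runtime did, which is exactly why a longer scene would tighten the results.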

if you want to annoy me, then join my teamspeak server ts.benja.cc


The problem is that if they make it much harder, then it will not be "useful" for low-end CPUs like shit P4s or first-gen Atoms.

I spent $2500 on building my PC and all i do with it is play no games atm & watch anime at 1080p(finally) watch YT and write essays...  nothing, it just sits there collecting dust...

Builds:

The Toaster Project! Northern Bee!

 

The original LAN PC build log! (Old, dead and replaced by The Toaster Project & 5.0)

Spoiler

"Here is some advice that might have gotten lost somewhere along the way in your life. 

 

#1. Treat others as you would like to be treated.

#2. It's best to keep your mouth shut; and appear to be stupid, rather than open it and remove all doubt.

#3. There is nothing "wrong" with being wrong. Learning from a mistake can be more valuable than not making one in the first place.

 

Follow these simple rules in life, and I promise you, things magically get easier. " - MageTank 31-10-2016

 

 


Hmm, this is interesting. I never noticed anything like this with Cinebench before. I'm curious about the "8K" Cinebench coming.


23 minutes ago, Bananasplit_00 said:

The problem is that if they make it much harder, then it will not be "useful" for low-end CPUs like shit P4s or first-gen Atoms.

Then use another version, or have it run an easier test. Like how 3DMark lets you run tests meant for anything from mobile devices up to systems with a bazillion cores and an eye-gouging number of video cards.


R15 is fine as it is. A new Cinebench would be a better move.



1 hour ago, The Benjamins said:

I've seen this before. The issue is that the test scores based on elapsed time, so the closer a run gets to 0 seconds, the more variance you will see.

To make it better, they need to make a new version that is harder to run, so it takes long enough to become consistent.

All benchmarks in computers eventually get "broken". Cinebench R15 had a good run, but we'll see an R19 or whatever eventually.

 

Though the fact they got it pushed out to 224 threads is pretty impressive for a multi-threaded task!


1 hour ago, Bananasplit_00 said:

The problem is that if they make it much harder, then it will not be "useful" for low-end CPUs like shit P4s or first-gen Atoms.

And what's the problem with that? How many people do you see still benchmarking their new PCs and workstations with 3DMark '03? Benchmarks are not meant to last forever. They have to evolve along with the technologies and hardware in the market, or they die. Honestly, how useful is it to compare a P4 from 15 years ago to a modern 8+ core CPU these days? If you've stuck with a P4 for that long, you don't need an 8-core CPU.

Corsair 600T | Intel Core i7-4770K @ 4.5GHz | Samsung SSD Evo 970 1TB | MS Windows 10 | Samsung CF791 34" | 16GB 1600 MHz Kingston DDR3 HyperX | ASUS Formula VI | Corsair H110  Corsair AX1200i | ASUS Strix Vega 56 8GB Internet http://beta.speedtest.net/result/4365368180


I don't see how the benchmark is at fault for Intel's SpeedStep/boost being too slow and irregular. The fact that the damn thing can scale from 1 thread to 224 (and probably more) is damn impressive. Intel, get your shit together.

Watching Intel have competition is like watching a headless chicken trying to get out of a mine field

CPU: Intel I7 4790K@4.6 with NZXT X31 AIO; MOTHERBOARD: ASUS Z97 Maximus VII Ranger; RAM: 8 GB Kingston HyperX 1600 DDR3; GFX: ASUS R9 290 4GB; CASE: Lian Li v700wx; STORAGE: Corsair Force 3 120GB SSD; Samsung 850 500GB SSD; Various old Seagates; PSU: Corsair RM650; MONITOR: 2x 20" Dell IPS; KEYBOARD/MOUSE: Logitech K810/ MX Master; OS: Windows 10 Pro


Is there a benchmark that actually tests multitasking (real-world-like)?

I know bit-tech used to include multitasking in their reviews.


2 hours ago, The Benjamins said:

I've seen this before. The issue is that the test scores based on elapsed time, so the closer a run gets to 0 seconds, the more variance you will see.

To make it better, they need to make a new version that is harder to run, so it takes long enough to become consistent.

Yup. This problem has been a thing for years and has been brought up on HWBot several times. Most people are able to extrapolate where their CPUs should perform, though, so it's fairly easy to spot an invalid result. Plenty of other benches have broken elements that can be exploited as well. Some of those were patched, though. Using RAM disks on benchmarks that tested storage speeds was one way to inflate scores on benchmarks that tested one's entire system. I believe there were exploits with 3DMark as well, in which one could tweak specific settings in Nvidia Inspector/NVCP to obtain significantly higher scores than others (though I believe that was fixed as well).

 

Hard to consider this the "end of a benchmark"; it's just another thing people need to take into consideration before using these benchmark results as a be-all, end-all indicator of performance.

My (incomplete) memory overclocking guide: 

 

Does memory speed impact gaming performance? Click here to find out!

On 1/2/2017 at 9:32 PM, MageTank said:

Sometimes, we all need a little inspiration.

 

 

 


1 minute ago, MageTank said:

Yup. This problem has been a thing for years and has been brought up on HWBot several times. Most people are able to extrapolate where their CPUs should perform, though, so it's fairly easy to spot an invalid result. Plenty of other benches have broken elements that can be exploited as well. Some of those were patched, though. Using RAM disks on benchmarks that tested storage speeds was one way to inflate scores on benchmarks that tested one's entire system. I believe there were exploits with 3DMark as well, in which one could tweak specific settings in Nvidia Inspector/NVCP to obtain significantly higher scores than others (though I believe that was fixed as well).

Hard to consider this the "end of a benchmark"; it's just another thing people need to take into consideration before using these benchmark results as a be-all, end-all indicator of performance.

Ya, that YouTuber also showed that on dual Epyc 7601, Windows Server 2016 got a 15% better score than Windows Server 2012 R2, so with these high-core-count CPU setups we will need a revised benchmark tuned for higher core counts.


54 minutes ago, Terodius said:

And what's the problem with that? How many people do you see still benchmarking their new PCs and workstations with 3DMark '03? Benchmarks are not meant to last forever. They have to evolve along with the technologies and hardware in the market, or they die. Honestly, how useful is it to compare a P4 from 15 years ago to a modern 8+ core CPU these days? If you've stuck with a P4 for that long, you don't need an 8-core CPU.

It gives you a relative performance figure that can be easier to relate to. Like with my goddamn first-gen Atom scoring in the single digits compared to my i7 4790K hitting around 870 points. Is it really useful? No, not extremely, but for a test that has let you laugh at this kind of stuff for so long to need to move on makes me a bit sad, to be completely honest.

 

1 hour ago, M.Yurizaki said:

Then use another version, or have it run an easier test. Like how 3DMark lets you run tests meant for anything from mobile devices up to systems with a bazillion cores and an eye-gouging number of video cards.

This is probably what's going to happen here. It would be nice if you could directly compare the scores with the current test if they do, though, so you don't need to re-bench every CPU from 2010 onwards.


2 hours ago, corsairian said:

Hmm, this is interesting. I never noticed anything like this with Cinebench before. I'm curious about the "8K" Cinebench coming.

You ever used 112C/224T before? This is a situation in which Cinebench is probably used by at most 0.001% of users.


The OP, and some of the comments, concern me. :(

 

I'd like to see a benchmark that will run on EVERYTHING, and give meaningful results.

 

For example, maybe I might want to find out just exactly how much faster my as-yet-future Threadripper 4 or post-Tiger Lake (on DDR5 & PCIe 5.0) CPU is than my dad's first 286-10.

 

Or,

  • GTX 2080 Ti SLI vs ATI CGA Wonder,
  • 8x Epyc 7601 (or whatever is top SKU) vs Intel 4004, or
  • 39x Intel DC P4800X in Raid 0 vs a tape drive from the 1970s or so.
  • Also Galaxy Note 8 vs Pentium III + Riva TNT.

What would it take to be able to directly compare modern hardware with multi decade old hardware, and server vs mobile on different architectures?

 

Also, it shouldn't take agonizingly long on ancient/slow hardware. Maybe instead of measuring how long it takes to complete X task, it should measure how many times it completes X task (or how far it progresses through one pass) in Y time frame.
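That inversion (fix the time, count the work) is straightforward to sketch. Everything below is hypothetical; `small_workload` is just a stand-in for whatever the benchmark would actually compute:

```python
import time

def fixed_time_benchmark(task, duration_s=2.0):
    """Run `task` repeatedly for a fixed wall-clock window and
    return the number of completed iterations as the score."""
    deadline = time.perf_counter() + duration_s
    iterations = 0
    while time.perf_counter() < deadline:
        task()
        iterations += 1
    return iterations

def small_workload():
    # Stand-in work unit; cheap enough that even slow hardware
    # finishes at least one iteration inside the window.
    sum(i * i for i in range(10_000))

print(fixed_time_benchmark(small_workload, duration_s=0.5))
```

Fast hardware completes more iterations, slow hardware fewer, and no machine ever runs longer than the window, which addresses both ends of the spectrum at once.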


1 hour ago, RagnarokDel said:

You ever used 112C/224T before? This is a situation in which Cinebench is probably used by at most 0.001% of users.

No. But now I kind of want to see that in person. Lol


47 minutes ago, PianoPlayer88Key said:

The OP, and some of the comments, concern me. :(

 

I'd like to see a benchmark that will run on EVERYTHING, and give meaningful results.

 

For example, maybe I might want to find out just exactly how much faster my as-yet-future Threadripper 4 or post-Tiger Lake (on DDR5 & PCIe 5.0) CPU is than my dad's first 286-10.

 

Or,

  • GTX 2080 Ti SLI vs ATI CGA Wonder,
  • 8x Epyc 7601 (or whatever is top SKU) vs Intel 4004, or
  • 39x Intel DC P4800X in Raid 0 vs a tape drive from the 1970s or so.
  • Also Galaxy Note 8 vs Pentium III + Riva TNT.

What would it take to be able to directly compare modern hardware with multi decade old hardware, and server vs mobile on different architectures?

 

Also, it shouldn't take agonizingly long on ancient/slow hardware. Maybe instead of measuring how long it takes to complete X task, it should measure how many times it completes X task (or how far it progresses through one pass) in Y time frame.

Linpack....


16 hours ago, The Benjamins said:

I've seen this before. The issue is that the test scores based on elapsed time, so the closer a run gets to 0 seconds, the more variance you will see.

To make it better, they need to make a new version that is harder to run, so it takes long enough to become consistent.

Hmm, couldn't the Cinebench team simply add a setting where the benchmark runs X times in a row without stopping, then gets the final score by multiplying the total score by X?

 

This way, lowly Atoms can still run the benchmark at 1x, while 112-core machines can still get accurate scores by extending the benchmark, so core initialization and other factors won't affect the score too much.
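A sketch of that idea, with a hypothetical `fake_scene()` standing in for the real render: time N back-to-back passes in one stretch and report a per-pass score, so any one-time initialization cost is amortized across all N passes. The pass counts and score units here are invented:

```python
import time

def benchmark(render_scene, passes=1):
    """Time `passes` back-to-back runs of the scene and
    report an average per-pass score (arbitrary units)."""
    start = time.perf_counter()
    for _ in range(passes):
        render_scene()
    elapsed = time.perf_counter() - start
    return passes * 1000.0 / elapsed

def fake_scene():
    # Stand-in for rendering the test scene.
    sum(i * i for i in range(50_000))

low_end_score = benchmark(fake_scene, passes=1)    # an Atom might stop here
big_iron_score = benchmark(fake_scene, passes=10)  # a 224-thread box runs longer
```

The per-pass scores stay comparable across pass counts, which is the property that would let slow and fast machines share one results table.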


36 minutes ago, crystal6tak said:

Hmm, couldn't the Cinebench team simply add a setting where the benchmark runs X times in a row without stopping, then gets the final score by multiplying the total score by X?

This way, lowly Atoms can still run the benchmark at 1x, while 112-core machines can still get accurate scores by extending the benchmark, so core initialization and other factors won't affect the score too much.

It would be safer to make the model more complex.


18 hours ago, pas008 said:

is there a benchmark that actually tests multitasking(real world like)

I know bit tech use to include multi tasking in their reviews

I'm still curious about this.

The reason I ask is that the 8350 is better at multithreading than the 2500K, but these old review results have me curious about actual multitasking.

https://www.bit-tech.net/reviews/tech/amd-fx-8350-review/5/

http://techreport.com/review/23750/amd-fx-8350-processor-reviewed/9


This is a really obvious thing. It's like running Cinebench 2003 on a 5960X and saying it's broken... no lol, it's not broken, it just can't scale to a million cores forever.

Rig Specs:

AMD Threadripper 5990WX@4.8Ghz

Asus Zenith III Extreme

Asrock OC Formula 7970XTX Quadfire

G.Skill Ripheartout X OC 7000Mhz C28 DDR5 4X16GB  

Super Flower Power Leadex 2000W Psu's X2

Harrynowl's 775/771 OC and mod guide: http://linustechtips.com/main/topic/232325-lga775-core2duo-core2quad-overclocking-guide/ http://linustechtips.com/main/topic/365998-mod-lga771-to-lga775-cpu-modification-tutorial/

ProKoN haswell/DC OC guide: http://linustechtips.com/main/topic/41234-intel-haswell-4670k-4770k-overclocking-guide/

 

"desperate for just a bit more money to watercool, the titan x would be thankful" Carter -2016


I wanna see how well a quad 32-core Epyc system would do.

"You don't need headphones, all you need is willpower!" ~MicroCenter employee

 

How to use a WiiMote and Nunchuck as your mouse!


Specs:
Graphics Card: EVGA 750 Ti SC
PSU: Corsair CS450M
RAM: A-Data XPG V1.0 (1x8GB) (Red)
Procrastinator: Intel i5 4690k @ 4.4GHz 1.3V
Case: NZXT Source 210 Elite (Black)
Speakers and Headphones: Monitor Speakers and Philips SHP9500s
MoBo: MSI Z97 PC MATE
SSD: SanDisk Ultra II (240GB)
Monitor: LG 29UM68-P
Mouse: Mionix Naos 7000
Keyboard: Corsair K70 RGB (2016) (Browns)

Webcam/mic: Logitech C270
 


It's still pretty good for stress-testing your system; I suppose a new version will be released soon enough anyway.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*

