
I've made a shitty open-source (Python) multithreaded CPU benchmark based on thermodynamic simulations, wanna try it?

I hope this is the right place to post this

 

DO NOT RUN ANY CODE WRITTEN BY OTHERS IF YOU CAN'T READ CODE AND VERIFY IT DOES EXACTLY WHAT IT SAYS IT DOES

 

tl;dr: here's the link. My 12650H scored ~7000 and a Xeon server from 2012 scored ~3200; try to beat it and post your results down below.

 

When you run this code it will completely saturate all the cores of your CPU, as it does 128 simultaneous simulations (on a 16-thread CPU that works out to 8 simulations run sequentially on each thread, at least that's what it should do). The score is just 100000 / time in seconds. My laptop takes ~15 seconds to complete a run. I hope you guys have fun. Numba and NumPy are the Python packages needed to run this.
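For a rough idea of the shape of such a harness (a minimal sketch only, not the actual repo code; run_one_simulation below is a made-up stand-in for the real kernel in main.py): a Numba prange loop over the independent simulations, with the score taken from wall-clock time.

import time
import numpy as np
from numba import njit, prange

@njit
def run_one_simulation(steps, seed):
    # Made-up stand-in for one thermodynamic (KMC-style) simulation.
    rng_state = seed
    acc = 0.0
    for _ in range(steps):
        rng_state = (1103515245 * rng_state + 12345) % (2**31)
        acc += rng_state / 2**31
    return acc

@njit(parallel=True)
def run_all(n_sims, steps):
    results = np.empty(n_sims)
    # prange hands the 128 independent simulations to Numba's thread pool;
    # on a 16-thread CPU each thread ends up running 8 of them in sequence.
    for i in prange(n_sims):
        results[i] = run_one_simulation(steps, i + 1)
    return results

if __name__ == "__main__":
    start = time.perf_counter()
    run_all(128, 2_000_000)
    elapsed = time.perf_counter() - start
    print("Score:", 100000 / elapsed)  # score = 100000 / runtime in seconds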

 

I'm a theoretical physicist, and as such I usually use kinetic Monte Carlo (KMC) for simulations. I got wasted one day and wrote a "whole ass engine" for it in Python, using Numba (a pretty good JIT compilation tool), so I could run those simulations through one consistent engine. This isn't the best code for KMC, but it works well enough, and it's good to have something like this so I can quickly troubleshoot the problems of my students who try KMC.

 

A couple of days ago I decided to have fun with it and check whether my new laptop CPU would run a WSME simulation faster than an old server I still have access to. It ran ~2.5 times faster. That's where this dumb idea came from.


For those of us who don't have Python installed...how does one install/run this code?

NOTE: I no longer frequent this site. If you really need help, PM/DM me and my e-mail will alert me.


2 minutes ago, Radium_Angel said:

For those of us who don't have Python installed...how does one install/run this code?

Start by installing python.

Download the files.
Run them.


Very interesting! Have a star.

 

My Ryzen 7 4800h scored 7971.

 

20 minutes ago, Radium_Angel said:

For those of us who don't have Python installed...how does one install/run this code?

First install Python and download the benchmark code from the GitHub link. Then navigate to the downloaded folder, open a terminal/command prompt, type "python main.py", and press Enter. If it comes up with a ModuleNotFoundError, type "pip install <module name>".

Computer engineering grad student, cybersecurity researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 16GB DDR4 3200MHz C16

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16


11 minutes ago, dcgreen2k said:

Very interesting! Have a star.

 

My Ryzen 7 4800h scored 7971.

 

First install Python and download the benchmark code from the GitHub link. Then navigate to the downloaded folder, open a terminal/command prompt, type "python main.py", and press Enter. If it comes up with a ModuleNotFoundError, type "pip install <module name>".

Nice, you beat my score. Glad you enjoyed it. I assume AMD in general should do better than Intel cuz all the cores are "performance" cores.

 

To add to the "how to run" part: you also need to install pip if it's not already installed, and add pip and python to the PATH environment variable.


I'm waiting for someone with a Threadripper to come and get a 5-digit score. That'll make my week.


How did you handle threads for the big/little cores on Intel 12th gen and newer? E-cores don't hit their peak with the same instructions that P-cores do. I've been playing with all my 12th and 13th gen chips, and saturating the cores optimally isn't easy. Single- and double-precision work completes much faster on E-cores, but in a tight loop that's too big, the P-cores win. I've had some tests where a 12700K's 4 E-cores dominated a Ryzen 5600X's 6 full (P) cores.

 

In short, what kind of workload does the test run?


16 hours ago, dcgreen2k said:

My Ryzen 7 4800h scored 7971.

5900x – 8155, I would've expected a bigger difference…

Remember to either quote or @mention others, so they are notified of your reply


4 hours ago, Franck said:

How did you handle threads for the big/little cores on Intel 12th gen and newer? E-cores don't hit their peak with the same instructions that P-cores do. I've been playing with all my 12th and 13th gen chips, and saturating the cores optimally isn't easy. Single- and double-precision work completes much faster on E-cores, but in a tight loop that's too big, the P-cores win. I've had some tests where a 12700K's 4 E-cores dominated a Ryzen 5600X's 6 full (P) cores.

 

In short, what kind of workload does the test run?

See, that's why I called it shitty. With Numba I don't really have control over how it manages the threads (afaik). I don't think I'm remotely competent enough to do it all myself rather than just letting Numba handle it.

 

The way I think this should work (cuz that's what I think I wrote :D) is that it just takes all the threads and runs one simulation per thread, without differentiating between P and E cores. It doesn't (at least I think it doesn't) do one type of calculation on an E core and another type on a P core. Although I think that would be slower anyway, cuz it would require each core to use the same data, whereas this way each core has its own data to work with and doesn't share it with anyone.
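To illustrate the pattern being described (a sketch with made-up sizes and a dummy update rule, not the repo's kernel): each prange iteration allocates its own working arrays, and Numba's scheduler just spreads iterations over whatever OS threads exist, with no P-core/E-core awareness.

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def run_batch(n_sims, n_sites, n_steps):
    results = np.empty(n_sims)
    for i in prange(n_sims):
        # Each simulation owns its state array, so threads never read or
        # write shared data; only the final scalar goes into results.
        state = np.zeros(n_sites)
        total = 0.0
        for _ in range(n_steps):
            j = np.random.randint(0, n_sites)
            state[j] = 1.0 - state[j]
            total += state[j]
        results[i] = total
    return results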


19 minutes ago, Eigenvektor said:

5900x – 8155, I would've expected a bigger difference…

Same, honestly 😕 I remember a Ryzen 5800 outperformed the Xeon server ~3 times. But that was a way bigger load (2 hours on the 5800).


4 hours ago, Var003 said:

Same, honestly 😕 I remember a Ryzen 5800 outperformed the Xeon server ~3 times. But that was a way bigger load (2 hours on the 5800).

With 50% more cores (vs the 4800H), a ~2% performance increase doesn't seem remotely realistic for a multi-core load (clock speed and generational advantages aside). Looking at System Monitor, it doesn't seem to distribute the load evenly. One core seems to be hit much sooner, and for longer, than the others.



17 minutes ago, Eigenvektor said:

With 50% more cores (vs the 4800H), a ~2% performance increase doesn't seem remotely realistic for a multi-core load (clock speed and generational advantages aside). Looking at System Monitor, it doesn't seem to distribute the load evenly. One core seems to be hit much sooner, and for longer, than the others.

I think I know the reason for that. I did some digging a bit earlier.

 

So ofc the normal use case of this code takes 30 minutes if not more, and most of that is simulation. When I wrote it I therefore didn't account for anything else taking any amount of time, cuz those couple of seconds don't make a difference. But when I turned it into this "benchmark" I drastically lowered the load so that it takes ~15 sec. I now realize that, because of that, the Numba "compilation" of the code (which is not parallel afaik) takes a significant fraction of the runtime and affects the score a lot.

 

I'll test this in ~7 hours, and if that's the case I'll fix it and post it here. Didn't think of this before cuz usually there's no need to compile a benchmark in the middle of running it (if you write it in something that isn't Python, ofc).

 

I made this just for some lulz, but I feel like maintaining and fixing it as long as it's still fun to mess around with. I still hope to pull this off the easy way using Python.

 

Thanks everyone for the feedback


5 hours ago, Eigenvektor said:

5900x – 8155, I would've expected a bigger difference…

And I would have expected the 5900x to score higher than my 3600.

Ryzen 3600 - Score:  8272


Alright, so I tried to fix the "compiling" issue by compiling everything before starting the run (painfully obvious now that I've done it). The issue was that compilation in Numba happens during the run and took ~4 seconds, so doubling the load would only change the runtime from 9 seconds to 14 seconds.
It seems resolved: doubling the load now multiplies the score by 0.508769, which seems accurate enough.
I have pushed this to GitHub. Thanks @Eigenvektor for the helpful feedback. But this still doesn't fully explain the 3600 getting a higher score than the 5900X, cuz Nayr438 did their run before my push. This might be some weird interaction between the drivers and Numba 😕 I don't know honestly; if someone can chime in and help, that'd be appreciated.

The scores should change a bit now, and it will take a couple of seconds longer to run. My 12650H gets 7900-8100 now. Try to beat it again? 😄
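Roughly what that kind of fix looks like (a sketch only, assuming a jitted run_all kernel like the one sketched earlier, not the repo's exact code): do one throwaway call with a trivial load so Numba compiles everything, then time only the real run.

import time

def benchmark(run_all, n_sims=192, steps=2_000_000):
    # Throwaway warm-up call: triggers Numba's JIT compilation so compile
    # time no longer counts against the score.
    run_all(n_sims, 1)

    start = time.perf_counter()
    run_all(n_sims, steps)          # the actual timed load
    elapsed = time.perf_counter() - start
    return 100000 / elapsed         # score = 100000 / runtime in seconds

Because Numba caches compiled machine code per argument-type signature, the warm-up call and the timed call hit the same compiled function as long as the argument types match.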


My Ryzen 7 4800h scores 10980 with the new code.

 




6 hours ago, Var003 said:

I have pushed this to GitHub.

New Run after commit.

Ryzen 3600 - Score:  10692

 

6 hours ago, Var003 said:

But this still doesn't fully explain the 3600 getting a higher score than the 5900X, cuz Nayr438 did their run before my push. This might be some weird interaction between the drivers and Numba 😕 I don't know honestly; if someone can chime in and help, that'd be appreciated.

Could be that I am on Linux and Eigenvektor is possibly on Windows, idk. Even on Linux, benchmarks tend to vary between distributions. I still would have expected the 5900X in the initial run to score higher regardless, however.


2 hours ago, Nayr438 said:

New Run after commit.

Ryzen 3600 - Score:  10692

 

Could be that I am on Linux and Eigenvektor is possibly on Windows, idk. Even on Linux, benchmarks tend to vary between distributions. I still would have expected the 5900X in the initial run to score higher regardless, however.

Oh, it's definitely Linux. My 2-core 11th-gen i3 Arch Linux PC scores as high as my 12650 😄


1 hour ago, Var003 said:

Oh, it's definitely Linux. My 2-core 11th-gen i3 Arch Linux PC scores as high as my 12650 😄

4 hours ago, Nayr438 said:

Could be that I am on Linux and Eigenvektor is possibly on Windows, idk. Even on Linux, benchmarks tend to vary between distributions. I still would have expected the 5900X in the initial run to score higher regardless, however.

No, I'm on Manjaro. With the new version I'm now at 8761 🤔

 

My understanding of Python is very rudimentary. Could you explain a bit how work is created and distributed across cores? Is this a fixed amount of work, or does it scale up with number of cores somehow? My suspicion is that something scales up with number of cores, and that initial overhead reduces the score on more cores, since you're measuring overall time.



36 minutes ago, Eigenvektor said:

No, I'm on Manjaro. With the new version I'm now at 8761 🤔

 

My understanding of Python is very rudimentary. Could you explain a bit how work is created and distributed across cores? Is this a fixed amount of work, or does it scale up with number of cores somehow? My suspicion is that something scales up with number of cores, and that initial overhead reduces the score on more cores, since you're measuring overall time.

No, it's a fixed load: 192 simulations spread across all threads. I changed the code so it runs a very light simulation first to JIT-compile everything, and then starts the timer with the second simulation (the actual load). There shouldn't be any initial overhead.

 

I don't understand where this issue comes from. I'll ask my friend with a 5900X to run it on his system and see what he gets.


Interesting project. I got 9353 on the first run and around 9100 on subsequent runs, but it varies quite a lot (±100 points) on my 6700K @ 4.4 GHz with a browser in the background. This is on Windows 10, and I'm sure it will be at least slightly faster on Linux, since the Python interpreter compiles bytecode and launches faster there, and Numba might warm up quicker.

 

On 3/28/2023 at 7:04 PM, Var003 said:

See, that's why I called it shitty. With Numba I don't really have control over how it manages the threads (afaik). I don't think I'm remotely competent enough to do it all myself rather than just letting Numba handle it.

 

The way I think this should work (cuz that's what I think I wrote :D) is that it just takes all the threads and runs one simulation per thread, without differentiating between P and E cores. It doesn't (at least I think it doesn't) do one type of calculation on an E core and another type on a P core. Although I think that would be slower anyway, cuz it would require each core to use the same data, whereas this way each core has its own data to work with and doesn't share it with anyone.

Actually, that's why I'm not a fan of how both Intel's big-little and AMD's chiplet (especially X3D) architectures work, at least right now. It's widely known that software developers can't possibly optimize for every piece of hardware out there, let alone future hardware. It's the job of the compiler, the OS, or the underlying architecture itself to decide the most efficient way of executing individual instructions, but we've seen how that goes when OS-specific schedulers and Windows' Thread Director screw up. Both AMD and Intel systems took up to a 30% performance hit on earlier Windows 11 versions, for example. I'm not a CPU architecture engineer after all, just dumping my random rant.

 

Anyway, I don't have Linux natively installed right now as I use VMs for pretty much everything but I would like to compare the score to my current Windows system when I can.


12 minutes ago, alexitx said:

It's the job of the compiler, the OS, or the underlying architecture itself to decide the most efficient way of executing individual instructions, but we've seen how that goes when OS-specific schedulers and Windows' Thread Director screw up. Both AMD and Intel systems took up to a 30% performance hit on earlier Windows 11 versions, for example. I'm not a CPU architecture engineer after all, just dumping my random rant.

That is correct. This is too low-level for most userland application developers to have the expertise, or the willingness, to bother with. You'd need some intricate knowledge of the CPU's instructions and the hardware, and, if you're using Python for example, you'd probably need to write the performance-critical part of the code in low-level assembly and compile it to a .dll/.so shared library so your Python script can load it at runtime. This is all assuming whatever OS-level scheduler or whatnot doesn't interfere or already do a better job at optimizing than you possibly could.
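For example, loading such a hand-compiled routine from Python could look roughly like this (purely illustrative: "libkernel.so" and "run_kernel" are made-up names standing in for whatever you actually compiled).

import ctypes
import numpy as np

# Hypothetical names: "libkernel.so" / "run_kernel" stand in for a
# hand-optimized native routine built as a shared library
# (.so on Linux, .dll on Windows).
lib = ctypes.CDLL("./libkernel.so")
lib.run_kernel.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.run_kernel.restype = ctypes.c_double

data = np.random.rand(1_000_000)                            # Python-side data
ptr = data.ctypes.data_as(ctypes.POINTER(ctypes.c_double))  # raw pointer for C
result = lib.run_kernel(ptr, data.size)                     # call into native code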

Sudo make me a sandwich 


18 minutes ago, alexitx said:

Interesting project. I got 9353 on the first run and around 9100 on subsequent runs, but it varies quite a lot (±100 points) on my 6700K @ 4.4 GHz with a browser in the background. This is on Windows 10, and I'm sure it will be at least slightly faster on Linux, since the Python interpreter compiles bytecode and launches faster there, and Numba might warm up quicker.

 

Actually, that's why I'm not a fan of how both Intel's big-little and AMD's chiplet (especially X3D) architectures work, at least right now. It's widely known that software developers can't possibly optimize for every piece of hardware out there, let alone future hardware. It's the job of the compiler, the OS, or the underlying architecture itself to decide the most efficient way of executing individual instructions, but we've seen how that goes when OS-specific schedulers and Windows' Thread Director screw up. Both AMD and Intel systems took up to a 30% performance hit on earlier Windows 11 versions, for example. I'm not a CPU architecture engineer after all, just dumping my random rant.

 

Anyway, I don't have Linux natively installed right now as I use VMs for pretty much everything but I would like to compare the score to my current Windows system when I can.

Maybe it's the Python versions? What version are you on?


5 minutes ago, Var003 said:

Maybe it's the Python versions? What version are you on?

Python 3.10.9, since Numba doesn't have a wheel for 3.11 yet.


49 minutes ago, alexitx said:

Anyway, I don't have Linux natively installed right now as I use VMs for pretty much everything but I would like to compare the score to my current Windows system when I can.

Just tested on a VirtualBox VM with Ubuntu Server 22.04.2 (kernel 5.15.0-69-generic), 4C/4T, 4 GiB RAM, and Python 3.10.6, and got 7958 as the highest score of 5 runs. Don't know if it's Python or the kernel making the difference, but wow.


Python 3.10.10 / Windows 11 22H2
numba 0.56.4

QQLS CPU 8 Cores 8 Threads @ 1.8 GHz
Score:  2660.4603568220514

QQLS CPU 8 Cores 8 Threads @ 4.4 GHz
Score:  7089.680305130976

 

