
Why are these calculations so slow to be performed?

AlTech
Go to solution Solved by Nineshadow,
So I wanted to make a stress test, and I did. I made it calculate Pythagoras 1.1 million times.

And I set it up so that it counts how long it takes to calculate this.

Why is it taking a quad-core i5 110 seconds to finish?

This seems so slow.

Judge a product on its own merits AND the company that made it.

How to setup MSI Afterburner OSD | How to make your AMD Radeon GPU more efficient with Radeon Chill | (Probably) Why LMG Merch shipping to the EU is expensive

Oneplus 6 (Early 2023 to present) | HP Envy 15" x360 R7 5700U (Mid 2021 to present) | Steam Deck (Late 2022 to present)

 

Mid 2023 AlTech Desktop Refresh - AMD R7 5800X (Mid 2023), XFX Radeon RX 6700XT MBA (Mid 2021), MSI X370 Gaming Pro Carbon (Early 2018), 32GB DDR4-3200 (16GB x2) (Mid 2022

Noctua NH-D15 (Early 2021), Corsair MP510 1.92TB NVMe SSD (Mid 2020), beQuiet Pure Wings 2 140mm x2 & 120mm x1 (Mid 2023),

Just now, Doh007 said:

because it has to do it 1.1 million times

But a CPU should be able to do at least 800 million calculations per second if it runs at 800 MHz. This one runs at 3.2 GHz, so theoretically it should be able to do this in less than a second.

Just now, AluminiumTech said:

But a CPU should be able to do at least 800 million calculations per second if it runs at 800 MHz. This one runs at 3.2 GHz, so theoretically it should be able to do this in less than a second.

Actually, it can theoretically do more than that, since it can execute several instructions per clock. But assuming that what you're talking about is a^2 + b^2 = c^2, it has to do several operations for each "Pythagoras".

4k the dream

Just now, Doh007 said:

Actually, it can theoretically do more than that, since it can execute several instructions per clock. But assuming that what you're talking about is a^2 + b^2 = c^2, it has to do several operations for each "Pythagoras".

If I use random numbers each time, will it affect the speed?

Just now, AluminiumTech said:

If I use random numbers each time, will it affect the speed?

Each Pythagoras run-through is more than one calculation, and choosing a random number adds even more.

 | CPU: AMD FX 8350 + H100i | GPU: AMD R9 290X + NZXT Kraken | RAM: HyperX Beast 2033 16GB | PSU: EVGA G2 | MOBO: ASRock 970M |

| CASE: Corsair Carbide 88R |STORAGE: 1x WD Black | KEYBOARD: Corsair K70 | MOUSE: R.A.T 9 |

SOMETIMES LOSING THE BATTLE, MEANS YOU CAN WIN THE WAR

 

8 minutes ago, AluminiumTech said:

But a CPU should be able to do at least 800 million calculations per second if it runs at 800 MHz. This one runs at 3.2 GHz, so theoretically it should be able to do this in less than a second.

Calculating Pythagoras' theorem even once takes more than one calculation.

And yes, choosing random numbers should slow it down even more.

Project White Lightning (My ITX Gaming PC): Core i5-4690K | CRYORIG H5 Ultimate | ASUS Maximus VII Impact | HyperX Savage 2x8GB DDR3 | Samsung 850 EVO 250GB | WD Black 1TB | Sapphire RX 480 8GB NITRO+ OC | Phanteks Enthoo EVOLV ITX | Corsair AX760 | LG 29UM67 | CM Storm Quickfire Ultimate | Logitech G502 Proteus Spectrum | HyperX Cloud II | Logitech Z333

Benchmark Results: 3DMark Firestrike: 10,528 | SteamVR VR Ready (avg. quality 7.1) | VRMark 7,004 (VR Ready)

 

Other systems I've built:

Core i3-6100 | CM Hyper 212 EVO | MSI H110M ECO | Corsair Vengeance LPX 1x8GB DDR4  | ADATA SP550 120GB | Seagate 500GB | EVGA ACX 2.0 GTX 1050 Ti | Fractal Design Core 1500 | Corsair CX450M

Core i5-4590 | Intel Stock Cooler | Gigabyte GA-H97N-WIFI | HyperX Savage 2x4GB DDR3 | Seagate 500GB | Intel Integrated HD Graphics | Fractal Design Arc Mini R2 | be quiet! Pure Power L8 350W

 

I am not a professional. I am not an expert. I am just a smartass. Don't try and blame me if you break something when acting upon my advice.

7 minutes ago, AluminiumTech said:

If I use random numbers each time, will it affect the speed?

to some degree, but also consider that it might not be able to fully utilize all 4 cores of your CPU.

2 minutes ago, Doh007 said:

to some degree, but also consider that it might not be able to fully utilize all 4 cores of your CPU.

I know it's only using 1 core xD.

Are you by any chance printing the results each iteration?

Because something like this :

#include <chrono>
#include <iostream>
#include <random>

float a, b, c;

int main()
{
	// Mersenne Twister seeded from the system's random device
	std::random_device rd;
	std::mt19937 mt(rd());
	std::uniform_real_distribution<double> dist(0.0, 1999999973.0);

	// Time only the loop; steady_clock is monotonic, so it suits interval measurements
	std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
	for (int i = 0; i < 1100000; ++i)
	{
		// Two fresh random inputs per iteration, so the RNG cost is part of the measurement
		a = dist(mt), b = dist(mt);
		c = a*a + b*b;
	}
	std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
	std::cout << "Time : " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << std::endl;
	std::cin.get(); // wait for Enter so the console window stays open
	return 0;
}

That takes around 750000 microseconds for me. But generating random numbers each time isn't exactly a great way of doing it, since the results can vary a lot.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

1 minute ago, Nineshadow said:

Are you by any chance printing the results each iteration?

I'm doing it in C#, and I'm taking the square root at the end, just like Pythagoras is done in real life.

10 minutes ago, AluminiumTech said:

I'm doing it in C#, And I'm square rooting it at the end just like Pythagoras is done in real life.........

Whoops, silly mistake.

Anyway, I've made a dataset of randomly generated real numbers, uniformly distributed between 0 and 1999999973, and it takes around 60000 microseconds for the calculations to be done.

#include <chrono>
#include <cmath>
#include <fstream>
#include <iostream>

std::ifstream in("database.in");
float a[1100000], b[1100000], c;

int main()
{
	if (!in)
	{
		std::cerr << "Could not open database.in\n";
		return 1;
	}

	// Read the pre-generated pairs first so file I/O stays out of the timed section
	std::cout << "Loading dataset...\n";
	for (int i = 0; i < 1100000; ++i)
	{
		in >> a[i] >> b[i];
	}
	std::cout << "Done loading dataset\n";

	// Time only the calculations
	std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
	for (int i = 0; i < 1100000; ++i)
	{
		c = std::sqrt(a[i]*a[i] + b[i]*b[i]);
	}
	std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
	std::cout << "Time for calculations : " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << std::endl;
	std::cin.get(); // wait for Enter so the console window stays open
	return 0;
}

 

2 minutes ago, Nineshadow said:

Whops, silly mistake.

Anyway, I've made a dataset of randomly generated real numbers in a uniform distribution between 0 and 1999999973 and it takes around 60000 microseconds for the calculations to be done.


 

You're using C++ though......

1 minute ago, AluminiumTech said:

You're using C++ though......

It shouldn't make much of a difference tbh.

Are you by any chance printing the numbers in each iteration of the loop? That would explain why it takes you so long.

Jeez, another rookie mistake. I was compiling my program in Debug mode.

 

The Release build takes around 600 nanoseconds.

3 minutes ago, Nineshadow said:

It shouldn't make much of a difference tbh.

Are you by any chance printing the numbers in each iteration of the loop? That would explain why it takes you so long.

That was embarrassing. After I fixed it, it took 3 seconds to do 2 million calculations.

4 minutes ago, AluminiumTech said:

............................

 

That was embarrassing................. After I fixed it, it took 3 seconds to do 2 million calculations....

Told you.

;)

 

6 hours ago, Nineshadow said:

Jeez, another rookie mistake. I was compiling my program in Debug mode.

 

Release version takes around ~600 nanoseconds.

600 ns is way too short (1800 cycles at 3 GHz) for 1.1 million square roots.

Seems like something else is going on:

In Release mode the whole calculation gets optimized away because the result (variable c) is discarded.

Make an additional array "c" and store the results there; that should hopefully stop the compiler from removing the loop.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro

2 hours ago, mathijs727 said:

Make an additional array "c" and store the results there, that should hopefully get rid of the compiler optimization.

Or just make the result variable volatile.

2 hours ago, mathijs727 said:

 

600ns is way too short (1800 cycles at 3GHz).

 

Seems like something else is going on:

In release mode the whole calculation will get optimized away because the result (variable c) is discarded.

Make an additional array "c" and store the results there, that should hopefully get rid of the compiler optimization.

I was thinking of something along those lines but didn't go further with it since I had to go into town.

But yup, that's the reason.

It's around 3000 microseconds now.

sqrt(a[i]*a[i] + b[i]*b[i]); might be too complex for the compiler to vectorize. Try performing a*a + b*b in its own loop and storing the results in another array before taking the square root.

1 hour ago, Unimportant said:

sqrt(a[i]*a[i] + b[i]*b[i]);

 might be too complex for the compiler to vectorize. Try performing 

a*a + b*b in their own loop and store the results to another array before taking the square root. 

I was kinda wondering if that could happen and I tested it out. And as I expected, nope, it's actually slower. Compilers are really smart these days.

Replace that sqrt  with one of the alternative square root functions that gives you enough precision for your needs but is much faster than the default one : http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

 

You should try to use at least Intel Threading Building Blocks to parallelize your code to run on all cores: https://software.intel.com/en-us/node/506045

 

 

 

 

3 hours ago, mariushm said:

Replace that sqrt  with one of the alternative square root functions that gives you enough precision for your needs but is much faster than the default one : http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

 

You should try to use at least Intel Threading Building Blocks to parallelize your code to run on all cores: https://software.intel.com/en-us/node/506045

 

 

 

 

Why use Intel Threading Building Blocks?

The easiest way to parallelize this code is to use OpenMP.

That requires you to add the line "#pragma omp parallel for" before the for loop.

That's it; if you now compile it with the OpenMP flag, it is magically parallel.

 

NOTE: only do this if the result variable c is an array, otherwise every thread writes to the same variable.

Intel TBB was the first thing that came to mind. I guess OpenMP could be just as good for this particular case or you could just code your own thread pool with a bunch of threads that basically do only that square root and return the result.

Both have issues, it's enough to search for openmp vs tbb and you'll find plenty of answers.
