Why are these calculations so slow to perform?

1 minute ago, AluminiumTech said:

You're using C++ though......

It shouldn't make much of a difference tbh.

Are you by any chance printing the numbers in each iteration of the loop? That would explain why it takes you so long.

On 25/08/2016 at 0:29 PM, AluminiumTech said:

............................

 

That was embarrassing... After I fixed it, it took 3 seconds to do 2 million calculations.

 
 
 

Just did it in Python and it took about 3 seconds on my 8350 with 1 thread, but 10 threads took about 30.


D:\vbox\Documents\Projects\Threads>python threads.py
starting thread 0 at 2016-08-27 19:35:22.034958
took 2.960297 seconds

D:\vbox\Documents\Projects\Threads>python threads.py
starting thread 0 at 2016-08-27 19:36:13.347054
starting thread 1 at 2016-08-27 19:36:13.353051
starting thread 2 at 2016-08-27 19:36:13.363071
starting thread 3 at 2016-08-27 19:36:13.410558
starting thread 4 at 2016-08-27 19:36:13.510572
starting thread 5 at 2016-08-27 19:36:13.552088
starting thread 6 at 2016-08-27 19:36:13.647092
starting thread 7 at 2016-08-27 19:36:13.673596
starting thread 8 at 2016-08-27 19:36:13.700095
starting thread 9 at 2016-08-27 19:36:13.783609
took 28.743302 seconds

 
import math
from datetime import datetime
from threading import Thread

numThreads = 10

def myfunc(i, numThreads):
    # each thread does ~2 million sqrt calculations and times itself
    startTime = datetime.now()
    #print("starting thread {0} at {1}".format(i, startTime))
    for x in range(1, 2000000):
        c = math.sqrt((x ** 2) + (x ** 2))
    # ints are passed by value, so a shared running total wouldn't
    # accumulate across threads; report this thread's elapsed time instead
    elapsed = (datetime.now() - startTime).total_seconds()
    if i == numThreads - 1:
        print("took {0} seconds".format(elapsed))

for i in range(numThreads):
    t = Thread(target=myfunc, args=(i, numThreads))
    t.start()
    if i == numThreads - 1:
        print("{} threads started".format(numThreads))

 


26 minutes ago, vorticalbox said:

Just did it in Python and it took about 3 seconds on my 8350 with 1 thread, but 10 threads took about 30.

Makes sense since you're doing 2,000,000 per thread. 10x the work, 10x the time.


 

29 minutes ago, fizzlesticks said:

Makes sense since you're doing 2,000,000 per thread. 10x the work, 10x the time.

Yeah, thought it would scale a little better, but hey. Weirdly, it hardly increases CPU usage at all.


Just now, vorticalbox said:

 

Yeah, thought it would scale a little better, but hey. Weirdly, it hardly increases CPU usage at all.

Python doesn't have the same kind of threading as other languages. Due to the GIL, only one thread can execute Python code at a time, so the most you'll get out of a Python program is one core's worth of CPU usage. By adding more threads you're actually slowing the program down; it would be faster to do all 20,000,000 calculations in a single thread and avoid the extra work of switching between threads.
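
A quick way to see this (just a sketch, not from anyone's post here; the function name, the 8,000,000 iteration count, and the thread counts are made up for illustration) is to split the same total amount of CPU-bound work across 1 and then 4 threads. On CPython the 4-thread run is no faster, and usually a bit slower:

import math
import time
from threading import Thread

def work(n):
    # CPU-bound loop; it holds the GIL for essentially its whole runtime
    for x in range(n):
        math.sqrt(x * x + x * x)

total = 8000000

for threads in (1, 4):
    start = time.time()
    pool = [Thread(target=work, args=(total // threads,)) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    print("{} thread(s): {:.2f}s".format(threads, time.time() - start))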


2 minutes ago, fizzlesticks said:

Python doesn't have the same kind of threading as other languages. Due to the GIL, only one thread can execute Python code at a time, so the most you'll get out of a Python program is one core's worth of CPU usage. By adding more threads you're actually slowing the program down; it would be faster to do all 20,000,000 calculations in a single thread and avoid the extra work of switching between threads.

Yeah, I will look into multiprocessing

https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing

and see if I can get it to scale better.


35 minutes ago, vorticalbox said:

Yeah, I will look into multiprocessing

https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing

and see if I can get it to scale better.

That will certainly scale much better up to your thread count, but it's probably not worth the work when things like numpy exist.

 

import time
import numpy as np

size = 20000000
start = time.perf_counter()

# element-wise over all 20 million values: square, add, square root
xs = np.square(np.random.rand(size))
ys = np.square(np.random.rand(size))
zs = np.sqrt(np.add(xs, ys))

print(time.perf_counter() - start)

20,000,000 calculations takes 0.5 seconds for me (numpy runs the loop in compiled C over whole arrays instead of interpreting Python bytecode per element).


I got bored, opened Notepad, and wrote some quick PHP code to do this multithreaded (latest PHP 7 plus the pthreads extension installed and enabled).

 

<?php
ini_set('memory_limit','4096M'); // only needed if you want to store the results in some array somewhere

$threads = 8; // at least 1 required
$chunks = 2; // how many chunks should each thread do
$chunksize = 10000000; // how many sqrt operations each thread should do per chunk

class WorkerThreads extends Thread
{
    private $workerId;
    private $chunks;
    private $chunksize;

    public function __construct($id, $chunk_cnt, $size)
    {
        $this->workerId = $id;
        $this->chunks = $chunk_cnt;
        $this->chunksize = $size;
    }

    public function run()
    {
        echo "Worker {$this->workerId} running " . PHP_EOL;
        for ($i = 1; $i <= $this->chunks; $i++) {
            $time_s = microtime(true);
            for ($j = 0; $j < $this->chunksize; $j++) {
                $a = mt_rand();
                $b = mt_rand();
                $d = sqrt($a * $a + $b * $b);
            }
            $time_e = microtime(true);
            $time_f = $time_e - $time_s;
            echo "Worker {$this->workerId} worked chunk $i in $time_f seconds" . PHP_EOL;
        }
    }
}

$time_s = microtime(true);
 
// Worker pool
$workers = [];
 
// Initialize and start the threads
foreach (range(1, $threads) as $i) {
    $workers[$i] = new WorkerThreads($i,$chunks,$chunksize);
    $workers[$i]->start();
}
 
// Let the threads come back
foreach (range(1, $threads) as $i) {
    $workers[$i]->join();
}

$time_e = microtime(true);
$time_f = $time_e-$time_s;

echo "Finished processing ". number_format($threads*$chunks*$chunksize)." square roots on $threads threads in $time_f seconds.".PHP_EOL;

 

The code above gets all my FX-8320 cores going and outputs something like this (it's 10 million per chunk; I wanted it to take a long time so I could see the CPU usage actually go up on all cores):

 

 

d:\Programs\php>php test.php
Worker 1 running
Worker 2 running
Worker 3 running
Worker 4 running
Worker 5 running
Worker 6 running
Worker 7 running
Worker 8 running
Worker 1 worked chunk 1 in 5.2623009681702 seconds
Worker 6 worked chunk 1 in 5.1532950401306 seconds
Worker 2 worked chunk 1 in 5.2923021316528 seconds
Worker 4 worked chunk 1 in 5.3003029823303 seconds
Worker 8 worked chunk 1 in 5.2202990055084 seconds
Worker 3 worked chunk 1 in 5.4963138103485 seconds
Worker 7 worked chunk 1 in 5.5463171005249 seconds
Worker 5 worked chunk 1 in 6.1193499565125 seconds
Worker 6 worked chunk 2 in 5.1422939300537 seconds
Worker 4 worked chunk 2 in 5.160295009613 seconds
Worker 8 worked chunk 2 in 5.0782899856567 seconds
Worker 2 worked chunk 2 in 5.2553009986877 seconds
Worker 3 worked chunk 2 in 5.3773081302643 seconds
Worker 7 worked chunk 2 in 5.3383049964905 seconds
Worker 1 worked chunk 2 in 5.9453399181366 seconds
Worker 5 worked chunk 2 in 5.1042909622192 seconds
Finished processing 160,000,000 square roots on 8 threads in 11.27564406395 seconds.

 

 

 


50 million calculations on floating-point numbers in roughly 70k microseconds. Without OpenMP it takes roughly 160k microseconds.

#include <iostream>
#include <random>
#include <chrono>
#include <cmath>
const int n = 50'000'000;
float a[n], b[n], c[n];
int main()
{
	std::random_device rd;
	std::mt19937 mt(rd());
	std::uniform_real_distribution<float> dist(0.0, 1999999973.0);
	std::cout << "Generating dataset...\n";
	for (int i = 0; i < n; ++i)
	{
		a[i] = dist(mt);
		b[i] = dist(mt);
	}
	std::cout << "Done generating dataset.\n";
	std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
	// split the loop across cores when OpenMP is enabled
	#pragma omp parallel for
	for (int i = 0; i < n; ++i)
	{
		c[i] = std::sqrt(a[i] * a[i] + b[i] * b[i]);
	}
	std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
	std::cout << "Executed " << n << " P. theorems in " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << " microseconds.\n";
	std::cin.get();
	return 0;
}
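
One build note in case anyone tries this: the #pragma omp line is silently ignored unless OpenMP is enabled at compile time, for example with -fopenmp on GCC/Clang or /openmp on MSVC. Without the flag the loop just runs single-threaded, which matches the ~160k-microsecond figure above.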

 


10 hours ago, fizzlesticks said:

That will certainly scale much better up to your thread count, but it's probably not worth the work when things like numpy exist.

 


import time
import numpy as np

size = 20000000
start = time.perf_counter()

# element-wise over all 20 million values: square, add, square root
xs = np.square(np.random.rand(size))
ys = np.square(np.random.rand(size))
zs = np.sqrt(np.add(xs, ys))

print(time.perf_counter() - start)

20,000,000 calculations takes 0.5 seconds for me (numpy runs the loop in compiled C over whole arrays instead of interpreting Python bytecode per element).

I'm new to Python, so I will have a look at numpy; I hadn't heard of it before. I was actually using this topic as a way to learn threading in Python.


@fizzlesticks having a look at your code, doesn't that square 2 random numbers from the 20M rather than calculating all 20M values?

 

Edit: 

 

New and improved code that uses threads better than before. Due to memory issues I could only make a list of 10 million numbers, so I added units to loop over the 10 million.

from multiprocessing import Pool
from datetime import datetime
import math

# set to cpu cores
threads = 8
# 10 million numbers
calculations = 10 ** 7
# times to loop over the number above
units = 1

def f(x):
    for i in range(units):
        c = math.sqrt((x ** 2) + (x ** 2))

if __name__ == '__main__':
    print("creating list")
    myList = list(range(calculations))
    print("List created")
    print("Starting {} calculations".format(calculations * units))
    for i in range(1, threads + 1):
        print("Started using {} thread(s)".format(i))
        startTime = datetime.now()
        p = Pool(i)
        p.map(f, myList)
        # shut this pool down before starting the next, larger one
        p.close()
        p.join()
        print("Took {} with {} thread(s)".format((datetime.now() - startTime).total_seconds(), i))
    print("end")

Output

 

creating list
List created
Starting 10000000 calculations
Started using 1 thread(s)
Took 25.403118 with 1 thread(s)
Started using 2 thread(s)
Took 13.44821 with 2 thread(s)
Started using 3 thread(s)
Took 9.273681 with 3 thread(s)
Started using 4 thread(s)
Took 7.438988 with 4 thread(s)
Started using 5 thread(s)
Took 6.19076 with 5 thread(s)
Started using 6 thread(s)
Took 5.531793 with 6 thread(s)
Started using 7 thread(s)
Took 5.097657 with 7 thread(s)
Started using 8 thread(s)
Took 4.846125 with 8 thread(s)
end

As you can see, it scales very well with more threads :P
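
Worth noting (a sketch, not something I've benchmarked): Pool actually starts worker processes, not threads, which is why it sidesteps the GIL. And since each call to f does so little work, a fair chunk of the runtime goes to pickling list items back and forth between processes. Pool.map takes an optional chunksize argument that batches items per worker, so something like this (10000 is an arbitrary illustrative value, not a tuned one) may cut that overhead:

# batch items sent to each worker process instead of handing them out one at a time
p.map(f, myList, 10000)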


21 hours ago, Nineshadow said:

 


const int n = 50'000'000;

 

What's up with the '000'? That shouldn't compile. If it does on your compiler, the value might not be fifty million. Or is it a typo?


11 hours ago, vorticalbox said:

@fizzlesticks having a look at your code, doesn't that square 2 random numbers from the 20M rather than calculating all 20M values?

Nope, passing an array to most numpy functions will do the operation element-wise on each item in the array. For example, here's the code with some print statements to show what's going on.

 

Code:


import numpy as np

xs = np.array(list(range(11)))
ys = np.array(list(reversed(range(11))))

print(xs)
print(ys)
print()

xs = np.square(xs)
ys = np.square(ys)

print(xs)
print(ys)
print()

zs = np.add(xs, ys)

print(zs)
print()

zs = np.sqrt(zs)

print(zs)

 

Output:


[ 0  1  2  3  4  5  6  7  8  9 10]
[10  9  8  7  6  5  4  3  2  1  0]

[  0   1   4   9  16  25  36  49  64  81 100]
[100  81  64  49  36  25  16   9   4   1   0]

[100  82  68  58  52  50  52  58  68  82 100]

[ 10.           9.05538514   8.24621125   7.61577311   7.21110255
   7.07106781   7.21110255   7.61577311   8.24621125   9.05538514  10.        ]

 

 


6 minutes ago, Unimportant said:

What's up with the '000'? That shouldn't compile. If it does on your compiler, the value might not be fifty million. Or is it a typo?

It's new in C++14: you can use ' to separate groups of digits, which is more readable than trying to count how many zeros there are. It's like how we use a comma or period in the real world, but those already mean something in C++ (the comma operator and the decimal point), so they chose '.


59 minutes ago, Unimportant said:

What's up with the '000'? That shouldn't compile. If it does on your compiler, the value might not be fifty million. Or is it a typo?

Digit separators are a new feature in C++14, as @fizzlesticks mentioned.

auto integer_literal = 1'000'000;
auto floating_point_literal = 0.000'015'3;
auto binary_literal = 0b0100'1100'0110;

 


22 hours ago, Nineshadow said:

Digit separators are a new feature in C++14, as @fizzlesticks mentioned.


auto integer_literal = 1'000'000;
auto floating_point_literal = 0.000'015'3;
auto binary_literal = 0b0100'1100'0110;

 

That is so awesome, I hope Python copies it :x


  • 2 weeks later...
On 8/25/2016 at 7:05 AM, Nineshadow said:

Are you by any chance printing the results each iteration?

Because something like this:


#include <iostream>
#include <fstream>
#include <random>
#include <chrono>
float a, b, c;
int main()
{
	std::random_device rd;
	std::mt19937 mt(rd());
	std::uniform_real_distribution<double> dist(0.0, 1999999973.0);
	std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
	for (int i = 0; i < 1100000; ++i)
	{
		a = dist(mt), b = dist(mt);
		c = a*a + b*b;
	}
	std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
	std::cout << "Time : " << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count() << std::endl;
	std::cin.get();
    return 0;
}

Takes me around ~750000 microseconds. But going with random numbers each time isn't exactly a great way of doing it, since results can vary a lot.

omg, use a namespace

