Hello,

I just learned that people are using GPUs to make Python code faster.

Can you do that with any code, or only specific kinds? I have a really slow for loop that does a few if checks and then appends the item to a list. It has over 30k items and takes up to half an hour to complete. If I could run that on a GPU it would be awesome; I have a few 980 Tis and a 1080 Ti that I would love to put to use.

Cheers!

Back-end developer, electronics "hacker"

Link to comment
https://linustechtips.com/topic/884760-python-use-gpu-for-slow-stuff/

I would look into the multiprocessing documentation; it will allow you to spread the workload across CPU cores.

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´

GPUs are really only good for some fairly specific types of tasks, and there is a steep learning curve behind what you are trying to do in general. On top of that, concurrent writes (such as many workers appending to one list) tend to cause problems, especially if ordering matters.

That seems like a mighty long execution time for so few objects. What exactly are you doing with them?
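
For a sense of scale, here is a rough timing sketch. The item layout (dicts with made-up `id` and `name` keys) and the checks are assumptions for illustration, not the original poster's code. A loop like this over 30k items should finish in a small fraction of a second, which suggests the half hour is being spent somewhere else.

```python
import time

# Hypothetical data: 30,000 small dicts, half of them with a "name" key.
items = [{"id": i, "name": f"item{i}"} if i % 2 == 0 else {"id": i}
         for i in range(30_000)]

def filter_items(items):
    """Plain for loop: a few if checks, then append to a list."""
    kept = []
    for item in items:
        if "name" in item and item["id"] >= 0 and item["name"]:
            kept.append(item)
    return kept

start = time.perf_counter()
kept = filter_items(items)
elapsed = time.perf_counter() - start

print(f"kept {len(kept)} of {len(items)} items in {elapsed:.4f} s")
```

If the real loop is much slower than this, timing its individual steps the same way (or running it under `cProfile`) would show where the time goes.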

ENCRYPTION IS NOT A CRIME

I'm checking whether they contain a key; if they do, I add them to a list and write it to a file.

EDIT:

And by adding them I mean I recreate them so that the key names come out correct.
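
A minimal sketch of what that loop might look like, assuming the objects are dicts. The key names (`id`, `old_name`, `new_name`) and the output file name `out.json` are made up for illustration. It builds the whole list first and writes the file once at the end, since writing inside the loop can be slow.

```python
import json
import os
import tempfile

# Hypothetical mapping from old key names to the corrected ones.
RENAMES = {"old_name": "new_name"}
REQUIRED_KEY = "old_name"  # assumed: only objects with this key are kept

def rebuild(obj):
    """Recreate the object with corrected key names."""
    return {RENAMES.get(key, key): value for key, value in obj.items()}

def select_and_rename(objects):
    """Keep only objects containing REQUIRED_KEY, rebuilt with new key names."""
    return [rebuild(obj) for obj in objects if REQUIRED_KEY in obj]

def write_results(results, path):
    """Write the full result list in one go."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f)

objects = [{"id": 1, "old_name": "a"}, {"id": 2}, {"id": 3, "old_name": "c"}]
results = select_and_rename(objects)

out_path = os.path.join(tempfile.gettempdir(), "out.json")
write_results(results, out_path)
print(results)
```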

Back-end developer, electronics "hacker"

I assume you have a list of things to check and a function you send them to?

If so, just map the function over the list. Sorry about the formatting and mistakes, I'm on the bus to work.

from multiprocessing import Pool

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def checking_function(item):
    # do the checks here and return the result
    return item

if __name__ == '__main__':
    # by default, one worker process per CPU core
    with Pool() as p:
        results = p.map(checking_function, my_list)

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´

Damn, that was way easier than everything I found about multiprocessing

Back-end developer, electronics "hacker"

Yeah, people make it out to be super hard, but it's quite simple in Python.

The downside is that each process has its own copy of the data, so it can be very RAM-intensive with large workloads.

Another option is to use green threads with a module called gevent.

Execution switches to a green thread when its function is ready to continue. This has some CPU overhead but is much lighter on RAM. Green threads all run in a single OS thread, so they mainly help when the work spends its time waiting, for example on I/O.

Example of green threads as a pool:

import gevent.monkey; gevent.monkey.patch_all()
import gevent
from gevent.pool import Pool
from random import randint

def check_things(number):
    # sleep randomly for 1-3 seconds to simulate "work"
    gevent.sleep(randint(1, 3))
    print('I am number %d' % number)

# list of numbers 0-49
my_list = list(range(50))

# create a pool of 50 green threads (greenlets)
# each one finishes as soon as its own sleep is over
p = Pool(50)
p.map(check_things, my_list)

As each piece of work completes, Python switches to that task, so the quicker ones get done first without needing to wait. I created one green thread for each number, but you don't need to. This version also assumes you don't need to return anything from the function. If you do, you can loop over the results:

 

for item in p.map(check_things, my_list):
    print(item)

You could also create the gevent green threads yourself, keep them in a list, and process the results once they have all completed.

 

Output from my example will be something like this.

I am number 9
I am number 2
I am number 0
I am number 7
I am number 4
I am number 5
I am number 1
I am number 3
I am number 6
I am number 8

 

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´

I've got 32GB of 3200MHz RAM, so I guess I'm fine on that front. But with "has a copy of the data", will they all start from number 1, or will each do its own part? I mean, I can't have it process the same list 12 times and produce the same output 12 times :P

Back-end developer, electronics "hacker"

So, unlike compiled languages (such as C/C++), Python is an interpreted language.

That means that when you run "python myscript.py", it starts up the Python interpreter.

The Python interpreter parses and executes your code at runtime.

With compiled languages, the code gets converted ahead of time (at compile time) to something that the computer can understand.

A disadvantage of the standard interpreter (CPython) is the global interpreter lock (GIL).

Even when running multiple threads, the interpreter can only execute Python code on one thread at a time.

So if your program is doing a lot of calculations, it won't get any speed-up from threads.

I/O (such as reading or writing a file) is executed by the operating system, in which case the interpreter can switch to another thread, resulting in a speed-up over sequential execution.
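
The I/O case can be sketched with the standard library's `concurrent.futures`. Here `time.sleep` stands in for a real I/O wait (an assumption for the demo), and the function name `fake_io` is made up. With four jobs of 0.2 s each, the threaded version should take roughly the time of one job rather than four.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(n):
    """Stand-in for an I/O call: sleeping releases the GIL while waiting."""
    time.sleep(0.2)
    return n

jobs = list(range(4))

# One job after another.
start = time.perf_counter()
sequential = [fake_io(n) for n in jobs]
sequential_time = time.perf_counter() - start

# The same jobs on four threads; the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fake_io, jobs))
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f} s, threaded: {threaded_time:.2f} s")
```

Replacing the sleep with a pure-Python calculation would not show the same gain, because of the GIL.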

 

If you want to speed up your compute-heavy workloads, you need the multiprocessing library.

It starts up multiple instances of the Python interpreter so they can work side by side.

The disadvantage of this approach is that the interpreters each run in their own process, which means they can't share memory (hence the need for duplication).

In my experience the biggest problem with this is that the data needs to be serializable.

All standard data objects in Python (numbers, strings, lists, dicts, instances of plain Python classes, whose attributes live in a dict) are serializable, but you might have problems passing instances of a class from a third-party library that was written in C/C++.
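
multiprocessing uses `pickle` to send arguments and results between processes, so a quick way to see whether an object will survive the trip is to try pickling it. `is_picklable` below is a hypothetical helper, not part of any library.

```python
import pickle
import threading

def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True

print(is_picklable({"id": 1, "tags": ["a", "b"]}))  # plain data
print(is_picklable(threading.Lock()))               # wraps an OS-level lock
```

Plain data like the dict passes, while objects that wrap operating-system resources, like the lock, do not.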

 

By the way: I don't think the map function actually duplicates the full array on each process. Instead it probably uses two queues (input and output). Each process takes an item from the input queue, executes the function and adds the result to the output queue.

This is also the kind of data flow that you would want to implement if you don't use the map/apply functions.
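
A small sketch that makes the division of work visible: each item is returned together with the ID of the process that handled it. The function names and the pool size of 4 are arbitrary choices for the demo. Each item should appear exactly once in the output, so the list is not processed once per worker.

```python
import os
from multiprocessing import Pool

def tag_with_pid(item):
    """Return the item together with the ID of the process that handled it."""
    return item, os.getpid()

def main():
    items = list(range(12))
    with Pool(processes=4) as pool:
        results = pool.map(tag_with_pid, items)
    # Results come back in input order, one entry per item.
    for item, pid in results:
        print(f"item {item} handled by process {pid}")
    return results

if __name__ == "__main__":
    main()
```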

 

 

Also, if you're interested in working on very large lists, make sure to read up on iterators.

Iterators generate values as they are needed (instead of creating one massive list of input data before the map function runs).
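
A minimal sketch of that idea using generators. The item layout and the function names `read_items` and `with_name` are made up for illustration. Even though a million items are requested, only the first few are ever created because the loop stops early.

```python
def read_items(count):
    """Yield hypothetical items one at a time instead of building a full list."""
    for i in range(count):
        yield {"id": i, "name": f"item{i}"} if i % 3 == 0 else {"id": i}

def with_name(items):
    """Lazily pass through only the items that have a "name" key."""
    return (item for item in items if "name" in item)

# Nothing is generated until the loop asks for the next value.
first_ids = []
for item in with_name(read_items(1_000_000)):
    first_ids.append(item["id"])
    if len(first_ids) == 3:
        break  # stop early; the remaining items are never created

print(first_ids)
```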

If you are only working on lists of numbers then you should look at NumPy (and other libraries that build on NumPy).

It's a widely used Python library for working on (multidimensional) arrays of numbers.

Because its core is written in C, it will be much (in my experience more than 50 times) faster than writing the same code in Python.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro
