
python remove, not working

Solved by fizzlesticks

So I am creating a proxy scraper and I am trying to remove the proxies that aren't working :/

 

All the proxies are scraped and stored in a list called proxies. I then try to open a web page (in this case facebook.com) using each proxy, and I want the proxy removed if it doesn't work.

 

The proxy list is run through Pool.map for speed.

 

from multiprocessing import Pool, cpu_count

import requests

session = requests.Session()


def proxyCheck(p):
    global proxies
    try:
        session.get('https://facebook.com', proxies={'https': p}, timeout=1)
    except Exception:
        print("%s removed" % p)
        proxies.remove(p)


print("removing down proxies (%d)" % len(proxies))
p = Pool(cpu_count())
p.map(proxyCheck, proxies[:])  # [:] to make a copy, I read you can't remove as you're iterating
print("using %d" % len(proxies))

But the length stays the same and nothing is removed. I've tried appending to a new list and that doesn't work either.

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´


When you use multiprocessing.Pool, each process gets its own copy of the data. You're altering that copy, not the original in the main process.

From your proxyCheck function you can return a bool saying whether or not that proxy worked, then at the end use that list of bools to filter the proxy list.


13 hours ago, fizzlesticks said:

When you use multiprocessing.Pool, each process gets its own copy of the data. You're altering that copy, not the original in the main process.

From your proxyCheck function you can return a bool saying whether or not that proxy worked, then at the end use that list of bools to filter the proxy list.

So how come I could append the working ones to a new list? My workaround was to save to a file, clear the list, then load and delete the file.


1 hour ago, vorticalbox said:

So how come I could append the working ones to a new list?

Pool.map returns a list of all the return values from the function.

 

So for example, to remove all the odd numbers in a list you can do:

from multiprocessing import Pool
from itertools import compress


def f(s):
    return s % 2 == 0

if __name__ == '__main__':
    r = [1,2,3,4,5,6]
    print(r)
    with Pool(5) as p:
        ret = p.map(f, r)
    print(ret)
    r = list(compress(r, ret))
    print(r)

 


45 minutes ago, fizzlesticks said:

r = list(compress(r, ret))

Could you explain this line? Also, I never thought of using with with a pool ^_^


5 minutes ago, vorticalbox said:

Could you explain this line? Also, I never thought of using with with a pool ^_^

compress takes two lists: the first is the one you want to remove items from, and the second is a list of flags. If an element in the second list evaluates to true, the corresponding element in the first list is kept; if it's false, it is removed. It returns a generator, like most such functions in Python 3, so you need to convert it to a list to actually get the values.

 

Pools use resources; after you're done with them they need to be closed by calling pool.close(). Using with is the best way to make sure that gets done.
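For a concrete illustration of that filtering behaviour, here is a minimal standalone example of compress by itself:

```python
from itertools import compress

values = ['a', 'b', 'c', 'd']
flags = [True, False, True, False]

# Elements of values are kept wherever the matching flag is truthy.
kept = list(compress(values, flags))
print(kept)  # ['a', 'c']
```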


23 minutes ago, fizzlesticks said:

compress takes two lists: the first is the one you want to remove items from, and the second is a list of flags. If an element in the second list evaluates to true, the corresponding element in the first list is kept; if it's false, it is removed. It returns a generator, like most such functions in Python 3, so you need to convert it to a list to actually get the values.

 

Pools use resources; after you're done with them they need to be closed by calling pool.close(). Using with is the best way to make sure that gets done.

 

So it keeps all the items that match the second list?


6 hours ago, vorticalbox said:

So it keeps all the items that match the second list?

It keeps the items that have a true value in the second list. Essentially it looks something like

def compress(values, flags):
    new_list = []
    for i in range(len(values)):
        if flags[i]:
            new_list.append(values[i])
    return new_list

 


@fizzlesticks would there be a way to access the length of ret in the mapped function? I'm thinking I could use its length as a progress indicator.


9 hours ago, vorticalbox said:

@fizzlesticks would there be a way to access the length of ret in the mapped function? I'm thinking I could use its length as a progress indicator.

Sure is. Using tqdm for the progress bar would be like this

 

from multiprocessing import Pool
from itertools import compress
import tqdm

def f(s):
    return s % 2 == 0

if __name__ == '__main__':
    r = list(range(100000))
    print(r)
    with Pool(5) as p:
        ret = []
        for i in tqdm.tqdm(p.map(f, r), total=len(r)):
            ret.append(i)
    print(ret)
    r = list(compress(r, ret))
    print(r)

If you don't want to use tqdm, just get rid of those and loop over the p.map directly, then update your progress somehow based on the length of ret.


1 hour ago, fizzlesticks said:

Sure is. Using tqdm for the progress bar would be like this

 


from multiprocessing import Pool
from itertools import compress
import tqdm

def f(s):
    return s % 2 == 0

if __name__ == '__main__':
    r = list(range(100000))
    print(r)
    with Pool(5) as p:
        ret = []
        for i in tqdm.tqdm(p.map(f, r), total=len(r)):
            ret.append(i)
    print(ret)
    r = list(compress(r, ret))
    print(r)

If you don't want to use tqdm, just get rid of those and loop over the p.map directly, then update your progress somehow based on the length of ret.


for i in p.map(f,r), total=len(r):

   print('{0}/{1}'.format(i,total)

 

???


3 minutes ago, vorticalbox said:

 


for i in p.map(f,r), total=len(r):

   print('{0}/{1}'.format(i,total)

 

 

???

total=len(r) is part of the tqdm call. The 'i' in the loop is the return value from one of the function calls, not a counter. You'll need to add 'i' to some list then get the length of that list.

 

And remember that printing is horribly slow, so doing it like that is going to significantly slow down the program.


4 minutes ago, fizzlesticks said:

total=len(r) is part of the tqdm call. The 'i' in the loop is the return value from one of the function calls, not a counter. You'll need to add 'i' to some list then get the length of that list.

 

And remember that printing is horribly slow, so doing it like that is going to significantly slow down the program.

I will look into tqdm.


14 hours ago, fizzlesticks said:

Sure is. Using tqdm for the progress bar would be like this

 


from multiprocessing import Pool
from itertools import compress
import tqdm

def f(s):
    return s % 2 == 0

if __name__ == '__main__':
    r = list(range(100000))
    print(r)
    with Pool(5) as p:
        ret = []
        for i in tqdm.tqdm(p.map(f, r), total=len(r)):
            ret.append(i)
    print(ret)
    r = list(compress(r, ret))
    print(r)

If you don't want to use tqdm, just get rid of those and loop over the p.map directly, then update your progress somehow based on the length of ret.

Just tried this and it doesn't show a bar until 100%, any ideas?


6 minutes ago, vorticalbox said:

Just tried this and it doesn't show a bar until 100%, any ideas?

Try running it in a command line if you're not already. And increase the size of the 'r' list to make sure it's actually running long enough to see anything happen.


Added sleep(1) and it is showing a bar now :) guess it's just going too fast.


@fizzlesticks even if I use 1 thread, without sleep() there is no bar.


13 minutes ago, vorticalbox said:

@fizzlesticks even if I use 1 thread, without sleep() there is no bar.

In the tqdm call try setting the parameter mininterval=0

If that doesn't help I'm not sure what's wrong.


12 minutes ago, fizzlesticks said:

In the tqdm call try setting the parameter mininterval=0

If that doesn't help I'm not sure what's wrong.

I think the list is just too small :( it has ~600 items and doesn't show unless I slow it down. It's working for the big list though :)


6 hours ago, fizzlesticks said:

In the tqdm call try setting the parameter mininterval=0

If that doesn't help I'm not sure what's wrong.

I fixed it by doing:

 

with Pool(threads) as p:
    ret = []
    for i in tqdm.tqdm(p.imap_unordered(f, r), total=len(r)):
        ret.append(i)

 


1 hour ago, vorticalbox said:

I fixed it by doing:

 


with Pool(threads) as p:
    ret = []
    for i in tqdm.tqdm(p.imap_unordered(f, r), total=len(r)):
        ret.append(i)

 

That fixes the problem because imap_unordered is probably running a lot slower than normal map, so I'm not sure I'd call it a real solution. I tried a bunch of stuff and just can't reproduce the problem you're having, so I have no idea what a real solution would be.

 

Edit: Also if you're doing what I suggested with returning true/false depending on whether the proxy works or not, using unordered won't work because the values in the return list won't match up with the correct elements in the proxy list.


2 hours ago, fizzlesticks said:

That fixes the problem because imap_unordered is probably running a lot slower than normal map, so I'm not sure I'd call it a real solution. I tried a bunch of stuff and just can't reproduce the problem you're having, so I have no idea what a real solution would be.

 

Edit: Also if you're doing what I suggested with returning true/false depending on whether the proxy works or not, using unordered won't work because the values in the return list won't match up with the correct elements in the proxy list.

oh :( didn't think of that lol

 

Edit: When I'm on the PC I'll take a video; basically it shows nothing for a while then jumps to 100%.

