Python Web Crawler

noobhawkia · October 30, 2016

I made a simple web crawler to download images from a website, but I don't know how to keep the default filename of each image that is downloaded.

I want to keep the image's default filename. Is there any way to do that?

I also have a doubt: what happens if two downloaded files have the same filename? Does Python replace the existing file, or does it rename the file being downloaded?
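
A minimal sketch of one possible approach to both questions, assuming each image URL ends in the filename the site uses. `filename_from_url` and `unique_path` are hypothetical helper names made up for illustration, not part of any library. On the second question: `urllib.request.urlretrieve` writes to the path it is given and replaces an existing file of the same name without warning, so avoiding collisions is up to the caller.

```python
import os
import posixpath
from urllib.parse import urlsplit, unquote


def filename_from_url(url):
    """Return the last path segment of the URL (hypothetical helper)."""
    path = urlsplit(url).path  # the path only, without query string or fragment
    return unquote(posixpath.basename(path))  # decode escapes such as %20


def unique_path(filename, directory="."):
    """Add _1, _2, ... before the extension until the name is unused (hypothetical helper)."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return os.path.join(directory, candidate)
```

In the crawler, `fullname` could then be set to `unique_path(filename_from_url(href))` instead of a random number.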

This is the program I made. It downloads each image from the website and gives it a random name.

[code]
import random
import urllib.request

import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    """Download every direct-link image from the first max_pages listing pages."""
    for page in range(1, max_pages + 1):
        url = 'http://konachan.net/post?page=' + str(page)
        response = requests.get(url)
        # Name the parser explicitly; otherwise bs4 picks one itself and warns.
        soup = BeautifulSoup(response.text, 'html.parser')

        # Both link classes are handled the same way, so one loop covers them.
        for css_class in ('directlink largeimg', 'directlink smallimg'):
            for link in soup.find_all('a', {'class': css_class}):
                href = link.get('href')
                print(href)
                # Random name between 1 and 1000, always saved as .jpg.
                fullname = str(random.randrange(1, 1001)) + '.jpg'
                urllib.request.urlretrieve(href, fullname)


trade_spider(2)