Python Web Crawler

noobhawkia

I made a simple web crawler to download images from a website, but I don't know how to keep the default filename of each image that is downloaded.

I want every image saved under its original filename. Is there any way to do that?

I also have a question: what happens if two downloaded files have the same filename? Does Python replace the existing file, or does it rename the file that is being downloaded?

 

This is the program I made. It downloads each image from the website and gives it a random name.

[code]
import random
import urllib.request

import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    """Download the large and small images linked from pages 1..max_pages."""
    for page in range(1, max_pages + 1):
        url = 'http://konachan.net/post?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        # Name the parser explicitly so bs4 does not have to guess one.
        soup = BeautifulSoup(plain_text, 'html.parser')

        # Both link classes are handled the same way, so loop over them.
        for css_class in ('directlink largeimg', 'directlink smallimg'):
            for link in soup.find_all('a', {'class': css_class}):
                href = link.get('href')
                print(href)
                # Random name from 1 to 1000, always saved with a .jpg extension.
                name = random.randrange(1, 1001)
                fullname = str(name) + ".jpg"
                urllib.request.urlretrieve(href, fullname)


trade_spider(2)

[/code]
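
For reference, here is a minimal sketch of one way both questions could be handled. It assumes the original filename is simply the last path segment of the image URL. The helper names `filename_from_url` and `unique_path` are made up for illustration and are not part of any library. On the second question: `urllib.request.urlretrieve` writes to whatever path it is given, so an existing file with the same name is replaced without warning. The second helper avoids that by picking an unused name first.

```python
import os
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of url (hypothetical helper)."""
    path = urlsplit(url).path  # path only, without query string or fragment
    name = os.path.basename(unquote(path))  # decode %20 and similar escapes
    return name or default  # fall back when the URL ends in '/'


def unique_path(directory, filename):
    """Return a path inside directory that does not exist yet (hypothetical helper)."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    # Append _1, _2, ... before the extension until the name is free.
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate
```

In the crawler above, the random name could then be swapped for something like `urllib.request.urlretrieve(href, unique_path(".", filename_from_url(href)))`. This is only one possible approach; it keeps the extension from the URL instead of forcing `.jpg`.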
