Website Crawler?

Solved by N0ps32.

I want to crawl this website and, whenever it is updated, download the latest build:

http://yanderegame.com/latest/

The build should go in a folder with the other builds, whether that is on my own FTP server, on Google Drive, or anywhere else you know how to do this. So if you know FTP but not Google Drive, just tell me anyway; I want to know.

Does anyone know how to do this?

OFF TOPIC: I suggest every poll from now on to have "**CK EA" option instead of "Other"

https://linustechtips.com/topic/499978-website-crawler/
There is nothing on the website to indicate that there is a new build, unless you went by file size, which wouldn't really work, so you won't be able to do it.

I mean, could I make it download every time I press a button, or on a schedule?

Like every 15 days?

Set the download link as a bookmark? It's: http://163.172.13.17/

I know, but how do I download that link every 15 days, into Google Drive or somewhere else?

You could create a program that runs on startup. It reads a number from a text file. If the number is 15, it opens the download link and downloads the file; otherwise it increments the number in the text file by one.

What programming language?
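
Any language that can read and write a text file would do. As a minimal sketch in Python of the counter-file idea above: the file name `run_count.txt` and the function name `tick` are hypothetical, and the reset to zero after a download is an assumption, since the suggestion does not say what happens once 15 is reached.

```python
from pathlib import Path

COUNTER_FILE = Path("run_count.txt")  # hypothetical file name
THRESHOLD = 15                        # "if the number is 15" from the post


def tick(counter_file: Path = COUNTER_FILE, threshold: int = THRESHOLD) -> bool:
    """Advance the stored counter by one; return True when a download is due."""
    try:
        count = int(counter_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        count = 0  # first run, or an unreadable file: start from zero

    count += 1
    if count >= threshold:
        counter_file.write_text("0")  # assumption: reset so the cycle repeats
        return True

    counter_file.write_text(str(count))
    return False
```

A startup script would call `tick()` once and start the download only when it returns `True`. Note that this counts program starts, not days, so it only approximates "every 15 days" if the machine boots about once a day.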

I know that crontab on my Raspberry Pi has a time slot, so it can start a download when a given time is reached.

The file has the date in the name.

Even more accurate then!

How do I make it check every day whether the name has changed?

if ( http://yanderegame.com/latest/ = changed )
then ( save http://163.172.13.17 to folder "yansimbuilds" )

OR

compare ( http://yanderegame.com/latest/ = latestwebsitedownload.html )
then ( do nothing )
else ( download http://163.172.13.17/ then update latestwebsitedownload.html )

Does anyone know how to make this code work?

I don't care what language, as long as it works.
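
The second variant of the pseudocode above (compare the page with a saved copy, download only when they differ) could be sketched in Python roughly as follows. The two URLs, the snapshot name `latestwebsitedownload.html` and the folder `yansimbuilds` come from the post; the function names, the timestamped file name and the `.zip` extension are assumptions made for illustration.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable

PAGE_URL = "http://yanderegame.com/latest/"
BUILD_URL = "http://163.172.13.17/"
SNAPSHOT = Path("latestwebsitedownload.html")
BUILD_DIR = Path("yansimbuilds")


def fetch(url: str) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def check_and_download(
    fetch_fn: Callable[[str], bytes] = fetch,
    snapshot: Path = SNAPSHOT,
    build_dir: Path = BUILD_DIR,
) -> bool:
    """Return True if the page changed and a new build was saved."""
    page = fetch_fn(PAGE_URL)
    if snapshot.exists() and snapshot.read_bytes() == page:
        return False  # page is identical to the saved copy: do nothing

    build_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: a timestamp keeps earlier builds side by side.
    target = build_dir / f"build-{time.strftime('%Y%m%d-%H%M%S')}.zip"
    target.write_bytes(fetch_fn(BUILD_URL))
    snapshot.write_bytes(page)  # remember this page for the next comparison
    return True
```

Two caveats: if the page contains anything that changes on every request (a counter, a timestamp), the comparison will always report a change, and this sketch holds the whole build in memory, which may be too much for a large file on a Raspberry Pi.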

 

@Brenz

@BlueDragon

@looney

@Whiskers

@Blade of Grass

@prolemur

(tagging people who might know about this type of stuff)

I would write something that downloads the file in question at a set interval, say every day at 12:00 AM. On the first run, it downloads the file and writes its MD5 checksum to a log. After 24 hours, it downloads the file again and checksums it to see whether it is the same. If the checksum matches, it discards the newly downloaded file. If it does not match, it moves the file to the desired directory and then logs the new checksum value.

 

You could also have it download every file twice before check-summing, just to make sure the first one wasn't a corrupted download.
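
The checksum comparison described above might look something like this in Python. It is only a sketch: the function names, the log layout (one checksum per line) and the checksum prefix added to the stored file name are assumptions, and the download itself is left out.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks to save memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def keep_if_new(downloaded: Path, log: Path, dest_dir: Path) -> bool:
    """Keep `downloaded` only if its checksum differs from the last one logged."""
    checksum = md5_of(downloaded)
    logged = log.read_text().split() if log.exists() else []
    if logged and logged[-1] == checksum:
        downloaded.unlink()  # same as the previous build: discard it
        return False

    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming: prefix with part of the checksum so builds do not collide.
    target = dest_dir / f"{checksum[:8]}-{downloaded.name}"
    shutil.move(str(downloaded), str(target))
    with log.open("a") as handle:
        handle.write(checksum + "\n")  # log the new checksum value
    return True
```

Reading the file in 1 MiB chunks keeps memory use low, which matters more on a small board than the cost of the MD5 computation itself.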

Wouldn't that kill the computer faster? I am going to be running this on a Raspberry Pi, remember...

But I will definitely take whatever I can get, so if you could program that, I would be very thankful!

It doesn't have to run on an RPi, but I would be happier if it could.

Here, I wrote you a script that can do it:

https://dl.tutorial.technology/yandere.zip

Install Node.js on your Pi.

"cd" into the script folder and run "npm install".

When you run the script, it will crawl the website, and if a new build is found, it will download it into a directory named "builds".

You can run the script with "node App.js".

Now just create a cron job to run the script every X hours and you are done.
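
For readers who want to see the general idea without opening the zip, here is a rough Python sketch of "crawl the page, download only builds that are not in the folder yet". It is not N0ps32's script (which is Node.js and not reproduced here); it assumes the build is linked from the page as a `.zip` file and relies on the earlier remark that the file name contains the date. The names `LinkCollector` and `new_builds` are made up for this example.

```python
import posixpath
from html.parser import HTMLParser
from pathlib import Path
from typing import List
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def new_builds(html: str, base_url: str, build_dir: Path,
               suffix: str = ".zip") -> List[str]:
    """Return absolute URLs of linked files that are not in `build_dir` yet."""
    parser = LinkCollector()
    parser.feed(html)
    fresh = []
    for href in parser.links:
        url = urljoin(base_url, href)  # resolve relative links
        name = posixpath.basename(urlparse(url).path)
        if name.endswith(suffix) and not (build_dir / name).exists():
            fresh.append(url)
    return fresh
```

Because a dated file name changes with every release, checking whether that name already exists in the folder is enough to tell a new build from an old one, without downloading anything first.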

Shoot!

crontab for some odd reason doesn't work!

):

Thanks though, I greatly appreciate the help!

If you have the time, could you find a way to implement this without crontab?

I don't mind if you can't, but it would be so nice of you if you could!

:)
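
One way to do this without crontab is a small wrapper that stays running and starts the script at a fixed interval. The Python sketch below assumes it is started from the script folder so that "node App.js" works as described above; the 6-hour interval and the function names are placeholders, since the thread leaves "every X hours" open.

```python
import subprocess
import time
from typing import Callable, Optional

INTERVAL_HOURS = 6  # placeholder for "every X hours"


def run_script() -> None:
    """Run the downloader once; assumes the current folder is the script folder."""
    subprocess.run(["node", "App.js"], check=False)


def run_forever(
    job: Callable[[], None] = run_script,
    interval_seconds: float = INTERVAL_HOURS * 3600,
    sleep: Callable[[float], None] = time.sleep,
    max_runs: Optional[int] = None,
) -> int:
    """Call `job` repeatedly, pausing between runs; return the number of runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as error:  # keep looping even if one run fails
            print(f"run failed: {error}")
        runs += 1
        sleep(interval_seconds)
    return runs
```

Calling `run_forever()` with no arguments loops until the process is stopped; `max_runs` exists only so the loop can be tried out briefly. Unlike cron, this wrapper does not come back by itself after a reboot, so it has to be started again by hand or by some startup mechanism.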

There is no way that crontab doesn't work; it's a standard part of Linux systems.

How did you set up your crontab file? What does it look like?

It throws me an error.

I'm not at home right now, so...

But yes, it says something about a mailing system.

I can't remember right now.

BTW, Raspberry Pis run a ported version of Linux called Raspbian.

It's a little different.
