
Download and sort a large number of files with wget

Hulkstern

So this might be an odd question, but I was wondering if anyone knows of any way to download all the server.jar files listed here (just the stable releases category) into folders for each version. I have a bit of anxiety over things disappearing or being lost, and I want to start a collection of all the older server versions of Minecraft so I can be sure I can spin one up if need be. Maintaining the collection wouldn't be too hard, but the initial download seems like it would benefit from some kind of automation, whether a bash script or some other method. I had initially thought of using wget with a wildcard to grab all the versions and just delete the ones I don't want, but I found out none of the jar files are named uniquely (they're all server.jar). Any tips?



I wrote a small PHP script for you and attached the output in the zip.

Save the HTML contents of that page into the same folder as the extract.php script, as "input.html".

Edit the $download_folder variable first if you want to (it's set to %%FOLDER%% by default, so you can easily do a search and replace in any text editor; use / instead of \ on Windows).

Run the script, e.g. php.exe extract.php on Windows.

It will create, in the same folder:

list.csv (all downloads on the page, for clients and servers, in a spreadsheet)

folders.bat (mkdir commands to create the subfolders in your download folder)

download.bat (rename the extension on Linux) with one wget command per file, e.g. wget "url" -O "download folder/version/filename.jar"
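On Linux, the %%FOLDER%% search and replace can also be done with sed instead of a text editor. A minimal sketch, assuming the placeholder is exactly %%FOLDER%% as described above (the path and file name here are just examples):

```shell
# Swap the %%FOLDER%% placeholder for a real path; using | as the sed
# delimiter means the slashes in the path don't need escaping.
echo 'wget "url" -O "%%FOLDER%%/1.15.1/server.jar"' \
  | sed 's|%%FOLDER%%|/home/user/minecraft-servers|g'
# → wget "url" -O "/home/user/minecraft-servers/1.15.1/server.jar"
```

On the real files you'd run it in place, e.g. sed -i 's|%%FOLDER%%|/home/user/minecraft-servers|g' folders.sh download.sh.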

 

script.zip


It looks like you can match each server entry with this:

curl https://mcversions.net/ | grep "minecraft_server-"

You can then iterate through the results with a for loop (make sure to set IFS=$'\n' so it iterates line by line).

#!/bin/bash
IFS=$'\n'
DOWNLOADS=$(curl https://mcversions.net/ | grep "minecraft_server-")
for i in $DOWNLOADS; do
    # Pull the download URL out of the href="..." attribute
    href=$(echo "$i" | grep -o -P 'href=".*?"')
    url=${href:6:-1}

    # Pull the version number out of the download="..." attribute
    download=$(echo "$i" | grep -o -P 'download=".*?"')
    version=${download:27:-5}

    mkdir -p "$version"
    curl "$url" -o "$version/server.jar"
done

That said, this uses curl, not wget, though most environments with bash probably have curl, too.

 

Edit: if you somehow have wget but not curl, you can use this instead:

#!/bin/bash
IFS=$'\n'
DOWNLOADS=$(wget -O - https://mcversions.net/ | grep "minecraft_server-")
for i in $DOWNLOADS; do
    # Pull the download URL out of the href="..." attribute
    href=$(echo "$i" | grep -o -P 'href=".*?"')
    url=${href:6:-1}

    # Pull the version number out of the download="..." attribute
    download=$(echo "$i" | grep -o -P 'download=".*?"')
    version=${download:27:-5}

    mkdir -p "$version"
    wget "$url" -O "$version/server.jar"
done

 


On 1/12/2020 at 11:47 PM, mariushm said:

I wrote a small php script for you and also attached the output in the zip …

 

On 1/12/2020 at 11:58 PM, Kavawuvi said:

It looks like you can match with this to find each server entry: …

 

Thank you both for the replies! You'll have to forgive the delay, as life got a little busy.

I'll try both and let you know how it goes (=

 

Oh, and @mariushm, I looked at the files you sent with your script, and from what I can tell you already have it set up so that all I would have to do is run the folders script and then the download script. Is that right, or should I go ahead and run the extract script with the input.html first?



You can run the script again to update the batch files (run it whenever new downloads are added to the page).

For the current set of downloads, you can use the bundled .bat files and just do a search and replace to change %%FOLDER%% to an actual path.

You can add a parameter to the wget command to skip existing files (so it doesn't re-download files you already have).
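wget does have a -nc (--no-clobber) flag for this, but it doesn't combine cleanly with -O (which these commands use to force the server.jar name), so a plain existence check in the shell is the safer pattern. A sketch, with placeholder VERSION and URL values:

```shell
VERSION="1.15.1"                        # placeholder version
URL="https://example.com/server.jar"    # placeholder URL

# Only download when this version's jar isn't already on disk
if [ -f "$VERSION/server.jar" ]; then
    echo "skipping $VERSION (already downloaded)"
else
    mkdir -p "$VERSION"
    wget "$URL" -O "$VERSION/server.jar"
fi
```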

 

Oh... another option is to use a website downloader like HTTrack: https://www.httrack.com/

You can make a custom configuration to restrict the download to just the HTML page and the files it links to (limit to the domain and subdomain, and to html, exe, zip, jar, etc.), and set the depth to only the start page and the links on it.
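As a rough sketch of what that HTTrack invocation could look like (the scan filters and depth here are assumptions based on HTTrack's +/- filter syntax, not a tested configuration, and will likely need tweaking against the real page):

```shell
# Mirror only the start page and the jars it links to:
# "-*" excludes everything first, then the + filters re-include
# pages on the site and any .jar downloads; -r2 limits recursion
# to the start page plus its direct links.
httrack "https://mcversions.net/" -O ./mcversions-mirror -r2 \
    "-*" "+mcversions.net/*" "+*.jar"
```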
