
This probably sounds stupid, but I want to download Wikipedia... all of it. I started by trying Kiwix, but apparently its copy is outdated, by about a year. The only two other options I know of are XOWA and WikiTaxi, and I don't know much about either. WikiTaxi looks the simplest, so I'm trying that. What's confusing is which XML dump to download. Would I download the 11.3 GB xml.bz2 from here: http://dumps.wikimedia.org/enwiki/20141008/ or do I download one from here: http://dumps.wikimedia.org/enwiki/latest/

Any other suggestions would be great. Thanks! 

 

CPU - FX 8350 @ 4.5GHZ GPU - Radeon 5700  Mobo - M5A99FX Pro R2.0 RAM - Crucial Ballistix 16GB @ 1600 PSU - Corsair CX600M CPU Cooler - Hyper 212 EVO Storage - Samsung EVO 250GB, WD Blue 1TB

https://linustechtips.com/topic/243755-downloading-wikipedia-help/

never heard of downloading it. but damn, that's huge to download all of wikipedia.

"If it has tits or tires, at some point you will have problems with it." -@vinyldash303

this is probably the only place i'll hang out anymore: http://linustechtips.com/main/topic/274320-the-long-awaited-car-thread/

 

Current Rig: Intel Core 2 Quad Q6600, Abit IN9-32MAX nForce 680i board, Galaxy GT610 1GB DDR3 gpu, Cooler Master Mystique 632S Full ATX case, 1 2TB Seagate Barracuda SATA and 1x200gb Maxtor SATA drives, 1 LG SATA DVD drive, Windows 10. All currently runs like shit :D 


You know, there was an XKCD article about this very thing.

 

I wouldn't bother about it because Wikipedia is constantly changing.

Unless you had a way to automatically download any changes made to Wikipedia.

"It pays to keep an open mind, but not so open your brain falls out." - Carl Sagan.

"I can explain it to you, but I can't understand it for you" - Edward I. Koch

"I didn't die! I performed a tactical reset!" - Apollolol


Unless you had a way to automatically download any changes made to Wikipedia.

I guess you could try to figure out an RSS thing to update a file by URL path, but that would be a headache to set up. It would make a nice offline encyclopedia at least.

.


Unless you had a way to automatically download any changes made to Wikipedia.

Write a script to download Wikipedia once a week (or once a day) and replace the copy you already have.
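For what it's worth, here is a rough sketch of what that could look like in Python. It assumes you want the English `pages-articles` dump from the `latest` folder the OP linked. The function names `dump_url` and `refresh_dump` are made up for illustration, and the `opener` parameter is only there so the download step can be swapped out. Scheduling (cron, Task Scheduler, whatever you use) is left out.

```python
import os
import shutil
import tempfile
import urllib.request


def dump_url(wiki="enwiki", date="latest"):
    """Build the URL of the pages-articles dump for a wiki and dump date.

    Assumes the usual dumps.wikimedia.org naming, e.g.
    enwiki/latest/enwiki-latest-pages-articles.xml.bz2
    """
    name = f"{wiki}-{date}-pages-articles.xml.bz2"
    return f"https://dumps.wikimedia.org/{wiki}/{date}/{name}"


def refresh_dump(dest, url=None, opener=urllib.request.urlopen):
    """Download the dump next to `dest`, then swap it over the old copy.

    The old file stays in place until the new download has finished,
    so a failed download never leaves you without an offline copy.
    """
    url = url or dump_url()
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Temporary file in the same folder, so the final rename stays on one disk.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, opener(url) as response:
            shutil.copyfileobj(response, tmp)  # streams in chunks, not all in RAM
        os.replace(tmp_path, dest)  # overwrite the previous dump in one step
    except BaseException:
        # Clean up the partial download before re-raising the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest


# Example call (this is a multi-gigabyte download):
# refresh_dump("enwiki-latest-pages-articles.xml.bz2")
```

Run that from whatever scheduler you like once a week. Note it re-downloads the whole file every time; it does not fetch only the changes.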

Someone told Luke and Linus at CES 2017 to "Unban the legend known as Jerakl" and that's about all I've got going for me. (It didn't work)

 


You know, there was an XKCD article about this very thing.

 

I wouldn't bother about it because Wikipedia is constantly changing.

 

never heard of downloading it. but damn, that's huge to download all of wikipedia.

 

Progress! I decided not to download the one with pictures. It's still 12 GB.

 


I don't think you have 20 petabytes of storage on your PC, so no.

NEW PC build: Blank Heaven   minimalist white and black PC     Old S340 build log "White Heaven"        The "LIGHTCANON" flashlight build log        Project AntiRoll (prototype)        Custom speaker project


Ryzen 3950X | AMD Vega Frontier Edition | ASUS X570 Pro WS | Corsair Vengeance LPX 64GB | NZXT H500 | Seasonic Prime Fanless TX-700 | Custom loop | Coolermaster SK630 White | Logitech MX Master 2S | Samsung 980 Pro 1TB + 970 Pro 512GB | Samsung 58" 4k TV | Scarlett 2i4 | 2x AT2020

 


I don't think you have 20 petabytes of storage on your PC, so no.

It's not as big as you might think. The whole wiki is 40gb total with pictures. 

 


Good luck!

 

Wikipedia is mostly text?

I am genuinely surprised that Wikipedia, even in pure text form is only 12 GB....

EDIT: Only 40GB with pictures? They could be using a freaking laptop for a server, lol....

My account is almost entirely dormant. Hope you all are having a grand time. Many years of fun were had here.


Wikipedia is mostly text?

It's not as big as you might think. The whole wiki is 40gb total with pictures. 

 

The database was 133GB in 2008. Can you guess how large it is today?


It's not as big as you might think. The whole wiki is 40gb total with pictures. 

Better use of 40 gb than cod haha

Case: Phanteks Evolve X with ITX mount  cpu: Ryzen 3900X 4.35ghz all cores Motherboard: MSI X570 Unify gpu: EVGA 1070 SC  psu: Phanteks revolt x 1200W Memory: 64GB Kingston Hyper X oc'd to 3600mhz ssd: Sabrent Rocket 4.0 1TB ITX System CPU: 4670k  Motherboard: some cheap asus h87 Ram: 16gb corsair vengeance 1600mhz

                                                                                                                                                                                                                                                          

 

 


There's a web crawler called HTTrack that can do just that. I tried it on Linus Tech Tips, but it was literally downloading too much, so I made it replace all links to images with links back to Linus Tech Tips.

But it was taking forever, since Linus Tech Tips isn't the most responsive site and I was saving to a slow NAS, which was the bottleneck.

If you want to try it, that would work well, since it essentially makes a copy of a website that can actually be run from a server, minus logins and other things that require external information.

A riddle wrapped in an enigma , shot to the moon and made in China
