Best way to store a Linux repository locally

Hi everyone, here comes a probably odd question.

I need to keep a local mirror of the Debian 10 repositories on my hard drive. Due to internet access difficulties and various other reasons, using the repos over the internet is not an option for me. I managed to mirror the official repos with debmirror easily, but now my question is this: what is the best way to keep all of these files on my HDD? The repository is about 150 GB and is made up of lots of small files, as you might already know. So is it OK for my HDD to hold all of these files as they are, or would it be better to keep them in one big file that contains them all, like an .iso, .img or .dmg? I'm wondering about this because I don't want my hard drive to become more prone to failure (it's already an old drive with more than 5 years of use), and buying a hard drive in my country right now is very expensive, so that's not an option.

I would appreciate your help, thanks in advance.

No, it's better to store it uncompressed.

With a container, if the container fails, you may lose all the files.

Just copy the files, then unplug the drive and store it in a safe, dry place.

You may want to keep another copy on another drive for redundancy.

It's only 150 GB, which is not too big.

Thinking purely from a performance perspective, keeping the repo as small files will be better than combining it into an image file. Especially if you defrag your disk at all, the smaller files will allow your OS to reorganise the data more easily.

1 hour ago, BrinkGG said:

Thinking purely from a performance perspective, keeping the repo as small files will be better than combining it into an image file. Especially if you defrag your disk at all, the smaller files will allow your OS to reorganise the data more easily.

Thanks @brinkgg

I hadn't thought about that. Fragmentation is not something I've had in mind for quite some time, since ext4 supposedly shouldn't have much of a problem with it. But it is true that the OS might perform better if it reads the data directly from the drive rather than from an image that it has to keep mounted.

1 hour ago, SupaKomputa said:

No, it's better to store it uncompressed.

With a container, if the container fails, you may lose all the files.

Just copy the files, then unplug the drive and store it in a safe, dry place.

You may want to keep another copy on another drive for redundancy.

It's only 150 GB, which is not too big.

I would be using the repos rather frequently from that drive, and it would be connected to my Linux NAS server. I'll try to keep it on at least two drives for redundancy. 150 GB is not too much, but it contains hundreds of thousands of small files of a few KB each. That's my concern.
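To put a number on that concern, a small script can count the files in the mirror and group them by size. This is only a rough sketch in Python: the bucket boundaries below and the mirror path in the usage note are assumptions for illustration, not something taken from this thread.

```python
import os
from collections import Counter

# Bucket upper bounds in bytes; these thresholds are arbitrary assumptions.
BUCKETS = [
    (4 * 1024, "<= 4 KiB"),
    (64 * 1024, "<= 64 KiB"),
    (1024 * 1024, "<= 1 MiB"),
    (float("inf"), "> 1 MiB"),
]


def bucket_label(size):
    """Return the label of the first bucket whose upper bound fits size."""
    for upper, label in BUCKETS:
        if size <= upper:
            return label
    return BUCKETS[-1][1]


def mirror_stats(root):
    """Walk root and return (file_count, total_bytes, Counter of bucket labels)."""
    count, total, histogram = 0, 0, Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            size = os.path.getsize(path)
            count += 1
            total += size
            histogram[bucket_label(size)] += 1
    return count, total, histogram
```

Calling `mirror_stats("/srv/mirror/debian")` (a hypothetical path) would return the number of files, their combined size in bytes, and how many files fall into each size bucket, which shows how much of the mirror really consists of tiny files.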

On a healthy drive, you will have no problems.

Put it in RAID 1, and you should be safe.

3 hours ago, SupaKomputa said:

On a healthy drive, you will have no problems.

Put it in RAID 1, and you should be safe.

Again, RAID is not an option at the moment. Hard drive prices in my country have grown exponentially, since we rely on black market offers with supply coming from abroad, and the Covid-19 situation has us in total lockdown.

For some context, a 120 GB SSD is around $100 right now.

@Lsantiesteban.cuba: I gather that you have a very slow / intermittent internet connection, and you need to contact the Debian repositories quite often (perhaps from many different computers?). One question, in case there are many computers using this mirror: do they need to be in sync?

I would think about setting up a proxy for your needs, especially if there is an (at least somewhat) consistent, but slow, internet connection. That could reduce the disk space requirements drastically.

I.e. instead of a full Debian mirror, host your own (HTTP/FTP) proxy server. I have never done anything similar, but it should be possible. However, not all computers might be in sync in that case, i.e. they could end up using different versions of the same packages. If the internet connection is intermittent / not always available, a proxy might not work well, but it might still suit your needs.

By choosing a more stable Debian release it might not update that often; or if it does, it's mostly security updates. This means a package will not be updated often in the main repository and is more likely to already reside in your proxy when a client computer tries to upgrade.

One can even configure a caching proxy just for APT, it seems. I searched with DuckDuckGo and this popped up: https://kifarunix.com/configure-apt-proxy-on-debian-10-buster/ (I have not tried this!) That has the same problem as a general HTTP proxy, but 1) it will not proxy/cache all HTTP traffic and 2) it is "aware" of the repository and as such, I would assume, much easier to configure so that only the repositories are cached at the proxy.

Compression or a container does not make sense, as the packages in the main repositories are already compressed. Also, if you (or the clients) need to access the mirror/proxy often, you would need to re-create the whole archive every time the mirror is updated. It would bring more problems and gain nothing.
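The point about already-compressed data can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: `zlib` stands in for whatever compressor the packages actually use, and the generated pseudo-text is just a stand-in for unpacked package contents, not real repository data.

```python
import random
import string
import zlib


def make_sample_text(n_words=100_000, vocab_size=200, seed=0):
    """Build deterministic pseudo-text from a small random vocabulary."""
    rng = random.Random(seed)
    vocab = [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 10)))
        for _ in range(vocab_size)
    ]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


def ratio(data):
    """Compressed size divided by original size; values near 1.0 mean no gain."""
    return len(zlib.compress(data, 9)) / len(data)


raw = make_sample_text()         # stand-in for unpacked package contents
packed = zlib.compress(raw, 9)   # stand-in for an already-compressed package

first_pass = ratio(raw)          # compressing the raw data
second_pass = ratio(packed)      # compressing the compressed data again

print(f"first pass:  {first_pass:.2f} of original size")
print(f"second pass: {second_pass:.2f} of original size")
```

The first pass shrinks the data considerably, while the second pass leaves the size essentially unchanged, which is why wrapping the mirror in a compressed archive would cost time without saving space.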

 

Fragmentation is not a problem on Linux/Unix filesystems, so that advice wouldn't really solve anything IMHO (if defragmentation utilities even exist for your filesystem). It is very unlikely that a client would make a sequential copy of (parts of) the repository, so random disk access will happen in any case.

You may also want to ask on a Debian-specific forum or mailing list, as this is quite an advanced / specific thing ... there might be others who have had a similar use case.

EDIT: Also, see apt-cacher-ng. It might be something that could work for you.
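If a caching proxy such as apt-cacher-ng is used, each client needs a one-line APT setting that points at it. The sketch below only generates that line; the host name in the usage note and the target file name are hypothetical, and 3142 is assumed because it is apt-cacher-ng's default port.

```python
from pathlib import Path


def apt_proxy_line(host, port=3142):
    """Return an APT configuration line that routes HTTP downloads via a proxy."""
    return f'Acquire::http::Proxy "http://{host}:{port}";\n'


def write_apt_proxy_config(path, host, port=3142):
    """Write the proxy line to path and return the text that was written."""
    text = apt_proxy_line(host, port)
    Path(path).write_text(text, encoding="ascii")
    return text
```

For example, `write_apt_proxy_config("/etc/apt/apt.conf.d/01proxy", "nas.local")` (file name and host are made up for illustration, and writing there needs root) would make APT on that client fetch packages through the proxy on `nas.local`.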

Edited by Wild Penquin
apt-cacher-ng; emphasis

Also, in case you are really worried about hard disk failure: having lots of files is not an issue. What matters is the read/write rate and the number of start/stop cycles. In fact, many disks prefer to be spinning all the time rather than going through start/stop cycles, even if the uptime is long! Depending on the disk, it might be better to keep it spinning for 24 hours, or even a full week, than to cycle it once per day; it depends on what kind of usage the disk was optimized for. For SSDs, start/stop cycles are not a factor.

Overheating is another issue, but don't overcool mechanical drives either! They usually have an optimal temperature range, which is somewhat warm, up to quite hot; just don't keep them in a compartment with no air circulation whatsoever. SSDs are more prone to overheating, especially M.2 drives; SATA drives usually have a larger casing and therefore more surface area, which helps quite a bit.

It might make sense to try to reduce seeks (for mechanical drives to reduce wear, for SSDs to reduce heat). Have a lot of RAM (if possible, but I would presume that is expensive, too) to increase the chances of cache hits, and choose a sensible file system designed for many small files (ext4 should be fine).

Edited by Wild Penquin
Some minor clarifications and tweaks

Thanks a lot @Wild Penquin

I've copied the repository to my hard drive as is, and thanks to your comments I'm now less worried about it making the HDD more prone to failure. My situation here is that we're trying to promote the use of Linux and open-source software, so we connected two apartment buildings and installed Linux on the computers of many of the people living there. In my country, home internet access is limited. Neither of the buildings, nor my whole neighborhood, has DSL, so we rely only on LTE, which is more expensive, and not everyone is able to hotspot to their machines. So by hosting repositories that I've mirrored myself or copied from universities that already do the same thing, all of these users can now install whatever packages they need, and we can tinker more easily without spending 4-8 dollars every time we set up a whole new system.
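For anyone wanting to try a similar setup, here is a rough sketch of how a mirror directory could be shared over HTTP on a local network and what the matching client entry looks like. It uses Python's built-in `http.server`, which is fine for quick testing but is not meant for production; a regular web server is the usual choice. The directory, address and port in the usage note are assumptions, not the setup described above.

```python
import functools
import http.server


def make_mirror_server(directory, host="0.0.0.0", port=8080):
    """Create (but do not start) an HTTP server that serves files from directory."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


def sources_list_line(base_url, suite="buster", components=("main",)):
    """Return a sources.list entry pointing APT clients at the local mirror."""
    return f"deb {base_url} {suite} {' '.join(components)}"
```

With a hypothetical mirror in `/srv/mirror`, `make_mirror_server("/srv/mirror").serve_forever()` would start serving it, and `sources_list_line("http://192.168.1.10:8080/debian")` gives the line each client would put in its `/etc/apt/sources.list`.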

I really appreciate the help.
