
[SOLVED] unzip large file - identify the bottleneck

Hi 🙂

 

I am currently trying to open (not unzip) a large zip file on my computer in order to browse its content.

The operation takes approx. 11 minutes to complete.

I'm working on the current version of Win10 x64.

 

During the operation (according to task manager):

 - CPU usage: 5%

 - RAM usage: 5-10%

 - SSD* usage: 0%

 - network usage: 0%

 - GPU usage: 0%

 

What is the bottleneck?

 

Thank you very much in advance for your help.

 

Best,

-a-

 

* M.2 NVMe SSD. PCIe x4

 

 

EDIT: It takes virtually the same time if I try to open the zip file from an internal HDD rather than the system SSD.


3 minutes ago, Archer42 said:

Single CPU core, most likely.

Indeed.

A single thread, actually... (so "half a core"?)

 

Is there anything I could do about it?

I guess not, right?


1 minute ago, asheenlevrai said:

Is there anything I could do about it?

I guess not, right?

Well, it depends on software.

If you have to work with zip I am not aware of any multi-threaded implementations.

There are other formats though, with packers/unpackers which can use multiple cores. 7zip will be the simplest option for Windows.


1 minute ago, Archer42 said:

Well, it depends on software.

If you have to work with zip I am not aware of any multi-threaded implementations.

There are other formats though, with packers/unpackers which can use multiple cores. 7zip will be the simplest option for Windows.

Yes I need to work with zip files...

And I am not trying to unzip them (create a copy of the file that is uncompressed) but only to browse the content of the zip file.

 

Thanks anyway.

-a-


11 minutes is a bit excessive. 

ZIP files have an index at the end (the central directory; the format requires it, and every mainstream archiver writes it), so the software should only have to read the last part of the file to get the index.

If the software ignores that index, it has to jump from place to place in the file to get each file's information... you basically have something like filename, uncompressed size, compressed size, compressed data... so the software can read the compressed size and jump that many bytes to the next file's information.

If your hard drive is very fragmented, it could take a long time for the archive to be parsed.
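
To make the "index at the end" idea concrete, here is a rough Python sketch (not what any particular archiver does internally). It builds a tiny zip in memory and then reads the fixed 22-byte "end of central directory" record at the tail, which says how many entries there are and where the index starts. The helper names `read_eocd` and `make_demo_zip` are made up for this example. Also note that archives over 4 GB, like the ones discussed here, store these values in additional ZIP64 records, which this sketch does not parse.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, disk number, disk with index, entries on this disk,
# total entries, index size, index offset, comment length
EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def read_eocd(data: bytes) -> dict:
    """Locate and decode the end-of-central-directory record.

    Simplification: searches the whole buffer from the end; a real tool
    would only look at the last ~64 KB of the file.
    """
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack(EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
    return {
        "total_entries": fields[4],
        "cd_size": fields[5],      # size of the index in bytes
        "cd_offset": fields[6],    # where the index starts in the file
        "eocd_offset": pos,
    }


def make_demo_zip() -> bytes:
    """Create a small three-file archive in memory (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in ("a.txt", "b.txt", "dir/c.txt"):
            zf.writestr(name, "hello " * 100)
    return buf.getvalue()


if __name__ == "__main__":
    print(read_eocd(make_demo_zip()))
```

The point is that a program only needs this record plus the index it points to in order to show the file list; the compressed data itself never has to be touched.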

 

I'd suspect you have a very eager antivirus that unpacks the archive as the software you use tries to open it, causing the slowdown.

 

Another possibility is that the archive is using some kind of "solid compression mode", where all the files are compressed as one big single file for improved compression... so if you want to decompress one file, all files before that file have to be decompressed. (That applies to formats like 7z or RAR, though; a plain ZIP compresses each file separately.)

 

Edit... something else it could be: the software you use could have a very stupid or badly optimized way of storing the information about the files inside the zip in memory.

If the zip has a huge number of files, the software may take a lot of time to extract the file information.

For example, let's say the developer assumed the majority of zip files won't have more than 5000 files inside, so the software reserves room in memory for 5000 files.

Then, as it fills those 5000 "spots" in memory, it finds there are still more files in the zip. So a bad developer would reserve room for 5000 files + some amount, let's say 6000 files, copy the contents from the original location to the new location and free the old reservation.

If you have 100k files in the archive and the developer was stupid enough to increase the memory reservation by small amounts each time, that could explain taking a very long time to read the archive contents.
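
As a back-of-the-envelope illustration of that growth problem, here is a small Python sketch that just counts how many entries get copied while filling a list of 100k files. The function name `copies_needed` and the two growth rules are assumptions picked for the example; this is only a guess at the kind of mistake involved, not a description of what File Explorer really does.

```python
def copies_needed(n_items: int, initial_capacity: int, grow) -> int:
    """Count how many already-stored entries are copied while adding n_items.

    `grow` maps the current capacity to the new capacity whenever the
    reserved room is full and everything has to move to a bigger block.
    """
    capacity = initial_capacity
    copied = 0
    for size in range(n_items):        # size = entries stored so far
        if size == capacity:           # no room left for the next entry
            copied += size             # move every existing entry
            capacity = grow(capacity)
    return copied


if __name__ == "__main__":
    # Grow by a fixed 1000 slots each time vs. doubling the reservation.
    fixed = copies_needed(100_000, 5_000, lambda cap: cap + 1_000)
    doubled = copies_needed(100_000, 5_000, lambda cap: cap * 2)
    print("fixed +1000:", fixed, "copies")
    print("doubling:   ", doubled, "copies")
```

With these numbers, the fixed increment copies 4,940,000 entries in total versus 155,000 for doubling, roughly 30 times more, and the gap keeps widening as the file count grows, because the fixed-increment cost grows with the square of the number of files.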


2 minutes ago, DriftMan said:

No file info, no PC info, no program info

What do you expect us to say?

I didn't think it would be very relevant. Sorry.

 

Ryzen9 3900X

128GB DDR4 3600

1TB FireCuda SSD (M.2 NVMe PCIe4.0 x4)

Asus prime x570 MB

GTX Titan X 12GB

Windows10 x64 EduN with Media Feature Pack installed. Current version

 

I'm using the Windows embedded unzip functionality (just double-click on the zip file) since that's what most users will end up doing instinctively.

 

The zip files were generated with 7zip using "ultra", "Deflate64" and word size=128.

The zip file sizes are 400-500GB.

 

Best,

-a-

 

note:

 


Open the file with 7zip instead of file explorer.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


2 minutes ago, mariushm said:

11 minutes is a bit excessive. 

ZIP files have an index at the end (the central directory; the format requires it, and every mainstream archiver writes it), so the software should only have to read the last part of the file to get the index.

If the software ignores that index, it has to jump from place to place in the file to get each file's information... you basically have something like filename, uncompressed size, compressed size, compressed data... so the software can read the compressed size and jump that many bytes to the next file's information.

If your hard drive is very fragmented, it could take a long time for the archive to be parsed.

 

I'd suspect you have a very eager antivirus that unpacks the archive as the software you use tries to open it, causing the slowdown.

 

Another possibility is that the archive is using some kind of "solid compression mode", where all the files are compressed as one big single file for improved compression... so if you want to decompress one file, all files before that file have to be decompressed. (That applies to formats like 7z or RAR, though; a plain ZIP compresses each file separately.)

I don't understand all of what you say due to my ignorance on the matter.

I can add that I am not running any particular AV except for the one built into Windows.

Opening those zip files takes the same time from the system SSD, an otherwise empty 3TB HDD or an otherwise empty 1TB USB3.0 HDD.


Use 7-zip to open the archives. The zip support built into Windows follows an older version of the zip standard and may not make good use of some features (like the file index at the end of the archive), so Windows may have to slowly go through all those 500 GB to extract the file names.


2 minutes ago, Kilrah said:

Open the file with 7zip instead of file explorer.

I'll try that and report here.

However, this will require that I tell users to use 7zip instead of File Explorer. Or maybe I can associate .zip files with 7zip by default...


I think you shouldn't distribute such massive zips in the first place, there must be a more appropriate method.


Users shouldn't deal with 400 - 500 GB archives. 

 

Try to keep them below 100 GB...

 

Also, when you're dealing with such huge amounts, the difference between Normal / Slow and Ultra could be just 1-2 GB of disk space saved out of 500 GB... think hard about whether you really need to save that 1 GB and piss people off by making decompression 5-10 times slower.

 

 


1 minute ago, mariushm said:

Users shouldn't deal with 400 - 500 GB archives. 

 

Try to keep them below 100 GB...

 

Also, when you're dealing with such huge amounts, the difference between Normal / Slow and Ultra could be just 1-2 GB of disk space saved out of 500 GB... think hard about whether you really need to save that 1 GB and piss people off by making decompression 5-10 times slower.

 

 

Thanks but that is not an option.

I am not in charge of this part of the story.


26 minutes ago, DriftMan said:

No file info, no PC info, no program info

What do you expect us to say?

I disagree with the file info portion of your argument; what if it contains sensitive data? You can't expect a person to intentionally leak personal info onto the internet, right?


I'm sure those users will be happy to download or copy 400 GB and then get a corrupt file because of a flipped bit somewhere during download.

 

Also, if their download tool doesn't support resume, they'll be really happy to download tens of GB and then have their connection fail. Or their stupid download program tries to save the download as a temporary file on C:\, which doesn't have enough space, or some stupid crap like that.

 

(This is where torrent files are handy and useful, as the file is split into chunks that can be downloaded independently and there are checksums for each chunk, so if a checksum fails, that chunk can be downloaded again without downloading the whole big file.)
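
The per-chunk checksum idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the actual BitTorrent format: the chunk size, the use of SHA-256 and the helper names `chunk_digests` and `damaged_chunks` are all assumptions made for the example.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list:
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[start:start + chunk_size]).hexdigest()
        for start in range(0, len(data), chunk_size)
    ]


def damaged_chunks(data: bytes, expected: list, chunk_size: int) -> list:
    """Return the indexes of chunks whose digest differs from the expected one."""
    actual = chunk_digests(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    chunk_size = 4096
    original = bytes(range(256)) * 64           # 16 KiB of demo data
    expected = chunk_digests(original, chunk_size)

    received = bytearray(original)
    received[5000] ^= 0x01                      # simulate one flipped bit
    print("chunks to re-download:",
          damaged_chunks(bytes(received), expected, chunk_size))
```

Only the chunk containing the flipped bit gets flagged, so only that chunk needs to be fetched again instead of the whole 400 GB.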

 

Maybe consider using QuickPar - http://www.quickpar.org.uk/ - or some other software to generate some error recovery information; if the file is corrupted, you can scan the file and use the recovery info to repair it.

 


2 minutes ago, linuxChips2600 said:

I disagree with the file info portion of your argument; what if it contains sensitive data? You can't expect a person to intentionally leak personal info onto the internet, right?

Nobody cares about the content of the files, but type, number and the purpose would help give advice.


Opening these zip files with 7zip takes less than 1s (instantaneous).

 

Actually with 7zip we can even open them from the file server (at least on our LAN), no need to download anything anymore 🙂

 

-> problem solved.

Thanks a lot 🙂

 

NOW, bonus $100K question: What could I use to open these files on macOS too? There is no 7zip for macOS...


On 3/4/2021 at 3:06 PM, mariushm said:

A quick google search says Keka - https://www.keka.io/en/ -  , or maybe "The unarchiver"  - https://apps.apple.com/in/app/the-unarchiver/id425424353?mt=12 - ... didn't test either of them personally, so don't be upset if they don't work right 

Thanks a lot 😄

 

Keka is not free so that's a dead end for me.

The Unarchiver is cool; I was actually already using it in the past.

However, I need to find a way to "browse" the content of zip files rather than unzip/unarchive them.

Otherwise it requires too much disk space and it is way too slow (when we only need to look at the files inside without actually opening any of them).
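
For reference, "browsing without extracting" can also be sketched with Python's standard `zipfile` module, which runs on both Windows and macOS. The helper names `list_entries` and `build_demo_archive` are made up for this example, and the demo archive is a tiny in-memory one. Listing only reads the index, so the compression method of the entries should not matter for this step; extracting would be a different story, since the standard `zipfile` module does not decompress Deflate64.

```python
import io
import zipfile


def list_entries(source) -> list:
    """Return (name, uncompressed size, compressed size) for every entry.

    `source` can be a path or a file-like object. Nothing is extracted;
    only the archive's index is read.
    """
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]


def build_demo_archive() -> io.BytesIO:
    """Create a small in-memory archive to try the listing on (demo data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("readme.txt", b"hello world\n" * 50)
        zf.writestr("data/values.csv", b"1,2,3\n" * 1000)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, size, packed in list_entries(build_demo_archive()):
        print(f"{name}: {size} bytes ({packed} compressed)")
```

The same module also has a small command-line interface, so `python -m zipfile -l archive.zip` prints a listing without any extra code.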

 

Thanks again 🙂

Best,

-a-

 

EDIT:

I'll look into these:

https://www.howtogeek.com/308468/how-to-open-and-browse-zip-files-on-macos-without-unarchiving-them/

https://osxdaily.com/2013/06/17/view-zip-archive-contents-without-extracting-mac-os-x/

https://www.cnet.com/news/reading-the-contents-of-a-zip-file-in-os-x/

 

B1 free archiver seems to do the trick


-> Moved to Programs, Apps and Websites

