.Zip It Good!
I ran some tests to see which compression formats squeeze which file types best. Because apparently even my backups need to be optimized.
PeaZip 7.32
Initial testing was conducted in July 12020 with PeaZip, which incorporates its own PEA format alongside 7-Zip (gzip, bzip2, xz, zip, 7z) and other FOSS algorithms (BCM, Brotli, FreeArc, ZPaq, zstandard).
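For anyone wanting to repeat this kind of measurement without a GUI, here is a minimal sketch of a size-and-time harness. It is not how the numbers on this page were produced (those came from PeaZip), and it only covers the three codecs that ship in Python's standard library (gzip, bzip2, xz). The `benchmark` function and the sample data are made up for illustration.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> list[tuple[str, int, float]]:
    """Compress `data` with each codec; return (name, packed size, seconds)."""
    codecs = {
        "gz": gzip.compress,
        "bz2": bz2.compress,
        "xz": lzma.compress,  # lzma.compress writes the .xz container by default
    }
    results = []
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results.append((name, len(packed), elapsed))
    # Largest output first, i.e. least size reduction at the top.
    return sorted(results, key=lambda row: row[1], reverse=True)


# Hypothetical stand-in for a folder of HTML files.
sample = b"<p>Hello, archive!</p>\n" * 20000

for name, size, seconds in benchmark(sample):
    print(f"{size:>10} {seconds:8.3f}  .{name}")
```

Timings from a run like this depend heavily on the machine and on each codec's default level, so they are only comparable with each other, not with the figures here.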
Text
A folder containing mostly HTML files downloaded from AO3, plus some ASCII plaintext files:
LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
pea, zpaq, zip, gz, xz, br, zst, 7z, bz2, arc, bcm

SLOWEST ---------------------------------- FASTEST
bz2, gz, zip, 7z, zst, arc, br, bcm, zpaq, xz, pea
BYTES ^   TIME (SECONDS)
5108105   0.000   .html/.txt
1856844   0.591   .pea
1801952   0.691   .zpaq
1766751   5.900   .zip
1675464   6.400   .tar.gz
1567204   0.620   .tar.xz
1451784   1.200   .tar.br
1345611   2.800   .tar.zst
1332424   3.000   .7z
1280880   9.500   .tar.bz2
1072873   1.300   .arc
1072769   1.100   .tar.bcm
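Raw byte counts are easier to compare as percentages. The sketch below turns the text results above into "percent smaller than the original"; the byte counts are copied from the table, while the `reduction_percent` helper and the dictionary layout are my own illustration.

```python
# Uncompressed size of the text folder, in bytes (from the table above).
ORIGINAL_BYTES = 5108105

# Archive sizes in bytes, keyed by format (from the table above).
ARCHIVE_BYTES = {
    "pea": 1856844,
    "zpaq": 1801952,
    "zip": 1766751,
    "gz": 1675464,
    "xz": 1567204,
    "br": 1451784,
    "zst": 1345611,
    "7z": 1332424,
    "bz2": 1280880,
    "arc": 1072873,
    "bcm": 1072769,
}


def reduction_percent(packed: int, original: int = ORIGINAL_BYTES) -> float:
    """How much smaller the archive is than the original, as a percentage."""
    return 100 * (1 - packed / original)


for name, size in ARCHIVE_BYTES.items():
    print(f".{name:<5} {reduction_percent(size):5.1f}% smaller")
```

The spread runs from roughly 64% for PEA to roughly 79% for BCM and FreeArc, which are separated by only about a hundred bytes.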
Images
PNGs and JPGs plus the occasional GIF:
LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zpaq, pea, gz, zip, xz, bcm, bz2, arc, br, 7z, zst

SLOWEST ---------------------------------- FASTEST
bz2, br, zip, 7z, gz, bcm, arc, zst, pea, xz, zpaq
KB ^     TIME (M:SS)
141021   0:00   .gif/.png/.jpg
140488   0:02   .zpaq
138922   0:12   .pea
138773   1:00   .tar.gz
138768   1:28   .zip
138544   0:06   .tar.xz
138493   0:40   .tar.bcm
138150   9:00   .tar.bz2
137783   0:30   .arc
135883   1:50   .tar.br
135742   1:09   .7z
135560   0:13   .tar.zst
Videos
Mostly MP4s with the occasional FLV and WEBM:
LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zpaq, arc, pea, xz, gz, bz2, zip, 7z, br, zst, bcm

SLOWEST ---------------------------------- FASTEST
bz2, br, zip, gz, 7z, arc, bcm, zst, pea, xz, zpaq
MB ^    TIME (MM:SS)
728.0   00:00   .flv/.mp4/.webm
719.5   00:09   .zpaq
717.0   04:26   .arc
711.0   01:14   .pea
710.3   00:51   .tar.xz
709.9   09:16   .zip
709.7   09:11   .tar.gz
709.2   49:25   .tar.bz2
708.0   07:30   .7z
707.3   17:19   .tar.br
706.4   01:26   .tar.zst
702.0   04:03   .tar.bcm
Flash
Purely SWF animations:
LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zpaq, bcm, pea, bz2, zip, gz, arc, xz, br, zst, 7z

SLOWEST ---------------------------------- FASTEST
bz2, br, zip, gz, 7z, bcm, arc, zst, pea, xz, zpaq
MB ^    TIME (MM:SS)
253.1   00:00   .swf
238.5   00:04   .zpaq
238.3   01:12   .tar.bcm
236.6   00:22   .pea
236.5   16:43   .tar.bz2
236.2   03:16   .zip
236.2   03:01   .tar.gz
235.5   01:02   .arc
233.5   00:12   .tar.xz
232.1   03:21   .tar.br
231.9   00:22   .tar.zst
231.0   02:20   .7z
Others
1.36GB of PDF files, compressed with 7-Zip ZS 1.5.0 R1 (gzip, bzip2, xz, zip, 7z, lizard, lz4, lz5, zstandard):
LARGER --------------- SMALLER
lz5, lz4, gz, liz, 7z, zst, xz (1.19GB)

SLOWEST -------------- FASTEST
gz, 7z, lz5, xz, liz, zst, lz4
I did not test bzip2 as it was previously shown to be the slowest algorithm available.
HTML Dumps
With PeaZip 7.90, I tested out some archived message boards that were mostly HTML with very little CSS or images:
LEAST SIZE REDUCTION --------- MOST SIZE REDUCTION
zip, pea, gz, bz2, br, zst, zpaq, xz, 7z, bcm, arc
XML/JSON
I threw 7-Zip ZS 1.5.0 R1's formats at a small collection of bookmark files and JSON/OPML settings exports:
LARGER ------------------------- SMALLER
lz5, lz4, liz, zip, gz, bz2, zst, 7z, xz
Conclusions
For text files and websites under 5GB in size, the FreeArc 0.67 alpha from 12014 absolutely dunks on everything else, even 7+ years after it was made. For larger website mirrors, 7z was able to achieve smaller sizes.
For images, videos, and animations, zstandard offers if not the best compression ratio, then the best speed/size compromise. But, if you absolutely must have the smallest backups, use BCM for video, and 7z for SWFs.
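One way to make "best speed/size compromise" a bit more concrete is to look for the formats that nothing else beats on both size and time at once (a Pareto front). This is my own framing rather than the method used to reach the conclusion above; the sketch below applies it to the video results, with sizes and times copied from that table and the helper names invented for illustration.

```python
# (format, size in MB, time as MM:SS), copied from the video results above.
VIDEO_RESULTS = [
    ("zpaq", 719.5, "00:09"),
    ("arc", 717.0, "04:26"),
    ("pea", 711.0, "01:14"),
    ("xz", 710.3, "00:51"),
    ("zip", 709.9, "09:16"),
    ("gz", 709.7, "09:11"),
    ("bz2", 709.2, "49:25"),
    ("7z", 708.0, "07:30"),
    ("br", 707.3, "17:19"),
    ("zst", 706.4, "01:26"),
    ("bcm", 702.0, "04:03"),
]


def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def pareto_front(results):
    """Return formats that no other format beats on both size and time."""
    rows = [(name, size, to_seconds(t)) for name, size, t in results]
    front = []
    for name, size, secs in rows:
        dominated = any(
            other_size <= size
            and other_secs <= secs
            and (other_size < size or other_secs < secs)
            for _, other_size, other_secs in rows
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(VIDEO_RESULTS))  # ['zpaq', 'xz', 'zst', 'bcm']
```

Everything outside that list is both larger and slower than at least one alternative. Within it, zstandard sits between xz (faster, larger) and BCM (smaller, slower).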
To save a bit of space, I replaced PeaZip with 7-Zip ZS, but kept FreeArc's CLI utility around to crunch them HTTrack backups since it's exceptional at doing so. The rest of my backups are stored in 7Zs for general files, ZSTs for images, and XZ for PDFs.
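If you script your backups, the per-type choices above boil down to a small lookup. This is only a sketch: the mapping mirrors the paragraph above, but `pick_format` and the exact suffix list are hypothetical, and HTTrack mirrors are left out because they are folders rather than a single file type.

```python
from pathlib import Path

# Archive format per file suffix, following the choices described above.
FORMAT_BY_SUFFIX = {
    ".png": "zst",
    ".jpg": "zst",
    ".gif": "zst",
    ".pdf": "xz",
}

# Everything else goes into a 7z archive.
DEFAULT_FORMAT = "7z"


def pick_format(path: str) -> str:
    """Return the archive format to use for a given file path."""
    return FORMAT_BY_SUFFIX.get(Path(path).suffix.lower(), DEFAULT_FORMAT)


for example in ("scan.PDF", "photo.jpg", "notes.txt"):
    print(example, "->", pick_format(example))
```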
Unless stated otherwise, all graphics on this page are Copyright the respective
rightsholders, and all text is published under the Creative Commons
Attribution-ShareAlike License.
Originally appeared
on Bytemoth's Brook / CC BY-SA
Please enjoy responsibly.