compiling the zip2 crate to wasm with the goal of compiling to luau through wasynth
David Caldwell 73143a0ad6
perf: Faster cde rejection (#255)
* Use the tempfile crate instead of the tempdir crate (which is deprecated)

https://github.com/rust-lang-deprecated/tempdir?tab=readme-ov-file#deprecation-note

* perf: Add benchmark that measures the rejection speed of a large non-zip file

* perf: Speed up non-zip rejection by increasing END_WINDOW_SIZE

I tested several END_WINDOW_SIZEs across 2 machines:

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
512:   test parse_large_non_zip  ... bench:  30,450,608 ns/iter (+/- 673,910)
4096:  test parse_large_non_zip  ... bench:   7,741,366 ns/iter (+/- 521,101)
8192:  test parse_large_non_zip  ... bench:   5,807,443 ns/iter (+/- 546,227)
16384: test parse_large_non_zip  ... bench:   4,794,314 ns/iter (+/- 419,114)
32768: test parse_large_non_zip  ... bench:   4,262,897 ns/iter (+/- 397,582)
65536: test parse_large_non_zip  ... bench:   4,060,847 ns/iter (+/- 280,964)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
512:   test parse_large_non_zip  ... bench:  65,132,581 ns/iter (+/- 7,429,976)
4096:  test parse_large_non_zip  ... bench:  14,109,503 ns/iter (+/- 2,892,086)
8192:  test parse_large_non_zip  ... bench:   9,942,500 ns/iter (+/- 1,886,063)
16384: test parse_large_non_zip  ... bench:   8,205,851 ns/iter (+/- 2,902,041)
32768: test parse_large_non_zip  ... bench:   7,012,011 ns/iter (+/- 2,222,879)
65536: test parse_large_non_zip  ... bench:   6,577,275 ns/iter (+/- 881,546)

In both cases END_WINDOW_SIZE=8192 performed about 6x better than 512 and >8192
didn't make much of a difference on top of that.

* perf: Speed up non-zip rejection by limiting search for EOCDR.

I benchmarked several search sizes across 2 machines
(these benches are using an 8192 END_WINDOW_SIZE):

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   5,773,801 ns/iter (+/- 411,277)
last 128k:    test parse_large_non_zip              ... bench:      54,402 ns/iter (+/- 4,126)
last 66,000:  test parse_large_non_zip              ... bench:      36,152 ns/iter (+/- 4,293)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   9,942,306 ns/iter (+/- 1,963,522)
last 128k:    test parse_large_non_zip              ... bench:      73,604 ns/iter (+/- 16,662)
last 66,000:  test parse_large_non_zip              ... bench:      41,349 ns/iter (+/- 16,812)

As you might expect, both search limits significantly increase the rejection
speed for large non-zip files.

66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945743.

128K is what Info-Zip uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.

[1] Info-Zip is extremely old and does not have an official git repo to
    link to. However, an unofficial fork can be found here:
    bb0c4755d4/zipfile.c (L4073)
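
The two tunables discussed above (the read window and the cap on how far back to look for the EOCDR) can be illustrated with a minimal sketch. This is not the crate's implementation: `find_eocdr`, `window_size` and `search_limit` are hypothetical names chosen for illustration, and the sketch only locates the 4-byte signature without validating the record that follows it.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Little-endian bytes of the end-of-central-directory signature 0x06054b50.
const EOCDR_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

/// Hypothetical helper (not part of the zip crate's API): scan backward from
/// the end of `reader` in chunks of `window_size` bytes, giving up once
/// `search_limit` bytes from the end have been covered.
/// Returns the offset of the last signature found, if any.
pub fn find_eocdr<R: Read + Seek>(
    reader: &mut R,
    window_size: usize,
    search_limit: u64,
) -> io::Result<Option<u64>> {
    assert!(window_size >= EOCDR_SIGNATURE.len());
    let file_len = reader.seek(SeekFrom::End(0))?;
    // Never look further back than this offset.
    let lower_bound = file_len.saturating_sub(search_limit);
    // Consecutive windows overlap by 3 bytes so that a signature straddling
    // a window boundary is still seen whole in the next window.
    let overlap = (EOCDR_SIGNATURE.len() - 1) as u64;
    let mut buf = vec![0u8; window_size];
    let mut window_end = file_len;

    while window_end > lower_bound {
        let window_start = window_end
            .saturating_sub(window_size as u64)
            .max(lower_bound);
        let len = (window_end - window_start) as usize;
        reader.seek(SeekFrom::Start(window_start))?;
        reader.read_exact(&mut buf[..len])?;
        // Search from the back of the window: the last match wins.
        if let Some(pos) = buf[..len]
            .windows(EOCDR_SIGNATURE.len())
            .rposition(|w| w == &EOCDR_SIGNATURE[..])
        {
            return Ok(Some(window_start + pos as u64));
        }
        if window_start == lower_bound {
            break;
        }
        window_end = window_start + overlap;
    }
    Ok(None)
}
```

Under these assumptions, calling `find_eocdr(&mut file, 8192, 128 * 1024)` on a large non-zip file reads at most about 128 KiB before returning `Ok(None)`, whereas an unlimited search would have to read the whole file. That difference is what the "whole file" and "last 128k" rows above measure.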

---------

Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-11-19 18:49:35 +00:00
.github ci(fuzz): Switch to alf for faster fuzzing (#245) 2024-11-19 14:25:55 +00:00
benches perf: Faster cde rejection (#255) 2024-11-19 18:49:35 +00:00
examples Merge branch 'master' into utf8_extra_fields 2024-06-02 17:52:04 -07:00
fuzz_read ci(fuzz): Switch to alf for faster fuzzing (#245) 2024-11-19 14:25:55 +00:00
fuzz_write ci(fuzz): Switch to alf for faster fuzzing (#245) 2024-11-19 14:25:55 +00:00
security-advisories in-source vulnerability tracking 2023-09-19 18:55:01 +01:00
src perf: Faster cde rejection (#255) 2024-11-19 18:49:35 +00:00
tests perf: Faster cde rejection (#255) 2024-11-19 18:49:35 +00:00
.gitattributes test: Add .gitattributes to force test data files to be binary 2024-05-21 09:15:44 -07:00
.gitignore ci(fuzz): Switch to alf for faster fuzzing (#245) 2024-11-19 14:25:55 +00:00
.whitesource Add .whitesource configuration file 2023-04-23 21:33:22 +00:00
Cargo.toml perf: Faster cde rejection (#255) 2024-11-19 18:49:35 +00:00
CHANGELOG.md chore: release (#234) 2024-08-19 20:27:21 +00:00
cliff.toml doc: exclude doc updates from CHANGELOG 2024-04-22 19:18:15 -07:00
CODE_OF_CONDUCT.md doc: veeeery small fix to CoC 2022-01-23 17:35:39 +03:00
CONTRIBUTING.md Make CONTRIBUTING.md link to pull_request_template.md 2024-05-09 19:41:26 -07:00
LICENSE doc: Add some missing license information 2024-05-19 11:47:12 -07:00
pull_request_template.md docs: Update pull_request_template.md 2024-07-20 20:44:42 -07:00
README.md docs: Update list of supported features (#230) 2024-08-05 17:15:45 +00:00
release-plz.toml doc: exclude doc updates from CHANGELOG 2024-04-22 19:18:15 -07:00

zip


Documentation

Info

A zip library for Rust which supports reading and writing of simple ZIP files. Formerly hosted at https://github.com/zip-rs/zip2.

Supported compression formats:

  • stored (i.e. none)
  • deflate
  • deflate64 (decompression only)
  • bzip2
  • zstd
  • lzma (decompression only)
  • xz (decompression only)

Currently unsupported zip extensions:

  • Multi-disk

Features

The features available are:

  • aes-crypto: Enables decryption of files which were encrypted with AES. Supports AE-1 and AE-2 methods.
  • deflate: Enables compressing and decompressing with an unspecified implementation (that may change in future versions) of the deflate compression algorithm, which is the default for zip files. Supports compression quality 1..=264.
  • deflate-flate2: Combine this with any flate2 feature flag that enables a back-end, to support deflate compression at quality 1..=9.
  • deflate-zopfli: Enables deflating files with the zopfli library (used when compression quality is 10..=264). This is the most effective deflate implementation available, but also among the slowest.
  • deflate64: Enables the deflate64 compression algorithm. Only decompression is supported.
  • lzma: Enables the LZMA compression algorithm. Only decompression is supported.
  • bzip2: Enables the BZip2 compression algorithm.
  • time: Enables features using the time crate.
  • chrono: Enables converting last-modified zip::DateTime to and from chrono::NaiveDateTime.
  • zstd: Enables the Zstandard compression algorithm.

By default aes-crypto, bzip2, deflate, deflate64, lzma, time and zstd are enabled.
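
The quality ranges quoted in the feature list can be read as a simple mapping from level to deflate implementation. The sketch below only mirrors those documented ranges; `DeflateBackend` and `backend_for_level` are hypothetical names, not types or functions from this crate, and the real selection also depends on which feature flags are enabled at build time.

```rust
/// Hypothetical label for the deflate implementation that handles a level.
#[derive(Debug, PartialEq, Eq)]
pub enum DeflateBackend {
    /// Quality 1..=9, per the deflate-flate2 feature description.
    Flate2,
    /// Quality 10..=264, per the deflate-zopfli feature description.
    Zopfli,
}

/// Map a compression quality to the backend the feature list associates
/// with it. Levels outside 1..=264 are not covered by the documented ranges.
pub fn backend_for_level(level: i64) -> Option<DeflateBackend> {
    match level {
        1..=9 => Some(DeflateBackend::Flate2),
        10..=264 => Some(DeflateBackend::Zopfli),
        _ => None,
    }
}
```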

The following feature flags are deprecated:

  • deflate-miniz: Use flate2's default backend for compression. Currently the same as deflate.

MSRV

Our current Minimum Supported Rust Version is 1.73. When adding features, we will follow these guidelines:

  • We will always support the latest four minor Rust versions. This gives you a 6 month window to upgrade your compiler.
  • Any change to the MSRV will be accompanied with a minor version bump.

Examples

See the examples directory for:

  • How to write a file to a zip.
  • How to write a directory of files to a zip (using walkdir).
  • How to extract a zip file.
  • How to extract a single file from a zip.
  • How to read a zip from the standard input.
  • How to append a directory to an existing archive.

Fuzzing

Fuzzing is supported through cargo fuzz. To install cargo fuzz:

cargo install cargo-fuzz

To list fuzz targets:

cargo +nightly fuzz list

To start fuzzing zip extraction:

cargo +nightly fuzz run fuzz_read

To start fuzzing zip creation:

cargo +nightly fuzz run fuzz_write