* Use the tempfile crate instead of the tempdir crate (which is deprecated)
https://github.com/rust-lang-deprecated/tempdir?tab=readme-ov-file#deprecation-note
* perf: Add benchmark that measures the rejection speed of a large non-zip file
* perf: Speed up non-zip rejection by increasing END_WINDOW_SIZE
I tested several END_WINDOW_SIZEs across 2 machines:
Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
512: test parse_large_non_zip ... bench: 30,450,608 ns/iter (+/- 673,910)
4096: test parse_large_non_zip ... bench: 7,741,366 ns/iter (+/- 521,101)
8192: test parse_large_non_zip ... bench: 5,807,443 ns/iter (+/- 546,227)
16384: test parse_large_non_zip ... bench: 4,794,314 ns/iter (+/- 419,114)
32768: test parse_large_non_zip ... bench: 4,262,897 ns/iter (+/- 397,582)
65536: test parse_large_non_zip ... bench: 4,060,847 ns/iter (+/- 280,964)
Machine 2: Debian testing, x86_64 (tmpfs /tmp)
512: test parse_large_non_zip ... bench: 65,132,581 ns/iter (+/- 7,429,976)
4096: test parse_large_non_zip ... bench: 14,109,503 ns/iter (+/- 2,892,086)
8192: test parse_large_non_zip ... bench: 9,942,500 ns/iter (+/- 1,886,063)
16384: test parse_large_non_zip ... bench: 8,205,851 ns/iter (+/- 2,902,041)
32768: test parse_large_non_zip ... bench: 7,012,011 ns/iter (+/- 2,222,879)
65536: test parse_large_non_zip ... bench: 6,577,275 ns/iter (+/- 881,546)
In both cases END_WINDOW_SIZE=8192 ran roughly 5-7x faster than 512 (5.2x on
machine 1, 6.6x on machine 2), and sizes above 8192 gave only diminishing returns.
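A minimal sketch of why the window size matters. It assumes the EOCD search reads the file backwards one window at a time, and the helper `find_eocd_backwards` is hypothetical, not the crate's real code. Each window costs one seek and one read, so a non-zip file of N bytes needs roughly N / END_WINDOW_SIZE reads. Going from 512 to 8192 cuts the number of reads about 16-fold while the byte scan itself stays the same, which is one plausible explanation for why the gains flatten out.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// End of Central Directory Record signature, "PK\x05\x06" (0x06054b50 little-endian).
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

/// Hypothetical helper: scans `reader` backwards from the end, reading
/// `window_size` bytes per read call, and returns the offset of the last
/// EOCD signature found, if any.
fn find_eocd_backwards<R: Read + Seek>(
    reader: &mut R,
    window_size: usize,
) -> io::Result<Option<u64>> {
    // Each window must hold a full signature, or the scan cannot advance.
    assert!(window_size >= EOCD_SIGNATURE.len(), "window too small");
    let overlap = EOCD_SIGNATURE.len() as u64 - 1;
    let mut buf = vec![0u8; window_size];
    // Exclusive end of the region that has not been searched yet.
    let mut window_end = reader.seek(SeekFrom::End(0))?;
    loop {
        let window_start = window_end.saturating_sub(window_size as u64);
        let len = (window_end - window_start) as usize;
        if len < EOCD_SIGNATURE.len() {
            return Ok(None);
        }
        // One seek plus one read per window: a bigger window means fewer calls.
        reader.seek(SeekFrom::Start(window_start))?;
        reader.read_exact(&mut buf[..len])?;
        // Search right-to-left so the signature closest to the end wins.
        if let Some(pos) = buf[..len]
            .windows(EOCD_SIGNATURE.len())
            .rposition(|w| w == &EOCD_SIGNATURE[..])
        {
            return Ok(Some(window_start + pos as u64));
        }
        if window_start == 0 {
            return Ok(None);
        }
        // Overlap the next window by 3 bytes so a signature that straddles
        // the window boundary is still found.
        window_end = window_start + overlap;
    }
}
```

With a real file this would be called as `find_eocd_backwards(&mut file, 8192)`; on a non-zip file it walks the whole file, which is what the next change addresses.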
* perf: Speed up non-zip rejection by limiting search for EOCDR.
I benchmarked several search sizes across 2 machines
(all with END_WINDOW_SIZE = 8192):
Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file: test parse_large_non_zip ... bench: 5,773,801 ns/iter (+/- 411,277)
last 128k: test parse_large_non_zip ... bench: 54,402 ns/iter (+/- 4,126)
last 66,000: test parse_large_non_zip ... bench: 36,152 ns/iter (+/- 4,293)
Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file: test parse_large_non_zip ... bench: 9,942,306 ns/iter (+/- 1,963,522)
last 128k: test parse_large_non_zip ... bench: 73,604 ns/iter (+/- 16,662)
last 66,000: test parse_large_non_zip ... bench: 41,349 ns/iter (+/- 16,812)
As you might expect, these limits significantly increase the rejection speed for
large non-zip files: roughly 100x or more compared with searching the whole file.
66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945743.
128K is what Info-ZIP uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.
[1] Info-ZIP is extremely old and does not have an official git repo to
link to. However, an unofficial fork can be found here:
bb0c4755d4/zipfile.c (L4073)
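A sketch of the search limit, again using a hypothetical helper (`find_eocd_in_tail`) rather than the crate's actual implementation. It reads at most `search_limit` bytes from the end, so rejecting a non-zip file costs the same no matter how large the file is. The constants show one way to see why 66,000 is enough for a well-formed archive with no trailing data: the EOCD record is 22 bytes plus a comment of at most 65,535 bytes. Reading "128K" as 128 KiB (131,072 bytes) is an assumption.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// End of Central Directory Record signature, "PK\x05\x06".
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// 22-byte fixed EOCD record + comment of at most u16::MAX bytes: without trailing
/// data, the EOCD signature always starts within the last 65,557 bytes.
const MAX_EOCD_DISTANCE: u64 = 22 + u16::MAX as u64;
/// "128K" read as 128 KiB (assumed exact value).
const SEARCH_LIMIT: u64 = 128 * 1024;

/// Hypothetical helper: looks for the EOCD signature only in the last
/// `search_limit` bytes, so the cost is bounded regardless of file size.
fn find_eocd_in_tail<R: Read + Seek>(
    reader: &mut R,
    search_limit: u64,
) -> io::Result<Option<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    // Never look further back than `search_limit` bytes from the end.
    let search_start = file_len.saturating_sub(search_limit);
    reader.seek(SeekFrom::Start(search_start))?;
    let mut tail = vec![0u8; (file_len - search_start) as usize];
    reader.read_exact(&mut tail)?;
    // The last match in the tail is the candidate closest to the end of the file.
    Ok(tail
        .windows(EOCD_SIGNATURE.len())
        .rposition(|w| w == &EOCD_SIGNATURE[..])
        .map(|pos| search_start + pos as u64))
}
```

Presumably the extra margin of 128 KiB over 65,557 bytes tolerates some data appended after the archive, at the cost of reading about twice as many bytes as 66,000 would.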
---------
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
* test(fuzz): Migrate to afl++ for fuzzing
* build: Exclude new fuzz binaries
* chore: Fix new warning
* ci: Use cargo action for format check
* deps: Update constant_time_eq and flate2
* ci: Bug fix for file paths
* ci: Bug fix: working directory is parent of repository root
* ci: Bug fix: remove stray `cd` commands
* ci: Bug fix? Make paths explicitly descend from workspace root
* ci: Bug fix? Assume github.workspace is the repo root
* test(fuzz): Commit files that were previously missing
* ci(fuzz): Bug fix for fuzz_write_with_no_features
* ci(fuzz): Bug fix: no -V arg for cmin
* ci(fuzz): Bug fix: no -a arg for cmin
* Bug fix: replace colons with dashes in filenames
* style: Fix 2 clippy warnings
* style: Fix another clippy warning in some configs
* ci(fuzz): Enable renaming in all fuzz jobs
* ci(fuzz): Fix: need to rename files in multiple dirs
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
* ci(fuzz): Install `rename` tool
* ci(fuzz): Fix redundant steps and too-late install of `rename`
* ci(fuzz): fix? replace multiple colons
---------
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
* fix: Rare combination of settings could lead to writing a corrupt archive with overlength extra data
* fix: Previous fix was breaking alignment
* style: cargo fmt --all
* fix: ZIP64 header was being written twice
* style: cargo fmt --all
* ci(fuzz): Add check that file-creation options are individually valid
* fix: Need to update extra_data_start in deep_copy_file
* style: cargo fmt --all
* test(fuzz): fix bug in Arbitrary impl
* fix: Cursor-position bugs when merging archives or opening for append
* fix: unintended feature dependency
* style: cargo fmt --all
* fix: merge_contents was miscalculating new start positions for absorbed archive's files
* fix: shallow_copy_file needs to reset CDE location since the CDE is copied
* fix: ZIP64 header was being written after AES header location was already calculated
* fix: ZIP64 header was being counted twice when writing extra-field length
* fix: deep_copy_file was positioning cursor incorrectly
* test(fuzz): Reimplement Debug so that it prints the method calls actually made
* test(fuzz): Fix issues with `Option<&mut Formatter>`
* chore: Partial debug
* chore: Revert: `merge_contents` already adjusts header_start and data_start
* chore: Revert unused `mut`
* style: cargo fmt --all
* refactor: eliminate a magic number for CDE block size
* chore: WIP: fix bugs
* refactor: Minor refactors
* refactor: eliminate a magic number for CDE block size
* refactor: Minor refactors
* refactor: Can use cde_start_pos to locate ZIP64 end locator
* chore: Fix import that can no longer be feature-gated
* chore: Fix import that can no longer be feature-gated
* refactor: Confusing variable name
* style: cargo fmt --all and fix Clippy warnings
* style: fix another Clippy warning
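The extra-field fixes above are easier to follow with the on-disk layout in mind. Each extra-field record is a 2-byte header ID, a 2-byte data size, then the data, and the extra-field length in a file header is a 16-bit value, so all records together must fit in 65,535 bytes. This sketch uses hypothetical helpers (`push_extra_field`, `extra_field_length`), not the crate's API: it rejects overlength data up front and derives the header's length from the bytes actually buffered, which is one way to avoid counting a ZIP64 field twice.

```rust
/// ZIP64 extended information extra field header ID.
const ZIP64_EXTRA_FIELD_ID: u16 = 0x0001;

/// Hypothetical helper: appends one extra-field record (header ID, data size,
/// data) unless the total would exceed the 16-bit extra-field length.
fn push_extra_field(extra: &mut Vec<u8>, header_id: u16, data: &[u8]) -> Result<(), String> {
    let limit = usize::from(u16::MAX);
    // Each record costs a 4-byte header (ID + size) plus its data.
    let new_total = extra.len() + 4 + data.len();
    if new_total > limit {
        return Err(format!(
            "extra data for field 0x{header_id:04x} would be {new_total} bytes (max {limit})"
        ));
    }
    extra.extend_from_slice(&header_id.to_le_bytes());
    // Safe cast: data.len() <= new_total <= u16::MAX.
    extra.extend_from_slice(&(data.len() as u16).to_le_bytes());
    extra.extend_from_slice(data);
    Ok(())
}

/// Hypothetical helper: the length written to the file header is computed
/// once, from the buffered records, so no field is counted twice.
fn extra_field_length(extra: &[u8]) -> u16 {
    assert!(extra.len() <= usize::from(u16::MAX), "extra data is overlength");
    extra.len() as u16
}
```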
---------
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
* refactor: eliminate a magic number for CDE block size
* refactor: Minor refactors
* refactor: Can use cde_start_pos to locate ZIP64 end locator
* chore: Fix import that can no longer be feature-gated
* chore: Fix import that can no longer be feature-gated
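A sketch of locating the ZIP64 end-of-central-directory locator from the start of the classic EOCD record. In the ZIP format the locator is a fixed 20-byte structure placed immediately before the EOCD record, so its position is that start offset minus 20. This assumes `cde_start_pos` refers to the EOCD record's start offset; `zip64_locator_pos` is a hypothetical helper that works on an in-memory archive.

```rust
/// ZIP64 end of central directory locator signature, "PK\x06\x07".
const ZIP64_LOCATOR_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x06, 0x07];
/// Fixed locator size: signature (4), disk number (4),
/// offset of the ZIP64 EOCD record (8), total number of disks (4).
const ZIP64_LOCATOR_SIZE: u64 = 20;

/// Hypothetical helper: returns the locator's offset if a ZIP64 locator
/// sits immediately before the EOCD record starting at `eocd_start`.
fn zip64_locator_pos(archive: &[u8], eocd_start: u64) -> Option<u64> {
    let pos = eocd_start.checked_sub(ZIP64_LOCATOR_SIZE)?;
    if pos >= archive.len() as u64 {
        return None;
    }
    let start = pos as usize;
    let signature = archive.get(start..start + ZIP64_LOCATOR_SIGNATURE.len())?;
    (signature == &ZIP64_LOCATOR_SIGNATURE[..]).then_some(pos)
}
```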
Commit messages in PRs no longer need to follow Conventional Commits, since we now squash-merge PRs.
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>