Commit graph

113 commits

Author SHA1 Message Date
David Caldwell
73143a0ad6
perf: Faster cde rejection (#255)
* Use the tempfile crate instead of the tempdir crate (which is deprecated)

https://github.com/rust-lang-deprecated/tempdir?tab=readme-ov-file#deprecation-note

* perf: Add benchmark that measures the rejection speed of a large non-zip file

* perf: Speed up non-zip rejection by increasing END_WINDOW_SIZE

I tested several END_WINDOW_SIZEs across 2 machines:

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
512:   test parse_large_non_zip  ... bench:  30,450,608 ns/iter (+/- 673,910)
4096:  test parse_large_non_zip  ... bench:   7,741,366 ns/iter (+/- 521,101)
8192:  test parse_large_non_zip  ... bench:   5,807,443 ns/iter (+/- 546,227)
16384: test parse_large_non_zip  ... bench:   4,794,314 ns/iter (+/- 419,114)
32768: test parse_large_non_zip  ... bench:   4,262,897 ns/iter (+/- 397,582)
65536: test parse_large_non_zip  ... bench:   4,060,847 ns/iter (+/- 280,964)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
512:   test parse_large_non_zip  ... bench:  65,132,581 ns/iter (+/- 7,429,976)
4096:  test parse_large_non_zip  ... bench:  14,109,503 ns/iter (+/- 2,892,086)
8192:  test parse_large_non_zip  ... bench:   9,942,500 ns/iter (+/- 1,886,063)
16384: test parse_large_non_zip  ... bench:   8,205,851 ns/iter (+/- 2,902,041)
32768: test parse_large_non_zip  ... bench:   7,012,011 ns/iter (+/- 2,222,879)
65536: test parse_large_non_zip  ... bench:   6,577,275 ns/iter (+/- 881,546)

In both cases END_WINDOW_SIZE=8192 performed roughly 5x to 6.5x better than 512,
and sizes above 8192 didn't make much of a difference on top of that.

* perf: Speed up non-zip rejection by limiting search for EOCDR.

I benchmarked several search sizes across 2 machines
(these benches are using an 8192 END_WINDOW_SIZE):

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   5,773,801 ns/iter (+/- 411,277)
last 128k:    test parse_large_non_zip              ... bench:      54,402 ns/iter (+/- 4,126)
last 66,000:  test parse_large_non_zip              ... bench:      36,152 ns/iter (+/- 4,293)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   9,942,306 ns/iter (+/- 1,963,522)
last 128k:    test parse_large_non_zip              ... bench:      73,604 ns/iter (+/- 16,662)
last 66,000:  test parse_large_non_zip              ... bench:      41,349 ns/iter (+/- 16,812)

As you might expect, these significantly increase the rejection speed for
large non-zip files.

66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945743.

128K is what Info-Zip uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.

[1] Info-Zip is extremely old and does not have an official git repo to
    link to. However, an unofficial fork can be found here:
    bb0c4755d4/zipfile.c (L4073)
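
For illustration only, here is a minimal sketch of the idea described above: scan backwards from the end of the file in fixed-size windows, and give up once the scan passes a lower bound. It is not the crate's actual code. The function name `find_eocdr` and its parameters `max_search` and `window_size` are hypothetical; the 128K limit and the 8192-byte window are the figures quoted in this message, and the signature bytes are the standard "PK\x05\x06" marker of the end-of-central-directory record.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// End-of-central-directory record signature, "PK\x05\x06".
const EOCDR_SIG: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

/// Hypothetical helper (not the crate's API): returns the offset of the last
/// EOCDR signature within the final `max_search` bytes of `reader`, if any.
fn find_eocdr<R: Read + Seek>(
    reader: &mut R,
    max_search: u64,
    window_size: usize,
) -> io::Result<Option<u64>> {
    // The window must be able to hold at least one whole signature.
    assert!(window_size >= EOCDR_SIG.len());

    let file_len = reader.seek(SeekFrom::End(0))?;
    // Never look further back than `max_search` bytes from the end.
    let lower_bound = file_len.saturating_sub(max_search);

    let mut buf = vec![0u8; window_size];
    let mut window_end = file_len;

    while window_end > lower_bound {
        let window_start = window_end
            .saturating_sub(window_size as u64)
            .max(lower_bound);
        let len = (window_end - window_start) as usize;

        reader.seek(SeekFrom::Start(window_start))?;
        reader.read_exact(&mut buf[..len])?;

        // Search this window from its end towards its start.
        if let Some(pos) = buf[..len]
            .windows(EOCDR_SIG.len())
            .rposition(|w| w == EOCDR_SIG.as_slice())
        {
            return Ok(Some(window_start + pos as u64));
        }

        if window_start == lower_bound {
            break; // reached the search limit without a match
        }
        // Overlap the next window by 3 bytes so that a signature straddling
        // the boundary between two windows is still seen in full.
        window_end = window_start + (EOCDR_SIG.len() as u64 - 1);
    }
    Ok(None)
}
```

With `max_search = 128 * 1024` and `window_size = 8192`, a large non-zip file is rejected after at most 17 window reads, regardless of its total size. A larger window means fewer seek and read calls per byte scanned, which is the effect the END_WINDOW_SIZE benchmarks above measure.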

---------

Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-11-19 18:49:35 +00:00
Richard Ivánek
1f2957db1f
fix: resolve new clippy warnings on nightly (#262) 2024-11-18 22:31:26 +00:00
Danny McClanahan
a7c1230dfa
publicly export and document the zip64 threshold constants (#79)
- add doctest for ZIP64_BYTES_THR
2024-07-20 01:52:06 +00:00
Benoît du Garreau
7a8048b159 Improve FixedSizeBlock
- Remove allocations
- Make unsafe code easier to check
- Prevent potential `repr(Rust)` fields reordering
2024-07-12 11:11:17 +02:00
Chris Hennick
8bb3be02c1
refactor: Verify with debug assertions that no FixedSizeBlock expects a multi-byte alignment (#198) 2024-06-21 17:03:44 -07:00
Chris Hennick
d309f07010
chore: Fix build errors on older Rust versions 2024-06-18 20:09:50 -07:00
Chris Hennick
9bf914d7d4
fix: May have to consider multiple CDEs before filtering for validity 2024-06-18 19:58:16 -07:00
Chris Hennick
cb2d7abde7
fix: We now keep searching for a real CDE header after reading an invalid one from the file comment 2024-06-18 10:31:25 -07:00
Chris Hennick
77e718864d
fix: Incorrect behavior following a rare combination of merge_archive, abort_file and deep_copy_file. As well, we now return an error when a file is being copied to itself. 2024-06-13 13:49:27 -07:00
Chris Hennick
68f0a2f481
style: Change len() == 0 to is_empty() 2024-06-12 21:42:08 -07:00
Chris Hennick
057224f9a2
style: Remove unneeded parens 2024-06-12 21:40:39 -07:00
Chris Hennick
5bc1ba910f
fix: path_to_string now properly handles the case of an empty path 2024-06-12 21:39:17 -07:00
Chris Hennick
97245ad68d
chore: Fix a new Clippy warning 2024-06-02 22:04:40 -07:00
Chris Hennick
2725416c0d
chore: Fix a bug and inline deserialize for safety 2024-06-02 22:00:44 -07:00
Chris Hennick
eacc320fe0
chore: Add check for wrong-length blocks, and incorporate fixed-size requirement into the trait name 2024-06-02 21:48:21 -07:00
Chris Hennick
326b2c4582
Revert macro changes
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 13:15:58 -07:00
Chris Hennick
3af70176e3
Remove an unused macro branch
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 13:11:07 -07:00
Chris Hennick
01bb162456
Remove an unused macro branch
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 13:10:44 -07:00
Chris Hennick
1bb0b14456
style: Fix cargo fmt check
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 13:03:00 -07:00
Chris Hennick
18760e9f9d
Switch to debug_assert! for an assert! involving only constants
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 12:58:36 -07:00
Chris Hennick
848309a944
Switch to debug_assert! for an assert! involving only constants
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 12:58:19 -07:00
Chris Hennick
9722dd31e9
Return error if file comment is too long
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 12:57:34 -07:00
Danny McClanahan
d81382b29a
revert limit for search_lower_bound to fix benchmark 2024-05-24 09:00:33 -04:00
Danny McClanahan
a509efc28a
review comments 3 2024-05-24 08:26:38 -04:00
Danny McClanahan
4a784b5636
interpose ZipRawValues into ZipFileData 2024-05-24 07:58:05 -04:00
Danny McClanahan
7c2474f80c
go into_boxed_slice() earlier 2024-05-24 07:54:40 -04:00
Danny McClanahan
d852c222fc
review comments 1 2024-05-24 07:54:40 -04:00
Danny McClanahan
a7fd5874cf
reduce visibility for all the blocks 2024-05-24 07:54:40 -04:00
Danny McClanahan
21d07e192c
add ExtraFieldMagic and Zip64ExtraFieldBlock 2024-05-24 07:54:39 -04:00
Danny McClanahan
8fbc4039a8
lean more on the ::MAGIC trait constants 2024-05-24 07:54:04 -04:00
Danny McClanahan
e1c92e2f21
make SIG_BYTES const 2024-05-24 07:52:31 -04:00
Danny McClanahan
03c92a1184
add to_and_from_le! macro 2024-05-24 07:52:31 -04:00
Danny McClanahan
83cdbadae8
make window size assertions much less complex with Magic 2024-05-24 07:52:31 -04:00
Danny McClanahan
7eb5907622
remove a lot of boilerplate for Block impls 2024-05-24 07:52:31 -04:00
Danny McClanahan
3fa0d84554
make Magic into a wrapper struct 2024-05-24 07:52:31 -04:00
Danny McClanahan
ea308499af
bulk parsing and bulk writing
- use blocks for reading individual file headers
- remove unnecessary option wrapping for stream entries
- create Block trait
- add coerce method to reduce some boilerplate
- add serialize method to reduce more boilerplate
- use to_le! and from_le!
- add test case
- add some docs
- rename a few structs to clarify zip32-only
2024-05-24 07:52:25 -04:00
Danny McClanahan
7a55945743
add benchmarks 2024-05-24 07:39:54 -04:00
Chris Hennick
267ab432cf
chore: partial revert - only &str has chars(), but Box<str> should auto-deref 2024-05-15 16:51:12 -07:00
Chris Hennick
d78f127039
chore: contains_key needs a Box<str>, so generify is_dir to accept one 2024-05-15 16:49:05 -07:00
Chris Hennick
b7ac989013
refactor: is_dir only needs to look at the filename 2024-05-15 16:44:59 -07:00
Chris Hennick
bd473ef75b
perf: Use boxed slice for archive comment, since it can't be concatenated 2024-05-08 15:36:12 -07:00
Chris Hennick
eb063ad432
perf: Optimize for the fact that false signatures can't overlap with real ones 2024-05-08 10:59:32 -07:00
Chris Hennick
e1ef3fc65c
fix: file paths shouldn't start with slashes (#102) 2024-05-06 10:52:52 -07:00
Chris Hennick
52375437dc
fix: Process ZIP files with up to a 65,978-byte comment (https://github.com/zip-rs/zip-old/issues/183) 2024-05-05 19:48:32 -07:00
Chris Hennick
1b2c42b199
style: cargo fmt --all 2024-05-03 15:18:31 -07:00
Chris Hennick
74e76a94ca
chore: Refactor: can short-circuit handling of paths that start with MAIN_SEPARATOR, no matter what MAIN_SEPARATOR is 2024-05-03 15:01:43 -07:00
Chris Hennick
2adbbccb82
perf: Quick filter for paths that contain "/../" or "/./" or start with "./" or "../" 2024-05-03 14:59:35 -07:00
Chris Hennick
0fe12b2ec9
chore: Bug fix: non-canonical path detection when MAIN_SEPARATOR is not slash or occurs twice in a row 2024-05-03 14:34:05 -07:00
Chris Hennick
5cd448802f
chore: Bug fix: must recreate if . or .. is a path element 2024-05-03 14:31:32 -07:00
Chris Hennick
001967186a
perf: Fast handling for separator-free paths 2024-05-03 14:28:14 -07:00