402 commits

Author SHA1 Message Date
Richard Ivánek
33c71ccc80
fix: rewrite the EOCD/EOCD64 detection to fix extreme performance regression (#247)
* fix: resolve clippy warning in nightly

* wip: major rework of cde location

* wip: rework CDE lookup

* refactor: magic finder, eocd lookup retry

* wip: handle empty zips

* fix: satisfy tests, add documentation

* chore: remove unused dependencies

* feat: support both zip32 and zip64 comments

* feat: add zip64 comment functions to ZipWriter

* fix: first pass on maintainer comments

* fix: continue searching for EOCD when the central directory is invalid

* chore: satisfy clippy lints

* chore: satisfy style_and_docs

* feat: support both directions in MagicFinder, correctly find first CDFH

* fix: more checks to EOCD parsing, move comment size error from parse to write

* fix: use saturating add when checking eocd64 record_size upper bound

* fix: correctly handle mid window offsets in forward mode

* fix: compare maximum possible comment length against file size, not search region end

* feat: handle zip64 detection as a hint

* fix: detect oversized central directories when locating EOCD64

* fix: oopsie
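
As a minimal sketch of the "saturating add" item above: a size field read from the archive is untrusted, so adding it to an offset can overflow `u64`. The function name and parameters below are hypothetical and are not taken from the crate's source.

```rust
/// Hypothetical bounds check: does a ZIP64 EOCD record that starts at
/// `record_start` and claims `record_size` bytes end at or before `search_end`?
fn record_fits(record_start: u64, record_size: u64, search_end: u64) -> bool {
    // A plain `record_start + record_size` panics in debug builds and wraps
    // around in release builds when `record_size` is close to `u64::MAX`.
    // `saturating_add` pins the sum at `u64::MAX`, so the comparison below
    // rejects the oversized record instead of accepting a wrapped value.
    record_start.saturating_add(record_size) <= search_end
}
```

With a wrapping add, an offset of 100 and a claimed size of `u64::MAX` would sum to 99 and pass the check; the saturating form rejects it.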

---------

Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-12-16 03:32:55 +00:00
David Caldwell
73143a0ad6
perf: Faster cde rejection (#255)
* Use the tempfile crate instead of the tempdir crate (which is deprecated)

https://github.com/rust-lang-deprecated/tempdir?tab=readme-ov-file#deprecation-note

* perf: Add benchmark that measures the rejection speed of a large non-zip file

* perf: Speed up non-zip rejection by increasing END_WINDOW_SIZE

I tested several END_WINDOW_SIZEs across 2 machines:

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
512:   test parse_large_non_zip  ... bench:  30,450,608 ns/iter (+/- 673,910)
4096:  test parse_large_non_zip  ... bench:   7,741,366 ns/iter (+/- 521,101)
8192:  test parse_large_non_zip  ... bench:   5,807,443 ns/iter (+/- 546,227)
16384: test parse_large_non_zip  ... bench:   4,794,314 ns/iter (+/- 419,114)
32768: test parse_large_non_zip  ... bench:   4,262,897 ns/iter (+/- 397,582)
65536: test parse_large_non_zip  ... bench:   4,060,847 ns/iter (+/- 280,964)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
512:   test parse_large_non_zip  ... bench:  65,132,581 ns/iter (+/- 7,429,976)
4096:  test parse_large_non_zip  ... bench:  14,109,503 ns/iter (+/- 2,892,086)
8192:  test parse_large_non_zip  ... bench:   9,942,500 ns/iter (+/- 1,886,063)
16384: test parse_large_non_zip  ... bench:   8,205,851 ns/iter (+/- 2,902,041)
32768: test parse_large_non_zip  ... bench:   7,012,011 ns/iter (+/- 2,222,879)
65536: test parse_large_non_zip  ... bench:   6,577,275 ns/iter (+/- 881,546)

In both cases END_WINDOW_SIZE=8192 performed about 5x to 6.5x better than 512
(5.2x on Machine 1, 6.5x on Machine 2), and >8192 didn't make much of a
difference on top of that (at most about 1.5x more at 65536).

* perf: Speed up non-zip rejection by limiting search for EOCDR.

I benchmarked several search sizes across 2 machines
(these benches are using an 8192 END_WINDOW_SIZE):

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   5,773,801 ns/iter (+/- 411,277)
last 128k:    test parse_large_non_zip              ... bench:      54,402 ns/iter (+/- 4,126)
last 66,000:  test parse_large_non_zip              ... bench:      36,152 ns/iter (+/- 4,293)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   9,942,306 ns/iter (+/- 1,963,522)
last 128k:    test parse_large_non_zip              ... bench:      73,604 ns/iter (+/- 16,662)
last 66,000:  test parse_large_non_zip              ... bench:      41,349 ns/iter (+/- 16,812)

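As you might expect, these significantly increase the rejection speed for
large non-zip files: limiting the search to the last 128k is over 100x
faster than scanning the whole file on both machines.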

66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945743.

128K is what Info-Zip uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.

[1] Info-zip is extremely old and does not have an official git repo to
    link to. However, an unofficial fork can be found here:
    bb0c4755d4/zipfile.c (L4073)
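
To illustrate the two tuning knobs benchmarked in this commit, here is a minimal sketch of a bounded backward search for the EOCD signature. It is not the zip crate's actual code: `find_eocd`, `SEARCH_LIMIT`, and `WINDOW_SIZE` are hypothetical names, with the limit set to the 128K figure and the window to the 8192 value discussed above. For reference, the EOCD record is 22 bytes plus a comment of at most 65,535 bytes, so it can start at most 65,557 bytes before the end of the file, which the older 66,000 limit also covers.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Signature that starts an End Of Central Directory record ("PK\x05\x06").
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Fixed-size part of the EOCD record, excluding the trailing comment.
const EOCD_MIN_SIZE: u64 = 22;
/// Hypothetical search limit: only the last 128 KiB of the file is scanned.
const SEARCH_LIMIT: u64 = 128 * 1024;
/// Hypothetical read window: bytes fetched per read while scanning backward.
const WINDOW_SIZE: u64 = 8192;

/// Returns the offset of the last EOCD signature within the final
/// `SEARCH_LIMIT` bytes of `reader`, or `None` if there is none.
fn find_eocd<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    if file_len < EOCD_MIN_SIZE {
        return Ok(None); // too short to hold even an empty EOCD record
    }
    let lower_bound = file_len.saturating_sub(SEARCH_LIMIT);
    let mut window_end = file_len;
    let mut buf = vec![0u8; WINDOW_SIZE as usize];
    while window_end > lower_bound {
        let window_start = window_end.saturating_sub(WINDOW_SIZE).max(lower_bound);
        let len = (window_end - window_start) as usize;
        reader.seek(SeekFrom::Start(window_start))?;
        reader.read_exact(&mut buf[..len])?;
        // Scan this window from its end, since the EOCD sits near the end of the file.
        let hit = buf[..len]
            .windows(EOCD_SIGNATURE.len())
            .rposition(|w| w == EOCD_SIGNATURE.as_slice());
        if let Some(pos) = hit {
            return Ok(Some(window_start + pos as u64));
        }
        if window_start == lower_bound {
            break; // reached the search limit without a match
        }
        // Overlap by 3 bytes so a signature straddling two windows is still seen.
        window_end = window_start + 3;
    }
    Ok(None)
}
```

On a large non-zip file this reads roughly 128 KiB before giving up, rather than the whole file. A real parser would also validate the record it finds and keep searching if the record turns out to be invalid.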

---------

Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-11-19 18:49:35 +00:00
Richard Ivánek
06632924e8
fix: resolve clippy warning in nightly (#252) 2024-10-21 05:05:49 +00:00
nick
af33ed343d
feat: Expose ZipArchive::central_directory_start (#232) 2024-08-11 12:00:08 +00:00
Chris Hennick
6d8ab6224b
fix: (#33) Rare combination of settings could lead to writing a corrupt archive with overlength extra data, and data_start locations when reading the archive back were also wrong (#221)
* fix: Rare combination of settings could lead to writing a corrupt archive with overlength extra data

* fix: Previous fix was breaking alignment

* style: cargo fmt --all

* fix: ZIP64 header was being written twice

* style: cargo fmt --all

* ci(fuzz): Add check that file-creation options are individually valid

* fix: Need to update extra_data_start in deep_copy_file

* style: cargo fmt --all

* test(fuzz): fix bug in Arbitrary impl

* fix: Cursor-position bugs when merging archives or opening for append

* fix: unintended feature dependency

* style: cargo fmt --all

* fix: merge_contents was miscalculating new start positions for absorbed archive's files

* fix: shallow_copy_file needs to reset CDE location since the CDE is copied

* fix: ZIP64 header was being written after AES header location was already calculated

* fix: ZIP64 header was being counted twice when writing extra-field length

* fix: deep_copy_file was positioning cursor incorrectly

* test(fuzz): Reimplement Debug so that it prints the method calls actually made

* test(fuzz): Fix issues with `Option<&mut Formatter>`

* chore: Partial debug

* chore: Revert: `merge_contents` already adjusts header_start and data_start

* chore: Revert unused `mut`

* style: cargo fmt --all

* refactor: eliminate a magic number for CDE block size

* chore: WIP: fix bugs

* refactor: Minor refactors

* refactor: eliminate a magic number for CDE block size

* refactor: Minor refactors

* refactor: Can use cde_start_pos to locate ZIP64 end locator

* chore: Fix import that can no longer be feature-gated

* chore: Fix import that can no longer be feature-gated

* refactor: Confusing variable name

* style: cargo fmt --all and fix Clippy warnings

* style: fix another Clippy warning

---------

Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-07-29 02:24:07 +00:00
Chris Hennick
3ecd65176c
refactor: Eliminate some magic numbers and unnecessary path prefixes (#225)
* refactor: eliminate a magic number for CDE block size

* refactor: Minor refactors

* refactor: Can use cde_start_pos to locate ZIP64 end locator

* chore: Fix import that can no longer be feature-gated

* chore: Fix import that can no longer be feature-gated
2024-07-28 01:43:44 +00:00
Chris Hennick
a60bd79826
Merge pull request #210 from a1phyr/multiple_refactors
Multiple refactors
2024-07-20 01:29:39 +00:00
Chris Hennick
7471cf526f
refactor: change invalid_state() return type to io::Result<T>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-07-19 13:32:23 -07:00
Chris Hennick
c14986806a Fix divergence from origin/master 2024-07-18 21:02:19 +00:00
Chris Hennick
81b5fb6088 Update fuzz_write corpus to use only smaller entries 2024-07-18 21:02:16 +00:00
Chris Hennick
6106a2bf0b
Merge pull request #201 from nichmor/fix/soft-links-should-remain-the-same
fix: soft links should remain the same
2024-07-18 17:26:17 +00:00
Chris Hennick
6b797b1ba9
Merge pull request #64 from zip-rs/oldpr368
feat: Added function to get if a file is encrypted or not
2024-07-17 17:25:50 +00:00
Chris Hennick
5632e7f25a
Merge pull request #69 from zip-rs/oldpr369
feat: Add by_name_seek() for Stored zips
2024-07-17 17:25:19 +00:00
Chris Hennick
b8c145717b
Merge pull request #212 from a1phyr/improve_unsafe_code
refactor: Improve `FixedSizeBlock`
2024-07-17 17:24:58 +00:00
Benoît du Garreau
e9b13121cc Make make_crypto_reader take ZipFileData directly 2024-07-16 10:54:52 +02:00
Benoît du Garreau
deb71baf9b Remove crypto_reader field from ZipFile 2024-07-16 10:54:51 +02:00
Benoît du Garreau
b01d5c9b1f Split reader and decompressor 2024-07-16 10:47:11 +02:00
Chris Hennick
bde1bb9ef1
Merge branch 'master' into fix/soft-links-should-remain-the-same 2024-07-15 09:01:34 -07:00
Benoît du Garreau
7a8048b159 Improve FixedSizeBlock
- Remove allocations
- Make unsafe code easier to check
- Prevent potential `repr(Rust)` fields reordering
2024-07-12 11:11:17 +02:00
Benoît du Garreau
83b1273fab Improve several Read methods on ZipFile 2024-07-11 14:31:31 +02:00
nichmor
a3232a2119
Merge branch 'master' into fix/soft-links-should-remain-the-same 2024-07-08 17:15:38 +03:00
Chris Hennick
57f01ba946
chore: Fix build errors 2024-07-06 14:26:37 -07:00
Chris Hennick
8635b16316
Merge branch 'master' into oldpr368 2024-07-06 12:38:27 -07:00
Chris Hennick
1d551ff23c
Merge branch 'master' into oldpr369
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-07-06 12:19:59 -07:00
nichmor
955ea393ee fix: read entire target and don't transform symlink to absolute 2024-06-26 16:24:04 +03:00
Chris Hennick
0807029a63
Merge branch 'master' into xz 2024-06-22 16:58:54 -07:00
Chris Hennick
b051ca3d47
chore: Fix a bug introduced by c934c824 2024-06-22 16:57:49 -07:00
Chris Hennick
5b749c4ed9
Merge branch 'master' into xz 2024-06-21 23:15:47 -07:00
Chris Hennick
fcc4fa93e3
style: Fix a Clippy warning re unnecessary into_iter() 2024-06-21 23:15:23 -07:00
Chris Hennick
c0ede17cd0
Merge branch 'master' into xz 2024-06-21 20:29:30 -07:00
Chris Hennick
e20fd7959a
style: cargo fmt --all 2024-06-21 20:28:43 -07:00
Chris Hennick
9a2391358c
Merge branch 'master' into xz 2024-06-21 20:26:07 -07:00
Chris Hennick
c934c82405
fix: Some archives with over u16::MAX files were handled incorrectly or slowly (#189) 2024-06-21 20:22:15 -07:00
LoveSy
421e1dd8fb
feat: support XZ decompression 2024-06-22 11:12:53 +08:00
Chris Hennick
26e6462a8d
style: cargo fmt --all 2024-06-21 10:34:03 -07:00
Chris Hennick
27c7fa4cd4
chore: Fix a failing unit test 2024-06-20 13:40:12 -07:00
Chris Hennick
f1b617d112
fix: Check number of files when deciding whether a CDE is the real one 2024-06-20 04:45:43 -07:00
Chris Hennick
78a38e977a
fix: Could still select a fake CDE over a real one in some cases 2024-06-18 22:33:24 -07:00
Chris Hennick
a895aa57b1
style: cargo fmt --all 2024-06-18 20:11:56 -07:00
Chris Hennick
d309f07010
chore: Fix build errors on older Rust versions 2024-06-18 20:09:50 -07:00
Chris Hennick
9bf914d7d4
fix: May have to consider multiple CDEs before filtering for validity 2024-06-18 19:58:16 -07:00
Chris Hennick
45472486f1
style: Fix a Clippy warning 2024-06-18 12:41:16 -07:00
Chris Hennick
19118f45f3
chore: Fix build 2024-06-18 10:41:35 -07:00
Chris Hennick
cb2d7abde7
fix: We now keep searching for a real CDE header after reading an invalid one from the file comment 2024-06-18 10:31:25 -07:00
Chris Hennick
9568e713bd
style: cargo fmt --all 2024-06-17 18:54:06 -07:00
Chris Hennick
4065f0501f
fix: Always search for data start when opening an archive for append, and reject the header if data appears to start after central directory 2024-06-17 17:44:34 -07:00
Chris Hennick
052f3a133e
fix: ZIP64 header was being written twice when copying a file 2024-06-14 17:09:36 -07:00
Chris Hennick
a770913f7b
fix: ZIP64 header was being written to central header twice 2024-06-14 16:38:11 -07:00
Chris Hennick
fdb79845be
perf: Only build one IndexMap after choosing among the possible valid headers 2024-06-14 15:03:56 -07:00
Chris Hennick
c4bd7a61a5
test: Fix a bug involving ZIP64 field parsing 2024-06-14 13:25:49 -07:00