141 commits

Author SHA1 Message Date
Richard Ivánek
33c71ccc80
fix: rewrite the EOCD/EOCD64 detection to fix extreme performance regression (#247)
* fix: resolve clippy warning in nightly

* wip: major rework of cde location

* wip: rework CDE lookup

* refactor: magic finder, eocd lookup retry

* wip: handle empty zips

* fix: satisfy tests, add documentation

* chore: remove unused dependencies

* feat: support both zip32 and zip64 comments

* feat: add zip64 comment functions to ZipWriter

* fix: first pass on maintainer comments

* fix: continue searching for EOCD when the central directory is invalid

* chore: satisfy clippy lints

* chore: satisfy style_and_docs

* feat: support both directions in MagicFinder, correctly find first CDFH

* fix: more checks to EOCD parsing, move comment size error from parse to write

* fix: use saturating add when checking eocd64 record_size upper bound

* fix: correctly handle mid window offsets in forward mode

* fix: compare maximum possible comment length against file size, not search region end

* feat: handle zip64 detection as a hint

* fix: detect oversized central directories when locating EOCD64

* fix: oopsie

---------

Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-12-16 03:32:55 +00:00
David Caldwell
73143a0ad6
perf: Faster cde rejection (#255)
* Use the tempfile crate instead of the tempdir crate (which is deprecated)

https://github.com/rust-lang-deprecated/tempdir?tab=readme-ov-file#deprecation-note

* perf: Add benchmark that measures the rejection speed of a large non-zip file

* perf: Speed up non-zip rejection by increasing END_WINDOW_SIZE

I tested several END_WINDOW_SIZEs across 2 machines:

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
512:   test parse_large_non_zip  ... bench:  30,450,608 ns/iter (+/- 673,910)
4096:  test parse_large_non_zip  ... bench:   7,741,366 ns/iter (+/- 521,101)
8192:  test parse_large_non_zip  ... bench:   5,807,443 ns/iter (+/- 546,227)
16384: test parse_large_non_zip  ... bench:   4,794,314 ns/iter (+/- 419,114)
32768: test parse_large_non_zip  ... bench:   4,262,897 ns/iter (+/- 397,582)
65536: test parse_large_non_zip  ... bench:   4,060,847 ns/iter (+/- 280,964)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
512:   test parse_large_non_zip  ... bench:  65,132,581 ns/iter (+/- 7,429,976)
4096:  test parse_large_non_zip  ... bench:  14,109,503 ns/iter (+/- 2,892,086)
8192:  test parse_large_non_zip  ... bench:   9,942,500 ns/iter (+/- 1,886,063)
16384: test parse_large_non_zip  ... bench:   8,205,851 ns/iter (+/- 2,902,041)
32768: test parse_large_non_zip  ... bench:   7,012,011 ns/iter (+/- 2,222,879)
65536: test parse_large_non_zip  ... bench:   6,577,275 ns/iter (+/- 881,546)

In both cases END_WINDOW_SIZE=8192 was roughly 5x to 6.5x faster than 512, and
sizes above 8192 gave only modest further gains.

* perf: Speed up non-zip rejection by limiting search for EOCDR.

I benchmarked several search sizes across 2 machines
(these benches are using an 8192 END_WINDOW_SIZE):

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   5,773,801 ns/iter (+/- 411,277)
last 128k:    test parse_large_non_zip              ... bench:      54,402 ns/iter (+/- 4,126)
last 66,000:  test parse_large_non_zip              ... bench:      36,152 ns/iter (+/- 4,293)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   9,942,306 ns/iter (+/- 1,963,522)
last 128k:    test parse_large_non_zip              ... bench:      73,604 ns/iter (+/- 16,662)
last 66,000:  test parse_large_non_zip              ... bench:      41,349 ns/iter (+/- 16,812)

As you might expect these significantly increase the rejection speed for
large non-zip files.

66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945743.

128K is what Info-Zip uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.

[1] Info-Zip is extremely old and does not have an official git repo to
    link to. However, an unofficial fork can be found here:
    bb0c4755d4/zipfile.c (L4073)

---------

Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-11-19 18:49:35 +00:00
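The two ideas in the perf commit above (reading the file tail in fixed-size windows, and giving up once a bounded region has been searched) can be illustrated with a minimal sketch. This is not the crate's code: `find_eocd_signature`, `WINDOW_SIZE` and `SEARCH_LIMIT` are hypothetical names, with the values 8192 and 128 KiB taken from the benchmarks in the commit message. Only the signature bytes `PK\x05\x06` are located here; parsing and validating the record is left out.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// End-of-central-directory signature, "PK\x05\x06".
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Bytes read per step (hypothetical name; 8192 is the size the benchmarks favour).
const WINDOW_SIZE: u64 = 8192;
/// How far back from the end of the file to look (hypothetical name; 128 KiB).
const SEARCH_LIMIT: u64 = 128 * 1024;

/// Returns the offset of the last EOCD signature that starts within the final
/// `SEARCH_LIMIT` bytes of `reader`, or `None` if there is none.
pub fn find_eocd_signature<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    let lower_bound = file_len.saturating_sub(SEARCH_LIMIT);
    // Extra bytes read past each window so a signature that straddles
    // two windows is still seen in one piece.
    let overlap = EOCD_SIGNATURE.len() as u64 - 1;
    let mut window_end = file_len;
    let mut buf = Vec::new();

    while window_end > lower_bound {
        let window_start = window_end.saturating_sub(WINDOW_SIZE).max(lower_bound);
        let read_end = (window_end + overlap).min(file_len);
        buf.resize((read_end - window_start) as usize, 0);
        reader.seek(SeekFrom::Start(window_start))?;
        reader.read_exact(&mut buf)?;

        // Scan this window from its end, since the EOCD sits near the end of the file.
        if let Some(pos) = buf
            .windows(EOCD_SIGNATURE.len())
            .rposition(|w| w == EOCD_SIGNATURE.as_slice())
        {
            return Ok(Some(window_start + pos as u64));
        }
        window_end = window_start;
    }
    // Nothing found in the bounded region: reject without reading the rest of the file.
    Ok(None)
}
```

For a large non-zip file this reads at most 128 KiB in 8 KiB steps, which is the effect the benchmarks in the commit message measure.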
nichmor
a3232a2119
Merge branch 'master' into fix/soft-links-should-remain-the-same 2024-07-08 17:15:38 +03:00
Chris Hennick
270fcde96f
test: Require _deflate-any to run new test 2024-07-06 11:56:36 -07:00
Chris Hennick
81b6cf51a3
style: cargo fmt --all 2024-07-05 09:06:05 -07:00
Chris Hennick
80b0025831
test: Remove shell-script version of new test, and move Rust version to a new file 2024-07-05 09:05:51 -07:00
nichmor
3c9b5dbb53 misc: add test to run on unix 2024-06-26 15:50:03 +03:00
nichmor
06a0b4e90e misc: add test that validates the use case 2024-06-26 15:48:31 +03:00
LoveSy
421e1dd8fb
feat: support XZ decompression 2024-06-22 11:12:53 +08:00
Chris Hennick
94df73ea04
chore: Fix boxed_local warning (can borrow instead) 2024-06-14 00:17:21 -07:00
Chris Hennick
e555f8c770
test: Fix a bug 2024-06-14 00:00:37 -07:00
Chris Hennick
fce5e0a2d3
test: Add regression tests for #159 2024-06-04 09:29:33 -07:00
Chris Hennick
6afffaefb7
test: Update an integration test case 2024-06-03 17:47:52 -07:00
Chris Hennick
847e537e86
test: Add unit test for UTF8 extra-field handling 2024-06-02 17:46:55 -07:00
Chris Hennick
eb949ebdef
chore: Update unit tests 2024-05-25 15:05:02 -07:00
Danny McClanahan
011e5afe7b
add test that breaks without the fix 2024-05-24 07:39:55 -04:00
Chris Hennick
2148580a27
doc: Add some missing license information 2024-05-19 11:47:12 -07:00
Chris Hennick
6ae2cfbf52
chore(#132): Attribution for some copied test data 2024-05-17 19:19:07 -07:00
Chris Hennick
492c96c18f
chore: Fix conditionally-unused import 2024-05-15 15:12:18 -07:00
Chris Hennick
c52ec50306
chore: Fix CI failure involving conversion to OsString for symlinks (see my comments on #125) 2024-05-15 14:47:52 -07:00
Chris Hennick
8715d936cb
fix: Extract symlinks into symlinks on Unix and Windows, and fix a bug that affected making directories writable on MacOS 2024-05-13 20:50:40 -07:00
Chris Hennick
3bf0301e39
feat: Add is_symlink method 2024-05-13 19:52:14 -07:00
Chris Hennick
be5836e14d
test: Fix unused imports by moving them inside the cfg-gated test 2024-05-10 15:01:11 -07:00
Chris Hennick
07caf646a0
test: Fix cfg - new test is only needed on Unix and can only run with deflate 2024-05-10 14:30:52 -07:00
Chris Hennick
2ea4e5059f
fix: Extract directory contents on Unix even if the directory doesn't have write permission (https://github.com/zip-rs/zip-old/issues/423) 2024-05-10 14:27:25 -07:00
Chris Hennick
8eb5a75a87
style: Merge patches from code into non_utf8.zip 2024-05-10 09:01:17 -07:00
Chris Hennick
5d3c73a5d5
Merge branch 'master' into dev/ziped
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-09 12:19:48 -07:00
hidez8891
d5f27dfad0
Fixed writing wrong UTF-8 flag
If the UTF-8 flag (general purpose bit 11) is set, file names and comments must be stored as UTF-8 (APPNOTE, Appendix D).
However, the flag was also being set for names in non-UTF-8 encodings (GB18030, Shift_JIS, etc.). This commit fixes that.
2024-05-08 22:22:50 +09:00
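As an illustration of the rule in the commit above, the sketch below sets bit 11 only when the raw name and comment bytes are valid UTF-8. It is an assumption about one way to make that decision, not the change the commit actually made: `utf8_flag_for` is a hypothetical helper, and a validity check is only a heuristic, since bytes in another encoding can occasionally also form valid UTF-8.

```rust
/// General purpose bit 11: file name and comment are encoded as UTF-8.
const UTF8_FLAG: u16 = 1 << 11;

/// Hypothetical helper: returns the flag bits to write for an entry whose
/// name and comment are given as raw bytes in whatever encoding the caller used.
pub fn utf8_flag_for(raw_name: &[u8], raw_comment: &[u8]) -> u16 {
    let both_utf8 =
        std::str::from_utf8(raw_name).is_ok() && std::str::from_utf8(raw_comment).is_ok();
    if both_utf8 {
        UTF8_FLAG
    } else {
        // Non-UTF-8 bytes (for example Shift_JIS) must not be advertised as UTF-8.
        0
    }
}
```

A caller that already knows which encoding it used would pass that knowledge in directly instead of inspecting the bytes.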
Chris Hennick
3ff9428e66
Merge branch 'master' into aes-encryption3
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-06 11:44:55 -07:00
Chris Hennick
2cff4ec936
test: Update reference version-needed-to-open in unit tests (cont'd) 2024-05-06 11:43:01 -07:00
Johannes Löthberg
d096e4dbf1
Add support for writing AES-encrypted files
Signed-off-by: Johannes Löthberg <johannes.loethberg@elokon.com>
2024-05-06 16:37:05 +02:00
Chris Hennick
b59515bbd7
chore: Remove a drop that can no longer be explicit 2024-05-05 19:30:18 -07:00
Chris Hennick
b520c7f517
test: Fix end-to-end test 2024-05-02 17:50:28 -07:00
Chris Hennick
84ae5fc157
refactor: Remove byteorder dependency (#83) 2024-05-02 17:50:27 -07:00
Chris Hennick
9af296d080
style: cargo fmt --all, fix bzip2 error 2024-05-02 10:55:41 -07:00
Chris Hennick
3140276a33
Merge remote-tracking branch 'jans/master' into oldpr437a
# Conflicts:
#	README.md
#	src/cp437.rs
#	src/read.rs
#	src/types.rs
#	src/write.rs
2024-05-02 10:51:01 -07:00
Jan Starke
ccaba9df74
add test case for extended timestamp 2024-05-02 09:34:20 +02:00
awakening
4078bd34cd
fix: Decrypt the read bytes in ZipCrypto instead of entire buffer
Fixes `corrupt deflate stream` panic when extracting a file from encrypted archive (zip-rs/zip#280).
2024-04-27 23:41:32 +07:00
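The class of bug named in the commit above can be shown with a small sketch. A stream cipher advances its key state on every byte it processes, so a `Read` wrapper must decrypt only the `n` bytes the inner reader actually returned; running the cipher over the whole buffer desynchronises the keys after any short read. `ToyCipher` is a made-up stand-in, not the ZipCrypto algorithm, and `DecryptingReader` is a hypothetical name.

```rust
use std::io::{self, Read};

/// Toy stateful stream cipher standing in for ZipCrypto (not the real algorithm).
/// The key advances once per processed byte, independent of the data.
pub struct ToyCipher {
    pub key: u8,
}

impl ToyCipher {
    /// XORs one byte with the current key, then advances the key.
    /// Because the key schedule ignores the data, this both encrypts and decrypts.
    pub fn apply(&mut self, byte: u8) -> u8 {
        let out = byte ^ self.key;
        self.key = self.key.wrapping_mul(31).wrapping_add(7);
        out
    }
}

/// Hypothetical reader that decrypts data as it is read.
pub struct DecryptingReader<R> {
    pub inner: R,
    pub cipher: ToyCipher,
}

impl<R: Read> Read for DecryptingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Only buf[..n] holds fresh ciphertext. Decrypting all of `buf`
        // would advance the key state past the data actually consumed.
        for byte in &mut buf[..n] {
            *byte = self.cipher.apply(*byte);
        }
        Ok(n)
    }
}
```

The difference only shows up when the inner reader returns fewer bytes than the buffer can hold, which is why a bug like this can pass tests that always fill the buffer.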
awakening
b718fdf5d0
test: Move embedded archive from variable to a constant 2024-04-27 23:41:28 +07:00
Chris Hennick
46ff80d294
test: verify that we can read a file with a data descriptor 2024-04-23 11:09:06 -07:00
Chris Hennick
174825229c
Change crate name to "zip" per https://github.com/zip-rs/zip/issues/446#issuecomment-2063837388 2024-04-19 18:50:27 -07:00
Wyatt Herkamp
61afe4dad9
Added ExtendedFileOptions 2024-04-15 16:32:07 -04:00
Chris Hennick
4f3f2d1fca Bug fix: LZMA state is large, so put it in a Box 2024-04-11 13:28:37 -07:00
Chris Hennick
b7fe3f6e4f Add tests and update fuzzing dictionary/corpus for LZMA 2024-04-11 13:14:34 -07:00
Chris Hennick
4b8738f8c1 Update a test 2024-03-13 13:16:15 -07:00
Chris Hennick
dec73ef5c1 Merge branch 'tune_fuzz'
# Conflicts:
#	src/read.rs
2024-03-13 13:09:14 -07:00
Chris Hennick
ece098d393 Make InvalidPassword a kind of ZipError 2024-03-13 13:05:54 -07:00
Chris Hennick
b85dd4ba82 Replace reproducing zip with a smaller one 2024-03-09 14:46:29 -08:00
Chris Hennick
be49def529 Replace hard-coded byte array with data file 2024-03-06 12:34:51 -08:00
Chris Hennick
dc62999f85 Bug fix: include data file for new test 2024-03-03 18:03:07 -08:00