33 commits

Author SHA1 Message Date
David Caldwell
73143a0ad6
perf: Faster cde rejection (#255)
* Use the tempfile crate instead of the tempdir crate (which is deprecated)

https://github.com/rust-lang-deprecated/tempdir?tab=readme-ov-file#deprecation-note

* perf: Add benchmark that measures the rejection speed of a large non-zip file

* perf: Speed up non-zip rejection by increasing END_WINDOW_SIZE

I tested several END_WINDOW_SIZEs across 2 machines:

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
512:   test parse_large_non_zip  ... bench:  30,450,608 ns/iter (+/- 673,910)
4096:  test parse_large_non_zip  ... bench:   7,741,366 ns/iter (+/- 521,101)
8192:  test parse_large_non_zip  ... bench:   5,807,443 ns/iter (+/- 546,227)
16384: test parse_large_non_zip  ... bench:   4,794,314 ns/iter (+/- 419,114)
32768: test parse_large_non_zip  ... bench:   4,262,897 ns/iter (+/- 397,582)
65536: test parse_large_non_zip  ... bench:   4,060,847 ns/iter (+/- 280,964)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
512:   test parse_large_non_zip  ... bench:  65,132,581 ns/iter (+/- 7,429,976)
4096:  test parse_large_non_zip  ... bench:  14,109,503 ns/iter (+/- 2,892,086)
8192:  test parse_large_non_zip  ... bench:   9,942,500 ns/iter (+/- 1,886,063)
16384: test parse_large_non_zip  ... bench:   8,205,851 ns/iter (+/- 2,902,041)
32768: test parse_large_non_zip  ... bench:   7,012,011 ns/iter (+/- 2,222,879)
65536: test parse_large_non_zip  ... bench:   6,577,275 ns/iter (+/- 881,546)

In both cases END_WINDOW_SIZE=8192 ran roughly 5x to 6.5x faster than 512, and
sizes above 8192 gave only diminishing returns (about 1.4x to 1.5x more at 65536).
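
A minimal sketch of the backward scan that END_WINDOW_SIZE tunes. It assumes the reader starts at the end of the file and reads fixed-size windows toward the start until it sees the EOCDR signature (the bytes PK\x05\x06). The name find_eocdr_backward and its window_size parameter are hypothetical and are not the zip crate's actual code. Each window costs one seek and one read, so larger windows presumably win because they issue fewer I/O calls for the same number of bytes scanned.

```rust
use std::io::{self, Read, Seek, SeekFrom};

// EOCDR signature "PK\x05\x06" as stored on disk (0x06054b50, little-endian).
const EOCDR_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

// Hypothetical helper: scan backward from the end of `reader` in windows of
// `window_size` bytes and return the offset of the last EOCDR signature, if any.
// This is an illustration, not the zip crate's implementation.
fn find_eocdr_backward<R: Read + Seek>(
    reader: &mut R,
    window_size: usize,
) -> io::Result<Option<u64>> {
    let sig_len = EOCDR_SIGNATURE.len();
    assert!(window_size >= sig_len, "window must hold a whole signature");
    let file_len = reader.seek(SeekFrom::End(0))?;
    let mut buf = vec![0u8; window_size];
    // `end` is the exclusive upper bound of the bytes not yet searched.
    let mut end = file_len;
    while end >= sig_len as u64 {
        let start = end.saturating_sub(window_size as u64);
        let len = (end - start) as usize;
        // One seek + one read per window: bigger windows mean fewer calls.
        reader.seek(SeekFrom::Start(start))?;
        reader.read_exact(&mut buf[..len])?;
        // Search this window from its end toward its start.
        if let Some(pos) = buf[..len]
            .windows(sig_len)
            .rposition(|w| w == &EOCDR_SIGNATURE[..])
        {
            return Ok(Some(start + pos as u64));
        }
        if start == 0 {
            break;
        }
        // Keep the first sig_len - 1 bytes of this window in the next one so a
        // signature straddling the boundary is still found.
        end = start + (sig_len as u64 - 1);
    }
    Ok(None)
}
```

Consecutive windows overlap by three bytes so a signature split across a window boundary is not missed; a real parser would presumably also validate the record at the returned offset rather than trust the signature alone.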

* perf: Speed up non-zip rejection by limiting search for EOCDR.

I benchmarked several search sizes across 2 machines
(these benches are using an 8192 END_WINDOW_SIZE):

Machine 1: macOS 15.0.1, aarch64 (apfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   5,773,801 ns/iter (+/- 411,277)
last 128k:    test parse_large_non_zip              ... bench:      54,402 ns/iter (+/- 4,126)
last 66,000:  test parse_large_non_zip              ... bench:      36,152 ns/iter (+/- 4,293)

Machine 2: Debian testing, x86_64 (tmpfs /tmp)
whole file:   test parse_large_non_zip              ... bench:   9,942,306 ns/iter (+/- 1,963,522)
last 128k:    test parse_large_non_zip              ... bench:      73,604 ns/iter (+/- 16,662)
last 66,000:  test parse_large_non_zip              ... bench:      41,349 ns/iter (+/- 16,812)

As expected, limiting the search greatly speeds up rejection of large non-zip
files: roughly 105x to 135x with the last 128k and 160x to 240x with the last 66,000.

66,000 was the number previously used by zip-rs. It was changed to zero in
7a55945743.

128K is what Info-ZIP uses[1]. This seems like a reasonable (non-zero)
choice for compatibility reasons.

[1] Info-ZIP is extremely old and does not have an official git repo to
    link to. However, an unofficial fork can be found here:
    bb0c4755d4/zipfile.c (L4073)
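
For context on the two limits: the EOCDR has a fixed 22-byte part followed by a comment whose length is a 16-bit field, so in an archive with no trailing data the record starts within the last 22 + 65,535 = 65,557 bytes, which 66,000 covers. The sketch below shows how bounding the search makes the cost depend on the limit rather than on the file size. The helper names search_start and find_eocdr_in_tail are hypothetical, and for brevity it works on an in-memory slice rather than a file.

```rust
// EOCDR signature "PK\x05\x06" as stored on disk (0x06054b50, little-endian).
const EOCDR_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
// Fixed-size part of the EOCDR in bytes; the variable-length comment follows it.
const EOCDR_FIXED_LEN: u64 = 22;
// Largest possible EOCDR: the comment length is a 16-bit field.
const EOCDR_MAX_LEN: u64 = EOCDR_FIXED_LEN + u16::MAX as u64;

// Hypothetical helper: first offset examined when only the last
// `search_limit` bytes of a `file_len`-byte file are searched.
fn search_start(file_len: u64, search_limit: u64) -> u64 {
    file_len.saturating_sub(search_limit)
}

// Hypothetical helper: search only the tail of `data` for the EOCDR
// signature, scanning from the end toward the front.
fn find_eocdr_in_tail(data: &[u8], search_limit: u64) -> Option<usize> {
    let start = search_start(data.len() as u64, search_limit) as usize;
    data[start..]
        .windows(EOCDR_SIGNATURE.len())
        .rposition(|w| w == &EOCDR_SIGNATURE[..])
        .map(|pos| start + pos)
}
```

With search_limit = 128 * 1024, this sketch rejects a non-zip file of any size after scanning at most 128 KiB, which fits the near-constant timings above.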

---------

Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-11-19 18:49:35 +00:00
Chris Hennick
5e216fe150
Bug fix: len() is must-use
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 13:08:05 -07:00
Chris Hennick
3ab9f457fb
Bug fix: bench_n expects empty return
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 13:05:49 -07:00
Chris Hennick
a4915fdcd7
Fix a bug in benchmark: closure needs a parameter
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 13:01:51 -07:00
Chris Hennick
ed1d38f5da
Run bench only once for each random input
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
2024-05-24 12:53:27 -07:00
Danny McClanahan
8e5b157853
fix stream benchmark
2024-05-24 08:58:41 -04:00
Danny McClanahan
d852c222fc
review comments 1
2024-05-24 07:54:40 -04:00
Danny McClanahan
46c42c7f82
review comments 1
2024-05-24 07:52:30 -04:00
Danny McClanahan
ea308499af
bulk parsing and bulk writing
- use blocks for reading individual file headers
- remove unnecessary option wrapping for stream entries
- create Block trait
- add coerce method to reduce some boilerplate
- add serialize method to reduce more boilerplate
- use to_le! and from_le!
- add test case
- add some docs
- rename a few structs to clarify zip32-only
2024-05-24 07:52:25 -04:00
Danny McClanahan
3d1728d796
add stream benchmark
2024-05-24 07:39:54 -04:00
Danny McClanahan
0a573d3747
make benchmarks report bytes/second
2024-05-24 07:39:54 -04:00
Danny McClanahan
7a55945743
add benchmarks
2024-05-24 07:39:54 -04:00
Chris Hennick
e9d48b7333
style: Remove unnecessary "mut"s in merge_archive benchmarks
2024-05-05 19:39:13 -07:00
Chris Hennick
d663b31fb2
chore: Fix: don't feature-gate all of merge_archive.rs, only the parts that use compression
2024-05-03 11:49:09 -07:00
Chris Hennick
cb6f87bc02
chore: Fix a pre-existing failure
2024-05-03 11:43:41 -07:00
Danny McClanahan
e42ff64449
add merge_archive benchmarks
2024-05-02 00:25:05 -04:00
Chris Hennick
e4d0a0228a
cargo fmt --all
2024-04-19 18:52:45 -07:00
Chris Hennick
174825229c
Change crate name to "zip" per https://github.com/zip-rs/zip/issues/446#issuecomment-2063837388
2024-04-19 18:50:27 -07:00
Wyatt Herkamp
61afe4dad9
Added ExtendedFileOptions
2024-04-15 16:32:07 -04:00
Chris Hennick
2407ef95c6
Fixes and refactors for no-features build
2023-05-30 18:17:59 -07:00
Chris Hennick
255cfaf261
Add flush_on_finish_file parameter
2023-05-26 17:22:53 -07:00
Chris Hennick
bf0ad491c0
Bug fix
2023-05-13 14:02:34 -07:00
Chris Hennick
98d37c8b77
Fix more formatting issues (sort imports)
2023-04-23 15:26:00 -07:00
Chris Hennick
06b5ceaef9
Fix another formatting issue
2023-04-23 15:19:19 -07:00
Chris Hennick
6dc099d128
Fix more formatting issues
2023-04-23 15:12:56 -07:00
Chris Hennick
d3400509bc
Fix formatting issues from cargo fmt
2023-04-23 14:58:10 -07:00
Chris Hennick
cde5d5ed11
Implement shallow copy from within the file being written
2023-04-23 14:33:10 -07:00
Kyle Bloom
03f5009c34
fix: Clippy uninlined format args
2023-01-31 17:29:34 +00:00
Pieter-Jan Briers
621971f078
Use some ::with_capacity when reading zip file.
Now with a proper benchmark
2022-04-11 16:17:20 +02:00
Emmanuel Gil Peyrot
b031ab75bd
Use getrandom instead of rand for benches
The current code didn’t build, and this one includes fewer dependencies
than the full rand set of crates.
2021-09-06 23:42:14 +02:00
Ryan Levick
ebb07348ee
Run cargo fmt
2020-06-15 10:44:39 +02:00
Lachezar Lechev
29517e9a6b
run cargo fix --edition-idioms and manually fix other things
2019-11-11 09:20:31 +02:00
Sam Rijs
2b42b0219b
add read_entry benchmark
2018-11-13 23:55:59 +11:00