Preemptively allocating structures with the number of reported files can
lead to trouble as an invalid zip can still have a valid central
directory end that is fed into a `with_capacity` causing it to overflow.
This commit introduces a heuristic that will use the reported number of
files as long as the number is less than the cde offset.
Benchmarks were unaffected by this change.
As someone who has personal projects that take untrusted zips as input,
it is important to me to be able to fuzz the zip project to simulate
possible inputs and to ensure the projects are not vulnerable.
This commit adds a cargo fuzz module for reading and extracting input.
The `fuzz` directory was scaffolded with a `cargo fuzz init`
I added a CI step to guard against the fuzz module decaying over time.
The primary goal of this commit is to enable this library to emit
zip archives with symlinks while minimizing the surface area of the
change to hopefully enable the PR to merge with minimal controversy.
Today, it isn't possible to write symlinks with this library because
there's no way to preserve the upper S_IFMT bits in the file mode
bits because:
* There's no way to set FileOptions.permissions with the S_IFLNK bits
set (FileOptions.unix_permissions() throws away bits beyond 0o777).
* Existing APIs for starting a "typed" (e.g. file or directory) entry
automatically set the S_IFMT bits and could conflict with bits
set on FileOptions.permissions.
* The low-level, generic start_entry() function isn't public.
When implementing this, I initially added a `FileOptions.unix_mode()`
function to allow setting all 16 bits in the eventual external
attributes u32. However, I quickly realized this wouldn't be enough
because APIs like start_file() do things like `|= 0o100000`. So if
we went this route, we'd need to make consumers of
FileOptions.permissions aware of when they should or shouldn't touch
the high bits beyond 0o777.
I briefly thought about making FileOptions.permissions an enum with
a variant to allow the st_mode bits to sail through unmodified. But
this change seemed overly invasive, low level, and not very
user-friendly.
So the approach I decided on was to define a new add_symlink() API.
It follows the pattern of add_directory() and provides an easy-to-use
and opionated API around the addition of a special file type.
I purposefully chose to not implement reading or extraction support
for symlinks because a) I don't need the feature at the moment
b) implementing symlink extraction in a way that works reliably on all
platforms and doesn't have security issues is hard. I figured it was
best to limit the scope of this change so this PR stands a good chance
of being merged.
Partially implements #77.
BufReader can cache the result of stream_position() so it's fast, whereas seek(0) discards the read buffer and passes through straight to the OS. This drastically worsens the efficiency of loading performance when using BufReader (or anything else to avoid all these tiny reads going straight to the kernel).