If the UTF-8 flag (general purpose bit 11) is set, file names and comments must be stored as UTF-8 (APPNOTE, Appendix D).
However, the flag is currently set even when they are in a non-UTF-8 encoding (GB18030, Shift_JIS, etc.). This commit fixes that.
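
A minimal sketch of the idea, assuming the entry name is available as raw bytes: set bit 11 only when those bytes are valid UTF-8. The names `UTF8_FLAG` and `general_purpose_flags` are hypothetical and not taken from the crate. The same check would apply to the comment.

```rust
/// General purpose bit 11: file name and comment are UTF-8.
const UTF8_FLAG: u16 = 1 << 11;

/// Hypothetical helper: compute the flag word for an entry whose name is
/// stored as raw bytes. Bit 11 is set only if the bytes are valid UTF-8.
fn general_purpose_flags(raw_name: &[u8]) -> u16 {
    if std::str::from_utf8(raw_name).is_ok() {
        UTF8_FLAG
    } else {
        // Assumption: names in other encodings (GB18030, Shift_JIS, ...)
        // are written as-is and the flag is left clear.
        0
    }
}
```

Note that a byte sequence in another encoding can occasionally also be valid UTF-8, so this check is a heuristic unless the caller knows the encoding.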
Preemptively allocating structures sized by the reported number of files can
lead to trouble: an invalid zip can still have a valid central
directory end whose file count is fed into a `with_capacity`, causing it to overflow.
This commit introduces a heuristic that uses the reported number of
files only as long as that number is less than the central directory end (cde) offset.
Benchmarks were unaffected by this change.
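
The heuristic can be sketched roughly as follows. The function name `initial_capacity` and the fallback of zero are assumptions for illustration, not the crate's actual code. The reasoning is that every central directory entry occupies space before the cde, so a file count at or above the cde offset cannot be genuine.

```rust
/// Hypothetical helper: choose the capacity to preallocate.
/// `reported_files` is the file count from the central directory end and
/// `cde_offset` is the byte offset at which that record was found.
fn initial_capacity(reported_files: u64, cde_offset: u64) -> usize {
    if reported_files < cde_offset {
        // Plausible count: trust it, guarding against a narrow `usize`.
        usize::try_from(reported_files).unwrap_or(0)
    } else {
        // Assumption: skip preallocation and let the collection grow.
        0
    }
}
```

A caller would then write something like `Vec::with_capacity(initial_capacity(n, offset))`.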
The naming matches that of std::fs::Metadata.
An entry is determined to be a directory based on the presence of
a trailing path separator, i.e. '/' or '\'.
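
As a rough illustration of the check described above, written as free functions over the entry name (the real methods presumably live on the entry type; treating `is_file` as the negation of `is_dir` is an assumption):

```rust
/// An entry is a directory if its name ends in '/' or '\'.
fn is_dir(name: &str) -> bool {
    name.ends_with('/') || name.ends_with('\\')
}

/// Assumption: anything that is not a directory is a file.
fn is_file(name: &str) -> bool {
    !is_dir(name)
}
```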
This patch adds a small test zip containing files and directories.
Their names match their type so as to make testing easy.
I constructed this file using a hack from the Zip manpage: if the input to a
Zip compression command is streamed on standard input, the output is given in
ZIP64 format since the tool doesn't know how big the input will be. I modified
the resulting file by adding some leading junk text and editing the non-ZIP64
end-of-central-directory structure to have 0xFFFF for its "number of files"
parameters, to help the test demonstrate that the ZIP64 data are being
properly read. (0xFFFF is the value used in the non-ZIP64 structure if the
archive actually has more than 65535 files.)
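
The behaviour this test file exercises can be sketched as below. `resolve_file_count` is a hypothetical helper, and the decision to consult the ZIP64 record only when the sentinel is present is an assumption made for illustration.

```rust
/// Sentinel stored in the 16-bit "number of files" fields of the
/// non-ZIP64 end-of-central-directory structure.
const ZIP64_FILES_SENTINEL: u16 = 0xFFFF;

/// Hypothetical helper: decide which file count to use.
/// `eocd_files` is the 16-bit value from the non-ZIP64 structure;
/// `zip64_files` is the 64-bit value from the ZIP64 record, if one was found.
fn resolve_file_count(eocd_files: u16, zip64_files: Option<u64>) -> u64 {
    match zip64_files {
        // Sentinel present and ZIP64 data available: the real count is there.
        Some(n) if eocd_files == ZIP64_FILES_SENTINEL => n,
        // Otherwise fall back to the 16-bit value.
        _ => u64::from(eocd_files),
    }
}
```

With the test file, the non-ZIP64 structure reports 0xFFFF, so a reader that ignored the ZIP64 data would see the wrong count.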