Commit aec0aed

danny0838, blurb-it[bot], and gpshead authored
gh-51067: Add remove() and repack() to ZipFile (GH-134627)
The docs included in the commit do the best job of describing this. There was much discussion on the PR and issue. Thank you to core team folks jaraco, emmatyping, gpshead, and all others who added their constructive comments along the way.

---------

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
1 parent 1fb874c commit aec0aed

6 files changed

Lines changed: 2941 additions & 0 deletions

Doc/library/zipfile.rst

Lines changed: 88 additions & 0 deletions
@@ -550,6 +550,94 @@ ZipFile objects
    .. versionadded:: 3.11
 
 
+.. method:: ZipFile.remove(zinfo_or_arcname)
+
+   Removes a member entry from the archive's central directory.
+   *zinfo_or_arcname* may be the full path of the member or a :class:`ZipInfo`
+   instance. If multiple members share the same full path and the path is
+   given as a string, only one of them is removed and which one is unspecified;
+   it should not be relied upon. Pass the specific :class:`ZipInfo` instance to
+   remove a particular member.
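
To illustrate the duplicate-path caveat above, here is a minimal sketch that uses only the long-standing ``zipfile`` API. It builds an archive with two members named ``dup.txt`` and selects one specific :class:`ZipInfo` from ``infolist()``. The archive contents and the variable names are invented for illustration; the commented-out ``remove()`` call shows where the selected object would be passed once the new method is available.

```python
import io
import warnings
import zipfile

# Build an in-memory archive holding two members with the same path.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dup.txt", "first")
    with warnings.catch_warnings():
        # zipfile emits a UserWarning ("Duplicate name") for the second write.
        warnings.simplefilter("ignore", UserWarning)
        zf.writestr("dup.txt", "second")

with zipfile.ZipFile(buf) as zf:
    # infolist() keeps every central directory entry, duplicates included.
    candidates = [zi for zi in zf.infolist() if zi.filename == "dup.txt"]
    # Choose one entry explicitly, here the one stored earliest in the file.
    first = min(candidates, key=lambda zi: zi.header_offset)
    first_payload = zf.read(first)        # read via the specific ZipInfo
    by_name_payload = zf.read("dup.txt")  # lookup by name picks only one entry

# With the new API (archive opened in 'a' mode) the explicit choice
# would be passed on like this:
# removed = zf.remove(first)
```

Selecting by ``header_offset`` is only one possible criterion; any attribute of :class:`ZipInfo` that distinguishes the entries would serve equally well.
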
+
+   The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
+
+   Returns the removed :class:`ZipInfo` instance.
+
+   Calling :meth:`remove` on a closed ZipFile will raise a :exc:`ValueError`.
+
+   .. note::
+      This method only removes the member's entry from the central directory,
+      making it inaccessible to most tools. The member's local file entry,
+      including content and metadata, remains in the archive and is still
+      recoverable using forensic tools. Call :meth:`repack` afterwards to
+      remove the local file entry and reclaim space; pass the returned
+      :class:`ZipInfo` to :meth:`repack` to ensure the data is removed
+      regardless of how the entry was written.
+
+   .. versionadded:: next
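
The remove-then-repack workflow described in the note can be sketched as a small helper that also runs on interpreters that predate this commit. ``drop_member`` is a hypothetical name, not part of ``zipfile``. When ``remove()`` and ``repack()`` exist it uses them as documented above; otherwise it falls back to copying every other member into a fresh archive, which is a different technique with a similar end result. The first branch is written from the documentation in this diff and has not been exercised against a released interpreter.

```python
import io
import zipfile


def drop_member(data: bytes, arcname: str) -> bytes:
    """Return a copy of the archive bytes without the member *arcname*."""
    buf = io.BytesIO(data)
    if hasattr(zipfile.ZipFile, "remove") and hasattr(zipfile.ZipFile, "repack"):
        # New API: drop the central directory entry, then reclaim its space.
        with zipfile.ZipFile(buf, "a") as zf:
            removed = zf.remove(arcname)
            zf.repack([removed])
        return buf.getvalue()

    # Fallback (assumption: rewriting the whole archive is acceptable).
    out = io.BytesIO()
    with zipfile.ZipFile(buf) as src, zipfile.ZipFile(out, "w") as dst:
        src.getinfo(arcname)  # raises KeyError if the member is missing
        for info in src.infolist():
            if info.filename != arcname:
                dst.writestr(info, src.read(info))
    return out.getvalue()


# Usage: build a two-member archive and drop one member.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("ham.txt", "ham")
    zf.writestr("spam.txt", "spam")

smaller = drop_member(original.getvalue(), "ham.txt")
with zipfile.ZipFile(io.BytesIO(smaller)) as zf:
    remaining = zf.namelist()
    spam_payload = zf.read("spam.txt")
```

Note that the fallback drops every member named *arcname*, whereas ``remove()`` with a string removes only one of them.
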
+
+
+.. method:: ZipFile.repack(removed=None, *, \
+                           strict_descriptor=True[, chunk_size])
+
+   Rewrites the archive to remove unreferenced local file entries, shrinking
+   its file size. The archive must be opened with mode ``'a'``.
+
+   If *removed* is provided, it must be a sequence of :class:`ZipInfo` objects
+   representing the recently removed members, and only their corresponding
+   local file entries will be removed. Otherwise, the archive is scanned to
+   locate and remove local file entries that are no longer referenced in the
+   central directory.
+
+   Passing *removed* is the most reliable way to reclaim space: the
+   corresponding local file entries are located directly from the central
+   directory and removed regardless of how they were written, whereas the scan
+   used when *removed* is omitted may leave some entries in place (see
+   *strict_descriptor* below). To remove members and reclaim their space in a
+   single step::
+
+      with ZipFile('spam.zip', 'a') as myzip:
+          removed = [myzip.remove(name) for name in ('ham.txt', 'eggs.txt')]
+          myzip.repack(removed)
+
+   When scanning, *strict_descriptor* controls how entries written with an
+   unsigned *data descriptor* are handled. A data descriptor is an optional
+   record holding an entry's CRC and sizes, stored just after the entry's data;
+   it is used when the archive is written to a non-seekable stream, and is
+   *signed* when it begins with a marker signature or *unsigned* otherwise.
+   Unsigned descriptors have been deprecated by the `PKZIP Application Note`_
+   since version 6.3.0 (released in 2006) and are written only by some legacy
+   tools; signed descriptors—written by Python and other modern tools—are always
+   detected. When *strict_descriptor* is true (the default), only signed data
+   descriptors are detected, so an unreferenced entry written with an unsigned
+   descriptor is not located and its space is not reclaimed by the scan.
+   Setting ``strict_descriptor=False`` additionally detects unsigned
+   descriptors, at the cost of a significantly slower scan—around 100 to 1000
+   times in the worst case—which may be exploitable as a denial-of-service
+   vector on untrusted input. This does not affect entries without a data
+   descriptor, and is not needed when *removed* is provided.
+
+   *chunk_size* may be specified to control the buffer size when moving
+   entry data (default is 1 MiB).
+
+   Calling :meth:`repack` on a closed ZipFile will raise a :exc:`ValueError`.
+
+   .. note::
+      The scanning algorithm is heuristic-based and assumes that the ZIP file
+      is normally structured—for example, with local file entries stored
+      consecutively, without overlap or interleaved binary data. Prepended
+      binary data, such as a self-extractor stub, is recognized and preserved
+      unless it happens to contain bytes that coincidentally resemble a valid
+      local file entry in multiple respects—an extremely rare case. Embedded
+      ZIP payloads are also handled correctly, as long as they follow normal
+      structure. However, the algorithm does not guarantee correctness or
+      safety on untrusted or intentionally crafted input. It is generally
+      recommended to provide the *removed* argument for better reliability and
+      performance.
+
+   .. versionadded:: next
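
The signed versus unsigned data descriptor distinction can be made concrete with a short sketch. It writes an archive to a non-seekable sink, so that current CPython appends a data descriptor, then decodes the 16 bytes that follow the member data. ``WriteOnly`` and ``parse_descriptor`` are hypothetical helpers written for this illustration; only the 32-bit (non-ZIP64) layout is handled, and this is not the scanning logic used by ``repack()``.

```python
import struct
import zipfile
import zlib

DD_SIGNATURE = b"PK\x07\x08"  # 0x08074b50 stored little-endian


def parse_descriptor(raw: bytes):
    """Decode a 32-bit data descriptor.

    Returns (signed, crc, compressed_size, uncompressed_size).
    """
    signed = raw[:4] == DD_SIGNATURE
    offset = 4 if signed else 0  # skip the marker when present
    crc, csize, usize = struct.unpack_from("<LLL", raw, offset)
    return signed, crc, csize, usize


class WriteOnly:
    """Minimal non-seekable sink: provides write() and flush() only."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass


payload = b"hello"
sink = WriteOnly()
with zipfile.ZipFile(sink, "w") as zf:  # ZIP_STORED, so data is uncompressed
    zf.writestr("a.txt", payload)
raw = b"".join(sink.chunks)

# Local file header: 30 fixed bytes, then the name and extra field, then data.
flags, = struct.unpack_from("<H", raw, 6)
name_len, extra_len = struct.unpack_from("<HH", raw, 26)
dd_start = 30 + name_len + extra_len + len(payload)

signed, crc, csize, usize = parse_descriptor(raw[dd_start:dd_start + 16])
expected_crc = zlib.crc32(payload)
uses_descriptor = bool(flags & 0x08)  # general purpose bit 3
```

An unsigned descriptor carries no marker, so a scanner has to validate candidate CRC and size fields at each position, which is consistent with the slower scan described for ``strict_descriptor=False``.
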
+
+
 The following data attributes are also available:
 
 .. attribute:: ZipFile.filename

Doc/whatsnew/3.16.rst

Lines changed: 9 additions & 0 deletions
@@ -172,6 +172,15 @@ xml
   instead of failing later, when encounter non-ASCII data.
   (Contributed by Serhiy Storchaka in :gh:`62259`.)
 
+zipfile
+-------
+
+* Add :meth:`ZipFile.remove() <zipfile.ZipFile.remove>` to remove a member
+  from an archive's central directory, and
+  :meth:`ZipFile.repack() <zipfile.ZipFile.repack>` to reclaim the space used
+  by the local file entries of removed members.
+  (Contributed by Danny Lin in :gh:`51067`.)
+
 .. Add improved modules above alphabetically, not here at the end.
 
 Optimizations