arcgzip.py is an alternative Python gzip library that focuses on full support for gzip metadata.
Specifically, this library enables you to:
- extract all of the meta information defined in RFC 1952.
- create an archive with fine-tuned header data.
- handle multiple files in a single gzip archive.
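For readers unfamiliar with the metadata in question, the following is a minimal sketch of reading one member header as laid out in RFC 1952, using only the standard library. It does not use arcgzip's own API; the names `read_cstring` and `parse_member_header` are hypothetical and chosen only for illustration.

```python
import gzip
import io
import struct

# Flag bits defined in RFC 1952, section 2.3.1.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


def read_cstring(stream):
    """Read a zero-terminated Latin-1 string (used by FNAME and FCOMMENT)."""
    chunk = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unterminated string in gzip header")
        if byte == b"\x00":
            return chunk.decode("latin-1")
        chunk += byte


def parse_member_header(stream):
    """Parse one gzip member header (hypothetical helper, not arcgzip's API)."""
    fixed = stream.read(10)
    if len(fixed) != 10:
        raise EOFError("truncated gzip header")
    # ID1 ID2 | CM | FLG | MTIME (4 bytes, little-endian) | XFL | OS
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<2sBBIBB", fixed)
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip member")
    info = {"method": method, "text": bool(flags & FTEXT), "mtime": mtime,
            "xfl": xfl, "os": os_code,
            "extra": None, "name": None, "comment": None, "crc16": None}
    # Optional fields appear in this fixed order when their flag bit is set.
    if flags & FEXTRA:
        (xlen,) = struct.unpack("<H", stream.read(2))
        info["extra"] = stream.read(xlen)
    if flags & FNAME:
        info["name"] = read_cstring(stream)
    if flags & FCOMMENT:
        info["comment"] = read_cstring(stream)
    if flags & FHCRC:
        (info["crc16"],) = struct.unpack("<H", stream.read(2))
    return info


# Demo: write a member with the standard gzip module, then read its header back.
buf = io.BytesIO()
with gzip.GzipFile(filename="hello.txt", mode="wb", fileobj=buf, mtime=12345) as f:
    f.write(b"hello")
buf.seek(0)
info = parse_member_header(buf)
```

The standard gzip module writes only the file name and modification time, so the extra field, comment and CRC16 entries stay empty in this demo.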
Python 3.2 or later is required (Python 2.7 is also supported).
Download the source code and run 'setup.py':
$ python setup.py install
arcgzip.py -l archive.gz - Show the list of contents.
arcgzip.py -a archive.gz targets - Add target files to the archive.
arcgzip.py -c archive.gz targets - Create a new archive from target files.
arcgzip.py -d archive.gz targets - Extract files from the archive.
--ascii - Set ASCII text flag.
--crc16 - Add crc16 checksum field to the header.
--comment [S] - Add a comment for the file.
--content [S] - Write the string to the archive (instead of target files).
--encoding [S] - Specify the encoding of the string (with --content).
--exfield [B] - Set the extra field to the base64-encoded data.
--level [N] - Compression level to be used (1-fastest/9-slowest).
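As a sketch of how a value for --exfield might be prepared: RFC 1952 describes the extra field as a series of subfields, each made of a two-byte ID (SI1, SI2), a two-byte little-endian length, and the data. The helpers below are hypothetical and not part of arcgzip, and it is an assumption that --exfield expects the raw extra-field bytes without the XLEN prefix. The ID b"Ap" is used purely as an example.

```python
import base64
import struct


def build_subfield(ident, payload):
    """Pack one RFC 1952 subfield: SI1, SI2, LEN (little-endian), then the data."""
    if len(ident) != 2:
        raise ValueError("subfield ID must be exactly two bytes")
    if len(payload) > 0xFFFF:
        raise ValueError("subfield payload too long")
    return struct.pack("<2sH", ident, len(payload)) + payload


def parse_subfields(extra):
    """Split raw extra-field bytes into a list of (ID, data) pairs."""
    fields = []
    offset = 0
    while offset < len(extra):
        if offset + 4 > len(extra):
            raise ValueError("truncated subfield header")
        ident, length = struct.unpack_from("<2sH", extra, offset)
        offset += 4
        data = extra[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated subfield data")
        fields.append((ident, data))
        offset += length
    return fields


# Assumption: the option takes the raw extra-field bytes, base64-encoded.
raw = build_subfield(b"Ap", b"\x01\x02\x03\x04")
encoded = base64.b64encode(raw).decode("ascii")
```

The resulting `encoded` string is what would be passed on the command line, and `parse_subfields` shows how the same bytes could be split again after extraction.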
- Support stream input.
- Implement iterator interface on GzipFile class.
- Add functional tests for command-line options.
- Improve the output format of 'LIST' mode.
Below are some thoughts on design issues in arcgzip.
- Some users may find it useful if arcgzip provided a 'no-strict' mode that skips all of the CRC16/CRC32/ISIZE checks.
- It might be a good idea to provide a more accommodating interface to 'exfield', assuming anyone actually makes use of that data field.
- To support stream input, we will need to implement an I/O wrapper that enables seeking (to some extent, at least) by buffering the input bytes. Is there a standard library module that could be used to build this feature?
- Currently arcgzip buffers the entire file contents in memory while decompressing. This is awkward, but it has the advantage that the extracted file object does not disturb the original file pointer of GzipFile.
- Pickling the GzipInfo array effectively serves as an external index file.
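The buffering wrapper mentioned in the stream-input note could look roughly like the sketch below. It is not part of arcgzip; `RewindableReader` and `OneWay` are hypothetical names, and the sketch keeps every byte it has read, so memory use grows with the input.

```python
import io


class RewindableReader(object):
    """Wrap a read-only binary stream and allow seeking within bytes already read."""

    def __init__(self, raw):
        self._raw = raw
        self._seen = bytearray()  # every byte pulled from the raw stream so far
        self._pos = 0             # logical read position within self._seen

    def _fill(self, target):
        """Pull from the raw stream until `target` bytes are buffered, or EOF."""
        while len(self._seen) < target:
            chunk = self._raw.read(target - len(self._seen))
            if not chunk:
                break
            self._seen += chunk

    def read(self, size=-1):
        if size is None or size < 0:
            self._seen += self._raw.read()
            end = len(self._seen)
        else:
            self._fill(self._pos + size)
            end = min(self._pos + size, len(self._seen))
        data = bytes(self._seen[self._pos:end])
        self._pos = end
        return data

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        else:
            raise io.UnsupportedOperation("cannot seek from the end of a stream")
        if target < 0:
            raise ValueError("negative seek position")
        self._fill(target)  # seeking forward simply reads ahead
        self._pos = min(target, len(self._seen))
        return self._pos


class OneWay(object):
    """Stand-in for a pipe or socket: exposes read() only."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size=-1):
        return self._inner.read(size)


reader = RewindableReader(OneWay(b"\x1f\x8b\x08rest"))
magic = reader.read(2)   # peek at the gzip magic bytes
reader.seek(0)           # rewind, although the source itself cannot seek
again = reader.read(3)
rest = reader.read()
```

A practical version would probably discard bytes that lie before the start of the current member, so that the buffer stays bounded.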