Compression

Utilities for wrangling whole-file compression formats.

Because MAST accepts no whole-file compression format other than plain gzip, this module does not support reading or writing files compressed in any other format. It can, however, identify several other common whole-file compression formats for diagnostic purposes.

check_compression_magic

check_compression_magic(magic: bytes, name: Path) -> str | None

Verify that a file named NAME starts with the expected magic number for the compression format identified by the file's extension. MAGIC is the first N bytes of the file.

If the file does not appear to be compressed, returns None.

If the file's contents and name are consistent, returns the canonical name extension for that type of compression.

Raises ValueError if the file's contents and name are inconsistent.
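The three outcomes described above can be illustrated with a minimal sketch. This is not the module's implementation: the function name check_magic_sketch and the two-entry table EXT_TO_MAGIC are hypothetical, and the sketch assumes that any disagreement between name and content, in either direction, counts as inconsistent.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, deliberately tiny table: extension -> leading bytes.
EXT_TO_MAGIC = {".gz": b"\x1f\x8b", ".bz2": b"BZh"}


def check_magic_sketch(magic: bytes, name: Path) -> Optional[str]:
    """Illustrative stand-in for check_compression_magic (an assumption)."""
    # Format implied by the file name, if its suffix is one we know.
    by_name = name.suffix if name.suffix in EXT_TO_MAGIC else None
    # Format implied by the content, if the bytes match a known signature.
    by_content = next(
        (ext for ext, sig in EXT_TO_MAGIC.items() if magic.startswith(sig)),
        None,
    )
    if by_name != by_content:
        raise ValueError(
            f"{name}: name suggests {by_name}, content suggests {by_content}"
        )
    # None when neither name nor content indicates compression.
    return by_content
```

With this sketch, a name ending in .gz paired with gzip magic yields ".gz", an uncompressed name paired with unrecognized bytes yields None, and a .gz name paired with bzip2 magic raises ValueError.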

closing_gzfile

closing_gzfile(*, fileobj: BinaryIO, mode: str) -> GzipFile

Like gzip.GzipFile(), except that 'fileobj' is required and must be a file object, and closing the resulting GzipFile also closes that object (a plain gzip.GzipFile leaves it open).
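One way to get this behaviour is sketched below, using a subclass whose close() also closes the wrapped object. The class name ClosingGzipFileSketch is hypothetical, and the real function may well be implemented differently; the sketch only contrasts the two closing behaviours.

```python
import gzip
import io
from typing import BinaryIO


class ClosingGzipFileSketch(gzip.GzipFile):
    """GzipFile variant that also closes the file object it wraps."""

    def __init__(self, *, fileobj: BinaryIO, mode: str) -> None:
        # Remember the wrapped object first so close() can always find it.
        self._wrapped = fileobj
        super().__init__(fileobj=fileobj, mode=mode)

    def close(self) -> None:
        try:
            super().close()  # flush pending data and finish the gzip stream
        finally:
            self._wrapped.close()


# Standard library behaviour: the underlying buffer stays open.
plain = io.BytesIO()
with gzip.GzipFile(fileobj=plain, mode="wb") as gz:
    gz.write(b"payload")
plain_left_open = not plain.closed

# Sketch behaviour: the underlying buffer is closed as well.
wrapped = io.BytesIO()
with ClosingGzipFileSketch(fileobj=wrapped, mode="wb") as gz:
    gz.write(b"payload")
wrapped_was_closed = wrapped.closed
```

Closing the wrapped object in a finally clause means it is released even if finishing the gzip stream fails.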

compression_format_for_magic

compression_format_for_magic(magic: bytes) -> str | None

Get the canonical file name extension for the compression format identified by the magic number MAGIC (i.e. the first N bytes of a file compressed using that format). If the magic number doesn't correspond to a compression format we know about, returns None.
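A lookup of this kind can be sketched as a prefix match against a table of signatures. The table MAGIC_TO_EXT and the function name format_for_magic_sketch are hypothetical. The byte signatures are the publicly documented ones for each format, but which formats the module actually recognizes, and whether it returns the extension with a leading dot, are assumptions.

```python
from typing import Optional

# Hypothetical table: leading bytes -> assumed canonical extension.
MAGIC_TO_EXT = {
    b"\x1f\x8b": ".gz",           # gzip
    b"BZh": ".bz2",               # bzip2
    b"\xfd7zXZ\x00": ".xz",       # xz
    b"\x28\xb5\x2f\xfd": ".zst",  # Zstandard
}


def format_for_magic_sketch(magic: bytes) -> Optional[str]:
    """Return the extension whose signature MAGIC starts with, else None."""
    for signature, ext in MAGIC_TO_EXT.items():
        if magic.startswith(signature):
            return ext
    return None
```

The caller needs to pass at least as many bytes as the longest signature in the table (six here); shorter input simply fails to match the longer signatures.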

open

open(name: Path, mode: MCOpenMode) -> tuple[MCFile, str | None]

Like the built-in open(), but:

  • when opening for reading, if the filename extension on 'name' is one of the recognized whole-file compression extensions, checks whether the content of the file matches that extension, and, for the smaller set of supported compression algorithms, transparently decompresses the file contents.

  • when opening for writing, if the filename extension on 'name' indicates a supported compression algorithm, transparently compresses the file contents.

Unlike the built-in open():

  • the first argument must be a pathlib.Path
  • the mode argument is mandatory
  • you cannot open a file in text mode (wrap the returned open-file object in an io.TextIOWrapper to get text mode).

The return value is a 2-tuple of the open file object and the canonical extension for the compression format (with a leading dot, like Path.suffix) or None if the file does not appear to be compressed.
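The return convention and the text-mode workaround can be illustrated with a stand-in built only on the standard library. The function open_sketch is hypothetical: it treats gzip as the only supported format, takes the mode as a plain str rather than MCOpenMode, and omits the content check that the real function performs when reading.

```python
import gzip
import io
import tempfile
from pathlib import Path
from typing import BinaryIO, Optional, Tuple


def open_sketch(name: Path, mode: str) -> Tuple[BinaryIO, Optional[str]]:
    """Hypothetical stand-in: binary modes only, gzip the one supported format."""
    if "b" not in mode:
        raise ValueError("text mode is not supported; wrap the result instead")
    if name.suffix == ".gz":
        # Transparently compress or decompress, and report the extension.
        return gzip.open(name, mode), ".gz"
    # Not compressed: fall back to the built-in open().
    return open(name, mode), None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt.gz"

    # Write text by wrapping the binary file object.
    fp, ext = open_sketch(path, "wb")
    with io.TextIOWrapper(fp, encoding="utf-8") as text:
        text.write("hello\n")

    # Read it back the same way; closing the wrapper closes the file too.
    fp, ext = open_sketch(path, "rb")
    with io.TextIOWrapper(fp, encoding="utf-8") as text:
        round_trip = text.read()
```

Returning the extension alongside the file object lets the caller tell whether compression was involved without inspecting the path a second time.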