Interface SplittableDecompressionCodec

All Superinterfaces:
DecompressionCodec

public interface SplittableDecompressionCodec extends DecompressionCodec
Extension of DecompressionCodec for codecs that support splitting compressed files into independently decompressible ranges aligned to compressed block boundaries.

Bzip2 is the canonical example: each bzip2 block starts with a 48-bit magic marker (0x314159265359) and can be decompressed independently when preceded by a synthetic stream header. This enables parallel decompression of large compressed files.
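As an illustration of how the marker described above might be located, here is a minimal sketch that scans a byte array for the 48-bit value. It is not this library's implementation. The class name `Bzip2MarkerScan` and method `findMarkerBitOffsets` are hypothetical. The sketch scans bit by bit because the bzip2 format packs blocks at the bit level, so a marker need not begin on a byte boundary; how an implementation maps such positions to the byte offsets used by this interface is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class Bzip2MarkerScan {
    // 48-bit block-header magic quoted in the text above.
    static final long BLOCK_MAGIC = 0x314159265359L;
    static final long MASK_48 = 0xFFFFFFFFFFFFL;

    /**
     * Returns the bit offsets at which the 48-bit marker begins.
     * Bit 0 is the most significant bit of data[0].
     */
    static List<Long> findMarkerBitOffsets(byte[] data) {
        List<Long> hits = new ArrayList<>();
        long window = 0;   // rolling window holding the last 48 bits read
        long bitsSeen = 0; // total number of bits consumed so far
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsSeen++;
                // Only compare once a full 48 bits have been shifted in.
                if (bitsSeen >= 48 && window == BLOCK_MAGIC) {
                    hits.add(bitsSeen - 48);
                }
            }
        }
        return hits;
    }
}
```

A match only shows that the bit pattern occurs at that position; the same 48 bits can in principle appear inside compressed data, so a robust scanner would need some additional validation.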

Stream-only codecs (gzip, zstd) cannot implement this interface: their compressed blocks may reference data decoded earlier in the stream, so a block cannot be decompressed without first processing everything before it.

  • Method Details

    • findBlockBoundaries

      long[] findBlockBoundaries(StorageObject object, long start, long end) throws IOException
      Finds compressed block boundaries within the given byte range of a storage object. Returns byte offsets (in the compressed stream) where blocks start.

      Returns an empty array when start >= end or when the range contains no block boundaries (e.g. header-only or empty files).

      Parameters:
      object - the storage object to scan
      start - start byte offset in the compressed file (inclusive)
      end - end byte offset in the compressed file (exclusive)
      Returns:
      sorted array of byte offsets where compressed blocks begin
      Throws:
      IOException - if an I/O error occurs while reading the storage object
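The sorted offsets returned by findBlockBoundaries are the natural input for decompressRange, which takes a block start and the start of the following block. The sketch below pairs each offset with its successor. It is an illustration only: the class `BlockRanges`, the method `toRanges`, and the `limit` parameter are hypothetical, and `limit` is assumed to be the offset where the last block ends (the file length when the scan covered the whole file).

```java
// Hypothetical helper, not part of the documented API.
final class BlockRanges {
    /**
     * Turns sorted block-start offsets into {blockStart, nextBlockStart}
     * pairs. The final pair ends at limit. An empty input yields an
     * empty result, matching the empty-array case described above.
     */
    static long[][] toRanges(long[] boundaries, long limit) {
        long[][] ranges = new long[boundaries.length][];
        for (int i = 0; i < boundaries.length; i++) {
            long start = boundaries[i];
            long next = (i + 1 < boundaries.length) ? boundaries[i + 1] : limit;
            // Each range must be non-empty, mirroring the documented
            // requirement that nextBlockStart be greater than blockStart.
            if (next <= start) {
                throw new IllegalArgumentException(
                        "range end " + next + " is not after block start " + start);
            }
            ranges[i] = new long[] {start, next};
        }
        return ranges;
    }
}
```

Each resulting pair could then be handed to a separate worker, which is one way the parallel decompression mentioned in the interface description might be organized.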
    • decompressRange

      InputStream decompressRange(StorageObject object, long blockStart, long nextBlockStart) throws IOException
      Decompresses a range of compressed blocks. The returned stream yields decompressed bytes for blocks starting at blockStart up to (but not including) the block at nextBlockStart.

      For bzip2, this creates a synthetic stream by prepending the file header (BZh + block size digit) to the raw block data, then wrapping the result in a standard bzip2 decompressor.

      The caller is responsible for closing the returned stream.

      Parameters:
      object - the storage object containing the compressed data
      blockStart - byte offset of the first block to decompress
      nextBlockStart - byte offset of the next block (or file length for the last block); must be greater than blockStart
      Returns:
      an input stream yielding decompressed bytes for the specified block range
      Throws:
      IllegalArgumentException - if blockStart >= nextBlockStart
      IOException - if an I/O error occurs while reading the storage object
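The header-prepending step described for bzip2 can be sketched with JDK classes alone. This is a rough illustration under stated assumptions, not the library's code: the class `SyntheticBzip2Header` and its methods are hypothetical, the block size digit is assumed to be known to the caller, and the sketch stops before the decompressor. Bit alignment of the block data and the end of the stream are not covered by the description above and are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the documented API.
final class SyntheticBzip2Header {
    /** Builds the 4-byte header: "BZh" followed by the block size digit 1..9. */
    static byte[] header(int blockSizeDigit) {
        if (blockSizeDigit < 1 || blockSizeDigit > 9) {
            throw new IllegalArgumentException(
                    "block size digit must be 1..9: " + blockSizeDigit);
        }
        return ("BZh" + blockSizeDigit).getBytes(StandardCharsets.US_ASCII);
    }

    /**
     * Returns a stream that yields the synthetic header first and then
     * the raw block bytes. Closing the returned stream also closes
     * rawBlocks, so the caller has a single stream to close.
     */
    static InputStream withHeader(int blockSizeDigit, InputStream rawBlocks) {
        return new SequenceInputStream(
                new ByteArrayInputStream(header(blockSizeDigit)), rawBlocks);
    }
}
```

The combined stream would then be passed to whichever bzip2 decompressor the implementation uses.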