Interface SplittableDecompressionCodec
- All Superinterfaces:
DecompressionCodec
Extension of
DecompressionCodec for codecs that support splitting compressed
files into independently decompressible ranges aligned to compressed block boundaries.
Bzip2 is the canonical example: each bzip2 block starts with a 48-bit magic marker
(0x314159265359) and can be decompressed independently when preceded by a
synthetic stream header. This enables parallel decompression of large compressed files.
Stream-only codecs (gzip, zstd) cannot implement this interface because their compressed blocks depend on previous state.
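The marker-based splitting described above can be illustrated with a minimal sketch that scans a buffer for the 48-bit block magic. bzip2 packs bits MSB-first and does not align blocks to byte boundaries, so the scan here works at bit granularity. The class and method names are hypothetical and are not part of this interface. The source does not say how an implementation maps bit-aligned block starts to the byte offsets it reports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented SPI.
final class Bzip2MagicScanner {
    // 48-bit bzip2 block magic, as quoted in the interface description.
    static final long BLOCK_MAGIC = 0x314159265359L;
    private static final long MASK_48 = 0xFFFFFFFFFFFFL;

    /** Returns the bit offsets (from the start of data) at which the block magic begins. */
    static List<Long> findMagicBitOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        long window = 0;   // sliding window over the last 48 bits read
        long bitsSeen = 0; // total number of bits consumed so far
        for (byte b : data) {
            for (int bit = 7; bit >= 0; bit--) { // MSB-first, as bzip2 packs bits
                window = ((window << 1) | ((b >> bit) & 1)) & MASK_48;
                bitsSeen++;
                if (bitsSeen >= 48 && window == BLOCK_MAGIC) {
                    offsets.add(bitsSeen - 48);
                }
            }
        }
        return offsets;
    }
}
```

A real scanner would read the storage object in chunks and carry the window across chunk edges. The same bit pattern can also occur by chance inside compressed data, so a match is only a candidate boundary until it decompresses cleanly.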
-
Method Summary
Modifier and Type | Method | Description
InputStream | decompressRange(StorageObject object, long blockStart, long nextBlockStart) | Decompresses a range of compressed blocks.
long[] | findBlockBoundaries(StorageObject object, long start, long end) | Finds compressed block boundaries within the given byte range of a storage object.
Methods inherited from interface org.elasticsearch.xpack.esql.datasources.spi.DecompressionCodec
decompress, extensions, name
-
Method Details
-
findBlockBoundaries
Finds compressed block boundaries within the given byte range of a storage object. Returns byte offsets (in the compressed stream) where blocks start.
Returns an empty array when start >= end or when the range contains no block boundaries (e.g. header-only or empty files).
- Parameters:
object - the storage object to scan
start - start byte offset in the compressed file (inclusive)
end - end byte offset in the compressed file (exclusive)
- Returns:
- sorted array of byte offsets where compressed blocks begin
- Throws:
IOException
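As a rough sketch of how a caller might use the returned offsets, the helper below pairs each boundary with its successor to form half-open ranges. The last range ends at the file length, as the decompressRange parameter description suggests. The BlockRanges class, the Range record, and the fileLength parameter are assumptions made for illustration and are not part of this interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented SPI.
final class BlockRanges {
    /** A half-open range [blockStart, nextBlockStart) of compressed bytes. */
    record Range(long blockStart, long nextBlockStart) {}

    /**
     * Pairs each sorted boundary with the next one.
     * The final block is assumed to extend to fileLength.
     */
    static List<Range> toRanges(long[] boundaries, long fileLength) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < boundaries.length; i++) {
            long next = (i + 1 < boundaries.length) ? boundaries[i + 1] : fileLength;
            if (boundaries[i] >= next) {
                // Mirrors the blockStart < nextBlockStart precondition of decompressRange.
                throw new IllegalArgumentException(
                    "boundary " + boundaries[i] + " is not below " + next);
            }
            ranges.add(new Range(boundaries[i], next));
        }
        return ranges;
    }
}
```

Each resulting range could then be passed to decompressRange on its own worker. An empty boundary array yields no ranges, which matches the empty-array case described above.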
-
decompressRange
InputStream decompressRange(StorageObject object, long blockStart, long nextBlockStart) throws IOException
Decompresses a range of compressed blocks. The returned stream yields decompressed bytes for blocks starting at blockStart up to (but not including) the block at nextBlockStart.
For bzip2, this creates a synthetic stream by prepending the file header (BZh + block size digit) to the raw block data, then wrapping it in a standard decompressor.
The caller is responsible for closing the returned stream.
- Parameters:
object - the storage object containing the compressed data
blockStart - byte offset of the first block to decompress
nextBlockStart - byte offset of the next block (or file length for the last block); must be greater than blockStart
- Returns:
- an input stream yielding decompressed bytes for the specified block range
- Throws:
IllegalArgumentException - if blockStart >= nextBlockStart
IOException
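The header-prepending step described for bzip2 might look roughly like the following sketch. It covers only the range check and the construction of the synthetic stream. The class and method names are hypothetical, the raw block bytes are assumed to have been read from the storage object already, and the actual decompressor (for example a bzip2 input stream from a compression library) is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the documented SPI.
final class SyntheticBzip2Header {
    /** Rejects empty or inverted ranges, matching the documented precondition. */
    static void checkRange(long blockStart, long nextBlockStart) {
        if (blockStart >= nextBlockStart) {
            throw new IllegalArgumentException(
                "blockStart " + blockStart + " must be less than nextBlockStart " + nextBlockStart);
        }
    }

    /** Builds the 4-byte stream header: "BZh" followed by the block size digit 1-9. */
    static byte[] headerBytes(int blockSizeDigit) {
        if (blockSizeDigit < 1 || blockSizeDigit > 9) {
            throw new IllegalArgumentException("block size digit must be 1-9: " + blockSizeDigit);
        }
        return ("BZh" + blockSizeDigit).getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns a stream that yields the synthetic header followed by the raw block bytes. */
    static InputStream withStreamHeader(int blockSizeDigit, InputStream rawBlocks) {
        return new SequenceInputStream(
            new ByteArrayInputStream(headerBytes(blockSizeDigit)), rawBlocks);
    }
}
```

The stream returned by withStreamHeader would then be handed to a standard bzip2 decompressor. This sketch assumes the raw block data starts on a byte boundary. A real implementation would also have to handle bit-aligned blocks and the end of the range, and the source does not describe how it does so.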
-