Interface RangeAwareFormatReader

All Superinterfaces:
AutoCloseable, Closeable, FormatReader

public interface RangeAwareFormatReader extends FormatReader
Extension of FormatReader for columnar formats (Parquet, ORC) that support row-group-level split parallelism.

Unlike line-oriented formats, which use SegmentableFormatReader with byte-range views, columnar formats need full-file access (e.g. Parquet's footer sits at the end of the file) but can selectively read the row groups that fall within a byte range. This interface separates the range specification from file access: the reader opens the full file and applies format-native range filtering (e.g. ParquetReadOptions.withRange()).
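The range filtering described above can be illustrated with a small sketch. It assumes a row group belongs to the range that contains its byte midpoint; this page does not state how ParquetReadOptions.withRange() decides, so treat that rule as an assumption. RowGroupRangeFilter and RowGroup are hypothetical names and are not part of this SPI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of range filtering; not part of the SPI. */
final class RowGroupRangeFilter {

    /** Hypothetical stand-in for row-group metadata read from a file footer. */
    static final class RowGroup {
        final long startOffset;
        final long byteLength;

        RowGroup(long startOffset, long byteLength) {
            this.startOffset = startOffset;
            this.byteLength = byteLength;
        }

        long midpoint() {
            return startOffset + byteLength / 2;
        }
    }

    /**
     * Returns the row groups whose midpoint lies in [rangeStart, rangeEnd).
     * With non-overlapping ranges, each row group is selected by at most one range.
     * The midpoint rule is an assumption made for this sketch.
     */
    static List<RowGroup> select(List<RowGroup> all, long rangeStart, long rangeEnd) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup group : all) {
            long mid = group.midpoint();
            if (mid >= rangeStart && mid < rangeEnd) {
                selected.add(group);
            }
        }
        return selected;
    }
}
```

The footer is still read from the full file; only the selection of row groups depends on the range, which is why the reader needs the unrestricted storage object.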

The split discovery flow:

  1. discoverSplitRanges(StorageObject) reads file metadata (e.g. Parquet footer) and returns byte ranges for each independently readable unit (e.g. row group).
  2. The framework creates one split per range, distributed across drivers/nodes.
  3. At read time, readRange(StorageObject, List<String>, int, long, long, List<Attribute>, ErrorPolicy) receives the full-file object and the assigned byte range, reading only the relevant row groups.
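Step 2 of the flow above, one split per range spread across workers, might look like the following sketch. The round-robin policy, the Split class, and the assign method are hypothetical; the framework's actual distribution strategy is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of step 2: one split per discovered range, spread over workers. */
final class SplitAssignmentSketch {

    /** Hypothetical split descriptor: a byte range [start, end) of the full file. */
    static final class Split {
        final long start;
        final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Assigns splits to workers round-robin; the real framework's policy may differ. */
    static List<List<Split>> assign(List<Split> splits, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        List<List<Split>> perWorker = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            perWorker.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            perWorker.get(i % workerCount).add(splits.get(i));
        }
        return perWorker;
    }
}
```

Each worker would then pass the same full-file object to readRange together with the start and end of its own splits.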
  • Method Details

    • discoverSplitRanges

      List<RangeAwareFormatReader.SplitRange> discoverSplitRanges(StorageObject object) throws IOException
      Discovers independently readable byte ranges within a file by reading its metadata. Each range typically corresponds to one row group (Parquet) or stripe (ORC).

      Returns a list of RangeAwareFormatReader.SplitRange objects. An empty list means the file cannot be split (e.g. it contains a single row group) and should be read as a whole.

      Parameters:
      object - the storage object representing the full file
      Returns:
      list of split ranges with optional per-range statistics
      Throws:
      IOException
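A minimal sketch of the discovery contract, assuming the row-group start offsets have already been read from the footer. Range is a hypothetical stand-in for RangeAwareFormatReader.SplitRange, whose fields (including the optional statistics) are not shown on this page; discover and its parameters are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of split discovery from row-group offsets. */
final class SplitDiscoverySketch {

    /** Hypothetical stand-in for SplitRange: a byte range [start, end). */
    static final class Range {
        final long start; // inclusive
        final long end;   // exclusive

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * rowGroupOffsets[i] is the start offset of row group i (ascending);
     * dataEnd is the offset where the last row group ends.
     * Returns an empty list when there are fewer than two row groups,
     * signalling that the file should be read as a whole.
     */
    static List<Range> discover(long[] rowGroupOffsets, long dataEnd) {
        if (rowGroupOffsets.length < 2) {
            return Collections.emptyList();
        }
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < rowGroupOffsets.length; i++) {
            long end = (i + 1 < rowGroupOffsets.length) ? rowGroupOffsets[i + 1] : dataEnd;
            ranges.add(new Range(rowGroupOffsets[i], end));
        }
        return ranges;
    }
}
```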
    • readRange

      CloseableIterator<Page> readRange(StorageObject object, List<String> projectedColumns, int batchSize, long rangeStart, long rangeEnd, List<Attribute> resolvedAttributes, ErrorPolicy errorPolicy) throws IOException
      Reads only the row groups / stripes that fall within the given byte range. The storage object must represent the full file (not a range-limited view), because columnar formats need access to file-level metadata (e.g. footer).
      Parameters:
      object - the full-file storage object
      projectedColumns - columns to project
      batchSize - rows per page
      rangeStart - start byte offset of the assigned range
      rangeEnd - end byte offset (exclusive) of the assigned range
      resolvedAttributes - schema attributes resolved from metadata
      errorPolicy - error handling policy
      Returns:
      an iterator that yields pages from the matching row groups
      Throws:
      IOException
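The sketch below shows how a caller might consume an iterator like the one readRange returns: pages of at most batchSize rows, drained inside try-with-resources so the reader is closed. CloseableIterator here is a hypothetical stand-in declared locally (its close() is narrowed to throw no checked exception), and a "page" is simplified to a List<String>; the real Page and CloseableIterator types are not described on this page.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of batching rows into pages and consuming them safely. */
final class ReadRangeSketch {

    /** Hypothetical stand-in for the CloseableIterator named in the signature. */
    interface CloseableIterator<T> extends Iterator<T>, Closeable {
        @Override
        void close();
    }

    /** Yields pages of at most batchSize rows; a "page" here is a List<String>. */
    static CloseableIterator<List<String>> pages(List<String> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new CloseableIterator<List<String>>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < rows.size();
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int end = Math.min(position + batchSize, rows.size());
                List<String> page = new ArrayList<>(rows.subList(position, end));
                position = end;
                return page;
            }

            @Override
            public void close() {
                // A real reader would release file handles and buffers here.
            }
        };
    }

    /** Drains the iterator and returns the total row count, closing it afterwards. */
    static int countRows(List<String> rows, int batchSize) {
        int total = 0;
        try (CloseableIterator<List<String>> it = pages(rows, batchSize)) {
            while (it.hasNext()) {
                total += it.next().size();
            }
        }
        return total;
    }
}
```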