Interface RangeAwareFormatReader
- All Superinterfaces:
AutoCloseable, Closeable, FormatReader
Extension of
FormatReader for columnar formats (Parquet, ORC) that support
row-group-level split parallelism.
Unlike line-oriented formats that use SegmentableFormatReader with byte-range
views, columnar formats need full-file access (e.g. Parquet's footer is at EOF) but
can selectively read row groups within a byte range. This interface separates the
range specification from the file access, allowing the reader to open the full file
and use format-native range filtering (e.g. ParquetReadOptions.withRange()).
The split discovery flow:
- discoverSplitRanges(StorageObject) reads file metadata (e.g. Parquet footer) and returns byte ranges for each independently readable unit (e.g. row group).
- The framework creates one split per range, distributed across drivers/nodes.
- At read time, readRange(StorageObject, List<String>, int, long, long, List<Attribute>, ErrorPolicy) receives the full-file object and the assigned byte range, reading only the relevant row groups.
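The flow above can be sketched from the framework's side. This is a minimal illustration, not the actual planner: ByteRange, Split, and planSplits are hypothetical stand-ins (the real SplitRange record and StorageObject are not reproduced here), and representing the whole-file fallback as the range [0, fileLength) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these names come from the real SPI.
final class SplitFlowSketch {
    /** Half-open byte range [start, end), standing in for a discovered split range. */
    record ByteRange(long start, long end) {}

    /** One unit of work handed to a driver/node: a file plus the byte range to read. */
    record Split(String path, long start, long end) {}

    /** Creates one split per discovered range, or a single whole-file split when none are reported. */
    static List<Split> planSplits(String path, long fileLength, List<ByteRange> ranges) {
        if (ranges.isEmpty()) {
            // An empty list means the file cannot be split, so it is read as a whole.
            // Assumption: "whole" is modelled here as the range [0, fileLength).
            return List.of(new Split(path, 0, fileLength));
        }
        List<Split> splits = new ArrayList<>(ranges.size());
        for (ByteRange r : ranges) {
            splits.add(new Split(path, r.start(), r.end()));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Two row groups discovered in a 300-byte file: two splits.
        System.out.println(planSplits("data.parquet", 300, List.of(new ByteRange(4, 100), new ByteRange(100, 250))));
        // Nothing discovered: one split covering the whole file.
        System.out.println(planSplits("data.parquet", 300, List.of()));
    }
}
```

Each Split would then be passed to readRange together with the full-file storage object, never with a range-limited view.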
Nested Class Summary
Nested Classes
Modifier and Type, Interface, Description:
- static final record RangeAwareFormatReader.SplitRange: A byte range within a file with optional per-range statistics (e.g. ...)
Nested classes/interfaces inherited from interface org.elasticsearch.xpack.esql.datasources.spi.FormatReader
FormatReader.SchemaResolution
Field Summary
Fields inherited from interface org.elasticsearch.xpack.esql.datasources.spi.FormatReader
NO_LIMIT
Method Summary
Modifier and Type, Method, Description:
- List<RangeAwareFormatReader.SplitRange> discoverSplitRanges(StorageObject object): Discovers independently readable byte ranges within a file by reading its metadata.
- CloseableIterator<Page> readRange(StorageObject object, List<String> projectedColumns, int batchSize, long rangeStart, long rangeEnd, List<Attribute> resolvedAttributes, ErrorPolicy errorPolicy): Reads only the row groups / stripes that fall within the given byte range.
Methods inherited from interface org.elasticsearch.xpack.esql.datasources.spi.FormatReader
aggregatePushdownSupport, defaultErrorPolicy, defaultSchemaResolution, fileExtensions, filterPushdownSupport, formatName, metadata, read, read, readAsync, schema, supportsNativeAsync, withConfig, withPushedFilter, withSchema
Method Details
discoverSplitRanges
List<RangeAwareFormatReader.SplitRange> discoverSplitRanges(StorageObject object) throws IOException
Discovers independently readable byte ranges within a file by reading its metadata. Each range typically corresponds to one row group (Parquet) or stripe (ORC).
Returns a list of RangeAwareFormatReader.SplitRange objects. An empty list means the file cannot be split (e.g. single row group) and should be read as a whole.
- Parameters:
object - the storage object representing the full file
- Returns:
list of split ranges with optional per-range statistics
- Throws:
IOException
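As a rough illustration of this contract, the sketch below turns per-row-group footer entries into byte ranges and returns an empty list when there is nothing to parallelize. RowGroupInfo, ByteRange, and discover are hypothetical names; a real implementation would parse the footer from the StorageObject and return SplitRange objects, whose fields are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the discovery step; the types below are not part of the real SPI.
final class DiscoverRangesSketch {
    /** Assumed footer entry: where a row group (or stripe) starts and how many bytes it spans. */
    record RowGroupInfo(long startOffset, long byteLength) {}

    /** Stand-in for a split range: half-open byte range [start, end). */
    record ByteRange(long start, long end) {}

    /** Returns one range per row group, or an empty list when the file cannot be split. */
    static List<ByteRange> discover(List<RowGroupInfo> rowGroups) {
        // Zero or one row group: signal "read as a whole" with an empty list.
        if (rowGroups.size() <= 1) {
            return List.of();
        }
        List<ByteRange> ranges = new ArrayList<>(rowGroups.size());
        for (RowGroupInfo rg : rowGroups) {
            ranges.add(new ByteRange(rg.startOffset(), rg.startOffset() + rg.byteLength()));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<RowGroupInfo> footer = List.of(new RowGroupInfo(4, 96), new RowGroupInfo(100, 150));
        System.out.println(discover(footer));
        System.out.println(discover(List.of(new RowGroupInfo(4, 96))));
    }
}
```

Callers should therefore treat an empty result as a valid outcome rather than an error, and fall back to a whole-file read.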
readRange
CloseableIterator<Page> readRange(StorageObject object, List<String> projectedColumns, int batchSize, long rangeStart, long rangeEnd, List<Attribute> resolvedAttributes, ErrorPolicy errorPolicy) throws IOException
Reads only the row groups / stripes that fall within the given byte range. The storage object must represent the full file (not a range-limited view), because columnar formats need access to file-level metadata (e.g. footer).
- Parameters:
object - the full-file storage object
projectedColumns - columns to project
batchSize - rows per page
rangeStart - start byte offset of the assigned range
rangeEnd - end byte offset (exclusive) of the assigned range
resolvedAttributes - schema attributes resolved from metadata
errorPolicy - error handling policy
- Returns:
an iterator that yields pages from the matching row groups
- Throws:
IOException
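To make "row groups that fall within the given byte range" concrete, here is one possible selection rule, sketched with hypothetical types: a row group is kept when its start offset lies in [rangeStart, rangeEnd). This rule is an assumption chosen for illustration; a format library may apply a different test (for example, one based on the row group's midpoint), and in practice the format-native range filter does this work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range-based row group selection; not the real reader logic.
final class RangeFilterSketch {
    /** Assumed footer entry: ordinal of the row group and its starting byte offset. */
    record RowGroupInfo(int index, long startOffset) {}

    /** Keeps the row groups whose start offset lies in [rangeStart, rangeEnd). */
    static List<RowGroupInfo> select(List<RowGroupInfo> all, long rangeStart, long rangeEnd) {
        if (rangeStart > rangeEnd) {
            throw new IllegalArgumentException("rangeStart must not exceed rangeEnd");
        }
        List<RowGroupInfo> selected = new ArrayList<>();
        for (RowGroupInfo rg : all) {
            // rangeEnd is exclusive, matching the parameter description above.
            if (rg.startOffset() >= rangeStart && rg.startOffset() < rangeEnd) {
                selected.add(rg);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // The footer lists every row group in the file, which is why the full file is needed.
        List<RowGroupInfo> footer = List.of(new RowGroupInfo(0, 4), new RowGroupInfo(1, 100), new RowGroupInfo(2, 250));
        System.out.println(select(footer, 100, 250));
    }
}
```

Because rangeEnd is exclusive, adjacent ranges such as [4, 100) and [100, 250) never select the same row group under this rule.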