Interface FormatReader
- All Superinterfaces:
AutoCloseable,Closeable
- All Known Subinterfaces:
RangeAwareFormatReader,SegmentableFormatReader
Simple formats: implement only read(StorageObject, FormatReadContext) (sync) -
async wrapping is automatic.
Async-capable formats: override readAsync(StorageObject, FormatReadContext, Executor, ActionListener)
for native async behavior.
The output is ESQL's native Page format rather than Arrow to avoid mandating Arrow as a dependency for all format implementations.
Implementations should provide metadata discovery via metadata(StorageObject)
which returns a unified SourceMetadata containing schema and source information.
Per-query format configuration (delimiter, encoding, etc.) is set on the reader instance
via withConfig(Map). Per-query optimizer hints (pushed filters for row-group
or stripe skipping) are set via withPushedFilter(Object). Per-read execution
parameters (projection, batch size, limit, error policy, split config) are bundled in
FormatReadContext.
-
Nested Class Summary
Nested ClassesModifier and TypeInterfaceDescriptionstatic enumStrategy for resolving schemas across multiple files in a glob/multi-file query. -
Field Summary
Fields -
Method Summary
Modifier and TypeMethodDescriptiondefault AggregatePushdownSupportReturns the aggregate pushdown support for this format.default ErrorPolicyReturns the default error policy for this format.default FormatReader.SchemaResolutiondefault FilterPushdownSupportReturns the filter pushdown support for this format, or null if not supported.metadata(StorageObject object) default CloseableIterator<Page> read(StorageObject object, List<String> projectedColumns, int batchSize) Convenience overload that delegates toread(StorageObject, FormatReadContext).read(StorageObject object, FormatReadContext context) Reads data from the given storage object using the provided context.default voidreadAsync(StorageObject object, FormatReadContext context, Executor executor, ActionListener<CloseableIterator<Page>> listener) Asynchronously reads data from the given storage object using the provided context.schema(StorageObject object) default booleandefault FormatReaderwithConfig(Map<String, Object> config) Returns a format reader configured with the given config map (from the WITH clause).default FormatReaderwithPushedFilter(Object pushedFilter) Returns a format reader configured with the given pushed filter from the optimizer.default FormatReaderwithSchema(List<Attribute> schema) Returns a format reader configured with the schema attributes.
-
Field Details
-
NO_LIMIT
static final int NO_LIMIT- See Also:
-
-
Method Details
-
defaultSchemaResolution
-
defaultErrorPolicy
Returns the default error policy for this format. Override to change the default behavior for a specific format (e.g. NDJSON defaults to lenient because skipping malformed lines is its natural behavior). -
metadata
- Throws:
IOException
-
schema
- Throws:
IOException
-
read
Reads data from the given storage object using the provided context.This is the primary read method. All implementations must override this method.
- Throws:
IOException
-
read
default CloseableIterator<Page> read(StorageObject object, List<String> projectedColumns, int batchSize) throws IOException Convenience overload that delegates toread(StorageObject, FormatReadContext). Keeps test code and simple call sites working without constructing a context.- Throws:
IOException
-
readAsync
default void readAsync(StorageObject object, FormatReadContext context, Executor executor, ActionListener<CloseableIterator<Page>> listener) Asynchronously reads data from the given storage object using the provided context.The default wraps the synchronous
read(StorageObject, FormatReadContext)in the provided executor. Formats with native async support should override this. -
formatName
String formatName() -
fileExtensions
-
withConfig
Returns a format reader configured with the given config map (from the WITH clause). Implementations should parse format-specific options from the config and return a new reader instance if any options are present. The default returnsthis(no configuration). -
withPushedFilter
Returns a format reader configured with the given pushed filter from the optimizer.The pushed filter is an opaque object produced by
FilterPushdownSupportduring local physical optimization. Only format readers that support predicate pushdown (e.g., Parquet row-group skipping, ORC stripe-level predicates) need to override this.The filter is per-query: it applies identically to every file/split in the query. Implementations should cast the filter to their expected type and return a new reader instance with the filter stored as an instance field.
- Parameters:
pushedFilter- opaque filter object, or null if no filter was pushed- Returns:
- a new reader with the filter applied, or
thisif the filter is not applicable
-
aggregatePushdownSupport
Returns the aggregate pushdown support for this format. Only format readers with column statistics in their metadata (Parquet, ORC) override this. -
withSchema
Returns a format reader configured with the schema attributes.The schema is determined during the planning phase (via
metadata(StorageObject)) and is constant for all files/splits in a query. Passing it here allows the reader to skip re-reading/inferring the schema from the file header on every read, which is especially important for split-based reads where the split may start mid-file (no header available).Formats with embedded schemas (Parquet, ORC) may ignore this since they always read the schema from the file metadata.
- Parameters:
schema- the planning-phase schema attributes, or null to clear- Returns:
- a new reader with the schema set, or
thisif the schema is not needed
-
filterPushdownSupport
Returns the filter pushdown support for this format, or null if not supported.When non-null, the optimizer can translate ESQL filter expressions into format-specific predicates (e.g., Parquet FilterPredicate) that enable row-group skipping via statistics, dictionary, and bloom filter checks.
- Returns:
- FilterPushdownSupport for this format, or null if not supported
-
supportsNativeAsync
default boolean supportsNativeAsync()
-