Interface SourceMetadata
- All Known Subinterfaces:
ExternalSourceMetadata
- All Known Implementing Classes:
SimpleSourceMetadata
For file-based sources (Parquet, CSV), the schema is embedded in the file itself, so no additional metadata needs to flow through to execution.
For table-based sources (Iceberg, Delta Lake), the native schema and other
source-specific data must be preserved in sourceMetadata() to avoid
re-resolving the table during execution. Core passes this through without
interpreting it; only the source-specific operator factory understands it.
Implementations should be immutable and thread-safe.
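The immutability requirement above can be illustrated with a minimal sketch. The `SketchSourceMetadata` interface below is a hypothetical, reduced stand-in for the real one: it uses plain column names instead of ESQL attributes, and the `Map<String, Object>` value type is an assumption, since the actual signatures are not shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the real interface, reduced to a few methods.
// Column names stand in for ESQL attributes so the sketch compiles on its own.
interface SketchSourceMetadata {
    List<String> schema();
    String sourceType();
    String location();
    Map<String, Object> sourceMetadata();
}

// Immutable by construction: final class, final fields, defensive copies, no setters.
final class ImmutableSketchMetadata implements SketchSourceMetadata {
    private final List<String> schema;
    private final String sourceType;
    private final String location;
    private final Map<String, Object> sourceMetadata;

    ImmutableSketchMetadata(List<String> schema, String sourceType, String location,
                            Map<String, Object> sourceMetadata) {
        // List.copyOf and Map.copyOf reject nulls and return unmodifiable copies,
        // so later changes to the caller's collections are not visible here.
        this.schema = List.copyOf(schema);
        this.sourceType = Objects.requireNonNull(sourceType, "sourceType");
        this.location = Objects.requireNonNull(location, "location");
        this.sourceMetadata = Map.copyOf(sourceMetadata);
    }

    @Override public List<String> schema() { return schema; }
    @Override public String sourceType() { return sourceType; }
    @Override public String location() { return location; }
    @Override public Map<String, Object> sourceMetadata() { return sourceMetadata; }
}
```

The copies are shallow, so an implementation along these lines is only thread-safe if the values placed in the metadata map are themselves immutable.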
-
Method Summary
- config() - Returns configuration for operator creation.
- location() - Returns the original path or location of the source.
- partitionColumns() - Returns optional partition column names.
- schema() - Returns the resolved schema as ESQL attributes.
- sourceMetadata() - Returns opaque source-specific metadata.
- sourceType() - Returns the source type identifier.
- default Optional<SourceStatistics> statistics() - Returns optional statistics for query planning.
-
Method Details
-
schema
Returns the resolved schema as ESQL attributes. The attributes represent the columns available for querying.
- Returns:
- list of attributes representing the schema, never null
-
sourceType
String sourceType()
Returns the source type identifier. Examples: "parquet", "iceberg", "csv", "delta".
- Returns:
- the source type string, never null
-
location
String location()
Returns the original path or location of the source. This is the URI or path used to access the data.
- Returns:
- the location string, never null
-
statistics
Returns optional statistics for query planning. Statistics can include row counts, column statistics, etc.
- Returns:
- optional statistics, empty if not available
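Because statistics() is a default method returning an Optional, callers need a fallback for sources that do not provide statistics. The sketch below shows one way a planner might do that. `Stats`, its `rowCount()` accessor, `Source`, and `estimatedRows` are hypothetical stand-ins; the real SourceStatistics accessors are not described here, and the empty default body is an assumption.

```java
import java.util.Optional;

final class StatisticsSketch {
    // Hypothetical stand-in for SourceStatistics; the real accessors may differ.
    interface Stats {
        long rowCount();
    }

    // Hypothetical stand-in for the metadata interface, reduced to statistics().
    interface Source {
        // Assumed default: sources without statistics simply inherit "empty".
        default Optional<Stats> statistics() {
            return Optional.empty();
        }
    }

    // Use the reported row count when available, otherwise the caller's fallback.
    static long estimatedRows(Source source, long fallback) {
        return source.statistics().map(Stats::rowCount).orElse(fallback);
    }
}
```

A source that knows nothing about its data needs no extra code, while one that does overrides statistics() and the planner picks the value up automatically.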
-
partitionColumns
Returns optional partition column names. For partitioned data sources, this indicates which columns are used for partitioning.
- Returns:
- optional list of partition column names, empty if not partitioned
-
sourceMetadata
Returns opaque source-specific metadata.
This is used by table-based sources (Iceberg, Delta Lake) to pass native schema and other source-specific data through to the operator factory without core needing to understand it.
For example, Iceberg stores its native Schema object here under a well-known key. The Iceberg operator factory retrieves it when creating operators, avoiding the need to re-resolve the table.
File-based sources typically return an empty map since the schema is embedded in the file itself.
- Returns:
- map of source-specific metadata, never null
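The pass-through pattern described above can be sketched roughly as follows. The key name `native_schema`, the `NativeSchema` class (standing in for a source's own schema type), and the `Map<String, Object>` value type are all assumptions made for illustration; the actual well-known key is not named here.

```java
import java.util.List;
import java.util.Map;

final class OpaqueMetadataSketch {
    // Hypothetical key name; the documentation says only that a well-known key is used.
    static final String NATIVE_SCHEMA_KEY = "native_schema";

    // Stand-in for a source's native schema type.
    static final class NativeSchema {
        final List<String> columns;

        NativeSchema(List<String> columns) {
            this.columns = List.copyOf(columns);
        }
    }

    // Resolution side: store the native schema once, under the agreed key.
    static Map<String, Object> resolve(NativeSchema schema) {
        return Map.of(NATIVE_SCHEMA_KEY, schema);
    }

    // Factory side: only the source-specific code knows the key and the value type.
    static NativeSchema nativeSchema(Map<String, Object> sourceMetadata) {
        Object value = sourceMetadata.get(NATIVE_SCHEMA_KEY);
        if (value instanceof NativeSchema) {
            return (NativeSchema) value;
        }
        throw new IllegalStateException("missing or unexpected value for key " + NATIVE_SCHEMA_KEY);
    }
}
```

Core code in between would hand the map along untouched, which is why the factory checks the type before casting rather than trusting the contents.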
-
config
Returns configuration for operator creation.
This keeps source-specific configuration classes (such as S3Configuration) from leaking into core. Configuration is stored as a generic map that the source-specific operator factory interprets.
Common keys include:
- "access_key" - S3 access key
- "secret_key" - S3 secret key
- "endpoint" - S3 endpoint URL
- "region" - AWS region
- Returns:
- configuration map, never null
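As an illustration of how a source-specific factory might interpret the generic map, the sketch below reads the common keys listed above into a typed holder. `S3Settings` and `fromConfig` are hypothetical names, the `Map<String, Object>` value type is an assumption, and treating the two key entries as required while the endpoint and region are optional is an illustrative choice rather than documented behavior.

```java
import java.util.Map;
import java.util.Optional;

final class ConfigSketch {
    // Hypothetical typed holder living on the source-specific side, not in core.
    static final class S3Settings {
        final String accessKey;
        final String secretKey;
        final Optional<String> endpoint;
        final Optional<String> region;

        S3Settings(String accessKey, String secretKey,
                   Optional<String> endpoint, Optional<String> region) {
            this.accessKey = accessKey;
            this.secretKey = secretKey;
            this.endpoint = endpoint;
            this.region = region;
        }
    }

    // Interpret the generic map using the common keys from the documentation.
    static S3Settings fromConfig(Map<String, Object> config) {
        return new S3Settings(
            required(config, "access_key"),
            required(config, "secret_key"),
            optional(config, "endpoint"),
            optional(config, "region"));
    }

    // Assumption: a missing credential is an error for this hypothetical factory.
    private static String required(Map<String, Object> config, String key) {
        return optional(config, key)
            .orElseThrow(() -> new IllegalArgumentException("missing config key: " + key));
    }

    private static Optional<String> optional(Map<String, Object> config, String key) {
        return Optional.ofNullable(config.get(key)).map(Object::toString);
    }
}
```

With this shape, core only ever sees a map, and the typed view is built where the operators are created.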
-