Interface SourceMetadata

All Known Subinterfaces:
ExternalSourceMetadata
All Known Implementing Classes:
SimpleSourceMetadata

public interface SourceMetadata
Unified metadata output type returned by all schema discovery mechanisms. This interface provides a consistent way to access metadata regardless of whether it comes from a FormatReader (Parquet, CSV) or a TableCatalog (Iceberg, Delta Lake).

For file-based sources (Parquet, CSV), the schema is embedded in the file itself, so no additional metadata needs to flow through to execution.

For table-based sources (Iceberg, Delta Lake), the native schema and other source-specific data must be preserved in sourceMetadata() to avoid re-resolving the table during execution. Core passes this through without interpreting it; only the source-specific operator factory understands it.

Implementations should be immutable and thread-safe.
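
One way to meet the immutability requirement is to take defensive, unmodifiable copies in the constructor and keep every field final. The sketch below is illustrative only: ColumnAttribute, SchemaMetadata, and ImmutableSourceMetadata are hypothetical stand-ins for Attribute, SourceMetadata, and an implementing class. It does not reproduce SimpleSourceMetadata.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the ESQL Attribute type, reduced to a column name.
final class ColumnAttribute {
    final String name;

    ColumnAttribute(String name) {
        this.name = Objects.requireNonNull(name);
    }
}

// Trimmed stand-in for SourceMetadata with only the methods this sketch needs.
interface SchemaMetadata {
    List<ColumnAttribute> schema();
    String sourceType();
    String location();
    Map<String, Object> sourceMetadata();
}

// Hypothetical immutable implementation; not the real SimpleSourceMetadata.
final class ImmutableSourceMetadata implements SchemaMetadata {
    private final List<ColumnAttribute> schema;
    private final String sourceType;
    private final String location;
    private final Map<String, Object> sourceMetadata;

    ImmutableSourceMetadata(List<ColumnAttribute> schema, String sourceType,
                            String location, Map<String, Object> sourceMetadata) {
        // List.copyOf and Map.copyOf return unmodifiable copies and reject nulls,
        // so later changes to the caller's collections are not visible here.
        // The copies are shallow: the stored values must themselves be immutable.
        this.schema = List.copyOf(schema);
        this.sourceType = Objects.requireNonNull(sourceType);
        this.location = Objects.requireNonNull(location);
        this.sourceMetadata = Map.copyOf(sourceMetadata);
    }

    @Override public List<ColumnAttribute> schema() { return schema; }
    @Override public String sourceType() { return sourceType; }
    @Override public String location() { return location; }
    @Override public Map<String, Object> sourceMetadata() { return sourceMetadata; }
}
```

Because all state is final and unmodifiable after construction, an instance built this way can be shared across threads without synchronization, provided the values placed in the map are themselves immutable.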

  • Method Details

    • schema

      List<Attribute> schema()
      Returns the resolved schema as ESQL attributes. The attributes represent the columns available for querying.
      Returns:
      list of attributes representing the schema, never null
    • sourceType

      String sourceType()
      Returns the source type identifier, for example "parquet", "iceberg", "csv", or "delta".
      Returns:
      the source type string, never null
    • location

      String location()
      Returns the original path or location of the source. This is the URI or path used to access the data.
      Returns:
      the location string, never null
    • statistics

      default Optional<SourceStatistics> statistics()
      Returns optional statistics for query planning. Statistics may include row counts and per-column statistics.
      Returns:
      optional statistics, empty if not available
    • partitionColumns

      default Optional<List<String>> partitionColumns()
      Returns optional partition column names. For partitioned data sources, this indicates which columns are used for partitioning.
      Returns:
      optional list of partition column names, empty if not partitioned
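
Callers of the Optional-returning accessors need to handle the empty case. The following is a minimal sketch of doing so for partitionColumns(). PartitionedMetadata is a hypothetical trimmed stand-in for SourceMetadata, and its Optional.empty() default body is an assumption, since the default implementation is not shown here.

```java
import java.util.List;
import java.util.Optional;

// Trimmed stand-in for SourceMetadata; the Optional.empty() default is an assumption.
interface PartitionedMetadata {
    default Optional<List<String>> partitionColumns() {
        return Optional.empty();
    }
}

// Hypothetical helper methods for callers; not part of the documented API.
final class PartitionColumnsExample {
    private PartitionColumnsExample() {}

    // Returns the partition columns, or an empty list for unpartitioned sources.
    static List<String> partitionColumnsOrEmpty(PartitionedMetadata metadata) {
        return metadata.partitionColumns().orElse(List.of());
    }

    // Returns true only if the source is partitioned and lists the given column.
    static boolean isPartitionColumn(PartitionedMetadata metadata, String column) {
        return metadata.partitionColumns()
            .map(columns -> columns.contains(column))
            .orElse(false);
    }
}
```
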
    • sourceMetadata

      default Map<String,Object> sourceMetadata()
      Returns opaque source-specific metadata.

      This is used by table-based sources (Iceberg, Delta Lake) to pass native schema and other source-specific data through to the operator factory without core needing to understand it.

      For example, Iceberg stores its native Schema object here under a well-known key. The Iceberg operator factory retrieves it when creating operators, avoiding the need to re-resolve the table.

      File-based sources typically return an empty map since the schema is embedded in the file itself.

      Returns:
      map of source-specific metadata, never null
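
The lookup performed by a source-specific operator factory might look roughly like the sketch below. NativeSchema is a hypothetical stand-in for a source's native schema object, and the key name "native_schema" is invented for illustration; the actual well-known key is not given here.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a source's native schema object.
final class NativeSchema {
    final String description;

    NativeSchema(String description) {
        this.description = description;
    }
}

// Hypothetical helper a source-specific operator factory could use.
final class OpaqueMetadataExample {
    // Invented key name; the real well-known key may differ.
    static final String NATIVE_SCHEMA_KEY = "native_schema";

    private OpaqueMetadataExample() {}

    // Returns the native schema, or empty if the key is absent or holds another type.
    static Optional<NativeSchema> nativeSchema(Map<String, Object> sourceMetadata) {
        Object value = sourceMetadata.get(NATIVE_SCHEMA_KEY);
        if (value instanceof NativeSchema) {
            return Optional.of((NativeSchema) value);
        }
        return Optional.empty();
    }
}
```

Checking the runtime type before casting keeps the failure mode explicit: because the map values are typed as Object, a mismatched entry yields an empty result here instead of a ClassCastException deep inside operator creation.
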
    • config

      default Map<String,Object> config()
      Returns configuration for operator creation.

      This keeps source-specific configuration classes (such as S3Configuration) from leaking into core. Configuration is stored as a generic map that the source-specific operator factory interprets.

      Common keys include:

      • "access_key" - S3 access key
      • "secret_key" - S3 secret key
      • "endpoint" - S3 endpoint URL
      • "region" - AWS region
      Returns:
      configuration map, never null
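
A source-specific operator factory could interpret the common keys above along these lines. This is a sketch under assumptions: S3Settings is a hypothetical holder class, and the map values are assumed to be strings, which is not stated here.

```java
import java.util.Map;

// Hypothetical holder for S3 settings read from the generic config map.
final class S3Settings {
    final String accessKey;
    final String secretKey;
    final String endpoint;
    final String region;

    private S3Settings(String accessKey, String secretKey, String endpoint, String region) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.endpoint = endpoint;
        this.region = region;
    }

    // Builds settings from the config map; keys that are absent become null.
    static S3Settings fromConfig(Map<String, Object> config) {
        return new S3Settings(
            stringOrNull(config, "access_key"),
            stringOrNull(config, "secret_key"),
            stringOrNull(config, "endpoint"),
            stringOrNull(config, "region"));
    }

    // Assumes string values; any other type is rejected with a clear message.
    private static String stringOrNull(Map<String, Object> config, String key) {
        Object value = config.get(key);
        if (value == null) {
            return null;
        }
        if (value instanceof String) {
            return (String) value;
        }
        throw new IllegalArgumentException("config key [" + key + "] must be a string");
    }
}
```

Validating the value types at this single point means the rest of the factory can work with typed fields, while core continues to treat the map as opaque.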