Class ParquetStorageObjectAdapter

java.lang.Object
org.elasticsearch.xpack.esql.datasource.parquet.ParquetStorageObjectAdapter
All Implemented Interfaces:
org.apache.parquet.io.InputFile

public class ParquetStorageObjectAdapter extends Object implements org.apache.parquet.io.InputFile
Adapter that wraps a StorageObject to implement Parquet's InputFile interface. This allows using our storage abstraction with Parquet's ParquetFileReader.

Key features:

  • Uses only range reads (newStream(position, length)) — never full-object GET
  • Sliding-window cache (default 4 MiB) that amortizes seeks and avoids InputStream.skip
  • Optimized for remote storage (S3, HTTP), where a full-object GET or skipping forward by downloading and discarding bytes is expensive
  • No Hadoop dependencies — uses pure Java InputStream
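The sliding-window idea in the list above can be illustrated with a minimal, self-contained sketch. This is not the adapter's actual implementation: RangeSource, inMemory, and WindowedReader are hypothetical names introduced only for illustration, and the window placement policy (start the window at the requested position) is an assumption. A real remote source would also throw IOException from its range read, which is left out here for brevity.

```java
import java.util.Arrays;

public class SlidingWindowSketch {

    /** Hypothetical stand-in for a storage object that supports range reads. */
    public interface RangeSource {
        long length();

        /** Returns exactly {@code length} bytes starting at {@code position}. */
        byte[] readRange(long position, int length);
    }

    /** In-memory source used only to exercise the sketch. */
    public static RangeSource inMemory(byte[] data) {
        return new RangeSource() {
            @Override
            public long length() {
                return data.length;
            }

            @Override
            public byte[] readRange(long position, int length) {
                return Arrays.copyOfRange(data, (int) position, (int) position + length);
            }
        };
    }

    /** Serves positional reads from a cached window, refetching only on a miss. */
    public static final class WindowedReader {
        private final RangeSource source;
        private final int windowSize;
        private byte[] window = new byte[0]; // currently cached bytes
        private long windowStart = 0;        // absolute offset of window[0]
        private int rangeReads = 0;          // number of range reads issued so far

        public WindowedReader(RangeSource source, int windowSize) {
            this.source = source;
            this.windowSize = windowSize;
        }

        /** Copies up to {@code len} bytes from {@code position}; returns -1 at end of object. */
        public int read(long position, byte[] dst, int off, int len) {
            if (position >= source.length()) {
                return -1;
            }
            int copied = 0;
            while (copied < len && position < source.length()) {
                boolean miss = position < windowStart || position >= windowStart + window.length;
                if (miss) {
                    // One range read refills the window; no full-object GET and no skip().
                    int toFetch = (int) Math.min(windowSize, source.length() - position);
                    window = source.readRange(position, toFetch);
                    windowStart = position;
                    rangeReads++;
                }
                int inWindow = (int) (position - windowStart);
                int n = Math.min(len - copied, window.length - inWindow);
                System.arraycopy(window, inWindow, dst, off + copied, n);
                copied += n;
                position += n;
            }
            return copied;
        }

        public int rangeReads() {
            return rangeReads;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        WindowedReader reader = new WindowedReader(inMemory(data), 4);
        byte[] buf = new byte[3];
        reader.read(0, buf, 0, 3);
        reader.read(3, buf, 0, 3);
        System.out.println(Arrays.toString(buf) + " after " + reader.rangeReads() + " range reads");
    }
}
```

In this sketch, small sequential reads that fall inside the cached window cost no additional I/O, while a backward seek or a read past the window triggers exactly one new range read.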

  • Constructor Details

    • ParquetStorageObjectAdapter

      public ParquetStorageObjectAdapter(StorageObject storageObject)
      Creates an adapter with the default 4 MiB sliding window.
  • Method Details

    • forRange

      public static ParquetStorageObjectAdapter forRange(StorageObject storageObject, long rangeBytes)
      Creates an adapter with an adaptive window sized to cover the given byte range. This allows all column chunks within a small row-group split to be fetched in a single I/O instead of incurring multiple range GETs with the default 4 MiB window.
      Parameters:
      rangeBytes - byte span of the range being read; clamped to [DEFAULT_WINDOW_SIZE, MAX_WINDOW_SIZE]
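The clamping described for rangeBytes can be sketched as follows. The constant values here are assumptions for illustration only: 4 MiB for DEFAULT_WINDOW_SIZE follows the default stated in this page, while the 64 MiB MAX_WINDOW_SIZE is purely hypothetical because this page does not state it. WindowSizing and windowSizeFor are likewise hypothetical names, not part of the documented API.

```java
public class WindowSizing {
    // Assumed from the "default 4 MiB window" wording; the actual constant is not shown here.
    public static final long DEFAULT_WINDOW_SIZE = 4L * 1024 * 1024;
    // Hypothetical upper bound chosen only for this example.
    public static final long MAX_WINDOW_SIZE = 64L * 1024 * 1024;

    /** Clamps the requested byte span to [DEFAULT_WINDOW_SIZE, MAX_WINDOW_SIZE]. */
    public static long windowSizeFor(long rangeBytes) {
        return Math.max(DEFAULT_WINDOW_SIZE, Math.min(rangeBytes, MAX_WINDOW_SIZE));
    }

    public static void main(String[] args) {
        System.out.println(windowSizeFor(1024));                 // below the default
        System.out.println(windowSizeFor(10L * 1024 * 1024));    // within bounds
        System.out.println(windowSizeFor(1024L * 1024 * 1024));  // above the maximum
    }
}
```

Under these assumed bounds, a range smaller than the default never shrinks the window, and a very large range cannot grow it without limit.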
    • getLength

      public long getLength() throws IOException
      Specified by:
      getLength in interface org.apache.parquet.io.InputFile
      Throws:
      IOException
    • newStream

      public org.apache.parquet.io.SeekableInputStream newStream() throws IOException
      Specified by:
      newStream in interface org.apache.parquet.io.InputFile
      Throws:
      IOException