Interface StorageObject


public interface StorageObject
Unified interface for storage object access.

Simple providers: implement sync methods - async wrapping is automatic. Async-capable providers (HTTP, S3): override async methods for native non-blocking I/O.

Provides metadata access and methods to open streams for reading. Uses standard Java InputStream for compatibility with existing Elasticsearch code. Random access is handled via range-based reads (like BlobContainer pattern).

StorageObject instances themselves do not hold open resources and do not need closing. The InputStream instances returned by newStream() and newStream(long, long) are the resources that callers must close.

  • Field Details

    • TRANSFER_BUFFER_SIZE

      static final int TRANSFER_BUFFER_SIZE
      Transfer buffer size used by the default readBytes(long, ByteBuffer) when reading into a direct ByteBuffer via an InputStream. Kept small to avoid large stack allocations while still being efficient for typical I/O page sizes.
      See Also:
  • Method Details

    • newStream

      InputStream newStream() throws IOException
      Opens an input stream for sequential reading from the beginning.
      Throws:
      IOException
    • newStream

      InputStream newStream(long position, long length) throws IOException
      Opens an input stream for reading a specific byte range. Critical for columnar formats like Parquet that read specific column chunks. For reading object footers (e.g., Parquet), use: newStream(length() - footerSize, footerSize)
      Throws:
      IOException
    • length

      long length() throws IOException
      Returns the object size in bytes.
      Throws:
      IOException
    • lastModified

      Instant lastModified() throws IOException
      Returns the last modification time, or null if not available.
      Throws:
      IOException
    • exists

      boolean exists() throws IOException
      Checks if the object exists.
      Throws:
      IOException
    • path

      StoragePath path()
      Returns the path of this object.
    • readBytesAsync

      default void readBytesAsync(long position, long length, Executor executor, ActionListener<ByteBuffer> listener)
      Async byte read with ActionListener callback.

      Default implementation wraps the sync newStream(long, long) method in an executor. Override this method for native async I/O (e.g., HTTP sendAsync, S3AsyncClient).

      Columnar formats (Parquet) can use this for parallel chunk reads when supportsNativeAsync() returns true.

      Parameters:
      position - the starting byte position
      length - the number of bytes to read
      executor - executor for running the async operation
      listener - callback for the result or failure
    • readBytesAsync

      default void readBytesAsync(long position, ByteBuffer target, Executor executor, ActionListener<Integer> listener)
      Async byte read into a caller-provided ByteBuffer.

      Avoids per-call allocation by reading directly into the target buffer. For heap-backed buffers, reads directly into the backing array. For direct buffers, falls back to allocating a temporary array.

      The buffer's position is advanced by the number of bytes read. The listener receives the number of bytes actually read.

      Parameters:
      position - the starting byte position in the storage object
      target - the ByteBuffer to read into; bytes are written starting at target.position()
      executor - executor for running the async operation
      listener - callback with the number of bytes read, or failure
    • readBytes

      default int readBytes(long position, ByteBuffer target) throws IOException
      Reads bytes from a specific position directly into a ByteBuffer.

      This is a positional, stateless read: each call specifies the position explicitly, so no mutable cursor state is needed. Providers override this to enable zero-copy I/O:

      • Local files: FileChannel.read(target, position) reads from OS page cache directly into both heap and direct buffers with no intermediate copies.
      • GCS: ReadChannel.read(target) reads natively into the ByteBuffer.
      • S3/HTTP/Azure: The default stream-based implementation is used; for heap-backed buffers it reads directly into the backing array, for direct buffers it uses a small chunked transfer buffer (8 KB) instead of allocating a full-size temporary array.

      The buffer's position is advanced by the number of bytes read.

      Parameters:
      position - the starting byte position in the storage object
      target - the ByteBuffer to read into; bytes are written starting at target.position()
      Returns:
      the number of bytes actually read, or -1 if the position is at or past end of content
      Throws:
      IOException - if an I/O error occurs
    • supportsNativeAsync

      default boolean supportsNativeAsync()
      Returns true if this object has native async support.

      Columnar formats (Parquet) can use this to determine whether to use readBytesAsync(long, long, java.util.concurrent.Executor, org.elasticsearch.action.ActionListener<java.nio.ByteBuffer>) for parallel chunk reads instead of sequential stream-based reads.

      Returns:
      true if readBytesAsync(long, long, java.util.concurrent.Executor, org.elasticsearch.action.ActionListener<java.nio.ByteBuffer>) has a native implementation, false if it uses the default sync wrapper