Class CoalescedSplit

java.lang.Object
org.elasticsearch.xpack.esql.datasources.CoalescedSplit
All Implemented Interfaces:
NamedWriteable, Writeable, ExternalSplit

public class CoalescedSplit extends Object implements ExternalSplit
A composite split that groups multiple child splits into a single scheduling unit. Reduces per-split overhead when thousands of tiny files (e.g. Iceberg micro-partitions) would otherwise each become an independent work item. Operators that encounter a CoalescedSplit iterate over its children and process each one individually.
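The iteration described above can be sketched as follows. This is a minimal illustration, not the actual ES|QL code: `Split`, `LeafSplit`, `Coalesced`, and `expand` are hypothetical stand-ins for `ExternalSplit`, a child split such as `FileSplit`, `CoalescedSplit`, and whatever operators really do, and the sketch assumes children are not themselves coalesced.

```java
import java.util.List;

public class CoalescedSplitSketch {
    /** Hypothetical stand-in for ExternalSplit; the real interface has more members. */
    interface Split {}

    /** Hypothetical child split describing a byte range of one file. */
    record LeafSplit(String path, long offset, long length) implements Split {}

    /** Hypothetical composite: one scheduling unit wrapping several children. */
    record Coalesced(List<Split> children) implements Split {}

    /** Returns the splits to process one by one: the children of a composite, or the split itself. */
    static List<Split> expand(Split split) {
        if (split instanceof Coalesced c) {
            return c.children();
        }
        return List.of(split);
    }

    public static void main(String[] args) {
        Split unit = new Coalesced(List.of(
            new LeafSplit("a.parquet", 0, 1024),
            new LeafSplit("b.parquet", 0, 2048)));
        // The scheduler sees a single work item; the operator still visits each child.
        for (Split child : expand(unit)) {
            System.out.println(child);
        }
    }
}
```

The scheduler handles one `Coalesced` instance, so per-split bookkeeping is paid once, while the per-child processing path stays the same as for an uncoalesced split.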

Schema mapping duplication: when schema reconciliation is active, each child FileSplit carries its own SchemaReconciliation.ColumnMapping. On the coordinator, all splits from the same file share a single object reference (see FileSplitProvider), but each split serializes its own copy of the mapping on the wire. SplitCoalescer groups splits by size, not by file, so a single CoalescedSplit may contain children from multiple files with different mappings. To reduce the duplication, one of these approaches could be used in a follow-up:

  • Dedup table: add a List<ColumnMapping> table here; each child writes a 1-byte index into the table instead of the full mapping. Cleanest wire saving but couples this generic container to schema-specific types.
  • Group-by-file coalescing: modify SplitCoalescer to group splits by file first, so each CoalescedSplit has a single shared mapping. Simple but may reduce bin-packing quality.
  • Post-deserialization dedup: after deserializing children, replace content-equal mappings with a single shared instance. Saves heap but not wire bytes.

For typical schemas (< 200 columns) and split counts (< 50 per file), the per-split mapping overhead is well under 1 KB, making this a low-priority optimisation relative to the multi-MB data payloads.
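
As an illustration of the dedup-table option, the sketch below builds a table of distinct mappings plus one table index per child, then rebuilds the per-child list from that pair. Everything here is hypothetical: `Mapping` stands in for `SchemaReconciliation.ColumnMapping` (its real fields are not shown in this page), `encode` and `decode` are invented names, and the sketch keeps indices as `int` in memory rather than writing the 1-byte index to a stream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MappingDedupSketch {
    /** Hypothetical stand-in for ColumnMapping; records compare by content. */
    record Mapping(List<Integer> columnIndices) {}

    /** Distinct mappings plus, for each child, the index of its mapping in the table. */
    record Encoded(List<Mapping> table, int[] childIndex) {}

    /** Collapses content-equal mappings into one table entry each. */
    static Encoded encode(List<Mapping> perChild) {
        // LinkedHashMap keeps insertion order, so key order matches the assigned slots.
        Map<Mapping, Integer> slots = new LinkedHashMap<>();
        int[] childIndex = new int[perChild.size()];
        for (int i = 0; i < perChild.size(); i++) {
            Mapping mapping = perChild.get(i);
            Integer slot = slots.get(mapping);
            if (slot == null) {
                slot = slots.size();
                slots.put(mapping, slot);
            }
            childIndex[i] = slot;
        }
        return new Encoded(new ArrayList<>(slots.keySet()), childIndex);
    }

    /** Rebuilds the per-child list; children with equal mappings share one instance. */
    static List<Mapping> decode(Encoded encoded) {
        List<Mapping> perChild = new ArrayList<>(encoded.childIndex().length);
        for (int slot : encoded.childIndex()) {
            perChild.add(encoded.table().get(slot));
        }
        return perChild;
    }
}
```

A 1-byte index on the wire would cap the table at 256 distinct mappings per CoalescedSplit, so a real implementation would need either a guard or a wider index. Because `decode` hands out table entries by reference, this approach also gives the heap saving of the post-deserialization option as a side effect.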