Class PlannerSettings

java.lang.Object
org.elasticsearch.xpack.esql.planner.PlannerSettings

public class PlannerSettings extends Object
Values for cluster level settings used in physical planning.
  • Field Details

    • DEFAULT_DATA_PARTITIONING

      public static final Setting<DataPartitioning> DEFAULT_DATA_PARTITIONING
    • DOC_THRESHOLD_AUTO_PARTITIONING

      public static final Setting<Integer> DOC_THRESHOLD_AUTO_PARTITIONING
      The minimum number of documents in a shard before we select the DataPartitioning.AutoStrategy when DataPartitioning.AUTO, the default of DEFAULT_DATA_PARTITIONING, is in effect. For shards with fewer documents than the threshold, DataPartitioning.SHARD will be used in place of DataPartitioning.AUTO.
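The selection rule described above can be sketched as follows. This is a hedged illustration, not the actual Elasticsearch implementation; the real logic lives in DataPartitioning and considers more than the document count.

```java
// Illustrative sketch of the AUTO partitioning threshold rule.
// The class and method names here are hypothetical.
public class AutoPartitioningSketch {
    /**
     * Below the doc threshold, fall back to SHARD partitioning;
     * at or above it, use the adaptive AUTO strategy.
     */
    public static String choosePartitioning(long docsInShard, long docThreshold) {
        return docsInShard < docThreshold ? "SHARD" : "AUTO";
    }
}
```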
    • VALUES_LOADING_JUMBO_SIZE

      public static final Setting<ByteSizeValue> VALUES_LOADING_JUMBO_SIZE
    • LUCENE_TOPN_LIMIT

      public static final Setting<Integer> LUCENE_TOPN_LIMIT
    • INTERMEDIATE_LOCAL_RELATION_MAX_SIZE

      public static final Setting<ByteSizeValue> INTERMEDIATE_LOCAL_RELATION_MAX_SIZE
    • REDUCTION_LATE_MATERIALIZATION

      public static final Setting<Boolean> REDUCTION_LATE_MATERIALIZATION
    • BLOCK_LOADER_SIZE_SCRIPT

      public static final Setting<ByteSizeValue> BLOCK_LOADER_SIZE_SCRIPT
      Circuit breaker space reserved for each script BlockLoader.Reader. The default is a fairly poor estimate of the script's overhead, but it'll do for now. We're estimating 100kb for loading ordinals from doc values and 2kb for loading numbers from doc values. The 300kb here is sort of a shrug: we don't know what the script will do, we don't know how many doc values it'll load, and we're not sure how much memory the script itself will actually use.
    • BLOCK_LOADER_SIZE_ORDINALS

      public static final Setting<ByteSizeValue> BLOCK_LOADER_SIZE_ORDINALS
      Circuit breaker space reserved for each ordinals BlockLoader.Reader. Measured in heap dumps from 3.5kb to 65kb. This is an intentional overestimate.
    • PARTIAL_AGGREGATION_EMIT_KEYS_THRESHOLD

      public static final Setting<Integer> PARTIAL_AGGREGATION_EMIT_KEYS_THRESHOLD
      The threshold number of grouping keys for a partial aggregation to start emitting intermediate results early. While emitting partial results can reduce memory pressure and allow for incremental downstream processing, it might emit the same keys multiple times, incurring serialization and network overhead. This setting, in conjunction with PARTIAL_AGGREGATION_EMIT_UNIQUENESS_THRESHOLD, helps mitigate these costs by only triggering early emission when a significant number of keys have been collected and most are unique, thus lowering the probability of re-emitting the same keys.

      NOTE that the defaults are chosen somewhat arbitrarily but are partially based on other systems. Other systems sometimes default to a lower threshold (e.g., 10,000) without a uniqueness threshold. We may lower these defaults after benchmarking more use cases.

    • PARTIAL_AGGREGATION_EMIT_UNIQUENESS_THRESHOLD

      public static final Setting<Double> PARTIAL_AGGREGATION_EMIT_UNIQUENESS_THRESHOLD
      The uniqueness threshold of grouping keys for partial aggregation to start emitting keys early. This threshold controls the trade-off between the benefits of early emission and the costs of repeated serialization and network transfer of the same keys. A higher uniqueness ratio ensures early emission only if keys are not repeatedly seen in incoming data and are unlikely to appear again in future data.
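The interaction of the two settings above can be sketched as a simple predicate: early emission triggers only when enough keys have been collected AND most of them are unique. This is an illustrative sketch with hypothetical names; the actual aggregation operator tracks these counters internally.

```java
// Hedged sketch of the early-emission decision implied by
// PARTIAL_AGGREGATION_EMIT_KEYS_THRESHOLD and
// PARTIAL_AGGREGATION_EMIT_UNIQUENESS_THRESHOLD.
public class PartialEmitSketch {
    /**
     * Emit intermediate results early only when the number of collected keys
     * reaches the keys threshold and the fraction of unique keys meets the
     * uniqueness threshold (low repetition in incoming data).
     */
    public static boolean shouldEmitEarly(long keysCollected, long uniqueKeys,
                                          int keysThreshold, double uniquenessThreshold) {
        if (keysCollected < keysThreshold) {
            return false;
        }
        double uniquenessRatio = (double) uniqueKeys / keysCollected;
        return uniquenessRatio >= uniquenessThreshold;
    }
}
```

A high uniqueness ratio means the same keys are unlikely to be re-emitted later, so the serialization and network cost of early emission stays low.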
    • REUSE_COLUMN_LOADERS_THRESHOLD

      public static final Setting<Integer> REUSE_COLUMN_LOADERS_THRESHOLD
      If we're loading more than this many fields at a time, we discard column loaders after each page regardless of whether we could reuse them. Column loaders have significant per-field memory overhead, so discarding them between pages allows some queries that would otherwise have OOMed to succeed. The paths that need very high performance usually don't load more than a handful of fields at a time, so they still get the reuse.
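The trade-off reads as a one-line rule: reuse loaders across pages only when the field count is small enough. A minimal sketch with hypothetical names, not the actual implementation:

```java
// Illustrative sketch of the column-loader reuse decision.
public class ColumnLoaderReuseSketch {
    /**
     * Reuse loaders across pages only at or below the threshold;
     * above it, discard them between pages to bound per-field memory overhead.
     */
    public static boolean reuseAcrossPages(int fieldCount, int reuseThreshold) {
        return fieldCount <= reuseThreshold;
    }
}
```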
    • MAX_KEYWORD_SORT_FIELDS

      public static final Setting<Integer> MAX_KEYWORD_SORT_FIELDS
      Maximum number of keyword sort fields allowed when pushing TopN to Lucene. Sorting on many keyword fields in Lucene can be expensive. When exceeded, the sort falls back to the compute engine.
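The fallback described above can be sketched as a simple guard. Names are illustrative, not the Elasticsearch API:

```java
// Illustrative sketch of the keyword-sort pushdown guard.
public class KeywordSortPushdownSketch {
    /**
     * Push the TopN sort to Lucene only when the number of keyword sort
     * fields does not exceed the configured maximum; otherwise the sort
     * falls back to the compute engine.
     */
    public static boolean pushSortToLucene(int keywordSortFields, int maxKeywordSortFields) {
        return keywordSortFields <= maxKeywordSortFields;
    }
}
```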
    • SOURCE_RESERVATION_FACTOR

      public static final Setting<Double> SOURCE_RESERVATION_FACTOR
      Multiplier applied to lastKnownSourceSize to pre-reserve memory on the circuit breaker before loading _source. The source loading path creates large untracked allocations (scratch buffers, SourceFilter.filterBytes(), and JSON parsing); heap dumps have shown about 8x the actual source size in untracked memory at peak. 10x is an intentional overestimate to prevent crashes.
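The reservation itself is a straightforward multiplication, sketched below with hypothetical names. The real code reserves the result on a circuit breaker rather than returning it:

```java
// Illustrative sketch of the _source pre-reservation computation.
public class SourceReservationSketch {
    /**
     * Bytes to pre-reserve on the circuit breaker before loading _source:
     * the last known source size times the (over)estimation factor.
     */
    public static long reservationBytes(long lastKnownSourceSize, double factor) {
        return (long) (lastKnownSourceSize * factor);
    }
}
```

With the 10x default, a 1 KiB source pre-reserves 10 KiB, covering the roughly 8x peak of untracked allocations observed in heap dumps with headroom to spare.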
    • BYTES_REF_RAM_OVERESTIMATE_THRESHOLD

      public static final Setting<ByteSizeValue> BYTES_REF_RAM_OVERESTIMATE_THRESHOLD
      When a BytesRefArrayVector's average value length exceeds this size, the RAM estimate is multiplied by BYTES_REF_RAM_OVERESTIMATE_FACTOR to account for untracked overhead in large byte arrays. The untracked overhead may come from loading large text fields from _source.
    • BYTES_REF_RAM_OVERESTIMATE_FACTOR

      public static final Setting<Double> BYTES_REF_RAM_OVERESTIMATE_FACTOR
      Multiplier applied to the RAM estimate of a BytesRefArrayVector whose backing values exceed BYTES_REF_RAM_OVERESTIMATE_THRESHOLD.
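The threshold/factor pair above combines into one adjustment rule, sketched here with illustrative names; the real code operates on a BytesRefArrayVector's internal accounting:

```java
// Illustrative sketch of the BytesRef RAM-estimate adjustment.
public class BytesRefRamEstimateSketch {
    /**
     * Apply the overestimate factor only when the average value length
     * exceeds the threshold; otherwise keep the base estimate.
     */
    public static long adjustedRamEstimate(long baseEstimate, long totalValueBytes,
                                           long valueCount, long thresholdBytes,
                                           double factor) {
        long averageLength = valueCount == 0 ? 0 : totalValueBytes / valueCount;
        return averageLength > thresholdBytes ? (long) (baseEstimate * factor) : baseEstimate;
    }
}
```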
    • DOC_SEQUENCE_BYTES_REF_FIELD_THRESHOLD

      public static final Setting<Integer> DOC_SEQUENCE_BYTES_REF_FIELD_THRESHOLD
      When loading from a multi-leaf doc vector that maps to a single shard and segment, the reader switches to a doc-sequential iteration order if the number of BYTES_REF fields exceeds this threshold. The doc-sequential path avoids the expensive backwards reorder and supports partial-page splitting bounded by jumboBytes.
    • DEFAULTS

      public static final PlannerSettings DEFAULTS
      Defaults.
  • Constructor Details

    • PlannerSettings

      public PlannerSettings(DataPartitioning defaultDataPartitioning, int docsThresholdForAutoPartitioning, ByteSizeValue valuesLoadingJumboSize, int luceneTopNLimit, ByteSizeValue intermediateLocalRelationMaxSize, int partialEmitKeysThreshold, double partialEmitUniquenessThreshold, int reuseColumnLoadersThreshold, ByteSizeValue blockLoaderSizeOrdinals, ByteSizeValue blockLoaderSizeScript, int maxKeywordSortFields, double sourceReservationFactor, ByteSizeValue bytesRefRamOverestimateThreshold, double bytesRefRamOverestimateFactor, int docSequenceBytesRefFieldThreshold)
      Create.
  • Method Details

    • settings

      public static List<Setting<?>> settings()
    • defaultDataPartitioning

      public PlannerSettings defaultDataPartitioning(DataPartitioning defaultDataPartitioning)
    • defaultDataPartitioning

      public DataPartitioning defaultDataPartitioning()
    • valuesLoadingJumboSize

      public PlannerSettings valuesLoadingJumboSize(ByteSizeValue valuesLoadingJumboSize)
    • valuesLoadingJumboSize

      public ByteSizeValue valuesLoadingJumboSize()
    • luceneTopNLimit

      public PlannerSettings luceneTopNLimit(int luceneTopNLimit)
    • luceneTopNLimit

      public int luceneTopNLimit()
      Maximum LIMIT that we're willing to push to Lucene's topn.

      Lucene's topn code was designed for search, which typically fetches 10 or 30 or 50 or 100 or 1000 documents. That's as many as you want on a page, and that's what it's designed for. But if you go to, say, page 10, Lucene implements this as a search for page_size * page_number docs and then materializes only the last page_size documents. Traditionally, Elasticsearch limits that page_size * page_number, which it calls the "result window". So ESQL defaults to the same limit: 10,000.
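The result-window arithmetic described above can be sketched as follows. This is an illustration of the limit's rationale, not the planner's actual code:

```java
// Illustrative sketch of the "result window" math behind LUCENE_TOPN_LIMIT.
public class ResultWindowSketch {
    /** Documents Lucene's topn must collect to serve a page: page_size * page_number. */
    public static long docsToCollect(long pageSize, long pageNumber) {
        return pageSize * pageNumber;
    }

    /** Whether a paged fetch fits under the default 10,000 cap. */
    public static boolean withinDefaultWindow(long pageSize, long pageNumber) {
        return docsToCollect(pageSize, pageNumber) <= 10_000;
    }
}
```

Fetching page 10 at 100 docs per page collects 1,000 docs, well within the window; page 11 at 1,000 per page collects 11,000 and would exceed it.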

    • intermediateLocalRelationMaxSize

      public PlannerSettings intermediateLocalRelationMaxSize(ByteSizeValue intermediateLocalRelationMaxSize)
    • intermediateLocalRelationMaxSize

      public ByteSizeValue intermediateLocalRelationMaxSize()
    • partialEmitKeysThreshold

      public PlannerSettings partialEmitKeysThreshold(int partialEmitKeysThreshold)
    • partialEmitKeysThreshold

      public int partialEmitKeysThreshold()
    • partialEmitUniquenessThreshold

      public PlannerSettings partialEmitUniquenessThreshold(double partialEmitUniquenessThreshold)
    • partialEmitUniquenessThreshold

      public double partialEmitUniquenessThreshold()
    • reuseColumnLoadersThreshold

      public PlannerSettings reuseColumnLoadersThreshold(int reuseColumnLoadersThreshold)
    • reuseColumnLoadersThreshold

      public int reuseColumnLoadersThreshold()
      If we're loading more than this many fields at a time, we discard column loaders after each page regardless of whether we could reuse them. Column loaders have significant per-field memory overhead, so discarding them between pages allows some queries that would otherwise have OOMed to succeed. The paths that need very high performance usually don't load more than a handful of fields at a time, so they still get the reuse.
    • blockLoaderSizeOrdinals

      public PlannerSettings blockLoaderSizeOrdinals(ByteSizeValue blockLoaderSizeOrdinals)
    • blockLoaderSizeOrdinals

      public ByteSizeValue blockLoaderSizeOrdinals()
      Circuit breaker space reserved for each ordinals BlockLoader.Reader.
    • blockLoaderSizeScript

      public PlannerSettings blockLoaderSizeScript(ByteSizeValue blockLoaderSizeScript)
    • blockLoaderSizeScript

      public ByteSizeValue blockLoaderSizeScript()
      Circuit breaker space reserved for each script BlockLoader.Reader.
    • maxKeywordSortFields

      public PlannerSettings maxKeywordSortFields(int maxKeywordSortFields)
    • maxKeywordSortFields

      public int maxKeywordSortFields()
    • sourceReservationFactor

      public PlannerSettings sourceReservationFactor(double sourceReservationFactor)
    • sourceReservationFactor

      public double sourceReservationFactor()
    • bytesRefRamOverestimateThreshold

      public PlannerSettings bytesRefRamOverestimateThreshold(ByteSizeValue bytesRefRamOverestimateThreshold)
    • bytesRefRamOverestimateThreshold

      public ByteSizeValue bytesRefRamOverestimateThreshold()
    • bytesRefRamOverestimateFactor

      public PlannerSettings bytesRefRamOverestimateFactor(double bytesRefRamOverestimateFactor)
    • bytesRefRamOverestimateFactor

      public double bytesRefRamOverestimateFactor()
    • docSequenceBytesRefFieldThreshold

      public PlannerSettings docSequenceBytesRefFieldThreshold(int docSequenceBytesRefFieldThreshold)
    • docSequenceBytesRefFieldThreshold

      public int docSequenceBytesRefFieldThreshold()
    • docsThresholdForAutoPartitioning

      public PlannerSettings docsThresholdForAutoPartitioning(int docsThresholdForAutoPartitioning)
    • docsThresholdForAutoPartitioning

      public int docsThresholdForAutoPartitioning()