Class Approximation
A query is currently suitable for approximation if:
- it contains exactly one STATS command
- the other processing commands are from the supported set (SUPPORTED_COMMANDS); this set contains almost all unary commands, but most notably not FORK or JOIN
- the aggregate functions are from the supported set (SUPPORTED_SINGLE_VALUED_AGGS and SUPPORTED_MULTIVALUED_AGGS)
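The three conditions can be read as a simple predicate. The following is a minimal sketch only: commands and aggregate functions are modeled as plain strings, and the contents of the two sets are placeholders chosen for illustration, not the actual values of SUPPORTED_COMMANDS, SUPPORTED_SINGLE_VALUED_AGGS or SUPPORTED_MULTIVALUED_AGGS. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Illustrative sketch; not the real Approximation verification logic. */
final class ApproximationSuitabilitySketch {
    // Placeholder contents: the real supported sets are not listed in this page.
    static final Set<String> SUPPORTED_COMMANDS = Set.of("WHERE", "EVAL", "SORT");
    static final Set<String> SUPPORTED_AGGS = Set.of("COUNT", "SUM", "AVG");

    static boolean isSuitable(List<String> commands, List<String> aggregates) {
        // Condition 1: exactly one STATS command.
        long statsCount = commands.stream().filter("STATS"::equals).count();
        if (statsCount != 1) {
            return false;
        }
        // Condition 2: every other command is in the supported set.
        boolean othersSupported = commands.stream()
            .filter(command -> !command.equals("STATS"))
            .allMatch(SUPPORTED_COMMANDS::contains);
        if (!othersSupported) {
            return false;
        }
        // Condition 3: every aggregate function is in the supported set.
        return aggregates.stream().allMatch(SUPPORTED_AGGS::contains);
    }

    public static void main(String[] args) {
        System.out.println(isSuitable(List.of("WHERE", "STATS"), List.of("COUNT")));
        System.out.println(isSuitable(List.of("FORK", "STATS"), List.of("COUNT")));
    }
}
```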
When these conditions are met, the query is replaced by an approximation query
that samples documents before the STATS, and extrapolates the aggregate
functions if needed. This new logical plan is generated by
ApproximationPlan.get(org.elasticsearch.xpack.esql.plan.logical.LogicalPlan, org.elasticsearch.xpack.esql.approximation.ApproximationSettings). The substitution of the original query by the
approximation query happens during logical plan optimization, in the rule
SubstituteApproximationPlan.
In addition to approximate results, confidence intervals are also computed.
This is done by dividing the sampled rows ApproximationPlan.TRIAL_COUNT
times into ApproximationPlan.BUCKET_COUNT buckets, computing the aggregate
functions for each bucket, and using these sampled values to compute
confidence intervals with the bias-corrected and accelerated (BCa) bootstrap
method; see also ConfidenceInterval.
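The bucketing step can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: it assumes rows are assigned to buckets uniformly at random, uses the mean as the aggregate function, and takes the trial and bucket counts as parameters rather than reading ApproximationPlan.TRIAL_COUNT and ApproximationPlan.BUCKET_COUNT. The BCa interval computation itself is left out; the resulting per-bucket values are what such a method would consume.

```java
import java.util.Random;

/** Illustrative sketch of per-bucket aggregates; names are hypothetical. */
final class BucketedEstimatesSketch {
    /**
     * Splits the sampled values into bucketCount buckets, trialCount times,
     * and returns the mean of each bucket (NaN for an empty bucket).
     */
    static double[][] bucketMeans(double[] sampledValues, int trialCount, int bucketCount, long seed) {
        Random random = new Random(seed);
        double[][] means = new double[trialCount][bucketCount];
        for (int trial = 0; trial < trialCount; trial++) {
            double[] sums = new double[bucketCount];
            int[] counts = new int[bucketCount];
            // Assumption: each row lands in a uniformly random bucket.
            for (double value : sampledValues) {
                int bucket = random.nextInt(bucketCount);
                sums[bucket] += value;
                counts[bucket]++;
            }
            for (int bucket = 0; bucket < bucketCount; bucket++) {
                means[trial][bucket] = counts[bucket] == 0 ? Double.NaN : sums[bucket] / counts[bucket];
            }
        }
        return means;
    }

    public static void main(String[] args) {
        double[] values = new double[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 10;
        }
        double[][] means = bucketMeans(values, 2, 4, 42L);
        System.out.println(means.length + " trials x " + means[0].length + " buckets");
    }
}
```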
The initial approximation plan contains a placeholder for the sample probability,
which is determined during subplan execution and is based on the result set size.
To obtain an appropriate sample probability, first a target number of rows
is set. This is determined via the ApproximationSettings.
Next, the total number of rows in the source index is counted via the subplan
sourceCountSubPlan(). This plan always executes quickly. When
there are no commands that can change the number of rows, the sample
probability can be directly computed as the ratio of the target number of rows
to this total number.
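For the case without row-changing commands, the computation reduces to a single division. A minimal sketch, assuming the probability is capped at 1.0 when the source holds fewer rows than the target (the cap and the handling of an empty source are assumptions, and the names are hypothetical):

```java
/** Illustrative sketch of the direct sample probability computation. */
final class SampleProbabilitySketch {
    static double sampleProbability(long targetRowCount, long totalRowCount) {
        if (totalRowCount <= 0) {
            // Assumption: with an empty source there is nothing to sample away.
            return 1.0;
        }
        // Ratio of target rows to total rows, capped at 1.0 (assumption).
        return Math.min(1.0, (double) targetRowCount / totalRowCount);
    }

    public static void main(String[] args) {
        System.out.println(sampleProbability(1_000, 1_000_000));
        System.out.println(sampleProbability(1_000, 500));
    }
}
```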
In the presence of commands that can change the number of rows (e.g. filtering),
another step is needed. The first goal is to find a sample probability that
leads to approximately ROW_COUNT_FOR_COUNT_ESTIMATION rows,
and when this probability is found, a sample probability leading to the target
number of rows is computed.
This is done by setting the initial sample probability to the ratio of
ROW_COUNT_FOR_COUNT_ESTIMATION to the total number
of rows in the source index, and counting the rows that are sampled at this
probability with the subplan countSubPlan(double). As long as the sampled
number of rows is too small, the probability is increased until a good
probability is reached. This final probability is used to compute the
probability used for approximating the original query.
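The search can be sketched as a loop. Several details here are assumptions that this page does not specify: the growth factor, the threshold for "too small" (the hypothetical parameter minRows), and the final scaling step, which extrapolates the row count reaching the STATS from the last sample. The counting subplan is modeled as a plain function from probability to sampled row count.

```java
import java.util.function.DoubleToLongFunction;

/** Illustrative sketch of the probability search; not the real algorithm. */
final class ProbabilitySearchSketch {
    // Assumption: the probability grows by a fixed factor per round.
    static final double GROWTH_FACTOR = 10.0;

    static double findSampleProbability(
        long totalRows,
        long rowsForEstimation,
        long minRows,
        long targetRows,
        DoubleToLongFunction countAtProbability
    ) {
        if (totalRows <= 0) {
            return 1.0;
        }
        // Initial probability: estimation row count divided by total source rows.
        double probability = Math.min(1.0, (double) rowsForEstimation / totalRows);
        long sampled = countAtProbability.applyAsLong(probability);
        // Increase the probability while too few rows survive the other commands.
        while (sampled < minRows && probability < 1.0) {
            probability = Math.min(1.0, probability * GROWTH_FACTOR);
            sampled = countAtProbability.applyAsLong(probability);
        }
        if (sampled == 0) {
            // Nothing survives even a full scan, so sampling cannot help.
            return 1.0;
        }
        // Extrapolate the unsampled row count, then scale to the target.
        double estimatedRows = sampled / probability;
        return Math.min(1.0, targetRows / estimatedRows);
    }

    public static void main(String[] args) {
        // Hypothetical source of 1,000,000 rows where a filter keeps 1% of them.
        DoubleToLongFunction count = p -> Math.round(1_000_000 * p * 0.01);
        System.out.println(findSampleProbability(1_000_000, 1_000, 100, 5_000, count));
    }
}
```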
-
Nested Class Summary
Nested Classes -
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static Approximation | create(LogicalPlan logicalPlan, ApproximationSettings approximationSettings) | Creates an Approximation object for a logical plan if it's an approximation plan, and returns null otherwise. |
| | firstSubPlan | Returns the first subplan to execute for approximation, or null if the main plan can be executed directly. |
| | newMainPlan(Result result) | Returns the new main plan to execute for approximation after executing a subplan, based on the result of the subplan. |
| | verifyPlan(LogicalPlan logicalPlan) | Verifies that a plan is suitable for approximation. |
-
Method Details
-
create
public static Approximation create(LogicalPlan logicalPlan, ApproximationSettings approximationSettings)
Creates an Approximation object for a logical plan if it's an approximation plan, and returns null otherwise.
-
verifyPlan
Verifies that a plan is suitable for approximation.
- Returns:
- the query properties relevant for approximation if it's suitable, or null otherwise. Adds warning headers as a side effect when the plan is not suitable.
-
firstSubPlan
Returns the first subplan to execute for approximation, or null if the main plan can be executed directly. -
newMainPlan
Returns the new main plan to execute for approximation after executing a subplan, based on the result of the subplan.
-