Class ApproximationPlan
java.lang.Object
org.elasticsearch.xpack.esql.approximation.ApproximationPlan
The approximation plan that is substituted during logical plan optimization in the rule SubstituteApproximationPlan.
See the Javadocs of Approximation for more details.
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ApproximationPlan.SampleProbabilityPlaceHolder | A placeholder expression in the main approximation plan that is replaced by the actual value after subplan execution.
Field Summary
Fields
Modifier and Type | Field | Description
static final int | BUCKET_COUNT | The number of buckets to use for computing confidence intervals.
static final String | BUCKET_ID_COLUMN_NAME | The column name for the bucket ID in the sampled aggregate.
static final String | CERTIFIED_COLUMN_PREFIX | Prefix for certified column names in the approximation output.
static final String | CONFIDENCE_INTERVAL_COLUMN_PREFIX | Prefix for confidence interval column names in the approximation output.
Constructor Summary
Constructors
ApproximationPlan()
Method Summary
Modifier and Type | Method | Description
 | columnMetadata(Attribute column) | Returns the _meta map for an approximation column, or null if the column name does not match an approximation pattern.
static LogicalPlan | get(LogicalPlan logicalPlan, ApproximationSettings settings) | Returns a plan that approximates the original plan and computes confidence intervals.
static boolean | is(LogicalPlan logicalPlan) | Returns whether the logical plan is an approximation plan.
static LogicalPlan | substituteSampleProbability(LogicalPlan logicalPlan, double sampleProbability) | Substitutes the ApproximationPlan.SampleProbabilityPlaceHolder in the approximation plan with the actual sample probability.
Field Details
BUCKET_ID_COLUMN_NAME
The column name for the bucket ID in the sampled aggregate. It is used to assign each sampled row to a bucket for computing confidence intervals.
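The bucket assignment can be illustrated with a small standalone sketch. It mirrors the EVAL bucketId = MV_APPEND(RANDOM(B), ..., RANDOM(B)) step of the rewritten plan described under get: each sampled row draws one bucket ID in [0, B) per trial. The class and method names, the example values of T and B, and the trial-major column ordering (trial t, bucket b contributes to column s$(t*B + b)) are assumptions for illustration; this page only fixes the first column (s$0, trial 0, bucket 0) and the last one (s$T*B-1, trial T-1, bucket B-1).

```java
import java.util.Random;

/**
 * Hypothetical sketch: gives a sampled row one random bucket ID per trial,
 * mirroring EVAL bucketId = MV_APPEND(RANDOM(B), ..., RANDOM(B)) // T times.
 */
public class BucketAssignment {

    /** Returns trialCount bucket IDs, each drawn uniformly from [0, bucketCount). */
    static int[] assignBuckets(Random random, int trialCount, int bucketCount) {
        int[] bucketIds = new int[trialCount];
        for (int t = 0; t < trialCount; t++) {
            bucketIds[t] = random.nextInt(bucketCount);
        }
        return bucketIds;
    }

    /**
     * Index of the extra aggregate column (s$i) that a row contributes to in a trial.
     * Trial-major ordering (t * B + b) is an assumption of this sketch.
     */
    static int columnIndex(int trial, int bucketId, int bucketCount) {
        return trial * bucketCount + bucketId;
    }

    public static void main(String[] args) {
        int trialCount = 3;  // T, hypothetical value
        int bucketCount = 4; // B, hypothetical value (the real BUCKET_COUNT value is not shown here)
        int[] ids = assignBuckets(new Random(42), trialCount, bucketCount);
        for (int t = 0; t < trialCount; t++) {
            System.out.println("trial " + t + " -> bucket " + ids[t]
                + " -> column s$" + columnIndex(t, ids[t], bucketCount));
        }
    }
}
```

Because every row gets exactly one bucket per trial, each trial partitions the sample into B disjoint buckets, which is what lets the per-bucket aggregates be compared with each other.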
CONFIDENCE_INTERVAL_COLUMN_PREFIX
Prefix for confidence interval column names in the approximation output.
CERTIFIED_COLUMN_PREFIX
Prefix for certified column names in the approximation output.
BUCKET_COUNT
public static final int BUCKET_COUNT
The number of buckets to use for computing confidence intervals.
Constructor Details
ApproximationPlan
public ApproximationPlan()
Method Details
columnMetadata
Returns the _meta map for an approximation column, or null if the column name does not match an approximation pattern.
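A minimal sketch of prefix-based column metadata lookup. It assumes hypothetical values for CONFIDENCE_INTERVAL_COLUMN_PREFIX and CERTIFIED_COLUMN_PREFIX (modeled on the column name `CONFIDENCE_INTERVAL(s)` from the get example; the actual values are not listed on this page) and a hypothetical map layout with "kind" and "column" keys. It takes a plain column name instead of an Attribute so that it stays self-contained.

```java
import java.util.Map;

/** Hypothetical sketch: maps approximation column names to a metadata map, or null. */
public class ColumnMetadataSketch {

    // Hypothetical prefix values; the real constant values are not listed on this page.
    static final String CONFIDENCE_INTERVAL_COLUMN_PREFIX = "CONFIDENCE_INTERVAL(";
    static final String CERTIFIED_COLUMN_PREFIX = "CERTIFIED(";

    /** Returns a metadata map for an approximation column, or null if no pattern matches. */
    static Map<String, Object> columnMetadata(String columnName) {
        String kind;
        String prefix;
        if (columnName.startsWith(CONFIDENCE_INTERVAL_COLUMN_PREFIX)) {
            kind = "confidence_interval";
            prefix = CONFIDENCE_INTERVAL_COLUMN_PREFIX;
        } else if (columnName.startsWith(CERTIFIED_COLUMN_PREFIX)) {
            kind = "certified";
            prefix = CERTIFIED_COLUMN_PREFIX;
        } else {
            return null; // not an approximation column
        }
        if (!columnName.endsWith(")")) {
            return null; // a prefix without the closing parenthesis does not match
        }
        // The approximated column is the text between the prefix and the closing parenthesis.
        String column = columnName.substring(prefix.length(), columnName.length() - 1);
        // Hypothetical map layout; the real _meta contents are not described on this page.
        return Map.of("kind", kind, "column", column);
    }

    public static void main(String[] args) {
        System.out.println(columnMetadata("CONFIDENCE_INTERVAL(s)"));
        System.out.println(columnMetadata("s"));
    }
}
```

Returning null for unmatched names lets a caller attach metadata only to the extra columns that the approximation adds, leaving regular output columns untouched.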
is
public static boolean is(LogicalPlan logicalPlan)
Returns whether the logical plan is an approximation plan.
get
public static LogicalPlan get(LogicalPlan logicalPlan, ApproximationSettings settings)
Returns a plan that approximates the original plan and computes confidence intervals. This approximation query consists of the following:
- Source command
- SAMPLE with an ApproximationPlan.SampleProbabilityPlaceHolder for the sample probability
- All commands before the STATS command
- EVAL adding a new column with random bucket IDs for each trial
- STATS command with:
  - COUNT to track the sample size
  - Each aggregate function replaced by a sample-corrected version (if needed)
  - TRIAL_COUNT * BUCKET_COUNT additional columns with sampled values for each aggregate function, sample-corrected (if needed)
- FILTER to remove all rows whose sample size is too small
- All commands after the STATS command, modified to also process the additional bucket columns where possible
- EVAL to compute confidence intervals for all fields with buckets
- PROJECT to drop all non-output columns

For example, the query

    FROM index | EVAL x = 2*x | STATS s = SUM(x) BY group | EVAL t = s*s

is rewritten to (prob = sampleProbability, T = trialCount, B = bucketCount):

    FROM index
    | EVAL x = 2*x
    | EVAL bucketId = MV_APPEND(RANDOM(B), ... , RANDOM(B))  // T times
    | SAMPLED_STATS[SampleProbabilityPlaceHolder]
        sampleSize = COUNT(*),
        s = SUM(x),
        `s$0` = SUM(x) WHERE MV_SLICE(bucketId, 0, 0) == 0,
        ...,
        `s$T*B-1` = SUM(x) WHERE MV_SLICE(bucketId, T-1, T-1) == B-1
        BY group
    | WHERE sampleSize >= MIN_ROW_COUNT_FOR_RESULT_INCLUSION / prob
    | EVAL t = s*s, `t$0` = `s$0`*`s$0`, ..., `t$T*B-1` = `s$T*B-1`*`s$T*B-1`
    | EVAL `CONFIDENCE_INTERVAL(s)` = CONFIDENCE_INTERVAL(s, MV_APPEND(`s$0`, ... `s$T*B-1`), T, B, 0.90),
           `CONFIDENCE_INTERVAL(t)` = CONFIDENCE_INTERVAL(t, MV_APPEND(`t$0`, ... `t$T*B-1`), T, B, 0.90)
    | KEEP s, t, `CONFIDENCE_INTERVAL(s)`, `CONFIDENCE_INTERVAL(t)`

During execution, SAMPLED_STATS is replaced on the data node either by sampling the source rows followed by a normal STATS (with sample corrections applied to the intermediate state), or by a push-down to Lucene without any sampling (if possible).
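The following sketch simulates the example query for a single group in plain Java, to show how T*B bucket columns can yield a confidence interval. It is not the Elasticsearch implementation: the B/p scaling of each bucket, the trial-major column order, and the interval itself are assumptions. For the interval it swaps in the classic random-groups variance estimator (per trial, the variance of the full estimate is taken as the sample variance of the B bucket estimates divided by B), averages it over the trials, and applies a normal quantile of 1.645 for 90%; the real CONFIDENCE_INTERVAL function may work differently. Class and method names are hypothetical.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/**
 * Hypothetical simulation of STATS s = SUM(x) | EVAL t = s*s for one group,
 * with T*B bucket columns and a 90% interval. Not the Elasticsearch code.
 */
public class BucketedSumSketch {

    /** The overall estimate plus its T*B bucket estimates (s$0 ... s$T*B-1). */
    record Estimates(double value, double[] buckets) {}

    /** Samples each row with probability p; returns the corrected SUM and corrected bucket SUMs. */
    static Estimates sampledSum(double[] xs, double p, int trialCount, int bucketCount, Random random) {
        double sum = 0;
        double[] bucketSums = new double[trialCount * bucketCount];
        for (double x : xs) {
            if (random.nextDouble() >= p) {
                continue; // row is not part of the sample
            }
            sum += x;
            for (int t = 0; t < trialCount; t++) {
                // One random bucket ID per trial; trial-major column order is an assumption.
                bucketSums[t * bucketCount + random.nextInt(bucketCount)] += x;
            }
        }
        // Assumption: each bucket holds about 1/B of the sample, so it is scaled by B/p.
        for (int i = 0; i < bucketSums.length; i++) {
            bucketSums[i] *= bucketCount / p;
        }
        return new Estimates(sum / p, bucketSums);
    }

    /** Applies a post-STATS expression to the estimate and to every bucket column. */
    static Estimates map(Estimates e, DoubleUnaryOperator f) {
        double[] mapped = new double[e.buckets().length];
        for (int i = 0; i < mapped.length; i++) {
            mapped[i] = f.applyAsDouble(e.buckets()[i]);
        }
        return new Estimates(f.applyAsDouble(e.value()), mapped);
    }

    /** Random-groups 90% interval around the estimate; requires bucketCount >= 2. */
    static double[] confidenceInterval(Estimates e, int trialCount, int bucketCount) {
        double variance = 0;
        for (int t = 0; t < trialCount; t++) {
            double mean = 0;
            for (int b = 0; b < bucketCount; b++) {
                mean += e.buckets()[t * bucketCount + b];
            }
            mean /= bucketCount;
            double squares = 0;
            for (int b = 0; b < bucketCount; b++) {
                double d = e.buckets()[t * bucketCount + b] - mean;
                squares += d * d;
            }
            // Sample variance of the B bucket estimates, divided by B.
            variance += squares / (bucketCount * (bucketCount - 1.0));
        }
        variance /= trialCount; // average the per-trial variance estimates
        double halfWidth = 1.645 * Math.sqrt(variance); // normal quantile for 90%
        return new double[] {e.value() - halfWidth, e.value() + halfWidth};
    }

    public static void main(String[] args) {
        double[] xs = new double[10_000];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = 2.0 * (i % 10); // EVAL x = 2*x on hypothetical inputs; exact SUM is 90000
        }
        int trialCount = 3;   // T, hypothetical
        int bucketCount = 16; // B, hypothetical
        Estimates s = sampledSum(xs, 0.1, trialCount, bucketCount, new Random(7));
        Estimates t = map(s, v -> v * v); // EVAL t = s*s, t$i = s$i*s$i
        double[] ciS = confidenceInterval(s, trialCount, bucketCount);
        double[] ciT = confidenceInterval(t, trialCount, bucketCount);
        System.out.printf("s = %.1f, 90%% CI [%.1f, %.1f]%n", s.value(), ciS[0], ciS[1]);
        System.out.printf("t = %.1f, 90%% CI [%.1f, %.1f]%n", t.value(), ciT[0], ciT[1]);
    }
}
```

Because every sampled row falls into exactly one bucket per trial, the mean of the scaled bucket estimates of each trial equals the overall estimate sum/p in this sketch, which is what makes the random-groups estimator applicable. Applying the post-STATS expression through map mirrors how EVAL t = s*s is extended to the t$i columns.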
substituteSampleProbability
public static LogicalPlan substituteSampleProbability(LogicalPlan logicalPlan, double sampleProbability)
Substitutes the ApproximationPlan.SampleProbabilityPlaceHolder in the approximation plan with the actual sample probability. If the sample probability is 1.0, the SampledAggregate is also replaced by a regular Aggregate.
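A minimal sketch of this substitution over a toy plan tree. The node records are illustrative stand-ins, not the Elasticsearch LogicalPlan, SampledAggregate or Aggregate classes, and the recursive walk is a hypothetical simplification of whatever tree transformation the real method uses. It shows the two behaviors stated above: the placeholder becomes a literal, and a probability of 1.0 turns the sampled aggregate into a regular one.

```java
/**
 * Hypothetical sketch of placeholder substitution over a toy plan tree;
 * the node types are stand-ins, not the Elasticsearch classes.
 */
public class PlaceholderSketch {

    interface Node {}

    record Placeholder() implements Node {}
    record Literal(double value) implements Node {}
    record Source(String index) implements Node {}
    record SampledAggregate(Node child, Node sampleProbability) implements Node {}
    record Aggregate(Node child) implements Node {}

    /** Replaces every Placeholder with the probability; drops sampling when it is 1.0. */
    static Node substitute(Node node, double p) {
        if (node instanceof Placeholder) {
            return new Literal(p);
        } else if (node instanceof SampledAggregate sa) {
            Node child = substitute(sa.child(), p);
            // With probability 1.0 every row is used, so a regular aggregate suffices.
            return p == 1.0 ? new Aggregate(child) : new SampledAggregate(child, substitute(sa.sampleProbability(), p));
        } else if (node instanceof Aggregate a) {
            return new Aggregate(substitute(a.child(), p));
        }
        return node; // leaves such as Source and Literal are unchanged
    }

    public static void main(String[] args) {
        Node plan = new SampledAggregate(new Source("index"), new Placeholder());
        System.out.println(substitute(plan, 0.01));
        System.out.println(substitute(plan, 1.0));
    }
}
```

Keeping the probability as a placeholder until the subplan has run lets the main plan be built once and completed later with a value that is only known at execution time.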