- All Superinterfaces:
org.apache.lucene.util.Accountable, AutoCloseable, BlockLoader.Block, Closeable, org.elasticsearch.core.RefCounted, org.elasticsearch.core.Releasable, Writeable
- All Known Subinterfaces:
AggregateMetricDoubleBlock, BooleanBlock, BytesRefBlock, DoubleBlock, ExponentialHistogramBlock, FloatBlock, HistogramBlock, IntBlock, LongBlock, LongRangeBlock, TDigestBlock
- All Known Implementing Classes:
AbstractArrowBufBlock, AbstractDelegatingCompoundBlock, AggregateMetricDoubleArrayBlock, BooleanArrayBlock, BooleanArrowBufBlock, BooleanBigArrayBlock, BooleanVectorBlock, BytesRefArrayBlock, BytesRefArrowBufBlock, BytesRefVectorBlock, CompositeBlock, ConstantNullBlock, DocBlock, DoubleArrayBlock, DoubleArrowBufBlock, DoubleBigArrayBlock, DoubleVectorBlock, Float16ArrowBufBlock, FloatArrayBlock, FloatArrowBufBlock, FloatBigArrayBlock, FloatVectorBlock, Int16ArrowBufBlock, Int8ArrowBufBlock, IntArrayBlock, IntArrowBufBlock, IntBigArrayBlock, IntVectorBlock, LongArrayBlock, LongArrowBufBlock, LongBigArrayBlock, LongMul1kArrowBufBlock, LongRangeArrayBlock, LongVectorBlock, OrdinalBytesRefBlock, TDigestArrayBlock, UInt16ArrowBufBlock, UInt32ArrowBufBlock, UInt8ArrowBufBlock
A Block has a position (row) count and various data retrieval methods for
accessing the underlying data that is stored at a given position (IntBlock.getInt(int),
LongBlock.getLong(int), BytesRefBlock.getBytesRef(int, org.apache.lucene.util.BytesRef)).
Reading
The usual way to read a block looks like:
for (int p = 0; p < block.getPositionCount(); p++) {
    int count = block.getValueCount(p);
    switch (count) {
        case 0 -> {
            // Do stuff for nulls
        }
        case 1 -> {
            // Do stuff with single valued data
            int v = block.getInt(block.getFirstValueIndex(p));
            ...
        }
        default -> {
            // Do stuff with multi-valued data
            int first = block.getFirstValueIndex(p);
            int end = first + count;
            for (int i = first; i < end; i++) {
                int v = block.getInt(i);
            }
        }
    }
}
But that's a ton of work! It's quite common that the Block itself represents
dense data. In that case, asVector() returns a non-null Vector, which
is much faster and easier to iterate. So generally you'll see code like:
IntVector vector = block.asVector();
if (vector == null) {
    // iterate the Block as above
} else {
    // iterate the Vector as documented in Vector
}
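The two paths above can be sketched without any Elasticsearch classes. This is only an illustration: ToyIntBlock and its offsets array are hypothetical stand-ins, not the real IntBlock or IntVector, and the assumption that a dense block is one with no offsets array is made purely for this example.

```java
public class DenseFastPathSketch {
    /** Hypothetical stand-in for an int block; not the Elasticsearch class. */
    public static final class ToyIntBlock {
        final int[] values;
        final int[] firstValueIndexes; // null means dense: exactly one value per position

        public ToyIntBlock(int[] values, int[] firstValueIndexes) {
            this.values = values;
            this.firstValueIndexes = firstValueIndexes;
        }

        int getPositionCount() {
            return firstValueIndexes == null ? values.length : firstValueIndexes.length - 1;
        }

        int getFirstValueIndex(int p) {
            return firstValueIndexes == null ? p : firstValueIndexes[p];
        }

        int getValueCount(int p) {
            return firstValueIndexes == null ? 1 : firstValueIndexes[p + 1] - firstValueIndexes[p];
        }

        /** Returns the dense view, or null when the block has nulls or multivalued positions. */
        int[] asVector() {
            return firstValueIndexes == null ? values : null;
        }
    }

    /** Sums every value, taking the dense path when it is available. */
    public static long sum(ToyIntBlock block) {
        long total = 0;
        int[] vector = block.asVector();
        if (vector != null) {
            // Dense path: one value per position, no offsets to consult.
            for (int v : vector) {
                total += v;
            }
            return total;
        }
        // General path: walk each position's range of values.
        for (int p = 0; p < block.getPositionCount(); p++) {
            int first = block.getFirstValueIndex(p);
            int end = first + block.getValueCount(p);
            for (int i = first; i < end; i++) {
                total += block.values[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ToyIntBlock dense = new ToyIntBlock(new int[] {1, 2, 3}, null);
        // Three positions: [1, 2], null, 3
        ToyIntBlock sparse = new ToyIntBlock(new int[] {1, 2, 3}, new int[] {0, 2, 2, 3});
        System.out.println(sum(dense));
        System.out.println(sum(sparse));
    }
}
```

Both calls visit the same three values; only the sparse block needs the per-position bookkeeping.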
Reference counted
Blocks are reference counted. The JVM itself manages the pointers and garbage
collects a Block once nothing points to it, but we also maintain a reference count so we can
decrement a CircuitBreaker when we no longer reference the Block.
When you build a Block its reference counter is set to 1. If you want
to return a few copies of the Block in the same Page you should
RefCounted.incRef() it.
When a Block is unused its refs must be decremented to 0. You do that
with RefCounted.decRef() or Releasable.close(). Those two methods do the same thing, but folks
generally use Block in try-with-resources like:
try (
    IntBlock lhs = lhsEval.eval(page);
    IntBlock rhs = rhsEval.eval(page);
    IntBlock.Builder builder = blockFactory.newIntBlockBuilder(lhs.getPositionCount());
) {
    for (int p = 0; p < lhs.getPositionCount(); p++) {
        // do stuff
    }
    return builder.build();
}
lhsEval.eval(page) returns a Block. If it builds the Block
on the fly it'll have a reference count of 1 and the Releasable.close() called by
the try-with-resources will discard it. If it is read from the Page then the
read process will RefCounted.incRef() and the Releasable.close() will just decrement the counter.
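The counting described above can be modeled in a few lines. This is a sketch under stated assumptions: ToyBlock and ToyBreaker are hypothetical classes invented for the example, and the real RefCounted and CircuitBreaker types do considerably more (thread safety, limits, error reporting).

```java
public class RefCountSketch {
    /** Hypothetical stand-in for a circuit breaker: it only tracks reserved bytes. */
    public static final class ToyBreaker {
        long usedBytes;
    }

    /** Hypothetical reference-counted resource; it models the counting only. */
    public static final class ToyBlock implements AutoCloseable {
        private final ToyBreaker breaker;
        private final long bytes;
        private int refs = 1; // a freshly built block starts with one reference

        public ToyBlock(ToyBreaker breaker, long bytes) {
            this.breaker = breaker;
            this.bytes = bytes;
            breaker.usedBytes += bytes; // reserve memory when the block is built
        }

        public void incRef() {
            if (refs == 0) {
                throw new IllegalStateException("already released");
            }
            refs++;
        }

        public void decRef() {
            if (refs == 0) {
                throw new IllegalStateException("already released");
            }
            if (--refs == 0) {
                breaker.usedBytes -= bytes; // last reference gone: give the memory back
            }
        }

        @Override
        public void close() {
            decRef();
        }

        public boolean isReleased() {
            return refs == 0;
        }
    }

    /** Builds a block, optionally takes an extra reference, closes it once, and reports tracked bytes. */
    public static long usedAfterOneClose(boolean extraReference) {
        ToyBreaker breaker = new ToyBreaker();
        ToyBlock block = new ToyBlock(breaker, 64);
        if (extraReference) {
            block.incRef(); // models a block that is still held elsewhere, such as in a Page
        }
        try (ToyBlock b = block) {
            // use b
        }
        return breaker.usedBytes;
    }

    public static void main(String[] args) {
        System.out.println(usedAfterOneClose(false));
        System.out.println(usedAfterOneClose(true));
    }
}
```

With no extra reference the single close() releases the tracked bytes. With one extra reference the close() only decrements the counter and the bytes stay reserved until the other holder releases them.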
Thread safety
Blocks are immutable.
Blocks can be passed between threads as long as no two threads hold a reference
to the Block at the same time. That's important because Driver can
shift from thread to thread while it is running.
To pass a Block to another Driver, you must first call
allowPassingToDifferentDriver().
-
Nested Class Summary
Nested Classes
Modifier and Type, Interface, Description
- static interface Block.Builder: Builds Blocks.
- static enum Block.MvOrdering: How are multivalued fields ordered? Some operators can enable their optimizations when mv_values are sorted ascending or de-duplicated.
Nested classes/interfaces inherited from interface org.elasticsearch.common.io.stream.Writeable
Writeable.Reader<V>, Writeable.Writer<V>
-
Field Summary
Fields
Modifier and Type, Field, Description
- static final TransportVersion ESQL_AGGREGATE_METRIC_DOUBLE_BLOCK
- static final long MAX_LOOKUP: The maximum number of values that can be added to one position via lookup.
- static final int PAGE_MEM_OVERHEAD_PER_BLOCK: We do not track memory for pages directly (only for single blocks), but the page memory overhead can still be significant, especially for pages containing thousands of blocks.
- static final byte SERIALIZE_BLOCK_ARRAY
- static final byte SERIALIZE_BLOCK_BIG_ARRAY
- static final byte SERIALIZE_BLOCK_ORDINAL
- static final byte SERIALIZE_BLOCK_VALUES: Serialization type for blocks: 0 and 1 replace false/true used in pre-8.14
- static final byte SERIALIZE_BLOCK_VECTOR
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
Fields inherited from interface org.elasticsearch.core.RefCounted
ALWAYS_REFERENCED
-
Method Summary
Modifier and Type, Method, Description
- void allowPassingToDifferentDriver(): Before passing a Block to another Driver, it is necessary to switch the owning block factory to its parent, which is associated with the global circuit breaker.
- boolean areAllValuesNull()
- Vector asVector(): Returns an efficient dense single-value view of this block.
- BlockFactory blockFactory(): The block factory associated with this block.
- Block deepCopy(BlockFactory blockFactory): Make a deep copy of this Block using the provided BlockFactory, likely copying all data.
- boolean doesHaveMultivaluedFields(): Does this block have multivalued fields? Unlike mayHaveMultivaluedFields() this will never return a false positive.
- ElementType elementType(): Returns the element type of this block.
- Block expand(): Expand multivalued fields into one row per value.
- Block filter(boolean mayContainDuplicates, int... positions): Creates a new block that only exposes the positions provided.
- int getFirstValueIndex(int position): Returns the index of the first value for the given position.
- int getPositionCount(): Returns the number of positions (rows) in this block.
- int getTotalValueCount(): Returns the total number of values in this block not counting nulls.
- int getValueCount(int position): Returns the number of values for the given position.
- default Block insertNulls(IntVector before)
- boolean isNull(int position)
- boolean isReleased(): Tells if this block has been released.
- Block keepMask(BooleanVector mask)
- org.elasticsearch.core.ReleasableIterator<? extends Block> lookup(IntBlock positions, ByteSizeValue targetBlockSize): Builds an Iterator of new Blocks with the same elementType() as this Block whose values are copied from positions in this Block.
- boolean mayHaveMultivaluedFields(): Can this block have multivalued fields? Blocks that return false will never return more than one from getValueCount(int).
- boolean mayHaveNulls()
- default boolean mvDeduplicated(): Are multivalued fields de-duplicated in each position
- Block.MvOrdering mvOrdering(): How are multivalued fields ordered?
- default boolean mvSortedAscending(): Are multivalued fields sorted ascending in each position
- static Block readTypedBlock(BlockStreamInput): Reads the block type and then the block data from a stream input. This should be paired with writeTypedBlock(Block, StreamOutput).
- Block slice(int beginInclusive, int endExclusive)
- static boolean supportsAggregateMetricDoubleBlock
- void writeTo(StreamOutput out): Writes only the data of the block to a stream output.
- static void writeTypedBlock(Block block, StreamOutput out): Writes the type of the block followed by the block data to a stream output.
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources, ramBytesUsed
Methods inherited from interface org.elasticsearch.core.RefCounted
decRef, hasReferences, incRef, mustIncRef, tryIncRef
Methods inherited from interface org.elasticsearch.core.Releasable
close
-
Field Details
-
ESQL_AGGREGATE_METRIC_DOUBLE_BLOCK
-
MAX_LOOKUP
static final long MAX_LOOKUP
The maximum number of values that can be added to one position via lookup. TODO maybe make this everywhere?
-
PAGE_MEM_OVERHEAD_PER_BLOCK
static final int PAGE_MEM_OVERHEAD_PER_BLOCK
We do not track memory for pages directly (only for single blocks), but the page memory overhead can still be significant, especially for pages containing thousands of blocks. For now, we approximate this overhead, per block, using this value. The exact overhead per block would be (more correctly) RamUsageEstimator.NUM_BYTES_OBJECT_REF, but we approximate it with RamUsageEstimator.NUM_BYTES_OBJECT_ALIGNMENT to avoid further alignments to object size (at the end of the alignment, it would make no practical difference). We uplift it * 4 based on experiments with many small pages.
-
SERIALIZE_BLOCK_VALUES
static final byte SERIALIZE_BLOCK_VALUES
Serialization type for blocks: 0 and 1 replace false/true used in pre-8.14
-
SERIALIZE_BLOCK_VECTOR
static final byte SERIALIZE_BLOCK_VECTOR
-
SERIALIZE_BLOCK_ARRAY
static final byte SERIALIZE_BLOCK_ARRAY
-
SERIALIZE_BLOCK_BIG_ARRAY
static final byte SERIALIZE_BLOCK_BIG_ARRAY
-
SERIALIZE_BLOCK_ORDINAL
static final byte SERIALIZE_BLOCK_ORDINAL
-
-
Method Details
-
asVector
Vector asVector()
Returns an efficient dense single-value view of this block, or null if the block is not dense single-valued. That is, it returns null if mayHaveNulls returns true or getTotalValueCount is not equal to getPositionCount.
Returns: an efficient dense single-value view of this block
-
getPositionCount
int getPositionCount()
Returns the number of positions (rows) in this block. See class javadoc for the usual way to iterate these positions.
Returns: the number of positions (rows) in this block
-
getFirstValueIndex
int getFirstValueIndex(int position)
Returns the index of the first value for the given position. See class javadoc for the usual way to iterate these positions.
For densely packed data this will return its parameter unchanged. For fields with null values or multivalued fields, this will shift. Here's an example:
    0 <---+
    1     | Values at first position
    2     |
    3 <---+
    5 <---- Value at second position
    6 <---+ Values at third position
    7 <---+
This represents three rows. The first has the value [0, 1, 2, 3]. The second has the value 5. The third has the value [6, 7]. This method will return 0 for the first position, 4 for the second, and 5 for the third.
Returns: the index of the first value for the given position
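The numbers in the example above fall out of a simple offsets array. The sketch below is an assumption about one possible layout, not a statement of how any particular Block implementation stores its data: FIRST_VALUE_INDEXES is a hypothetical array holding the start of each position plus one trailing entry marking the end of the values.

```java
import java.util.Arrays;

public class FirstValueIndexSketch {
    // The seven values from the example, stored back to back.
    static final int[] VALUES = {0, 1, 2, 3, 5, 6, 7};

    // Hypothetical offsets: entry p is where position p starts; the last entry is the end.
    static final int[] FIRST_VALUE_INDEXES = {0, 4, 5, 7};

    public static int getFirstValueIndex(int position) {
        return FIRST_VALUE_INDEXES[position];
    }

    public static int getValueCount(int position) {
        return FIRST_VALUE_INDEXES[position + 1] - FIRST_VALUE_INDEXES[position];
    }

    /** Copies out the values stored at one position. */
    public static int[] valuesAt(int position) {
        int first = getFirstValueIndex(position);
        return Arrays.copyOfRange(VALUES, first, first + getValueCount(position));
    }

    public static void main(String[] args) {
        for (int p = 0; p < FIRST_VALUE_INDEXES.length - 1; p++) {
            System.out.println(
                "position " + p
                    + " first=" + getFirstValueIndex(p)
                    + " count=" + getValueCount(p)
                    + " values=" + Arrays.toString(valuesAt(p))
            );
        }
    }
}
```

The same array answers getValueCount(int) too, since a position's count is the distance to the next position's first index.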
-
getValueCount
int getValueCount(int position)
Returns the number of values for the given position. See class javadoc for the usual way to iterate these positions.
For densely packed data this will return 1. For nulls this will return 0. For multivalued fields, this will return the number of values. Here's an example:
    0 <---+
    1     | Values at first position
    2     |
    3 <---+
    5 <---- Value at second position
    6 <---+ Values at third position
    7 <---+
This represents three rows. The first has the value [0, 1, 2, 3]. The second has the value 5. The third has the value [6, 7]. This method will return 4 for the first position, 1 for the second, and 2 for the third.
Returns: the number of values for the given position
-
getTotalValueCount
int getTotalValueCount()
Returns the total number of values in this block not counting nulls. This powers the COUNT aggregation and is used to report the number of fields loaded by ESQL.
Returns: the total number of values in this block not counting nulls
-
elementType
ElementType elementType()
Returns the element type of this block.
Returns: the element type of this block
-
blockFactory
BlockFactory blockFactory()
The block factory associated with this block.
-
allowPassingToDifferentDriver
void allowPassingToDifferentDriver()
Before passing a Block to another Driver, it is necessary to switch the owning block factory to its parent, which is associated with the global circuit breaker. This ensures that when the new driver releases this Block, it returns memory directly to the parent block factory instead of the local block factory of this Block. This is important because the local block factory is not thread safe and doesn't support simultaneous access by more than one thread.
-
isReleased
boolean isReleased()
Tells if this block has been released. A block is released by calling its Releasable.close() or RefCounted.decRef() methods.
Returns: true iff the block's reference count is zero.
-
isNull
boolean isNull(int position)
Parameters: position - the position
Returns: true if the value stored at the given position is null, false otherwise
-
mayHaveNulls
boolean mayHaveNulls()
Returns: true if some values might be null. False, if all values are guaranteed to be not null.
-
areAllValuesNull
boolean areAllValuesNull()
Returns: true if all values in this block are guaranteed to be null.
-
mayHaveMultivaluedFields
boolean mayHaveMultivaluedFields()
Can this block have multivalued fields? Blocks that return false will never return more than one from getValueCount(int). This may return true for Blocks that do not have multivalued fields, but it will always answer quickly.
-
doesHaveMultivaluedFields
boolean doesHaveMultivaluedFields()
Does this block have multivalued fields? Unlike mayHaveMultivaluedFields() this will never return a false positive. In other words, if this returns true then there are positions for which getValueCount(int) will return more than 1. This will answer quickly if it can but may have to check all positions.
-
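The difference between the cheap check and the exact check in mayHaveMultivaluedFields() and doesHaveMultivaluedFields() can be sketched as follows. This assumes, purely for illustration, that a block tracks positions with a hypothetical offsets array that is absent (null) when every position holds exactly one value; real implementations may decide differently.

```java
public class MultivaluedCheckSketch {
    /**
     * Cheap check: answers in constant time but may report true for a block
     * whose offsets exist only because of nulls (a false positive).
     */
    public static boolean mayHaveMultivaluedFields(int[] firstValueIndexes) {
        return firstValueIndexes != null;
    }

    /** Exact check: never a false positive, but may scan every position. */
    public static boolean doesHaveMultivaluedFields(int[] firstValueIndexes) {
        if (firstValueIndexes == null) {
            return false; // quick answer when there are no offsets at all
        }
        for (int p = 0; p + 1 < firstValueIndexes.length; p++) {
            if (firstValueIndexes[p + 1] - firstValueIndexes[p] > 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] onlyNulls = {0, 1, 1, 2}; // positions: one value, null, one value
        int[] multivalued = {0, 2, 3};  // positions: two values, one value
        System.out.println(mayHaveMultivaluedFields(onlyNulls) + " " + doesHaveMultivaluedFields(onlyNulls));
        System.out.println(mayHaveMultivaluedFields(multivalued) + " " + doesHaveMultivaluedFields(multivalued));
    }
}
```

Callers that only need to skip multivalue handling can use the cheap check; callers that must be certain pay for the scan.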
filter
Creates a new block that only exposes the positions provided.
Parameters:
mayContainDuplicates - may the positions array contain duplicate positions?
positions - the positions to retain
Returns: a filtered block
-
keepMask
-
lookup
org.elasticsearch.core.ReleasableIterator<? extends Block> lookup(IntBlock positions, ByteSizeValue targetBlockSize)
Builds an Iterator of new Blocks with the same elementType() as this Block whose values are copied from positions in this Block. It has the same number of positions as the positions parameter.
For example, if this block contained [a, b, [b, c]] and were called with the block [0, 1, 1, [1, 2]] then the result would be [a, b, b, [b, b, c]].
This process produces count(this) * count(positions) values per position which could be quite large. Instead of returning a single Block, this returns an Iterator of Blocks containing all of the promised values.
The returned ReleasableIterator may retain a reference to the positions parameter. Close it to release those references.
This block is built using the same BlockFactory as was used to build the positions parameter.
-
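The worked example in the lookup description can be reproduced with plain lists. This is a sketch only: it models each position as a List, assumes every requested position is in range, and returns one list instead of an iterator of size-limited blocks, so it ignores targetBlockSize and MAX_LOOKUP entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class LookupSketch {
    /**
     * For each row of positions, concatenates the values found at those
     * positions in source. Assumes all positions are valid indexes.
     */
    public static List<List<String>> lookup(List<List<String>> source, List<List<Integer>> positions) {
        List<List<String>> result = new ArrayList<>();
        for (List<Integer> row : positions) {
            List<String> out = new ArrayList<>();
            for (int p : row) {
                out.addAll(source.get(p)); // copy every value stored at position p
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        // [a, b, [b, c]]
        List<List<String>> source = List.of(List.of("a"), List.of("b"), List.of("b", "c"));
        // [0, 1, 1, [1, 2]]
        List<List<Integer>> positions = List.of(List.of(0), List.of(1), List.of(1), List.of(1, 2));
        System.out.println(lookup(source, positions));
    }
}
```

The last row shows why the output can grow quickly: a multivalued row of positions pulls in every value of every position it names.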
mvOrdering
Block.MvOrdering mvOrdering()
How are multivalued fields ordered?
-
mvDeduplicated
default boolean mvDeduplicated()
Are multivalued fields de-duplicated in each position
-
mvSortedAscending
default boolean mvSortedAscending()
Are multivalued fields sorted ascending in each position
-
slice
Return a subset of this Block from position beginInclusive to position endExclusive. This may return the same instance if the range covers all positions, but if it does it will RefCounted.incRef() it.
NOTE: Implementations will not try to optimize zero length slices as we expect them to be rare.
-
expand
Block expand()
Expand multivalued fields into one row per value. Returns the same block if there aren't any multivalued fields to expand. The returned block needs to be closed by the caller to release the block's resources.
-
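A list-based sketch of the expand() behavior follows. It models each position as a List and makes one assumption the description does not state: an empty (null) position is kept as a single empty row. Reference counting of the returned block is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpandSketch {
    /**
     * Turns each multivalued row into one row per value. Returns the same
     * list instance when no row holds more than one value.
     */
    public static List<List<Integer>> expand(List<List<Integer>> rows) {
        boolean anyMultivalued = false;
        for (List<Integer> row : rows) {
            if (row.size() > 1) {
                anyMultivalued = true;
                break;
            }
        }
        if (!anyMultivalued) {
            return rows; // nothing to expand
        }
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            if (row.size() <= 1) {
                out.add(row); // assumption: null and single-valued rows pass through unchanged
            } else {
                for (Integer v : row) {
                    out.add(List.of(v));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = List.of(List.of(1, 2), List.of(), List.of(3));
        System.out.println(expand(rows));
    }
}
```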
insertNulls
Build a Block with a null inserted before each listed position.
Note: before must be non-decreasing.
-
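The non-decreasing requirement on before makes a single forward pass sufficient, as the sketch below shows. It is an illustration under assumptions: values are modeled as a List of Integer with null standing for a null position, before is a plain int array instead of an IntVector, and entries at or past the end of the list are simply ignored here, which may not match the real method.

```java
import java.util.ArrayList;
import java.util.List;

public class InsertNullsSketch {
    /**
     * Returns a copy of values with one null inserted before each position
     * listed in before. Relies on before being non-decreasing.
     */
    public static List<Integer> insertNulls(List<Integer> values, int[] before) {
        List<Integer> out = new ArrayList<>(values.size() + before.length);
        int next = 0; // index of the next unconsumed entry in before
        for (int p = 0; p < values.size(); p++) {
            // A repeated entry inserts several nulls before the same position.
            while (next < before.length && before[next] == p) {
                out.add(null);
                next++;
            }
            out.add(values.get(p));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(insertNulls(List.of(10, 20, 30), new int[] {0, 2, 2}));
    }
}
```

If before were not sorted, the single cursor would skip entries that point at positions already passed, which is why the ordering matters.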
deepCopy
Make a deep copy of this Block using the provided BlockFactory, likely copying all data.
-
writeTo
Writes only the data of the block to a stream output. This method should be used when the type of the block is known during reading.
Specified by: writeTo in interface Writeable
Throws: IOException
-
writeTypedBlock
Writes the type of the block followed by the block data to a stream output. This should be paired with readTypedBlock(BlockStreamInput).
Throws: IOException
-
readTypedBlock
Reads the block type and then the block data from a stream input. This should be paired with writeTypedBlock(Block, StreamOutput).
Throws: IOException
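The pairing of writeTypedBlock and readTypedBlock follows a common pattern: write a type tag first so the reader knows how to decode what follows. The sketch below shows that pattern with java.io streams only. The type codes, the length prefix, and the use of int[] and long[] as payloads are all hypothetical and do not reflect the actual wire format or the SERIALIZE_BLOCK_* constants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TypedSerializationSketch {
    // Hypothetical type codes, chosen for this example only.
    static final byte TYPE_INT = 0;
    static final byte TYPE_LONG = 1;

    /** Writes a type tag, then the element count, then the elements. */
    public static void writeTyped(Object values, DataOutput out) throws IOException {
        if (values instanceof int[]) {
            int[] ints = (int[]) values;
            out.writeByte(TYPE_INT);
            out.writeInt(ints.length);
            for (int v : ints) {
                out.writeInt(v);
            }
        } else if (values instanceof long[]) {
            long[] longs = (long[]) values;
            out.writeByte(TYPE_LONG);
            out.writeInt(longs.length);
            for (long v : longs) {
                out.writeLong(v);
            }
        } else {
            throw new IllegalArgumentException("unsupported type: " + values.getClass());
        }
    }

    /** Reads the type tag first and uses it to pick the decoder. */
    public static Object readTyped(DataInput in) throws IOException {
        byte type = in.readByte();
        int count = in.readInt();
        switch (type) {
            case TYPE_INT: {
                int[] ints = new int[count];
                for (int i = 0; i < count; i++) {
                    ints[i] = in.readInt();
                }
                return ints;
            }
            case TYPE_LONG: {
                long[] longs = new long[count];
                for (int i = 0; i < count; i++) {
                    longs[i] = in.readLong();
                }
                return longs;
            }
            default:
                throw new IOException("unknown type code: " + type);
        }
    }

    /** Serializes and deserializes in memory, for demonstration. */
    public static Object roundTrip(Object values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeTyped(values, new DataOutputStream(bytes));
            return readTyped(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object back = roundTrip(new long[] {1L, 2L, 3L});
        System.out.println(back instanceof long[]);
    }
}
```

A reader that already knows the type can skip the tag, which is the case writeTo is described as serving.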
-
supportsAggregateMetricDoubleBlock
-