Package com.qfs.chunk

Interface IChunk<K>

Type Parameters:
K - the type of data stored by the chunk
All Superinterfaces:
IArray, IArrayReader, IArrayWriter, IMemoryMonitored, IWritableArray
All Known Subinterfaces:
ICanCollectVectors<K>, IChunkBoolean, IChunkComposite, IChunkDouble, IChunkFloat, IChunkInteger, IChunkLong, IChunkNullable<K>, IChunkPositiveInteger, IChunkPrimitiveInteger, IChunkSingleValue<K>, IConcurrentChunkInteger, IConcurrentChunkLong, IDecoratedChunk<D,V>, IFrequentValueChunk<K>, IVectorChunk
All Known Implementing Classes:
AChunk, AChunkOffset, AChunkPositiveInteger, AChunkPrimitiveInteger, ADirectChunk, ADirectChunkNullable, ADirectChunkPrimitiveInteger, ADirectVectorBlock, ArrayChunkBits, ArrayChunkBoolean, ArrayChunkBytes, ArrayChunkDouble, ArrayChunkDoubleNullable, ArrayChunkFloat, ArrayChunkFloatNullable, ArrayChunkHexa, ArrayChunkInteger, ArrayChunkIntegerNullable, ArrayChunkLong, ArrayChunkLongNullable, ArrayChunkPositiveInteger, ArrayChunkQuad, ArrayChunkShorts, ArrayChunkTriBytes, BufferChunkBoolean, BufferChunkDouble, BufferChunkDoubleNullable, BufferChunkFloat, BufferChunkFloatNullable, BufferChunkLong, BufferChunkLongNullable, BufferChunkPositiveInteger, BufferChunkPrimitiveBits, BufferChunkPrimitiveBytes, BufferChunkPrimitiveHexa, BufferChunkPrimitiveInts, BufferChunkPrimitiveIntsNullable, BufferChunkPrimitiveQuad, BufferChunkPrimitiveShorts, BufferChunkPrimitiveTriBytes, ChunkComposite, ChunkCompositeSparse, ChunkDictionary, ChunkHistory, ChunkHistory.SparseChunkHistory, ChunkMarked, ChunkMarkedVector, ChunkObject, ChunkOffsetInteger, ChunkOffsetLong, ChunkSingleBoolean, ChunkSingleDouble, ChunkSingleFloat, ChunkSingleInteger, ChunkSingleLong, ChunkSingleObject, ChunkSingleTimestamp, ChunkSingleVector, ChunkVector, DirectChunkBits, DirectChunkBoolean, DirectChunkBytes, DirectChunkDouble, DirectChunkDoubleNullable, DirectChunkFloat, DirectChunkFloatNullable, DirectChunkHexa, DirectChunkInteger, DirectChunkIntegerNullable, DirectChunkLong, DirectChunkLongNullable, DirectChunkPositiveInteger, DirectChunkQuad, DirectChunkShorts, DirectChunkTriBytes, DirectDoubleVectorBlock, DirectFloatVectorBlock, DirectIntegerVectorBlock, DirectLongVectorBlock, EmptyChunk, FrequentDoubleChunk, FrequentFloatChunk, FrequentIntegerChunk, FrequentLongChunk, FrequentNullDoubleChunk, FrequentNullFloatChunk, FrequentNullIntegerChunk, FrequentNullLongChunk, FrequentObjectChunk, NullableFrequentDoubleChunk, NullableFrequentFloatChunk, NullableFrequentIntegerChunk, NullableFrequentLongChunk, SparseChunk, TombStoneChunk, WrapperChunkInteger

public interface IChunk<K> extends IWritableArray
A chunk of data within a column. The column delegates read and write operations to its chunks.

Various implementations of the chunk interface can handle various kinds of data, including space efficient primitives.

The implementation of a chunk must allow multiple concurrent readers to read the data while a single writer writes to the chunk. Multiple concurrent writers are not supported by default by chunk implementations.
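
The single-writer, multiple-reader contract above can be illustrated with a toy class. This is only a sketch under stated assumptions: SingleWriterIntChunk, writeInt and readInt are hypothetical names, and the use of AtomicIntegerArray is one possible way to make a writer's updates visible to readers, not necessarily how the real implementations do it.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical toy chunk; it is not the real IChunk and only stores ints. */
final class SingleWriterIntChunk {
    // AtomicIntegerArray gives every slot volatile read/write semantics.
    private final AtomicIntegerArray values;

    SingleWriterIntChunk(int capacity) {
        values = new AtomicIntegerArray(capacity);
    }

    int capacity() {
        return values.length();
    }

    /** Must only be called from the single writer thread. */
    void writeInt(int position, int value) {
        values.set(position, value);
    }

    /** May be called from any number of reader threads. */
    int readInt(int position) {
        return values.get(position);
    }

    public static void main(String[] args) throws InterruptedException {
        SingleWriterIntChunk chunk = new SingleWriterIntChunk(4);
        Thread writer = new Thread(() -> {
            for (int i = 0; i < chunk.capacity(); i++) {
                chunk.writeInt(i, i * 10);
            }
        });
        Thread reader = new Thread(() -> {
            // A concurrent reader sees either the initial 0 or the written value.
            System.out.println("reader saw " + chunk.readInt(3));
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        // After join() the writer's updates are guaranteed to be visible.
        System.out.println("final value " + chunk.readInt(3));
    }
}
```

Nothing in this sketch protects against two writers racing on the same slot, which mirrors the statement that concurrent writers are not supported by default.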

Author:
ActiveViam
  • Field Details

    • NO_SIZE_LIMIT

      static final int NO_SIZE_LIMIT
      Special value to pass as the limit in order to scan the values of a chunk without any size limit.
  • Method Details

    • getChunkId

      long getChunkId()
      Gets the unique ID of this chunk.
      Returns:
      the unique ID of this chunk
    • capacity

      int capacity()
      Returns the (fixed) capacity of the chunk (number of elements it can store).
      Returns:
      the (fixed) capacity of the chunk
    • getBindingType

      int getBindingType()
      Gets the type for binding transfers of this chunk.
      Returns:
      the physical type, such as Types.PHYSICAL_INT
    • read

      K read(int position)
      Returns the data stored at that position in the chunk.
      Specified by:
      read in interface IArrayReader
      Parameters:
      position - the position in the chunk at which to read
      Returns:
      the data stored at that position in the chunk
    • copyInto

      void copyInto(int position, IWritableCell cell)
      Copies the value at a given position in this chunk into the provided cell.
      Parameters:
      position - the row of the chunk to read
      cell - the cell into which the value is copied
    • writeFromCell

      void writeFromCell(int position, IReadableCell cell)
      Writes the content of the cell into the chunk at the given position.

      It is up to chunks to use the appropriate read method to avoid boxing as much as possible.
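
      The point about avoiding boxing can be sketched as follows. The ReadableCell interface, its readInt method and the two toy chunk classes are hypothetical stand-ins introduced for illustration; they are assumptions, not the actual IReadableCell API.

```java
final class WriteFromCellSketch {
    /** Hypothetical stand-in for IReadableCell: one boxed read, one primitive read. */
    interface ReadableCell {
        Object read();

        int readInt();
    }

    /** Primitive chunk: uses the primitive read, so no Integer is allocated. */
    static final class IntChunk {
        final int[] values;

        IntChunk(int capacity) {
            values = new int[capacity];
        }

        void writeFromCell(int position, ReadableCell cell) {
            values[position] = cell.readInt();
        }
    }

    /** Object chunk: stores references, so the boxed read is the natural choice. */
    static final class ObjectChunk {
        final Object[] values;

        ObjectChunk(int capacity) {
            values = new Object[capacity];
        }

        void writeFromCell(int position, ReadableCell cell) {
            values[position] = cell.read();
        }
    }

    /** Builds a simple cell holding a single int. */
    static ReadableCell intCell(int value) {
        return new ReadableCell() {
            @Override
            public Object read() {
                return value; // autoboxes to Integer
            }

            @Override
            public int readInt() {
                return value; // stays primitive
            }
        };
    }

    public static void main(String[] args) {
        IntChunk ints = new IntChunk(2);
        ints.writeFromCell(1, intCell(7));
        ObjectChunk objects = new ObjectChunk(2);
        objects.writeFromCell(0, intCell(7));
        System.out.println(ints.values[1] + " " + objects.values[0]);
    }
}
```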

    • transfer

      void transfer(IWritableTable destination, int destinationColumn, int[] rowsMapping, int numRows)
      Transfers a given number of rows from this chunk to target rows in the given column of the destination table.

      The mapping from the source rows to the target rows is described with an array of 2*n slots. Each even slot contains the index of the row to be copied from the source chunk and the following odd slot contains the index of the row into which the source data will be transferred.

      Parameters:
      destination - the destination table
      destinationColumn - the column to transfer the data to in the destination table
      rowsMapping - the mapping from source rows to target rows, as explained above
      numRows - the number of rows to transfer
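
      The 2*n slot layout of rowsMapping described above can be sketched with plain arrays. This is a minimal illustration that assumes int data and replaces the destination table column with an int[]; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class RowsMappingSketch {
    /**
     * Copies numRows rows from source to destination.
     * Slot 2*i holds the source row, slot 2*i+1 holds the target row.
     */
    static void transfer(int[] source, int[] destination, int[] rowsMapping, int numRows) {
        for (int i = 0; i < numRows; i++) {
            int sourceRow = rowsMapping[2 * i];     // even slot
            int targetRow = rowsMapping[2 * i + 1]; // following odd slot
            destination[targetRow] = source[sourceRow];
        }
    }

    public static void main(String[] args) {
        int[] source = {10, 20, 30, 40};
        int[] destination = new int[4];
        // Two pairs: source row 0 -> target row 3, source row 2 -> target row 1.
        int[] rowsMapping = {0, 3, 2, 1};
        transfer(source, destination, rowsMapping, 2);
        System.out.println(Arrays.toString(destination)); // prints [0, 30, 0, 10]
    }
}
```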
    • reset

      void reset()
      Resets the chunk before it is reused.

      Warning
      Chunks holding references to objects will be nullified to avoid getting in the way of the garbage collector, but chunks holding primitive data will be left untouched. Users cannot expect values to be "zeroed" by calling reset.
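
      The difference described in the warning can be sketched with two toy classes. ObjectChunk and IntChunk below are hypothetical and only mimic the documented behavior: references are cleared on reset, primitive values are left as they are.

```java
import java.util.Arrays;

final class ResetSketch {
    /** Toy object chunk: reset clears references so they can be garbage collected. */
    static final class ObjectChunk {
        final Object[] values;

        ObjectChunk(int capacity) {
            values = new Object[capacity];
        }

        void reset() {
            Arrays.fill(values, null);
        }
    }

    /** Toy primitive chunk: reset leaves the old values in place. */
    static final class IntChunk {
        final int[] values;

        IntChunk(int capacity) {
            values = new int[capacity];
        }

        void reset() {
            // Intentionally empty: stale primitive values do not hold any reference.
        }
    }

    public static void main(String[] args) {
        ObjectChunk objects = new ObjectChunk(1);
        objects.values[0] = "payload";
        objects.reset();

        IntChunk ints = new IntChunk(1);
        ints.values[0] = 5;
        ints.reset();

        System.out.println(objects.values[0] + " " + ints.values[0]); // prints null 5
    }
}
```

      A caller that reuses a primitive chunk therefore has to overwrite every row it intends to read.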

    • destroy

      Runnable destroy()
      Equivalent to a finalizer for this chunk, except that the application can call it once it is certain that the chunk will never be used or even reached again.

      Chunk implementations can use this opportunity to dereference objects, or even free memory in the case of direct memory chunks. For that reason calling destroy() is unsafe: accessing the chunk after its memory has been reclaimed can crash the JVM.

      Most often this Runnable is registered with IActiveCollector.register(Runnable) to make sure that the destruction is safe.

      Returns:
      the Runnable to run to actually free the data
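
      The two-step pattern (obtain the Runnable first, run it later when it is safe) can be sketched as below. ToyChunk and DeferredCollector are hypothetical; DeferredCollector is only a simplified stand-in for the idea of registering the destructor with a collector and is not the IActiveCollector implementation.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class DestroySketch {
    /** Toy chunk whose destroy() hands back the action that releases its data. */
    static final class ToyChunk {
        private double[] data;

        ToyChunk(int capacity) {
            data = new double[capacity];
        }

        /** Does not free anything by itself; the returned Runnable does. */
        Runnable destroy() {
            return () -> data = null;
        }

        boolean isReleased() {
            return data == null;
        }
    }

    /** Hypothetical collector that runs registered destructors at a later, safe point. */
    static final class DeferredCollector {
        private final Queue<Runnable> pending = new ArrayDeque<>();

        void register(Runnable destructor) {
            pending.add(destructor);
        }

        void runPending() {
            Runnable next;
            while ((next = pending.poll()) != null) {
                next.run();
            }
        }
    }

    public static void main(String[] args) {
        ToyChunk chunk = new ToyChunk(8);
        DeferredCollector collector = new DeferredCollector();
        collector.register(chunk.destroy());
        System.out.println(chunk.isReleased()); // prints false
        collector.runPending();
        System.out.println(chunk.isReleased()); // prints true
    }
}
```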
    • isPrimitive

      boolean isPrimitive()
      Returns whether this chunk contains primitive data, as per Types.isPrimitive(int).
      Returns:
      true if this chunk contains primitive data, as per Types.isPrimitive(int), false otherwise
    • isDead

      boolean isDead()
      Whether this chunk is dead.

      For a chunk, being "dead" means that all of its elements have been removed. Because the datastore is an "append-only" structure, the data storage of a dead chunk can safely be freed immediately after the removal of its last row. For the same reason, however, dead chunks are still kept inside record sets and their metadata remains available. For example, dead chunks should still keep track of their capacity, even though they hold no real data.

      Dead chunks mostly appear as a result of the following three operations:

      • Data compression
      • A manual call to IEpochHistory#forceDiscardEpochs(Predicate) on the datastore history
      • Structural transactions that modify record set content
      Consequently, even though dead chunks should never be read or written to, it is perfectly acceptable to encounter them during a record set scan. In other words, data compression or partial discard of data should not prevent users from scanning the record set.

      Every method scanning the chunks is expected to perform the required checks to properly skip dead chunks. Such methods should not throw when a dead chunk is encountered.

      Returns:
      true if this chunk is dead, false otherwise
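
      A scan that follows the rule above might look like the sketch below. The ToyChunk interface and the live and dead factories are hypothetical; the dead chunk deliberately throws on read to show that the scan never touches it while its capacity stays available.

```java
import java.util.List;

final class DeadChunkScanSketch {
    /** Hypothetical minimal chunk view; not the real IChunk. */
    interface ToyChunk {
        boolean isDead();

        int capacity();

        int readInt(int position);
    }

    static ToyChunk live(int... values) {
        return new ToyChunk() {
            public boolean isDead() { return false; }
            public int capacity() { return values.length; }
            public int readInt(int position) { return values[position]; }
        };
    }

    /** A dead chunk keeps its capacity but must never be read. */
    static ToyChunk dead(int capacity) {
        return new ToyChunk() {
            public boolean isDead() { return true; }
            public int capacity() { return capacity; }
            public int readInt(int position) {
                throw new IllegalStateException("dead chunk must not be read");
            }
        };
    }

    /** Sums all rows, skipping dead chunks instead of failing on them. */
    static long sum(List<ToyChunk> chunks) {
        long total = 0;
        for (ToyChunk chunk : chunks) {
            if (chunk.isDead()) {
                continue;
            }
            for (int row = 0; row < chunk.capacity(); row++) {
                total += chunk.readInt(row);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(live(1, 2), dead(2), live(3)))); // prints 6
    }
}
```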
    • findRowsEqualTo

      void findRowsEqualTo(Object value, long epoch, IChunkLong version, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals the sought object to the result bitmap.
      Parameters:
      value - the sought object
      epoch - the epoch that defines which rows are read: only rows created before or deleted after this epoch are read
      version - the column holding the version number at which each row has been inserted or removed
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsEqualTo

      void findRowsEqualTo(Object value, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals the sought object to the result bitmap.
      Parameters:
      value - the sought object
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
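
      How offset, chunkLimit and limit interact can be sketched with standard library types. This is an illustration under several assumptions: java.util.BitSet stands in for IBitmap, the chunk is a plain Object[], the mapping parameter is left out, and the limit is taken to cap the total number of rows in the result. The relationship between the value 0 and NO_SIZE_LIMIT is not stated in this page, so the sketch only follows the "0 for no limit" wording.

```java
import java.util.BitSet;
import java.util.Objects;

final class FindRowsEqualToSketch {
    /** Assumption taken from the parameter description: 0 means "no limit". */
    static final int NO_LIMIT = 0;

    /**
     * Adds offset + row to result for each of the first chunkLimit rows equal to value.
     * Stops early once the result holds limit rows (assumed meaning of the limit).
     */
    static void findRowsEqualTo(
            Object[] chunk, Object value, int offset, int chunkLimit, int limit, BitSet result) {
        for (int row = 0; row < chunkLimit; row++) {
            if (limit != NO_LIMIT && result.cardinality() >= limit) {
                return;
            }
            if (Objects.equals(chunk[row], value)) {
                result.set(offset + row); // store row number, not chunk-local row
            }
        }
    }

    public static void main(String[] args) {
        Object[] chunk = {"a", "b", "a", "a"};

        BitSet unlimited = new BitSet();
        findRowsEqualTo(chunk, "a", 8, 4, NO_LIMIT, unlimited);
        System.out.println(unlimited); // prints {8, 10, 11}

        BitSet limited = new BitSet();
        findRowsEqualTo(chunk, "a", 8, 4, 2, limited);
        System.out.println(limited); // prints {8, 10}
    }
}
```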
    • findRowsEqualTo

      void findRowsEqualTo(Object value, IIntIterator filter, int offset, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals the sought object to the result bitmap.
      Parameters:
      value - the sought object
      filter - the row numbers to consider for this search
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInSet

      void findRowsInSet(Set<Object> values, long epoch, IChunkLong version, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals any object in the set of sought objects to the result bitmap.
      Parameters:
      values - the set of sought objects
      epoch - the epoch that defines which rows are read: only rows created before or deleted after this epoch are read
      version - the column holding the version number at which each row has been inserted or removed
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInSet

      void findRowsInSet(Set<Object> values, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals any object in the set of sought objects to the result bitmap.
      Parameters:
      values - the set of sought objects
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
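
      The set-based variant differs from the single-value search only in its membership test. The sketch below makes the same simplifying assumptions as one would for a plain array scan: java.util.BitSet stands in for IBitmap, the chunk is an Object[], and the mapping and limit parameters are omitted for brevity.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

final class FindRowsInSetSketch {
    /** Adds offset + row to result for each of the first chunkLimit rows found in values. */
    static void findRowsInSet(
            Object[] chunk, Set<Object> values, int offset, int chunkLimit, BitSet result) {
        for (int row = 0; row < chunkLimit; row++) {
            if (values.contains(chunk[row])) {
                result.set(offset + row);
            }
        }
    }

    public static void main(String[] args) {
        Object[] chunk = {"a", "b", "c", "b"};
        // HashSet is used because it tolerates contains(null), unlike Set.of(...).
        Set<Object> sought = new HashSet<>(Arrays.asList("a", "c"));
        BitSet result = new BitSet();
        findRowsInSet(chunk, sought, 100, chunk.length, result);
        System.out.println(result); // prints {100, 102}
    }
}
```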
    • findRowsInSet

      void findRowsInSet(Set<Object> values, IIntIterator filter, int offset, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals any object in the set of sought objects to the result bitmap.
      Parameters:
      values - the set of sought objects
      filter - the row numbers to consider for this search
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInTransactionEqualTo

      void findRowsInTransactionEqualTo(Object value, IChunkLong version, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals the sought object to the result bitmap.
      Parameters:
      value - the sought object
      version - the column, corresponding to this chunk, holding the version number at which each row has been inserted or removed
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInTransactionEqualTo

      void findRowsInTransactionEqualTo(Object value, IChunkLong version, IBitmap deletions, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals the sought object to the result bitmap.
      Parameters:
      value - the sought object
      version - the column, corresponding to this chunk, holding the version number at which each row has been inserted or removed
      deletions - a bitmap of deleted rows, which should not be scanned
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInTransactionEqualTo

      void findRowsInTransactionEqualTo(Object value, IIntIterator filter, IBitmap deletions, int offset, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals the sought object to the result bitmap.
      Parameters:
      value - the sought object
      filter - the row numbers to consider for this search
      deletions - a bitmap of deleted rows, which should not be scanned
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInTransactionInSet

      void findRowsInTransactionInSet(Set<Object> values, IChunkLong version, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals any object in the set of sought objects to the result bitmap.
      Parameters:
      values - the set of sought objects
      version - the column, corresponding to this chunk, holding the version number at which each row has been inserted or removed
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInTransactionInSet

      void findRowsInTransactionInSet(Set<Object> values, IChunkLong version, IBitmap deletions, int offset, int chunkLimit, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals any object in the set of sought objects to the result bitmap.
      Parameters:
      values - the set of sought objects
      version - the column, corresponding to this chunk, holding the version number at which each row has been inserted or removed
      deletions - a bitmap of deleted rows, which should not be scanned
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      chunkLimit - the number of rows to iterate over in this chunk; it should not exceed the chunk size
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • findRowsInTransactionInSet

      void findRowsInTransactionInSet(Set<Object> values, IIntIterator filter, IBitmap deletions, int offset, IRowMapping mapping, int limit, IBitmap result)
      Scans the records of this chunk and adds the rows whose value equals any object in the set of sought objects to the result bitmap.
      Parameters:
      values - the set of sought objects
      filter - the row numbers to consider for this search
      deletions - a bitmap of deleted rows, which should not be scanned
      offset - the offset in the result, that is, the row number in the store corresponding to the first row in this chunk
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      limit - the size limit for the result of this search; pass 0 for no limit
      result - the bitmap to which the matching row numbers are added
    • replaceBy

      default Runnable replaceBy(IChunk<?> replacement)
      Returns the destructor to run when this chunk is replaced by the given chunk.
      Parameters:
      replacement - the new chunk that replaces this chunk
      Returns:
      the destructor to run when this chunk is replaced by the given chunk
    • localRow

      default int localRow(int chunkRow, IRowMapping mapping)
      Converts the given chunk row into the corresponding row of the underlying chunk if this chunk is a SparseChunk; otherwise returns the chunk row unchanged.
      Parameters:
      chunkRow - the chunk row
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      Returns:
      the corresponding row in the chunk
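
      The row translation can be sketched as below. RowMapping is a hypothetical one-method stand-in for IRowMapping, and the null check is a simplification of "the mapping is only used for a SparseChunk"; the real default method may decide differently.

```java
import java.util.Map;

final class LocalRowSketch {
    /** Hypothetical stand-in for IRowMapping: visible chunk row -> underlying row. */
    interface RowMapping {
        int map(int chunkRow);
    }

    /** Returns the row unchanged without a mapping, the mapped row otherwise. */
    static int localRow(int chunkRow, RowMapping mapping) {
        return mapping == null ? chunkRow : mapping.map(chunkRow);
    }

    public static void main(String[] args) {
        // A sparse chunk exposing rows 5 and 9, stored at underlying rows 0 and 1.
        Map<Integer, Integer> table = Map.of(5, 0, 9, 1);
        RowMapping sparse = row -> table.getOrDefault(row, -1);

        System.out.println(localRow(9, sparse)); // prints 1
        System.out.println(localRow(9, null));   // prints 9
    }
}
```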
    • getPhysicalSize

      default int getPhysicalSize(IRowMapping mapping)
      Returns the physical size of the chunk before it was compressed. If the chunk has not been compressed, returns the chunk capacity.
      Parameters:
      mapping - only used when this chunk is a SparseChunk, in which case it maps each global chunk row to its row in the underlying chunk; null otherwise
      Returns:
      the physical size of the chunk
    • freeRow

      default void freeRow(int row)
      Clears the data stored at a given row.
      Parameters:
      row - the row
    • compress

      IChunk<K> compress(IRowMapping mapping, int[] arrayMapping, int newChunkSize, IChunkFactory<K> defaultChunkCreator)
      Creates a compressed chunk from this chunk, using only the rows given in the mapping. The mapping is given in two representations:
      • The mapping as an IRowMapping from the rows of the externally visible chunk to those of the compressed chunk. This representation corresponds to the mapping used in SparseChunk.
      • The arrayMapping as an array of 2*n slots. This representation corresponds to the mapping used in IChunkBinding.transfer(int[], int). Each even slot contains the index of the row to be copied from this chunk and the following odd slot contains the index of the row in the compressed chunk into which the data will be written.

      Note that the two representations of the mapping can seem inconsistent when recompressing the underlying chunk of a sparse chunk: the IRowMapping maps the externally visible rows to the rows of the future underlying chunk of the re-compressed chunk, while the arrayMapping maps the rows of the current underlying chunk to the rows of that future underlying chunk.

      Parameters:
      mapping - the mapping from the rows of the externally visible chunk to those of the compressed chunk
      arrayMapping - the array representation of the mapping
      newChunkSize - the size of the chunk we want to create
      defaultChunkCreator - the chunk creator for a "normal" chunk (if this chunk knows of a better way to make itself sparse, it can use it)
      Returns:
      a compressed chunk containing the copied data from this chunk
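
      The two representations of the mapping can be sketched for the simple case where this chunk is not itself sparse, so both describe the same rows. The example assumes int data, uses a Map as a stand-in for IRowMapping, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;

final class CompressSketch {
    /** Builds the compressed storage by following the (source, target) pairs. */
    static int[] compress(int[] current, int[] arrayMapping, int newChunkSize) {
        int[] compressed = new int[newChunkSize];
        for (int i = 0; i + 1 < arrayMapping.length; i += 2) {
            compressed[arrayMapping[i + 1]] = current[arrayMapping[i]];
        }
        return compressed;
    }

    public static void main(String[] args) {
        // Only rows 2, 5 and 7 of the original chunk are still in use.
        int[] current = {0, 0, 11, 0, 0, 22, 0, 33};

        // Array form: source row, target row, source row, target row, ...
        int[] arrayMapping = {2, 0, 5, 1, 7, 2};
        // Row-mapping form: externally visible row -> row in the compressed chunk.
        Map<Integer, Integer> rowMapping = Map.of(2, 0, 5, 1, 7, 2);

        int[] compressed = compress(current, arrayMapping, 3);
        System.out.println(Arrays.toString(compressed)); // prints [11, 22, 33]

        // Reading visible row 5 goes through the row mapping.
        System.out.println(compressed[rowMapping.get(5)]); // prints 22
    }
}
```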
    • createArrayCursor

      default IArrayCursor createArrayCursor()
      Description copied from interface: IArray
      Creates a new read-only cursor that can be moved up and down the array. This cursor delegates all reading calls to the underlying array.
      Specified by:
      createArrayCursor in interface IArray
      Specified by:
      createArrayCursor in interface IWritableArray
      Returns:
      the cursor, stationed at index 0 of the array
    • sparseChunkCompression

      default IChunk<K> sparseChunkCompression(IRowMapping mapping, int[] arrayMapping, int newChunkSize, IChunkFactory<K> underlyingChunkCreator)
      Creates a sparse chunk from this chunk.
      Parameters:
      mapping - the mapping from the rows of the externally visible chunk to those of the compressed chunk
      arrayMapping - the array representation of the mapping
      newChunkSize - the size of the chunk we want to create
      underlyingChunkCreator - the chunk creator for the underlying chunk of the sparse chunk
      Returns:
      a compressed chunk containing the copied data from this chunk
    • getChunkCreatorForSparse

      default IChunkFactory<K> getChunkCreatorForSparse(IChunkFactory<K> defaultChunkCreator)
      Gets the chunk creator to use for the internal smaller chunk when replacing this chunk with a sparse chunk.
      Parameters:
      defaultChunkCreator - the chunk creator that was defined for the initial chunk that led to the creation of this chunk, possibly through several compression cycles
      Returns:
      the chunk creator to use to create a sparse chunk from this chunk
    • createFrequentValueChunk

      default IFrequentValueChunk<K> createFrequentValueChunk(int chunkSize, K frequentValue, IChunk<K> underlyingChunk, IRowMapping mapping, boolean hasNullValues)
      Creates a compressed version of this chunk using frequency compression.
      Parameters:
      chunkSize - the visible size of the compressed chunk
      frequentValue - the frequent value
      underlyingChunk - the underlying chunk containing the values different from the frequent value
      mapping - the mapping linking visible rows to each non-frequent value
      hasNullValues - flag indicating whether the underlying chunk contains null values
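
      The idea behind frequency compression can be sketched as follows: rows holding the frequent value are not stored, and only the other rows go to a smaller underlying array reached through a mapping. This is a simplified illustration for int data with a HashMap as the mapping; the class, its fields and the compress helper are hypothetical and ignore null handling.

```java
import java.util.HashMap;
import java.util.Map;

final class FrequentValueSketch {
    final int chunkSize;                 // visible size of the compressed chunk
    final int frequentValue;             // value implied for unmapped rows
    final int[] underlying;              // only the non-frequent values
    final Map<Integer, Integer> mapping; // visible row -> underlying row

    private FrequentValueSketch(
            int chunkSize, int frequentValue, int[] underlying, Map<Integer, Integer> mapping) {
        this.chunkSize = chunkSize;
        this.frequentValue = frequentValue;
        this.underlying = underlying;
        this.mapping = mapping;
    }

    /** Keeps only the rows that differ from the frequent value. */
    static FrequentValueSketch compress(int[] values, int frequentValue) {
        Map<Integer, Integer> mapping = new HashMap<>();
        for (int row = 0; row < values.length; row++) {
            if (values[row] != frequentValue) {
                mapping.put(row, mapping.size()); // next free underlying row
            }
        }
        int[] underlying = new int[mapping.size()];
        for (Map.Entry<Integer, Integer> entry : mapping.entrySet()) {
            underlying[entry.getValue()] = values[entry.getKey()];
        }
        return new FrequentValueSketch(values.length, frequentValue, underlying, mapping);
    }

    /** Unmapped rows read as the frequent value. */
    int read(int row) {
        Integer underlyingRow = mapping.get(row);
        return underlyingRow == null ? frequentValue : underlying[underlyingRow];
    }

    public static void main(String[] args) {
        int[] values = {7, 7, 3, 7, 9, 7};
        FrequentValueSketch chunk = compress(values, 7);
        System.out.println(chunk.underlying.length); // prints 2
        System.out.println(chunk.read(2) + " " + chunk.read(3)); // prints 3 7
    }
}
```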
    • getChunkType

      default String getChunkType()
      (For debugging purposes) Gets the type of the chunk and, if this chunk has an underlying chunk, the types of its underlying chunks, recursively.
      Returns:
      a String representing the type of the chunk