Chunks
As explained in Partitioning and NUMA, each store of the Datastore is split into different partitions, to allow for better concurrency.
Each partition points to a data storage component. This storage has the same columnar structure across all partitions of a given store, with one column for each and every field of that store.
Each column is then split into chunks. Chunks are the atomic unit of off-heap memory allocation in ActivePivot: a chunk is the smallest allocation a column will ever make. Within a partition, all chunks hold data for the same number of rows, so the chunks of different columns are aligned.
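Because every chunk of a partition holds the same number of rows, a row number can be turned into a chunk index and an offset inside that chunk, and the result is valid for every column. The sketch below illustrates this; the class name, method names and the chunk size used in the usage note are hypothetical and are not part of the ActivePivot API.

```java
/** Illustrative sketch only: not the actual ActivePivot chunk addressing code. */
final class ChunkAddressSketch {

    /** Number of rows held by each chunk (hypothetical value chosen by the caller). */
    private final int chunkSize;

    ChunkAddressSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Index of the chunk that holds the given row, identical for all columns. */
    int chunkIndex(int row) {
        return row / chunkSize;
    }

    /** Position of the given row inside its chunk. */
    int offsetInChunk(int row) {
        return row % chunkSize;
    }
}
```

With an assumed chunk size of 1024 rows, row 2500 falls in chunk 2 at offset 452, whichever column is read.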
Chunk compression
Compressing a chunk consists of finding a more compact representation of the data it stores, in order to reduce memory usage.
Chunk compression is performed automatically but can also be triggered using com.qfs.multiversion.IEpochHistory.compress().
A chunk is only compressed once it is entirely filled with values.
Frequent value compression
Frequent value compression relies on extracting the most frequent value of the chunk. The chunk is then compressed by storing only the values that differ from the extracted value, also referred to as explicit values. Because the new chunk is significantly smaller than its uncompressed version, a mapping must also be built that indicates, for each row of the uncompressed chunk, where to find its explicit value in the compressed chunk.
Heuristics ensure that the memory footprint of the compressed chunk, summed with the memory footprint of the additional mapping, does indeed result in a memory gain. This compression is only performed for significant memory gains, as its tradeoff is an additional indirection when accessing a row to read its value.
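The idea can be sketched in a few lines of Java. This is a simplified illustration for int values, not ActivePivot's implementation: the class and method names are hypothetical, and the mapping is a plain int array, whereas a real implementation would need a more compact mapping to actually save memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: not the actual ActivePivot compressed chunk. */
final class FrequentValueChunkSketch {

    private final int frequentValue;     // the extracted most frequent value
    private final int[] explicitValues;  // values that differ from frequentValue
    private final int[] positions;       // row -> index in explicitValues, or -1

    private FrequentValueChunkSketch(int frequentValue, int[] explicitValues, int[] positions) {
        this.frequentValue = frequentValue;
        this.explicitValues = explicitValues;
        this.positions = positions;
    }

    /** Compresses a full chunk by extracting its most frequent value. */
    static FrequentValueChunkSketch compress(int[] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("a chunk must be full before compression");
        }
        // Count occurrences and track the most frequent value.
        Map<Integer, Integer> counts = new HashMap<>();
        int best = rows[0];
        int bestCount = 0;
        for (int value : rows) {
            int count = counts.merge(value, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = value;
            }
        }
        // Keep only the explicit values and record where each row finds its value.
        int[] explicit = new int[rows.length - bestCount];
        int[] positions = new int[rows.length];
        int next = 0;
        for (int row = 0; row < rows.length; row++) {
            if (rows[row] == best) {
                positions[row] = -1;
            } else {
                positions[row] = next;
                explicit[next++] = rows[row];
            }
        }
        return new FrequentValueChunkSketch(best, explicit, positions);
    }

    /** Reads a row: one extra indirection compared with an uncompressed chunk. */
    int read(int row) {
        int position = positions[row];
        return position < 0 ? frequentValue : explicitValues[position];
    }

    /** Number of values stored explicitly. */
    int explicitCount() {
        return explicitValues.length;
    }
}
```

The read method shows the tradeoff mentioned above: every access goes through the mapping before reaching the value.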
The compression happens when a single value accounts for more than a fraction x of the rows in the chunk. x can be specified as a ratio through the property ActiveViamProperty#CHUNK_FREQUENCY_COMPRESSION_RATIO_PROPERTY. The default value is 0.75. Values of 0.5 or lower are not allowed for x, as they could result in a case where more than one value has a presence ratio over the threshold, whereas frequent value compression only handles one.
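The threshold rule can be written as a small check. The sketch below is an assumption about how such a test could look, with hypothetical class and method names; it only encodes what is stated above: the ratio must be greater than 0.5, and compression applies when the most frequent value covers more than that ratio of the chunk.

```java
/** Illustrative sketch only: not the actual ActivePivot heuristic. */
final class FrequencyThresholdSketch {

    /** Default ratio stated in the documentation. */
    static final double DEFAULT_RATIO = 0.75;

    private FrequencyThresholdSketch() {
    }

    /**
     * Returns true when the most frequent value appears in more than
     * ratio * chunkSize rows of the chunk.
     */
    static boolean exceedsThreshold(int mostFrequentCount, int chunkSize, double ratio) {
        if (ratio <= 0.5) {
            // Ratios of 0.5 or lower are rejected, as described above.
            throw new IllegalArgumentException("ratio must be greater than 0.5, got " + ratio);
        }
        return mostFrequentCount > ratio * chunkSize;
    }
}
```

For a chunk of 8 rows and the default ratio of 0.75, a value present in 6 rows does not qualify because 6 is not more than 6.0, while a value present in 7 rows does.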
Frequent value compression can be enabled or disabled for each of the following five data types: int, float, long, double and object. This can be controlled by specifying a control word through the property ActiveViamProperty#ENABLED_FREQUENCY_COMPRESSIONS_PROPERTY. This control word is a 6-bit word, each bit representing, in order from left to right, one of these five types: object, double, float, long, int. The last bit (to the right) currently has no meaning.
A bit set to 1 enables the frequent value compression mode for this specific data type.
For instance, to enable all types, one must give the binary word 111110, which translates to 62 in decimal. To enable the compression for the long and object types only, one must give 100100, which translates to 36.
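The arithmetic behind these values can be checked with a short helper. The enum and method below are hypothetical and only reproduce the bit layout described above (object, double, float, long, int from left to right, with the rightmost bit unused); how the resulting number is passed to the property is not shown.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative sketch only: computes the control word from the documented bit layout. */
final class FrequencyCompressionControlWord {

    /** Data types in the documented order; bit 0 (rightmost) is unused. */
    enum Type {
        OBJECT(5), DOUBLE(4), FLOAT(3), LONG(2), INT(1);

        final int bit;

        Type(int bit) {
            this.bit = bit;
        }
    }

    private FrequencyCompressionControlWord() {
    }

    /** Sets one bit per enabled type and returns the decimal control word. */
    static int of(Set<Type> enabled) {
        int word = 0;
        for (Type type : enabled) {
            word |= 1 << type.bit;
        }
        return word;
    }

    public static void main(String[] args) {
        int all = of(EnumSet.allOf(Type.class));
        int longAndObject = of(EnumSet.of(Type.LONG, Type.OBJECT));
        System.out.println(Integer.toBinaryString(all) + " = " + all);
        System.out.println(Integer.toBinaryString(longAndObject) + " = " + longAndObject);
    }
}
```

Running the main method prints the two control words from the example above in binary and decimal form.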