
Chunks

As explained in Partitioning and NUMA, each store of the Datastore is split into different partitions, to allow for better concurrency.

Each partition points to a data storage component. This storage has the same columnar structure across all partitions of a given store, with one column for each and every field of that store.

Each column is then split into chunks. Chunks are the atomic unit of off-heap memory allocation in Atoti: a chunk is the smallest allocation a column will ever make. Within one partition, all chunks hold data for the same number of rows, so the chunks of different columns are aligned.
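Because the chunks of all columns in a partition are aligned, a row number can be thought of as resolving to the same chunk index and the same offset in every column. The following is a minimal sketch of that idea only: the class name, the method names and the chunk size of 1024 rows are assumptions made for illustration, not Atoti APIs or defaults.

```java
/** Hypothetical illustration of aligned chunk addressing; not an Atoti class. */
final class ChunkAddressing {

    /** Rows per chunk. Assumed value, chosen only for this example. */
    static final int CHUNK_SIZE = 1024;

    /** Index of the chunk that holds the given row, identical for every column. */
    static int chunkIndex(int row) {
        return row / CHUNK_SIZE;
    }

    /** Position of the given row inside its chunk, identical for every column. */
    static int offsetInChunk(int row) {
        return row % CHUNK_SIZE;
    }

    public static void main(String[] args) {
        int row = 2500;
        // Prints "chunk 2, offset 452"
        System.out.println("chunk " + chunkIndex(row) + ", offset " + offsetInChunk(row));
    }
}
```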

Chunk compression

Compressing a chunk consists of finding a more compact representation of the data it stores, in order to reduce memory usage.

Chunk compression is performed automatically but can also be triggered using com.activeviam.tech.mvcc.api.IEpochHistory.compress().

A chunk is only compressed once it is entirely filled with values.

Frequent value compression

Frequent value compression relies on extracting the most frequent value of the chunk. The chunk is then compressed by storing only the values that differ from the extracted value, also referred to as explicit values. Since the compressed chunk no longer holds one entry per row, a mapping must also be built, indicating, for each row of the uncompressed chunk, where to find its explicit value, if any, in the compressed chunk.

Heuristics ensure that the memory footprint of the compressed chunk, summed with the memory footprint of the additional mapping, does indeed result in a memory gain. This compression is only performed for significant memory gains, as its tradeoff is an additional indirection when accessing a row to read its value.

Compression happens when a single value accounts for more than a fraction x of the values in the chunk. x is specified as a ratio through the property ActiveViamProperty#CHUNK_FREQUENCY_COMPRESSION_RATIO_PROPERTY. The default value is 0.75. Values of 0.5 or lower are not allowed for x, since several values could then meet the threshold, whereas frequent value compression handles only one.
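The mechanism described above can be sketched as follows for a chunk of int values. This is an illustration under stated assumptions, not the Atoti implementation: the class FrequentValueSketch and its members are hypothetical, and the mapping is a plain int array for readability, whereas a real implementation would need a much more compact mapping for the compression to save memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of frequent value compression; not an Atoti class. */
final class FrequentValueSketch {
    final int frequentValue;     // the extracted most frequent value
    final int[] explicitValues;  // values that differ from frequentValue
    final int[] mapping;         // per row: index into explicitValues, or -1

    private FrequentValueSketch(int frequentValue, int[] explicitValues, int[] mapping) {
        this.frequentValue = frequentValue;
        this.explicitValues = explicitValues;
        this.mapping = mapping;
    }

    /** Returns the compressed form, or null when no value exceeds the ratio. */
    static FrequentValueSketch compress(int[] chunk, double ratio) {
        if (ratio <= 0.5) {
            throw new IllegalArgumentException("ratio must be greater than 0.5");
        }
        if (chunk.length == 0) {
            return null;
        }
        // Count occurrences and track the most frequent value.
        Map<Integer, Integer> counts = new HashMap<>();
        int best = chunk[0];
        int bestCount = 0;
        for (int value : chunk) {
            int count = counts.merge(value, 1, Integer::sum);
            if (count > bestCount) {
                best = value;
                bestCount = count;
            }
        }
        if (bestCount <= ratio * chunk.length) {
            return null; // not frequent enough: keep the chunk uncompressed
        }
        // Keep only the explicit values and record where each row finds its value.
        int[] explicit = new int[chunk.length - bestCount];
        int[] mapping = new int[chunk.length];
        int next = 0;
        for (int row = 0; row < chunk.length; row++) {
            if (chunk[row] == best) {
                mapping[row] = -1;
            } else {
                explicit[next] = chunk[row];
                mapping[row] = next++;
            }
        }
        return new FrequentValueSketch(best, explicit, mapping);
    }

    /** Reads a row through the extra indirection introduced by the compression. */
    int read(int row) {
        int index = mapping[row];
        return index < 0 ? frequentValue : explicitValues[index];
    }
}
```

With a ratio of 0.75, the chunk {7, 7, 7, 3, 7, 7, 7, 7} is compressed to the frequent value 7 plus a single explicit value, while {1, 1, 2, 2} is left uncompressed because no value exceeds the threshold.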

Frequent value compression can be enabled or disabled for each of the following five data types: integer, float, long, double and object. This is controlled by specifying a control word through the property ActiveViamProperty#ENABLED_FREQUENCY_COMPRESSIONS_PROPERTY. This control word is a 6-bit word. Its first five bits represent, in order from left to right, the types object, double, float, long and integer. The last (rightmost) bit currently has no meaning.

A bit set to 1 enables the frequent value compression mode for this specific data type.

For instance, to enable all types, set the control word to 111110, which is 62 in decimal. To enable the compression for the long and object types only, set it to 100100, which is 36 in decimal.
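The arithmetic behind these control words can be checked with a short sketch. The class FrequencyControlWord and its Type enum are hypothetical helpers written for this example. They are not part of the Atoti API and only reproduce the bit layout described above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper computing the 6-bit control word; not an Atoti class. */
final class FrequencyControlWord {

    /** Data types in the bit order described above, leftmost bit first. */
    enum Type { OBJECT, DOUBLE, FLOAT, LONG, INTEGER }

    /** Builds the decimal value of the control word for the enabled types. */
    static int of(Set<Type> enabled) {
        int word = 0;
        for (Type type : Type.values()) {
            word <<= 1;
            if (enabled.contains(type)) {
                word |= 1;
            }
        }
        // The rightmost bit currently has no meaning and is left at 0.
        return word << 1;
    }

    public static void main(String[] args) {
        System.out.println(of(EnumSet.allOf(Type.class)));          // prints 62
        System.out.println(of(EnumSet.of(Type.OBJECT, Type.LONG))); // prints 36
    }
}
```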

Chunk allocator

Atoti stores the in-memory data in chunks. How these chunks are allocated depends on which allocator is used. In general, keeping the default allocator is the best option.

Chunk Allocator Key property

The chunk allocator can be changed using the CHUNK_ALLOCATOR_KEY_PROPERTY ActiveViam property.

Values for this property can be found in ChunkAllocators and are the following:

  • slab: This is the default value. The SLAB direct chunk allocator is an optimal choice for NUMA-aware systems that support huge pages. It requires the Linux kernel setting vm.overcommit_memory=1 on the machine. MBean jmxPrintMemoryAllocation is available to monitor the direct memory usage of this allocator.
  • direct: The Direct chunk allocator uses the sun.misc.Unsafe API to allocate its memory. It can be used instead of the SLAB allocator when vm.overcommit_memory=1, which the SLAB allocator requires, cannot be set.
  • mmap: The MMAP direct chunk allocator uses mmap system calls to allocate its off-heap memory. It is a recommended replacement for the other allocators on macOS. Use it if you get the error RuntimeException: getAvailableVirtualMemory function is not supported.
  • array: The Array chunk allocator allocates array-based chunks, stored on the heap. It is not recommended in production but can be useful for debugging memory issues.
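Since the slab allocator depends on vm.overcommit_memory=1, it can be useful to verify that setting before starting the application. The sketch below is an assumption-laden illustration, not an Atoti utility: the class OvercommitCheck is hypothetical, and it relies on the standard Linux procfs file /proc/sys/vm/overcommit_memory, which does not exist on other operating systems.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for the overcommit setting; not an Atoti class. */
final class OvercommitCheck {

    /** Linux procfs file exposing the current vm.overcommit_memory value. */
    static final Path OVERCOMMIT_FILE = Paths.get("/proc/sys/vm/overcommit_memory");

    /** Returns true when the file content corresponds to vm.overcommit_memory=1. */
    static boolean isAlwaysOvercommit(String fileContent) {
        return "1".equals(fileContent.trim());
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(OVERCOMMIT_FILE)) {
            System.out.println("Cannot read " + OVERCOMMIT_FILE + " (probably not a Linux system)");
            return;
        }
        String content = new String(Files.readAllBytes(OVERCOMMIT_FILE), StandardCharsets.US_ASCII);
        if (isAlwaysOvercommit(content)) {
            System.out.println("vm.overcommit_memory=1 is set");
        } else {
            System.out.println("vm.overcommit_memory is " + content.trim() + ", expected 1 for the slab allocator");
        }
    }
}
```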