Vectors in Atoti Server

Introduction

A vector in Atoti is equivalent to a fixed-size array that has been strongly typed for performance. It comes with read and write capabilities, built-in statistical operations (topKIndices, variance, ...) and cross-vector operations (plus, minus, scale, ...).

Vectors are used in Atoti as values of a field in the datastore. Such a field is declared when building a store with the StoreDescriptionBuilder, using withVectorField(String name, String type).

Storing Design Overview

While this section mentions Off-Heap memory to explain the design choices behind the vector architecture, the design is the same for On-Heap vectors.

In most Java applications, there is a typical distribution for the lifetime of created objects, where the vast majority of objects die young. Most Garbage Collectors are built around this empirical observation.

However, Atoti Server, and especially Atoti's datastore, acts as a database, and thus does not follow this rule: all the data is long-lived. The Garbage Collector must keep track of these objects, and even transfer them when necessary. Off-Heap Memory provides a way to hide these objects from the Garbage Collector, allowing it to focus on objects created and deleted within the application's life cycle.

The Java NIO API gives access to a DirectByteBuffer to read and write from the Off-Heap memory.

However, these buffers do not provide the performance needed for Atoti:

Instances of DirectByteBuffer are much bigger than standard Java arrays.
Java poorly handles millions of such buffers.
The creation of each buffer induces a call to the system's malloc, a single-threaded memory allocator that adds another memory overhead for its own tracking system. This goes against Atoti's efforts for multithreading, notably through partitioning.

To reduce this performance overhead, Atoti allocates several vectors within a single buffer, using an abstraction called blocks.

This also means that as vectors get deleted within a block, the block itself holds less and less relevant data. To keep Atoti's memory usage efficient, blocks come with a compaction mechanism.
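The block idea can be illustrated with the JDK alone. The following is a minimal sketch, not Atoti's implementation: BlockSketch and its allocate method are hypothetical names, and the sketch only shows how one off-heap allocation can be carved into several vector-sized views.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

/** Hypothetical sketch: one direct buffer shared by several fixed-size double vectors. */
final class BlockSketch {
  private final DoubleBuffer storage;
  private int nextFree = 0;

  BlockSketch(int capacityInDoubles) {
    // A single off-heap allocation backs every vector of the block.
    this.storage =
        ByteBuffer.allocateDirect(capacityInDoubles * Double.BYTES)
            .order(ByteOrder.nativeOrder())
            .asDoubleBuffer();
  }

  /** Returns a view over the next free region, or null when the vector does not fit. */
  DoubleBuffer allocate(int vectorSize) {
    if (nextFree + vectorSize > storage.capacity()) {
      return null;
    }
    DoubleBuffer view = storage.duplicate();
    view.position(nextFree);
    view.limit(nextFree + vectorSize);
    nextFree += vectorSize;
    // The slice shares the block's memory; no new off-heap allocation happens here.
    return view.slice();
  }
}
```

Each returned view is a small object that points into the shared block, so the cost of the native allocation is paid once per block rather than once per vector.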

Allocation

In the datastore, it is possible to define the size of the vector and the vector's block for each field. Fields with different vector block sizes will use different vector allocators. However, vectors with the same vector block size will rely on the same allocator, and thus belong to the same blocks, even if they have different vector sizes.

For instance, if field A has vectors of size 10 and field B has vectors of size 5, while both have a block size of 50, a block may, at a given time, contain { 10, 5, 10, 10, 5, 5 }. That block already holds 45 of its 50 slots, so if the allocator needs to allocate a vector for field A (size 10), the vector does not fit and a new block is created.
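The arithmetic of this example can be replayed with a toy model. This is a sketch under stated assumptions: SharedAllocatorSketch is a hypothetical class, sizes are counted in elements, and the model only ever appends to the most recent block, which may differ from what the real allocator does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one allocator shared by all fields with the same block size. */
final class SharedAllocatorSketch {
  private final int blockSize;
  private final List<Integer> usedPerBlock = new ArrayList<>();

  SharedAllocatorSketch(int blockSize) {
    this.blockSize = blockSize;
  }

  /** Places a vector in the current block, or opens a new one; returns the block index. */
  int allocate(int vectorSize) {
    if (vectorSize > blockSize) {
      throw new IllegalArgumentException("vector larger than a block");
    }
    int last = usedPerBlock.size() - 1;
    if (last < 0 || usedPerBlock.get(last) + vectorSize > blockSize) {
      usedPerBlock.add(0); // open a new, empty block
      last++;
    }
    usedPerBlock.set(last, usedPerBlock.get(last) + vectorSize);
    return last;
  }

  int blockCount() {
    return usedPerBlock.size();
  }
}
```

Allocating vectors of sizes 10, 5, 10, 10, 5, 5 with a block size of 50 fills the first block up to 45, and the next vector of size 10 lands in a second block.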

Off-heap storage can be disabled. In this case, Atoti will rely on either arrays or heap-buffers to manage the storage.

The class VectorUtils offers entry points to create a range of transient vectors that can be used for optimization purposes. Atoti does not by default optimize vectors to rely on these entry points. For instance, if a vector contains a single value, repeated n times, VectorUtils offers a way to create an ISameValueVector, but there is no internal mechanism that will read each vector and replace it, if possible, with an ISameValueVector.

Compaction

Atoti Server relies on a copying garbage collection algorithm. Whenever a version is discarded, Atoti Server iterates through all vectors in the datastore and the aggregate store, and checks whether they belong to a block that is mostly garbage. If they do, the vectors are transferred to another block.
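The copying step can be sketched as follows. This is an illustrative model only: CompactionSketch, Block, and the threshold parameter are hypothetical, a block is reduced to its capacity and the sizes of its live vectors, and any space not occupied by a live vector is counted as garbage.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of copying compaction over vector blocks. */
final class CompactionSketch {

  /** A block reduced to its capacity and the sizes of the vectors still alive in it. */
  static final class Block {
    final int capacity;
    final List<Integer> liveVectors = new ArrayList<>();

    Block(int capacity, int... live) {
      this.capacity = capacity;
      for (int size : live) {
        liveVectors.add(size);
      }
    }

    int liveSize() {
      int sum = 0;
      for (int size : liveVectors) {
        sum += size;
      }
      return sum;
    }

    double garbageRatio() {
      return 1.0 - (double) liveSize() / capacity;
    }
  }

  /** Copies live vectors out of blocks above the threshold; returns the surviving blocks. */
  static List<Block> compact(List<Block> blocks, double threshold, int capacity) {
    List<Block> kept = new ArrayList<>();
    List<Integer> toMove = new ArrayList<>();
    for (Block block : blocks) {
      if (block.garbageRatio() > threshold) {
        toMove.addAll(block.liveVectors); // the old block is dropped as a whole
      } else {
        kept.add(block);
      }
    }
    Block target = null;
    for (int size : toMove) {
      if (target == null || target.liveSize() + size > capacity) {
        target = new Block(capacity);
        kept.add(target);
      }
      target.liveVectors.add(size);
    }
    return kept;
  }
}
```

With three blocks of capacity 50 holding 15, 45 and 20 live elements and a threshold of 0.5, the first and third blocks are emptied into one new block holding 35 elements, while the second is left alone.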

Configuration

Each vector field can configure its own size and block size. The default block size is determined by the ActiveViam property activeviam.vectors.defaultBlockSize. By default, this value is automatically set to 16K, 32K, or 64K, depending on the available memory size of the JVM.

The ActiveViam property activeviam.vectors.garbageCollectionFactor (value between 0 and 1) controls how soon a block is considered in need of compaction. It represents a trade-off between transaction performance and memory footprint.

Selecting the size of the vector blocks is considered an advanced feature. The size is set automatically, depending on the amount of memory available to the JVM, and the size of the vectors.

Selecting a vector block size for a field is an operation that requires great care. The chosen size should:

minimize external memory loss (have the block size as close as possible to a multiple of the page cache size)
minimize internal memory loss (not wasting memory in the last block to store the last vectors of the chunk)
keep a high enough number of blocks to ensure good compaction
keep their number low enough to minimize the number of allocations (those system calls are very expensive, especially in a multithreaded environment).
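The tension between internal memory loss and the number of blocks can be made concrete with a little arithmetic. The sketch below is an assumption-laden model, not Atoti's sizing logic: BlockSizingSketch is a hypothetical class, all vectors have the same size, sizes are counted in elements, and a vector never straddles two blocks.

```java
/** Hypothetical arithmetic for comparing candidate block sizes. */
final class BlockSizingSketch {

  /** Number of blocks needed to hold vectorCount vectors of vectorSize elements each. */
  static long blocksNeeded(long vectorCount, int vectorSize, int blockSize) {
    long vectorsPerBlock = blockSize / vectorSize;
    if (vectorsPerBlock == 0) {
      throw new IllegalArgumentException("block smaller than one vector");
    }
    return (vectorCount + vectorsPerBlock - 1) / vectorsPerBlock; // ceiling division
  }

  /** Elements that are allocated in blocks but never hold vector data. */
  static long wastedElements(long vectorCount, int vectorSize, int blockSize) {
    return blocksNeeded(vectorCount, vectorSize, blockSize) * blockSize
        - vectorCount * vectorSize;
  }
}
```

For 1000 vectors of 100 elements, a block size of 16384 needs only 7 blocks but leaves 14688 elements unused, whereas a block size of 1000 wastes nothing but requires 100 blocks, hence 100 allocations.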

Swapping

This storage design makes it possible to add efficient vector swapping capabilities to Atoti seamlessly.

The Operating System uses a cache where it stores the most frequently accessed portions of files in memory: the page cache. When requesting some data within a file, the OS first checks the cache: if the data is found, a copy is given to the user. Otherwise, the file is loaded into the cache, and a copy of the data is given to the user.

Copying takes CPU time, hurts CPU caches, and wastes RAM with duplicated data. To minimize copies, Atoti Server relies on MappedByteBuffer (also from the Java NIO API). This class relies internally on the mmap system call to grant access to the page cache.

When swapping vectors, blocks are stored in MappedByteBuffers, and Atoti delegates the responsibility of writing the underlying file to the OS.
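For readers unfamiliar with memory-mapped files, here is a minimal JDK-only sketch of writing and reading doubles through a MappedByteBuffer. It is not Atoti code: MappedBlockSketch, the temporary file name, and the layout of one double every 8 bytes are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: a block of doubles backed by a memory-mapped file. */
final class MappedBlockSketch {

  /** Writes the values through a mapping of a temporary file and reads one of them back. */
  static double writeAndReadBack(double[] values, int index) {
    try {
      Path file = Files.createTempFile("vector-block", ".swap");
      file.toFile().deleteOnExit();
      try (FileChannel channel =
          FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        // Mapping in READ_WRITE mode grows the file to the requested size if needed.
        MappedByteBuffer block =
            channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) values.length * Double.BYTES);
        for (int i = 0; i < values.length; i++) {
          block.putDouble(i * Double.BYTES, values[i]);
        }
        // No explicit write call: the OS decides when dirty pages reach the disk.
        return block.getDouble(index * Double.BYTES);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The code never calls a write method on the channel; reads and writes go straight to the mapped pages, which is the property the swapping design relies on.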

By default, files are written into the default OS temporary directory. Because of its limited size, it is best to provide a swap directory when defining a swapped vector field.

Swapping Advanced tuning considerations

As the amount of swapped data increases, the default settings might not provide sufficient performance. For advanced tuning, it is recommended to read up on how the page cache works on Linux, and in particular on vm.swappiness, vm.dirty_background_bytes and vm.dirty_bytes.

Atoti recommends setting these three properties to 0, a low value, and a high value, respectively.

The ActiveViam Property activeviam.vectors.swap.directory.numberFiles, set by default to 10k, controls the maximum number of swap files created in one directory.

On Linux, the default number of available mappings, set through vm.max_map_count, is 65530 (just under 2^16). This limit is likely to be hit on big projects, which results in an OutOfMemoryError("Map failed"). This can be avoided by raising this kernel property or by increasing the block size.
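A rough estimate shows why larger blocks help. The sketch below assumes one memory mapping per block and block sizes expressed in bytes, neither of which is confirmed by this page; MappingBudgetSketch is a hypothetical helper, and 65530 is the usual Linux default for vm.max_map_count, just under 2^16.

```java
/** Hypothetical estimate of how many mappings a swapped data set needs. */
final class MappingBudgetSketch {

  /** Usual Linux default for vm.max_map_count. */
  static final long DEFAULT_MAX_MAP_COUNT = 65_530L;

  /** Mappings needed, assuming one mapping per block. */
  static long mappingsNeeded(long swappedBytes, long blockBytes) {
    if (blockBytes <= 0) {
      throw new IllegalArgumentException("block size must be positive");
    }
    return (swappedBytes + blockBytes - 1) / blockBytes; // ceiling division
  }

  static boolean fitsDefaultLimit(long swappedBytes, long blockBytes) {
    return mappingsNeeded(swappedBytes, blockBytes) <= DEFAULT_MAX_MAP_COUNT;
  }
}
```

Under these assumptions, 1 TiB of swapped vectors in 512 KiB blocks would need about two million mappings, far above the default, while 1 GiB in the same blocks needs 2048. The JVM uses mappings of its own, so the real budget is lower than the kernel limit.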

The JVM might core dump if disk space is full, or if the swap directory is full, instead of throwing an OOM Error.

Transparent Huge Pages, which act as an adapter for simple use of Huge Pages, MUST be disabled, as the default allocator (SLAB) in Atoti natively supports Huge Pages.

Cleaning swapped vectors

Atoti does not collect old unused blocks from the disk, meaning that the disk usage can grow indefinitely. Thanks to the way memory-mapped files work, this is easily fixable: these files can be deleted without impacting Atoti, as the OS only permanently removes these files when the last reference to the file is deleted. This means that one can call rm on a swapped file, and it will be effectively deleted once the last vector in the corresponding block is marked as garbage.
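The behavior described here can be observed with the JDK on a POSIX system. The following is a sketch, not an Atoti utility: DeleteWhileMappedSketch is a hypothetical class, and the example is expected to fail on Windows, where a mapped file cannot be deleted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: a mapping stays usable after its file has been removed (POSIX). */
final class DeleteWhileMappedSketch {

  /** Returns the value read through the mapping after the backing file was deleted. */
  static long readAfterDelete(long value) {
    try {
      Path file = Files.createTempFile("vector-block", ".swap");
      MappedByteBuffer block;
      try (FileChannel channel =
          FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        block = channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
      }
      // The mapping does not depend on the channel that created it.
      block.putLong(0, value);
      Files.delete(file); // equivalent of calling rm on the swap file
      if (Files.exists(file)) {
        throw new IllegalStateException("file should no longer be visible");
      }
      // The directory entry is gone, but the OS keeps the pages until the mapping is released.
      return block.getLong(0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The disk space is reclaimed only once the mapping itself is released, which in the context of this page happens when the corresponding block is no longer referenced.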

Working with vector fields

With Copper

Working with vectors within Copper is a seamless experience. Standard operators between vectors, or between scalars and vectors, behave naturally. One can also access the IVector-specific API using the map function and casting the argument, like so:

Copper.sum("VectorField")
    .mapToDouble(reader -> reader.readVector(0).quantileDouble(0.95))
    .withFormatter("DOUBLE[#,###.##]")
    .withName("95th Percentile");

With Post Processors

Within Atoti Server, custom aggregation functions can be written by extending the dedicated AVectorAggregationFunction. Atoti Server does not provide specialized Post-Processors to handle vectors, but creating one's own should prove easy. For instance, one can calculate the 5% expected shortfall (the value that is lost on average with a 5% probability) using:

@Override
public void evaluate(
    final ILocation location,
    final IRecordReader underlyingMeasures,
    final IWritableCell resultCell) {
  final IVector v = underlyingMeasures.readVector(0);
  if (v != null) {
    final double result = v.topK((5 * v.size()) / 100).getUnderlyingArray().average();
    resultCell.writeDouble(result);
  } else {
    resultCell.writeNull();
  }
}

Warning: this Post Processor does not copy the vector before using it. One must pay attention to the operation's impact on the vectors given as arguments. In the following example, it is necessary to copy the vector before applying the operation, as the plus operation modifies the vector in place.

/** This PostProcessor takes the sum of the vectorField.SUM and vectorField.AVG. */
@Override
public void evaluate(
    final ILocation location,
    final IRecordReader underlyingMeasures,
    final IWritableCell resultCell) {
  final IVector sumVector = underlyingMeasures.readVector(0);
  final IVector avgVector = underlyingMeasures.readVector(1);
  if (sumVector == null || avgVector == null) {
    resultCell.writeNull();
    return;
  }
  final IVector result = sumVector.cloneOnHeap();
  // plus overwrites the vector it is called on (and only reads the argument vector)
  result.plus(avgVector);
  resultCell.write(result);
}

In the expected shortfall example above, the result is calculated as the average of the biggest values of the vector, where the 'biggest' values are those that fall within the top 5%.
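The same calculation can be written against a plain double array, which makes the arithmetic easy to check outside Atoti. This is a sketch with a hypothetical class name, ExpectedShortfallSketch, and it sorts a copy of the input rather than using the IVector API.

```java
import java.util.Arrays;

/** Hypothetical sketch of the "average of the top p%" calculation on a plain array. */
final class ExpectedShortfallSketch {

  /** Average of the largest (percent * n / 100) values; the input array is left untouched. */
  static double topPercentAverage(double[] values, int percent) {
    int k = (percent * values.length) / 100;
    if (k == 0) {
      throw new IllegalArgumentException("not enough values for the requested percentage");
    }
    double[] sorted = values.clone(); // copy first, so the caller's data is not reordered
    Arrays.sort(sorted);
    double sum = 0;
    for (int i = sorted.length - k; i < sorted.length; i++) {
      sum += sorted[i];
    }
    return sum / k;
  }
}
```

For the values 1 to 100 and a percentage of 5, the five largest values are 96 to 100, whose average is 98.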

When implementing a post-processor that returns vectors as evaluation results, one should always use IVector as output type class parameter.
