Data overlap

Atoti supports data overlap: several data cubes can contain the same data.

For more information about the responsibilities of query nodes and data nodes, see the distribution overview article.

Danger: Atoti server does not ensure that the facts associated with a member of the distributing level are exactly the same across all replicas.

Explanation: Query distribution with data overlap

When N data nodes contain the same member of the distributing level, the query node ensures the proper execution of each distributed query by retrieving the associated data from only one of these N nodes.

This choice is based on a priority assigned to each data node. This priority is represented by an integer, where lower values indicate higher priority. If data is duplicated across nodes with the same priority, the node from which data is retrieved is chosen randomly at query time.

The priority of a data node can be defined with IDataClusterDefinition#DATA_NODE_PRIORITY. For more information, see the priority configuration section.

If no priority is defined for any data node, duplicated data is retrieved preferentially from the node with the fewest members in the distributing levels.
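The selection rule described above can be illustrated with a minimal sketch. This is not Atoti code: the class `ReplicaSelectionSketch`, its methods, and the node names are hypothetical, and the sketch only models the documented behavior (lowest integer wins, random choice among nodes with equal priority).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative model only; none of these names belong to the Atoti API. */
public class ReplicaSelectionSketch {

    /**
     * Returns the names of the nodes sharing the smallest priority value,
     * i.e. the highest priority. The map must not be empty.
     */
    public static List<String> highestPriorityNodes(Map<String, Integer> priorityByNode) {
        int best = Collections.min(priorityByNode.values());
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : priorityByNode.entrySet()) {
            if (entry.getValue() == best) {
                candidates.add(entry.getKey());
            }
        }
        // Sort so the candidate list does not depend on map iteration order.
        Collections.sort(candidates);
        return candidates;
    }

    /** Picks one node among the highest-priority replicas, randomly when several tie. */
    public static String selectNode(Map<String, Integer> priorityByNode, Random random) {
        List<String> candidates = highestPriorityNodes(priorityByNode);
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        // Three hypothetical data nodes holding the same distributing-level member.
        Map<String, Integer> priorityByNode = Map.of("nodeA", 8, "nodeB", 2, "nodeC", 2);
        // nodeB and nodeC tie on priority 2, so either may be selected.
        System.out.println(selectNode(priorityByNode, new Random()));
    }
}
```

In this model, `nodeA` is never selected while `nodeB` or `nodeC` holds the member, because 8 is a lower priority than 2.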

How to

Enable data overlap

Enable this feature at startup, when building the query cube:

StartBuilding.cube("MyQueryCube")
    .asQueryCube()
    .withClusterDefinition()
    .withClusterId("MyCluster")
    .withMessengerDefinition()
    .withLocalMessenger()
    .withNoProperty()
    .end()
    .withApplication("MyApplication")
    .withDistributingLevels(distributingLevels)
    .withProperty(
        IQueryClusterDefinition.HORIZONTAL_DATA_DUPLICATION_PROPERTY, Boolean.toString(true))
    .end()
    .build();

The following requirements must be met:

All applications must have at least one distributing level.
To obtain consistent query results, the data loaded for a duplicated distributing level member must be the same in every data node that duplicates that member.

Configure data node priority

Configure the data node priority at startup, when building the data cube:

StartBuilding.cube("MyDataCube")
    .withDimensions(dimensionsAdder)
    .asDataCube()
    .withClusterDefinition()
    .withClusterId("MyCluster")
    .withMessengerDefinition()
    .withLocalMessenger()
    .withNoProperty()
    .end()
    .withApplicationId("MyApplication")
    .withAllHierarchies()
    .withAllMeasures()
    .withProperty(IDataClusterDefinition.DATA_NODE_PRIORITY, String.valueOf(8))
    .end()
    .build();

When the priority is not defined for a data node, the following rules apply:

If the property is set for some data nodes but not for others, nodes without the property will have the lowest possible priority: Integer.MAX_VALUE.
If the property is not set for any data node, the system falls back on the default behavior, prioritizing data retrieval from the node with the fewest members in the distributing levels.
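As a rough illustration of these two rules, the sketch below resolves effective priorities from a partially configured cluster. It is a model of the documented behavior, not Atoti code: `EffectivePrioritySketch`, its methods, and the use of `OptionalInt` to represent an unset property are all assumptions made for the example. How ties are broken in the fewest-members fallback is not stated above, so the sketch simply keeps the first node encountered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Illustrative model only; none of these names belong to the Atoti API. */
public class EffectivePrioritySketch {

    /**
     * Resolves the priority of each node. An empty OptionalInt means the
     * property is not set. Returns an empty map when no node sets the
     * property, signalling that the fewest-members fallback applies.
     */
    public static Map<String, Integer> effectivePriorities(Map<String, OptionalInt> configured) {
        boolean anyDefined = configured.values().stream().anyMatch(OptionalInt::isPresent);
        if (!anyDefined) {
            return Map.of();
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        // Nodes without the property get the lowest possible priority.
        configured.forEach((node, priority) -> result.put(node, priority.orElse(Integer.MAX_VALUE)));
        return result;
    }

    /** Fallback rule: the node with the fewest distributing-level members (first one wins on ties). */
    public static String nodeWithFewestMembers(Map<String, Integer> memberCountByNode) {
        String best = null;
        for (Map.Entry<String, Integer> entry : memberCountByNode.entrySet()) {
            if (best == null || entry.getValue() < memberCountByNode.get(best)) {
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, OptionalInt> configured = new LinkedHashMap<>();
        configured.put("nodeA", OptionalInt.of(8));
        configured.put("nodeB", OptionalInt.empty()); // property not set
        System.out.println(effectivePriorities(configured));
    }
}
```

With the hypothetical configuration in `main`, `nodeB` resolves to `Integer.MAX_VALUE`, so `nodeA` is preferred for any member both nodes hold.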
