Data Overlap
In a horizontal distribution setup, it is sometimes useful to replicate all or part of the data across multiple data nodes.
This can be achieved using the data overlap feature.
In a distributed setup, each member of the distributing level represents a partition of the data.
The query node automatically splits queries across data nodes without duplicating computations on replicated partitions.
When to use data overlap?
Data overlap is mainly used in three scenarios:
- Failover support: Each data node can be fully replicated so that if one node goes offline, another can take over.
- DirectQuery hybrid setup: A DirectQuery data node can hold all the data, while a Datastore data node contains only the most recent partitions. The Datastore node acts as a query accelerator for those partitions; if it goes offline, the DirectQuery node can serve the queries. In this setup, data overlap simplifies the data rollover process.
- Load balancing: Frequently accessed partitions can be replicated across multiple data nodes to distribute query load. When multiple nodes have the same priority, queries are randomly distributed across them, providing automatic load balancing for replicated partitions.
Constraints and limitations
- The application must have at least one distributing level.
- Replicated partitions must be identical across all data nodes; improper synchronization leads to inconsistent results. Atoti does not enforce correct data replication: it is the user's responsibility to keep replicated partitions synchronized at all times.
- Data overlap is defined at the query node level. All applications in the distributed setup must comply with these rules.
Query dispatching with data overlap
The query node maintains a map of which data node contains which partitions of the distributing level. It knows when a partition is replicated. When executing a query:
- Non-replicated partitions are processed as usual.
- For replicated partitions, the query node selects a single data node based on priority. Priority is represented by an integer; lower values indicate higher priority. You can define priority using IDataClusterDefinition#DATA_NODE_PRIORITY. For details, see the Configure data node priority section below.
If no priority is defined by the user:

- The dispatching algorithm selects the node with the fewest distributing level members.

If at least one priority is defined:

- Nodes without a priority default to Integer.MAX_VALUE (lowest priority).
- If several nodes share the same priority, the dispatching algorithm randomly selects one at query time.
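The selection rule described above can be sketched as follows. This is an illustrative helper, not Atoti's actual dispatching code; the class and method names are made up, but the logic mirrors the documented behavior: lowest integer wins, missing priorities default to Integer.MAX_VALUE, and ties are broken randomly.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

public final class ReplicaSelection {

    /**
     * Picks the data node that should serve a replicated partition.
     * Nodes absent from the priority map default to Integer.MAX_VALUE
     * (lowest priority); ties are broken randomly at query time.
     */
    static String selectNode(List<String> replicas, Map<String, Integer> priorities, Random random) {
        // Find the best (lowest) priority value among the candidate nodes.
        final int best = replicas.stream()
                .mapToInt(node -> priorities.getOrDefault(node, Integer.MAX_VALUE))
                .min()
                .orElseThrow();
        // Keep only the nodes that reach that priority, then pick one at random.
        final List<String> candidates = replicas.stream()
                .filter(node -> priorities.getOrDefault(node, Integer.MAX_VALUE) == best)
                .collect(Collectors.toList());
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```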
Example
This example illustrates how query dispatching works in a distributed setup with overlapping data across two nodes.
Setup
Distributing level: Country
Data node priority: Node A has higher priority than Node B
Data distribution:
| Country | Node A | Node B |
|---|---|---|
| France | ✅ | ❌ |
| Germany | ✅ | ✅ |
| Italy | ✅ | ✅ |
| Spain | ❌ | ✅ |
Overlapping partitions: Germany, Italy (present in both nodes)
Query examples
Query with point coordinate Country = "Germany"
- Germany exists in both Node A and Node B.
- Node A has higher priority → Query sent to Node A only.
Result:
| Country | Retrieved from |
|---|---|
| Germany | Node A |
Query with wildcard coordinate Country = "*"
- Query sent to both Node A and Node B.
- For the overlapping partitions (Germany, Italy): retrieved from Node A (higher priority).
- For the unique partitions (France, Spain): retrieved from their respective nodes.
Result:
| Country | Retrieved from |
|---|---|
| France | Node A |
| Germany | Node A |
| Italy | Node A |
| Spain | Node B |
Query with list coordinate Country IN ["Italy", "Spain"]
- Italy exists in both nodes → retrieved from Node A (higher priority)
- Spain exists only in Node B → retrieved from Node B
Result:
| Country | Retrieved from |
|---|---|
| Italy | Node A |
| Spain | Node B |
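The routing shown in the three query examples can be reproduced with a small sketch. The class and helper names are hypothetical and the priority values (1 for Node A, 2 for Node B) are illustrative; only the partition table and the "lower value wins" rule come from the example above.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class DispatchExample {

    // Which nodes hold each Country partition (from the example table).
    static final Map<String, List<String>> PARTITIONS = Map.of(
            "France", List.of("Node A"),
            "Germany", List.of("Node A", "Node B"),
            "Italy", List.of("Node A", "Node B"),
            "Spain", List.of("Node B"));

    // Lower value = higher priority; Node A outranks Node B.
    static final Map<String, Integer> PRIORITIES = Map.of("Node A", 1, "Node B", 2);

    /** Returns, for each requested country, the node the query is routed to. */
    static Map<String, String> route(List<String> countries) {
        final Map<String, String> result = new LinkedHashMap<>();
        for (final String country : countries) {
            // Among the nodes holding this partition, pick the highest-priority one.
            result.put(country, PARTITIONS.get(country).stream()
                    .min(Comparator.comparingInt(PRIORITIES::get))
                    .orElseThrow());
        }
        return result;
    }
}
```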
How to
Enable data overlap
Activate the feature at startup:
```java
StartBuilding.cube("MyQueryCube")
    .asQueryCube()
    .withClusterDefinition()
    .withClusterId("MyCluster")
    .withMessengerDefinition()
    .withLocalMessenger()
    .withNoProperty()
    .end()
    .withApplication("MyApplication")
    .withDistributingLevels(distributingLevels)
    .withProperty(
        IQueryClusterDefinition.HORIZONTAL_DATA_DUPLICATION_PROPERTY, Boolean.toString(true))
    .end()
    .build();
```
This activation applies to every application in the distributed setup, and all of them must respect the constraints defined above.
Configure data node priority
Set the priority at startup:
```java
StartBuilding.cube("MyDataCube")
    .withDimensions(dimensionsAdder)
    .asDataCube()
    .withClusterDefinition()
    .withClusterId("MyCluster")
    .withMessengerDefinition()
    .withLocalMessenger()
    .withNoProperty()
    .end()
    .withApplicationId("MyApplication")
    .withAllHierarchies()
    .withAllMeasures()
    .withProperty(IDataClusterDefinition.DATA_NODE_PRIORITY, String.valueOf(8))
    .end()
    .build();
```
If priority is not defined:

- Nodes without the property default to Integer.MAX_VALUE.
- If no node has a priority, the dispatching algorithm favors nodes with fewer distributing level members.
Monitoring
You can retrieve the data partition mapping and node priorities from the query node.
Using the Java API
Use the IDistributedActivePivotVersion API:
```java
final IDistributedActivePivotVersion version = distributedCube.getHead();
final IDistributionInformation distributionInfo = version.getDistributionInformation();

// Get data node priorities
final Map<String, Integer> priorities = distributionInfo.getPriorityPerDataNode();

// Get member mapping across data nodes
final MemberMapping memberMapping = distributionInfo.getMemberMapping();
```
Using JMX
Access the same information through JMX using the `getDistributingLevelMemberMapping` operation in the MBean `com.activeviam:node0=ActivePivotManager,node1=<SchemaName>,node2=<CubeName>`.