Recommendations
This is a compilation of various recommendations to consider when deploying an ActivePivot cluster.
Keep in mind that there's no out-of-the-box universal configuration. An ActivePivot application depends on the dataset, how it is partitioned, how frequently it is updated, what calculations are run, how many concurrent users start queries, but it is also dependent on the underlying infrastructures, network, cloud provider, use of Kubernetes, and so on.
Before being deployed in production, a distributed application must be stress-tested with real-life conditions.
Avoid High-Cardinality Hierarchies
An ActivePivot cluster is composed of several data and query nodes. For the sake of simplicity, and because most use cases are built like this, this section describes a single query node.
The user sends queries to the query node. The query node then determines which data cube calculates the intermediate results, based on its knowledge of the different data cube topologies. It then propagates sub-queries to the concerned nodes. Finally, it assembles the partial results before returning the result to the user.
In order to support interactive multidimensional analysis, the query node must have instant access to each data cube's hierarchies and their corresponding members. To make that possible, when the data nodes join the cluster they send the content of all their hierarchies to the query node.
For more information on this process, see the Communication flow topic.
For large clusters with tens of nodes and cubes with high-cardinality hierarchies, this initial phase can become resource-intensive, as it consumes CPU to serialize and deserialize the data, network bandwidth, and transient memory on the query node that temporarily holds many copies of the hierarchies.
When pushed too far, it can create cluster instability, so we recommend avoiding high-cardinality hierarchies (100K+ members) in ActivePivot clusters with more than ten nodes.
A solution is to entirely conceal hierarchies from the distributed application. In that case, the data cube adapts the discovery message sent to the query node, so that it won't be aware of their existence. The members of these hierarchies are not sent on the network. This strongly improves the stability of the JGroups cluster by reducing the network bandwidth required, and the transient memory on the query node, while it is merging the contributions of many data cubes, within the limit set in the Pending Initial Discoveries property.
When the query cube forwards queries to this data node, the data node fills the coordinates of the concealed hierarchies with the paths of their default members to create complete locations. Then it removes them just before returning the result.
We recommend concealing high cardinality hierarchies when possible. We also recommend concealing any hierarchy that was created to add a field and compute a value within a post-processor, but that end users do not need to see.
You can conceal measures that are only executed on data node. This reduces the list of measures in the UI, and prevents those measures from being evaluated in the query node.
The following code block shows how these concepts are defined in ActivePivot:
.asDataCube()
.withClusterDefinition()
.withClusterId("SupplyChainCubeCluster")
.withMessengerDefinition()
.withProtocolPath("jgroups-protocols/protocol-tcp.xml")
.end()
.withApplicationId("supply-chain")
.withConcealedHierarchies()
.withConcealedHierarchy("Truck", "TruckId")
.endHierarchyListing()
.withAllMeasures()
.end()
.build();
Avoid Large Garbage Collection Pauses
In Java applications, memory allocation is managed by the JVM and its 'garbage collection' algorithms. Garbage collection regularly has to pause the application in order to reclaim unused memory.
Short pauses usually go unnoticed, but pauses that last more than a few seconds can cause service interruptions, as the application stops responding. During a pause, the whole JVM becomes irresponsive.
If a stand-alone ActivePivot instance stops responding for a few seconds during memory reclamation, there are no consequences, apart from an exceptionally long delay for the user.
However, within a distributed cluster, a long GC pause is much more serious: when a JVM is irresponsive, other nodes start suspecting that it is gone, and after a delay they update the state of the cluster.
See Removal Delay Optimization on the
removalDelay
property for more details.
The query cube calculates the contributions the irresponsive cube made to the application's global topology, and removes them. If the JVM then starts working again, the cluster must be updated once more. At scale this creates cluster-wide instability, so garbage collection must be monitored and long pauses prevented.
To operate a stable cluster, long GC pauses (> 5-10 sec.) must be avoided at all cost. It might require tuning the JVMs of the data nodes.
The query node doesn't use off-heap memory. It only stores the content of the hierarchies (on heap), and the transient data needed to handle queries. For that reason and because long GC pauses must be avoided, do not over-allocate memory to the JVM on which it is running. Java garbage collectors are not good at handling hundreds of gigabytes of memory without long GC pauses, even with G1GC. Therefore, do not set the -Xmx parameter above 100GB for the query node.
Removal Delay Optimization
Even with good tuning and monitoring, a data node may infrequently drop out of the cluster due to a network issue, because it was saturated with calculations, or because it entered a garbage collection pause and stopped answering heartbeats for too long. When the data node is back to normal, it has to join the cluster again, and must resend its hierarchies to the query node. If several data nodes go through this at the same time, it can create a peak of activity and make the cluster unstable.
ActivePivot has a mechanism to alleviate the problem. When a data node drops out of the cluster,
the query node maintains its hierarchies for a period of time, called removalDelay
.
After this period, the query cube removes the failing cube's contributions. As this operation can be costly, we want to avoid triggering it as much as possible.
If the data node comes back later and the query node sees it has not changed based on the latest epoch of the data node, there is no need to resend the hierarchies.
When the removal delay is set to zero, the mechanism is deactivated, which is not recommended for a large cluster.
You should set the removalDelay
property higher than the longest GC pause that has been
witnessed across the cluster.
Take Charge of JGroups Configuration
Do not use the JGroups sample configuration provided in the sandbox in production without trying to deploy and run queries on your cluster in real-life situation. If you notice instabilities, you may want to change the various JGroups protocol parameters.
Additionally, there are special JGroups protocol stack components that can only be used in particular environments, such as AWS, Azure, GCP, or Kubernetes.
Prevent Excessive MergeView
Sometimes, the cluster may split into subgroups that will be merged back together afterwards. In such cases, a MergeView
will be received by the application. The MergeView
is a subclass of View
that holds the a list of views that were merged.
This behavior can be caused by network partitioning, network issues, or by large workloads causing nodes to become unresponsive
(e.g huge data loading in the data nodes).
ActivePivot handles MergeView
by applying a FullReset
, that is, all members of the new view are considered as new members.
This results in applying the discovery process again. Thus, it is important to make sure that this behavior does not occur too often, otherwise
the application will suffer from it and nodes may become unresponsive.
When observing a lot MergeView
or FullReset
, which can be spotted in the logs as follows:
INFO c.q.messenger.impl.AJGroupsMessenger - [linuxhk01_Cube_QUERY_63651] The group membership has changed, view=MergeView::[linuxhk01_Cube_QUERY_63651|164] (115) ...
INFO c.q.messenger.impl.AJGroupsMessenger - [linuxhk01_Cube_QUERY_63651] Full reset is needed because the previous view was ...
we recommend tweaking JGroups protocols for FD_SOCK
(failure detection), VERIFY_SUSPECT
(verification of a member responsiveness) and pbcast.GMS
(membership protocol)
to increases the timeouts and the number of retries as follows:
<FD_SOCK
get_cache_timeout="10000"
cache_max_elements="300"
cache_max_age="60000"
suspect_msg_interval="10000"
num_tries="10"
sock_conn_timeout="10000"/>
<FD timeout="10000" max_tries="10" />
<VERIFY_SUSPECT timeout="10000" num_msgs="5"/>
<pbcast.GMS
print_local_addr="true"
join_timeout="10000"
leave_timeout="10000"
merge_timeout="10000"
num_prev_mbrs="200"
view_ack_collection_timeout="10000"/>
In the case where large amounts of data needs to be loaded, splitting the loading process into smaller data portions may help solve the issue.
Do Mandatory Plugin Injections
When running a distributed ActivePivot, two additional components must be used:
ADistributedMessenger
and DistributedSecurityManager
. They handle the communication and the
authentication processes.
Both components need to interact with other components in order to function correctly:
the ContextValueManager
and the UserDetailsService
.
As these components are implemented as ExtendedPlugins
, they must be injected into the
ADistributedMessenger
and the DistributedSecurityManager
in order to materialize the required
component dependencies.
See this series of articles on Plugins for more details.
Here is an example with Spring Java Config:
for (final Object key : Registry.getExtendedPlugin(IDistributedMessenger.class).keys()) {
inject(IDistributedMessenger.class, String.valueOf(key), apConfig.contextValueManager());
}
// Inject the distributed security manager with security services
for (final Object key : Registry.getExtendedPlugin(IDistributedSecurityManager.class).keys()) {
inject(
IDistributedSecurityManager.class,
String.valueOf(key),
securityConfig.qfsUserDetailsService());
}
Delay Messenger Startup in Data Nodes
As explained in the Communication flow topic, the most important messages in an ActivePivot cluster are the discovery messages sent from the data nodes to the query nodes. These messages are expected to have a significant size, take time to be transmitted, and consume a large portion of memory at serialization time.
For these reasons, ActiveViam optimized the communication between a query node and its associated data nodes to regulate the flow of discovery messages.
To leverage this optimization, we highly recommend starting each data node's messenger after the initial load. To do this, deactivate the messenger's autostart feature, and start it manually after the initial load by calling:
- in AP 5.5:
((MultiVersionDataActivePivot) multiVersionActivePivot).getMessenger().start()
- in AP 5.6 and after:
((MultiVersionDataActivePivot) multiVersionActivePivot).startDistribution()
Below is a code snippet from a 5.8.x ActivePivot project where we deactivate the autostart feature:
public static IActivePivotInstanceDescription createCubeDescription() {
return configureCubeBuilder(StartBuilding.cube(CUBE_NAME))
// make this cube available in a cluster as a data node (data cube), when it starts it will join the cluster
.asDataCube()
// setup the cluster
.withClusterDefinition()
.withClusterId(TrainingCubeDistConfig.CLUSTER_ID)
.withMessengerDefinition()
.withKey(NettyMessenger.PLUGIN_KEY)
.withProperty(IMessengerDefinition.AUTO_START, Boolean.FALSE.toString())
.end()
.withProtocolPath(TrainingUtils.retrieveProtocolPath())
.end()
// setup the application id
.withApplicationId(TrainingCubeDistConfig.APPLICATION_ID)
.end()
.build();
}
After the data loading has completed, the messenger can be turned back on.
private void startMessengerAfterLoading() {
try {
for (IMultiVersionActivePivot multiVersionActivePivot : activePivotConfig.activePivotManager().getActivePivots().values()) {
// start the data cube distributed messenger
if (multiVersionActivePivot instanceof MultiVersionDataActivePivot) {
((MultiVersionDataActivePivot) multiVersionActivePivot).startDistribution();
}
}
messengerStarted = true;
LOGGER.log(Level.INFO, " **** messenger started ***");
} catch(Exception e) {
LOGGER.log(Level.SEVERE, " **** failed to start messenger ***");
}
}