Distributed Post-Processors
A post-processor is a function that computes a measure at query time based on other measures in the cube. Thus, retrieving a post-processed measure involves retrieving the measures upon which it is based, potentially cascading through a dependency chain until Atoti finds primitive measures at the given location.
Prefetchers are responsible for computing the underlying dependencies, and are declared within each post-processor.
A major feature of a distributed Atoti cluster is the fact that post-processors can have distributed prefetchers. A post-processor can ask for any underlying measure, be it pre-aggregated or post-processed, to be evaluated on a given set of locations spreading across all the remote cubes.
This allows the user to pre-compute some aggregates on the local instances before sending them to the query cube to finish the calculation. This way, the computation can benefit from each local instance's aggregates cache. Each data cube stores the results of the most popular queries, so that the pre-computed results are already available the next time similar queries hit the node.
For efficient query planning, post-processors declare a list of prefetchers. This often allows Atoti to retrieve all underlying measures in a single pass before evaluating the post-processors. In a distributed environment, this is of paramount importance, as a single pass also means a single round trip over the network. Do not ignore prefetcher warnings while working with a distributed cluster, as bad prefetching or a lack of prefetching can lead to incorrect results.
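To illustrate why a single pass matters, here is a minimal, self-contained sketch. It is not the Atoti API: `RemoteCube`, `fetch`, and `fetchAll` are hypothetical names standing in for a remote data cube, where each call costs one network round trip. Fetching the two underlying measures of the stock calculation one by one costs two round trips; a batched prefetch costs one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote data cube: each call to it is one network round trip.
class RemoteCube {
    private final Map<String, Double> measures =
            Map.of("incoming_flux", 120.0, "leaving_flux", 45.0);
    int roundTrips = 0;

    // One measure per call: one round trip each.
    double fetch(String measure) {
        roundTrips++;
        return measures.get(measure);
    }

    // Batched prefetch: all declared underlying measures in a single round trip.
    Map<String, Double> fetchAll(List<String> names) {
        roundTrips++;
        Map<String, Double> out = new HashMap<>();
        for (String n : names) out.put(n, measures.get(n));
        return out;
    }
}

public class PrefetchDemo {
    public static void main(String[] args) {
        // Without prefetching: one round trip per underlying measure.
        RemoteCube naive = new RemoteCube();
        double stock = naive.fetch("incoming_flux") - naive.fetch("leaving_flux");
        System.out.println("no prefetch: " + naive.roundTrips + " round trips, stock=" + stock);

        // With a declared prefetcher: both measures in a single round trip.
        RemoteCube prefetched = new RemoteCube();
        Map<String, Double> m = prefetched.fetchAll(List.of("incoming_flux", "leaving_flux"));
        double stock2 = m.get("incoming_flux") - m.get("leaving_flux");
        System.out.println("with prefetch: " + prefetched.roundTrips + " round trip, stock=" + stock2);
    }
}
```

With a deep dependency chain, this difference compounds: every undeclared dependency can add another network round trip per evaluation.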
Post-processors can be defined on both query cubes and data cubes. In what follows, we explain the different post-processor settings when using distribution.
Post-Processor Definition in the Query Cube
Declaring post-processors directly in the query cube allows performing advanced calculations on measures coming from data cubes of different applications that do not share the same topology.
In the following example, we consider one query cube linked to two data cubes from two different applications of a supply chain use case.
One data cube contains all the inbound flux, the other data cube contains all the outbound flux. To calculate the current stock for a product in a warehouse, the query cube needs to get all the inbound flux entries for that specific product/warehouse location, and needs to subtract all the outbound flux entries.
StartBuilding.cube("QueryCube")
    .withUpdateTimestamp()
    .withinFolder("Native_measures")
    .withAlias("Distributed_Timestamp")
    .withFormatter("DATE[HH:mm:ss]")
    .withPostProcessor("CURRENT_STOCK")
        // CURRENT_STOCK = incoming - leaving
        .withPluginKey("DiffPostProcessor")
        .withUnderlyingMeasures("incoming_flux", "leaving_flux")
    .asQueryCube()
    .withClusterDefinition()
        .withClusterId("SupplyChainCubeCluster")
        .withMessengerDefinition()
            .withProtocolPath("jgroups-protocols/protocol-tcp.xml")
            .end()
        .withApplication("app-incoming-flux")
            .withoutDistributingLevels()
        .withApplication("app-leaving-flux")
            .withoutDistributingLevels()
        .end()
    .build();
It is also possible to define measures in the query cube using the Copper API.
Post-Processor Definition in the Data Cube
In order to leverage the remote data cube instances, Atoti Server seeks to distribute the computations of the post-processors defined in the data cube. The partial results are then reduced within the query cube, using a given aggregation function (default is SUM). However, distribution is only possible if both of the following conditions are met:
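The distribute-then-reduce scheme can be sketched outside of Atoti as a plain map/reduce: each data cube computes a partial result on its own partition of the data, and the query cube reduces the partials with an aggregation function (SUM here). The names below (`partialResult`, `DistributeReduceDemo`) are illustrative, not Atoti API:

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

public class DistributeReduceDemo {
    // Hypothetical "data cube" step: evaluate the post-processor on the local partition only.
    static double partialResult(List<Double> localRows, ToDoubleFunction<List<Double>> postProcessor) {
        return postProcessor.applyAsDouble(localRows);
    }

    public static void main(String[] args) {
        // Two partitions of the same fact data, e.g. one per remote data cube.
        List<Double> cubeA = List.of(10.0, 20.0);
        List<Double> cubeB = List.of(5.0, 15.0);

        // A SUM-reducible post-processor: the sum of partial sums equals the global sum.
        ToDoubleFunction<List<Double>> sum =
                rows -> rows.stream().mapToDouble(Double::doubleValue).sum();

        double partialA = partialResult(cubeA, sum); // computed on data cube A
        double partialB = partialResult(cubeB, sum); // computed on data cube B

        // The query cube reduces the partial results, using SUM by default.
        double reduced = partialA + partialB;
        System.out.println("reduced = " + reduced); // 50.0
    }
}
```

This decomposition is only valid when the measure is compatible with the reduction function, which is exactly what the two requirements below guarantee.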
Requirement 1: Preserving the distributing level in the partitioning
The post-processor and all its descendants must not remove the distributing level from the collection of partitioning levels in IPostProcessor#setPartitioningLevels().
Requirement 2: Providing a reduction logic
The post-processor, or one of its ancestors that also satisfies Requirement 1, must define a valid reduction logic. This is the case if one of the following conditions holds:
- The post-processor implements IPartitionedPostProcessor and adds the distributing level to the collection of partitioning levels in IPostProcessor#setPartitioningLevels().
- The post-processor implements IDistributedPostProcessor and IDistributedPostProcessor#canBeDistributed() returns true for the queried location.
- The distributing level is expressed in the queried location as a point coordinate. This condition only depends on the location and applies uniformly to all post-processors.
To distribute the computations effectively, it is therefore recommended to include the distributing level in the partitioning of the post-processors whenever possible. The top-level post-processors should also define a reduction logic using one of the valid methods listed above.