Distributed Post-Processors
A post-processor is a function that computes a measure at query time based on other measures in the cube. Thus, retrieving a post-processed measure involves retrieving the measures upon which it is based, potentially cascading through a dependency chain until Atoti finds primitive measures at the given location.
Prefetchers are responsible for computing the underlying dependencies, and are declared within each post-processor.
A major feature of a distributed Atoti cluster is that post-processors can have distributed prefetchers. A post-processor can ask for any underlying measure, be it pre-aggregated or post-processed, to be evaluated on a given set of locations spread across all the remote cubes.
This allows the user to pre-compute some aggregates on the local instances before sending them to the query cube to finish the calculation. This way, the computation can benefit from each local instance's aggregates cache. Each data cube stores the results of the most popular queries, so that the pre-computed results are already available the next time similar queries hit the node.
For efficient query planning, post-processors declare a list of prefetchers. This often allows Atoti to retrieve all underlying measures in a single pass before evaluating the post-processors. In a distributed environment, this is of paramount importance, as a single pass also means a single round trip on the network. Do not ignore the prefetchers' warnings while working with a distributed cluster, as bad prefetching or a lack of prefetching can lead to incorrect results.
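To make this concrete, here is a minimal, purely illustrative sketch in plain Java. It does not use the actual Atoti prefetcher API; the class and record names are invented for illustration. It only shows the pattern described above: a post-processed measure declares its underlying dependencies up front so they can all be fetched in one pass before the formula is evaluated.
// Purely illustrative sketch: not the Atoti prefetcher API.
// It only demonstrates "declare dependencies up front, fetch them in one pass, then evaluate".
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PrefetchSketch {

    /** A post-processed measure: its declared dependencies plus a formula over them. */
    record PostProcessedMeasure(List<String> underlyingMeasures,
                                Function<Map<String, Double>, Double> formula) {}

    /** Retrieve every declared dependency in a single pass, then apply the formula. */
    static double evaluate(PostProcessedMeasure measure, Map<String, Double> measureStore) {
        final Map<String, Double> prefetched = new HashMap<>();
        for (String underlying : measure.underlyingMeasures()) {
            prefetched.put(underlying, measureStore.getOrDefault(underlying, 0d));
        }
        return measure.formula().apply(prefetched);
    }

    public static void main(String[] args) {
        // CURRENT_STOCK = incoming_flux - leaving_flux, evaluated after one prefetch pass.
        final PostProcessedMeasure currentStock = new PostProcessedMeasure(
                List.of("incoming_flux", "leaving_flux"),
                values -> values.get("incoming_flux") - values.get("leaving_flux"));
        System.out.println(evaluate(currentStock, Map.of("incoming_flux", 120d, "leaving_flux", 45d)));
    }
}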
Post-processors can be defined on both query cubes and data cubes. In what follows, we explain the different post-processor settings when using distribution.
Post-Processor Definition in the Query Cube
Declaring post-processors directly in the query cube makes it possible to perform advanced calculations on measures coming from data cubes of different applications that do not share the same topology.
In the following example, we consider one query cube linked to two data cubes from two different applications of a supply chain use case.
One data cube contains all the inbound flux, while the other contains all the outbound flux. To calculate the current stock of a product in a warehouse, the query cube needs to retrieve all the inbound flux entries for that specific product/warehouse location and subtract all the outbound flux entries.
StartBuilding.cube("QueryCube")
    .withUpdateTimestamp()
    .withinFolder("Native_measures")
    .withAlias("Distributed_Timestamp")
    .withFormatter("DATE[HH:mm:ss]")
    .withPostProcessor("CURRENT_STOCK")
    // CURRENT_STOCK = incoming - leaving
    .withPluginKey("DiffPostProcessor")
    .withUnderlyingMeasures("incoming_flux", "leaving_flux")
    .asQueryCube()
    .withClusterDefinition()
    .withClusterId("SupplyChainCubeCluster")
    .withMessengerDefinition()
    .withProtocolPath("jgroups-protocols/protocol-tcp.xml")
    .end()
    .withApplication("app-incoming-flux")
    .withoutDistributingFields()
    .withApplication("app-leaving-flux")
    .withoutDistributingFields()
    .end()
    .build();
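For completeness, the sketch below suggests what the DiffPostProcessor plugin referenced above could look like. It is a rough sketch only: the base class, constructor, annotation, and evaluate signature are assumptions modeled on typical ActivePivot post-processor implementations, and should be checked against your Atoti Server version.
// Hypothetical implementation sketch: the annotation, base class, and method
// signatures are assumptions, not guaranteed to match the exact Atoti API.
// Imports from the Atoti Server post-processing API are omitted.
@QuartetExtendedPluginValue(intf = IPostProcessor.class, key = DiffPostProcessor.PLUGIN_KEY)
public class DiffPostProcessor extends ABasicPostProcessor<Double> {

    public static final String PLUGIN_KEY = "DiffPostProcessor";

    public DiffPostProcessor(String name, IPostProcessorCreationContext creationContext) {
        super(name, creationContext);
    }

    @Override
    public String getType() {
        return PLUGIN_KEY;
    }

    @Override
    public Double evaluate(ILocation location, Object[] underlyingMeasures) {
        // Underlying measures arrive in the order declared with withUnderlyingMeasures(...):
        // [0] = incoming_flux, [1] = leaving_flux.
        final double incoming = underlyingMeasures[0] == null ? 0d : ((Number) underlyingMeasures[0]).doubleValue();
        final double leaving = underlyingMeasures[1] == null ? 0d : ((Number) underlyingMeasures[1]).doubleValue();
        return incoming - leaving;
    }
}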
Post-Processor Definition in the Data Cube
In order to leverage the remote data cube instances, Atoti Server relies on IDistributedPostProcessor implementations, such as DistributedPostProcessor, whose computations are distributed on each data cube. The partial results are then reduced within the query cube, using a given aggregation function (SUM by default). There are three approaches to making use of such post-processors, explained below:
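As a purely conceptual illustration of that reduction step (plain Java, deliberately outside of any Atoti API), each data cube contributes partial values per location and the query cube folds them together with the reduction function, here SUM:
// Conceptual illustration of the reduce step, not the Atoti API:
// each data cube returns partial values per location, the query cube sums them.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DistributedReduceSketch {

    /** Reduce partial results coming from several data cubes with SUM, the default aggregation function. */
    static Map<String, Double> reduceWithSum(List<Map<String, Double>> partialResultsPerDataCube) {
        final Map<String, Double> reduced = new HashMap<>();
        for (Map<String, Double> partial : partialResultsPerDataCube) {
            partial.forEach((location, value) -> reduced.merge(location, value, Double::sum));
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Two data cubes each computed a partial CURRENT_STOCK for the same locations.
        final Map<String, Double> reduced = reduceWithSum(List.of(
                Map.of("warehouse=Paris", 40d, "warehouse=London", 10d),
                Map.of("warehouse=Paris", 25d, "warehouse=London", -5d)));
        System.out.println(reduced); // {warehouse=Paris=65.0, warehouse=London=5.0} (order may vary)
    }
}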
Implicit Distribution of a Partitioned Post-Processor
A post-processor is considered to be implicitly distributed if it implements IPartitionedPostProcessor and exposes all the distributing fields of the application as a partitioning level. In such cases, the post-processor and all its underlying measures are executed locally on each involved cube, and the intermediate results are aggregated into the final result using the IPartitionedPostProcessor#reduce method.
IDistributedPostProcessors and Chaining of Distributed Operations
A distributed post-processor in a chain of post-processors can be processed locally on each remote cube if the entire chain of its recursive dependencies, down to its leaf computations, is made of distributed operations (note that standard aggregations can always be distributed). The results of the top-most distributed operation are then sent to the query cube via the distribution layer, where they are reduced using the distributed post-processor's reduction function.
Due to these recursive requirements, using IDistributedPostProcessor or IPartitionedPostProcessor implementations is important when leveraging a horizontally scaled distribution.
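As an illustration, a data cube could declare such a chain with the same fluent builder style used above. This is a sketch only: the data-cube-side builder call asDataCube() and the plugin keys MyDistributedAdjustment and MyDistributedScore are assumptions made for the example, and the cube's dimensions, selection, and cluster wiring are omitted.
// Sketch only: plugin keys stand for hypothetical IDistributedPostProcessor implementations,
// and the data-cube-side builder calls should be checked against your Atoti Server version.
StartBuilding.cube("IncomingFluxCube")
    // First distributed post-processor, built directly on the pre-aggregated
    // measure incoming_flux (standard aggregations can always be distributed).
    .withPostProcessor("incoming_flux_adjusted")
    .withPluginKey("MyDistributedAdjustment")
    .withUnderlyingMeasures("incoming_flux")
    // Second distributed post-processor chained on top of the first one: since
    // every dependency below it is a distributed operation, it is evaluated
    // locally on each data cube and only its result is reduced in the query cube.
    .withPostProcessor("incoming_flux_score")
    .withPluginKey("MyDistributedScore")
    .withUnderlyingMeasures("incoming_flux_adjusted")
    .asDataCube() // dimensions, selection and cluster wiring omitted in this sketch
    .build();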