
# Distributed Post-Processors

A post-processor is a function that computes a measure at query time based on other measures in
the cube. Thus, retrieving a post-processed measure involves retrieving the measures upon which it
is based, potentially cascading through a dependency chain until Atoti finds primitive
measures at the given location.
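
The cascade described above can be pictured as a depth-first walk over a dependency graph. The following is a minimal sketch in plain Java, not the Atoti API: the class name, the `underlying` map, and the measure names are all hypothetical, and any measure missing from the map is assumed to be primitive.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the Atoti API: each post-processed measure lists
// the measures it depends on; measures absent from the map are primitive.
final class DependencyChainSketch {

    // Collects the primitive measures needed to evaluate the requested measure.
    static Set<String> primitivesFor(String measure, Map<String, List<String>> underlying) {
        Set<String> primitives = new LinkedHashSet<>();
        collect(measure, underlying, primitives, new LinkedHashSet<>());
        return primitives;
    }

    private static void collect(String measure, Map<String, List<String>> underlying,
            Set<String> primitives, Set<String> visiting) {
        List<String> dependencies = underlying.get(measure);
        if (dependencies == null) {
            primitives.add(measure); // no declared dependencies: primitive measure
            return;
        }
        if (!visiting.add(measure)) {
            throw new IllegalStateException("Cyclic dependency on " + measure);
        }
        for (String dependency : dependencies) {
            collect(dependency, underlying, primitives, visiting);
        }
        visiting.remove(measure); // leave the current path so diamonds are allowed
    }

    public static void main(String[] args) {
        Map<String, List<String>> underlying = Map.of(
                "MARGIN_RATIO", List.of("MARGIN", "revenue.SUM"),
                "MARGIN", List.of("revenue.SUM", "cost.SUM"));
        // prints [revenue.SUM, cost.SUM]
        System.out.println(primitivesFor("MARGIN_RATIO", underlying));
    }
}
```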

`Prefetchers` are responsible for computing underlying dependencies, and are declared within
each post-processor.

A major feature of a distributed Atoti cluster is that post-processors can have
distributed prefetchers. A post-processor can ask for any underlying measure, whether
pre-aggregated or post-processed, to be evaluated on a given set of locations spread across all
the remote cubes.

This allows the user to pre-compute some aggregates on the local instances before sending them to
the query cube to finish the calculation. This way, the computation can benefit from each local
instance's aggregates cache. Each data cube stores the results of the most popular queries,
so that the pre-computed results are already available the next time similar queries hit the node.
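
To illustrate the idea, here is a simplified sketch in plain Java of data nodes that cache their partial aggregates while a query node finishes the calculation. `DataNode`, `partialSum`, and `query` are hypothetical names, and the unbounded map is only a stand-in for the real aggregates cache, whose retention policy is not modeled here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocalCacheSketch {

    // Hypothetical stand-in for a data cube: it holds local facts and caches
    // the aggregates it has already computed, keyed by queried location.
    static final class DataNode {
        private final Map<String, List<Double>> factsByLocation;
        private final Map<String, Double> aggregatesCache = new HashMap<>();
        int computations = 0; // number of cache misses, for illustration only

        DataNode(Map<String, List<Double>> factsByLocation) {
            this.factsByLocation = factsByLocation;
        }

        double partialSum(String location) {
            return aggregatesCache.computeIfAbsent(location, key -> {
                computations++;
                return factsByLocation.getOrDefault(key, List.of()).stream()
                        .mapToDouble(Double::doubleValue)
                        .sum();
            });
        }
    }

    // Hypothetical stand-in for the query cube: it only combines the
    // pre-computed partial results sent back by each data node.
    static double query(List<DataNode> nodes, String location) {
        double total = 0.0;
        for (DataNode node : nodes) {
            total += node.partialSum(location);
        }
        return total;
    }

    public static void main(String[] args) {
        DataNode a = new DataNode(Map.of("EUR", List.of(1.0, 2.0)));
        DataNode b = new DataNode(Map.of("EUR", List.of(3.0)));
        System.out.println(query(List.of(a, b), "EUR")); // prints 6.0
        System.out.println(query(List.of(a, b), "EUR")); // prints 6.0, served from the caches
        System.out.println(a.computations);              // prints 1
    }
}
```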

> For efficient query planning, post-processors declare a list of prefetchers. This often allows
> Atoti to retrieve all underlying measures in a single pass before evaluating the
> post-processors. In a distributed environment, this is of paramount importance, as a single pass also
> means a single network round trip. **Do not ignore prefetcher warnings when
> working with a distributed cluster: incorrect or missing prefetching can lead to incorrect
> results.**
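
The benefit of declaring prefetchers up front can be sketched as follows. This is plain Java with hypothetical names (`RemoteSource`, `prefetchAll`), not the Atoti prefetcher API: it only shows that merging every declared dependency into one request costs a single round trip.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrefetchSketch {

    // Hypothetical remote data source; every call to fetch() is one round trip.
    static final class RemoteSource {
        private final Map<String, Double> values;
        int roundTrips = 0;

        RemoteSource(Map<String, Double> values) {
            this.values = values;
        }

        Map<String, Double> fetch(Set<String> measures) {
            roundTrips++;
            Map<String, Double> result = new HashMap<>();
            for (String measure : measures) {
                result.put(measure, values.get(measure));
            }
            return result;
        }
    }

    // Merges the prefetch lists declared by all post-processors of a query
    // and retrieves them with a single request.
    static Map<String, Double> prefetchAll(List<List<String>> declaredPrefetches, RemoteSource source) {
        Set<String> needed = new LinkedHashSet<>();
        declaredPrefetches.forEach(needed::addAll);
        return source.fetch(needed);
    }

    public static void main(String[] args) {
        RemoteSource source = new RemoteSource(Map.of("a", 1.0, "b", 2.0, "c", 3.0));
        Map<String, Double> prefetched = prefetchAll(List.of(List.of("a", "b"), List.of("b", "c")), source);
        System.out.println(prefetched.size());  // prints 3
        System.out.println(source.roundTrips);  // prints 1
    }
}
```

Without the declared lists, each post-processor would have to request its dependencies as it discovers them, which in this model means one `fetch` call per post-processor or more.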

Post-processors can be defined on both query cubes and data cubes. In what follows, we explain the different post-processor
settings when using distribution.

## Post-Processor Definition in the Query Cube

Declaring post-processors directly in the query cube allows performing advanced calculations on
measures coming from data cubes of different applications that **do not share the same topology**.

In the following example, one query cube is linked to two data cubes that belong to two different
applications in a supply chain use case.

One data cube contains all the inbound flux and the other contains all the outbound flux.
To calculate the current stock for a product in a warehouse, the query cube retrieves all
the inbound flux entries for that specific product/warehouse location and subtracts all
the outbound flux entries.

```java
StartBuilding.cube("QueryCube")
    .withUpdateTimestamp()
        .withinFolder("Native_measures")
        .withAlias("Distributed_Timestamp")
        .withFormatter("DATE[HH:mm:ss]")
    .withPostProcessor("CURRENT_STOCK")
        // CURRENT_STOCK = incoming - leaving
        .withPluginKey("DiffPostProcessor")
        .withUnderlyingMeasures("incoming_flux", "leaving_flux")
    .asQueryCube()
    .withClusterDefinition()
        .withClusterId("SupplyChainCubeCluster")
        .withMessengerDefinition()
            .withProtocolPath("jgroups-protocols/protocol-tcp.xml")
            .end()
        .withApplication("app-incoming-flux")
            .withoutDistributingLevels()
        .withApplication("app-leaving-flux")
            .withoutDistributingLevels()
        .end()
    .build();
```
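
As a worked illustration of the arithmetic behind `CURRENT_STOCK`, the sketch below subtracts outbound quantities from inbound quantities per location. It is plain Java, not the implementation of `DiffPostProcessor`: the class, the location keys, and the quantities are hypothetical, and treating a missing inbound entry as zero is an assumption made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

final class CurrentStockSketch {

    // CURRENT_STOCK = incoming - leaving, evaluated for each location.
    // Assumption: a location missing from one side contributes zero.
    static Map<String, Double> currentStock(Map<String, Double> incoming, Map<String, Double> leaving) {
        Map<String, Double> stock = new TreeMap<>(incoming);
        leaving.forEach((location, quantity) -> stock.merge(location, -quantity, Double::sum));
        return stock;
    }

    public static void main(String[] args) {
        // Hypothetical aggregated values returned by the two data cubes
        Map<String, Double> incoming = Map.of("widget@Paris", 120.0, "widget@Lyon", 40.0);
        Map<String, Double> leaving = Map.of("widget@Paris", 45.0);
        // prints {widget@Lyon=40.0, widget@Paris=75.0}
        System.out.println(currentStock(incoming, leaving));
    }
}
```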

It is also possible to define measures in the query cube using the [Copper API](../copper/copper_measures).

## Post-Processor Definition in the Data Cube

In order to leverage the remote data cube instances, Atoti Server seeks to distribute the computations of
the post-processors defined in the data cube. The partial results are then reduced within the query cube,
using a given aggregation function (default is SUM). However, distribution is only possible if both of the
following conditions are met:

### Requirement 1: Preserving the distributing level in the partitioning

The post-processor and all its descendants **must not remove** the distributing level from the
collection of partitioning levels in `IPostProcessor#setPartitioningLevels()`.

### Requirement 2: Providing a reduction logic

The post-processor, or one of its ancestors **that also satisfies Requirement 1**, must define a valid reduction
logic. This is the case if one of the following conditions holds:

* The post-processor implements `IPartitionedPostProcessor` and **adds** the distributing level to the
  collection of partitioning levels in `IPostProcessor#setPartitioningLevels()`.
* The post-processor implements `IDistributedPostProcessor` and `IDistributedPostProcessor#canBeDistributed()` returns
  `true` for the queried location.
* The distributing level is expressed in the queried location as a point coordinate. This condition depends only on
  the location and applies uniformly to all post-processors.
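
The two requirements can be summarized as a boolean check. The sketch below is a deliberately simplified model in plain Java: the method and parameter names are hypothetical, and it evaluates a single post-processor without walking its ancestors or descendants, which the actual rules also take into account.

```java
final class DistributionCheckSketch {

    // Requirement 2: at least one of the three listed conditions must hold.
    static boolean hasReductionLogic(boolean addsDistributingLevelToPartitioning,
            boolean canBeDistributedAtLocation,
            boolean distributingLevelIsPointCoordinate) {
        return addsDistributingLevelToPartitioning
                || canBeDistributedAtLocation
                || distributingLevelIsPointCoordinate;
    }

    // Distribution is possible only if Requirement 1 and Requirement 2 both hold.
    static boolean canDistribute(boolean preservesDistributingLevel,
            boolean addsDistributingLevelToPartitioning,
            boolean canBeDistributedAtLocation,
            boolean distributingLevelIsPointCoordinate) {
        return preservesDistributingLevel
                && hasReductionLogic(addsDistributingLevelToPartitioning,
                        canBeDistributedAtLocation,
                        distributingLevelIsPointCoordinate);
    }

    public static void main(String[] args) {
        // Level preserved, and the location pins the distributing level to a point
        System.out.println(canDistribute(true, false, false, true));  // prints true
        // Level removed from the partitioning: Requirement 1 fails
        System.out.println(canDistribute(false, true, true, true));   // prints false
        // Level preserved but no reduction logic: Requirement 2 fails
        System.out.println(canDistribute(true, false, false, false)); // prints false
    }
}
```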

> To achieve a good distribution of the computations, it is therefore recommended to include the distributing
> level in the partitioning of the post-processors whenever possible. The top-level post-processors should
> also define a reduction logic using one of the valid methods listed above.
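
The reduction step mentioned at the start of this section, where the query cube combines the partial results with an aggregation function, can be sketched as below. This is plain Java with hypothetical names; only the `SUM` default comes from the text above, and `MAX` is included purely as an example of an alternative function.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.DoubleBinaryOperator;

final class ReductionSketch {

    static final DoubleBinaryOperator SUM = Double::sum;
    static final DoubleBinaryOperator MAX = Math::max; // illustrative alternative

    // Reduces the partial results returned by the data cubes with the given function.
    // Returns an empty OptionalDouble when no data cube contributed a result.
    static OptionalDouble reduce(List<Double> partialResults, DoubleBinaryOperator aggregation) {
        return partialResults.stream()
                .mapToDouble(Double::doubleValue)
                .reduce(aggregation);
    }

    // Convenience overload mirroring the SUM default.
    static OptionalDouble reduce(List<Double> partialResults) {
        return reduce(partialResults, SUM);
    }

    public static void main(String[] args) {
        List<Double> partials = List.of(10.0, 5.5, 4.5); // one value per data cube
        System.out.println(reduce(partials).getAsDouble());      // prints 20.0
        System.out.println(reduce(partials, MAX).getAsDouble()); // prints 10.0
    }
}
```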
