# Aggregate Provider

> Aggregate Providers are optional pre-aggregated data structures configured per cube using a fluent builder or direct definition, with JIT, Leaf, and Bitmap provider types that trade memory footprint for query speed at a chosen level granularity

## Introduction

Atoti allows technical users to define complex business calculations as measures, and to expose these
measures for analysis.
When processing a query against these measures, Atoti recursively expands the chain of measure
definitions until it reaches aggregated measures.

Aggregate Providers are **optional** data structures that store pre-calculated values for these
aggregated measures.

### How does it work

Because data aggregation often represents a significant portion of the query execution time,
Aggregate Providers greatly reduce query processing time. This optimization comes with two
drawbacks:

* Increased memory usage
* Longer transaction computation time

A new [experimental feature](#multiple-aggregate-providers-experimental) also allows combining multiple
Aggregate Providers to answer queries more efficiently.

## Example Database Structure

This example will be used throughout the article. The main database table contains trading data with
these fields:

* `TradeId` (Primary Key)
* `Date`
* `Trader`
* `Value`

The database includes additional reference tables that define the cube's axes of data analysis:

* **Date Hierarchy**: Breaks down the `Date` field into `Year/Month/Day` structure
* **Trader Hierarchy**: Provides organizational structure and links traders to their Desk and
  Business Unit, in a `BusinessUnit/Desk/Trader` structure.
* No hierarchy on `TradeId`.

## Key attributes

An Aggregate Provider is a tabular data structure that consists of:

1. A name
2. A collection of levels, defining the rows of the table
3. A collection of measures, defining the columns of the table
4. A filter *(optional)*
5. A type
6. A partitioning *(optional)*

## Defining Aggregate Providers

Aggregate providers can be configured using two different approaches: a fluent builder interface or
direct definition.

### Using the Fluent Builders

The fluent builder provides a streamlined, chainable API for configuring aggregate providers. This
approach offers a more readable and maintainable way to define configurations, with built-in
validation and a natural progression of settings.

```java theme={"languages":{"custom":["/engine/python-sdk/0.9/languages/pycon.tmLanguage.json"]}}
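// Cube description with one aggregated measure, one dimension and two providers.
IActivePivotInstanceDescription pivotDescription =
    StartBuilding.cube("MyPivot")
        .withAggregatedMeasure()
        .sum("value")
        .withSingleLevelDimension("id")
        // Global provider: Just-in-Time.
        .withAggregateProvider()
        .jit()
        // Partial provider named "MyProvider", of type Bitmap.
        .withPartialProvider()
        .withName("MyProvider")
        .bitmap()
        // Level selection of the partial provider.
        .includingOnlyLevels(LevelIdentifier.simple("Date"), LevelIdentifier.simple("Trader"))
        // Filter: only pre-aggregate the data of the current date.
        .filteredOn(Map.of(LevelIdentifier.simple("Date"), LocalDate.now()))
        .build();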
```

### Direct definition

Direct definition offers more control for complex scenarios and involves three steps:

1. Definition of the partial aggregate providers:
   ```java theme={"languages":{"custom":["/engine/python-sdk/0.9/languages/pycon.tmLanguage.json"]}}
   final PartialProviderDefinition partialProvider =
       new PartialProviderDefinition(
           "MyProvider",
           IAggregateProviderDefinition.BITMAP_PLUGIN_TYPE,
           List.of(LevelIdentifier.simple("id")),
           List.of("value.SUM"),
           PartialProviderFilters.noFilter(),
           new Properties(),
           null);
   ```
2. Creation of the global provider that encompasses the partial providers:
   ```java theme={"languages":{"custom":["/engine/python-sdk/0.9/languages/pycon.tmLanguage.json"]}}
   final AggregateProviderDefinition globalProviderDefinition = new AggregateProviderDefinition();
   globalProviderDefinition.setPartialProviders(List.of(partialProvider));
   ```
3. Building the complete aggregate provider definition and forwarding it to the cube definition:
   ```java theme={"languages":{"custom":["/engine/python-sdk/0.9/languages/pycon.tmLanguage.json"]}}
   IActivePivotInstanceDescription pivotDescription =
       StartBuilding.cube("MyPivot")
           .withAggregatedMeasure()
           .sum("value")
           .withSingleLevelDimension("id")
           .withAggregateProvider(globalProviderDefinition)
           .build();
   ```

### Automatic definition

Atoti provides an [out-of-the-box AI functionality](./ai_optimizer) that computes, for a given
application and its associated past workload, the best configuration of partial aggregate providers.

## Configuration details

### Name Selection

The Aggregate Provider's name must uniquely identify it among all the providers of an Atoti Cube.
It is optional as of version `6.1`, but will be mandatory in future versions.

A name must be provided to
[add and remove Aggregate Providers on the fly](./dynamic_aggregate_provider).

In a query, if the retrieval of some aggregated data is performed against an Aggregate Provider, the
query plan will mention the name(s) of the Aggregate Provider(s) it used.

### Level Selection

An Aggregate Provider determines the granularity of its pre-aggregated data through its collection
of defining levels.

* Exactly one level is selected for every hierarchy in the Atoti Cube
* For each hierarchy, the data is pre-aggregated up to the selected level

> The Aggregate Provider automatically selects the default level for any hierarchy not explicitly
> specified in its definition:
>
> * For non-slicing hierarchies: The default is the `ALL` level, which represents the entire
>   hierarchy
> * For slicing hierarchies: The default is the first level, which may contain multiple members
>
> Note: When working with slicing hierarchies, be mindful that the first level could contain any
> number of members, which impacts the memory footprint of the Aggregate Provider.
>
> [Analysis Hierarchy](../analysis_hierarchies) Restriction: Analysis hierarchies
> cannot be used in aggregate providers, except for the top levels of
> [introspecting analysis hierarchies](../analysis_hierarchies#introspecting-analysis-hierarchies)
> (the levels that correspond to selection fields).

For a given hierarchy, an Aggregate Provider:

* Cannot serve queries requiring a finer granularity than the selected level
* Can serve queries at the selected level's granularity
* Can serve queries at any coarser granularity above the selected level

For example, an Aggregate Provider on the levels `Year` and `BusinessUnit` will contain values for
cube locations like:

| Date             | Trader                      | Measures... |
| ---------------- | --------------------------- | ----------- |
| `AllMember/2025` | `AllMember/Business Unit 1` | ...         |
| `AllMember/2024` | `AllMember/Business Unit 1` | ...         |
| `AllMember/2025` | `AllMember/Business Unit 2` | ...         |
| `AllMember/2024` | `AllMember/Business Unit 2` | ...         |

It will be able to serve queries on locations like:

```json theme={"languages":{"custom":["/engine/python-sdk/0.9/languages/pycon.tmLanguage.json"]}}
{
  "Date": "AllMember/*",
  "Trader": "AllMember/Business Unit 1"
}
```

but it will not be able to serve queries on finer locations like:

```json theme={"languages":{"custom":["/engine/python-sdk/0.9/languages/pycon.tmLanguage.json"]}}
{
  "Date": "AllMember/2025/*",
  "Trader": "AllMember/Business Unit 1"
}
```

Finally, the properties of aggregation functions allow the Aggregate Provider to serve queries on
locations like:

```json theme={"languages":{"custom":["/engine/python-sdk/0.9/languages/pycon.tmLanguage.json"]}}
{
  "Date": "AllMember",
  "Trader": "AllMember/Business Unit 1"
}
```

by further aggregating the data held in the Aggregate Provider.
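
The roll-up described above can be illustrated with a minimal sketch in plain Java. It does not use the Atoti API: `RollUpSketch`, `Row` and `rollUpDate` are hypothetical names, the values are made up, and the sketch assumes a measure such as `Value.SUM` whose stored values can be summed again to reach a coarser granularity.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not the Atoti API: a provider stored at (Year, BusinessUnit) granularity.
final class RollUpSketch {

  // One pre-aggregated row: a (Year, BusinessUnit) location and its Value.SUM.
  record Row(int year, String businessUnit, double valueSum) {}

  // Answers a query at Date=AllMember by summing the stored rows over all years.
  static Map<String, Double> rollUpDate(List<Row> rows) {
    Map<String, Double> sumByBusinessUnit = new TreeMap<>();
    for (Row row : rows) {
      sumByBusinessUnit.merge(row.businessUnit(), row.valueSum(), Double::sum);
    }
    return sumByBusinessUnit;
  }

  public static void main(String[] args) {
    List<Row> provider = List.of(
        new Row(2025, "Business Unit 1", 10.0),
        new Row(2024, "Business Unit 1", 5.0),
        new Row(2025, "Business Unit 2", 7.0),
        new Row(2024, "Business Unit 2", 1.0));
    System.out.println(rollUpDate(provider));
  }
}
```

The opposite direction is not possible: nothing in these rows allows the `Month` or `Day` breakdown to be recovered, which is why finer queries cannot be served.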

<Note>
  Level selection is a trade-off between query processing performance and storage cost.
</Note>

### Measure Selection

While the level selection defines the rows of the Aggregate Provider's tabular structure, the
measure selection defines its columns.

An Aggregate Provider can only store values for aggregated measures, defined as
the [aggregation](../aggregation-functions) of one or more database fields.

`Value.SUM` and `Contributors.COUNT` are examples of such measures.
[UDAFs](../../copper/copper_measures#aggregated-measures-with-user-defined-aggregate-functions-udaf)
can also be included in the measure selection.

<Warning>
  Special care must be given to aggregated measures returning arrays/vectors. The associated
  column in the Aggregate Provider will significantly contribute to the memory footprint of the
  application.
</Warning>

During the planning phase of a query, Atoti automatically retrieves data from Aggregate
Providers whenever possible.

*Advanced considerations*:

* If Atoti can retrieve the data from multiple Aggregate Providers, it attempts to use the
  Aggregate Provider best matching the retrieval's granularity.
* Atoti does not pull data from an Aggregate Provider if it does not contain the entire list of
  aggregated measures required in the retrieval.
* Atoti does not merge partial data sets coming from multiple Aggregate Providers, unless the
  experimental [multiple Aggregate Providers](#multiple-aggregate-providers-experimental) feature
  is enabled.
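
The selection rules above can be approximated by the following sketch in plain Java. It is not the Atoti query planner: all names are hypothetical, level depths are modeled as integers (0 standing for the `ALL` level), and "best matching" is assumed here to mean the coarsest eligible provider, which is only one possible interpretation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the selection rules, not the Atoti planner.
final class ProviderSelectionSketch {

  // Depth 0 stands for the ALL level; larger depths are finer levels.
  record Provider(String name, Map<String, Integer> depthByHierarchy, Set<String> measures) {}

  record Retrieval(Map<String, Integer> depthByHierarchy, Set<String> measures) {}

  // A provider is eligible when it holds every requested measure and is fine enough everywhere.
  static boolean canServe(Provider provider, Retrieval retrieval) {
    if (!provider.measures().containsAll(retrieval.measures())) {
      return false;
    }
    for (Map.Entry<String, Integer> required : retrieval.depthByHierarchy().entrySet()) {
      int stored = provider.depthByHierarchy().getOrDefault(required.getKey(), 0);
      if (stored < required.getValue()) {
        return false;
      }
    }
    return true;
  }

  static int totalDepth(Provider provider) {
    return provider.depthByHierarchy().values().stream().mapToInt(Integer::intValue).sum();
  }

  // Assumed scoring: among eligible providers, prefer the coarsest one (fewest rows to re-aggregate).
  static Optional<String> select(List<Provider> providers, Retrieval retrieval) {
    return providers.stream()
        .filter(provider -> canServe(provider, retrieval))
        .min(Comparator.comparingInt(ProviderSelectionSketch::totalDepth))
        .map(Provider::name);
  }
}
```

In this model, a retrieval asking for two measures that are stored in two different providers returns an empty result, mirroring the rule that a single provider must contain every requested measure.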

### Filter Selection

Aggregate Providers can be optimized for specific query patterns by applying filters that limit
their scope. This capability is particularly valuable when a subset of the data receives the
majority of queries and needs maximum performance.

An Aggregate Provider's filter determines which subset of data should be pre-aggregated. For each
level of the [level selection](#level-selection), the filter defines a collection of members to
include.
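
As a rough illustration of this scoping, the sketch below models a filter as a map from level name to the set of included members, and checks whether a given fact falls inside the provider's scope. This is plain Java with hypothetical names (`ProviderFilterSketch`, `isInScope`), not the Atoti filter API, and it assumes that levels without a filter entry accept every member.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the Atoti API: a filter lists the included members per level.
final class ProviderFilterSketch {

  // True when the fact's member is included for every filtered level.
  // Levels without an entry in the filter are assumed to accept every member.
  static boolean isInScope(Map<String, Set<String>> filter, Map<String, String> fact) {
    for (Map.Entry<String, Set<String>> condition : filter.entrySet()) {
      String member = fact.get(condition.getKey());
      if (member == null || !condition.getValue().contains(member)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> filter = Map.of("Year", Set.of("2025"));
    System.out.println(isInScope(filter, Map.of("Year", "2025", "Trader", "Alice")));
    System.out.println(isInScope(filter, Map.of("Year", "2024", "Trader", "Alice")));
  }
}
```

Only the facts for which `isInScope` holds would be pre-aggregated, which is how a filter reduces the memory footprint of the provider.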

### Type Selection

Atoti offers three types of Aggregate Providers, each with its own characteristics and trade-offs:

1. **Just-in-Time (JIT) Provider**
   * Delegates aggregated-data retrieval execution to the underlying database/datastore
   * Does not store any data
   * Lightweight, but slower at query time
   * Best suited for scenarios where memory conservation is critical
2. **Leaf Provider**
   * Stores pre-aggregated data in a tabular structure
   * Uses runtime scanning of the tabular structure to match query conditions
   * Provides a balance between memory usage and performance
   * Processes queries by:
     * Converting retrieval location and filter into boolean conditions
     * Scanning the tabular structure to find matching rows
3. **Bitmap Provider**
   * Stores pre-aggregated data in a tabular structure
   * Maintains an additional bitmap index for faster lookups
   * Offers the fastest query performance but highest memory usage
   * Processes queries by:
     * Converting retrieval location and filter into boolean conditions
     * Using the bitmap index for rapid condition matching

<Note>
  The choice between provider types involves balancing query performance requirements with available
  memory resources.
  <br />A bitmap index's memory footprint scales with the cardinality of each selected level.
</Note>
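
The difference between scanning and bitmap lookup can be sketched in plain Java with `java.util.BitSet`. This is a simplified model under stated assumptions, not the Atoti implementation: `LeafVsBitmapSketch` and its methods are hypothetical, a `null` condition stands for "any member", and the index keeps one bitmap per member of each level.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model, not the Atoti implementation.
final class LeafVsBitmapSketch {

  record Row(String year, String businessUnit, double valueSum) {}

  // Leaf-style retrieval: every row is tested against the conditions (null means "any member").
  static double scan(List<Row> rows, String year, String businessUnit) {
    double total = 0;
    for (Row row : rows) {
      boolean yearMatches = year == null || row.year().equals(year);
      boolean unitMatches = businessUnit == null || row.businessUnit().equals(businessUnit);
      if (yearMatches && unitMatches) {
        total += row.valueSum();
      }
    }
    return total;
  }

  // Builds one bitmap per member of a level; bit i is set when row i holds that member.
  static Map<String, BitSet> index(List<Row> rows, Function<Row, String> level) {
    Map<String, BitSet> bitmaps = new HashMap<>();
    for (int i = 0; i < rows.size(); i++) {
      bitmaps.computeIfAbsent(level.apply(rows.get(i)), member -> new BitSet()).set(i);
    }
    return bitmaps;
  }

  // Bitmap-style retrieval: conditions are intersected as bitmaps, then only matching rows are read.
  static double lookup(
      List<Row> rows,
      Map<String, BitSet> byYear,
      Map<String, BitSet> byUnit,
      String year,
      String businessUnit) {
    BitSet matches = new BitSet(rows.size());
    matches.set(0, rows.size());
    if (year != null) {
      matches.and(byYear.getOrDefault(year, new BitSet()));
    }
    if (businessUnit != null) {
      matches.and(byUnit.getOrDefault(businessUnit, new BitSet()));
    }
    double total = 0;
    for (int i = matches.nextSetBit(0); i >= 0; i = matches.nextSetBit(i + 1)) {
      total += rows.get(i).valueSum();
    }
    return total;
  }
}
```

Both methods return the same totals; the bitmap variant avoids testing every row at the cost of storing one bitmap per member, which is consistent with the index growing with the cardinality of the selected levels.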

### Partitioning Selection

Aggregate Providers can be configured to support parallel query execution
through [data partitioning](../../concepts/partitioning).
When enabled, this feature:

* Divides pre-aggregated data into separate partitions
* Allows the computation chain to process these partitions independently
* Maintains the separation of data for as long as possible during query execution

Benefits of partitioning include:

* Improved query performance through parallel processing
* Better resource utilization across multiple threads

<Note>
  Selecting a partitioning requires analysis of both Query Patterns and Data Distribution: the goal
  is to evenly distribute the data across the partitions while evenly distributing the workload.
  The number of CPU cores should be taken into account as well.
  <br />Selecting a partitioning is optional. When unspecified, the Aggregate Provider will
  automatically align its partitioning with the underlying datastore (if applicable).
</Note>
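
The idea of independent partitions merged as late as possible can be sketched as follows. This is plain Java with hypothetical names, not the Atoti partitioning API, and it assumes a simple hash partitioning on the `Trader` field purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the Atoti API: rows are hash-partitioned on the Trader field.
final class PartitioningSketch {

  record Row(String trader, long value) {}

  // Assigns each row to a partition; rows of the same trader always share a partition.
  static List<List<Row>> partitionByTrader(List<Row> rows, int partitionCount) {
    List<List<Row>> partitions = new ArrayList<>();
    for (int i = 0; i < partitionCount; i++) {
      partitions.add(new ArrayList<>());
    }
    for (Row row : rows) {
      partitions.get(Math.floorMod(row.trader().hashCode(), partitionCount)).add(row);
    }
    return partitions;
  }

  // Each partition is aggregated independently, possibly on different threads;
  // the partial results are only merged at the very end.
  static long sumInParallel(List<List<Row>> partitions) {
    return partitions.parallelStream()
        .mapToLong(partition -> partition.stream().mapToLong(Row::value).sum())
        .sum();
  }
}
```

A skewed partitioning, where one partition receives most of the rows, would leave the other threads idle, which is why the data distribution matters when choosing the partitioning field.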

## Characteristics

Atoti automatically and transparently maintains the defined Aggregate Providers.

* Whenever possible, Atoti prioritizes using an Aggregate Provider over using the underlying
  database's aggregation capabilities.
* Aggregate Providers natively handle [data versioning](../../concepts/data_versioning),
  ensuring consistent version management across the system.
* With a [datastore](../../datastore/datastore_config) as the core of the application's
  architecture, Aggregate Providers dynamically react
  to [datastore transactions](../../datastore/datastore_transactions), maintaining perfect
  synchronization between the Datastore, the Cube, and its Aggregate Providers.

<Warning>
  With an [external database](../../directquery/getting_started) as the core of the
  application's architecture, Atoti cannot automatically listen to a stream of updates coming from
  the database.
  In this case, [Data Versioning and Consistency](../../directquery/versioning#data-refresh) is
  the responsibility of the end user.
</Warning>

## Multiple Aggregate Providers (experimental)

### What is this feature?

This experimental feature allows a query to be answered using several Aggregate Providers when possible,
instead of relying solely on the Just-in-Time (JIT) provider.
It is designed to improve flexibility by leveraging Aggregate Providers that share similar structures.

### Why use this feature?

The main advantage is the ability to combine Aggregate Providers with different filters to answer queries more efficiently.

#### Example use case

Suppose that many similar Aggregate Providers are defined, differing only in the date value used in their filter.

Without this feature, a query where the date can take several values always falls back to the JIT provider.
With this feature, each Aggregate Provider filtered on one of these dates is used, and the results are merged to answer the query.
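
The use case can be pictured with the following sketch in plain Java. It is a conceptual model only, not how Atoti implements the feature: `MultiProviderSketch` and `query` are hypothetical names, each provider is reduced to a map of desk totals for the single date of its filter, and an empty result stands for the fallback to the JIT provider.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model, not the Atoti implementation.
final class MultiProviderSketch {

  // providersByDate: for each filter date, the pre-aggregated Value.SUM per desk.
  // Returns the merged totals, or empty when a requested date has no provider (JIT fallback).
  static Optional<Map<String, Double>> query(
      Map<LocalDate, Map<String, Double>> providersByDate, Set<LocalDate> requestedDates) {
    if (!providersByDate.keySet().containsAll(requestedDates)) {
      return Optional.empty();
    }
    Map<String, Double> merged = new TreeMap<>();
    for (LocalDate date : requestedDates) {
      providersByDate.get(date).forEach((desk, sum) -> merged.merge(desk, sum, Double::sum));
    }
    return Optional.of(merged);
  }
}
```

The merge step relies on the providers exposing the same rows and columns, which gives an intuition for why they are required to share the same structure.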

### How to enable the feature

In the `QueryExecution` context value, set the property `experimentalMultiplePartialProvidersEnabled` to `true`.

### Limitations

Providers must share the same structure:

1. Identical level selection.
2. Identical filter structure (but with different values).
3. Identical partitioning.

If the query filter includes more than one level in a hierarchy, no provider set will be selected.
This feature is experimental.