Skip to main content

Aggregate Provider

Introduction

Atoti allows technical users to define complex business calculations as measures, and expose these measures for analysis. When processing a query against these measures, Atoti will recursively develop the defined chain of measures until aggregated measures are reached.

Aggregate Providers are optional data structures that store pre-calculated values for these aggregated measures.

Because data aggregation often represents a significant portion of the query execution time, Aggregate Providers greatly reduce query processing time. This optimization comes with two trade-offs:

  • Increased memory usage
  • Longer transaction computation time

Example Database Structure

This example will be used throughout the article. The main database table contains trading data with these fields:

  • TradeId (Primary Key)
  • Date
  • Trader
  • Value

The database includes additional reference tables that define the cube's axes of data analysis:

  • Date Hierarchy: Breaks down the Date field into Year/Month/Day structure
  • Trader Hierarchy: Provides organizational structure and links traders to their Desk and Business Unit, in a BusinessUnit/Desk/Trader structure.
  • No hierarchy on TradeId.

Key attributes

An Aggregate Provider is a tabular data structure that consists of:

  1. A name
  2. A collection of levels, defining the lines of the table.
  3. A collection of measures, defining the columns of the table.
  4. A filter (optional)
  5. A type
  6. A partitioning (optional)

Defining Aggregate Providers

Aggregate providers can be configured using two different approaches: a fluent builder interface or direct definition.

Using the Fluent Builders

The fluent builder provides a streamlined, chainable API for configuring aggregate providers. This approach offers a more readable and maintainable way to define configurations, with built-in validation and a natural progression of settings.

pivotDescription =
StartBuilding.cube("MyPivot")
.withAggregatedMeasure()
.sum("value")
.withSingleLevelDimension("id")
.withAggregateProvider()
.jit()
.withPartialProvider()
.withName("MyProvider")
.bitmap()
.includingOnlyLevels(LevelIdentifier.simple("Date"), LevelIdentifier.simple("Trader"))
.filteredOn(Map.of(LevelIdentifier.simple("Date"), LocalDate.now()))
.build();

Direct definition

Direct definition offers more control for complex scenarios and involves three steps:

  1. Definition of the partial aggregate providers:
    final PartialProviderDefinition partialProvider =
    new PartialProviderDefinition(
    "MyProvider",
    IAggregateProviderDefinition.BITMAP_PLUGIN_TYPE,
    List.of(LevelIdentifier.simple("id")),
    List.of("value.SUM"),
    PartialProviderFilters.noFilter(),
    new Properties(),
    null);
  2. Creation of the global provider that encompasses the partial providers:
    final AggregateProviderDefinition globalProviderDefinition = new AggregateProviderDefinition();
    globalProviderDefinition.setPartialProviders(List.of(partialProvider));
  3. Building the complete aggregate provider definition and forwarding it to the cube definition:

IActivePivotInstanceDescription pivotDescription =
StartBuilding.cube("MyPivot")
.withAggregatedMeasure()
.sum("value")
.withSingleLevelDimension("id")
.withAggregateProvider(globalProviderDefinition)
.build();

Automatic definition

Atoti provides an out-of-the-box AI functionality that computes, for a given application and its associated past workload, the best configuration of partial aggregate providers.

Configuration details

Name Selection

The Aggregate Provider's name must uniquely identify the Aggregate Providers amongst all the providers of an Atoti Cube. It is optional as of version 6.1, but will be mandatory in future versions.

A name must be provided to add and remove Aggregate Providers on the fly.

In a query, if the retrieval of some aggregated data is performed against an Aggregate Provider, the query plan will mention the name(s) of the Aggregate Provider(s) it used.

Level Selection

An Aggregate Provider determines the granularity of its pre-aggregated data through its collection of defining levels.

  • Every hierarchy in the Atoti Cube must select exactly one level
  • The data is pre-aggregated up to the selected level for each hierarchy

The Aggregate Provider automatically selects the default level for any hierarchy not explicitly specified in its definition:

  • For non-slicing hierarchies: The default is the ALL level, which represents the entire hierarchy
  • For slicing hierarchies: The default is the first level, which may contain multiple members

Note: When working with slicing hierarchies, be mindful that the first level could contain any number of members, which impacts the memory footprint of the Aggregate Provider.

For a given hierarchy, an Aggregate Provider:

  • Cannot serve queries requiring a finer granularity than the selected level
  • Can serve queries at the selected level's granularity
  • Can serve queries at any coarser granularity above the selected level

For example, an Aggregate Provider on the levels Year and Desk will contain values for cube locations like:

DateTraderMeasures...
AllMember/2025AllMember/Business Unit 1...
AllMember/2024AllMember/Business Unit 1...
AllMember/2025AllMember/Business Unit 2...
AllMember/2024AllMember/Business Unit 2...

It will be able to serve queries on locations like:

{
"Date": "AllMember/*",
"Trader": "AllMember/Business Unit 1"
}

but it will not be able to serve queries on finer locations like:

{
"Date": "AllMember/2025/*",
"Trader": "AllMember/Business Unit 1"
}

Finally, the properties of aggregation functions allow the Aggregate Provider to serve queries on locations like:

{
"Date": "AllMember",
"Trader": "AllMember/Business Unit 1"
}

by further aggregating the data held in the Aggregate Provider.

note

Level selection is a trade-off between query processing performance and storage cost.

Measure Selection

While the level selection defines the lines of the Aggregate Provider's tabular structure, the measure selection defines its columns.

An Aggregate Provider can only store values for aggregated measures, defined as the aggregation of one or more database fields.

Value.SUM and Contributors.COUNT are examples of such measures. UDAFs can also be included in the measure selection.

warning


Special care must be given to aggregated measures returning arrays/vectors. The associated column in the Aggregate Provider will significantly contribute to the memory footprint of the application.

During the planning phase of a query, Atoti will automatically retrieve data from Aggregate Providers when that is possible.

Advanced considerations:

  • If Atoti can retrieve the data from multiple Aggregate Providers, Atoti will attempt to use the Aggregate Provider best matching the retrieval's granularity.
  • Atoti does not pull data from an Aggregate Provider if it does not contain the entire list of aggregated measures required in the retrieval.
  • Atoti does not merge partial data sets coming from multiple Aggregate Providers.

Filter Selection

Aggregate Providers can be optimized for specific query patterns by applying filters limiting their scope. This capability is particularly valuable when a subset of the data receives the majority of queries while needing maximum performance.

An Aggregate Provider's filter determines which subset of data should be pre-aggregated. For each level of the level selection, the filter defines a collection of members to include.

Type Selection

Atoti offers three types of Aggregate Providers, each with its own characteristics and trade-offs:

  1. Just-in-Time (JIT) Provider
    • Delegates aggregated-data retrieval execution to the underlying database/datastore
    • Does not store any data
    • Lightweight but slower performance
    • Best suited for scenarios where memory conservation is critical
  2. Leaf Provider
    • Stores pre-aggregated data in a tabular structure
    • Uses runtime scanning of the tabular structure to match query conditions
    • Provides a balance between memory usage and performance
    • Processes queries by:
      • Converting retrieval location and filter into boolean conditions
      • Scanning the tabular structure to find matching lines
  3. Bitmap Provider
    • Stores pre-aggregated data in a tabular structure
    • Maintains an additional bitmap index for faster lookups
    • Offers the fastest query performance but highest memory usage
    • Processes queries by:
      • Converting retrieval location and filter into boolean conditions
      • Using the bitmap index for rapid condition matching
note

The choice between provider types involves balancing query performance requirements with available memory resources.
A bitmap index's memory footprint scales with the cardinality of each selected level.

Partitioning Selection

Aggregate Providers can be configured to support parallel query execution through data partitioning. When enabled, this feature:

  • Divides pre-aggregated data into separate partitions
  • Allows the computation chain to process these partitions independently
  • Maintains the separation of data for as long as possible during query execution

Benefits of partitioning include:

  • Improved query performance through parallel processing
  • Better resource utilization across multiple threads
note

Selecting a partitioning requires analysis of both Query Patterns and Data Distribution: the goal is to evenly distribute the data across the partitions while evenly distributing the workload. The number of CPU cores should be taken into account as well.
Selecting a partitioning is optional. When unspecified, the Aggregate Provider will automatically align its partitioning with the underlying datastore (if applicable).

Characteristics

Atoti automatically and transparently maintains the defined Aggregate Providers.

  • Whenever possible, Atoti prioritizes using an Aggregate Provider over using the underlying database's aggregation capabilities.
  • Aggregate Providers natively handle data versioning, ensuring consistent version management across the system.
  • With a datastore as the core of the application's architecture, Aggregate Providers dynamically react to datastore transactions, maintaining perfect synchronization between the Datastore, the Cube, and its Aggregate Providers.
warning

With an external database as the core of the application's architecture, Atoti cannot automatically listen to a stream of updates coming from the database. In this case, Data Versioning and Consistency is the responsibility of the end user.