Aggregate Provider
Introduction
Atoti allows technical users to define complex business calculations as measures, and expose these measures for analysis. When processing a query against these measures, Atoti will recursively develop the defined chain of measures until aggregated measures are reached.
Aggregate Providers are optional data structures that store pre-calculated values for these aggregated measures.
Because data aggregation often represents a significant portion of the query execution time, Aggregate Providers greatly reduce query processing time. This optimization comes with two trade-offs:
- Increased memory usage
- Longer transaction computation time
Example Database Structure
This example will be used throughout the article. The main database table contains trading data with these fields:
TradeId
(Primary Key)Date
Trader
Value
The database includes additional reference tables that define the cube's axes of data analysis:
- Date Hierarchy: Breaks down the
Date
field intoYear/Month/Day
structure - Trader Hierarchy: Provides organizational structure and links traders to their Desk and
Business Unit, in a
BusinessUnit/Desk/Trader
structure. - No hierarchy on
TradeId
.
Key attributes
An Aggregate Provider is a tabular data structure that consists of:
- A name
- A collection of levels, defining the lines of the table.
- A collection of measures, defining the columns of the table.
- A filter (optional)
- A type
- A partitioning (optional)
Defining Aggregate Providers
Aggregate providers can be configured using two different approaches: a fluent builder interface or direct definition.
Using the Fluent Builders
The fluent builder provides a streamlined, chainable API for configuring aggregate providers. This approach offers a more readable and maintainable way to define configurations, with built-in validation and a natural progression of settings.
pivotDescription =
StartBuilding.cube("MyPivot")
.withAggregatedMeasure()
.sum("value")
.withSingleLevelDimension("id")
.withAggregateProvider()
.jit()
.withPartialProvider()
.withName("MyProvider")
.bitmap()
.includingOnlyLevels(LevelIdentifier.simple("Date"), LevelIdentifier.simple("Trader"))
.filteredOn(Map.of(LevelIdentifier.simple("Date"), LocalDate.now()))
.build();
Direct definition
Direct definition offers more control for complex scenarios and involves three steps:
- Definition of the partial aggregate providers:
final PartialProviderDefinition partialProvider =
new PartialProviderDefinition(
"MyProvider",
IAggregateProviderDefinition.BITMAP_PLUGIN_TYPE,
List.of(LevelIdentifier.simple("id")),
List.of("value.SUM"),
PartialProviderFilters.noFilter(),
new Properties(),
null); - Creation of the global provider that encompasses the partial providers:
final AggregateProviderDefinition globalProviderDefinition = new AggregateProviderDefinition();
globalProviderDefinition.setPartialProviders(List.of(partialProvider)); - Building the complete aggregate provider definition and forwarding it to the cube definition:
IActivePivotInstanceDescription pivotDescription =
StartBuilding.cube("MyPivot")
.withAggregatedMeasure()
.sum("value")
.withSingleLevelDimension("id")
.withAggregateProvider(globalProviderDefinition)
.build();
Automatic definition
Atoti provides an out-of-the-box AI functionality that computes, for a given application and its associated past workload, the best configuration of partial aggregate providers.
Configuration details
Name Selection
The Aggregate Provider's name must uniquely identify the Aggregate Providers amongst all the
providers of an Atoti Cube.
It is optional as of version 6.1
, but will be mandatory in future versions.
A name must be provided to add and remove Aggregate Providers on the fly.
In a query, if the retrieval of some aggregated data is performed against an Aggregate Provider, the query plan will mention the name(s) of the Aggregate Provider(s) it used.
Level Selection
An Aggregate Provider determines the granularity of its pre-aggregated data through its collection of defining levels.
- Every hierarchy in the Atoti Cube must select exactly one level
- The data is pre-aggregated up to the selected level for each hierarchy
The Aggregate Provider automatically selects the default level for any hierarchy not explicitly specified in its definition:
- For non-slicing hierarchies: The default is the
ALL
level, which represents the entire hierarchy- For slicing hierarchies: The default is the first level, which may contain multiple members
Note: When working with slicing hierarchies, be mindful that the first level could contain any number of members, which impacts the memory footprint of the Aggregate Provider.
For a given hierarchy, an Aggregate Provider:
- Cannot serve queries requiring a finer granularity than the selected level
- Can serve queries at the selected level's granularity
- Can serve queries at any coarser granularity above the selected level
For example, an Aggregate Provider on the levels Year
and Desk
will contain values for cube
locations like:
Date | Trader | Measures... |
---|---|---|
AllMember/2025 | AllMember/Business Unit 1 | ... |
AllMember/2024 | AllMember/Business Unit 1 | ... |
AllMember/2025 | AllMember/Business Unit 2 | ... |
AllMember/2024 | AllMember/Business Unit 2 | ... |
It will be able to serve queries on locations like:
{
"Date": "AllMember/*",
"Trader": "AllMember/Business Unit 1"
}
but it will not be able to serve queries on finer locations like:
{
"Date": "AllMember/2025/*",
"Trader": "AllMember/Business Unit 1"
}
Finally, the properties of aggregation functions allow the Aggregate Provider to serve queries on locations like:
{
"Date": "AllMember",
"Trader": "AllMember/Business Unit 1"
}
by further aggregating the data held in the Aggregate Provider.
Level selection is a trade-off between query processing performance and storage cost.
Measure Selection
While the level selection defines the lines of the Aggregate Provider's tabular structure, the measure selection defines its columns.
An Aggregate Provider can only store values for aggregated measures, defined as the aggregation of one or more database fields.
Value.SUM
and Contributors.COUNT
are examples of such measures.
UDAFs
can also be included in the measure selection.
Special care must be given to aggregated measures returning arrays/vectors. The associated
column in the Aggregate Provider will significantly contribute to the memory footprint of the
application.
During the planning phase of a query, Atoti will automatically retrieve data from Aggregate Providers when that is possible.
Advanced considerations:
- If Atoti can retrieve the data from multiple Aggregate Providers, Atoti will attempt to use the Aggregate Provider best matching the retrieval's granularity.
- Atoti does not pull data from an Aggregate Provider if it does not contain the entire list of aggregated measures required in the retrieval.
- Atoti does not merge partial data sets coming from multiple Aggregate Providers.
Filter Selection
Aggregate Providers can be optimized for specific query patterns by applying filters limiting their scope. This capability is particularly valuable when a subset of the data receives the majority of queries while needing maximum performance.
An Aggregate Provider's filter determines which subset of data should be pre-aggregated. For each level of the level selection, the filter defines a collection of members to include.
Type Selection
Atoti offers three types of Aggregate Providers, each with its own characteristics and trade-offs:
- Just-in-Time (JIT) Provider
- Delegates aggregated-data retrieval execution to the underlying database/datastore
- Does not store any data
- Lightweight but slower performance
- Best suited for scenarios where memory conservation is critical
- Leaf Provider
- Stores pre-aggregated data in a tabular structure
- Uses runtime scanning of the tabular structure to match query conditions
- Provides a balance between memory usage and performance
- Processes queries by:
- Converting retrieval location and filter into boolean conditions
- Scanning the tabular structure to find matching lines
- Bitmap Provider
- Stores pre-aggregated data in a tabular structure
- Maintains an additional bitmap index for faster lookups
- Offers the fastest query performance but highest memory usage
- Processes queries by:
- Converting retrieval location and filter into boolean conditions
- Using the bitmap index for rapid condition matching
The choice between provider types involves balancing query performance requirements with available
memory resources.
A bitmap index's memory footprint scales with the cardinality of each selected level.
Partitioning Selection
Aggregate Providers can be configured to support parallel query execution through data partitioning. When enabled, this feature:
- Divides pre-aggregated data into separate partitions
- Allows the computation chain to process these partitions independently
- Maintains the separation of data for as long as possible during query execution
Benefits of partitioning include:
- Improved query performance through parallel processing
- Better resource utilization across multiple threads
Selecting a partitioning requires analysis of both Query Patterns and Data Distribution: the goal
is to evenly distribute the data across the partitions while evenly distributing the workload.
The number of CPU cores should be taken into account as well.
Selecting a partitioning is optional. When unspecified, the Aggregate Provider will
automatically align its partitioning with the underlying datastore (if applicable).
Characteristics
Atoti automatically and transparently maintains the defined Aggregate Providers.
- Whenever possible, Atoti prioritizes using an Aggregate Provider over using the underlying database's aggregation capabilities.
- Aggregate Providers natively handle data versioning, ensuring consistent version management across the system.
- With a datastore as the core of the application's architecture, Aggregate Providers dynamically react to datastore transactions, maintaining perfect synchronization between the Datastore, the Cube, and its Aggregate Providers.
With an external database as the core of the application's architecture, Atoti cannot automatically listen to a stream of updates coming from the database. In this case, Data Versioning and Consistency is the responsibility of the end user.