Skip to main content

Incremental Refresh advanced

This section describes some incremental refresh key mechanisms.

Incremental refresh in details

The Atoti application is made of two main components :

  • Cube which answers business queries
  • A database which stores the fact data. This database could be a Datastore or a DirectQuery connector.

Cube can query the database directly to get the data he needs at query time. But often it will preload some data at startup.

These data are stored in two components :

  • Hierarchies which contains the members existing in the cube hierarchy levels at a specific version
  • Aggregate provider which stores precomputed primitive aggregates at a specific version.

These two components are fed with a query executed at startup.
These components are materialized views of the result of the query and are considered to be always up-to-date.
On a new version of the data, database is in charge to warn the cube and provide a description of the data change.

To fulfill this need the cube will register streams on the database. A stream is a registered query along with a listener, the registration usually sends the initial data.
After the registration the listener is warned about new version and data changes. In a DirectQuery application the cube registers a stream for each hierarchy and a stream for each aggregate provider.

When describing a change DirectQuery connector will search into the registered streams the impacted ones.
For each impacted streams, it will compute a list of operations to refresh these streams.

Schema

many_to_one
sales
---------------------
[string] date
[string] product
[double] quantity
products
---------------------
[int] date
[string] product
[string] color
[double] unit_price

Fact hierarchy on sales.date
Factless hierarchy on products.color
Aggregate provider on sales.date, product.color -> sum(sales.quantity * products.price)

Partial or Wrong change description

Partial or wrong change description are not supported.
If this example, we describe what could happen if an incorrect hint is provided.

Changes

One row with a sale on product P1 has been added.
There is already a sale on this product on the same date.

dateproductquantity
2024-01-01P110.0
2024-01-01P220.0
2024-01-01P130.0

Incomplete change description

Here we miss the fact that the provided table update detail does not capture precisely the changed data.
It will capture the added row but also the first row.

Table Update Detailsales
Change typeADD_ROWS
Change Scopesales.product = 'P1'

Impact

Two streams are impacted :

Hierarchy sales.date. 2024-01-01 member count will be off by one (4 instead of 3).

Old version (member / count)New version (member / count)
2024-01-01, blue / 22024-01-01 / 4

Aggregate provider on sales.date, product.color -> sum(sales.quantity * products.price) will be wrong. The first row will be counted twice in the aggregate provider.

Old version (points / count)New version (points / count)
2024-01-01, blue / 12024-01-01, blue / 3
2024-01-01, red / 12024-01-01, red / 1

What should I do ?

If this happens, the recommended behavior is to trigger a full refresh and to abandon all the wrong versions.

Orthogonal condition on remove or update

Sometimes, despite a clear description of the changed data, it is not possible to exploit the condition to compute a remove where operation.
Take an aggregate provider indexed by (K1,K2).
If the condition of the remove or the update is not on some of the two keys, the only solution is to fully refresh the provider.

Impact along Optional Relationship

In this example we will show why Optional relationship could trigger a full refresh.
Here the relationship between Sales and Products. So it means, there could have some sales on a product not present in products.

Data before changes

Sales

dateproductquantity
2024-01-01P110.0
2024-01-01P220.0
2024-01-01P330.0

Products

dateproductcolorunit_price
2024-01-01P1blue5.0
2024-01-01P2red10.0

Aggregate provider before change

Color of the missing product is set to default value N/A.

datecolorsum(quantity*unit_price)
2024-01-01blue50.0
2024-01-01red200.0
2024-01-01N/A0.0

Changes

The change is just the add of the new product.

dateproductcolorunit_price
2024-01-01P1blue5.0
2024-01-01P2red10.0
2024-01-01P3green15.0

Changes Description

The change is easily described, and we could expect that it will trigger only an incremental refresh.

Table Update Detailproducts
Change typeADD_ROWS
Change Scopeproducts.color = 'green'

But because of the Optional relationship we will end up with a full refresh of the aggregate provider.