Skip to main content

Incremental Refresh

caution

Incremental refresh is an experimental feature, the API might be changed in any release.

To use this feature, please add the following JVM argument: -Dactiveviam.feature.experimental.incremental_refresh.enabled=true.

DirectQuery stores some snapshots of the data in the Atoti application.
However, the data in the external database may change (new intra-day data or corrections) and these changes make the snapshot data outdated.
Incremental refresh is the way to solve this issue. The user triggers Incremental refresh to warn the DirectQuery connector that a data change has occurred.
He does so describing how the data has changed in the external database.
For each table which have changed, the user provides the type of the change and the condition allowing to capture the scope of the change.

For instance, let's take the following example Schema:

Schema

We can describe the change by describing the type of change and its scope for each table which has changed :

Change description example

Table Update Detailsalesproducts
Change typeADD_ROWSADD_ROWS
Change Scopesales.date = '2024-01-02'products.date = '2024-01-02'
info

Change description only allows one table update detail per table in the schema.

DirectQuery connector will create a new database version and triggers a new version creation in the cube. If necessary, it will also stream the new data to the cube.

The DirectQuery connector will first identify the streams impacted by the change description. Streams are a way to listen to the database changes, they fulfill several needs :

  • Cube Hierarchy feeding
  • Cube Aggregate provider feeding
  • Custom list query stream registered by the user

After identification of impacted streams the connector will load the new data and send the right change instructions to each stream. Instructions can be :

  • remove where
  • add aggregate fact

If data have been added DirectQuery connector will make a query in the database to capture the new added rows and streams these new values.
If data have been removed, the connector will send a remove where operation to the listener.
If data changes are updates or mixed changes, the connector will refresh the described scope. It will send a remove where operation to the listener and capture current value of the scope in the database.

Sometimes the scope provided is not sufficient to find an incremental way to update the data and the connector will refresh all the data (see Full Refresh heading below).

During the operation, the cube stays available.

caution

Incremental refresh does not support partial refresh.
Scope provided should capture the entirety of the data change.
If not, cube components (aggregate provider, hierarchies) and database version will be de-synchronized.
This could lead to inconsistent results ( see partial or wrong change description).

Table update detail

To describe your change for each table, you can use below change types.

Added rows (ADD_ROWS)

Changes

Add rows change type describes a pure rows addition in your table.

dateproductquantity
2024-01-01P110.0
2024-01-01P220.0
2024-01-02P130.0

Change description

The provided scope must describe ALL the added rows and ONLY the added row.

Table Update Detailsales
Change typeADD_ROWS
Change Scopesales.date = '2024-01-02'

Added rows dirty (ADD_ROWS_WITH_DIRTY_CONDITION)

Changes

Sometimes you are not able to provide a precise enough condition to capture the added lines.
You can provide a scope which contains older data and precise that the scope you are providing is dirty.

dateproductquantity
2024-01-01P110.0
2024-01-01P220.0
2024-01-01P130.0

Change description

The provided scope must describe ALL the added rows but can overlap with some exiting rows.

Table Update Detailsales
Change typeADD_ROWS_WITH_DIRTY_CONDITION
Change Scopesales.product = 'P1'

Updated rows (UPDATE_ROWS)

Changes

Update rows change type describes a pure rows updates in your table. Scope condition must use a non updated column to describe the change.

dateproductquantity
2024-01-01P110.0
2024-01-01P220.0 30

Change description

The provided scope must contain ALL the updated rows.

Table Update Detailsales
Change typeUPDATE_ROWS
Change Scopesales.product = 'P2'

Removed rows (REMOVE_ROWS)

Changes

Remove rows change type describes a pure rows deletion in your table.

dateproductquantity
2024-01-01P110.0
2024-01-01P220.0

Change description

The scope must contain ALL and ONLY the removed rows.

Table Update Detailsales
Change typeREMOVE_ROWS
Change Scopesales.product = 'P2'

Removed rows dirty (REMOVE_ROWS_WITH_DIRTY_CONDITION)

Changes

Sometimes you are not able to provide a precise enough condition to capture the removed rows. You can provide a scope which contains older data and precise that the scope you are providing is dirty.

dateproductquantity
2024-01-01P110.0
2024-01-01P210.0
2024-01-01P220.0

Change description

The scope must contain ALL the removed rows but can overlap with some existing rows.

Table Update Detailsales
Change typeREMOVE_ROWS_WITH_DIRTY_CONDITION
Change Scopesales.product = 'P2'

Mixed changes (MIXED_CHANGES)

Changes

Mixed changes change type describes a change which is composed of various changes (adds, updates, deletions).

dateproductquantity
2024-01-01P110.0
2024-01-01P220.0 30
2024-01-01P220.0
2024-01-02P240.0

Change description

Table Update Detailsales
Change typeMIXED_CHANGES
Change Scopesales.product = 'P2'

Intertwined Changes

If the scope condition uses a column impacted by an update you need to provide the old value and the new value in the scope.

dateproductquantity
2024-01-01P110.0
2024-01-01P310.0
2024-01-01P2 P320.0
2024-01-01P220.0
2024-01-02P240.0

Change description

The provided scope must contain ALL the changed rows.

Table Update Detailsales
Change typeMIXED_CHANGES
Change Scopesales.product = 'P2' OR sales.product='P3'
info

Dirty table update details often result to less efficient refresh comparing to their clean alternatives.

Incremental refresh plan

For each call to incremental refresh, an incremental refresh plan is computed and logged at level FINE ( com.activeviam.database.sql.*).
The incremental refresh restates the processed change description and shows for each stream the triggered refresh operations.
For instance, on a cube with only a hierarchy on sales.product and no aggregate provider, there is only the hierarchy stream to refresh :

FINE: IncrementalRefreshPlan

Change Descriptions:

TableUpdateDetailInternal(tableName=Sales, changeType=ADD_ROWS, columnCondition=Sales DATE = 2024-01-02 )

TableUpdateDetailInternal(tableName=Products, changeType=ADD_ROWS, columnCondition=Product DATE = 2024-01-02 )

Streams:

// PRODUCT HIERARCHY STREAM

Stream -> SqlPreparedAggregateQuery(query=AggregateQuery(baseTable=Sales, condition=TRUE, aggregations=[SingleFieldAggregatedElement(name=contributors.COUNT)], groupByFields=[AliasedField(alias=PRODUCT, fieldPath=`Sales PRODUCT`)], partitioning=ConstantPartitioning, chunkSize=1024,
queryTags={pivot_name=incremental_refresh_cube, **CATEGORY=HIERARCHY_FEEDING**}),
queryResultFormat=QueryResultFormat(plainRecordFormat=RecordFormat [Fields=[[PRODUCT]]], dictionarizedRecordFormat=RecordFormat [Fields=[[PRODUCT]]], dictionaries={PRODUCT=com.activeviam.tech.dictionaries.internal.impl.NullableDictionary@2446fbcd}))

//INCREMENTAL ADD OPERATION

Operations:
IncrementalAddAggViewOperation(
query=AggregateQuery(baseTable=Sales, condition=`Sales DATE` = 2024-01-02, aggregations=[SingleFieldAggregatedElement(name=contributors.COUNT)], groupByFields=[AliasedField(alias=PRODUCT, fieldPath=`Sales PRODUCT`)], partitioning=ConstantPartitioning, chunkSize=1024,
queryTags={category=incremental_refresh, pivot_name=incremental_refresh_cube}))
info

You can identify the nature of the stream with the query tag category ( hierarchy_feeding or aggregate_provider_feeding).

Full refresh

Sometimes it is not possible to resolve incremental operations to refresh an impacted stream.
DirectQuery connector will execute a full refresh operation and stream a full new data snapshot.
Full refresh operation is expressed in the plan as IncrementalRemoveWhereOperation(condition=TRUE).
Full refresh operation are costly, and they should be avoided on aggregate provider stream.

For a given stream, full refresh is triggered when :

  • Several change types are expressed in the TableUpdateDetails which impact the stream.
  • An impacting TableUpdateDetail with a RelativeCondition affects the base store of the stream.
  • An impacting TableUpdateDetail needs a remove where operation to be applied and the stream does not support such operation ( hierarchy streams, custom list query streams).
  • The scope condition is not exploitable to compute a remove where operation.
    For instance if an aggregate provider is indexed by the levels Date and Product, the impact condition must be expressed on one of these to allow a remove where operation.
  • Impact along optional relationship.

To go further