Incremental Refresh
Incremental refresh is an experimental feature, the API might be changed in any release.
To use this feature, please add the following JVM
argument: -Dactiveviam.feature.experimental.incremental_refresh.enabled=true
.
DirectQuery stores some snapshots of the data in the Atoti application.
However, the data in the external database may change (new intra-day data or corrections) and these changes make
the snapshot data outdated.
Incremental refresh is the way to solve this issue. The user triggers Incremental refresh to warn the DirectQuery
connector that a data change has occurred.
He does so describing how the data has changed in the external database.
For each table which have changed, the user provides the type of the change and the condition allowing to capture
the scope of the change.
For instance, let's take the following example Schema:
Schema
We can describe the change by describing the type of change and its scope for each table which has changed :
Change description example
Table Update Detail | sales | products |
---|---|---|
Change type | ADD_ROWS | ADD_ROWS |
Change Scope | sales.date = '2024-01-02' | products.date = '2024-01-02' |
Change description only allows one table update detail per table in the schema.
DirectQuery connector will create a new database version and triggers a new version creation in the cube. If necessary, it will also stream the new data to the cube.
The DirectQuery connector will first identify the streams impacted by the change description. Streams are a way to listen to the database changes, they fulfill several needs :
- Cube Hierarchy feeding
- Cube Aggregate provider feeding
- Custom list query stream registered by the user
After identification of impacted streams the connector will load the new data and send the right change instructions to each stream. Instructions can be :
- remove where
- add aggregate fact
If data have been added DirectQuery connector will make a query in the database to capture the new added rows and
streams these new values.
If data have been removed, the connector will send a remove where operation to the listener.
If data changes are updates or mixed changes, the connector will refresh the described scope. It will send a remove
where operation to the listener and capture current value of the scope in the database.
Sometimes the scope provided is not sufficient to find an incremental way to update the data and the connector will refresh all the data (see Full Refresh heading below).
During the operation, the cube stays available.
Incremental refresh does not support partial refresh.
Scope provided should capture the entirety of the data change.
If not, cube components (aggregate provider, hierarchies) and database version will be de-synchronized.
This could lead to inconsistent results (
see partial or wrong change description).
Table update detail
To describe your change for each table, you can use below change types.
Added rows (ADD_ROWS)
Changes
Add rows change type describes a pure rows addition in your table.
date | product | quantity |
---|---|---|
2024-01-01 | P1 | 10.0 |
2024-01-01 | P2 | 20.0 |
2024-01-02 | P1 | 30.0 |
Change description
The provided scope must describe ALL the added rows and ONLY the added row.
Table Update Detail | sales |
---|---|
Change type | ADD_ROWS |
Change Scope | sales.date = '2024-01-02' |
Added rows dirty (ADD_ROWS_WITH_DIRTY_CONDITION)
Changes
Sometimes you are not able to provide a precise enough condition to capture the added lines.
You can provide a scope which contains older data and precise that the scope you are providing is dirty.
date | product | quantity |
---|---|---|
2024-01-01 | P1 | 10.0 |
2024-01-01 | P2 | 20.0 |
2024-01-01 | P1 | 30.0 |
Change description
The provided scope must describe ALL the added rows but can overlap with some exiting rows.
Table Update Detail | sales |
---|---|
Change type | ADD_ROWS_WITH_DIRTY_CONDITION |
Change Scope | sales.product = 'P1' |
Updated rows (UPDATE_ROWS)
Changes
Update rows change type describes a pure rows updates in your table. Scope condition must use a non updated column to describe the change.
date | product | quantity |
---|---|---|
2024-01-01 | P1 | 10.0 |
2024-01-01 | P2 |
Change description
The provided scope must contain ALL the updated rows.
Table Update Detail | sales |
---|---|
Change type | UPDATE_ROWS |
Change Scope | sales.product = 'P2' |
Removed rows (REMOVE_ROWS)
Changes
Remove rows change type describes a pure rows deletion in your table.
date | product | quantity |
---|---|---|
2024-01-01 | P1 | 10.0 |
Change description
The scope must contain ALL and ONLY the removed rows.
Table Update Detail | sales |
---|---|
Change type | REMOVE_ROWS |
Change Scope | sales.product = 'P2' |
Removed rows dirty (REMOVE_ROWS_WITH_DIRTY_CONDITION)
Changes
Sometimes you are not able to provide a precise enough condition to capture the removed rows. You can provide a scope which contains older data and precise that the scope you are providing is dirty.
date | product | quantity |
---|---|---|
2024-01-01 | P1 | 10.0 |
2024-01-01 | P2 | 10.0 |
Change description
The scope must contain ALL the removed rows but can overlap with some existing rows.
Table Update Detail | sales |
---|---|
Change type | REMOVE_ROWS_WITH_DIRTY_CONDITION |
Change Scope | sales.product = 'P2' |
Mixed changes (MIXED_CHANGES)
Changes
Mixed changes change type describes a change which is composed of various changes (adds, updates, deletions).
date | product | quantity |
---|---|---|
2024-01-01 | P1 | 10.0 |
2024-01-01 | P2 | |
2024-01-02 | P2 | 40.0 |
Change description
Table Update Detail | sales |
---|---|
Change type | MIXED_CHANGES |
Change Scope | sales.product = 'P2' |
Intertwined Changes
If the scope condition uses a column impacted by an update you need to provide the old value and the new value in the scope.
date | product | quantity |
---|---|---|
2024-01-01 | P1 | 10.0 |
2024-01-01 | P3 | 10.0 |
2024-01-01 | 20.0 | |
2024-01-02 | P2 | 40.0 |
Change description
The provided scope must contain ALL the changed rows.
Table Update Detail | sales |
---|---|
Change type | MIXED_CHANGES |
Change Scope | sales.product = 'P2' OR sales.product='P3' |
Dirty table update details often result to less efficient refresh comparing to their clean alternatives.
Incremental refresh plan
For each call to incremental refresh, an incremental refresh plan is computed and logged at level FINE (
com.activeviam.database.sql.*).
The incremental refresh restates the processed change description and shows for each stream the triggered refresh
operations.
For instance, on a cube with only a hierarchy on sales.product
and no aggregate provider, there is only the
hierarchy stream to refresh :
FINE: IncrementalRefreshPlan
Change Descriptions:
TableUpdateDetailInternal(tableName=Sales, changeType=ADD_ROWS, columnCondition=Sales DATE = 2024-01-02 )
TableUpdateDetailInternal(tableName=Products, changeType=ADD_ROWS, columnCondition=Product DATE = 2024-01-02 )
Streams:
// PRODUCT HIERARCHY STREAM
Stream -> SqlPreparedAggregateQuery(query=AggregateQuery(baseTable=Sales, condition=TRUE, aggregations=[SingleFieldAggregatedElement(name=contributors.COUNT)], groupByFields=[AliasedField(alias=PRODUCT, fieldPath=`Sales PRODUCT`)], partitioning=ConstantPartitioning, chunkSize=1024,
queryTags={pivot_name=incremental_refresh_cube, **CATEGORY=HIERARCHY_FEEDING**}),
queryResultFormat=QueryResultFormat(plainRecordFormat=RecordFormat [Fields=[[PRODUCT]]], dictionarizedRecordFormat=RecordFormat [Fields=[[PRODUCT]]], dictionaries={PRODUCT=com.activeviam.tech.dictionaries.internal.impl.NullableDictionary@2446fbcd}))
//INCREMENTAL ADD OPERATION
Operations:
IncrementalAddAggViewOperation(
query=AggregateQuery(baseTable=Sales, condition=`Sales DATE` = 2024-01-02, aggregations=[SingleFieldAggregatedElement(name=contributors.COUNT)], groupByFields=[AliasedField(alias=PRODUCT, fieldPath=`Sales PRODUCT`)], partitioning=ConstantPartitioning, chunkSize=1024,
queryTags={category=incremental_refresh, pivot_name=incremental_refresh_cube}))
You can identify the nature of the stream with the query tag category
( hierarchy_feeding
or aggregate_provider_feeding
).
Full refresh
Sometimes it is not possible to resolve incremental operations to refresh an impacted stream.
DirectQuery connector will execute a full refresh operation and stream a full new data snapshot.
Full refresh operation is expressed in the plan as IncrementalRemoveWhereOperation(condition=TRUE)
.
Full refresh operation are costly, and they should be avoided on aggregate provider stream.
For a given stream, full refresh is triggered when :
- Several change types are expressed in the
TableUpdateDetails
which impact the stream.
- An impacting
TableUpdateDetail
with aRelativeCondition
affects the base store of the stream.
- An impacting
TableUpdateDetail
needs a remove where operation to be applied and the stream does not support such operation ( hierarchy streams, custom list query streams).
- The scope condition is not exploitable to compute a remove where operation.
For instance if an aggregate provider is indexed by the levels Date and Product, the impact condition must be expressed on one of these to allow a remove where operation.
- Impact along optional relationship.