Skip to main content

Versioning

Versioning on a DirectQuery connector is less powerful than on a Datastore.

Old versions access

Not all external databases support native versioning (i.e. time travel). For these databases you can only query the last snapshot of the data.
Non-time travel databases are subject to de-synchronization issues (see heading below).

info

To avoid such issues, it's possible to set up a mechanism to emulate time-travel.

DatabaseNative Time travel SupportEmulated Time travel Support
Amazon Redshiftticktick
ClickHouseticktick
Databrickstick (not on views)tick
Dremioticktick
Google BigQueryticktick
Microsoft Azure Synapseticktick
Microsoft SQL Serverticktick
Snowflaketick (not on views except dynamic table)tick

By default, time travel is enabled on the databases supporting it. However it is possible to disable it in the Database Settings when building the application.

final BigqueryDatabaseSettings databaseSettings =
BigqueryDatabaseSettings.builder().useNativeTimeTravel(true).build();

final Application app =
Application.builder(connector)
.schema(schema)
.managerDescription(managerDescription)
.databaseSettings(databaseSettings)
.build();

Time travel discoveries

DirectQuery will run discovery queries on the database to gather the information about the last modification on the tables. In theory, these queries should be fast as they just request some metadata. However, they can sometimes be slow on some external databases and they can even timeout. It is possible to set a specific timeout for the time travel discovery queries to avoid this issue:

final BigqueryDatabaseSettings databaseSettings =
BigqueryDatabaseSettings.builder()
.timeTravelDiscoveryQueryTimeout(Duration.ofMinutes(1))
.build();

final Application app =
Application.builder(connector)
.schema(schema)
.managerDescription(managerDescription)
.databaseSettings(databaseSettings)
.build();

Data refresh

Unlike the datastore, external databases are not able to push data change event.
The user must warn the DirectQuery connector that a new version of the data is available.

There are two ways to do this :

  • If you do not know the scope of the change, you can trigger a full refresh.

app.refresh();

caution

A full refresh is a costly operation because it implies to rebuild all hierarchies and aggregate providers.

info

During these operations, the cube is still queryable.

De-synchronization issue with non-time travel databases

Atoti cube leverages database streams to maintain some versioned data structures:

  • Hierarchy member tree
  • Aggregate providers

Aggregate providers are used to store already computed primitive measures.

De-synchronization between Atoti data structures and the external database

On a database that does not support time-travel, on data change and before the refresh of the application, aggregate providers are not up-to-date.
So the measures computed from the primitive measures of aggregate provider are outdated too.
However, measures which are not computed from aggregate provider measures (i.e. computed by a database query) are up-to-date.
This could lead to inconsistent results if you are mixing up outdated and up-to-date measures.

For instance, let's take an initial version V1 and an update in the database changing the data to a new version V2.
Before triggering the refresh in Atoti application, the aggregate provider is still in state V1 but the database connector has no more access to the V1 data.
The connector will return the data from the current state of the database (which is V2).

De-synchronization between Atoti data structures themselves

The previous problem can also occur between the different Atoti data structures on databases that do not support time-travel. For instance, as the queries to feed a specific aggregate provider and the queries to feed other data structures (other aggregate providers or hierarchies) are not executed at the exact same time, they can retrieve data from the external database that do not match if there was an update in-between.

caution

When starting-up or refreshing an application, one should be careful not to update the data in the external database at the same time if time-travel is not used.

Branching

DirectQuery databases do not support branching.