Versioning
Versioning on a DirectQuery connector is less powerful than on a Datastore.
Old versions access
Not all external databases support native versioning (i.e. time travel).
For databases without time-travel support, only the latest snapshot of the data can be queried.
Databases without time travel are also subject to de-synchronization issues (see the dedicated section below).
| Database | Time travel support |
|---|---|
| Amazon Redshift | No |
| ClickHouse | No |
| Databricks | Yes (not on views) |
| Dremio | No |
| Google BigQuery | Yes |
| Microsoft Azure Synapse | No |
| Microsoft SQL Server | No |
| Snowflake | Yes (not on views, except dynamic tables) |
By default, time travel is enabled on the databases supporting it. However, it is possible to disable it in the Database Settings when building the application. The examples below show how to configure this setting for BigQuery, Databricks, and Snowflake.
final BigqueryDatabaseSettings databaseSettings =
BigqueryDatabaseSettings.builder().useTimeTravel(true).build();
There are some specificities for Databricks, as Databricks views do not support time-travel queries. By default, time travel is disabled if the schema contains views, but it can be enabled for a mixed data model containing both tables and views, at the risk of data de-synchronization, since queries on views might return newly inserted or updated data.
The supported settings are disabled, strict (throwing an exception if the data model contains a view), and lax (always allowing time travel; views are simply queried without it).
final DatabricksDatabaseSettings databaseSettings =
DatabricksDatabaseSettings.builder().timeTravelPolicy(TimeTravelPolicy.LAX).build();
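For instance, to fail fast when the data model contains a view, the strict policy can be selected instead. This is a minimal sketch assuming the remaining policies are exposed as enum constants mirroring TimeTravelPolicy.LAX above (i.e. TimeTravelPolicy.STRICT and TimeTravelPolicy.DISABLED):

final DatabricksDatabaseSettings strictSettings =
    DatabricksDatabaseSettings.builder()
        // Assumed constant name: throw an exception if the data model contains a view.
        .timeTravelPolicy(TimeTravelPolicy.STRICT)
        .build();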
final SnowflakeDatabaseSettings databaseSettings =
SnowflakeDatabaseSettings.builder().useTimeTravel(false).build();
The database settings are then passed when building the application:

final Application app =
Application.builder(connector)
.schema(schema)
.managerDescription(managerDescription)
.databaseSettings(databaseSettings)
.build();
Time travel discoveries
DirectQuery will run discovery queries on the database to gather information about the last modification of the tables. In theory, these queries should be fast as they only request metadata. However, they can be slow on some external databases and can even time out. It is possible to set a specific timeout for the time-travel discovery queries to avoid this issue:
final BigqueryDatabaseSettings databaseSettings =
BigqueryDatabaseSettings.builder()
.timeTravelDiscoveryQueryTimeout(Duration.ofMinutes(1))
.build();
final DatabricksDatabaseSettings databaseSettings =
DatabricksDatabaseSettings.builder()
.timeTravelDiscoveryQueryTimeout(Duration.ofMinutes(1))
.build();
final SnowflakeDatabaseSettings databaseSettings =
SnowflakeDatabaseSettings.builder()
.timeTravelDiscoveryQueryTimeout(Duration.ofMinutes(1))
.build();
final Application app =
Application.builder(connector)
.schema(schema)
.managerDescription(managerDescription)
.databaseSettings(databaseSettings)
.build();
Data refresh
Unlike the datastore, external databases are not able to push data change events.
The user must notify the DirectQuery connector that a new version of the data is available.
There are two ways to do this:
- If you know the scope of your changes, you can use an Incremental Refresh.
- If you do not know the scope of the changes, you can trigger a full refresh.
app.fullRefresh();
A full refresh is a costly operation because it implies rebuilding all hierarchies and aggregate providers.
During these operations, the cube remains queryable.
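As a minimal sketch of this workflow (loadNewDataIntoExternalDatabase is a hypothetical placeholder for whatever external process updates the database; only fullRefresh comes from this page):

// 1. The external database is updated by some external process
//    (hypothetical placeholder, not part of the DirectQuery API).
loadNewDataIntoExternalDatabase();

// 2. Notify the DirectQuery connector that a new version of the data is available.
//    The cube remains queryable while hierarchies and aggregate providers are rebuilt.
app.fullRefresh();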
De-synchronization issue with non-time travel databases
The Atoti cube leverages database streams to maintain some versioned data structures:
- Hierarchy member tree
- Aggregate providers
Aggregate providers are used to store already computed primitive measures.
De-synchronization between Atoti data structures and the external database
On a database that does not support time travel, when the data changes and before the application is refreshed, the aggregate providers are not up to date.
Consequently, the measures computed from the primitive measures of the aggregate providers are outdated too.
However, measures that are not computed from aggregate provider measures (i.e. computed by a database query) are up to date.
This can lead to inconsistent results if you mix outdated and up-to-date measures.
For instance, let's take an initial version V1 and an update in the database changing the data to a new version V2.
Before the refresh is triggered in the Atoti application, the aggregate provider is still in state V1, but the database connector no longer has access to the V1 data.
The connector will return the data from the current state of the database (which is V2).
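To make this timeline concrete, here is a hedged sketch; the comments are illustrative and only fullRefresh comes from this page:

// Database at V1: aggregate providers and hierarchies are fed from V1.
// The external database is then updated to V2, outside of Atoti.
//
// Before refreshing:
//   - measures served by an aggregate provider still reflect V1,
//   - measures computed by a direct database query already reflect V2,
// so a query combining both can return inconsistent results.

// Refreshing re-feeds the aggregate providers and hierarchies from V2,
// removing the inconsistency.
app.fullRefresh();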
De-synchronization between Atoti data structures themselves
The previous problem can also occur between the different Atoti data structures themselves on databases that do not support time travel. For instance, since the queries feeding a specific aggregate provider and the queries feeding other data structures (other aggregate providers or hierarchies) are not executed at the exact same time, they can retrieve data from the external database that does not match if an update happened in between.
When starting up or refreshing an application, be careful not to update the data in the external database at the same time if time travel is not used.
Branching
DirectQuery databases do not support branching.