Skip to main content

Time-Travel Emulation

Emulated time-travel is a mechanism to solve de-synchronization issues, as explained in the versioning documentation. However, unlike native time-travel, some additional set-up is required on the external database itself. It also makes the queries more complex and slower (typically about 10-30 %).

info

This feature is experimental, and requires this option to be enabled -Dactiveviam.feature.experimental.emulated_time_travel.enabled=true.

External database set-up

The general idea is to add some column(s) in each table requiring time-travel; these column(s) contain meta-data about the row validity.

The data type of these columns is left to the user's choice, but needs to support comparison operators in SQL. Natural ones can be timestamp or integer.

danger

The data type of these columns needs to be the same across all the tables in the database.

Table definition

General case

For each table requiring time-travel, the user needs to add 2 columns, let's call them valid_from and valid_to. These columns hold the range of versions for which the row is valid, where valid_from is inclusive and valid_to exclusive.

Append-only case

For append-only tables, i.e. no updates or deletes are allowed, the user needs to add 1 column, let's call it valid_from. This column states from which version the row is valid, where valid_from is inclusive.

tip

If your table already contains an auto-increment column, then you can use it as valid_from.

Data management

Current version

It is the version to be used for a given transaction, and needs to be increasing over each transaction.

If the data type is a timestamp, then the current version is the timestamp of the transaction.

danger

If you use a timestamp type and an external database with several servers, clock synchronization might be an issue for frequent transactions, as the versions might not be increasing (for instance a transaction handled by a server with a late clock might have a timestamp before an earlier transaction done by a different server).

If the data type is an integer, then a counter needs to be kept to give the current version and incremented after each transaction.

Note that for integers, the current version can be different across different tables.

Insertion

During insertion, the valid_from column is set to the current version and the valid_to column is set to the default value for valid rows. This default value can be set as a maximum value (e.g. 2099-12-31T23:59:59) or null. The first choice might be less readable but is more efficient for queries as it avoids the need to check for null values.

For example, let's consider a table with the following rows:

IdDescriptionvalid_fromvalid_to
1First row12,147,483,647
2Second row12,147,483,647

The current version is 2, so after adding a new row, the table is:

IdDescriptionvalid_fromvalid_to
1First row12,147,483,647
2Second row12,147,483,647
3New row22,147,483,647
info

For an append-only table, everything is the same, except there is no valid_to column.

Time travel emulation methodRecord insertionRecord updateRecord deletion
Valid_fromtickcrosscross
Valid_from and valid_totickticktick

When running a query at version 1, the first 2 rows are valid and returned (the last one is filtered out), but when running a query at version 2 then all rows are valid and returned.

With this mechanism, the equivalent data at version 1 is:

IdDescription
1First row
2Second row

And at version 2:

IdDescription
1First row
2Second row
3New row

Deletion

When a row is being deleted, the valid_to column of the deleted row is set to the current version.

For example, let's consider a table with the following rows:

IdDescriptionvalid_fromvalid_to
1First row12,147,483,647
2Second row12,147,483,647

The current version is 2, so after deleting the second row, the table is:

IdDescriptionvalid_fromvalid_to
1First row12,147,483,647
2Second row12

When running a query at version 1, both rows are valid and returned, but when running a query at version 2 then only the first row is valid and returned, the second row is filtered out.

With this mechanism, the equivalent data at version 1 is:

IdDescription
1First row
2Second row

And at version 2:

IdDescription
1First row

Update

When a row is being updated, the valid_to column of the old row is set to the current version and a new row is inserted with the valid_from column set to the current version.

For example, let's consider a table with the following rows:

IdDescriptionValuevalid_fromvalid_to
1First row100.012,147,483,647
2Second row200.012,147,483,647

The current version is 2, so after updating the first row, the table is:

IdDescriptionValuevalid_fromvalid_to
1First row100.012
2Second row200.012,147,483,647
1First row150.022,147,483,647

When running a query at version 1, the first 2 rows are valid and returned (last row is filtered out), but when running a query at version 2 then only the last 2 rows are valid and returned (first row is filtered out).

With this mechanism, the equivalent data at version 1 is:

IdDescriptionValue
1First row100.0
2Second row200.0

And at version 2:

IdDescriptionValue
1First row150.0
2Second row200.0

Data cleaning

For tables with many updates or deletions, there could be many rows which are no longer valid. To keep the size from increasing indefinitely, it is recommended to clean the data from time to time, by dropping all the rows with valid_to less than a given value (e.g. a timestamp one week in the past).

Current version discovery

Similar to versioning for databases with native time-travel, the current version for each table is queried from the external database (by doing a max over valid_from and valid_to columns). Then this version is used to filter out invalid rows, and this ensures the result of a given query is unchanged, even after a table is updated.

This discovery runs at creation of each new Atoti version.

Atoti set-up

The examples below are shown for ClickHouse, but the code is identical for other databases.

An object EmulatedTimeTravelDescription needs to be created as follows:

final EmulatedTimeTravelDescription timeTravelDescription =
EmulatedTimeTravelDescription.builder()
.emptyTableVersion(0)
.validRowToColumnValue(Integer.MAX_VALUE)
.build();

This object contains the following parameters:

  • emptyTableVersion (required): the version to be inferred for empty tables. A discovery query to an empty table is unable to give any information, while a version is required to filter out rows added later on. Anything less than the current version is acceptable (e.g -1 for integers, or 1970-01-01T00:00:00).

  • validRowToColumnValue (optional - default: null): the value of the valid_to column for valid rows. It could be null or a maximum value (e.g. 2099-12-31T23:59:59). A maximum value is more efficient for queries as it avoids the need to check for null values.

Then this objet should be added to the database settings, along with the optional parameter timeTravelDiscoveryQueryTimeout (which defaults to 1 minute):

final ClickhouseDatabaseSettings databaseSettings =
ClickhouseDatabaseSettings.builder()
.emulatedTimeTravelDescription(timeTravelDescription)
.timeTravelDiscoveryQueryTimeout(Duration.ofMinutes(10))
.build();

Finally, for each table with emulated time-travel, a TableTimeTravelDescription needs to be added to its table description:

final TableDescription salesTableWithTimeTravel =
salesTable.toBuilder()
.emulatedTimeTravel(
EmulatedTimeTravelDescription.TableTimeTravelDescription.table(
"valid_from", "valid_to"))
.build();