
Split granular data between in-memory datastore and DirectQuery database

Deploying a 100% DirectQuery application is a valid use case; however, it is often a good idea to keep part of the granular data in-memory. This page explains when and how.

Composite DirectQuery and in-memory database

The schema can contain both DirectQuery tables and in-memory Datastore tables. However, it's important to note that joins between DirectQuery tables and in-memory tables are not possible. In other words, the in-memory tables can only exist as isolated stores, not linked directly to the base table.

These isolated stores can be used in measures, for instance through a Copper join or through get-by-key queries. Keeping this data in-memory avoids a round-trip to the external database (including network calls) when computing the measures. For instance, if a measure runs get-by-key queries for many locations, a local datastore can make the query much faster.
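For illustration, here is a minimal sketch of a measure enriched from an isolated in-memory store through a Copper join. The "FxRates" store, its "Currency" and "Rate" fields, the "Currency" level and the "pnl" field are all hypothetical, and the exact Copper entry points (and their imports, omitted here) may differ between versions.

```java
// Hypothetical "FxRates" isolated store joined to the cube on the Currency level.
// Exact Copper calls may vary between Atoti Server versions.
public static void defineMeasures(final ICopperContext context) {
  final CopperStore fxRates = Copper.store("FxRates")
      .joinToCube()
      .withMapping("Currency", "Currency"); // store field -> cube level

  // The rate lookup is served from memory: no round-trip to the external database.
  Copper.sum("pnl")
      .multiply(fxRates.field("Rate"))
      .as("pnl.EUR")
      .publish(context);
}
```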

Hybrid distributed mode

A classic use case for DirectQuery is giving access to the historical data of an in-memory Atoti application. Historical data may be too large to fit entirely in-memory, or too costly to keep in memory for "cold" data that users rarely access.

To achieve this, Atoti DirectQuery nodes can run alongside Atoti in-memory nodes in a distributed set-up, exposing a unified and consistent data model to end users.

DirectQuery hybrid distributed model

Note that you don't have to redefine your entire project to add a DirectQuery node: since DirectQuery connectors and the Datastore are both abstracted behind the IDatabase interface, the same cube definitions (measures, hierarchies, ...) can be reused for all the nodes.
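As a rough sketch (the class and helper names below are illustrative placeholders, not Atoti APIs), the cube configuration can be written once against IDatabase and applied to every node:

```java
// Hypothetical sketch: one cube description, written against IDatabase only,
// reused by the in-memory ("hot") node and the DirectQuery ("cold") node.
public class SharedCubeConfig {

  /** Same hierarchies and measures, whatever backs the database. */
  public void configureCube(final IDatabase database) {
    defineHierarchies(database); // your existing hierarchy definitions
    defineMeasures(database);    // your existing Copper measures
  }

  public void configureAllNodes(final IDatabase inMemoryDatabase,
                                final IDatabase directQueryDatabase) {
    configureCube(inMemoryDatabase);    // node backed by the in-memory Datastore
    configureCube(directQueryDatabase); // node backed by the external warehouse
  }

  private void defineHierarchies(final IDatabase database) { /* project-specific */ }

  private void defineMeasures(final IDatabase database) { /* project-specific */ }
}
```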

A usual set-up to split "hot" and "cold" data is to use a date hierarchy:

  • the recent dates are put in-memory because they will receive a lot of queries and might have frequent real-time updates.
  • the older dates are kept in an external data warehouse and accessed via DirectQuery because they are queried less frequently and receive rare updates. Part of this data can be pre-aggregated to provide a quick historical overview; a simple date-cutoff sketch follows the list.
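For example, the split can be expressed as a plain date cutoff. The table and column names below ("TRADES", "TRADE_DATE") and the 7-day window are hypothetical and stand in for your own warehouse schema:

```java
import java.time.LocalDate;

public class HotColdSplit {
  public static void main(String[] args) {
    // Hypothetical cutoff: the last 7 days are considered "hot".
    final LocalDate cutoff = LocalDate.now().minusDays(7);

    // "Hot" scope: recent dates, extracted from the warehouse and loaded
    // into the Datastore of the in-memory nodes.
    final String hotScope = "SELECT * FROM TRADES WHERE TRADE_DATE >= '" + cutoff + "'";

    // "Cold" scope: older dates, left in the warehouse and exposed through
    // the DirectQuery nodes (optionally backed by pre-aggregated tables).
    final String coldScope = "SELECT * FROM TRADES WHERE TRADE_DATE < '" + cutoff + "'";

    System.out.println(hotScope);
    System.out.println(coldScope);
  }
}
```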

DirectQuery hybrid data distribution

These two parts of the data are stored in different places:

DirectQuery hybrid data distribution

With such a setup, the size of the in-memory data can stay reasonably small as most of the data stays in the external database. Moreover, queries will be fast enough for interactive analysis:

  • The majority of queries will be on recent dates and hit the in-memory nodes.
  • Queries on historical trends will hit the partial aggregate providers of the DirectQuery nodes, which are kept in-memory.
  • When users drill down into historical data, the queries are sent to the external database, but at that point the scope should be filtered enough (one book on one date, for instance) for the warehouse to answer quickly.