Skip to main content

DirectQuery Local Cache

This page describes the DirectQuery local cache. It is an advanced feature of Atoti DirectQuery which is used to maintain speed of queries by using a local cache for data.

To learn how to use it, check the how to create and use a local cache.

Introduction to DirectQuery Local Cache

Data from an external database is queried by Atoti using DirectQuery. Many Atoti projects have side stores on which they perform Get-by-key queries or List queries to retrieve data in a post-processor (for instance Market Data tables in Market Risk Solution). Several queries can be done on different coordinates in the same MDX query. This triggers multiple queries on the external database. Saturation can occur if thousands or millions of queries are sent to the external database. When this happens, the global MDX query becomes slow.

get-by-key queries without cache

One solution would be to load the whole side table in memory using the composite database, but often these tables are way too large to fit in memory.

The DirectQuery local cache stores data previously retrieved from the external database and makes it available for subsequent queries. This eliminates the need to further query the external database for this data.

get-by-key queries with cache

How the local cache works

  1. A query is used to query the external database
  2. Atoti checks if the data is available in the local cache before retrieving data from the external database.
  3. If the data is already in the local cache, it is used for the query.
  4. If the data is not already in the local cache, a "part" of external table containing the needed data is downloaded into the local cache
  5. Any data in the local cache is available to use for subsequent queries of the external database.
  6. The cache is emptied when it reaches a specified size.

Cache partition

A "part" of the data which can be downloaded as a single block is called a "cache partition". For instance, a simple way to partition the cache can be per date, each date being one partition of the cache.

Cache partitioning 1 dimension on date

A table can also be partitioned on multiple fields for smaller partitions, for instance on date and market dataset. Each combination of date and dataset will be one partition of the cache.

Cache partitioning 2 dimensions on date and dataset

The way to partition the cache is defined in the data model for each cached table.

One important constraint on the partitioning is that the fields used to partition the table must be included in the key fields of the table. This is required so Atoti can infer the cache partition based on the key in a get-by-key query.

Partitioning fields considerations

It is important to choose the correct way to partition the cache. The partitions must be

  • small enough to download quickly and light enough to fit in-memory
  • but large enough to match all the queries that will be sent to the external database in a single MDX query. This sweet spot is different for each project and depends mostly on the nature of the "normal" queries, but also on the size of the table, the number of queries sent, the number of users, etc. The partitions are based on the queries sent to the external database.

Retrieving data from the local cache

The data in the local cache is used by:

  • All get-by-key queries
  • List queries if the cache partition can be inferred from the query's condition
  • Distinct queries if the cache partition can be inferred from the query's condition

Only the queries that are on the fields available in the cache will use it. For instance, if the cache is on one table, a List query retrieving a joined field from another table will not use the cache.

The performance improvements come from 2 factors:

  • The whole cache partition is downloaded in a single query, which is much faster than sending multiple queries to the external database.
  • The local cache is stored even after the query, so the next queries on the same partition will be answered from the local cache directly without going to the external database.

Cache capacity and eviction

The cache is emptied when it reaches a certain capacity. The oldest partitions in the local cache are removed when a threshold is exceeded. There are two available politics to determine when to empty the cache:

  • Number of partitions
  • Number of rows of data

The policy must be correctly chosen to avoid having too many partitions in the cache (which would saturate the memory) but also to have enough room for multiple users to query different partitions at the same time.