Snowflake

Time-travel support

Snowflake supports time-travel: past states of the data can be queried, so queries stay consistent even if the data in Snowflake is updated.
When creating a new epoch for the Database, DirectQuery automatically snapshots the current timestamp and uses it. Past snapshots are queried with Snowflake's AT(TIMESTAMP => ...) SQL syntax. Consequently, when historical data becomes unavailable in Snowflake, this is reflected in DirectQuery, and queries against those old versions fail. For views, Atoti retrieves all the underlying tables and uses their latest timestamp to query the view. The logger atoti.server.directquery.query_resolution.time_travel is useful to retrieve the queries generated specifically for time-travel.
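To make the time-travel clause concrete, here is a minimal sketch of the kind of statement involved. It is not the SQL that DirectQuery actually generates: the class name, the helper selectAt and the table SALES are hypothetical, and the timestamp literal assumes the session accepts ISO 8601 timestamps with an explicit offset.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustrative only: builds a Snowflake time-travel query for a pinned timestamp. */
final class TimeTravelQuerySketch {

  // ISO 8601 with milliseconds and an explicit offset, e.g. 2024-01-15T10:30:00.000+00:00
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSxxx");

  private TimeTravelQuerySketch() {}

  /** Returns a SELECT that reads the table as it was at the given snapshot instant. */
  static String selectAt(String table, Instant snapshot) {
    String literal = snapshot.atOffset(ZoneOffset.UTC).format(FORMAT);
    // AT(TIMESTAMP => ...) is Snowflake's time-travel clause
    return "SELECT * FROM " + table + " AT(TIMESTAMP => '" + literal + "'::TIMESTAMP_TZ)";
  }

  public static void main(String[] args) {
    // Hypothetical timestamp captured when an epoch was created
    Instant epochTimestamp = Instant.parse("2024-01-15T10:30:00Z");
    System.out.println(selectAt("SALES", epochTimestamp));
  }
}
```

Every query of a given epoch reuses the same timestamp, which is what keeps results consistent while the table keeps changing. Once the timestamp falls outside the table's retention period, Snowflake rejects the statement, which matches the failure described above.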

Vector support

Only multi-row vectors and multi-column vectors are supported with Snowflake. The array type available in Snowflake does not offer aggregation functions powerful enough to work with array-based vectors.

Gotchas

Nullable fields

Snowflake can define fields with nullable types. DirectQuery detects this and defines its internal model accordingly.
While this is not a problem in itself, it can conflict with the rule in Atoti cubes that all levels must be based on non-nullable values. This does not cause any functional issue, as the local model is updated behind the scenes and assigns a default value based on the type.
It does have a side effect on query performance: DirectQuery must convert null values to their assigned default values on the fly, and add extra conditions to joins to handle null values on both sides.
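The extra work described above can be pictured with the following sketch. It only illustrates the general SQL pattern (a default substituted for NULL, and a join condition that also matches two NULL keys). It is not the SQL DirectQuery emits, and the class name, the helper names and the default 'N/A' are assumptions made for the example.

```java
/** Illustrative only: the general shape of null handling in generated SQL. */
final class NullHandlingSketch {

  private NullHandlingSketch() {}

  /** Wraps a column so that NULL is read as the given SQL literal. */
  static String coalesce(String column, String defaultLiteral) {
    return "COALESCE(" + column + ", " + defaultLiteral + ")";
  }

  /** Builds a join condition that also matches rows where both keys are NULL. */
  static String nullSafeEquals(String left, String right) {
    return "(" + left + " = " + right
        + " OR (" + left + " IS NULL AND " + right + " IS NULL))";
  }

  public static void main(String[] args) {
    // Hypothetical column and table aliases
    System.out.println(coalesce("t.CURRENCY", "'N/A'"));
    System.out.println(nullSafeEquals("t.PRODUCT_ID", "p.PRODUCT_ID"));
  }
}
```

Both expressions add work per row compared with a plain column reference or a plain equality, which is why declaring columns NOT NULL in Snowflake, where the data allows it, avoids this overhead.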

Connection pool configuration

DirectQuery uses the HikariCP connection pool to manage connections to Snowflake. This pool can be tuned by writing a configuration file and providing its path through the system property hikaricp.configurationFile (via -Dhikaricp.configurationFile=path/to/hikari.properties). Not all parameters can be set this way, as some are set directly by DirectQuery. The file uses the key=value properties format, for example:

maximumPoolSize=31
idleTimeout=15000
minimumIdle=17

Pool size

maximumPoolSize: if not set, the pool size is configured automatically by the DirectQuery connector, based on the number of CPU cores available.

Connection timeout

idleTimeout: the maximum time, in milliseconds, a connection can stay idle in the pool before being closed. This parameter is not set by DirectQuery, so the HikariCP default (600000, i.e. 10 minutes) applies.

Minimum idle connections

minimumIdle: the minimum number of idle connections that the pool tries to maintain. This parameter is not set by DirectQuery, so the HikariCP default (the same value as maximumPoolSize) applies.
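Before pointing hikaricp.configurationFile at a file, it can help to check that the values are coherent. The sketch below is a hypothetical helper, not part of DirectQuery or HikariCP: it parses the file content with java.util.Properties and checks that minimumIdle does not exceed maximumPoolSize, since an idle floor above the pool ceiling is not meaningful.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative sanity check for a hikari.properties file; not part of DirectQuery. */
final class HikariFileCheckSketch {

  private HikariFileCheckSketch() {}

  /** Parses properties-file content (key=value lines) into a Properties object. */
  static Properties parse(String content) {
    Properties properties = new Properties();
    try {
      properties.load(new StringReader(content));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return properties;
  }

  /**
   * Returns false only when both values are present and minimumIdle exceeds
   * maximumPoolSize. A missing maximumPoolSize is accepted, because DirectQuery
   * then sizes the pool itself.
   */
  static boolean isConsistent(Properties properties) {
    String max = properties.getProperty("maximumPoolSize");
    String minIdle = properties.getProperty("minimumIdle");
    if (max == null || minIdle == null) {
      return true;
    }
    return Integer.parseInt(minIdle.trim()) <= Integer.parseInt(max.trim());
  }

  public static void main(String[] args) {
    // Same values as the sample file shown earlier in this section
    Properties sample = parse("maximumPoolSize=31\nidleTimeout=15000\nminimumIdle=17\n");
    System.out.println("consistent: " + isConsistent(sample));
  }
}
```

In a real setup the content would be read from the path given to -Dhikaricp.configurationFile rather than from an inline string.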

Feeding warehouse

On modern distributed cloud databases such as Snowflake, a key design point is the separation of compute and storage. This makes it possible to use different computing resources to process queries, and brings the flexibility to scale resources (and the associated bill!) up and down. On Snowflake, the compute resources are called warehouses. A bigger warehouse processes queries faster but also costs more. The load that a DirectQuery application puts on the external database follows a particular pattern:

At application startup and refresh, many queries are run on the database to initialize the cube and cache data in it (the hierarchies and aggregate providers).
After the initial feeding, if the aggregate providers have been chosen wisely, most user queries hit the cached data and are never actually run on the external database. Additionally, the queries that do hit the external database should mostly be drill-downs with a very reduced scope, which are easier for Snowflake to handle.

To get a faster startup time, it is recommended to use bigger computational resources during that phase. Afterward, to save on cost, the computational resources can be scaled down. This can be done in the Snowflake UI, but DirectQuery also provides a way to define two warehouses for your application:

one powerful warehouse, used during initial feeding and refresh, to guarantee faster startup and refresh times
a regular warehouse, to serve the queries after the initial feeding in a cost-efficient manner

After the initial feeding, the large warehouse becomes idle and is automatically shut down if it is configured to auto-suspend. The feeding warehouse can be defined as follows:

final SnowflakeProperties properties =
    SnowflakeProperties.builder()
        .connectionString(connectionString)
        .warehouse("SMALL_AND_CHEAP_WAREHOUSE_NAME")
        .feedingWarehouse("BIG_AND_EXPENSIVE_WAREHOUSE_NAME")
        .additionalOption(SFSessionProperty.PASSWORD.getPropertyKey(), "your-plain-password")
        .build();

You can then pass these properties as an argument when creating the connector:

// Pass the SnowflakeProperties instance built above
final DirectQueryConnector<?> connector =
    SnowflakeConnectorFactory.INSTANCE.createConnector(properties);

If it is not defined, the regular warehouse will be used for the initial feeding.
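The feeding warehouse only shuts down by itself if auto-suspend is enabled on the Snowflake side. As a rough sketch, the helper below builds the corresponding Snowflake statement. The class and method names are hypothetical, the 60-second delay is an arbitrary example value, and running the statement (through the Snowflake UI or any SQL client) requires sufficient privileges on the warehouse.

```java
/** Illustrative only: builds the Snowflake statement that enables auto-suspend. */
final class WarehouseAutoSuspendSketch {

  private WarehouseAutoSuspendSketch() {}

  /**
   * Returns an ALTER WAREHOUSE statement that suspends the warehouse after the
   * given number of idle seconds and lets it resume when a query arrives.
   */
  static String autoSuspendStatement(String warehouse, int idleSeconds) {
    if (idleSeconds <= 0) {
      throw new IllegalArgumentException("idleSeconds must be positive");
    }
    return "ALTER WAREHOUSE " + warehouse
        + " SET AUTO_SUSPEND = " + idleSeconds
        + " AUTO_RESUME = TRUE";
  }

  public static void main(String[] args) {
    // Same placeholder warehouse name as in the properties example above
    System.out.println(autoSuspendStatement("BIG_AND_EXPENSIVE_WAREHOUSE_NAME", 60));
  }
}
```

AUTO_RESUME matters here because the feeding warehouse is used again on each refresh, not only at startup.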
