ClickHouse Database
This section explains how to connect to a ClickHouse Database.
Overall Sequence
To connect to a remote Database using DirectQuery you need to complete these steps:
- Configure your remote Database in FRTB's expected Database format
- Set FRTB configuration properties
- Deploy FRTB in Horizontal Distribution
- One JVM which consists of a query node and in-memory data node
- Another JVM which consists of a DirectQuery data node
Database Schema
Your remote Database must be configured in the same format as FRTB's in-memory datastores. This means the same Tables and Field Types will need to be replicated. You can do this by defining the Tables as outlined in the Database documentation or by exposing Views on your Database in the same expected Database format.note
All of the Database Tables must be present either as actual Tables or Views on your connected Database.
Required DirectQuery Properties
The following properties must be configured to get started with DirectQuery.Maven Profile
The frtb-directquery module is not included in the classpath by default. You will need to enable thedirect-query
Maven Profile to add the frtb-directquery
module into the classpath. This will need to be completed when building the JAR file.
Disable IMA
For this preview implementation, only SA is supported when running with DirectQuery. To disable IMA, add the following property either to the frtb.properties file or as a command-line argument with the prefix -D as shown:frtb.properties:
ima.disable=true
Command-line argument:
-Dima.disable=true
ClickHouse Configuration
The application.yaml file contains specific properties for enabling and connecting to a remote Database.# ClickHouse Database Connection Parameters
directquery:
enabled: false
database:
type: clickhouse
clickhouse:
username: CLICKHOUSE_USERNAME
password: CLICKHOUSE_PASSWORD
port: PORT
hostName: HOST
database: DATABASE
schema: DATABASE_SCHEMA
The above properties populate the ClickhouseSpringProperties
java class.
username
Full property name: directquery.database.clickhouse.username
The username associated with the database.
password
Full property name: directquery.database.clickhouse.password
The password associated with the username.
port
Full property name: directquery.database.clickhouse.port
The port where the ClickHouse instance is located.
hostName
Full property name: directquery.database.clickhouse.hostName
The name of the host where the ClickHouse application is running.
database
Full property name: directquery.database.clickhouse.database
The name of the ClickHouse Database.
schema
Full property name: directquery.database.clickhouse.schema
The name of the ClickHouse Schema to use.
Deployment Options
We provide two main options to run with DirectQuery:
Option | How to run FRTB |
---|---|
Operate with some data loaded in-memory and the rest available through DirectQuery | In Horizontal Distribution with in-memory Database |
Run purely on DirectQuery remote data | In a Single JVM on DirectQuery only |
Horizontal Distribution
In Horizontal Distribution you have access to the in-memory tools such as What-If, data updates and Sign-Off for the data loaded in-memory as well as access to a large number of historical dates.
When running in Horizontal Distribution you need to run two JVMs:
note
When running in horizontal distribution, the StandardisedApproachCube will only contain data from the in-memory data node. To view the DirectQuery data, you will have to use the CombinedCube. This is because the JVM with DirectQuery data will get attached to the query node under the CombinedCube and thus the DirectQuery data is only visible in the CombinedCube.
initial-load.business-dates
For both JVMs, set the initial-load.business-dates
property located inside frtb.properties
.
Given that the data nodes are distributed by AsOfDate, no two data nodes can contain the same Partition -
meaning that one AsOfDate cannot be present in another node. So to prevent any issues the initial-load.business-dates
property is used to set which dates
to load into the in-memory data node, while also specifying to the DirectQuery data node which AsOfDates to exclude.
note
This property must be provided with the same values to both data nodes.
JVM With Query Node And In-Memory Data Node
This JVM consists of two distributed nodes; a query node and an in-memory data node.
Start the JVM by specifying the following parameters to the FRTB application, either in a .properties file or through command-line
arguments (add -D
before each property).
# Disable IMA
ima.disable=true
# Set FRTB to run as a query node and data node
spring.profiles.active=forceNetty
activeviam.distribution.gossip.router.enable=true
activeviam.distribution.gossip.router.port=16484
JVM With DirectQuery Data Node
Once the JVM with the query and in-memory data node is running, you can start a second JVM with the DirectQuery data node with the following configuration
properties either in a .properties file or through command-line arguments (add -D
before each property).
# Set FRTB to run as a data node and to connect to the second JVM
spring.profiles.active=dist-data-node
# Use a different port than the second JVM
server.port=8081
# Use a different Content Server for this JVM
content-service.db.url=jdbc:h2:mem:content_service;DB_CLOSE_DELAY=-1
content-service.db.hibernate.hbm2ddl.auto=create
# Disable IMA in order to use DirectQuery
ima.disable=true
# Enable and configure the ClickHouse DirectQuery Database
directquery.enabled=true
directquery.database.type=clickhouse
directquery.database.clickhouse.username=database_username
directquery.database.clickhouse.password=database_password
directquery.database.clickhouse.port=port
directquery.database.clickhouse.hostName=host_name
directquery.database.clickhouse.database=database
directquery.database.clickhouse.schema=database_schema
Single JVM
You can run a single JVM consisting of only DirectQuery.
By running in a single JVM with DirectQuery only, you can now see the DirectQuery data in the StandardisedApproachCube.
Properties
To run the Single JVM node we will only need to configure the ClickHouse Properties and Disable IMA and run the application as normal. There are no distributed nodes to configure when running in a single JVM.
Here is an example of the properties to use:
# Disable IMA in order to use DirectQuery
ima.disable=true
# Enable and configure the ClickHouse DirectQuery Database
directquery.enabled=true
directquery.database.type=clickhouse
directquery.database.clickhouse.username=database_username
directquery.database.clickhouse.password=database_password
directquery.database.clickhouse.port=port
directquery.database.clickhouse.hostName=host_name
directquery.database.clickhouse.database=database
directquery.database.clickhouse.schema=database_schema
Single JVM Limitations
By running with a single JVM we will be running purely on DirectQuery. This will come at the compromise of query performance of Trade level queries while also not having access to in-memory only tools such as WhatIf and SignOff.
note
Running both an in-memory data node and DirectQuery data node under a single JVM is not currently supported.