What-If

The What-If feature allows users to simulate changes to a subset of their data without modifying the production data. For instance, a user might want to change risk data and configuration parameters to evaluate their impact on the capital.

As capital is a non-linear calculation, it can be difficult to predict the impact of even a small change with a thought experiment alone. The user then needs to recompute the capital with the modified data to evaluate the impact of the changes.

What-If allows the user to visualize this impact by creating an experimental version, called a branch, and specifying the required data changes. The production data (the master branch) remains intact, so other users still see the production data. For the user experimenting with "what-if" changes, switching back to production data is simple, and the results of the experiment can be shared easily.

What-If is also available in a distributed setup.

Note that this feature does not provide distributed transactions, which would allow you to create a branch from a query cube and commit data to each data cube.

Principle

Starting a simulation in a data cube creates a new branch. If the data cube is connected to a cluster with a query node, the query node detects the new branch and also creates a new version, which targets the branch with the corresponding name in the data cube. If another data cube creates another scenario, another branch appears in the query cube, and so on. The query cube's branches are the union of all the data cubes' branches.

To achieve this, the Distribution layer creates a new proxy data node for each branch in the data cube, and exposes all the proxy data nodes to the query cubes.
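
As an illustration of the union rule described above, here is a minimal sketch in plain Java. It does not use the product API: the class `QueryCubeBranches`, the method `union`, and the cube and branch names are hypothetical and only model the behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not the product API: computes the branches a query cube
// would see, given the branches declared by each data cube.
final class QueryCubeBranches {

    /** Returns the sorted union of the branch names exposed by every data cube. */
    static Set<String> union(Map<String, List<String>> branchesByDataCube) {
        Set<String> all = new TreeSet<>();
        for (List<String> branches : branchesByDataCube.values()) {
            all.addAll(branches);
        }
        return all;
    }

    public static void main(String[] args) {
        // Two assumed data cubes, each with its own simulation branch.
        Map<String, List<String>> branches = Map.of(
            "A", List.of("master", "scenario"),
            "B", List.of("master", "test"));
        System.out.println(union(branches)); // prints [master, scenario, test]
    }
}
```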

Please be aware that this induces more network traffic within your application.

Querying for Distributed Branches

A query cube that has detected several branches applies the following logic when treating a query:

  • if the branch (proxy data node) exists in a data cube, the query is performed on that branch for that data cube.
  • if the branch does not exist in a data cube, the query is performed on the master branch of that data cube.
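
The fallback rule above can be sketched as a small function. This is only a model of the described logic, not the actual implementation: `BranchResolution`, `resolve`, and the `MASTER` constant are hypothetical names.

```java
import java.util.Set;

// Hypothetical model of the per-data-cube branch selection described above.
final class BranchResolution {
    static final String MASTER = "master";

    /**
     * Returns the branch a data cube answers on: the requested branch if the
     * data cube has it, the master branch otherwise.
     */
    static String resolve(Set<String> dataCubeBranches, String requestedBranch) {
        return dataCubeBranches.contains(requestedBranch) ? requestedBranch : MASTER;
    }

    public static void main(String[] args) {
        Set<String> cubeA = Set.of("master", "scenario");
        Set<String> cubeB = Set.of("master", "test");
        System.out.println(resolve(cubeA, "scenario")); // prints scenario
        System.out.println(resolve(cubeB, "scenario")); // prints master
    }
}
```

The resolution is applied independently to each data cube, so a single query can read a simulation branch from one data cube and the master branch from another.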

For example, query cube Q has two underlying data cubes: A and B. Data cube A defines the branches master and scenario. Data cube B has branches master and test. They contain the following data:

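| Currency    | Cube A master | Cube A scenario | Cube B master | Cube B test |
|-------------|---------------|-----------------|---------------|-------------|
| Grand total | 12            | 13              | 22            | 1           |
| BTC         | 1             | 1               |               |             |
| EUR         | 1             | 3               | 6             | 1           |
| USD         | 2             | 4               |               |             |
| JPY         |               | 5               |               |             |
| GBP         | 8             |                 | 7             |             |
| CHF         |               |                 | 9             |             |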

Running the query for branch scenario on the query cube results in the following table:

| Currency    | Measure |
|-------------|---------|
| Grand total | 35      |
| BTC         | 1       |
| EUR         | 9       |
| USD         | 4       |
| JPY         | 5       |
| GBP         | 7       |
| CHF         | 9       |

The result can be explained as follows:

  • EUR is modified in branch scenario of data cube A (3 instead of 1), so the result (9) is the aggregation of this modified value with the unmodified value (6) from the master branch of data cube B.
  • JPY, present only in data cube A for branch scenario, appears in the result. This is expected: if the scenario models the company acquiring some JPY, the currency must appear in the query result.
  • GBP has a value of 7, which comes only from the master branch of data cube B, because GBP does not exist in data cube A for the branch scenario.
  • BTC's value did not change in the branch scenario, so the result's value stays the same.
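
The arithmetic behind this result can be checked with a short sketch. It assumes the per-branch values read from the example table above and models each data cube as a plain map. `DistributedBranchQuery`, `query`, `CUBE_A`, and `CUBE_B` are hypothetical names, not part of the product API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: each data cube is a map of branch name -> (currency -> value).
final class DistributedBranchQuery {

    static final Map<String, Map<String, Integer>> CUBE_A = Map.of(
        "master", Map.of("BTC", 1, "EUR", 1, "USD", 2, "GBP", 8),
        "scenario", Map.of("BTC", 1, "EUR", 3, "USD", 4, "JPY", 5));

    static final Map<String, Map<String, Integer>> CUBE_B = Map.of(
        "master", Map.of("EUR", 6, "GBP", 7, "CHF", 9),
        "test", Map.of("EUR", 1));

    /** Sums, per currency, the values of each data cube on the requested branch, or on master if absent. */
    static Map<String, Integer> query(List<Map<String, Map<String, Integer>>> cubes, String branch) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Map<String, Integer>> cube : cubes) {
            Map<String, Integer> data = cube.getOrDefault(branch, cube.get("master"));
            data.forEach((currency, value) -> result.merge(currency, value, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = query(List.of(CUBE_A, CUBE_B), "scenario");
        System.out.println(result); // prints {BTC=1, CHF=9, EUR=9, GBP=7, JPY=5, USD=4}
        int total = result.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("Grand total: " + total); // prints Grand total: 35
    }
}
```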

Configuration

To activate the epoch dimension in a query node, enable it in the query cube description:

    builder
        .asQueryCube()
        .withClusterDefinition()
            ...
        .end()
        .withEpochDimension()
        ...

The description looks like this:

    return StartBuilding.cube(CUBE_NAME)
        .withContributorsCount()
            .withinFolder("Native_measures")
            .withAlias("Distributed_Count")
            .withFormatter(INT_FORMATTER)
        .withUpdateTimestamp()
            .withinFolder(NATIVE_MEASURES)
            .withAlias("Distributed_Timestamp")
            .withFormatter(TIMESTAMP_FORMATTER)
        .asQueryCube()
        .withClusterDefinition()
            .withClusterId(CLUSTER_ID)
            .withMessengerDefinition()
                .withProtocolPath("jgroups-protocols/protocol-tcp.xml")
            .end()
            .withApplication(APPLICATION_ID)
                .withDistributingFields("AsOfDate")
        .end()