Skip to main content

Joining isolated stores in Copper

When building an ActivePivot application, it is common practice to only have a portion of the data available in the datastore added to the hierarchies/levels of the cube.
You may want to access such data, especially for measures. This can be done automatically when the requested data is included in the selection of the Cube. However, some data can be added to the datastore and be isolated from the Selection of a cube.
The Copper API provides a comprehensive mechanism to define and publish measures/hierarchies using the records from any store of a datastore.

We advise you to read the following article of the Copper documentation before proceeding as it contains important concepts relative to the Copper API.

What does "Join operation" mean in Copper?

Join operations in copper provide a high-level abstraction for accessing data from any store of the datastore. This allows you to define a relationship between the members of one or more levels of the cube and the records of the datastore's fields.
Once the relationship between the cube and the datastore is defined, it is possible to define both single-level hierarchies and measures from the data obtained from the isolated store.

Let's consider this simple case:

  • The cube contains the User hierarchy, with the "users" level.
  • The datastore contains a hobbies store, which is isolated from the selection and contains hobby related data:
User/users
ALL
marion
nathan
roger
lukas
hannah
userName (K)hobbybudgetcategory
mariondance1.0art
rogertennis2.0sport
nathantennis2.0sport
lukaspiano4.0art
romainpiano4.0art

Using join operations with Copper allows you to define and use measures and/or hierarchies based on the records of the hobbies store, almost identically to standard measures.

Defining a join operation

Before creating measures and/or hierarchies based on data coming from records of isolated stores, it is necessary to define the relation between the ActivePivot cube and the joined stores: All the information regarding a join operation is defined in a CopperStore object, which is reused later for defining our measures and hierarchies.

The entry point of the Copper API to create a CopperStore is the following call:

final CopperStore store = Copper.store("joinedStoreName").joinToCube();

This code above defines a join operation with a default join type and without mapping. We'll explain the meaning of the italicized terms in the next sections.

As mentioned before, a CopperStore contains all the required information to perform join operations.

For a given Pivot instance, a join Operation will be identified by the name of joined store, the type of join and the cube-to-store mapping.

Copper Join Types

In terms of classic relational databases, JOIN clauses are often prefixed with types that modify the predicate required for the query to return a result.

The Copper API currently supports INNER and LEFT JoinType joins.

The default behavior of the Copper API is the INNER JOIN, but the Join Type performed by a join operation can be changed in the declaration of the CopperStore:

final CopperStore store = Copper.store("joinedStoreName").joinToCube(JoinType type);

Supported JOIN TYPES :

  • INNER JOIN : The join operation returns a result IF there are matching records (according to the operation's mapping) in both the datastore and the joined levels.

  • LEFT JOIN : The join operation returns a result if there are matching records (according to the operation's mapping) in the datastore and the joined levels OR if a location does not have a matching record in the datastore. In that case, the join operation on this specific location returns a value corresponding to the default value of the joined store's field. You can also specify a specific default value for the returned field by using the withDefaultValue() method.

Join mapping

Whe defining Join operations in classic database systems, such as SQL, you must specify the ON clause in order to declare between which columns the join is performed.

A similar operation can be done in the copper API by performing calls of the withMapping(String fieldName) method on a CopperStore.
The returned CopperStore corresponds to a Join operation with an ON clause between the fieldName fields of the joined store and the fieldName level of the Cube.
If the joined fields and the joined cube levels do not have the same name, you can call one of the other signatures of the withMapping() function:

CopperStore storeWithMapping = store.withMapping("fieldAndLevelName");

CopperStore storeWithMapping = store.withMapping("fieldName", "levelName");

CopperStore storeWithMapping = store.withMapping("fieldName", Copper.level("levelName"));

The Copper API support chained calls on this method, meaning the joined store can be joined to the cube data through multiple fields at once.

Looking at the example we mentioned earlier, the natural choice for a join mapping would be to map the "users" level to the "userName" field of the store, with an INNER type join.the

CopperStore storeWithMapping = Copper.store("hobbies")
.joinToCube()
.withMapping("userName", "users");

The defined copper store contains the join operation between the cube and the isolated store:

User/users
ALL
marion
nathan
roger
lukas
hannah
userName (K)hobbybudgetcategory
mariondance1.0art
rogertennis2.0sport
nathantennis2.0sport
lukaspiano4.0art
romainpiano4.0art

We see that the [hannah] member from the cube does not have a corresponding record in the records, and that the datastore entry "Romain" does not correspond to a location in the cube.
Since we're using an INNER JOIN , querying a join-related measure on the location [hannah] returns a null value.
Similarly, there's no cube coordinate corresponding to the Romain data, so it isn't queryable.

What if the mapping I want to use does not correspond to a simple one-to-one relationship in my data

In the previous case, there was an obvious bijective relationship between the records of the store and the mapped levels, as the mapped field (userName) was the primary key of the hobbies store, meaning a user could have only one hobby.

If a user has several hobbies, we have to change the primary key of the "hobbies" store, and consider the (userName, hobby) tuple as the primary key of the store "hobbies".
Let's consider our new scenario:

User/users
ALL
marion
nathan
roger
lukas
hannah
userName (K)hobby(K)budgetcategory
mariondance1.0art
rogertennis2.0sport
nathantennis2.0sport
lukaspiano4.0art
lukassoccer3.0sport
romainpiano4.0art

Lukas now has two hobbies, and if we ever want to define a measure on the data of the isolated store, the [Lukas] location is problematic: we're currently in a 'one to many' scenario.

When the user-defined mapping of a Join operation cannot be translated to a (one/many)-to-one relation between the members and the store facts, the cube WILL be enriched with the necessary single-level hierarchies to obtain this relationship

This means that when using a join operation on a store:

Copper will enrich the cube with single-level hierarchies corresponding to the fields in the primary key of the store that are not included in the mapping of the join operation.

For our previous use case, this means that in order to restore a one-to-one relationship between cube locations and store records, the hobby analysis hierarchy is published, its members populated by the values of the field:

User/usershobby/hobby
(ALL)
(ALL)
dance
tennis
soccer
piano
marion
(ALL)
dance
tennis
soccer
piano
nathan
......
userName (K)hobby(K)budgetcategory
mariondance1.0art
rogertennis2.0sport
nathantennis2.0sport
lukaspiano4.0art
lukassoccer3.0sport
romainpiano4.0art

With the new hobby analysis hierarchy, we see that we are now able to select one record per location in the cube.

Hierarchies implicitly created by the join operation can be customized. This will be explained in further depth under Publishing a join-based hierarchy.

Joining without mapping

In the first snippet defining a CopperStore:

final CopperStore store = Copper.store("joinedStoreName").joinToCube();

The join operation is declared without any call to withMapping().

In that case, the defined operation is a CROSS JOIN. Following the rule of publication of required hierarchies, the cube is enriched with single-level hierarchies corresponding to the key fields of the joined store.
If the joined store is an append-only store (a store without a primary key), all the fields are considered as part of the effective primary key of the store.

Using a join operation

Once the join operation is defined, the corresponding CopperStore can be used to create and publish measures and/or hierarchies that are able to fetch data from the joined store.

Defining join operations does not affect the cube or the stores as long as no measure/hierarchy making use of this join operation is published.

Publishing a join-based hierarchy

In order to navigate through the data of a joined store, you may want to define new hierarchies based on the content of a given field of the store.
This can be done through the following:

final CopperStore store = Copper.store("joinedStoreName").joinToCube();
Copper.newHierarchy("hierarchyName")
.fromStore(store)
.withLevel("levelName", FieldPath.of("fieldName"))
.publish(context);

This explicitly publishes a single-level hierarchy in the cube "hierarchyName". This hierarchy is an analysis hierarchy with its members matching the records of the "fieldName" field in the "joinedStoreName" store.

The created hierarchy can be configured similarly to a single-level hierarchy. For more information, see here.

As mentioned before, publishing a join-related hierarchy may lead to the implicit publication of other hierarchies.
The implicitly created hierarchy can be configured as well, by explicitly publishing it, like in the previous snippet.

Publishing a join-based measure

Once a join operation is defined, you can also declare measures that will use data coming from the joined stores. The Copper API includes two types of join-related measures:

  • Lookup measures: allow to inspect a specific data value from the records of the joined store.
  • Aggregated measures: perform a given aggregation on the values of a given field of the joined store.

Lookup measures

A lookup measure is a Copper measure referencing the records of the field of a joined store, which can then be used in calculations. This measure returns the values contained in the field's datastore records if at least a minimum set of required levels is expressed in the queried location. Otherwise it returns a null value.

As a basic example, let us consider a base store containing basic trade data and an isolated Rates store containing rates data between currencies :

TradeID (k)Currency
1EUR
2USD
3EUR
4USD
Base cur.(k)Target cur.(k)Rate
EUREUR1
EURUSD1.101
USDEUR0.91
USDUSD1

If we consider that the "TradeID" and "Currency" fields of the "Trades" store have been added to the cube, there exist an obvious join relationship between a level of the cube (Currency) and one the "Base cur." field of the Rates store. We can define our join operation this way:

 ```java
CopperStore rateStore = Copper.store("Rates").joinToCube().withMapping("Base cur.", "Currency");
```

Then you can define a lookup measure corresponding to the values or the "Rate" field of the Rates store, and publish it to the cube:

    Copper.newLookupMeasure(rateStore.field("Rate")).withName("rate.value").publish(context);

Now the measure rate.value exists and returns the content of the Rate field for queried locations.

However, rate.value makes sense only when both a Currency AND a Target currency are expressed (it's in the key fields of the joined store). Therefore, a single-level hierarchy corresponding to the target currency members was added to the cube to be able to query rate.value.

Here are the results of querying rate.value for several locations:

TradeIDCurrencyTarget cur.rate.value
(ALL)(ALL)(ALL)null
(ALL)EUR(ALL)null
(ALL)(ALL)USDnull
(ALL)EURUSD1.101
(ALL)USDUSD1
1(ALL)(ALL)null
1(ALL)EURnull
1USDEUR0.91
1EUREUR1

Aggregated measures

Aggregated measures based on fields of a joined store can be defined by calling Copper.agg(CopperStoreField field, String aggregationPluginKey). This measure will return the values contained in the field's datastore records aggregated following the logic of the provided aggregate function. Unlike lookup measures, an aggregated measure returns values even if a level mapped to a key of the store is not expressed in the query.

As a basic example, let us consider a base store containing product prices and an isolated store containing our competitors prices:

Product store :(k) - key field

Id (k)Price
110.0
25.0
37.0
41.0

Competitor store: (k) - key field

ProductID (k)Competitor (k)CompetitorPrice
1A10.0
1B11.0
1C15.0
2A4.0
2B5.5
2C6.5
3A6.0
3B8.0
4C1.2

Let us consider a cube with a single level hierarchy based on the field 'Id'. We want the competitor price average but there is a one-to-many relationship between the Product store and the Competitor store, thus, we cannot define a regular reference in our datastore configuration to compute for instance the average price of our competitors.
However, this relationship can be resolved by Copper on-the-fly, at query time thanks to a join operation:

final CopperStore competitorStore =
Copper.store("CompetitorStore").joinToCube().withMapping("ProductID", "Id");

Then, it is possible to define a measure corresponding to the average price values of the competitors and publish it to the cube :

Copper.avg(competitorStore.field("CompetitorPrice")).as("competitorAvgPrice").publish(context);

Now the measure competitorAvgPrice exists and can be queried for every location.

IdcompetitorAvgPrice
(ALL)7.467
112.0
25.33
37.0
41.2

Even for the newly published hierarchy CompetitorID from the joined store key field Competitor :

CompetitorcompetitorAvgPrice
(ALL)7.467
A6.667
B8.167
C7.567

Warning : You will only see the Competitors whose Products are also in the Product store.

Please note that an aggregated measure is calculated from the base store, meaning that here the average competitor price is weighted by the number of times that each product appeared in the Product store, e.g. if Product 1 appeared twice in the Product Store, each line of Product 1 in the Competitor store would be counted twice in the calculation of the average competitor price. However, in this convenient example, we are able to use Copper to calculate the actual average competitor price because each product appears once in the Product store and at most once per competitor.

Advanced topics

Multiple join

With the Copper API it is possible to join the same store of the datastore to the same cube with multiple join operations, as long as the join operations, defined by the CopperStore object are different.

This means the same stores can be joined differently inside the same cube, for example by using different fields in the mapping, or simply by using a different join type :

final CopperStore STORE_BUY =
Copper.store("rates").joinToCube(JoinType.INNER).withMapping("currency", "buy_currency");
final CopperStore STORE_SELL =
Copper.store("rates").joinToCube(JoinType.INNER).withMapping("currency", "sell_currency");
final CopperStore STORE_BUY_LEFT = Copper.store("rates")
.joinToCube(JoinType.LEFT)
.withDefaultValue(FieldPath.of("rate"), 1.0)
.withDefaultValue(FieldPath.of("bank_change_commission"), 0.1)
.withMapping("currency", "buy_currency");
final CopperStore STORE_SELL_LEFT = Copper.store("rates")
.joinToCube(JoinType.LEFT)
.withDefaultValue(FieldPath.of("rate"), 1.0)
.withDefaultValue(FieldPath.of("bank_change_commission"), 0.05)
.withMapping("currency", "sell_currency");

The previous block showcases four different join operations between a FOREX "rates" store and a cube (Different join types and different mappings ). Each of the CopperStore instances created can be used independently to define its own measures and analysis hierarchies.

Note that the computations will also be independently created and performed. Therefore, querying measures requiring the execution of multiple joins will impact the join computation time with linear cost.

Using multiple stores with Datastore references

It is possible to access data from multiple isolated stores if they are linked together with datastore references:

If the datastore description declares a reference between two isolated stores, as represented below:

.withReference(
ReferenceDescription.builder()
.fromStore("hobbies")
.toStore("hobbyDetails")
.withName(HOBBIES_TO_HOBBY_DETAILS)
.withMapping("hobby", "hobby")
.build())

It will be then possible to use the fields of the referenced store when defining join operations, by using the field's expression rather than using its name:

  Copper.newLookupMeasure(Copper.store("hobbies").joinToCube(JoinType.INNER)
.withMapping("name")
.field(HOBBIES_TO_HOBBY_DETAILS + IDatastoreSchema.PATH_SEPARATOR + referencedField))
.as("referencedField")

Performance considerations about join operations

Copper joins allow the user to seamlessly define a many-to-many relationship between the cube facts and the records of a store isolated from the selection.

The user should be aware that querying a measure using data from the joined store will require the computation of the join operation at query time, as the many-to-many relationship must be resolved to return the corresponding data, since it is not materialized in the stores.

This means that using a join-related measure will have an impact on the performance of the query. That impact corresponds to solving the many-to-many relationship during the query runtime, and is proportional to the amount of queried locations in the largest partitions of the cube's aggregate provider, multiplied by the size of the largest partition of the isolated store.

In a scenario where both the aggregate provider and the isolated store's partitions are large and of equivalent size, the time spent resolving the join may be large enough to significantly impact the overall query computation time.

A way to alleviate this cost is to improve the granularity of the common partitioning between the aggregate provider and the partitioning of the joined store, as the resolution of the join is performed asynchronously on each common partition between the cube and the isolated store.

If the query performance remains an issue after these optimizations, the "flattening" of the many-to-many joined data (i.e the full materialization of the existing many-to-many combinations inside the base store schema) is necessary, in which case the join operation is no longer necessary as the joined store was merged to the base store, but this will induce a large increase of the datastore's size, and therefore will increase the RAM requirement of the application.