Joining isolated stores in Copper2
When building an ActivePivot application, it is a common practice to only have a portion of the data available in the datastore added to the hierarchies/levels of the cube.
One can want to access to such data, especially for measures. This can be done automatically when the requested data is included in the selection of the Cube.
However, some data can be added to the datastore and be isolated from the Selection of a cube.
The Copper API provides a comprehensive mechanism to define and publish measures/hierarchies using the records from any store of a datastore.
We advise the reader to read the following article of the Copper documentation before proceeding as it contains important concepts relative to the Copper API
What does "Join operation" mean in Copper ?
The idea between the join operations in copper is to provide a high-level abstraction for accessing data from any store of the datastore. This allows the user to define a relationship between the members of one or more levels of the cube and the records of the datastore's fields.
Once the relationship between the cube and the datastore is defined, it is possible to define both single-level hierarchies and measures from the data obtained from the isolated store.
Let's consider this simple case :
- The cube contains the User hierarchy, with the "users" level.
- The datastore contains a hobbies store, which is isolated from the selection, and contains hobby related data :
User/users |
---|
ALL |
marion |
nathan |
roger |
lukas |
hannah |
userName (K) | hobby | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
romain | piano | 4.0 | art |
Using join operations with Copper will allow the user to define and use measures and/or hierarchies based of the records of the hobbies store almost identically to standard measures.
Defining a join operation
Before creating measures and/or hierarchies based on data coming from records of isolated stores, it is necessary to define the relation between the ActivePivot cube and the joined stores:
All the information regarding a join operation is defined in a CopperStore
object, which will be reused later for defining our measures and hierarchies.
The entry point of the Copper API to create a CopperStore
is the following call :
final CopperStore store = Copper.store("joinedStoreName").joinToCube();
This code above defines a join operation with a default join type and without mapping. We'll explain the meaning of the italicized terms in the next sections.
As mentioned before, a CopperStore
contains all the required information to perform join operations.
For a given Pivot instance, a join Operation will be identified by the name of joined store, the type of join and the cube-to-store mapping.
Copper Join Types
In terms of classic relational databases, JOIN
clauses are often prefixed with types that modify the predicate required for the query to return a result.
The Copper API currently only supports INNER
JoinType
joins.
The default behavior of the Copper API is the INNER JOIN
, however the Join Type performed by a join operation can be changed in the declaration of the CopperStore
:
final CopperStore store = Copper.store("joinedStoreName").joinToCube(JoinType type);
Supported JOIN TYPES :
INNER JOIN
: The join operation will return a result IIF there are matching records (according to the operation's mapping) in both the datastore and the joined levels.
Join mapping
Whe defining Join operations in classic database systems such as SQL, one must specify the ON
clause in order to declare on between which columns the join will be performed.
A similar operation can be done in the copper API by performing calls of the withMapping(String fieldName)
method on a CopperStore
.
The returned CopperStore
will correspond to a Join operation with an ON
clause between the fieldName
fields of the joined store and the fieldName
level of the Cube.
If the joined fields and the joined cube levels do not have the same name, it is possible to call one of the other signatures of the withMapping()
function :
CopperStore storeWithMapping = store.withMapping("fieldAndLevelName");
CopperStore storeWithMapping = store.withMapping("fieldName", "levelName");
CopperStore storeWithMapping = store.withMapping("fieldName", Copper.level("levelName"));
The Copper API support chained calls on this method, meaning the joined store can be joined to the cube data through multiple fields at once.
If we reconsider the example we mentioned earlier, the natural choice for a join mapping would be to map the "users" level to the "userName" field of the store, with an INNER
type join.the
CopperStore storeWithMapping = Copper.store("hobbies")
.joinToCube()
.withMapping("userName", "users");
The defined copper store contains the join operation between the cube and the isolated store such as following :
User/users |
---|
ALL |
marion |
nathan |
roger |
lukas |
hannah |
userName (K) | hobby | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
romain | piano | 4.0 | art |
We see that the [hannah]
member from the cube does not have a corresponding record in the records, and that the datastore entry "Romain" does not correspond to a location in the cube.
Since we're using a INNER JOIN
, querying a join-related measure on the location [hannah]
will return a null
value.
Similarly, there will no cube coordinate corresponding to the Romain
data, so it won't be queryable.
What if the mapping I want to use does not correspond to a simple one-to-one relationship in my data
In the previous case, there was an obvious bijective relationship between the records of the store and the mapped levels, as the mapped field (userName) was the primary key of the hobbies store, meaning a user could have only one hobby.
But, what if I have several hobbies ?
Well we have to change the primary key of the "hobbies" store, and consider the (userName, hobby) tuple as the primary key of the store "hobbies".
Let's consider our new scenario :
User/users |
---|
ALL |
marion |
nathan |
roger |
lukas |
hannah |
userName (K) | hobby(K) | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
lukas | soccer | 3.0 | sport |
romain | piano | 4.0 | art |
Lukas now has two hobbies, and if we ever want to define a measure on the data of the isolated store, the [Lukas]
location would
be problematic : we're currently in a 'one to many' scenario,
When the user-defined mapping of a Join operation cannot be translated to a (one/many)-to-one relation between the members and the store facts, the cube WILL be enriched with the necessary single-level hierarchies to obtain this relationship
This means that when using a join operation on a store :
Copper will enrich the cube with single-level hierarchies corresponding to the fields in the primary key of the store that are not included in the mapping of the join operation"
For our previous usecase, this means that in order to restore a one-to-one relationship between cube locations and store records, the hobby
analysis hierarchy will be published, its members populated by the values of the field :
User/users | hobby/hobby |
---|---|
(ALL) | |
(ALL) | |
dance | |
tennis | |
soccer | |
piano | |
marion | |
(ALL) | |
dance | |
tennis | |
soccer | |
piano | |
nathan | |
... | ... |
userName (K) | hobby(K) | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
lukas | soccer | 3.0 | sport |
romain | piano | 4.0 | art |
With the new hobby
analysis hierarchy, we see that we are now able to select one record per location in the cube.
Hierarchies implicitly created by the join operation can be customized. This will be explained in further depth in the Join Hierarchies section.
Joining without mapping
In the first snippet defining a CopperStore
:
final CopperStore store = Copper.store("joinedStoreName").joinToCube();
the join operation is declared without any call to withMapping()
.
In that case , the defined operation is a CROSS JOIN
. Following the rule of publication of required hierarchies, the cube will therefore be enriched with single-level hierarchies corresponding
to the key fields of the joined store.
If the joined store is an append-only store (i.e. a store without primary key), all the fields are considered as a part of the effective primary key of the store.
Using a join operation
Once the join Operation is defined, the corresponding CopperStore
can be used to create and publish
measures and/or hierarchies that will be able to fetch data from the joined store.
One can note that defining join Operations does not affect the cube or the stores as long as no measure/hierarchy making use of this join operation is published .
Publishing a join-based hierarchy
In order to navigate through the data of a joined store, one can desire to define new hierarchies based on the content of a given field of the store.
This can be done through the following :
final CopperStore store = Copper.store("joinedStoreName").joinToCube();
Copper.newSingleLevelHierarchy("hierarchyName").from(store.field("fieldName")).publish(context);
This will explicitly publish a single-level hierarchy in the cube"hierarchyName"
. This hierarchy will be an analysis hierarchy with its members matching the records of the
"fieldName"
field in the "joinedStoreName"
strore.
The created hierarchy can be configured alike to a single-level hierarchy (see [here](/products/atoti/server/5.8.19/docs/copper2/copper2_hierarchies.html#Hierarchy metadata) ).
As mentioned before, publishing a join-related hierarchy may lead to the implicit publication of other hierarchies.
The implicitly created hierarchy can be configured as well, by being explicitly published like in the previous snippet.
Publishing a join-based measure
Once a join operation is defined, it is also possible to declare measures that will use data coming from the joined stores. The Copper API includes two types of join-related measures :
- Lookup measures : they allow to inspect a specific data value from the records of the joined store
- Aggregated measure : performs a given aggregation on the values of a given field of the joined store
Lookup measures
Creates a Copper measure referencing the records of the field of a joined store , which can then be
used in calculations.
This measure will return the values contained in the field's datastore records if at least a minimal set of
required levels is expressed in the queried location, otherwise it will return a null
value.
As a basic example, let us consider a base store containing basic trade data and an isolated Rates store containing rates data between currencies :
TradeID (k) | Currency |
---|---|
1 | EUR |
2 | USD |
3 | EUR |
4 | USD |
Base cur.(k) | Target cur.(k) | Rate |
---|---|---|
EUR | EUR | 1 |
EUR | USD | 1.101 |
USD | EUR | 0.91 |
USD | USD | 1 |
If we consider that the "TradeID" and "Currency" fields of the "Trades" store have been added to the cube, there
exist an obvious join relationship between a level of the cube (Currency) and one the "Base cur." field of the
Rates store.
We can define our join operation this way :
java CopperStore rateStore = Copper.store("Rates").joinToCube().withMapping("Base cur.", "Currency");
Then, it is possible to define a lookup measure corresponding to the values or the "Rate" field of the Rates store, and publish it to the cube :
Copper.newLookupMeasure(rateStore.field("Rate")).as("rate.value").publish(context);
Now the measure rate.value
exists and will return the content of the Rate field for queried
locations.
However, rate.value
makes only sense when both a Currency AND a Target currency are expressed (it's
in the key fields of the joined store). Therefore a single-level hierarchy corresponding to the target currency
members was added to the cube to be able to query rate.value
.
Results when querying rate.value
for several locations:
TradeID | Currency | Target cur. | rate.value |
---|---|---|---|
(ALL) | (ALL) | (ALL) | null |
(ALL) | EUR | (ALL) | null |
(ALL) | (ALL) | USD | null |
(ALL) | EUR | USD | 1.101 |
(ALL) | USD | USD | 1 |
1 | (ALL) | (ALL) | null |
1 | (ALL) | EUR | null |
1 | USD | EUR | 0.91 |
1 | EUR | EUR | 1 |
Aggregated measures
Aggregated measures based on fields of a joined store can be defined by calling Copper.agg(CopperStoreField field, String aggregationPluginKey)
.
This measure will return the values contained in the field's datastore records aggregated following the logic of the provided aggregate function.
Unlike lookup measures, an aggregated measure returns values even if a level mapped to a key of the store is not expressed in the query.
As a basic example, let us consider a base store containing product prices and an isolated store containing our competitors prices:
Product store :(k) - key field
Id (k) | Price |
---|---|
1 | 10.0 |
2 | 5.0 |
3 | 7.0 |
4 | 1.0 |
Competitor store: (k) - key field
ProductID (k) | Competitor (k) | CompetitorPrice |
---|---|---|
1 | A | 10.0 |
1 | B | 11.0 |
1 | C | 15.0 |
2 | A | 4.0 |
2 | B | 5.5 |
2 | C | 6.5 |
3 | A | 6.0 |
3 | B | 8.0 |
4 | C | 1.2 |
Let us consider a cube with a single level hierarchy based on the field 'Id'. We want the competitor price average but
there is a one-to-many relationship between the Product store and the Competitor store, thus, we cannot define
a regular reference in our datastore configuration to compute for instance the average price of our competitors.
However, this relationship can be resolved by Copper on-the-fly, at query time thanks to a join operation:
CopperStore competitorStore = Copper.store("CompetitorStore").joinToCube().withMapping("ProductID", "Id");
Then, it is possible to define a measure corresponding to the average price values of the competitors and publish it to the cube :
Copper.avg(competitorStore.field("CompetitorPrice")).as("competitorAvgPrice").publish(context);
Now the measure competitorAvgPrice
exists and can be queried for every location.
Id | competitorAvgPrice |
---|---|
(ALL) | 7.467 |
1 | 12.0 |
2 | 5.33 |
3 | 7.0 |
4 | 1.2 |
Even for the newly published hierarchy CompetitorID
from the joined store key field Competitor
:
Competitor | competitorAvgPrice |
---|---|
(ALL) | 7.467 |
A | 6.667 |
B | 8.167 |
C | 7.567 |
Warning : You will only see the Competitors whose Products are also in the Product store.
Please note that an aggregated measure is calculated from the base store, meaning that here the average competitor price is weighted by the number of times that each product appeared in the Product store, e.g. if Product 1 appeared twice in the Product Store, each line of Product 1 in the Competitor store would be counted twice in the calculation of the average competitor price. However, in this convenient example, we are able to use Copper to calculate the actual average competitor price because each product appears once in the Product store and at most once per competitor.
Advanced topics
Multiple join
Using multiple store with Datastore references
While the Copper API does not allow to perform "chained" join operations, it is possible to access data from multiple isolated stores if they are linked together with datastore references.