Joining isolated stores in Copper
When building an ActivePivot application, it is common practice to only have a portion of the data available in the datastore added to the hierarchies/levels of the cube.
You may want to access such data, especially for measures. This can be done automatically when the requested data is included in the selection of the Cube.
However, some data can be added to the datastore and be isolated from the Selection of a cube.
The Copper API provides a comprehensive mechanism to define and publish measures/hierarchies using the records from any store of a datastore.
We advise you to read the following article of the Copper documentation before proceeding as it contains important concepts relative to the Copper API.
What does "Join operation" mean in Copper?
Join operations in copper provide a high-level abstraction for accessing data from any store of the datastore. This allows you to define a relationship between the members of one or more levels of the cube and the records of the datastore's fields.
Once the relationship between the cube and the datastore is defined, it is possible to define both single-level hierarchies and measures from the data obtained from the isolated store.
Let's consider this simple case:
- The cube contains the User hierarchy, with the "users" level.
- The datastore contains a hobbies store, which is isolated from the selection and contains hobby related data:
User/users |
---|
ALL |
marion |
nathan |
roger |
lukas |
hannah |
userName (K) | hobby | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
romain | piano | 4.0 | art |
Using join operations with Copper allows you to define and use measures and/or hierarchies based on the records of the hobbies store, almost identically to standard measures.
Defining a join operation
Before creating measures and/or hierarchies based on data coming from records of isolated stores, it is necessary to define the relation between the ActivePivot cube and the joined stores:
All the information regarding a join operation is defined in a CopperStore
object, which is reused later for defining our measures and hierarchies.
The entry point of the Copper API to create a CopperStore
is the following call:
final CopperStore store = Copper.store("joinedStoreName").joinToCube();
This code above defines a join operation with a default join type and without mapping. We'll explain the meaning of the italicized terms in the next sections.
As mentioned before, a CopperStore
contains all the required information to perform join operations.
For a given Pivot instance, a join Operation will be identified by the name of joined store, the type of join and the cube-to-store mapping.
Copper Join Types
In terms of classic relational databases, JOIN
clauses are often prefixed with types that modify the predicate required for the query to return a result.
The Copper API currently supports INNER and LEFT
JoinType
joins.
The default behavior of the Copper API is the INNER JOIN
, but the Join Type performed by a join operation can be changed in the declaration of the CopperStore
:
final CopperStore store = Copper.store("joinedStoreName").joinToCube(JoinType type);
Supported JOIN TYPES :
INNER JOIN
: The join operation returns a result IF there are matching records (according to the operation's mapping) in both the datastore and the joined levels.LEFT JOIN
: The join operation returns a result if there are matching records (according to the operation's mapping) in the datastore and the joined levels OR if a location does not have a matching record in the datastore. In that case, the join operation on this specific location returns a value corresponding to the default value of the joined store's field. You can also specify a specific default value for the returned field by using thewithDefaultValue()
method.
Join mapping
Whe defining Join operations in classic database systems, such as SQL, you must specify the ON
clause in order to declare between which columns the join is performed.
A similar operation can be done in the copper API by performing calls of the withMapping(String fieldName)
method on a CopperStore
.
The returned CopperStore
corresponds to a Join operation with an ON
clause between the fieldName
fields of the joined store and the fieldName
level of the Cube.
If the joined fields and the joined cube levels do not have the same name, you can call one of the other signatures of the withMapping()
function:
CopperStore storeWithMapping = store.withMapping("fieldAndLevelName");
CopperStore storeWithMapping = store.withMapping("fieldName", "levelName");
CopperStore storeWithMapping = store.withMapping("fieldName", Copper.level("levelName"));
The Copper API support chained calls on this method, meaning the joined store can be joined to the cube data through multiple fields at once.
Looking at the example we mentioned earlier, the natural choice for a join mapping would be to map the "users" level to the "userName" field of the store, with an INNER
type join.the
CopperStore storeWithMapping = Copper.store("hobbies")
.joinToCube()
.withMapping("userName", "users");
The defined copper store contains the join operation between the cube and the isolated store:
User/users |
---|
ALL |
marion |
nathan |
roger |
lukas |
hannah |
userName (K) | hobby | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
romain | piano | 4.0 | art |
We see that the [hannah]
member from the cube does not have a corresponding record in the records, and that the datastore entry "Romain" does not correspond to a location in the cube.
Since we're using an INNER JOIN
, querying a join-related measure on the location [hannah]
returns a null
value.
Similarly, there's no cube coordinate corresponding to the Romain
data, so it isn't queryable.
What if the mapping I want to use does not correspond to a simple one-to-one relationship in my data
In the previous case, there was an obvious bijective relationship between the records of the store and the mapped levels, as the mapped field (userName) was the primary key of the hobbies store, meaning a user could have only one hobby.
If a user has several hobbies, we have to change the primary key of the "hobbies" store, and consider the (userName, hobby) tuple as the primary key of the store "hobbies".
Let's consider our new scenario:
User/users |
---|
ALL |
marion |
nathan |
roger |
lukas |
hannah |
userName (K) | hobby(K) | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
lukas | soccer | 3.0 | sport |
romain | piano | 4.0 | art |
Lukas now has two hobbies, and if we ever want to define a measure on the data of the isolated store, the [Lukas]
location is problematic: we're currently in a 'one to many' scenario.
When the user-defined mapping of a Join operation cannot be translated to a (one/many)-to-one relation between the members and the store facts, the cube WILL be enriched with the necessary single-level hierarchies to obtain this relationship
This means that when using a join operation on a store:
Copper will enrich the cube with single-level hierarchies corresponding to the fields in the primary key of the store that are not included in the mapping of the join operation.
For our previous use case, this means that in order to restore a one-to-one relationship between cube locations and store records, the hobby
analysis hierarchy is published, its members populated by the values of the field:
User/users | hobby/hobby |
---|---|
(ALL) | |
(ALL) | |
dance | |
tennis | |
soccer | |
piano | |
marion | |
(ALL) | |
dance | |
tennis | |
soccer | |
piano | |
nathan | |
... | ... |
userName (K) | hobby(K) | budget | category |
---|---|---|---|
marion | dance | 1.0 | art |
roger | tennis | 2.0 | sport |
nathan | tennis | 2.0 | sport |
lukas | piano | 4.0 | art |
lukas | soccer | 3.0 | sport |
romain | piano | 4.0 | art |
With the new hobby
analysis hierarchy, we see that we are now able to select one record per location in the cube.
Hierarchies implicitly created by the join operation can be customized. This will be explained in further depth under Publishing a join-based hierarchy.
Joining without mapping
In the first snippet defining a CopperStore
:
final CopperStore store = Copper.store("joinedStoreName").joinToCube();
The join operation is declared without any call to withMapping()
.
In that case, the defined operation is a CROSS JOIN
. Following the rule of publication of required hierarchies, the cube is enriched with single-level hierarchies corresponding to the key fields of the joined store.
If the joined store is an append-only store (a store without a primary key), all the fields are considered as part of the effective primary key of the store.
Using a join operation
Once the join operation is defined, the corresponding CopperStore
can be used to create and publish measures and/or hierarchies that are able to fetch data from the joined store.
Defining join operations does not affect the cube or the stores as long as no measure/hierarchy making use of this join operation is published.
Publishing a join-based hierarchy
In order to navigate through the data of a joined store, you may want to define new hierarchies based on the content of a given field of the store.
This can be done through the following:
final CopperStore store = Copper.store("joinedStoreName").joinToCube();
Copper.newSingleLevelHierarchy("hierarchyName").from(store.field("fieldName")).publish(context);
This explicitly publishes a single-level hierarchy in the cube "hierarchyName"
. This hierarchy is an analysis hierarchy with its members matching the records of the "fieldName"
field in the "joinedStoreName"
strore.
The created hierarchy can be configured similarly to a single-level hierarchy. For more information, see here.
As mentioned before, publishing a join-related hierarchy may lead to the implicit publication of other hierarchies.
The implicitly created hierarchy can be configured as well, by explicitly publishing it, like in the previous snippet.
Publishing a join-based measure
Once a join operation is defined, you can also declare measures that will use data coming from the joined stores. The Copper API includes two types of join-related measures:
- Lookup measures: allow to inspect a specific data value from the records of the joined store.
- Aggregated measures: perform a given aggregation on the values of a given field of the joined store.
Lookup measures
A lookup measure is a Copper measure referencing the records of the field of a joined store, which can then be
used in calculations.
This measure returns the values contained in the field's datastore records if at least a minimum set of
required levels is expressed in the queried location. Otherwise it returns a null
value.
As a basic example, let us consider a base store containing basic trade data and an isolated Rates store containing rates data between currencies :
TradeID (k) | Currency |
---|---|
1 | EUR |
2 | USD |
3 | EUR |
4 | USD |
Base cur.(k) | Target cur.(k) | Rate |
---|---|---|
EUR | EUR | 1 |
EUR | USD | 1.101 |
USD | EUR | 0.91 |
USD | USD | 1 |
If we consider that the "TradeID" and "Currency" fields of the "Trades" store have been added to the cube, there
exist an obvious join relationship between a level of the cube (Currency) and one the "Base cur." field of the
Rates store.
We can define our join operation this way:
java CopperStore rateStore = Copper.store("Rates").joinToCube().withMapping("Base cur.", "Currency");
Then you can define a lookup measure corresponding to the values or the "Rate" field of the Rates store, and publish it to the cube:
Copper.newLookupMeasure(rateStore.field("Rate")).withName("rate.value").publish(context);
Now the measure rate.value
exists and returns the content of the Rate field for queried
locations.
However, rate.value
makes sense only when both a Currency AND a Target currency are expressed (it's
in the key fields of the joined store). Therefore, a single-level hierarchy corresponding to the target currency
members was added to the cube to be able to query rate.value
.
Here are the results of querying rate.value
for several locations:
TradeID | Currency | Target cur. | rate.value |
---|---|---|---|
(ALL) | (ALL) | (ALL) | null |
(ALL) | EUR | (ALL) | null |
(ALL) | (ALL) | USD | null |
(ALL) | EUR | USD | 1.101 |
(ALL) | USD | USD | 1 |
1 | (ALL) | (ALL) | null |
1 | (ALL) | EUR | null |
1 | USD | EUR | 0.91 |
1 | EUR | EUR | 1 |
Aggregated measures
Aggregated measures based on fields of a joined store can be defined by calling Copper.agg(CopperStoreField field, String aggregationPluginKey)
.
For each location of the query, the value of the datastore result for the requested field are aggregated following the logic of the provided aggregate function.
Unlike lookup measures, an aggregated measure returns values even if a level mapped to a key of the store is not expressed in the query.
As a basic example, let us consider a base store containing product prices and an isolated store containing our competitors prices:
Product store :(k) - key field
Id (k) | Price |
---|---|
1 | 10.0 |
2 | 5.0 |
3 | 7.0 |
4 | 1.0 |
Competitor store: (k) - key field
ProductID (k) | Competitor (k) | Price |
---|---|---|
1 | A | 10.0 |
1 | B | 11.0 |
1 | C | 15.0 |
2 | A | 4.0 |
2 | B | 5.5 |
2 | C | 6.5 |
3 | A | 6.0 |
3 | B | 8.0 |
4 | C | 1.2 |
Let us consider a cube with a single level hierarchy based on the field 'Id'. There is a one-to-many relationship
between the Product store and the Competitor store, thus, we cannot define a regular reference in our datastore
configuration to compute for instance the average price of our competitors.
However, this relation can be resolved by Copper on-the-fly, at query time thanks to a join operation:
CopperStore competitorStore = Copper.store("CompetitorStore").joinToCube().withMapping("Id", "ProductID");
Then, it is possible to define a measure corresponding to the average price values of the competitors and publish it to the cube :
Copper.avg(competitorStore.field("Price")).as("competitorAvgPrice").publish(context);
Now the measure competitorAvgPrice
exists and can be queried for every locations.
Id | competitorAvgPrice |
---|---|
(ALL) | 7.467 |
1 | 12.0 |
2 | 5.33 |
3 | 7.0 |
4 | 1.2 |
Or, looking at the newly published analysis hierarchy CompetitorID
from the store's key field Competitor
:
Competitor | competitorAvgPrice |
---|---|
(ALL) | 7.467 |
A | 6.667 |
B | 8.167 |
C | 7.567 |
Advanced topics
Multiple join
With the Copper API it is possible to join the same store of the datastore to the same cube with multiple join operations, as long as the join operations,
defined by the CopperStore
object are different.
Using multiple store with Datastore references
While the Copper API does not allow to perform "chained" join operations, it is possible to access data from multiple isolated stores if they are linked together with datastore references.
Using Factless Levels in the mapping of the join
For now it is not possible to define join operations using factless levels of the cube in the join mapping.