Creating and publishing hierarchies
As explained in the previous section, complex calculations can be done by defining new operations on measures to create new ones. However, some use cases require the definition of new hierarchies to navigate through the stream of data.
Bucketing
Bucketing is the process of creating a new hierarchy in the cube whose members are calculated based on members of other hierarchies or measures, or context values. These new hierarchies are often used to create coarser groupings of the facts in buckets.
For instance, suppose our current data has month granularity. If we want to see aggregated values per quarter instead, we can do the following:
CopperLevelValues quarterValues = Copper.level("month")
    .map((Integer m) -> "Q" + ((m - 1) / 3 + 1));
Copper.newHierarchy("Quarter")
    .fromValues(quarterValues)
    .publish(context);
There are a few things to note here:
- We used a lambda to compute the quarter from the month. The input of this lambda (the month) is explicitly typed, and this is mandatory, since the type of the level is not known at compile time.
- `Copper.newHierarchy(...)` returns a `SelectHierarchyType` to start defining a hierarchy.
- When applied to a level, the `map` function returns a `CopperLevelValues` object that represents how the members of a level will be computed. Its sole purpose is to be used as the argument of `SelectHierarchyType#fromValues(...)` to create a single-level hierarchy.
- After defining how a level's members are computed and in which hierarchy this level will end up, you can publish the hierarchy, or customize it further by indicating whether the hierarchy is slicing with `.slicing()` (it is not slicing by default), or by providing a custom level comparator.
- The quarters bucketing works with all measures.
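As a quick sanity check of the bucketing expression, here is a minimal plain-Java sketch that does not use Copper at all. The class name `QuarterBucketingSketch` and its members are hypothetical; only the lambda body is taken from the example above. It assumes months are numbered 1 to 12, applies the mapping to each of them, and collects the distinct results, which are the bucket names the mapping can produce for such data.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Illustration only: plain Java, no Copper or ActivePivot classes involved.
final class QuarterBucketingSketch {

    /** Same expression as the lambda passed to map(...): months 1-3 give Q1, 4-6 give Q2, and so on. */
    static final Function<Integer, String> MONTH_TO_QUARTER =
            m -> "Q" + ((m - 1) / 3 + 1);

    /** Applies the mapping to months 1..12 and keeps the distinct bucket names in encounter order. */
    static Set<String> quarterMembers() {
        Set<String> members = new LinkedHashSet<>();
        for (int month = 1; month <= 12; month++) {
            members.add(MONTH_TO_QUARTER.apply(month));
        }
        return members;
    }

    public static void main(String[] args) {
        for (int month = 1; month <= 12; month++) {
            System.out.println(month + " -> " + MONTH_TO_QUARTER.apply(month));
        }
        System.out.println("Distinct buckets: " + quarterMembers());
    }
}
```

The integer division is what makes the expression work: `(m - 1) / 3` is 0 for months 1 to 3, 1 for months 4 to 6, and so on, so adding 1 yields the quarter number.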
The example above is one of the simplest bucketings you can make. It is made by grouping aggregates based on cube level member values, but we can also create buckets using underlying measures and context values.
For the next example let's say we have:
- A datastore with a single store called tweets:
id | text | sender_id | likes | year | month | day |
---|---|---|---|---|---|---|
0 | Hello World | 0 | 23 | 2017 | 11 | 2 |
1 | Lol | 0 | 2 | 2017 | 12 | 14 |
2 | Foo | 0 | 0 | 2018 | 1 | 4 |
3 | Test | 1 | 0 | 2018 | 2 | 9 |
4 | Hola | 2 | 999 | 2018 | 3 | 14 |
- A selection and a cube on top of this dataset:
StartBuilding.selection(datastoreDescription)
    .fromBaseStore("tweets")
    .withAllReachableFields()
    .build();

StartBuilding.cube("tweets")
    .withSingleLevelDimensions("sender_id")
    .withDimension("time")
    .withHierarchy("time")
    .withLevel("year")
    .withLevel("month")
    .withLevel("day")
Let's say we want to capture a user's interest in tweets. We've created a dedicated context value that represents this interest. The code is included below for completeness:
/**
 * Represents the interests of a user in the content of the tweets.
 */
protected interface UserInterests extends IContextValue {

    /**
     * Indicates how much the user represented by this context value is interested in tweets sent by
     * a given account.
     *
     * @param senderId The sender of the tweet.
     * @param numberOfLikes The number of likes the tweets of this sender accumulated.
     * @return The estimated interest in the tweet.
     */
    String getUserSenderIdInterest(long senderId, long numberOfLikes);

    @Override
    default Class<? extends IContextValue> getContextInterface() {
        return UserInterests.class;
    }
}
/**
 * An implementation liking users in a white list or having at least a given amount of likes in
 * total.
 */
protected static class MinLikesAndWhiteListInterests implements UserInterests {

    private static final long serialVersionUID = 1L;

    /**
     * The other users liked by the current user.
     */
    protected final Set<Long> likedSenders;

    /**
     * The minimum accumulated number of likes a user needs to have to be appreciated by the current
     * user.
     */
    protected final long minTotalLikes;

    /**
     * Fully defining constructor.
     *
     * @param minTotalLikes The minimum accumulated number of likes a user needs to have to be
     *        appreciated by the current user.
     * @param likedSenders The list of senders the current user likes anyway.
     */
    public MinLikesAndWhiteListInterests(long minTotalLikes, Long... likedSenders) {
        this.minTotalLikes = minTotalLikes;
        this.likedSenders = new HashSet<>();
        this.likedSenders.addAll(Arrays.<Long>asList(likedSenders));
    }

    /**
     * Copy constructor.
     *
     * @param minTotalLikes See {@link #minTotalLikes}.
     * @param likedSenders See {@link #likedSenders}.
     */
    protected MinLikesAndWhiteListInterests(long minTotalLikes, Set<Long> likedSenders) {
        this.minTotalLikes = minTotalLikes;
        this.likedSenders = likedSenders;
    }

    @Override
    public String getUserSenderIdInterest(long senderId, long numberOfLikes) {
        if (likedSenders.contains(senderId) || numberOfLikes > this.minTotalLikes) {
            return "Interested";
        } else {
            return "Not interested";
        }
    }

    @Override
    public MinLikesAndWhiteListInterests clone() {
        return new MinLikesAndWhiteListInterests(minTotalLikes, likedSenders);
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((likedSenders == null) ? 0 : likedSenders.hashCode());
        result = prime * result + (int) (minTotalLikes ^ (minTotalLikes >>> 32));
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        MinLikesAndWhiteListInterests other = (MinLikesAndWhiteListInterests) obj;
        if (likedSenders == null) {
            if (other.likedSenders != null) {
                return false;
            }
        } else if (!likedSenders.equals(other.likedSenders)) {
            return false;
        }
        if (minTotalLikes != other.minTotalLikes) {
            return false;
        }
        return true;
    }
}
With this context value we can bucket with this calculation:
/**
 * Creates calculation bucketing by a context value.
 */
protected static void contextValueBucketing(ICopperContext context) {
    CopperLevelValues interestLevelValues = Copper
        .combine(
            Copper.contextValue(UserInterests.class).withName("contextValue('UserInterests')"),
            Copper.level("sender_id"),
            Copper.sum("likes").withName("likes.SUM"))
        .map(
            a -> ((UserInterests) a.read(0)).getUserSenderIdInterest(a.readLong(1), a.readLong(2)));
    Copper.newHierarchy("interest")
        .fromValues(interestLevelValues)
        .withMemberList("Interested", "Not interested")
        .publish(context);
}
We can then test the resulting bucketing with an MDX query, using different context values.
The first one uses a threshold of 200 and has sender 1 as a favorite (`UserInterests userInterests = new MinLikesAndWhiteListInterests(200L, 1L)`):
UserInterests userInterests = new MinLikesAndWhiteListInterests(200L, 1L);
this.builder
    .build(context -> contextValueBucketing(context))
    .mdxQuery("SELECT "
        + "[interest].[interest].[interest] ON ROWS "
        + "FROM [tweets] WHERE [Measures].[contributors.COUNT]", userInterests)
Executing the MDX query `SELECT [interest].[interest].[interest] ON ROWS FROM [tweets] WHERE [Measures].[contributors.COUNT]` produces:
interest | contributors.COUNT |
---|---|
Interested | 2 |
Not interested | 3 |
The second one uses a threshold of 10 and has sender 2 as a favorite (`UserInterests userInterests = new MinLikesAndWhiteListInterests(10L, 2L)`):
UserInterests userInterests = new MinLikesAndWhiteListInterests(10L, 2L);
this.builder
    .build(context -> contextValueBucketing(context))
    .mdxQuery("SELECT "
        + "[interest].[interest].[interest] ON ROWS "
        + "FROM [tweets] WHERE [Measures].[contributors.COUNT]", userInterests)
The same MDX query produces:
interest | contributors.COUNT |
---|---|
Interested | 4 |
Not interested | 1 |
There are multiple things to explain in this example:
- We retrieve the context value via `Copper.contextValue(UserInterests.class)`, as explained in the API.
- We combine the elements `UserInterests.class`, `sender_id` and `likes.SUM` to produce a new `CopperLevelValues`. Since the first element is the context value measure, we can safely cast the first object received in our lambda to the context value class. We can then directly call a method of the context value, passing the `sender_id` level member value and the `likes.SUM` aggregated value as parameters.
- We use a new function, `.withMemberList()`, on the hierarchy being built. It is needed when Copper can't automatically find the list of members that ActivePivot requires to work properly. Here the calculation uses a context value and aggregated values, so Copper cannot know all the values the function can return. This is why it asks the user to provide the list of distinct values the function can return. This method wasn't needed in the previous example, because Copper could automatically find the members of the Quarter level, so you don't always need to call it. Write your calculation without calling it first: if Copper detects that it can't find the member list automatically, it throws an exception with an explicit error message indicating where to add `.withMemberList`.
The list of members can be stored in a store instead and specified with `withMembers(String store, String fieldName)`.
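To check the two result tables above by hand, the following plain-Java sketch replays the bucketing on the five sample tweets without Copper or ActivePivot. The class `InterestBucketingSketch` and its methods are hypothetical names introduced for illustration. It assumes, in line with the Javadoc of `getUserSenderIdInterest`, that the interest is evaluated once per `sender_id` using that sender's total likes, and that each sender's tweets are then counted in the resulting bucket.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustration only: a hand-rolled replay of the example, not how Copper evaluates it internally.
final class InterestBucketingSketch {

    /** The sample rows of the tweets store, reduced to {id, sender_id, likes}. */
    static final long[][] TWEETS = {
        {0, 0, 23}, {1, 0, 2}, {2, 0, 0}, {3, 1, 0}, {4, 2, 999}
    };

    /** Same rule as MinLikesAndWhiteListInterests#getUserSenderIdInterest. */
    static String interest(long minTotalLikes, Set<Long> likedSenders, long senderId, long totalLikes) {
        return likedSenders.contains(senderId) || totalLikes > minTotalLikes
                ? "Interested"
                : "Not interested";
    }

    /** Counts tweets per interest bucket (the equivalent of contributors.COUNT per member). */
    static Map<String, Long> countByInterest(long minTotalLikes, Set<Long> likedSenders) {
        Map<Long, Long> likesPerSender = new HashMap<>();   // likes.SUM per sender_id
        Map<Long, Long> tweetsPerSender = new HashMap<>();  // contributors.COUNT per sender_id
        for (long[] tweet : TWEETS) {
            likesPerSender.merge(tweet[1], tweet[2], Long::sum);
            tweetsPerSender.merge(tweet[1], 1L, Long::sum);
        }
        // Start from the explicit member list, as withMemberList(...) does in the Copper example.
        Map<String, Long> counts = new TreeMap<>();
        counts.put("Interested", 0L);
        counts.put("Not interested", 0L);
        for (Map.Entry<Long, Long> entry : likesPerSender.entrySet()) {
            String bucket = interest(minTotalLikes, likedSenders, entry.getKey(), entry.getValue());
            counts.merge(bucket, tweetsPerSender.get(entry.getKey()), Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByInterest(200L, Collections.singleton(1L)));
        System.out.println(countByInterest(10L, Collections.singleton(2L)));
    }
}
```

With a threshold of 200 and sender 1 as a favorite, senders 1 and 2 fall into "Interested" (one tweet each) and sender 0, with 25 likes in total, falls into "Not interested" (three tweets). With a threshold of 10 and sender 2 as a favorite, senders 0 and 2 are "Interested" (four tweets) and sender 1 is not (one tweet), which matches both tables.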
Hierarchy metadata
A single-level hierarchy and its level have several elements of metadata that can be configured in Copper, such as:
- whether it is slicing or not, with `.slicing()`. If not indicated, it is NOT slicing.
- its dimension, with `.inDimension(String dimensionName)`
- its level type, with `.withType(LevelType type)`
- its level formatter, with `.withFormatter(String formatter)`. The formatter changes the visual representation of the level's member values, in a pivot table for instance.
- its measure groups, with `.withMeasureGroups()`
- its folder, with `.withinFolder()`. It impacts the path used when displaying the dimension in the user interface.
- its visibility. A hierarchy can be visible or hidden in the UI; either way, it remains available in queries as long as it is part of the cube description. Explicitly published hierarchies are always visible, unless `.hidden()` is called.