Creating and publishing hierarchies
As explained in the previous section, complex calculations can be expressed by defining new operations on measures to create new ones. However, some use cases require the definition of new hierarchies to navigate through the stream of data.
Copper can only create single level hierarchies for the time being.
Bucketing
We call bucketing the process of creating a new hierarchy in the cube whose members are calculated based on members of other hierarchies, measures, or context values. These new hierarchies are often used to group the facts into coarser buckets, hence the name. For instance, let's say that our current data has a monthly granularity. If we want to see aggregated values per quarter instead, we can do the following:
CopperLevelValues quarterValues = Copper.level("month").map((Integer m) -> "Q" + ((m - 1) / 3 + 1));
Copper.newSingleLevelHierarchy("Quarter")
.from(quarterValues)
.publish(context);
There are a few things to note here:
- We used a lambda to create the quarter based on the month. In this lambda, we gave an explicit type to the input (the month). This is mandatory since the type of the level is not known at compile time.
- `Copper.newSingleLevelHierarchy(...)` returns a `SingleLevelHierarchyBuilder` to start defining a hierarchy.
- The `map` function, when applied to a level, returns a `CopperLevelValues` object that represents how the members of a level will be computed. Its sole purpose is to be used as the argument of `SingleLevelHierarchyBuilder#from(...)` to create a single level hierarchy.
- After having defined how a level's members are computed and in which hierarchy this level will end up, it becomes possible to publish the hierarchy. One may want to customize it further by indicating the dimension with `.inDimension(...)` (it is the name of the hierarchy by default), whether the hierarchy is slicing with `.slicing()` (it is not slicing by default), the level comparator...
- The quarters bucketing works with all measures.
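To make the month-to-quarter mapping concrete, here is a minimal plain-Java sketch that does not use Copper at all. It reuses the exact expression from the lambda above and groups some made-up per-month values into quarters. The class name, the helper methods and the sample values are illustrative assumptions, not part of the Copper API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class QuarterBucketing {

    /** Same expression as the lambda passed to map(...): months 1..12 give "Q1".."Q4". */
    public static String quarterOf(int month) {
        // Integer division: months 1-3 yield 0, months 4-6 yield 1, and so on.
        return "Q" + ((month - 1) / 3 + 1);
    }

    /** Groups per-month values into per-quarter sums, mimicking a SUM aggregation. */
    public static Map<String, Long> sumPerQuarter(Map<Integer, Long> valuePerMonth) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : valuePerMonth.entrySet()) {
            result.merge(quarterOf(entry.getKey()), entry.getValue(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical per-month values, only here to exercise the mapping.
        Map<Integer, Long> valuePerMonth = new LinkedHashMap<>();
        valuePerMonth.put(1, 10L);
        valuePerMonth.put(2, 5L);
        valuePerMonth.put(4, 7L);
        valuePerMonth.put(12, 1L);
        System.out.println(sumPerQuarter(valuePerMonth)); // {Q1=15, Q2=7, Q4=1}
    }
}
```

Months 1 and 2 both land in Q1, which is the coarser grouping the published hierarchy exposes.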
The example above is one of the simplest bucketings one can make. It groups aggregates based on cube level member values, but we can also create buckets using underlying measures and context values.
For the next example let's say we have:
- A datastore with a single store called tweets containing
id | text | sender_id | likes | year | month | day |
---|---|---|---|---|---|---|
0 | Hello World | 0 | 23 | 2017 | 11 | 2 |
1 | Lol | 0 | 2 | 2017 | 12 | 14 |
2 | Foo | 0 | 0 | 2018 | 1 | 4 |
3 | Test | 1 | 0 | 2018 | 2 | 9 |
4 | Hola | 2 | 999 | 2018 | 3 | 14 |
- A selection and a cube on top of this dataset:
StartBuilding.selection(datastoreDescription)
.fromBaseStore("tweets")
.withAllReachableFields()
.build();
StartBuilding.cube("tweets")
.withSingleLevelDimensions("sender_id")
.withDimension("time")
.withHierarchy("time")
.withLevel("year")
.withLevel("month")
.withLevel("day")
Let's say we want to capture a user's interest in tweets. We created a dedicated context value that represents this interest. Its code is included below for thoroughness:
/**
* Represents the interests of a user in the content of the tweets.
*/
protected interface UserInterests extends IContextValue {
/**
* Indicates how much the user represented by this context value is
* interested in tweets sent by a given account.
*
* @param senderId The sender of the tweet.
* @param numberOfLikes The number of likes the tweets of this sender accumulated.
*
* @return The estimated interest in the tweet.
*/
String getUserSenderIdInterest(long senderId, long numberOfLikes);
@Override
default Class<? extends IContextValue> getContextInterface() {
return UserInterests.class;
}
}
/**
* An implementation liking users in a white list or having at least a given amount of likes in total.
*/
protected static class MinLikesAndWhiteListInterests implements UserInterests {
private static final long serialVersionUID = 1L;
/** The other users liked by the current user. */
protected final Set<Long> likedSenders;
/** The minimum accumulated number of likes a user needs to have to be appreciated by the current user. */
protected final long minTotalLikes;
/**
* Fully defining constructor.
*
* @param minTotalLikes The minimum accumulated number of likes a user needs to have to be appreciated by the current user.
* @param likedSenders The list of senders the current user likes anyway.
*/
public MinLikesAndWhiteListInterests(long minTotalLikes, Long ...likedSenders) {
this.minTotalLikes = minTotalLikes;
this.likedSenders = new HashSet<Long>();
this.likedSenders.addAll(Arrays.<Long>asList(likedSenders));
}
/**
* Copy constructor.
*
* @param minTotalLikes See {@link #minTotalLikes}.
* @param likedSenders See {@link #likedSenders}.
*/
protected MinLikesAndWhiteListInterests(long minTotalLikes, Set<Long> likedSenders) {
this.minTotalLikes = minTotalLikes;
this.likedSenders = likedSenders;
}
@Override
public String getUserSenderIdInterest(long senderId, long numberOfLikes) {
if (likedSenders.contains(senderId) || numberOfLikes > this.minTotalLikes) {
return "Interested";
} else {
return "Not interested";
}
}
@Override
public MinLikesAndWhiteListInterests clone() {
return new MinLikesAndWhiteListInterests(minTotalLikes, likedSenders);
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((likedSenders == null) ? 0 : likedSenders.hashCode());
result = prime * result + (int) (minTotalLikes ^ (minTotalLikes >>> 32));
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
MinLikesAndWhiteListInterests other = (MinLikesAndWhiteListInterests) obj;
if (likedSenders == null) {
if (other.likedSenders != null)
return false;
} else if (!likedSenders.equals(other.likedSenders))
return false;
if (minTotalLikes != other.minTotalLikes)
return false;
return true;
}
}
Now with this context value we can bucket with this calculation:
CopperLevelValues interestLevelValues = Copper.combine(Copper.contextValue(UserInterests.class), Copper.level("sender_id"), Copper.sum("likes"))
.map(a -> ((UserInterests) a.read(0)).getUserSenderIdInterest(a.readLong(1), a.readLong(2)));
Copper.newSingleLevelHierarchy("interest")
.from(interestLevelValues)
.withMemberList("Interested", "Not interested")
.publish(context);
And test the bucketing produced with an MDX query using different context values:
The first one with a threshold of 200 and having sender 1 as favorite (`UserInterests userInterests = new MinLikesAndWhiteListInterests(200L, 1L)`):
UserInterests userInterests = new MinLikesAndWhiteListInterests(200L, 1L);
contextValueBucketing()
.toCellSet("SELECT "
+ "[interest].[interest].[interest] ON ROWS "
+ "FROM [tweets] WHERE [Measures].[contributors.COUNT]", userInterests)
Executing the MDX query `SELECT [interest].[interest].[interest] ON ROWS FROM [tweets] WHERE [Measures].[contributors.COUNT]` produces
interest | contributors.COUNT |
---|---|
Interested | 2 |
Not interested | 3 |
The second one with a threshold of 10 and having sender 2 as favorite (`UserInterests userInterests = new MinLikesAndWhiteListInterests(10L, 2L)`):
UserInterests userInterests = new MinLikesAndWhiteListInterests(10L, 2L);
contextValueBucketing()
.toCellSet("SELECT "
+ "[interest].[interest].[interest] ON ROWS "
+ "FROM [tweets] WHERE [Measures].[contributors.COUNT]", userInterests)
The same MDX query produces
interest | contributors.COUNT |
---|---|
Interested | 4 |
Not interested | 1 |
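The two result tables can be cross-checked by hand with a small plain-Java simulation that does not use Copper or ActivePivot. It assumes that the interest is evaluated once per `sender_id` on that sender's total likes, and that each sender's tweets are then counted in the resulting bucket. That reading is consistent with the tables, but the class and method names below are purely illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class InterestBucketingSketch {

    /** Rows of the tweets store reduced to {sender_id, likes}. */
    static final long[][] TWEETS = { {0, 23}, {0, 2}, {0, 0}, {1, 0}, {2, 999} };

    /** Same rule as MinLikesAndWhiteListInterests#getUserSenderIdInterest. */
    public static String interest(Set<Long> liked, long minTotalLikes, long senderId, long totalLikes) {
        return liked.contains(senderId) || totalLikes > minTotalLikes
                ? "Interested" : "Not interested";
    }

    /** Returns the number of tweets (contributors.COUNT) falling in each interest bucket. */
    public static Map<String, Long> countPerBucket(long minTotalLikes, Long... likedSenders) {
        Set<Long> liked = new HashSet<>(Arrays.asList(likedSenders));
        // Step 1: per sender, accumulate {sum of likes, number of tweets}.
        Map<Long, long[]> perSender = new LinkedHashMap<>();
        for (long[] row : TWEETS) {
            long[] agg = perSender.computeIfAbsent(row[0], k -> new long[2]);
            agg[0] += row[1];
            agg[1] += 1;
        }
        // Step 2: bucket each sender and add its tweet count to that bucket.
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<Long, long[]> e : perSender.entrySet()) {
            String bucket = interest(liked, minTotalLikes, e.getKey(), e.getValue()[0]);
            result.merge(bucket, e.getValue()[1], Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countPerBucket(200L, 1L)); // {Interested=2, Not interested=3}
        System.out.println(countPerBucket(10L, 2L));  // {Interested=4, Not interested=1}
    }
}
```

With a threshold of 200, only sender 1 (white-listed) and sender 2 (999 likes) are interesting, which gives 2 tweets. With a threshold of 10, sender 0 (25 likes in total) joins sender 2, which gives 4 tweets.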
There are multiple things to explain in this Copper example:
- We retrieve the value of the context value via `Copper.contextValue(UserInterests.class)`, as explained in the API.
- We combine the elements `UserInterests.class`, `sender_id` and `likes.SUM` to produce a new `CopperLevelValues`. We can safely cast the first object we receive in our lambda to the context value class since the first element is the context value measure. We can then directly call a method of the context value in our lambda, passing the `sender_id` level member value and the `likes.SUM` aggregated value as parameters.
- We use a new function, `.withMemberList()`, on our built hierarchy. It is needed when Copper detects that it cannot automatically figure out the list of members, which ActivePivot requires to work properly. Here the calculation uses a context value and aggregated values, so it is impossible for Copper to know all the different values the function can return. This is why it asks the user to provide the actual list of distinct values that this function can return. You will notice that this method wasn't needed for the previous example, because there Copper could automatically find the list of the members of the level Quarter. So you don't always need to call this method. You should write your calculation without calling it; if Copper detects that it can't find the member list automatically, it will throw an exception with a quite explicit error message that should guide you on where to add `.withMemberList`.
The list of members can be stored in a store instead and specified with `withMembers(String store, String fieldName)`.
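The difference between the two examples can be pictured with a short plain-Java sketch. The source does not describe how Copper performs its detection, so this only illustrates the underlying idea: when a bucketing function depends solely on the members of an existing level, applying it to every known member yields the complete member list. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MemberListSketch {

    /** Applies a bucketing function to every known input member and keeps the distinct results. */
    public static <T> Set<String> derivedMembers(Set<T> inputMembers, Function<T, String> bucketing) {
        return inputMembers.stream()
                .map(bucketing)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // The members of the month level are known up front: 1..12.
        Set<Integer> months = IntStream.rangeClosed(1, 12).boxed().collect(Collectors.toSet());
        System.out.println(derivedMembers(months, m -> "Q" + ((m - 1) / 3 + 1))); // [Q1, Q2, Q3, Q4]
    }
}
```

No such enumeration is possible for the interest bucketing, since its inputs include a context value supplied at query time and an aggregated value. That is why the possible results ("Interested", "Not interested") have to be listed explicitly.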
Hierarchy metadata
A single level hierarchy and its level have several elements of metadata that can be configured in Copper, such as:
- whether it is slicing or not with `.slicing()`. If not indicated, it is NOT slicing.
- its dimension with `.inDimension(String dimensionName)`
- its level type with `.withType(LevelType type)`
- its level formatter with `.withFormatter(String formatter)`. The formatter changes the visual representation of the member level values, in a pivot table for instance.
- its measure groups with `.withMeasureGroups()`
- its folder with `.withinFolder()`. It impacts the path to be used when displaying the dimension in the user interface.
- its visibility. A hierarchy can be visible or not in the UI, but will still be available in queries as long as it is part of the cube description. Hierarchies explicitly published are always visible, unless `.hidden()` is called.