Creating and publishing measures
This section introduces the various ways of creating measures with the API, or referring to existing ones to use them in calculations.
Basics
Here we'll cover the most elementary Copper measures, created in just one step.
Native measures
The native measures contributors.COUNT
and update.TIMESTAMP
are available by calling respectively Copper.count()
and Copper.timestamp()
.
Aggregated measures
Aggregated measures refer to selection fields aggregated using a function plugin key. In the ActivePivot sandbox, such measures include pnl.SUM
.
In Copper, you can create such a measure by calling Copper.agg(fieldName, pluginKey)
.
We have provided shorthand methods for the most common aggregations -- sum, avg, min, max -- to free the user from having to specify the plugin key every time.
The creation of pnl.SUM
becomes:
Copper.sum("pnl").withName("pnl.SUM").publish(context);
All measures published and used by the user must be named using the withName
function.
The default name generated for such measures, unless overridden by the user, is of the form field.PLUGIN_KEY + CopperGeneratedSuffix
.
Level member value
Copper allows you to create a measure whose value is the member of a selected level in the evaluated location, when the level is expressed. This measure may be used for further calculations, or as is.
Copper.member(Copper.level("Desk")).withName("DeskMemberValue").publish(context);
Here, Desk is the name of a level.
Example:
Desk | DeskMemberValue |
---|---|
(ALL) | |
DeskA | DeskA |
DeskB | DeskB |
As you can see, the measure named DeskMemberValue
returns null
on the grand total because the level Desk is not expressed.
Context value
Context values can be used in Copper, and creating a measure whose value is that of the context value of a given class
at every cube coordinate is done with Copper.contextValue(contextValueClass).withName(measureName)
.
Constant
Measures that have the same constant value at every cube coordinate can be used to express arithmetic calculations (see next section).
They are created using Copper.constant(value)
.
Named measure
Copper allows you to refer to any measure by its name.
You can use this feature to manipulate measures which were initially present in the cube, as well as any previously published Copper measure.
This is done with Copper.measure(name)
.
The name of a measure is the user-given name if any (with CopperMeasure#withName(String)
, see Measure metadata about measure metadata for more information),
or the default name generated by Copper if the referenced measure is implicit/not visible.
Copper.sum("pnl").withName("pnl.SUM").publish(context);
CopperMeasure m1 = function(Copper.measure("pnl.SUM")); // reuse pnl.SUM in a custom function
m1.publish(context);
Note that in this particular example, the whole body can be boiled down to a single equivalent instruction:
function(Copper.sum("pnl")).publish(context)
. See the section about measure publication for more details.
Simple operations
In this section we will review some elementary operations that can be performed on Copper measures to create new measures.
Arithmetic operations
Arithmetic operations such as +
, -
, *
, /
are naturally supported in Copper, with the methods plus
, minus
, multiply
, divide
available for any CopperMeasure
.
CopperMeasure pnlSUM = Copper.sum("pnl").withName("pnl.SUM").publish(context);
pnlSUM.multiply(Copper.constant(2)).withName("pnl2.SUM").publish(context);
pnlSUM.plus(pnlSUM).withName("doubleSum").publish(context);
Lambda functions
For more complex calculations, lambdas can be used: measure.map(value -> ...)
.
Copper.sum("pnl").map((Double pnl) -> pnl.longValue()).withName("complexMeasure").publish(context);
// mapToDouble can be used to manipulate primitive double types
Copper.sum("pnl").mapToDouble((Double pnl) -> Math.sqrt(pnl)).withName("sqrtPNL").publish(context);
It is however advised to use the arithmetic operations as much as possible, since Copper is able to understand and optimize calculations when they are used, which it cannot do with lambdas.
Copper.sum("pnl").map((Double pnl) -> Math.pow(pnl, 2.)).withName(...).publish(context); // DON'T
Copper.sum("pnl").multiply(Copper.sum("pnl")).withName(...).publish(context); // DO
Measure combination
Multiple measures can be combined using Copper.combine(CopperMeasure...)
and then used in a calculation to produce a single measure with map(lambda)
.
When a query is executed, the mapping function is given an IArrayReader
as an argument.
This is an array-like object which contains the values of the combined measures at the evaluated location, in the same order.
The array reader provides a method to get their values like read
, as well as a null-check with isNull
and primitive reads to prevent from boxing: readInt
, readLong
, readDouble
, readFloat
, and readBoolean
.
Copper.combine(
Copper.member(Copper.level("Desk")),
Copper.sum("pnl").withName("pnl.SUM"))
.map(a -> {
final String desk = (String) a.read(0);
final double pnl = a.readDouble(1);
if ("Desk A".equals(desk)) {
return pnl * 2;
} else {
return pnl;
}
})
.publish(context);
Measures used within the combine method are considered visible, and must be named by the user.
Advanced operations
Here we describe some of the more advanced measure operations in Copper.
Measure filtering
Computing the value of a measure considering only a subset of the records is called filtering.
It is performed with a condition on a level, much like one would think "the value of the measure when
only considering the contributions for the city of Paris".
The CopperLevel
interface provides many condition methods, such as eq
, lt
, geq
, etc., for =
, <
, ≥
, and so on.
Filtering will generate a new instance of the measure. In the following example, two measures are published: pnl.SUM
and pnlInParis.SUM
.
CopperMeasure pnlSum = Copper.sum("pnl").withName("pnl.SUM").publish(context);
pnlSum
.filter(Copper.level("City").eq("Paris"))
.withName("pnlInParis.SUM")
.publish(context);
Complex conditions can be constructed using the and
and or
methods in CopperLevelConditions
. Chaining filter
calls with different conditions will have the same result as and
'ing them.
However, Copper does enforce some rules on these conditions.
The operator or
can only be used between two conditions that apply to a single level. We do not support cross-level or
, even within a hierarchy.
- SUPPORTED:
Copper.level("City").eq("Paris").or(Copper.level("City").eq("London"))
- NOT SUPPORTED:
Copper.level("City").eq("Paris").or(Copper.level("Country").eq("UK"))
- NOT SUPPORTED:
(Copper.level("City").eq("Paris").and(Copper.level("Country").eq("France")))
.or(Copper.level("Country").eq("UK"))
If this use case really proves mandatory, one can get around this restriction by combining the two levels (here: Country and City) with the measure (here: pnl.SUM) and manually filtering in the mapping function:
Copper.combine(Copper.level("City"), Copper.level("Country"), Copper.measure("pnl.SUM"))
.map(reader -> {
final String city = (String) reader.read(0);
final String country = (String) reader.read(1);
if (city.equals("Paris") || country.equals("UK")) {
return reader.readDouble(2);
}
return null;
});
Dynamic aggregation/Leaf operation
Dynamic aggregation refers to the aggregation of a measure using a certain function up to a level, and another aggregation function (or no aggregation) above that level.
You may want to compute the sum of the sales for each category of products, but display the average sale amount when aggregating product categories. This is done like so:
Copper.sum("sales").withName("sales.SUM")
.per(Copper.level("Product Category"))
.avg()
.withName("Sales average per category")
.publish(context);
Some calculations only make sense when a certain level is expressed: as previously seen in the example above with the combine operator, the value of Copper.member(Copper.level("Desk")))
is null for the Grand Total. It is the user's responsibility to handle this case. If there is no reason to compute anything above the level Desk
, it is possible to do:
Copper.combine(
Copper.member(Copper.level("Desk")),
Copper.sum("pnl").withName("pnl.SUM"))
.map(a -> function(a))
.per(Copper.level("Desk"))
.doNotAggregateAbove();
It forces the combine operation to happen at Desk level thanks to .per(Copper.level("Desk"))
.
Because it is mandatory to indicate what to do above the "per levels", we use .doNotAggregateAbove()
here to say
that the combine operation does not need to be executed above the Desk level.
An equivalent, but more cumbersome and less efficient way of writing this would be:
Copper.combine(
Copper.member(Copper.level("Desk")),
Copper.sum("pnl").withName("pnl.SUM"))
.map(a -> {
if (a.isNull(0)) {
return null;
} else {
return function(a);
}
})
.per(Copper.level("Desk"))
.doNotAggregateAbove();
Aggregation levels hiding
It is possible to specify the aggregation behavior and decide when it should stop by using the doNotAggregateAbove(AggregationLevels mask)
method :
By default, when defining a measure with a dynamic aggregation operation followed by a doNotAggregateAbove()
,
the aggregation will be stopped along a hierarchy when at least one of the levels provided in the list is expressed, this corresponds to the doNotAggregateAbove(ANY)
behaviour.
A measure will not return values at a queried location if the aggregation has been stopped at a more granular level.
In some cases however, especially when having multi-level hierarchies with levels having functional overlapping, we want to be able to aggregate the measure
until all the specified levels in a hierarchy are expressed in a query. This can be achieved by specifying doNotAggregateAbove(ALL)
when defining the measure.
In other specific cases, a measure may not have any functional meaning when showing values below a given level. In this case, it is possible to specify maskAtOrBelow()
when defining the measures. Please note that such a measure will only return values when the location is at an aggregation level strictly superior to the predefined leaf levels.
The maskAtOrBelow(AggregationLevels mask)
method can be used to specify the behaviour to follow when multiple leaf levels are defined.
Analytic functions
An analytic function is a function that computes aggregated values over a group of other aggregated values. The group is defined by a window built along a particular hierarchy used as an axis.
Window specification
For each row of the cell set (i.e. for each location), it indicates the set of other rows (i.e. the other locations) that can be read to compute the result of the operation defined by the analytic (window) function.
The window specification can be created in three parts:
- An ordering specification (mandatory) to define how aggregated values are ordered along the indicated hierarchy:
Window.orderBy("time")
. The ordering follows the rules implied by each level comparator. It can be reversed withWindow#reverse()
. - A partitioning specification (optional) to break up the window into separate smaller windows (partitions; one per level value) over which the analytic function is independently evaluated. Notice it only makes sense to indicate a level belonging to the ordered hierarchy because the window is automatically partitioned by the level values of other hierarchies.
- A frame specification (optional) that states which aggregated values will be included in the frame to evaluate the result at a particular location. For example,
Window.orderBy("time").rangeBetween(-5, 0)
describes a frame including the five values preceding the current value to the current value (included). Not specifying a frame is equivalent to defining an unbounded preceding frame, which means it includes all the values preceding the current value and including the current value.
Siblings
Window.siblings(CopperHierarchy hierarchy)
creates a special window that contains for a given row (evaluated location) of the cellset the set of other rows (locations) whose paths along the given hierarchy have the same set of ancestor coordinates.
For example, if the requested cellset contains the following locations, the window for each location is indicated by the set of corresponding row ids:
row id | location | window |
---|---|---|
0 | AllMember\2018\January | [0, 1, 2] |
1 | AllMember\2018\February | [0, 1, 2] |
2 | AllMember\2018\March | [0, 1, 2] |
3 | AllMember\2019\January | [3, 4, 5] |
4 | AllMember\2019\February | [3, 4, 5] |
5 | AllMember\2019\March | [3, 4, 5] |
The rows 0, 1 and 2 have the same window because they have the same ancestor (parent): [AllMember, 2018] and the rows 3, 4, and 5 have the same window because they have the same ancestor: [AllMember, 2019] However, for the following cellset, the window is identical for all locations because they all have a common ancestor which is AllMember:
row id | location | window |
---|---|---|
0 | AllMember\2018 | [0, 1, 2] |
1 | AllMember\2019 | [0, 1, 2] |
2 | AllMember\2020 | [0, 1, 2] |
By default, the window contains the row being evaluated. You can exclude it with:
Window.siblings(Copper.hierarchy("time")).excludeCurrent();
The previous result becomes:
row id | location | window |
---|---|---|
0 | AllMember\2018 | [1, 2] |
1 | AllMember\2019 | [0, 2] |
2 | AllMember\2020 | [0, 1] |
Aggregate analytic function
An aggregation analytic function is a function used to do aggregation over the values belonging to the window. It is typically useful to define a 'running sum' or a 'prefix sum'.
Copper.sum(Copper.measure("pnl.SUM")).over(Window.orderBy("time"));
Every aggregation function can be used, such as max, min, longSum, or shortSum. You can also use custom ones with Copper.agg(CopperMeasure measure, String pluginKey)
.
Examples of ordering, partitioning, and using different aggregation functions illustrated with a pivot table, given:
- a cube with a "time" hierarchy with three levels: "year", "month", "day".
- a pre-aggregated measure "pnl.SUM":
CopperMeasure pnl = Copper.sum("pnl")
. - A measure to compute a running total of "pnl.SUM" along the time hierarchy, each level being naturally ordered:
Copper.sum(pnl).over(Window.orderBy("time").withName("run. tot.")
. - A measure to compute a running total of "pnl.SUM" in reverse order along the time hierarchy, each level being naturally ordered:
Copper.sum(pnl).over(Window.orderBy("time").reverse().withName("run. tot. reverse")
. - A measure to compute a max total of "pnl.SUM" along the time hierarchy, each level being naturally ordered:
Copper.max(pnl).over(Window.orderBy("time").withName("run. max")
. - A measure to compute a running total of "pnl.SUM" along the time hierarchy, each level being naturally ordered:
Copper.sum(pnl).over(Window.orderBy("time").partitionBy("year").withName("run. tot. partitionBy(year)")
. The cumulative sum starting over for each different year.
Year/Month/Day | pnl.SUM | run. tot. | run. tot. reverse | run. max | run. tot. partitionBy(year) |
---|---|---|---|---|---|
(ALL) | 17 | 17 | 17 | 17 | 17 |
2018 | 10 | 10 | 17 | 10 | 10 |
Jan | 7 | 7 | 17 | 7 | 7 |
01 | 5 | 5 | 17 | 5 | 5 |
02 | 1 | 6 | 12 | 5 | 6 |
04 | 1 | 7 | 11 | 5 | 7 |
Feb | 3 | 10 | 10 | 7 | 10 |
01 | 1 | 8 | 10 | 5 | 8 |
02 | 2 | 10 | 9 | 5 | 10 |
2019 | 7 | 17 | 7 | 10 | 7 |
Jan | 2 | 12 | 7 | 7 | 2 |
02 | 2 | 12 | 7 | 5 | 2 |
03 | 0 | 12 | 5 | 5 | 2 |
Feb | 5 | 17 | 5 | 7 | 7 |
01 | 3 | 15 | 5 | 5 | 5 |
02 | 2 | 17 | 2 | 5 | 7 |
Navigation analytic function
A navigation function computes the value at a given location by copying the value over a different location from the current location in the window. Notice that the window frame specification is ignored by every navigation analytic function.
The supported navigation function are:
- lag: returns the value of the given measure for a location at a given offset before the current location.
- lead: returns the value of the given measure for a location at a given offset after the current location.
- first: returns the value of the given measure with respect to the first location in the window
- last: returns the value of the given measure with respect to the last location in the window
Examples of ordering, partitioning, and using different navigation functions illustrated with a pivot table, given:
- a cube with a "time" hierarchy with three "year", "month", "day" levels.
- a pre-aggregated measure "pnl.SUM":
CopperMeasure pnl = Copper.sum("pnl)
. - A measure that returns the next values of "pnl.SUM" along the time hierarchy, each level being naturally ordered:
Copper.lead(pnl, 1).over(Window.orderBy("time")).withName("lead")
. - A measure that returns the previous values of "pnl.SUM" along the time hierarchy, each level being naturally
ordered:
Copper.lag(pnl, 1).over(Window.orderBy("time")).withName("lag")
. - A measure that returns the first value of "pnl.SUM" along the time hierarchy, each level being naturally ordered:
Copper.first(pnl).over(Window.orderBy("time")).withName("first")
. - A measure that returns the last value of "pnl.SUM" along the time hierarchy, each level being naturally ordered:
Copper.last(pnl).over(Window.orderBy("time")).withName("last")
. - A measure that returns the first value of "pnl.SUM" along the time hierarchy for each year, each level
being naturally ordered:
Copper.first(pnl).over(Window.orderBy("time").partitionBy("year")).withName("first per year")
.
Year/Month/Day | pnl.SUM | lag | lead | first | last | first per year |
---|---|---|---|---|---|---|
(ALL) | 17 | null | null | 17 | 17 | 17 |
2018 | 10 | null | 7 | 10 | 7 | 10 |
Jan | 7 | null | 3 | 7 | 5 | 7 |
01 | 5 | null | 1 | 5 | 2 | 5 |
02 | 1 | 5 | 1 | 5 | 2 | 5 |
04 | 1 | 1 | 1 | 5 | 2 | 5 |
Feb | 3 | 7 | 2 | 7 | 5 | 7 |
01 | 1 | 1 | 2 | 5 | 2 | 5 |
02 | 2 | 1 | 2 | 5 | 2 | 5 |
2019 | 7 | 10 | null | 10 | 7 | 7 |
Jan | 2 | 3 | 5 | 7 | 5 | 2 |
02 | 2 | 2 | 0 | 5 | 2 | 2 |
03 | 0 | 2 | 3 | 5 | 2 | 2 |
Feb | 5 | 2 | null | 7 | 5 | 2 |
01 | 3 | 0 | 2 | 5 | 2 | 2 |
02 | 2 | 3 | null | 5 | 2 | 2 |
Notice that aggregate and navigation analytic functions can apply query cube filter on prefetches (i.e. reading location) as follows:
Copper.first(Copper.sum("pnl"))
.over(Window.orderBy("time"))
.withFilter(AnalyticFilter.PREFETCH_AND_RESULT)
.withName("first");
Shift Measures
Shifting refers to copying the value of a measure at a location other than the evaluated location.
Location shift
Shifting a measure requires the definition of:
- the level(s) on which to shift
- the operation describing the shift
When the query expresses enough levels, the aggregated value is read from the location specified by the shift function.
There are two ways of defining a shift function:
- through a constant value. Example: shifting at
day = 1
. Every time the shifted measure is evaluated at a certain day, the value for day 1 is taken instead - provided that day 1 exists; otherwise, it returns an empty result.
For constant shifts, the query must express the parent level of a shifted level. Considering a multi-level hierarchy "year" then "month" then "day", the value of the shifted measure at "year 2020 month 6" is the value at "year 2020 month 6 & day 1". If month is absent, we are asking for "day 1 of year 2020" but without knowing the month. As this does not make sense, the result is empty. - through a lambda. This allows you to create dynamic shifts, such as "previous day", "next month", and so on.
Example: shifting atday -> day - 1
shows the value of "day 5" when evaluated at "day 6", and returns an empty result when evaluated at "day 1" - provided that "day 0" does not exist.
Unlike constant shifts, the shift is performed only when all the query location expresses the levels involved in the lambda. Otherwise, the result is empty.
Using the same example as above, the value of the shifted measure at "month 6" or at "year 2019" are empty.
Here is an example of various shift measures illustrated with a pivot table, given:
- a "date" hierarchy with the levels "year", "month", "day"
- a classic measure pnl.SUM
- a constant-shifted measure d1
d1 = pnlSum.shift(
Copper.level("date", "date", "day").atValue(1)) - a constant-shifted measure d1m6
d1m6 = pnlSum.shift(
Copper.level("date", "date", "day").atValue(1),
Copper.level("date", "date", "month").atValue(6)) - a constant-shifted measure d1m6y2019
d1m6y2019 = pnlSum.shift(
Copper.level("date", "date", "day").atValue(1),
Copper.level("date", "date", "month").atValue(6),
Copper.level("date", "date", "year").atValue(2019)) - a lambda-shifted measure prevDay
prevDay = pnlSum.shift(
Copper.level("date", "date", "day")
.at((Integer d) -> d - 1)) - a lambda-shifted measure prevDayPrevMonth
prevDayPrevMonth = pnlSum.shift(
CopperLevel.at(Arrays.asList(
Copper.level("date", "date", "day"),
Copper.level("date", "date", "month")),
a -> {
a.writeInt(0, a.readInt(0) - 1);
a.writeInt(1, a.readInt(1) - 1);
})) - a lambda-shifted measure prevDayPrevMonthPrevYear
prevDayPrevMonthPrevYear = pnlSum.shift(
CopperLevel.at(Arrays.asList(
Copper.level("date", "date", "day"),
Copper.level("date", "date", "month")),
a -> {
a.writeInt(0, a.readInt(0) - 1);
a.writeInt(1, a.readInt(1) - 1);
a.writeInt(2, a.readInt(2) - 1);
}))
ALL/year/month/day | pnl.SUM | d1 | d1m6 | d1m6y2019 | prevDay | prevDayPrevMonth | prevDayPrevMonthPrevYear |
---|---|---|---|---|---|---|---|
(ALL) | 1000 | 7 | |||||
2018 | 200 | 1 | 7 | 200 | |||
2018/05 | 40 | 4 | 1 | 7 | 40 | 40 | |
2018/05/01 | 4 | 4 | 1 | 7 | |||
2018/05/02 | 3 | 4 | 1 | 7 | 4 | ||
... | |||||||
2018/06 | 30 | 1 | 1 | 7 | 30 | 30 | |
2018/06/01 | 1 | 1 | 1 | 7 | |||
2018/06/02 | 2 | 1 | 1 | 7 | 1 | 4 | |
... | |||||||
... | |||||||
2019 | 300 | 7 | 7 | 300 | |||
2019/05 | 50 | 5 | 7 | 7 | 50 | 50 | |
2019/05/01 | 5 | 5 | 7 | 7 | |||
2019/05/02 | 6 | 5 | 7 | 7 | 5 | ||
... | |||||||
2019/06 | 10 | 7 | 7 | 7 | 10 | 10 | |
2019/06/01 | 7 | 7 | 7 | 7 | |||
2019/06/02 | 1 | 7 | 7 | 7 | 7 | 5 | 4 |
... |
It is possible to mix all types of shift function together in a single call to
shift
.
Parent value
To compute the value of a measure at its parent member in a multi-level hierarchy (or for at AllMember
if the hierarchy is single-level), use measure.parentValueOn(Copper.hierarchy(hierarchyName))
. This operation is also known as a drillup. You can additionally choose how many levels the operation should "go up" with a third int
parameter:
/*
* compute the "grandparent" along the date hierarchy.
* Since it is slicing, all locations will show the aggregated value for the current year.
*/
Copper.sum("pnl").parentValueOn(Copper.hierarchy("Time"), 2).withName(...).publish(context);
If the number of levels to drill up is greater than the level depth of the hierarchy in the evaluated location, the first level of the hierarchy is used instead (
AllMember
for non-slicing hierarchies, and the member of the first level otherwise). This means that unless the targeted value is actuallynull
, the parent value is nevernull
.
Total
measure.totalOn([hierarchies])
is used when instead of drilling up a constant number of levels one needs to
always refer to the aggregated value of the first level of the hierarchy.
Copper.sum("quantity")
.totalOn(Copper.hierarchy("time"))
.withName("Year total")
.publish(context);
Multiple hierarchies can be specified. In this case, the measure is aggregated at the top-level of all specified hierarchies.
Copper.sum("quantity")
.totalOn(Copper.hierarchy("time"),
Copper.hierarchy("product"))
.withName("Year total, all products")
.publish(context);
For example, considering sample sales data with on which we defined:
- a slicing
time
hierarchy, with two levelsYear
andQuarter
- a non-slicing single-level
product
hierarchy - a non-slicing single-level
store
hierarchy - a
quantity
measure, aggregated withCopper.sum()
The two total measures defined above would produce the following results:
Year | Quarter | Product | Store | Quantity | Year total | Year total, all products |
---|---|---|---|---|---|---|
2019 | Q1 | bike | Paris | 25 | 25 | 55 |
2019 | Q2 | glue | Lille | 10 | 10 | 10 |
2019 | Q4 | glue | Paris | 30 | 30 | 55 |
2020 | Q1 | glue | Paris | 50 | 55 | 55 |
2020 | Q3 | bike | Lille | 15 | 15 | 25 |
2020 | Q4 | glue | Paris | 5 | 55 | 55 |
2020 | Q4 | glue | Lille | 10 | 10 | 25 |
Grand total
measure.grandTotal()
can be used to drill across all hierarchies up to their top-level member.
Considering the same sales data as above, the grand total would produce the following results:
Copper.sum("quantity").grandTotal().publish(context);
Year | Quarter | Product | Store | Quantity | Grand total |
---|---|---|---|---|---|
2019 | Q1 | bike | Paris | 25 | 65 |
2019 | Q2 | glue | Lille | 10 | 65 |
2019 | Q4 | glue | Paris | 30 | 65 |
2020 | Q1 | glue | Paris | 50 | 80 |
2020 | Q3 | bike | Lille | 15 | 80 |
2020 | Q4 | glue | Paris | 5 | 80 |
2020 | Q4 | glue | Lille | 10 | 80 |
Marginal total
To drill up along all hierarchies except for a specific few, measure.marginalTotalOn([hierarchies])
can be used.
All non-specified hierarchies are drilled up to their top-level member.
The specified hierarchies are also called marginalized hierarchies.
Let us consider the same sales data as above. The total quantity sold for each product, regardless of which store or quarter they were sold in, can be obtained using a marginal total over the product hierarchy.
Copper.sum("quantity")
.marginalTotalOn(Copper.hierarchy("product")).withName(...)
.publish(context);
This marginal total would produce the following results:
Year | Quarter | Product | Store | Quantity | Marginal total |
---|---|---|---|---|---|
2019 | Q1 | bike | Paris | 25 | 25 |
2019 | Q2 | glue | Lille | 10 | 40 |
2019 | Q4 | glue | Paris | 30 | 40 |
2020 | Q1 | glue | Paris | 50 | 65 |
2020 | Q3 | bike | Lille | 15 | 15 |
2020 | Q4 | glue | Paris | 5 | 65 |
2020 | Q4 | glue | Lille | 10 | 65 |
For each location, the total is aggregated to the AllMember
of the Store
hierarchy, and to the
top-level member of the time hierarchy, which is the relevant year member since the hierarchy is
slicing.
The marginal total of a measure over a given set of hierarchies is equivalent to the regular total of this measure over all the other hierarchies of the cube.
Marginalized measures allows for creating these kinds of totals without having to specify every hierarchy in the cube.
In this example, the calculated marginal total is equivalent to:
Copper.sum("quantity")
.totalOn(Copper.hierarchy("time"), Copper.hierarchy("store"))
Aggregated measures with User-defined aggregate functions (UDAF)
Advanced aggregated measures reuse the concept of aggregated measures, but extend it to obtain much more flexible and potentially complex measures by:
- Allowing multiple fields of the selection as input for the aggregation
- Allowing the storage of multiple intermediate aggregate buffers to be used for computations.
This allows for the definition of complex measures displaying metrics, such as the variance of a field, the correlation coefficient between two fields, or the aggregated sum of a vector field scaled by the values of another field. The latter example is showcased here:
Considering the following data, we want to obtain a ScaledSum
measure returning the aggregated sum of the Vector
field values (containing integer vectors), scaled by the value of the Factor
field (containing double scalars):
Year | Quarter | Factor | Vector |
---|---|---|---|
2018 | Q1 | 0.6 | [10;15] |
2018 | Q2 | 0.7 | [20;14] |
2018 | Q3 | 0.8 | [8;10] |
2018 | Q4 | 0.9 | [15;12] |
2019 | Q1 | 1.0 | [14;10] |
2019 | Q2 | 1.0 | [16;17] |
2019 | Q3 | 1.0 | [6;15] |
2019 | Q4 | 1.0 | [13;15] |
2020 | Q1 | 2.0 | [4;5] |
This can be done with the following Copper code:
Copper.userDefinedAgg("Factor", "Vector")
.aggregationBuffer(Types.TYPE_DOUBLE_ARRAY)
.contribute(
(fact, buffer) -> {
final double factor = fact.readDouble(0);
final IVector vector = fact.readVector(1);
if (buffer.isNull(0)) {
// First call: create a new vector to hold the values.
IVector sumVector = buffer.allocateVector(0, vector.size());
sumVector.applyAsDouble(vector, (ignore, value) -> factor * value);
} else {
final IVector sumVector = buffer.readWritableVector(0);
sumVector.applyAsDouble(vector, (sum, value) -> sum + factor * value);
}
})
.merge((in, out) -> {
final IVector inVector = in.readVector(0);
out.readWritableVector(0).plus(inVector);
})
.withRemovalSupport(
(fact, buffer) -> {
final double factor = fact.readDouble(0);
final IVector vector = fact.readVector(1);
final IVector sumVector = buffer.readWritableVector(0);
sumVector.applyAsDouble(vector, (sum, value) -> sum - factor * value);
})
.outputFromBuffer(0)
.withName("ScaledSum");
Please note the use of
readWritableVector()
instead ofreadVector()
. It is done to avoid modifying the initial vector with theapplyAsDouble
method, which modifies a vector in place. One must always be careful not to modify stored data within an operation. Vectors must be cloned before modification to avoid side effects, either on other queries or even corrupting the aggregates store. The vector returned byreadWritableVector()
takes care of lazy-cloning (e.g. cloning it only if we need to modify it).
This measure returns the following values:
row id | location | ScaledSum |
---|---|---|
0 | AllMember | [96.9, 104.6] |
1 | AllMember\2018 | [39.9, 37.6] |
2 | AllMember\2019 | [49.0; 57.0] |
3 | AllMember\2020 | [8.0;10.0] |
The previous block can be divided into three parts, corresponding to the steps required to fully define an advanced aggregate measure as well as its UDAF:
First,
Copper.userDefinedAgg([...])
accepts a vararg ofString
values that correspond to the fields that will be aggregated. This method returns aCopperUserDefinedAggregateFunctionBuilder
builder to start defining the aggregate function. In our example, we need to aggregate theFactor
andVector
fields:Copper.userDefinedAgg("Factor", "Vector")
Calling
aggregationBuffer(int... types)
then specifies the types of the aggregation buffer that will be used by the function to store intermediary results.This argument defines the type of each buffer used for the intermediary computations of the aggregation. In our example, we have the following value for the
bufferTypes
argument:.aggregationBuffer(Types.TYPE_DOUBLE_ARRAY)
This is due to the fact that even though the
Vector
field contains vectors with integer values, the aggregates contain data scaled with double values, and therefore can be vector with double data.Since there is no need for more data in order to compute the final aggregates, that single buffer will be enough to perform the aggregation. This however is not the case for all the aggregate functions, and we might need to use multiple buffers and buffer types in order to compute the aggregates. Those will be declared in the
bufferTypes
argument.The
bufferTypes
argument defines the size and types of the buffers used when performing the aggregation. Judiciously choosing the buffers of your UDAF is of the utmost importance, both for performance and memory consumption reasons.The
contribute(SerializableBiConsumer<IArrayReader, IWritableBuffer> contributor)
method then declares the explicit operation performed by the aggregation function when aggregating a row of facts into the aggregates buffer. TheSerializableBiConsumer<IArrayReader, IWritableArray> contributor
argument is a lambda defining the operation performed when a row of the records is aggregated into the buffer of the aggregation. Let's consider the previous example, in which thecontributor
was:.contribute(
(fact, buffer) -> {
final double factor = fact.readDouble(0);
final IVector vector = fact.readVector(1);
if (buffer.isNull(0)) {
// First call: create a new vector to hold the values.
IVector sumVector = buffer.allocateVector(0, vector.size());
sumVector.applyAsDouble(vector, (ignore, value) -> factor * value);
} else {
final IVector sumVector = buffer.readWritableVector(0);
sumVector.applyAsDouble(vector, (sum, value) -> sum + factor * value);
}
})Here,
fact
is anArrayReader
containing the data of the fields previously declared as input of the aggregation, in the same order as the declaration.buffer
corresponds to theIWritableBuffer
. It contains the contributions of the previously aggregated facts. The content of this object is defined by thebufferTypes
argument.When defining the contribution lambda, it is crucial to avoid any unnecessary copies, as well as trying to minimize the amount of read/write operations as this lambda will be applied for every fact to aggregate.
Here, we read once and for all the values of the scaling factor, and only one vector is cloned per aggregate.
This vector must be cloned before being scaled to prevent corruption of the stored vector.
It is also important to note that when manipulating non-primitive types, the aggregation function must handle
null
values properly.Then, the
merge(SerializableBiConsumer<IArrayReader, IWritableBuffer> merger)
declares the operation performed when merging two aggregates buffers together. Themerger
argument is a lambda defining the operation performed when merging the contributions of two aggregated buffers into a single one. In our example, themerger
lambda was simple, as we need to sum the values of the vector already scaled when contributing the rows to the aggregates:.merge((in, out) -> {
final IVector inVector = in.readVector(0);
out.readWritableVector(0).plus(inVector);
})After applying the consumer, the second argument of the lambda is expected to contain the merged contribution of both input aggregates. Here,
in
is anArrayReader
containing the values of the pre-aggregated values that will be added into the aggregates of theout
buffer.It is also crucial to avoid mutating objects coming from the first input aggregate or from the Aggregate Providers because they could be reused by other queries. This is why
readWritableVector()
is used instead ofreadVector()
.Then an optional step allows to define how to decontribute facts from the buffer.
This is only used by Aggregate Providers that precompute the aggregates (like the BITMAP and the LEAF) but not for the JUST_IN_TIME which computes everything at query time.
.withRemovalSupport(
(fact, buffer) -> {
final double factor = fact.readDouble(0);
final IVector vector = fact.readVector(1);
final IVector sumVector = buffer.readWritableVector(0);
sumVector.applyAsDouble(vector, (sum, value) -> sum - factor * value);
})The
.withRemovalSupport([...])
method takes a lambda as input corresponding to the operation of removing the contribution of a fact from an aggregate, and can be seen as inverse operations to the contribute() lambda.Finally, the
output(SerializableFunction<IArrayReader, Object> terminateFunction, int outputType)
method completes the definition of both the aggregate function and the advanced measure by providing a third lambdaterminateFunction
corresponding to the final evaluation function applied to the aggregated buffers in order to return the value of the aggregates.In our example, the final aggregate is already available in our first buffer, so obtaining the aggregates can be a simple read operation:
.output(o -> o.read(0))
Since terminate functions are expected to often be simple read operations on the buffers, a shortcut method is available as shown in the full snippet:
.outputFromBuffer(0)
The
outputType
argument declares the output type of the aggregate function. Note that this is only a declaration and not a cast.
User-defined aggregate functions on Measures
User-defined aggregate functions can be used to perform high performance dynamic aggregation on CopperMeasure
s as well.
A typical use case that benefits directly from this feature is a variant of the vector scaling example presented previously.
Lets consider the following model: a base store including a Vector
field and a second store including the scaling factor.
Considering the mapping (Year, InstrumentId)
, there is a one-to-many relationship from the base store to the factor store,
making the use of a simple reference impossible, and thus the application of User-defined aggregate functions on fields.
Year | InstrumentId | Vector |
---|---|---|
2018 | 1 | [10;15] |
2018 | 2 | [20;14] |
2018 | 3 | [8;10] |
2019 | 1 | [15;12] |
2019 | 2 | [14;10] |
2019 | 3 | [16;17] |
Year | InstrumentId | TradeId | Factor |
---|---|---|---|
2018 | 1 | 1 | 0.6 |
2018 | 1 | 2 | 0.7 |
2018 | 2 | 2 | 1.0 |
2018 | 2 | 5 | 1.0 |
2018 | 3 | 5 | 1.0 |
2019 | 1 | 4 | 0.8 |
2019 | 2 | 1 | 0.8 |
2019 | 3 | 4 | 2.0 |
A naive approach could be to perform a copper join operation between both stores and define a Dynamic aggregation as depicted in the following code blocks. First, the join:
final CopperStore joinedStore = Copper.store(FACTOR_STORE).joinToCube()
.withMapping(YEAR)
.withMapping(INSTRUMENT_ID);
Copper.newHierarchy(TRADE_ID)
.fromField(joinedStore.field(TRADE_ID)).publish(context);
Then the copper calculation:
CopperMeasure vectorSum = Copper.sum(VECTOR).as("VectorSum");
CopperMeasure factorSum = Copper.sum(joinedStore.field(FACTOR)).as("FactorSum");
vectorSum.multiply(factorSum)
.per(Copper.level(TRADE_ID), Copper.level(YEAR)).sum()
.as("ScaledVectorRegular").publish(context);
Although this approach produces the expected results, it may exhibit performance bottlenecks. Particularly, performing vector scaling using dynamic aggregation will generate additional transient vectors in order to achieve the scaling operation, which increases the memory consumption considerably.
The Copper API resolves these constraints and provides a high-performance dynamic aggregation using user-defined aggregate function on measures. This is illustrated in the following code:
Copper.userDefinedAgg(factorSum, vectorSum)
.aggregationBuffer(Types.TYPE_DOUBLE_ARRAY)
.contribute((measures, buffer) -> {
final double factor = measures.readDouble(0);
final IVector vector = measures.readVector(1);
if (buffer.isNull(0)) {
final IVector scaledVector = buffer.allocateVector(0, vector.size());
scaledVector.applyAsDouble(vector, (ignore, value) -> factor * value);
} else {
final IVector scaledVector = buffer.readWritableVector(0);
scaledVector.applyAsDouble(vector, (current, value) -> current + factor * value);
}
})
.merge((in, out) -> {
final IVector inVector = in.readVector(0);
out.readWritableVector(0).plus(inVector);
})
.outputFromBuffer(0)
.per(Copper.level(TRADE_ID), Copper.level(YEAR))
.as("ScaledVectorUDAF").publish(context);
The required steps to fully define an advanced aggregate measure and its UDAF are the same as previously described, with the only following differences:
- Instead of manipulating facts, the underlying UDAF operates on measures
per(CopperLevel... levels)
defines the level above which the aggregation can be done differentlymerge(SerializableBiConsumer<IArrayReader, IWritableArray> merger)
declares the operation performed above the level defined in theper()
Store Lookup Measures
Store Lookups are supported in Copper. A store lookup is a measure that runs a get-by-key query on the given store. Each key field must be associated with either a level, or a measure. For a given location, the get-by-key query will be performed using the values of the selected levels and/or measures.
If a key field is mapped to a level, the value used for the get-by-key query is the discriminator of this level for the queried location. (If this level is expressed at the location) If it is mapped to a measure, the value used for the gey-by-key query is the value of the measure for the queried location.
If several fields are selected, a function must be provided to reduce their values to the final value of this post-processor for the given location.
Hobbies store: (k) - key field. | Name.(k) | Hobby.(k) | Budget | | -------- | ----------- | ------ | | Chris | Tennis | 4 | | Chris | Dance | 2 | | Pierre | Video Games | 0 | | Fiona | Dance | 1 |
There is also a cube with single level hierarchy : Name. It is possible to get the budget for each person that dances:
Copper.storeLookup("people")
.withMapping("name", Copper.level("name"))
.withMapping("hobby", Copper.constant("dance"))
.valuesOf("Budget")
.map(r -> "Money Spent " + r.readInt(0), Types.TYPE_STRING)
.as("Dance Budget")
.publish(context);
Cellset:
Chris | Money Spent 2 |
Fiona | Money Spent 1 |
Post-processor
Legacy post-processors are supported in Copper.
Calling Copper.newPostProcessor
gives you access to a builder which allows you to specify the post-processor's plugin key, its underlying measures, its properties and so on:
Copper.newPostProcessor(PLUGIN_KEY)
.withProperty("propKey", "propValue")
.withName(ppName)
.publish(context);
Measure metadata
A measure has several elements of metadata that can be configured in Copper, such as:
its name with
.withName()
. It uniquely identifies a measure so that two measures cannot have the same name. The measure name is the one used in MDX queries, in the UI wizard, and in Copper to reuse the measure in other calculations withCopper.measure(name)
. Measures named with.withName()
will always be published and available for queries by their name. Unnamed measures not explicitly published may not end up in the cube due to optimization that Copper may apply.its designated output type with
.withType()
. It is sometimes necessary to designate the correct type to reuse it later. For example, after usingCopperMeasure#map
.Note that
.withType()
does not change the type of the output values of a measure, but it changes the designated typing, as well as the behavior of measures having this measure as an underlying.As an example, the following measure :
Copper.count().withType(Types.TYPE_DOUBLE)
returns long values.
However,Copper.count().withType(Types.TYPE_DOUBLE).divide(Copper.constant(2L))
returns double values as the result of the division between double and long values.
This behavior is to oppose the following measure declaration:Copper.count().divide(Copper.constant(2L))
that always has long values as the result of the division operation between two long values.its formatter with
.withFormatter()
. The formatter changes the visual representation of the measure values in the MDX cell set.its measure group with
.withMeasureGroup()
.its folder with
.withinFolder()
. It impacts the path to be used when displaying the measure in the user interface.its visibility. A measure can be visible or not in the UI, but will still be available in queries as long as it is part of the cube description. Measures explicitly published are always visible, unless
.hidden()
is called. Other measures can be made visible with.visible()
. Please refer to the section about publication to know more about why some measures are not visible by default.