Creating and publishing measures
This section introduces the various ways of creating measures with the API, or referring to existing ones to use them in calculations.
Basics
Here we'll cover the most elementary Copper measures, which are created in just one step.
Native measures
The native measure contributors.COUNT is available by calling Copper.count().
Aggregated measures
Aggregated measures refer to selection fields aggregated using a function plugin key. In the ActivePivot sandbox, such measures include pnl.SUM.
In Copper, one can create such a measure by calling Copper.agg(fieldName, pluginKey). We have provided shorthand methods for the most common aggregations -- sum, avg, min, max -- to free the user from having to specify the plugin key every time.
The creation of pnl.SUM becomes:
Copper.sum("pnl").publish(context);
The default name generated for such measures, unless overridden by the user, is of the form field.PLUGIN_KEY.
Level member value
Copper allows the user to create a measure whose value is the member of a selected level in the evaluated location, when the level is expressed. This measure may be used for further calculations, or as is.
Copper.member(Copper.level("Desk")).as("DeskMemberValue").publish(context)
Here, Desk is the name of a level.
Example:
Desk | DeskMemberValue |
---|---|
(ALL) | |
DeskA | DeskA |
DeskB | DeskB |
As you can see, the measure named DeskMemberValue returns null on the grand total because the level Desk is not expressed.
Context value
Context values can be used in Copper, and creating a measure whose value is that of the context value of a given class at every cube coordinate is done with Copper.contextValue(contextValueClass).
Constant
Measures that have the same constant value at every cube coordinate can be used to express arithmetic calculations (see next section). They are created using Copper.constant(value).
Named measure
Copper allows referring to any measure by its name. One can use this feature to manipulate measures which were initially present in the cube, as well as any previously published Copper measure. This is done with Copper.measure(name).
The name of a measure is the user-given name if any (set with CopperMeasure#as(String); see the section about measure metadata for more information), or the default name generated by Copper.
Copper.sum("pnl").publish(context);
CopperMeasure m1 = function(Copper.measure("pnl.SUM")); // reuse pnl.SUM in a custom function
m1.publish(context);
Note that in this particular example, the whole body can be boiled down to a single equivalent instruction: function(Copper.sum("pnl")).publish(context). See the section about measure publication for more details.
Simple operations
In this section we will review some of the elementary operations that can be performed on Copper measures to create new measures.
Arithmetic operations
Arithmetic operations such as +, -, *, / are naturally supported in Copper, with the methods plus, minus, multiply, divide available for any CopperMeasure.
CopperMeasure pnlSUM = Copper.sum("pnl").publish(context);
pnlSUM.multiply(Copper.constant(2)).publish(context);
pnlSUM.plus(pnlSUM).publish(context);
Lambda functions
For more complex calculations, lambdas can be used: measure.map(value -> ...).
Copper.sum("pnl").map((Double pnl) -> pnl.longValue()).publish(context);
// mapToDouble can be used to manipulate primitive double types
Copper.sum("pnl").mapToDouble((Double pnl) -> Math.sqrt(pnl)).publish(context);
It is however advised to use the arithmetic operations as much as possible, since Copper is able to understand and optimize calculations when they are used, which it cannot do with lambdas.
Copper.sum("pnl").map((Double pnl) -> Math.pow(pnl, 2.)).publish(context); // DON'T
Copper.sum("pnl").multiply(Copper.sum("pnl")).publish(context); // DO
Measure combination
Multiple measures can be combined using Copper.combine(CopperMeasure...) and then used in a calculation to produce a single measure with map(lambda). When a query is executed, the mapping function is given an IArrayReader as an argument; this is an array-like object which contains the values of the combined measures at the evaluated location, in the same order. The array reader provides a method to get their values, read, as well as a null-check with isNull and primitive reads to prevent boxing: readInt, readLong, readDouble, readFloat, and readBoolean.
Copper.combine(
        Copper.member(Copper.level("Desk")),
        Copper.sum("pnl"))
    .map(a -> {
        final String desk = (String) a.read(0);
        final double pnl = a.readDouble(1);
        if ("Desk A".equals(desk)) {
            return pnl * 2.;
        } else {
            return pnl;
        }
    })
    .publish(context);
Advanced operations
Here we describe some of the more advanced measure operations in Copper.
Measure filtering
Computing the value of a measure considering only a subset of the records is called filtering. It is performed with a condition on a level, much like one would think "the value of the measure when only considering the contributions for the city of Paris". The CopperLevel interface provides many condition methods such as eq, lt, geq, etc., standing for =, <, ≥, etc.
Copper.sum("pnl")
.filter(Copper.level("City").eq("Paris"))
.publish(context);
Complex conditions can be constructed using the and and or methods in CopperLevelConditions; chaining filter calls with different conditions has the same result as and'ing them.
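To make the filtering semantics concrete, here is a plain-Java sketch of what such a filtered aggregation computes. This is an illustration only, not the Copper API: the Trade record, its fields, and the filteredSum helper are all made up for this example.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustration only: the semantics of a filtered sum such as
// Copper.sum("pnl").filter(Copper.level("City").eq("Paris")),
// sketched over hypothetical in-memory records instead of the real cube.
class FilterSketch {
    record Trade(String city, double pnl) {}

    // Sum "pnl" while keeping only the records matching the level condition.
    static double filteredSum(List<Trade> trades, Predicate<Trade> condition) {
        return trades.stream().filter(condition).mapToDouble(Trade::pnl).sum();
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade("Paris", 10.0),
                new Trade("London", 5.0),
                new Trade("Paris", 2.5));
        // Only the contributions for Paris are aggregated.
        System.out.println(filteredSum(trades, t -> "Paris".equals(t.city())));
    }
}
```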
Dynamic aggregation/Leaf operation
Dynamic aggregation refers to the aggregation of a measure using a certain function up to a level, and another aggregation function (or no aggregation) above that level.
One may wish to compute the sum of the sales for each category of products, but display the average sale amount when aggregating product categories. This is done like so:
Copper.sum("sales")
.per(Copper.level("Product Category"))
.avg()
.publish(context);
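The "sum up to a level, average above it" behavior can be sketched outside Copper with plain Java. This is an illustration of the semantics only, not the Copper API; the Sale record and both helper methods are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: the semantics of
// Copper.sum("sales").per(Copper.level("Product Category")).avg(),
// sketched with hypothetical in-memory records.
class DynamicAggregationSketch {
    record Sale(String category, double amount) {}

    // At or below the "Product Category" level: sum the sales per category.
    static Map<String, Double> sumPerCategory(List<Sale> sales) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Sale s : sales) {
            sums.merge(s.category(), s.amount(), Double::sum);
        }
        return sums;
    }

    // Above the "Product Category" level: average the per-category sums.
    static double averageAboveCategory(List<Sale> sales) {
        return sumPerCategory(sales).values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }
}
```

Note that this differs from averaging the raw records directly: the average is taken over the per-category sums, which is exactly what the leaf operation followed by avg() expresses.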
Some calculations only make sense when a certain level is expressed: as previously seen in the example above with the combine operator, the value of Copper.member(Copper.level("Desk")) is null for the Grand Total. It is the user's responsibility to handle this case. If there is no reason to compute anything above the level Desk, it is possible to do:
Copper.combine(Copper.member(Copper.level("Desk")), Copper.sum("pnl")).map(a -> function(a))
    .per(Copper.level("Desk"))
    .doNotAggregateAbove();
It forces the combine operation to happen at the Desk level thanks to .per(Copper.level("Desk")). Because it is mandatory to indicate what to do above the "per" levels, we use .doNotAggregateAbove() here to say that the combine operation does not need to be executed above the Desk level.
An equivalent, but more cumbersome and less efficient way of writing this would be:
Copper.combine(Copper.member(Copper.level("Desk")), Copper.sum("pnl")).map(a -> {
        if (a.isNull(0)) {
            return null;
        } else {
            return function(a);
        }
    })
    .per(Copper.level("Desk"))
    .doNotAggregateAbove();
Aggregation levels hiding
It is possible to specify the aggregation behavior and decide when it should stop by using the doNotAggregateAbove(AggregationLevels mask) method:
By default, when defining a measure with a dynamic aggregation operation followed by doNotAggregateAbove(), the aggregation is stopped along a hierarchy as soon as at least one of the levels provided in the list is expressed; this corresponds to the doNotAggregateAbove(ANY) behavior.
A measure will not return values at a queried location if the aggregation has been stopped at a more granular level.
In some cases however, especially with multi-level hierarchies whose levels overlap functionally, we want to be able to aggregate the measure until all of the specified levels in a hierarchy are expressed in a query. This can be achieved by specifying doNotAggregateAbove(ALL) when defining the measure.
Analytic functions
An analytic function is a function that computes aggregated values over a group of other aggregated values. The group is defined by a window built along a particular hierarchy used as an axis.
Window specification
The window specification indicates, for each row of the cell set (i.e. for each location), the set of other rows (i.e. the other locations) that can be read to compute the result of the operation defined by the analytic (window) function.
The window specification can be created with three parts:
- An ordering specification (mandatory) to define how aggregated values are ordered along the indicated hierarchy: Window.orderBy("time"). The ordering follows the rules implied by each level comparator. It can be reversed with Window#reverse().
- A partitioning specification (optional) to break up the window into separate smaller windows (partitions; one per level value) over which the analytic function is independently evaluated. Notice it only makes sense to indicate a level belonging to the ordered hierarchy, because the window is automatically partitioned by the level values of the other hierarchies.
- A frame specification (optional) that states which aggregated values will be included in the frame to evaluate the result at a particular location. For example, Window.orderBy("time").rangeBetween(-5, 0) describes a frame including the five values preceding the current value up to the current value (included). Not specifying a frame is equivalent to defining an unbounded preceding frame, i.e. it includes all the values preceding the current value as well as the current value.
Siblings
Another type of window exists: Window.siblings(CopperHierarchy hierarchy).
It creates a special window that contains, for a given row (evaluated location) of the cell set, the set of other rows (locations) whose paths along the given hierarchy have the same set of ancestor coordinates.
For example, if the requested cellset contains the following locations, the window for each location is indicated by the set of corresponding row ids:
row id | location | window |
---|---|---|
0 | AllMember\2018\January | [0, 1, 2] |
1 | AllMember\2018\February | [0, 1, 2] |
2 | AllMember\2018\March | [0, 1, 2] |
3 | AllMember\2019\January | [3, 4, 5] |
4 | AllMember\2019\February | [3, 4, 5] |
5 | AllMember\2019\March | [3, 4, 5] |
The rows 0, 1 and 2 have the same window because they have the same ancestor (parent), [AllMember, 2018], and the rows 3, 4 and 5 have the same window because they have the same ancestor, [AllMember, 2019]. However, for the following cell set, the window is identical for all locations because they all share the common ancestor AllMember:
row id | location | window |
---|---|---|
0 | AllMember\2018 | [0, 1, 2] |
1 | AllMember\2019 | [0, 1, 2] |
2 | AllMember\2020 | [0, 1, 2] |
Note that by default, the window contains the row being evaluated. It is possible to exclude it though with:
Window.siblings(Copper.hierarchy("time")).excludeCurrent();
The previous result becomes:
row id | location | window |
---|---|---|
0 | AllMember\2018 | [1, 2] |
1 | AllMember\2019 | [0, 2] |
2 | AllMember\2020 | [0, 1] |
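The grouping of rows into sibling windows can be sketched in plain Java from the location paths themselves. This is an illustration only, not the Copper API; representing locations as backslash-separated strings is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: which rows fall into the same Window.siblings(...)
// window, computed from location paths such as "AllMember\2018\January".
class SiblingsSketch {
    // Rows whose paths share the same ancestor coordinates (everything
    // before the last member) belong to the same siblings window.
    static Map<String, List<Integer>> siblingWindows(List<String> paths) {
        Map<String, List<Integer>> windows = new LinkedHashMap<>();
        for (int row = 0; row < paths.size(); row++) {
            String path = paths.get(row);
            int cut = path.lastIndexOf('\\');
            String ancestors = cut < 0 ? "" : path.substring(0, cut);
            windows.computeIfAbsent(ancestors, k -> new ArrayList<>()).add(row);
        }
        return windows;
    }
}
```

Running this on the six locations of the first table above groups rows [0, 1, 2] under the ancestors [AllMember, 2018] and rows [3, 4, 5] under [AllMember, 2019], matching the windows shown.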
Aggregate analytic function
An aggregate analytic function is used to aggregate the values belonging to the window. It is typically useful to define a 'running sum' or a 'prefix sum':
Copper.sum(Copper.measure("pnl.SUM")).over(Window.orderBy("time"));
Every aggregation function can be used: max, min, longSum, shortSum... or a custom one with Copper.agg(CopperMeasure measure, String pluginKey).
Examples of ordering, partitioning and usage of different aggregation functions, illustrated with a pivot table, given:
- a cube with a "time" hierarchy with three levels: "year", "month", "day"
- a pre-aggregated measure "pnl.SUM": CopperMeasure pnl = Copper.sum("pnl")
- a measure to compute a running total of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.sum(pnl).over(Window.orderBy("time")).as("run. tot.")
- a measure to compute a running total of "pnl.SUM" in reverse order along the time hierarchy, each level being naturally ordered: Copper.sum(pnl).over(Window.orderBy("time").reverse()).as("run. tot. reverse")
- a measure to compute a running max of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.max(pnl).over(Window.orderBy("time")).as("run. max")
- a measure to compute a running total of "pnl.SUM" partitioned by year along the time hierarchy, each level being naturally ordered: Copper.sum(pnl).over(Window.orderBy("time").partitionBy("year")).as("run. tot. partitionBy(year)"). The cumulative sum starts over for each year.
Year/Month/Day | pnl.SUM | run. tot. | run. tot. reverse | run. max | run. tot. partitionBy(year) |
---|---|---|---|---|---|
(ALL) | 17 | 17 | 17 | 17 | 17 |
2018 | 10 | 10 | 17 | 10 | 10 |
Jan | 7 | 7 | 17 | 7 | 7 |
01 | 5 | 5 | 17 | 5 | 5 |
02 | 1 | 6 | 12 | 5 | 6 |
04 | 1 | 7 | 11 | 5 | 7 |
Feb | 3 | 10 | 10 | 7 | 10 |
01 | 1 | 8 | 10 | 5 | 8 |
02 | 2 | 10 | 9 | 5 | 10 |
2019 | 7 | 17 | 7 | 10 | 7 |
Jan | 2 | 12 | 7 | 7 | 2 |
02 | 2 | 12 | 7 | 5 | 2 |
03 | 0 | 12 | 5 | 5 | 2 |
Feb | 5 | 17 | 5 | 7 | 7 |
01 | 3 | 15 | 5 | 5 | 5 |
02 | 2 | 17 | 2 | 5 | 7 |
Navigation analytic function
A navigation function computes the value at a given location by copying the value over a different location from the current location in the Window. Notice that the window frame specification is ignored by every navigation analytic function.
The supported navigation function are:
- lag: The lag function returns the value of the given measure for a location at a given offset before the current location.
- lead: The lead function returns the value of the given measure for a location at a given offset after the current location.
- first: The first function returns the value of the given measure with respect to the first location in the window
- last: The last function returns the value of the given measure with respect to the last location in the window
Examples of ordering, partitioning and usage of different navigation functions, illustrated with a pivot table, given:
- a cube with a "time" hierarchy with three levels: "year", "month", "day"
- a pre-aggregated measure "pnl.SUM": CopperMeasure pnl = Copper.sum("pnl")
- a measure that returns the next value of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.lead(pnl, 1).over(Window.orderBy("time")).as("lead")
- a measure that returns the previous value of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.lag(pnl, 1).over(Window.orderBy("time")).as("lag")
- a measure that returns the first value of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.first(pnl).over(Window.orderBy("time")).as("first")
- a measure that returns the last value of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.last(pnl).over(Window.orderBy("time")).as("last")
- a measure that returns the first value of "pnl.SUM" along the time hierarchy for each year, each level being naturally ordered: Copper.first(pnl).over(Window.orderBy("time").partitionBy("year")).as("first per year")
Year/Month/Day | pnl.SUM | lag | lead | first | last | first per year |
---|---|---|---|---|---|---|
(ALL) | 17 | null | null | 17 | 17 | 17 |
2018 | 10 | null | 7 | 10 | 7 | 10 |
Jan | 7 | null | 3 | 7 | 5 | 7 |
01 | 5 | null | 1 | 5 | 2 | 5 |
02 | 1 | 5 | 1 | 5 | 2 | 5 |
04 | 1 | 1 | 1 | 5 | 2 | 5 |
Feb | 3 | 7 | 2 | 7 | 5 | 7 |
01 | 1 | 1 | 2 | 5 | 2 | 5 |
02 | 2 | 1 | 2 | 5 | 2 | 5 |
2019 | 7 | 10 | null | 10 | 7 | 7 |
Jan | 2 | 3 | 5 | 7 | 5 | 2 |
02 | 2 | 2 | 0 | 5 | 2 | 2 |
03 | 0 | 2 | 3 | 5 | 2 | 2 |
Feb | 5 | 2 | null | 7 | 5 | 2 |
01 | 3 | 0 | 2 | 5 | 2 | 2 |
02 | 2 | 3 | null | 5 | 2 | 2 |
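The four navigation functions reduce to simple index arithmetic over the ordered window. Here is a plain-Java sketch of their semantics; this is an illustration only, not the Copper API, and the window is assumed to be already materialized as an ordered array.

```java
// Illustration only: the semantics of lag, lead, first and last over an
// ordered window of aggregated values, using null when the requested
// location falls outside the window.
class NavigationSketch {
    static Double lag(Double[] ordered, int current, int offset) {
        int j = current - offset;
        return j < 0 ? null : ordered[j];
    }

    static Double lead(Double[] ordered, int current, int offset) {
        int j = current + offset;
        return j >= ordered.length ? null : ordered[j];
    }

    static Double first(Double[] ordered) {
        return ordered.length == 0 ? null : ordered[0];
    }

    static Double last(Double[] ordered) {
        return ordered.length == 0 ? null : ordered[ordered.length - 1];
    }
}
```

The null results at the window edges mirror the null cells in the lag and lead columns of the table above.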
Shift Measures
Shifting refers to copying the value of a measure from a location other than the evaluated location.
Location shift
Shifting a measure requires the definition of:
- the level(s) on which to shift
- the operation describing the shift
When all of the levels are expressed in a query location, the aggregated value is read from the modified location specified by the shift function. This function is applied to an array initially containing the values of the members at the evaluated location, in the order specified by the list of level names.
There are two ways of defining a shift function:
- via a constant value (or a series of constant values). This always copies the aggregated value from the same location. Example: shifting at "day = 1". Every time the shifted measure is evaluated at a certain day, the value for day 1 is taken instead (if it exists; otherwise, an empty result is returned). For constant shifts, when the parent level of a shifted level is expressed in a retrieval, the shifted value is also taken. In the example, if "month" is the parent of "day" in a multi-level hierarchy, the value of the shifted measure at "month 6" will be the one of the underlying measure at "month 6 & day 1". One can see this as propagating the shift to higher aggregate levels. A benefit of this is the ability to create a measure which has the same value at every location. When neither the shifted level nor its parent is expressed, the non-shifted aggregated value is taken instead. In the previous example, if "year" is the parent level of "month", then the value of the shifted measure at "year 2019" is the same as the non-shifted measure at "year 2019".
- via a lambda. This allows creating dynamic shifts such as "previous day", "next month", etc. Example: shifting at "day -> day - 1" shows the value of "day 5" when evaluated at "day 6", and returns an empty result when evaluated at "day 1" (if there are no values for day 0). Contrary to constant shifts, the shift is performed only when all of the levels involved in the lambda are expressed in the retrieval location. Otherwise, the non-shifted aggregated value is taken. Using the same example as above, the value of the shifted measure at "month 6" is the same as the non-shifted measure at "month 6", and the value of the shifted measure at "year 2019" is the same as the non-shifted measure at "year 2019".
For simplicity and result consistency, a user should define all of their calculations on a given hierarchy in the same method call. This is because not all permutations of
.at(Copper.level("date", "date", "year"), 2019)
.at(Copper.level("date", "date", "month"), 6)
.at(Copper.level("date", "date", "day"), 1)
are equivalent. This is why this chained syntax is illegal.
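A constant shift is essentially a lookup at a rewritten location. Here is a plain-Java sketch of the "day = 1" example; this is an illustration only, not the Copper API, and representing fully expressed locations as "year/month/day" map keys is an assumption made for this example.

```java
import java.util.Map;

// Illustration only: the lookup behind a constant shift such as
// pnlSUM.at(Copper.level("date", "date", "day"), 1). Whenever the "day"
// level is expressed, the value for day 01 of the same month is read
// instead of the value at the evaluated day.
class ShiftSketch {
    // aggregates maps a fully expressed location "year/month/day" to its
    // aggregated value.
    static Double shiftedAtDayOne(Map<String, Double> aggregates,
                                  String year, String month, String day) {
        // The evaluated day is ignored; day "01" is read instead. A missing
        // entry yields null, mirroring the empty result described above.
        return aggregates.get(year + "/" + month + "/01");
    }
}
```

With the pivot-table data below, evaluating the shifted measure at 2018/05/02 reads the value stored at 2018/05/01, which is exactly what the d1 column shows.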
Here is an example of various shift measures illustrated with a pivot table, given:
- a "date" hierarchy with three levels: "year", "month", "day"
- a classic measure "pnl.SUM"
- a constant-shifted measure d1 = pnlSUM.at(Copper.level("date", "date", "day"), 1)
- a constant-shifted measure d1m6 = pnlSUM.at(Copper.level("date", "date", "day"), 1, Copper.level("date", "date", "month"), 6)
- a constant-shifted measure d1m6y2019 = pnlSUM.at(Copper.level("date", "date", "day"), 1, Copper.level("date", "date", "month"), 6, Copper.level("date", "date", "year"), 2019)
- a lambda-shifted measure prevDay = pnlSUM.at(Copper.level("date", "date", "day"), (Integer d) -> d - 1)
- a lambda-shifted measure prevDayPrevMonth = pnlSUM.at(Arrays.asList(Copper.level("date", "date", "day"), Copper.level("date", "date", "month")), a -> { a.writeInt(0, a.readInt(0) - 1); a.writeInt(1, a.readInt(1) - 1); })
- a lambda-shifted measure prevDayPrevMonthPrevYear = pnlSUM.at(Arrays.asList(Copper.level("date", "date", "day"), Copper.level("date", "date", "month"), Copper.level("date", "date", "year")), a -> { a.writeInt(0, a.readInt(0) - 1); a.writeInt(1, a.readInt(1) - 1); a.writeInt(2, a.readInt(2) - 1); })
ALL/year/month/day | pnl.SUM | d1 | d1m6 | d1m6y2019 | prevDay | prevDayPrevMonth | prevDayPrevMonthPrevYear |
---|---|---|---|---|---|---|---|
(ALL) | 1000 | 1000 | 1000 | 7 | 1000 | 1000 | 1000 |
2018 | 200 | 200 | 1 | 7 | 200 | 200 | 200 |
2018/05 | 40 | 4 | 1 | 7 | 40 | 40 | 40 |
2018/05/01 | 4 | 4 | 1 | 7 | |||
2018/05/02 | 3 | 4 | 1 | 7 | 4 | ||
... | |||||||
2018/06 | 30 | 1 | 1 | 7 | 30 | 30 | 30 |
2018/06/01 | 1 | 1 | 1 | 7 | |||
2018/06/02 | 2 | 1 | 1 | 7 | 1 | 4 | |
... | |||||||
... | |||||||
2019 | 300 | 300 | 7 | 7 | 300 | 300 | 300 |
2019/05 | 50 | 5 | 7 | 7 | 50 | 50 | 50 |
2019/05/01 | 5 | 5 | 7 | 7 | |||
2019/05/02 | 6 | 5 | 7 | 7 | 5 | ||
... | |||||||
2019/06 | 10 | 7 | 7 | 7 | 10 | 10 | 10 |
2019/06/01 | 7 | 7 | 7 | 7 | |||
2019/06/02 | 1 | 7 | 7 | 7 | 7 | 5 | 4 |
... |
Parent value
To compute the value of a measure at its parent member in a multi-level hierarchy (or at AllMember if the hierarchy is single-level), use Copper.parentValue(measure, Copper.hierarchy(hierarchyName)). This operation is also known as a drillup. One may additionally choose how many levels the operation should "go up" with a third int parameter:
/*
* compute the "grandparent" along the date hierarchy.
* Since it is slicing, all locations will show the aggregated value for the current year.
*/
Copper.parentValue(Copper.sum("pnl"), Copper.hierarchy("Time"), 2).publish(context);
If the number of levels to drill up is greater than the level depth of the hierarchy in the evaluated location, the first level of the hierarchy is used instead (AllMember for non-slicing hierarchies, and the member of the first level otherwise); this means that unless the targeted value is actually null, the parent value will never be null.
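The location resolved by a drillup can be sketched as path truncation with clamping at the top of the hierarchy. This is an illustration only, not the Copper API; representing a location along the hierarchy as a list of member names is an assumption made for this example.

```java
import java.util.List;

// Illustration only: which ancestor location a parentValue drillup of n
// levels reads, clamping at the first level of the hierarchy as described
// above.
class ParentValueSketch {
    // path is the location along the hierarchy, e.g.
    // [AllMember, 2019, 06, 02]. Dropping n members yields the ancestor
    // whose aggregate is read; when n exceeds the depth, the first level
    // (here AllMember) is used instead, so the result is never empty.
    static List<String> drillUp(List<String> path, int n) {
        int keep = Math.max(1, path.size() - n);
        return path.subList(0, keep);
    }
}
```

For example, drilling up two levels from [AllMember, 2019, 06, 02] reads the value at [AllMember, 2019], matching the "grandparent" example above.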
Total
Copper.total is used when, instead of drilling up a constant number of levels, one needs to always refer to the aggregated value of the first level of the hierarchy.
Copper.total(Copper.sum("pnl"), Copper.hierarchy("time")).publish(context);
Post-processor
"Legacy" post-processors are supported in Copper.
Calling Copper.newPostProcessor gives the user access to a builder to specify the post-processor's plugin key, its underlying measures, its properties, and so on:
Copper.newPostProcessor(PLUGIN_KEY)
.withProperty("propKey", "propValue")
.as(ppName)
.publish(context);
Measure metadata
A measure has several elements of metadata that can be configured in Copper:
- its name, with .as(). It uniquely identifies a measure: two measures cannot have the same name. The measure name is the one used in MDX queries, in the UI wizard, and in Copper to reuse the measure in other calculations with Copper.measure(name). Measures named with .as() are always published and available for queries by their name. Unnamed measures that are not explicitly published may not end up in the cube, due to optimizations that Copper can apply.
- its designated output type, with .withType(). It is sometimes necessary to designate the correct type of a measure to reuse it later, for instance after using CopperMeasure#map. Note that .withType() does not change the type of the output values of a measure; it changes the designated typing, and therefore the behavior of measures having this measure as an underlying. As an example, the measure Copper.count().withType(Types.TYPE_DOUBLE) returns long values. However, Copper.count().withType(Types.TYPE_DOUBLE).divide(Copper.constant(2L)) returns double values, as the result of a division between double and long values. By contrast, Copper.count().divide(Copper.constant(2L)) always has long values, as the result of a division between two long values.
- its formatter, with .withFormatter(). The formatter changes the visual representation of the measure values in the MDX cell set.
- its measure group, with .withMeasureGroup().
- its folder, with .withinFolder(). It affects the path used when displaying the measure in the user interface.
- its visibility. A measure can be visible or not in the UI, but it remains available in queries as long as it is part of the cube description. Measures explicitly published are always visible, unless .hidden() is called. Other measures can be made visible with .visible(). Please refer to the section about publication to learn why some measures are not visible by default.