Skip to main content

Creating and publishing measures

This section introduces the various ways of creating measures with the API, or referring to existing ones to use them in calculations.

Basics

Here we'll cover the most elementary Copper measures, created in just one step.

Native measures

The native measures contributors.COUNT and update.TIMESTAMP are available by calling respectively Copper.count() and Copper.timestamp().

Aggregated measures

Aggregated measures refer to selection fields aggregated using a function plugin key. In the ActivePivot sandbox, such measures include pnl.SUM.

In Copper, you can create such a measure by calling Copper.agg(fieldName, pluginKey). We have provided shorthand methods for the most common aggregations -- sum, avg, min, max -- to free the user from having to specify the plugin key every time.

The creation of pnl.SUM becomes:

Copper.sum("pnl").withName("pnl.SUM").publish(context);

All measures published and used by the user must be named using the withName function. The default name generated for such measures, unless overridden by the user, is of the form field.PLUGIN_KEY + CopperGeneratedSuffix.

Level member value

Copper allows you to create a measure whose value is the member of a selected level in the evaluated location, when the level is expressed. This measure may be used for further calculations, or as is.

Copper.member(Copper.level("Desk")).withName("DeskMemberValue").publish(context);

Here, Desk is the name of a level.

Example:

DeskDeskMemberValue
(ALL)
DeskADeskA
DeskBDeskB

As you can see, the measure named DeskMemberValue returns null on the grand total because the level Desk is not expressed.

Context value

Context values can be used in Copper, and creating a measure whose value is that of the context value of a given class at every cube coordinate is done with Copper.contextValue(contextValueClass).withName(measureName).

Constant

Measures that have the same constant value at every cube coordinate can be used to express arithmetic calculations (see next section). They are created using Copper.constant(value).

Named measure

Copper allows you to refer to any measure by its name. You can use this feature to manipulate measures which were initially present in the cube, as well as any previously published Copper measure. This is done with Copper.measure(name).

The name of a measure is the user-given name if any (with CopperMeasure#withName(String), see Measure metadata about measure metadata for more information), or the default name generated by Copper if the referenced measure is implicit/not visible.

Copper.sum("pnl").withName("pnl.SUM").publish(context);
CopperMeasure m1 = function(Copper.measure("pnl.SUM")); // reuse pnl.SUM in a custom function
m1.publish(context);

Note that in this particular example, the whole body can be boiled down to a single equivalent instruction: function(Copper.sum("pnl")).publish(context). See the section about measure publication for more details.

Simple operations

In this section we will review some elementary operations that can be performed on Copper measures to create new measures.

Arithmetic operations

Arithmetic operations such as +, -, *, / are naturally supported in Copper, with the methods plus, minus, multiply, divide available for any CopperMeasure.

CopperMeasure pnlSUM = Copper.sum("pnl").withName("pnl.SUM").publish(context);
pnlSUM.multiply(Copper.constant(2)).withName("pnl2.SUM").publish(context);
pnlSUM.plus(pnlSUM).withName("doubleSum").publish(context);

Lambda functions

For more complex calculations, lambdas can be used: measure.map(value -> ...).

Copper.sum("pnl").map((Double pnl) -> pnl.longValue()).withName("complexMeasure").publish(context);
// mapToDouble can be used to manipulate primitive double types
Copper.sum("pnl").mapToDouble((Double pnl) -> Math.sqrt(pnl)).withName("sqrtPNL").publish(context);

It is however advised to use the arithmetic operations as much as possible, since Copper is able to understand and optimize calculations when they are used, which it cannot do with lambdas.

Copper.sum("pnl").map((Double pnl) -> Math.pow(pnl, 2.)).withName(...).publish(context);   // DON'T
Copper.sum("pnl").multiply(Copper.sum("pnl")).withName(...).publish(context); // DO

Measure combination

Multiple measures can be combined using Copper.combine(CopperMeasure...) and then used in a calculation to produce a single measure with map(lambda). When a query is executed, the mapping function is given an IArrayReader as an argument. This is an array-like object which contains the values of the combined measures at the evaluated location, in the same order. The array reader provides a method to get their values like read, as well as a null-check with isNull and primitive reads to prevent from boxing: readInt, readLong, readDouble, readFloat, and readBoolean.

Copper.combine(
Copper.member(Copper.level("Desk")),
Copper.sum("pnl").withName("pnl.SUM"))
.map(a -> {
final String desk = (String) a.read(0);
final double pnl = a.readDouble(1);
if ("Desk A".equals(desk)) {
return pnl * 2;
} else {
return pnl;
}
})
.publish(context);

Measures used within the combine method are considered visible, and must be named by the user.

Advanced operations

Here we describe some of the more advanced measure operations in Copper.

Measure filtering

Computing the value of a measure considering only a subset of the records is called filtering. It is performed with a condition on a level, much like one would think "the value of the measure when only considering the contributions for the city of Paris". The CopperLevel interface provides many condition methods, such as eq, lt, geq, etc., for =, <, , and so on.

Filtering will generate a new instance of the measure. In the following example, two measures are published: pnl.SUM and pnlInParis.SUM.

CopperMeasure pnlSum = Copper.sum("pnl").withName("pnl.SUM").publish(context);
pnlSum
.filter(Copper.level("City").eq("Paris"))
.withName("pnlInParis.SUM")
.publish(context);

Complex conditions can be constructed using the and and or methods in CopperLevelConditions. Chaining filter calls with different conditions will have the same result as and'ing them.

However, Copper does enforce some rules on these conditions. The operator or can only be used between two conditions that apply to a single level. We do not support cross-level or, even within a hierarchy.

  • SUPPORTED:
    Copper.level("City").eq("Paris").or(Copper.level("City").eq("London"))
  • NOT SUPPORTED:
    Copper.level("City").eq("Paris").or(Copper.level("Country").eq("UK"))
  • NOT SUPPORTED:
    (Copper.level("City").eq("Paris").and(Copper.level("Country").eq("France")))
    .or(Copper.level("Country").eq("UK"))

If necessary, one can get around these restrictions by combining the levels (here: Country and City) with the measure (here: pnl.SUM) and manually filtering in the mapping function. The following code block demonstrates how to emulate the example unsupported conditions:

Copper.combine(Copper.level("City"), Copper.level("Country"), Copper.measure("pnl.SUM"))
.map(reader -> {
final String city = (String) reader.read(0);
final String country = (String) reader.read(1);
if (city.equals("Paris") || country.equals("UK")) {
return reader.readDouble(2);
}
return null;
});

However, doing this has a significant performance cost since it computes the measure on all locations and filters afterwards. A way to limit the performance impact is to use a regular filter prior to mapping, to reduce the number of irrelevant locations on which the measure will be computed. This technique is used in the following example, where we can filter out every fact that does not pertain to either Paris or London.

Copper.combine(Copper.level("City"), Copper.level("Country"), Copper.sum("pnl")
.filter(Copper.level("City").eq("Paris").or(Copper.level("City").eq("London"))))
.map(reader -> {
final String city = (String) reader.read(0);
final String country = (String) reader.read(1);
if (country.equals("France")) {
if (city .equals("Paris")) {
return reader.readDouble(2);
}
} else if (country.equals("UK")) {
if (city .equals("London")) {
return reader.readDouble(2);
}
}
return null;
});

Dynamic aggregation/Leaf operation

Dynamic aggregation refers to the aggregation of a measure using a certain function up to a level, and another aggregation function (or no aggregation) above that level.

You may want to compute the sum of the sales for each category of products, but display the average sale amount when aggregating product categories. This is done like so:

Copper.sum("sales").withName("sales.SUM")
.per(Copper.level("Product Category"))
.avg()
.withName("Sales average per category")
.publish(context);

Some calculations only make sense when a certain level is expressed: as previously seen in the example above with the combine operator, the value of Copper.member(Copper.level("Desk"))) is null for the Grand Total. It is the user's responsibility to handle this case. If there is no reason to compute anything above the level Desk, it is possible to do:

Copper.combine(
Copper.member(Copper.level("Desk")),
Copper.sum("pnl").withName("pnl.SUM"))
.map(a -> function(a))
.per(Copper.level("Desk"))
.doNotAggregateAbove();

It forces the combine operation to happen at Desk level thanks to .per(Copper.level("Desk")). Because it is mandatory to indicate what to do above the "per levels", we use .doNotAggregateAbove() here to say that the combine operation does not need to be executed above the Desk level. An equivalent, but more cumbersome and less efficient way of writing this would be:

Copper.combine(
Copper.member(Copper.level("Desk")),
Copper.sum("pnl").withName("pnl.SUM"))
.map(a -> {
if (a.isNull(0)) {
return null;
} else {
return function(a);
}
})
.per(Copper.level("Desk"))
.doNotAggregateAbove();

Aggregation levels hiding

It is possible to specify the aggregation behavior and decide when it should stop by using the doNotAggregateAbove(AggregationLevels mask) method :

By default, when defining a measure with a dynamic aggregation operation followed by a doNotAggregateAbove(), the aggregation will be stopped along a hierarchy when at least one of the levels provided in the list is expressed, this corresponds to the doNotAggregateAbove(ANY) behaviour. A measure will not return values at a queried location if the aggregation has been stopped at a more granular level.

In some cases however, especially when having multi-level hierarchies with levels having functional overlapping, we want to be able to aggregate the measure until all the specified levels in a hierarchy are expressed in a query. This can be achieved by specifying doNotAggregateAbove(ALL) when defining the measure.

In other specific cases, a measure may not have any functional meaning when showing values below a given level. In this case, it is possible to specify maskAtOrBelow() when defining the measures. Please note that such a measure will only return values when the location is at an aggregation level strictly superior to the predefined leaf levels.

The maskAtOrBelow(AggregationLevels mask) method can be used to specify the behaviour to follow when multiple leaf levels are defined.

Analytic functions

An analytic function is a function that computes aggregated values over a group of other aggregated values. The group is defined by a window built along a particular hierarchy used as an axis.

Window specification

For each row of the cell set (i.e. for each location), it indicates the set of other rows (i.e. the other locations) that can be read to compute the result of the operation defined by the analytic (window) function.

The window specification can be created in three parts:

  • An ordering specification (mandatory) to define how aggregated values are ordered along the indicated hierarchy: Window.orderBy("time"). The ordering follows the rules implied by each level comparator. It can be reversed with Window#reverse().
  • A partitioning specification (optional) to break up the window into separate smaller windows (partitions; one per level value) over which the analytic function is independently evaluated. Notice it only makes sense to indicate a level belonging to the ordered hierarchy because the window is automatically partitioned by the level values of other hierarchies.
  • A frame specification (optional) that states which aggregated values will be included in the frame to evaluate the result at a particular location. For example, Window.orderBy("time").rangeBetween(-5, 0) describes a frame including the five values preceding the current value to the current value (included). Not specifying a frame is equivalent to defining an unbounded preceding frame, which means it includes all the values preceding the current value and including the current value.

Siblings

Window.siblings(CopperHierarchy hierarchy) creates a special window that contains for a given row (evaluated location) of the cellset the set of other rows (locations) whose paths along the given hierarchy have the same set of ancestor coordinates.

For example, if the requested cellset contains the following locations, the window for each location is indicated by the set of corresponding row ids:

row idlocationwindow
0AllMember\2018\January[0, 1, 2]
1AllMember\2018\February[0, 1, 2]
2AllMember\2018\March[0, 1, 2]
3AllMember\2019\January[3, 4, 5]
4AllMember\2019\February[3, 4, 5]
5AllMember\2019\March[3, 4, 5]

The rows 0, 1 and 2 have the same window because they have the same ancestor (parent): [AllMember, 2018] and the rows 3, 4, and 5 have the same window because they have the same ancestor: [AllMember, 2019] However, for the following cellset, the window is identical for all locations because they all have a common ancestor which is AllMember:

row idlocationwindow
0AllMember\2018[0, 1, 2]
1AllMember\2019[0, 1, 2]
2AllMember\2020[0, 1, 2]

By default, the window contains the row being evaluated. You can exclude it with:

Window.siblings(Copper.hierarchy("time")).excludeCurrent();

The previous result becomes:

row idlocationwindow
0AllMember\2018[1, 2]
1AllMember\2019[0, 2]
2AllMember\2020[0, 1]

Aggregate analytic function

An aggregation analytic function is a function used to do aggregation over the values belonging to the window. It is typically useful to define a 'running sum' or a 'prefix sum'.

Copper.sum(Copper.measure("pnl.SUM")).over(Window.orderBy("time"));

Every aggregation function can be used, such as max, min, longSum, or shortSum. You can also use custom ones with Copper.agg(CopperMeasure measure, String pluginKey).

Examples of ordering, partitioning, and using different aggregation functions illustrated with a pivot table, given:

  • a cube with a "time" hierarchy with three levels: "year", "month", "day".
  • a pre-aggregated measure "pnl.SUM": CopperMeasure pnl = Copper.sum("pnl").
  • A measure to compute a running total of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.sum(pnl).over(Window.orderBy("time").withName("run. tot.").
  • A measure to compute a running total of "pnl.SUM" in reverse order along the time hierarchy, each level being naturally ordered: Copper.sum(pnl).over(Window.orderBy("time").reverse().withName("run. tot. reverse").
  • A measure to compute a max total of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.max(pnl).over(Window.orderBy("time").withName("run. max").
  • A measure to compute a running total of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.sum(pnl).over(Window.orderBy("time").partitionBy("year").withName("run. tot. partitionBy(year)"). The cumulative sum starting over for each different year.
Year/Month/Daypnl.SUMrun. tot.run. tot. reverserun. maxrun. tot. partitionBy(year)
(ALL)1717171717
20181010171010
Jan771777
01551755
02161256
04171157
Feb31010710
01181058
022109510
20197177107
Jan212772
02212752
03012552
Feb517577
01315555
02217257

A navigation function computes the value at a given location by copying the value over a different location from the current location in the window. Notice that the window frame specification is ignored by every navigation analytic function.

The supported navigation function are:

  • lag: returns the value of the given measure for a location at a given offset before the current location.
  • lead: returns the value of the given measure for a location at a given offset after the current location.
  • first: returns the value of the given measure with respect to the first location in the window
  • last: returns the value of the given measure with respect to the last location in the window

Examples of ordering, partitioning, and using different navigation functions illustrated with a pivot table, given:

  • a cube with a "time" hierarchy with three "year", "month", "day" levels.
  • a pre-aggregated measure "pnl.SUM": CopperMeasure pnl = Copper.sum("pnl).
  • A measure that returns the next values of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.lead(pnl, 1).over(Window.orderBy("time")).withName("lead").
  • A measure that returns the previous values of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.lag(pnl, 1).over(Window.orderBy("time")).withName("lag").
  • A measure that returns the first value of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.first(pnl).over(Window.orderBy("time")).withName("first").
  • A measure that returns the last value of "pnl.SUM" along the time hierarchy, each level being naturally ordered: Copper.last(pnl).over(Window.orderBy("time")).withName("last").
  • A measure that returns the first value of "pnl.SUM" along the time hierarchy for each year, each level being naturally ordered: Copper.first(pnl).over(Window.orderBy("time").partitionBy("year")).withName("first per year").
Year/Month/Daypnl.SUMlagleadfirstlastfirst per year
(ALL)17nullnull171717
201810null710710
Jan7null3757
015null1525
02151525
04111525
Feb372757
01112525
02212525
2019710null1077
Jan235752
02220522
03023522
Feb52null752
01302522
0223null522

Notice that aggregate and navigation analytic functions can apply query cube filter on prefetches (i.e. reading location) as follows:

Copper.first(Copper.sum("pnl"))
.over(Window.orderBy("time"))
.withFilter(AnalyticFilter.PREFETCH_AND_RESULT)
.withName("first");

Shift Measures

Shifting refers to copying the value of a measure at a location other than the evaluated location.

Location shift

Shifting a measure requires the definition of:

  • the level(s) on which to shift
  • the operation describing the shift

When the query expresses enough levels, the aggregated value is read from the location specified by the shift function.

There are two ways of defining a shift function:

  • through a constant value. Example: shifting at day = 1. Every time the shifted measure is evaluated at a certain day, the value for day 1 is taken instead - provided that day 1 exists; otherwise, it returns an empty result.
    For constant shifts, the query must express the parent level of a shifted level. Considering a multi-level hierarchy "year" then "month" then "day", the value of the shifted measure at "year 2020 month 6" is the value at "year 2020 month 6 & day 1". If month is absent, we are asking for "day 1 of year 2020" but without knowing the month. As this does not make sense, the result is empty.
  • through a lambda. This allows you to create dynamic shifts, such as "previous day", "next month", and so on.
    Example: shifting at day -> day - 1 shows the value of "day 5" when evaluated at "day 6", and returns an empty result when evaluated at "day 1" - provided that "day 0" does not exist.
    Unlike constant shifts, the shift is performed only when all the query location expresses the levels involved in the lambda. Otherwise, the result is empty.
    Using the same example as above, the value of the shifted measure at "month 6" or at "year 2019" are empty.

Here is an example of various shift measures illustrated with a pivot table, given:

  • a "date" hierarchy with the levels "year", "month", "day"
  • a classic measure pnl.SUM
  • a constant-shifted measure d1
    d1 = pnlSum.shift(
    Copper.level("date", "date", "day").atValue(1))
  • a constant-shifted measure d1m6
    d1m6 = pnlSum.shift(
    Copper.level("date", "date", "day").atValue(1),
    Copper.level("date", "date", "month").atValue(6))
  • a constant-shifted measure d1m6y2019
    d1m6y2019 = pnlSum.shift(
    Copper.level("date", "date", "day").atValue(1),
    Copper.level("date", "date", "month").atValue(6),
    Copper.level("date", "date", "year").atValue(2019))
  • a lambda-shifted measure prevDay
    prevDay = pnlSum.shift(
    Copper.level("date", "date", "day")
    .at((Integer d) -> d - 1))
  • a lambda-shifted measure prevDayPrevMonth
    prevDayPrevMonth = pnlSum.shift(
    CopperLevel.at(Arrays.asList(
    Copper.level("date", "date", "day"),
    Copper.level("date", "date", "month")),
    a -> {
    a.writeInt(0, a.readInt(0) - 1);
    a.writeInt(1, a.readInt(1) - 1);
    }))
  • a lambda-shifted measure prevDayPrevMonthPrevYear
    prevDayPrevMonthPrevYear = pnlSum.shift(
    CopperLevel.at(Arrays.asList(
    Copper.level("date", "date", "day"),
    Copper.level("date", "date", "month")),
    a -> {
    a.writeInt(0, a.readInt(0) - 1);
    a.writeInt(1, a.readInt(1) - 1);
    a.writeInt(2, a.readInt(2) - 1);
    }))
ALL/year/month/daypnl.SUMd1d1m6d1m6y2019prevDayprevDayPrevMonthprevDayPrevMonthPrevYear
(ALL)10007
201820017200
2018/05404174040
2018/05/014417
2018/05/0234174
...
2018/06301173030
2018/06/011117
2018/06/02211714
...
...
201930077300
2019/05505775050
2019/05/015577
2019/05/0265775
...
2019/06107771010
2019/06/017777
2019/06/021777754
...

It is possible to mix all types of shift function together in a single call to shift.

Parent value

To compute the value of a measure at its parent member in a multi-level hierarchy (or for at AllMember if the hierarchy is single-level), use measure.parentValueOn(Copper.hierarchy(hierarchyName)). This operation is also known as a drillup. You can additionally choose how many levels the operation should "go up" with a third int parameter:

/*
* compute the "grandparent" along the date hierarchy.
* Since it is slicing, all locations will show the aggregated value for the current year.
*/
Copper.sum("pnl").parentValueOn(Copper.hierarchy("Time"), 2).withName(...).publish(context);

If the number of levels to drill up is greater than the level depth of the hierarchy in the evaluated location, the first level of the hierarchy is used instead (AllMember for non-slicing hierarchies, and the member of the first level otherwise). This means that unless the targeted value is actually null, the parent value is never null.

Total

measure.totalOn([hierarchies]) is used when instead of drilling up a constant number of levels one needs to always refer to the aggregated value of the first level of the hierarchy.

Copper.sum("quantity")
.totalOn(Copper.hierarchy("time"))
.withName("Year total")
.publish(context);

Multiple hierarchies can be specified. In this case, the measure is aggregated at the top-level of all specified hierarchies.

Copper.sum("quantity")
.totalOn(Copper.hierarchy("time"),
Copper.hierarchy("product"))
.withName("Year total, all products")
.publish(context);

For example, considering sample sales data with on which we defined:

  • a slicing time hierarchy, with two levels Year and Quarter
  • a non-slicing single-level product hierarchy
  • a non-slicing single-level store hierarchy
  • a quantity measure, aggregated with Copper.sum()

The two total measures defined above would produce the following results:

YearQuarterProductStoreQuantityYear totalYear total, all products
2019Q1bikeParis252555
2019Q2glueLille101010
2019Q4glueParis303055
2020Q1glueParis505555
2020Q3bikeLille151525
2020Q4glueParis55555
2020Q4glueLille101025

Grand total

measure.grandTotal() can be used to drill across all hierarchies up to their top-level member.

Considering the same sales data as above, the grand total would produce the following results:

Copper.sum("quantity").grandTotal().publish(context);
YearQuarterProductStoreQuantityGrand total
2019Q1bikeParis2565
2019Q2glueLille1065
2019Q4glueParis3065
2020Q1glueParis5080
2020Q3bikeLille1580
2020Q4glueParis580
2020Q4glueLille1080

Marginal total

To drill up along all hierarchies except for a specific few, measure.marginalTotalOn([hierarchies]) can be used. All non-specified hierarchies are drilled up to their top-level member. The specified hierarchies are also called marginalized hierarchies.

Let us consider the same sales data as above. The total quantity sold for each product, regardless of which store or quarter they were sold in, can be obtained using a marginal total over the product hierarchy.

Copper.sum("quantity")
.marginalTotalOn(Copper.hierarchy("product")).withName(...)
.publish(context);

This marginal total would produce the following results:

YearQuarterProductStoreQuantityMarginal total
2019Q1bikeParis2525
2019Q2glueLille1040
2019Q4glueParis3040
2020Q1glueParis5065
2020Q3bikeLille1515
2020Q4glueParis565
2020Q4glueLille1065

For each location, the total is aggregated to the AllMember of the Store hierarchy, and to the top-level member of the time hierarchy, which is the relevant year member since the hierarchy is slicing.

The marginal total of a measure over a given set of hierarchies is equivalent to the regular total of this measure over all the other hierarchies of the cube.

Marginalized measures allows for creating these kinds of totals without having to specify every hierarchy in the cube.

In this example, the calculated marginal total is equivalent to:

Copper.sum("quantity")
.totalOn(Copper.hierarchy("time"), Copper.hierarchy("store"))

Aggregated measures with User-defined aggregate functions (UDAF)

Advanced aggregated measures reuse the concept of aggregated measures, but extend it to obtain much more flexible and potentially complex measures by:

  • Allowing multiple fields of the selection as input for the aggregation
  • Allowing the storage of multiple intermediate aggregate buffers to be used for computations.

This allows for the definition of complex measures displaying metrics, such as the variance of a field, the correlation coefficient between two fields, or the aggregated sum of a vector field scaled by the values of another field. The latter example is showcased here:

Considering the following data, we want to obtain a ScaledSum measure returning the aggregated sum of the Vector field values (containing integer vectors), scaled by the value of the Factor field (containing double scalars):

YearQuarterFactorVector
2018Q10.6[10;15]
2018Q20.7[20;14]
2018Q30.8[8;10]
2018Q40.9[15;12]
2019Q11.0[14;10]
2019Q21.0[16;17]
2019Q31.0[6;15]
2019Q41.0[13;15]
2020Q12.0[4;5]

This can be done with the following Copper code:

Copper.userDefinedAgg("Factor", "Vector")
.aggregationBuffer(Types.TYPE_DOUBLE_ARRAY)
.contribute(
(fact, buffer) -> {
final double factor = fact.readDouble(0);
final IVector vector = fact.readVector(1);
if (buffer.isNull(0)) {
// First call: create a new vector to hold the values.
IVector sumVector = buffer.allocateVector(0, vector.size());
sumVector.applyAsDouble(vector, (ignore, value) -> factor * value);
} else {
final IVector sumVector = buffer.readWritableVector(0);
sumVector.applyAsDouble(vector, (sum, value) -> sum + factor * value);
}
})
.merge((in, out) -> {
final IVector inVector = in.readVector(0);
out.readWritableVector(0).plus(inVector);
})
.withRemovalSupport(
(fact, buffer) -> {
final double factor = fact.readDouble(0);
final IVector vector = fact.readVector(1);
final IVector sumVector = buffer.readWritableVector(0);
sumVector.applyAsDouble(vector, (sum, value) -> sum - factor * value);
})
.outputFromBuffer(0)
.withName("ScaledSum");

Please note the use of readWritableVector() instead of readVector(). It is done to avoid modifying the initial vector with the applyAsDouble method, which modifies a vector in place. One must always be careful not to modify stored data within an operation. Vectors must be cloned before modification to avoid side effects, either on other queries or even corrupting the aggregates store. The vector returned by readWritableVector() takes care of lazy-cloning (e.g. cloning it only if we need to modify it).

This measure returns the following values:

row idlocationScaledSum
0AllMember[96.9, 104.6]
1AllMember\2018[39.9, 37.6]
2AllMember\2019[49.0; 57.0]
3AllMember\2020[8.0;10.0]

The previous block can be divided into three parts, corresponding to the steps required to fully define an advanced aggregate measure as well as its UDAF:

  • First, Copper.userDefinedAgg([...]) accepts a vararg of String values that correspond to the fields that will be aggregated. This method returns a CopperUserDefinedAggregateFunctionBuilder builder to start defining the aggregate function. In our example, we need to aggregate the Factor and Vector fields:

        Copper.userDefinedAgg("Factor", "Vector")
  • Calling aggregationBuffer(int... types) then specifies the types of the aggregation buffer that will be used by the function to store intermediary results.

    This argument defines the type of each buffer used for the intermediary computations of the aggregation. In our example, we have the following value for the bufferTypes argument:

       .aggregationBuffer(Types.TYPE_DOUBLE_ARRAY)

    This is due to the fact that even though the Vector field contains vectors with integer values, the aggregates contain data scaled with double values, and therefore can be vector with double data.

    Since there is no need for more data in order to compute the final aggregates, that single buffer will be enough to perform the aggregation. This however is not the case for all the aggregate functions, and we might need to use multiple buffers and buffer types in order to compute the aggregates. Those will be declared in the bufferTypes argument.

    The bufferTypes argument defines the size and types of the buffers used when performing the aggregation. Judiciously choosing the buffers of your UDAF is of the utmost importance, both for performance and memory consumption reasons.

  • The contribute(SerializableBiConsumer<IArrayReader, IWritableBuffer> contributor) method then declares the explicit operation performed by the aggregation function when aggregating a row of facts into the aggregates buffer. The SerializableBiConsumer<IArrayReader, IWritableArray> contributor argument is a lambda defining the operation performed when a row of the records is aggregated into the buffer of the aggregation. Let's consider the previous example, in which the contributor was:

    .contribute(
    (fact, buffer) -> {
    final double factor = fact.readDouble(0);
    final IVector vector = fact.readVector(1);
    if (buffer.isNull(0)) {
    // First call: create a new vector to hold the values.
    IVector sumVector = buffer.allocateVector(0, vector.size());
    sumVector.applyAsDouble(vector, (ignore, value) -> factor * value);
    } else {
    final IVector sumVector = buffer.readWritableVector(0);
    sumVector.applyAsDouble(vector, (sum, value) -> sum + factor * value);
    }
    })

    Here, fact is an ArrayReader containing the data of the fields previously declared as input of the aggregation, in the same order as the declaration. buffer corresponds to the IWritableBuffer. It contains the contributions of the previously aggregated facts. The content of this object is defined by the bufferTypes argument.

    When defining the contribution lambda, it is crucial to avoid any unnecessary copies, as well as trying to minimize the amount of read/write operations as this lambda will be applied for every fact to aggregate.

    Here, we read once and for all the values of the scaling factor, and only one vector is cloned per aggregate.

    This vector must be cloned before being scaled to prevent corruption of the stored vector.

    It is also important to note that when manipulating non-primitive types, the aggregation function must handle null values properly.

  • Then, the merge(SerializableBiConsumer<IArrayReader, IWritableBuffer> merger) declares the operation performed when merging two aggregates buffers together. The merger argument is a lambda defining the operation performed when merging the contributions of two aggregated buffers into a single one. In our example, the merger lambda was simple, as we need to sum the values of the vector already scaled when contributing the rows to the aggregates:

    .merge((in, out) -> {
    final IVector inVector = in.readVector(0);
    out.readWritableVector(0).plus(inVector);
    })

    After applying the consumer, the second argument of the lambda is expected to contain the merged contribution of both input aggregates. Here, in is an ArrayReader containing the values of the pre-aggregated values that will be added into the aggregates of the out buffer.

    It is also crucial to avoid mutating objects coming from the first input aggregate or from the Aggregate Providers because they could be reused by other queries. This is why readWritableVector() is used instead of readVector().

  • Then an optional step allows to define how to decontribute facts from the buffer.

    This is only used by Aggregate Providers that precompute the aggregates (like the BITMAP and the LEAF) but not for the JUST_IN_TIME which computes everything at query time.

    .withRemovalSupport(
    (fact, buffer) -> {
    final double factor = fact.readDouble(0);
    final IVector vector = fact.readVector(1);
    final IVector sumVector = buffer.readWritableVector(0);
    sumVector.applyAsDouble(vector, (sum, value) -> sum - factor * value);
    })

    The .withRemovalSupport([...]) method takes a lambda as input corresponding to the operation of removing the contribution of a fact from an aggregate, and can be seen as inverse operations to the contribute() lambda.

  • Finally, the output(SerializableFunction<IArrayReader, Object> terminateFunction, int outputType) method completes the definition of both the aggregate function and the advanced measure by providing a third lambda terminateFunction corresponding to the final evaluation function applied to the aggregated buffers in order to return the value of the aggregates.

    In our example, the final aggregate is already available in our first buffer, so obtaining the aggregates can be a simple read operation:

        .output(o -> o.read(0))

    Since terminate functions are expected to often be simple read operations on the buffers, a shortcut method is available as shown in the full snippet:

       .outputFromBuffer(0)

    The outputType argument declares the output type of the aggregate function. Note that this is only a declaration and not a cast.

User-defined aggregate functions on Measures

User-defined aggregate functions can be used to perform high performance dynamic aggregation on CopperMeasures as well. A typical use case that benefits directly from this feature is a variant of the vector scaling example presented previously.

Lets consider the following model: a base store including a Vector field and a second store including the scaling factor. Considering the mapping (Year, InstrumentId), there is a one-to-many relationship from the base store to the factor store, making the use of a simple reference impossible, and thus the application of User-defined aggregate functions on fields.

YearInstrumentIdVector
20181[10;15]
20182[20;14]
20183[8;10]
20191[15;12]
20192[14;10]
20193[16;17]
YearInstrumentIdTradeIdFactor
2018110.6
2018120.7
2018221.0
2018251.0
2018351.0
2019140.8
2019210.8
2019342.0

A naive approach could be to perform a copper join operation between both stores and define a Dynamic aggregation as depicted in the following code blocks. First, the join:

final CopperStore joinedStore = Copper.store(FACTOR_STORE).joinToCube()
.withMapping(YEAR)
.withMapping(INSTRUMENT_ID);
Copper.newHierarchy(TRADE_ID)
.fromField(joinedStore.field(TRADE_ID)).publish(context);

Then the copper calculation:

CopperMeasure vectorSum = Copper.sum(VECTOR).as("VectorSum");
CopperMeasure factorSum = Copper.sum(joinedStore.field(FACTOR)).as("FactorSum");
vectorSum.multiply(factorSum)
.per(Copper.level(TRADE_ID), Copper.level(YEAR)).sum()
.as("ScaledVectorRegular").publish(context);

Although this approach produces the expected results, it may exhibit performance bottlenecks. Particularly, performing vector scaling using dynamic aggregation will generate additional transient vectors in order to achieve the scaling operation, which increases the memory consumption considerably.

The Copper API resolves these constraints and provides a high-performance dynamic aggregation using user-defined aggregate function on measures. This is illustrated in the following code:

Copper.userDefinedAgg(factorSum, vectorSum)
.aggregationBuffer(Types.TYPE_DOUBLE_ARRAY)
.contribute((measures, buffer) -> {
final double factor = measures.readDouble(0);
final IVector vector = measures.readVector(1);
if (buffer.isNull(0)) {
final IVector scaledVector = buffer.allocateVector(0, vector.size());
scaledVector.applyAsDouble(vector, (ignore, value) -> factor * value);
} else {
final IVector scaledVector = buffer.readWritableVector(0);
scaledVector.applyAsDouble(vector, (current, value) -> current + factor * value);
}
})
.merge((in, out) -> {
final IVector inVector = in.readVector(0);
out.readWritableVector(0).plus(inVector);
})
.outputFromBuffer(0)
.per(Copper.level(TRADE_ID), Copper.level(YEAR))
.as("ScaledVectorUDAF").publish(context);

The required steps to fully define an advanced aggregate measure and its UDAF are the same as previously described, with the only following differences:

  • Instead of manipulating facts, the underlying UDAF operates on measures
  • per(CopperLevel... levels) defines the level above which the aggregation can be done differently
  • merge(SerializableBiConsumer<IArrayReader, IWritableArray> merger) declares the operation performed above the level defined in the per()

Store Lookup Measures

Store Lookups are supported in Copper. A store lookup is a measure that runs a get-by-key query on the given store. Each key field must be associated with either a level, or a measure. For a given location, the get-by-key query will be performed using the values of the selected levels and/or measures.

If a key field is mapped to a level, the value used for the get-by-key query is the discriminator of this level for the queried location. (If this level is expressed at the location) If it is mapped to a measure, the value used for the gey-by-key query is the value of the measure for the queried location.

If several fields are selected, a function must be provided to reduce their values to the final value of this post-processor for the given location.

Hobbies store: (k) - key field. | Name.(k) | Hobby.(k) | Budget | | -------- | ----------- | ------ | | Chris | Tennis | 4 | | Chris | Dance | 2 | | Pierre | Video Games | 0 | | Fiona | Dance | 1 |

There is also a cube with single level hierarchy : Name. It is possible to get the budget for each person that dances:

Copper.storeLookup("people")
.withMapping("name", Copper.level("name"))
.withMapping("hobby", Copper.constant("dance"))
.valuesOf("Budget")
.map(r -> "Money Spent " + r.readInt(0), Types.TYPE_STRING)
.as("Dance Budget")
.publish(context);

Cellset:

ChrisMoney Spent 2
FionaMoney Spent 1

Post-processor

Legacy post-processors are supported in Copper.

Calling Copper.newPostProcessor gives you access to a builder which allows you to specify the post-processor's plugin key, its underlying measures, its properties and so on:

Copper.newPostProcessor(PLUGIN_KEY)
.withProperty("propKey", "propValue")
.withName(ppName)
.publish(context);

Measure metadata

A measure has several elements of metadata that can be configured in Copper, such as:

  • its name with .withName(). It uniquely identifies a measure so that two measures cannot have the same name. The measure name is the one used in MDX queries, in the UI wizard, and in Copper to reuse the measure in other calculations with Copper.measure(name). Measures named with .withName() will always be published and available for queries by their name. Unnamed measures not explicitly published may not end up in the cube due to optimization that Copper may apply.

  • its designated output type with .withType(). It is sometimes necessary to designate the correct type to reuse it later. For example, after using CopperMeasure#map.

    Note that .withType() does not change the type of the output values of a measure, but it changes the designated typing, as well as the behavior of measures having this measure as an underlying.

    As an example, the following measure : Copper.count().withType(Types.TYPE_DOUBLE) returns long values.
    However, Copper.count().withType(Types.TYPE_DOUBLE).divide(Copper.constant(2L)) returns double values as the result of the division between double and long values.
    This behavior is to oppose the following measure declaration: Copper.count().divide(Copper.constant(2L)) that always has long values as the result of the division operation between two long values.

  • its formatter with .withFormatter(). The formatter changes the visual representation of the measure values in the MDX cell set.

  • its measure group with .withMeasureGroup().

  • its folder with .withinFolder(). It impacts the path to be used when displaying the measure in the user interface.

  • its visibility. A measure can be visible or not in the UI, but will still be available in queries as long as it is part of the cube description. Measures explicitly published are always visible, unless .hidden() is called. Other measures can be made visible with .visible(). Please refer to the section about publication to know more about why some measures are not visible by default.