Interface IAggregateProviderDefinition
- All Superinterfaces:
Cloneable, IClone<IAggregateProviderDefinition>, IDescription, IPluginDefinition, IPropertiesDefinition, Serializable
- All Known Subinterfaces:
IPartialProviderDefinition
- All Known Implementing Classes:
AggregateProviderDefinition, PartialProviderDefinition
An aggregate provider is an object that provides the aggregated values of a measure at a given location.
A query can be represented as a computation tree. Each node of
this tree can be seen as a triplet location/measures/filter. Post-Processors are measures
that require the result of one or more underlying measures, thus adding more nodes to the tree.
The leaf nodes of the computation tree specifically contain aggregated measures.
An aggregate provider is able to create, for a given range location, a table-like data
structure whose rows are the concrete locations matching the range location and filter, and whose columns are the node's aggregated measures.
This resulting table is similar to the output of a SQL query using the SUM aggregate function and a GROUP BY clause.
Example:
Database:
id | Country | Currency | pnl | ...
----+---------+----------+-----+ ...
0 | France | EUR | 4.0 | ...
1 | France | EUR | 1.2 | ...
2 | France | USD | 1.0 | ...
3 | France | USD | 0.2 | ...
4 | UK | GBP | 1.2 | ...
5 | UK | GBP | 2.2 | ...
Primitive Node:
location: {Currency = *, Country = *}
filter: {Country = France}
measures: [pnl.SUM]
Result:
rowId | locations | pnl.SUM | ...
------+----------------+---------+------
0 | (EUR, France) | 5.2 | ...
1 | (USD, France) | 1.2 | ...
To continue with the SQL analogy, this would be equivalent to a query `SELECT Currency, Country, SUM(pnl) FROM <table> WHERE Country = 'France' GROUP BY Currency, Country`.
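The same grouping can be sketched in plain Java. This is only an illustration of the filter, group and sum steps applied to the example rows above; the `Fact` class and the `pnlSumByLocation` method are hypothetical names and are not part of the ActivePivot API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrimitiveNodeSketch {

    /** One row of the example database (hypothetical type, not an ActivePivot class). */
    public static final class Fact {
        final String country;
        final String currency;
        final double pnl;

        Fact(String country, String currency, double pnl) {
            this.country = country;
            this.currency = currency;
            this.pnl = pnl;
        }
    }

    /** The six rows of the example database. */
    public static final List<Fact> FACTS = Arrays.asList(
            new Fact("France", "EUR", 4.0),
            new Fact("France", "EUR", 1.2),
            new Fact("France", "USD", 1.0),
            new Fact("France", "USD", 0.2),
            new Fact("UK", "GBP", 1.2),
            new Fact("UK", "GBP", 2.2));

    /** Keeps the rows matching the country filter, then sums pnl per (Currency, Country) location. */
    public static Map<String, Double> pnlSumByLocation(List<Fact> facts, String countryFilter) {
        Map<String, Double> sums = new TreeMap<>();
        for (Fact fact : facts) {
            if (fact.country.equals(countryFilter)) {                              // filter (WHERE)
                String location = "(" + fact.currency + ", " + fact.country + ")"; // location (GROUP BY)
                sums.merge(location, fact.pnl, Double::sum);                       // pnl.SUM
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(pnlSumByLocation(FACTS, "France"));
    }
}
```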
An aggregate provider can compute aggregates on the fly, in which case it is called a JIT provider, or it can pre-compute them once and store them, ready to use.
Non-JIT Aggregate Providers
Much like the aforementioned primitive nodes of a computation chain, a non-JIT aggregate provider is defined by specifying:
- The included levels.
- The included measures.
- A filter for each of the included levels, specifying the slices of the data that are kept in the provider.
Only one level can be included per hierarchy. In the case of a multi-level hierarchy, if a query's location is at a deeper level than the selected one, the provider cannot be used to answer the query, as its aggregates are not granular enough. On the contrary, if a query's location is at a higher level than the selected one, the aggregate provider will compute the necessary aggregated values based on the more granular ones that are stored.
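The level rule above can be illustrated with a small sketch. It assumes a hypothetical three-level hierarchy (Country, City, Street) with aggregates stored at the City level; the class, the level names and the `Country/City` key format are assumptions made for the example, not ActivePivot structures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LevelRollUpSketch {

    /** Hypothetical hierarchy, listed from the top level to the deepest one. */
    static final List<String> LEVELS = Arrays.asList("Country", "City", "Street");

    /** Level selected for the provider in this sketch. */
    static final String STORED_LEVEL = "City";

    /** A query can be answered only if its level is not deeper than the stored one. */
    public static boolean canAnswer(String queriedLevel) {
        int queried = LEVELS.indexOf(queriedLevel);
        if (queried < 0) {
            throw new IllegalArgumentException("Unknown level: " + queriedLevel);
        }
        return queried <= LEVELS.indexOf(STORED_LEVEL);
    }

    /** Rolls stored City-level aggregates, keyed by "Country/City", up to the Country level. */
    public static Map<String, Double> rollUpToCountry(Map<String, Double> storedByCity) {
        Map<String, Double> byCountry = new TreeMap<>();
        for (Map.Entry<String, Double> entry : storedByCity.entrySet()) {
            String key = entry.getKey();
            String country = key.substring(0, key.indexOf('/'));
            byCountry.merge(country, entry.getValue(), Double::sum);
        }
        return byCountry;
    }
}
```

In this sketch a query at the Street level is rejected because the stored aggregates cannot be split further, while a query at the Country level is answered by summing the stored City-level values.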
A non-JIT aggregate provider contains several data structures:
- A columnar table containing the pre-computed aggregated values for each of the measures included in the aggregate provider.
- A point index, which is a dictionary of all the locations of the stored aggregates.
- A bitmap index, which is responsible for selecting the aggregates that match the query's location and filter.
Using these aggregate providers represents a trade-off between speed of query execution on one side, memory footprint and speed of transaction processing on the other.
There are two kinds of non-JIT aggregate providers: the bitmap one, and the leaf one. The only difference between the two is that the bitmap provider pre-computes a bitmap index and uses it at query time, while the leaf provider will compute the necessary bitmap index at query time.
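To make the role of the point index and the bitmap index more concrete, here is a minimal sketch built on `java.util.BitSet`. It is not the ActivePivot implementation: the class, the `level=member` key format and the use of `null` as a wildcard are assumptions chosen for the illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BitmapIndexSketch {

    /** Point index: row id -> location, here as (Currency, Country) coordinates. */
    public static final String[][] POINTS = {
        {"EUR", "France"},
        {"USD", "France"},
        {"GBP", "UK"},
    };

    /** Builds one bitmap per (level, member) pair, marking the rows that contain that member. */
    public static Map<String, BitSet> buildIndex(String[][] points) {
        Map<String, BitSet> index = new HashMap<>();
        for (int row = 0; row < points.length; row++) {
            for (int level = 0; level < points[row].length; level++) {
                String key = level + "=" + points[row][level];
                index.computeIfAbsent(key, k -> new BitSet()).set(row);
            }
        }
        return index;
    }

    /** Selects the rows matching the coordinates; a null coordinate acts as a wildcard. */
    public static BitSet select(Map<String, BitSet> index, int rowCount, String[] coordinates) {
        BitSet rows = new BitSet(rowCount);
        rows.set(0, rowCount); // start with all rows selected
        for (int level = 0; level < coordinates.length; level++) {
            if (coordinates[level] == null) {
                continue; // wildcard: no restriction on this level
            }
            BitSet matching = index.get(level + "=" + coordinates[level]);
            if (matching == null) {
                rows.clear(); // unknown member: nothing matches
                return rows;
            }
            rows.and(matching);
        }
        return rows;
    }
}
```

Under this model, a bitmap-style provider would call `buildIndex` once, when the aggregates are stored, whereas a leaf-style provider would build what it needs while answering a query.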
- Author:
- ActiveViam
-
Field Summary
Fields
Modifier and Type | Field | Description
--------------------+-----------------------+------------
static final String | BITMAP_PLUGIN_TYPE | Plugin key defining an aggregate provider precomputing aggregates and indexing them with a bitmap index to optimize query execution.
static final String | CHUNK_SIZE | Property that defines the size of each chunk within the columnar structures associated with this provider.
static final int | DEFAULT_RANGE_SHARING | The default value of the RANGE_SHARING property.
static final String | GLOBAL_PROVIDER_NAME | Global provider name.
static final String | JIT_PLUGIN_TYPE | Plugin key defining an aggregate provider that delegates retrieval of aggregated values to an underlying database.
static final String | LEAF_PLUGIN_TYPE | Plugin key defining an aggregate provider lighter than the BITMAP_PLUGIN_TYPE.
static final String | RANGE_SHARING | Property used to modify a threshold above which range sharing attempts are dropped.
static final String | REBUILD_LIMIT | Property that defines a threshold for the ratio of aggregates that are modified within a transaction.
static final String | VECTOR_BLOCK_SIZE | Property that defines the size of a block of vectors within the aggregate store of an aggregate provider.
Fields inherited from interface com.activeviam.activepivot.core.intf.api.description.IPluginDefinition
ID
-
Method Summary
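Modifier and Type | Method | Description
---------------------------------------------------+------------------------------+------------
IPartialProviderDefinition.IPartialProviderFilters | getFilters() | Returns the filters applied to this provider.
default String | getName() | Retrieves the aggregate provider name, GLOBAL_PROVIDER_NAME by default.
List<IPartialProviderDefinition> | getPartialProviders() | Returns the optional list of partial storage definitions.
IPartitioningDescription | getPartitioningDescription() | Gets the partitioning to use for this provider.
Integer | getRangeSharingLimit() | The range sharing allows queries to share intermediate results, i.e. queries on grand total can use results on subtotals.
Methods inherited from interface com.activeviam.activepivot.core.intf.api.description.IDescription
getValidationParameter, validate
Methods inherited from interface com.activeviam.activepivot.core.intf.api.description.IPluginDefinition
getPluginKey, setPluginKey
Methods inherited from interface com.activeviam.activepivot.core.intf.api.description.IPropertiesDefinition
getProperties, setProperties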
-
Field Details
-
JIT_PLUGIN_TYPE
Plugin key defining an aggregate provider that delegates retrieval of aggregated values to an underlying database.
This aggregate provider computes its partitioning from its underlying database. It therefore cannot accept a custom partitioning.
-
BITMAP_PLUGIN_TYPE
Plugin key defining an aggregate provider precomputing aggregates and indexing them with a bitmap index to optimize query execution.
-
LEAF_PLUGIN_TYPE
Plugin key defining an aggregate provider lighter than the BITMAP_PLUGIN_TYPE. It does not store any bitmap index.
-
RANGE_SHARING
Property used to modify a threshold above which range sharing attempts are dropped.
This property's value must be an Integer, or the String representation of an Integer.
Any query against the cube can be seen as a triplet: scopeLocation/List of Measures/Query Filter. The queried measures may have underlying measures, thus creating a computation chain. Each node of this chain can also be represented as one of these triplets.
In the event that multiple nodes require a specific measure, but for different scopes, ActivePivot's query planner will attempt to figure out if some of these scopes are "included" in some of the other scopes, and if their points (and associated values for the measure) can be deduced by aggregating more granular data coming from the other scope.
Example: In a particular query, there are two primitive retrievals asking for "pnl.SUM", one at the location Country: AllMember, Currency: EUR, and the second one at the location Country: *, Currency: *. In this case, EUR is included in the wildcard on Currency of the second retrieval, and the top level on the Country hierarchy can be computed by summing the values of each country that were computed in the second retrieval.
In this particular example, instead of computing both primitive retrievals, the query planner will have the first retrieval changed into a node that depends on the second primitive retrieval, and compute the result of the first node from the result of the second node.
Finding if there are such relationships between the scopes of different nodes requiring the same measure is a complex endeavour. This property allows users to control how hard the query planner tries to find them before dropping the attempt.
With this property, users can adjust the number of scope locations between which the algorithm attempts to find relationships.
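The derivation described in the example can be sketched as follows. The code only shows how a result at Country: AllMember for one currency could be computed from a result at Country: *, Currency: *; the class, the `Country|Currency` key format and the sample values used to exercise it are assumptions, and the sketch does not model the query planner or the threshold itself.

```java
import java.util.Map;

public class RangeSharingSketch {

    /**
     * Derives pnl.SUM at (Country: AllMember, Currency: currency) from the result of a
     * retrieval at (Country: *, Currency: *), whose entries are keyed by "Country|Currency".
     */
    public static double deriveTopLevelCountry(Map<String, Double> detailedResult, String currency) {
        double sum = 0.0;
        for (Map.Entry<String, Double> entry : detailedResult.entrySet()) {
            String[] coordinates = entry.getKey().split("\\|");
            if (coordinates[1].equals(currency)) {
                // The currency is covered by the wildcard of the detailed retrieval,
                // so its per-country values are summed to reach the top of Country.
                sum += entry.getValue();
            }
        }
        return sum;
    }
}
```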
-
DEFAULT_RANGE_SHARING
static final int DEFAULT_RANGE_SHARING
The default value of the RANGE_SHARING property. This value is high enough that range sharing will almost always be activated by default.
-
CHUNK_SIZE
Property that defines the size of each chunk within the columnar structures associated with this provider.
For a JIT provider, this defines the size of the chunks in the result of a query.
For a non-JIT provider, it defines both the size of the chunks in the aggregate store holding the aggregates and in the point index (represented as a columnar table, one column per level).
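As a rough illustration of what a chunk size means for a columnar structure, the sketch below stores a column of doubles in fixed-size chunks. It is a generic example, not the ActivePivot aggregate store; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedColumnSketch {

    private final int chunkSize;
    private final List<double[]> chunks = new ArrayList<>();
    private int size;

    public ChunkedColumnSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Appends a value, allocating a new chunk whenever the last one is full. */
    public void append(double value) {
        if (size % chunkSize == 0) {
            chunks.add(new double[chunkSize]);
        }
        chunks.get(size / chunkSize)[size % chunkSize] = value;
        size++;
    }

    /** Reads a row: the chunk is row / chunkSize, the offset within it is row % chunkSize. */
    public double get(int row) {
        if (row < 0 || row >= size) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        return chunks.get(row / chunkSize)[row % chunkSize];
    }

    public int chunkCount() {
        return chunks.size();
    }
}
```

In a structure like this one, a larger chunk size means fewer allocations but more unused space in the last chunk, which is the kind of trade-off a chunk size setting controls.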
-
VECTOR_BLOCK_SIZE
Property that defines the size of a block of vectors within the aggregate store of an aggregate provider.
-
REBUILD_LIMIT
Property that defines a threshold for the ratio of aggregates that are modified within a transaction.
Above this threshold, the provider will rebuild itself instead of modifying each aggregated value one after the other.
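The decision described above amounts to a ratio check, sketched below. The method name, its parameters and the handling of an empty provider are assumptions made for the illustration; the actual provider logic may differ.

```java
public class RebuildLimitSketch {

    /**
     * Returns true when the ratio of modified aggregates exceeds the limit,
     * in which case a full rebuild is preferred over updating values one by one.
     */
    public static boolean shouldRebuild(long modifiedAggregates, long totalAggregates, double rebuildLimit) {
        if (totalAggregates <= 0) {
            return false; // nothing stored yet, so there is no ratio to compare (assumption)
        }
        double modifiedRatio = (double) modifiedAggregates / totalAggregates;
        return modifiedRatio > rebuildLimit;
    }
}
```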
-
GLOBAL_PROVIDER_NAME
Global provider name.
-
-
Method Details
-
getPartialProviders
List<IPartialProviderDefinition> getPartialProviders()
Returns the optional list of partial storage definitions.
-
getPartitioningDescription
IPartitioningDescription getPartitioningDescription()
Gets the partitioning to use for this provider.
- Returns:
- partitioning to set up for the provider
-
getRangeSharingLimit
Integer getRangeSharingLimit()
Range sharing allows queries to share intermediate results, e.g. queries on the grand total can use results on subtotals.
- Returns:
- the maximum number of locations accumulated during a range sharing attempt on a query
-
getFilters
IPartialProviderDefinition.IPartialProviderFilters getFilters()
Returns the filters applied to this provider.
-
getName
default String getName()
Retrieves the aggregate provider name, GLOBAL_PROVIDER_NAME by default.
- Returns:
- The aggregate provider name.
-