Interface IAggregateProviderDefinition

All Superinterfaces:
Cloneable, IClone<IAggregateProviderDefinition>, IDescription, IPluginDefinition, IPropertiesDefinition, Serializable
All Known Subinterfaces:
IPartialProviderDefinition
All Known Implementing Classes:
AggregateProviderDefinition, PartialProviderDefinition

public interface IAggregateProviderDefinition extends IPluginDefinition, IClone<IAggregateProviderDefinition>
Definition of an aggregate provider.

An aggregate provider is an object that provides the aggregated values of a measure at a given location.

A query can be represented as a computation tree. Each node of this tree can be seen as a triplet location/measures/filter. Post-Processors are measures that require the result of one or more underlying measures, thus adding more nodes to the tree.

The leaf nodes of the computation tree specifically contain aggregated measures.

An aggregate provider is able to create, for a given range location, a table-like data structure whose rows are the concrete locations matching the range location and filter, and whose columns are the node's aggregated measures.

The resulting table is similar to the output of a SQL query using the SUM aggregate function and a GROUP BY clause.

Example:

 Database:

   id | Country | Currency | pnl | ...
  ----+---------+----------+-----+ ...
    0 |  France |    EUR   | 4.0 | ...
    1 |  France |    EUR   | 1.2 | ...
    2 |  France |    USD   | 1.0 | ...
    3 |  France |    USD   | 0.2 | ...
    4 |    UK   |    GBP   | 1.2 | ...
    5 |    UK   |    GBP   | 2.2 | ...

 Primitive Node:

 location: {Currency = *, Country = *}
 filter: {Country = France}
 measures: [pnl.SUM]

 Result:

 rowId | locations     | pnl.SUM | ...
 ------+---------------+---------+ ...
   0   | (EUR, France) |   5.2   | ...
   1   | (USD, France) |   1.2   | ...
 

To continue with the SQL analogy, this would be equivalent to a query of the form `SELECT Currency, Country, SUM(pnl) FROM <table> WHERE Country = 'France' GROUP BY Currency, Country`.
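
The same grouping can be sketched in plain Java to make the analogy concrete. This is only an illustration of the filter-then-group-by-then-sum logic on the example rows; the `Fact` record and the `pnlSum` helper are hypothetical names and are not part of the ActivePivot API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PrimitiveNodeSketch {
    /** Hypothetical row type mirroring the example database (not an ActivePivot class). */
    record Fact(int id, String country, String currency, double pnl) {}

    /** The six rows of the example database. */
    static final List<Fact> FACTS = List.of(
        new Fact(0, "France", "EUR", 4.0),
        new Fact(1, "France", "EUR", 1.2),
        new Fact(2, "France", "USD", 1.0),
        new Fact(3, "France", "USD", 0.2),
        new Fact(4, "UK", "GBP", 1.2),
        new Fact(5, "UK", "GBP", 2.2));

    /** Applies the filter on Country, then sums pnl per (Currency, Country) location. */
    static Map<List<String>, Double> pnlSum(List<Fact> facts, String country) {
        return facts.stream()
            .filter(f -> f.country().equals(country))
            .collect(Collectors.groupingBy(
                f -> List.of(f.currency(), f.country()),
                LinkedHashMap::new, // keeps locations in encounter order
                Collectors.summingDouble(Fact::pnl)));
    }

    public static void main(String[] args) {
        // Prints one line per concrete location matching the range location and filter.
        pnlSum(FACTS, "France")
            .forEach((location, sum) -> System.out.printf("%s -> %.1f%n", location, sum));
    }
}
```

Each key of the returned map plays the role of a row's location, and each value the role of the `pnl.SUM` column.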

An aggregate provider can compute aggregates on the fly, in which case it is called a JIT provider, or it can pre-compute them once and store them, ready to use.

Non-JIT Aggregate Providers

Much like the aforementioned primitive nodes of a computation chain, a non-JIT aggregate provider is defined by specifying:

  • The included levels.
  • The included measures.
  • A filter for each of the included levels, specifying the slices of the data that are kept in the provider.

Only one level can be included per hierarchy. In the case of a multi-level hierarchy, if a query's location is at a deeper level than the selected one, the provider cannot be used to answer the query, as its aggregates are not granular enough. Conversely, if a query's location is at a higher level than the selected one, the aggregate provider computes the necessary aggregated values from the more granular ones that are stored.
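
The rule above can be illustrated with a small sketch. It assumes a hypothetical `Geography` hierarchy with levels Country (depth 1) and City (depth 2), treats depth 0 as the top of the hierarchy, and uses made-up values; none of these names or numbers come from the library.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LevelCoverageSketch {
    /**
     * Returns true when, for every queried hierarchy, the provider stores a level
     * at least as deep as the queried one. A hierarchy missing from the provider
     * is treated as depth 0 (assumption made for this sketch).
     */
    static boolean canAnswer(Map<String, Integer> providerDepths, Map<String, Integer> queryDepths) {
        for (Map.Entry<String, Integer> e : queryDepths.entrySet()) {
            int stored = providerDepths.getOrDefault(e.getKey(), 0);
            if (e.getValue() > stored) {
                return false; // stored aggregates are not granular enough
            }
        }
        return true;
    }

    /** Rolls (Country, City) aggregates up to the Country level by summing. */
    static Map<String, Double> rollUpToCountry(Map<List<String>, Double> byCountryAndCity) {
        Map<String, Double> byCountry = new TreeMap<>();
        byCountryAndCity.forEach((path, value) -> byCountry.merge(path.get(0), value, Double::sum));
        return byCountry;
    }

    public static void main(String[] args) {
        Map<String, Integer> provider = Map.of("Geography", 2); // provider stores the City level
        System.out.println(canAnswer(provider, Map.of("Geography", 1))); // query at Country level

        Map<List<String>, Double> stored = Map.of(
            List.of("France", "Paris"), 3.0,
            List.of("France", "Lyon"), 1.5,
            List.of("UK", "London"), 2.0);
        System.out.println(rollUpToCountry(stored));
    }
}
```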

A non-JIT aggregate provider contains several data structures:

  • A columnar table containing the pre-computed aggregated values for each of the measures included in the aggregate provider.
  • A point index, which is a dictionary of all the locations of the stored aggregates.
  • A bitmap index, which is responsible for selecting the aggregates that match the query's location and filter.
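
To show how these three structures could cooperate, here is a minimal sketch built on `java.util.BitSet`. It is not the library's implementation: the arrays, the `level=member` key format and the `select` helper are assumptions chosen for readability, and the stored sums are simply those of the example database.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class BitmapIndexSketch {
    /** Point index: row i of the store holds the aggregates of location POINTS[i] (Currency, Country). */
    static final String[][] POINTS = {
        {"EUR", "France"}, {"USD", "France"}, {"GBP", "UK"}
    };

    /** Columnar store: one array per measure, here only pnl.SUM. */
    static final double[] PNL_SUM = {5.2, 1.2, 3.4};

    /** Builds one bitmap per (level, member) pair, flagging the rows that contain that member. */
    static Map<String, BitSet> buildIndex() {
        Map<String, BitSet> index = new HashMap<>();
        for (int row = 0; row < POINTS.length; row++) {
            for (int level = 0; level < POINTS[row].length; level++) {
                String key = level + "=" + POINTS[row][level];
                index.computeIfAbsent(key, k -> new BitSet()).set(row);
            }
        }
        return index;
    }

    /** Selects the rows matching every non-null coordinate; null stands for a wildcard. */
    static BitSet select(Map<String, BitSet> index, String... coordinates) {
        BitSet rows = new BitSet();
        rows.set(0, POINTS.length); // start from all rows
        for (int level = 0; level < coordinates.length; level++) {
            if (coordinates[level] != null) {
                rows.and(index.getOrDefault(level + "=" + coordinates[level], new BitSet()));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        BitSet rows = select(buildIndex(), null, "France"); // Currency = *, Country = France
        for (int row = rows.nextSetBit(0); row >= 0; row = rows.nextSetBit(row + 1)) {
            System.out.println(String.join(", ", POINTS[row]) + " -> " + PNL_SUM[row]);
        }
    }
}
```

Intersecting one bitmap per constrained level is what makes the selection cheap at query time, at the cost of keeping the bitmaps in memory and maintaining them on each transaction.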

Using these aggregate providers is a trade-off between query execution speed on the one hand, and memory footprint and transaction processing speed on the other.

There are two kinds of non-JIT aggregate providers: bitmap and leaf. The only difference between the two is that the bitmap provider pre-computes a bitmap index and uses it at query time, while the leaf provider computes the necessary bitmap index at query time.

Author:
ActiveViam
  • Field Details

    • JIT_PLUGIN_TYPE

      static final String JIT_PLUGIN_TYPE
      Plugin key defining an aggregate provider that delegates retrieval of aggregated values to an underlying database.

      This aggregate provider computes its partitioning from its underlying database. It therefore cannot accept a custom partitioning.

    • BITMAP_PLUGIN_TYPE

      static final String BITMAP_PLUGIN_TYPE
      Plugin key defining an aggregate provider precomputing aggregates and indexing them with a bitmap index to optimize query execution.
    • LEAF_PLUGIN_TYPE

      static final String LEAF_PLUGIN_TYPE
      Plugin key defining an aggregate provider lighter than the BITMAP_PLUGIN_TYPE. It does not store any bitmap index.
    • RANGE_SHARING

      static final String RANGE_SHARING
      Property used to modify a threshold above which range sharing attempts are dropped.

      This property's value must be an Integer, or the String representation of an Integer.

      Any query against the cube can be seen as a triplet: scopeLocation/List of Measures/Query Filter. The queried measures may have underlying measures, thus creating a computation chain. Each node of this chain can also be represented as one of these triplets scopeLocation/List of Measures/Query Filter.

      In the event that multiple nodes require a specific measure, but for different scopes, ActivePivot's query planner will attempt to figure out if some of these scopes are "included" in some of the other scopes, and if their points (and associated values for the measure) can be deduced by aggregating more granular data coming from the other scope.

      Example: In a particular query, there are two primitive retrievals asking for "pnl.SUM", one at the location: Country: AllMember, Currency: EUR, and the second one at location Country: *, Currency: *. In this case, EUR is included in the wildcard on Currency of the second retrieval, and the top level on the Country hierarchy can be computed by summing the values of each country that were computed in the second retrieval.

      In this particular example, instead of computing both primitive retrievals, the query planner will have the first retrieval changed into a node that depends on the second primitive retrieval, and compute the result of the first node from the result of the second node.

      Finding if there are such relationships between the scopes of different nodes requiring the same measure is a complex endeavour. This property allows users to control how hard we try to find them before dropping the attempt.

      With this property, users can adjust the number of scope locations between which the algorithm attempts to find relationships.
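
The idea behind the example can be sketched as follows. This is an illustration only, not the query planner's algorithm: the input values are made up, the `deriveForCurrency` and `shouldAttemptSharing` helpers are hypothetical, and comparing a location count against the limit is an assumption based on the description of this property.

```java
import java.util.List;
import java.util.Map;

class RangeSharingSketch {
    /**
     * Derives pnl.SUM at (Country: AllMember, Currency: currency) from the result of
     * the wider retrieval at (Country: *, Currency: *), keyed by (Country, Currency).
     */
    static double deriveForCurrency(Map<List<String>, Double> wideResult, String currency) {
        return wideResult.entrySet().stream()
            .filter(e -> e.getKey().get(1).equals(currency))
            .mapToDouble(Map.Entry::getValue)
            .sum();
    }

    /** Assumed reading of the threshold: give up once too many scope locations are involved. */
    static boolean shouldAttemptSharing(int scopeLocationCount, int rangeSharingLimit) {
        return scopeLocationCount <= rangeSharingLimit;
    }

    public static void main(String[] args) {
        // Hypothetical result of the second retrieval (Country: *, Currency: *).
        Map<List<String>, Double> wide = Map.of(
            List.of("France", "EUR"), 5.0,
            List.of("Germany", "EUR"), 2.5,
            List.of("France", "USD"), 1.25);
        if (shouldAttemptSharing(2, 1000)) {
            System.out.println(deriveForCurrency(wide, "EUR"));
        }
    }
}
```

The first retrieval is thus answered without touching the provider again: it only re-aggregates rows already produced for the second one.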

    • DEFAULT_RANGE_SHARING

      static final int DEFAULT_RANGE_SHARING
      The default value of the RANGE_SHARING property. This value is high enough that range sharing will almost always be activated by default.
    • CHUNK_SIZE

      static final String CHUNK_SIZE
      Property that defines the size of each chunk within the columnar structures associated with this provider.

      For a JIT provider, this defines the size of the chunks in the result of a query.

      For a non-JIT provider, it defines both the size of the chunks in the aggregate store holding the aggregates and in the point index (represented as a columnar table, one column per level).
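
As a rough picture of what a chunk size controls, the sketch below stores a column of doubles in fixed-size chunks allocated on demand. It is a generic illustration of chunked columnar storage; the class and its methods are hypothetical and do not reflect the library's actual chunk layout.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkedColumnSketch {
    private final int chunkSize;
    private final List<double[]> chunks = new ArrayList<>();
    private int size;

    ChunkedColumnSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Appends a value, allocating a new chunk when the current one is full. */
    void append(double value) {
        if (size % chunkSize == 0) {
            chunks.add(new double[chunkSize]);
        }
        chunks.get(size / chunkSize)[size % chunkSize] = value;
        size++;
    }

    /** Reads the value stored at the given row. */
    double get(int row) {
        if (row < 0 || row >= size) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        return chunks.get(row / chunkSize)[row % chunkSize];
    }

    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        ChunkedColumnSketch column = new ChunkedColumnSketch(4);
        for (int i = 0; i < 10; i++) {
            column.append(i);
        }
        System.out.println(column.chunkCount() + " chunks, last value " + column.get(9));
    }
}
```

A larger chunk size means fewer, bigger allocations and possibly more unused slots in the last chunk; a smaller one means the opposite.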

    • VECTOR_BLOCK_SIZE

      static final String VECTOR_BLOCK_SIZE
      Property that defines the size of a block of vectors within the aggregate store of an aggregate provider.
    • REBUILD_LIMIT

      static final String REBUILD_LIMIT
      Property that defines a threshold for the ratio of aggregates that are modified within a transaction.

      Above this threshold, the provider will rebuild itself instead of modifying each aggregated value one after the other.
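
The decision described here can be written as a one-line comparison. The helper below is a hypothetical sketch: the method name, the way the ratio is computed and the `0.5` value used in the example are assumptions, not the library's actual logic or default.

```java
class RebuildLimitSketch {
    /**
     * Returns true when the share of aggregates modified by a transaction exceeds
     * the configured limit, in which case a full rebuild would be preferred over
     * updating each aggregated value individually.
     */
    static boolean shouldRebuild(long modifiedAggregates, long totalAggregates, double rebuildLimit) {
        if (totalAggregates == 0) {
            return false; // nothing stored yet, nothing to rebuild
        }
        return (double) modifiedAggregates / totalAggregates > rebuildLimit;
    }

    public static void main(String[] args) {
        System.out.println(shouldRebuild(60, 100, 0.5)); // most aggregates touched
        System.out.println(shouldRebuild(10, 100, 0.5)); // only a few aggregates touched
    }
}
```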

    • GLOBAL_PROVIDER_NAME

      static final String GLOBAL_PROVIDER_NAME
      Global provider name.
  • Method Details

    • getPartialProviders

      List<IPartialProviderDefinition> getPartialProviders()
      Returns the optional list of partial storage definitions.
    • getPartitioningDescription

      IPartitioningDescription getPartitioningDescription()
      Gets the partitioning to use for this provider.
      Returns:
      partitioning to set up for the provider
    • getRangeSharingLimit

      Integer getRangeSharingLimit()
      Range sharing allows queries to share intermediate results, e.g. a query on the grand total can reuse results computed for subtotals.
      Returns:
      the maximum number of locations accumulated during a range sharing attempt on a query
    • getFilters

      Returns the filters applied to this provider.
    • getName

      default String getName()
      Retrieves the aggregate provider name, GLOBAL_PROVIDER_NAME by default.
      Returns:
      The aggregate provider name.