Query Execution Plan

Introduction

A query can be made against a Cube either through external sources (such as ActiveUI, ActiveMonitor, Excel or through QueriesService), or internally by instantiating a GetAggregatesQuery and executing it against the Cube.

What is an Execution Plan ?

An Execution Plan is a specification of how Measures are queried.

During its evaluation, a submitted query is split up into requested locations and measures. There are two classes of measures:

Native and Primitive measures: Native measures are defined as being either of the count and update.timestamp built-in measures. Primitive measures are measures defined using a standard aggregation function on a datastore field (e.g. value.SUM)
Post-processed measures: These measures are produced by transforming existing measures. They are created using a Post-processor and are calculated at query time. Post-processed measures can also be chained together, each measure feeding the next one.

ActivePivot relies on the concept of AggregatesRetrieval to retrieve the requested measures for the queried locations. These retrievals are made against either the Cube aggregate providers (for primitive measures) or on the fly (for Post-processors). For primitive measures, we use a single AggregatesRetrieval per location, which calculates the value at the given location for all primitive measures with only one retrieval. For post-processed measures, we create as many AggregatesRetrievals as there are Post-processors at the given location.

The query Execution Plan (also referred to as the Query Plan) is built by ActivePivot based on the involved measures. It details the retrievals together with their dependencies, depending on the required Measures.

Partitioning

Retrievals can have one of the following partitioning types:

ConstantPartitioning means that either the datastore has no partition or the Retrieval is at the top-most level of the query plan
SinglePartitioning means that only one field is used for partitioning. For instance hash16(TradeId)
MultiplePartitioning results from the concatenation of multiple partitioning fields, together with their partitioning functions. For instance: hash16(TradeId)|IdentityFunction(AsOfDate)

Retrieval Types

Retrievals are categorized based on their complexity into two classes, as follows:

Standard Retrievals

They represent the basic retrieval for retrieving measures.

JITPrimitiveAggregatesRetrieval

This type of retrieval works with the JIT aggregates provider.

BitmapPrimitiveAggregatesRetrieval

This type of retrieval will calculate the measure on a single partition, and then report its result to a Merger.

PartialPrimitiveAggregatesRetrieval

This type works with the partial aggregates provider.

PostProcessedAggregatesRetrieval

This type of retrieval will calculate the value of a post-processed measure, once all of its underlying measures have finished their evaluation.

DistributedAggregatesRetrieval

This retrieval type will send a distributed message to a remote AP instance to retrieve a measure.

DistributedPostProcessedRetrieval

This is the equivalent of DistributedAggregatesRetrieval for post-processors.

Advanced Retrievals

These types of Retrieval are more advanced in the sense that they leverage ActivePivot capabilities during queries (cache usage, partitioning, rangesharing).

NoOpPrimitiveAggregatesRetrieval

This type of retrieval represents a retrieval operations where there is no content for the requested locations and measures.

PrimitiveAggregatesCacheRetrieval

This type retrieves primitive measures in the AggregatesCache.

PostProcessedCacheRetrieval

This is the equivalent of PrimitiveAggregatesCacheRetrieval for post-processors.

RangeSharingPrimitiveAggregatesRetrieval

This retrieval type implements the range sharing logic for primitive measures.

RangeSharingLinearPostProcessorAggregatesRetrieval

This retrieval type implements the range sharing logic for linear post-processors.

PrimitiveAnalysisAggregationRetrieval

This type of retrieval performs the aggregation on analysis hierarchies.

PrimitiveResultsMerger

If the evaluation of a measure needs to be done on the Datastore partitions, then one retrieval is created for each partition and these partitioned retrievals are then given to a Merger. The merger will wait for its children to finish and aggregate its children's results to create its own result.

PostProcessedResultsMerger

This is the equivalent of PrimitiveResultsMerger for post-processors.

Enable Execution Plan Logging

There are two modes for displaying the query plan.

Execution plan print: This mode prints the Execution Plan of the GetAggregatesQuery before its execution. The Execution Plan will therefore not contain any timing information.
Execution plan timing print: This mode prints the Execution Plan timing of the GetAggregatesQuery after its execution. The Execution Plan will therefore contain all the timing information.

These modes can be enabled dynamically using the QueryPlan Mbean,

or through query context values.

They can also be enabled through the Pivot context values,

pivot.getContext().set(IQueryMonitoring.class, new QueryMonitoring().enableExecutionPlanningPrint());

and in the Pivot description.

StartBuilding.cube("tweets")
        .withSingleLevelDimensions("sender_id")
        .withDimension("time")
        .withHierarchy("time")
        .withLevel("year")
        .withLevel("month")
        .withLevel("day")
        .withSharedContextValue(new QueryMonitoring().enableExecutionPlanningPrint())

In some cases the query plan may yield huge number of retrievals (thousands) which makes it almost impossible to read such plan and can even cause performance issue. The queryPlanSummary property allows to bypass this problem by excluding the retrieval descriptions from the query plan. Alternatively, it provides a query plan summary section grouping information such as the total number of retrievals, the list of retrieved measures, the retrievals count by type and the partitioning count by type (see example below).

Execution Plan Snippet

We distinguish two important sections within the query plan. For both sections, the snippets were extracted from an actual query plan for illustration purposes.

General Information

This section provides query general information such as the Pivot ID, the query context values, and other propreties such as the range sharing limit for instance.

General information:
-------------------
    ActivePivot: ActivePivotVersion [id=TestRangeSharingExecution-Pivot, epoch=7]
    RetrieverActivePivotAggregatesRetriever : Standard aggregates retriever on cube TestRangeSharingExecution-Pivot

Context values:
--------------
    IQueryMonitoring: QueryMonitoring [printExecutionPlan=true, printExecutionTiming=false]
    IBranch: null
    ISubCubeProperties: null
    ICubeFilter: null
    IAsOfEpoch: null

Additional properties:
---------------------
    Continuous: false
    Range sharing: 1000000
    Missed prefetches: WARN
    Cache: None

Planning:
--------
    Planning time: 0ms
        Execution context creation time: 0ms
    Planning finalization time: 1ms

Execution:
---------
    Total query execution time: 3ms

Here, the subsections Planning and Execution are included when executionTimingPrint is enabled.

Retrievals Section

The following illustrates the blocks detailing the query plan retrievals

Query plan:
----------
Retrieval #0: PostProcessedAggregatesRetrieval
  Location=
  Measures=[...]
  Filter=
  Partitioning=
  ...
  Start time   (in ms)=
  Elapsed time (in ms)=
  which depends on {
    Retrieval #1: ...
    Retrieval #2: ...
      which depends on {
        Retrieval #1: ...(see above for dependencies)
      }
  }

Each block is made of:

a unique id: #0
a type: PostProcessedAggregatesRetrieval
attributes, such as Location, Measures, Filter - (see snippets below)
timing attributes that are displayed when executionTimingPrint is enabled
an optional list of retrievals that the block depends on. In the example, Retrieval #0 waits for the results of Retrieval #1 and Retrieval #2 to execute its work. Retrievals are only detailed once. When the same retrieval appears somewhere else, only its id and type are mentioned, followed by the sentence (see above for dependencies).

Retrieval Attributes

Location

Retrieval #0: ...
  Location=
     Desk@Bookings:Desk=[*],
     Country@Geography:City=Europe\Spain\Barcelona,
     HistoricalDates@Time:AsOfDate=[2018, 2019]
...

This retrieval is performed to gather a measure for the location:

Desk@Bookings:Desk=[*],
Country@Geography:City=Europe\Spain\Barcelona,
HistoricalDates@Time:AsOfDate=[2018, 2019]

In this list, only the hierarchies not positioned on AllMember are listed. For each hierarchy, the output is structured as follow:

<hierarchy name>@<dimension name>:<level name>=<list of selected members>

hierarchy name and dimension name uniquely identify a Hierarchy, whereas level name provides the deepest expressed level.

[*] means that all the members of that level are requested. Europe\Spain\Barcelona means that only one City member is selected: Barcelona. Its full path from the root of the Hierarchy is expressed: Barcelona is under Spain, itself within Europe. Note that any AllMember members are absent from this pretty print. Finally, [2018, 2019] identifies a selection of 2 members.

The form [*]\[*] can exist, indicating that all members of the 1st and 2nd levels are requested. It is also possible to see Europe\[*], query all countries under Europe.

Measures

Retrieval #0: ...
  Location= ...
  Measures= [pnl.FOREX]

The Measures attribute lists all measures retrieved by the Retrieval. As stated earlier, retrievals for Post-processed measures process only one measure, while Retrievals for primitive measures can gather many measures in a single operation.

Filter

Retrieval #0: ...
   Filter= Global query filter
   ...

The Filter attribute shows the cube filter applied to the Retrieval.

Partitioning

Retrieval #0: ...
   Partitioning= ConstantPartitioning
   ...

The Partitioning attribute details the partitioning schema applied to the computation of the Retrieval. The more partitions, the more instances of the Retrieval are created. All Retrievals operate in parallel. In this snippet, ConstantPartitioning means that the computation is performed in a single partition.

Complete Retrieval Example

Retrieval #1: PrimitiveResultsMerger
  Location=
    Underlyings@Underlyings:UnderlierCurrency=[*],
    Desk@Bookings:Desk=[*],
    HistoricalDates@Time:AsOfDate=[*],
    TimeBucketDynamic@Time:TimeBucketDynamic=ANY
  Measures= [pnl.SUM]
  Filter= Global query filter
  Partitioning= value(AsOfDate)
  Measures provider= SimpleMeasuresProvider
  Start time   (in ms)= [4]
  Elapsed time (in ms)= [1]
  which depends on {
    Retrieval #2: JitPrimitiveAggregatesRetrieval
      Location=
        Underlyings@Underlyings:UnderlierCurrency=[*],
        Desk@Bookings:Desk=[*],
        HistoricalDates@Time:AsOfDate=[*],
        TimeBucketDynamic@Time:TimeBucketDynamic=ANY
      Measures= [pnl.SUM]
      Partitioning= hash8(TradeId) | value(AsOfDate)
      Start time   (in ms)= [4]
      Elapsed time (in ms)= [0]
}

For Retrieval #1, the Retrieval Type is PrimitiveResultsMerger. A Primitive measure is the aggregate of a single column in the Just-In-Time Aggregate Provider. The Measure that is used is pnl.SUM. The merger partitioning is partitioned by value on AsOfDate, while its dependency partitioning is partitioned by values on AsOfDate and by hash over 8 parts. The job of the merger is then to create a single result for each date from the 8 underlying results per date. An instance of PrimitiveResultMerger is created for each value of date. Each instance depends only on the underlying results for that given date, meaning that some results for given date values can be ready sooner than others.

Note that the location and measures of the result are identical. This step only recreates a single result out of 8.

Query Plan Summary

This section provides a summarized overview of the query plan.

Query Plan Summary:
-------------------
    Total number of retrievals: 4
    List of retrievals measures: [contributors.COUNT, additionalStorage]
    Retrievals count by type: {JITPrimitiveAggregatesRetrieval=1, PostProcessedAggregatesRetrieval=2, RangeSharingPrimitiveAggregatesRetrieval=1}
    Partitioning count by type: {Constant partitioning=4}

Query Plan Export

One drawback of the query plan logging is that it provides only a query plan for a single GetAggregatesQuery which makes it hard to visualize the result of queries with multiple passes (e.g subselect queries), and even harder for distributed queries especially when the instances are on different hosts.

The query plan export allows to report all the query plans of a given MDX query as a single JSON. This is provided via two REST services (GET/POST) allowing you to perform a query and export its execution plans. The result now contains the plan of each query part, that is, for each Data Cube (if any) and for each pass of the MDX query.

Please refer to the REST API documentation for detailed information and concrete examples.

ActivePivot

Datastore

ETL

Loading data from the cloud

Copper API

Streaming API

Advanced APIs

Server Endpoints

MDX

ContentServer

Client/Server Communication

Data Access Control

Monitoring

Introduction

What is an Execution Plan ?

Partitioning

Retrieval Types

Standard Retrievals

Advanced Retrievals

Enable Execution Plan Logging

Execution Plan Snippet

General Information

Retrievals Section

Retrieval Attributes

Location

Measures

Filter

Partitioning

Complete Retrieval Example

Query Plan Summary

Query Plan Export