
What's new

This section provides a brief overview of the new features and improvements in the latest versions of ActivePivot.

For a detailed list of all changes, see our Changelog.

6.0

New features

  • Direct Query:

    • ActivePivot can now work on top of external databases. It is no longer mandatory to load the data in a Datastore.
    • A new Database API has been added to allow ActivePivot to transparently work on a Datastore or an external database.
  • JDK 17 support: ActivePivot is compatible with Java 11 and Java 17.

  • Partial Aggregate Providers: They can now be defined on a slice of data. One can specify values on a level to filter the facts that will be pre-aggregated.

  • Aggregation Functions: The POP_VAR, POP_STD, SAMPLE_VAR, and SAMPLE_STD aggregate functions have been added to the core. They are used to calculate the variance and standard deviation using the population and sample formulas. They can be used in dynamic aggregations by using the UserDefinedAggregationPostProcessor. A complete list of natively supported aggregation functions is available here, as well as guides to write your own functions.

  • Virtual Hierarchies: This is now fully available in 6.0. A virtual hierarchy is a new hierarchy implementation that does not store members in the cube. It represents a trade-off between a smaller memory footprint and faster commit times versus offering a slightly reduced set of features. At the moment:

    • Navigation between hierarchy members, as performed in Lead and Lag measures, is not allowed.
    • MDX expressions including a virtual hierarchy must include the NON EMPTY keyword. In addition, if the expression describes a member expression, the full member path must be provided.
    • A virtual hierarchy cannot be slicing.
    • Virtual hierarchy levels cannot be used as distributing fields.
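
The difference between the population and sample formulas behind POP_VAR, POP_STD, SAMPLE_VAR and SAMPLE_STD can be sketched in plain Java. This is only an illustration of the textbook formulas; the class and method names below are hypothetical and are not part of the ActivePivot API.

```java
import java.util.Arrays;

/** Illustrative only: textbook population and sample formulas, not ActivePivot code. */
final class VarianceSketch {

    /** Population variance: sum of squared deviations divided by n. */
    static double popVar(double[] values) {
        return sumOfSquaredDeviations(values) / values.length;
    }

    /** Sample variance: sum of squared deviations divided by n - 1. */
    static double sampleVar(double[] values) {
        return sumOfSquaredDeviations(values) / (values.length - 1);
    }

    /** Population standard deviation: square root of the population variance. */
    static double popStd(double[] values) {
        return Math.sqrt(popVar(values));
    }

    /** Sample standard deviation: square root of the sample variance. */
    static double sampleStd(double[] values) {
        return Math.sqrt(sampleVar(values));
    }

    private static double sumOfSquaredDeviations(double[] values) {
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double sum = 0.0;
        for (double value : values) {
            double deviation = value - mean;
            sum += deviation * deviation;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("population variance: " + popVar(values));
        System.out.println("sample variance:     " + sampleVar(values));
    }
}
```

The sample formulas divide by n - 1 instead of n, so they always return a slightly larger value than the population formulas on the same data.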

Improvements

  • REST services were migrated to SpringMVC

  • Datastore Housekeeping:

    • The datastore API now provides additional operations to rebuild indexes, which can reduce their size and thus memory usage.
    • Empty partitions will be dropped automatically by the datastore at the end of transactions.
  • Distributed cubes: A new option, available when defining a query cube, disables the replication on distributed count. This can speed up the evaluation of MDX formulas.

  • New Content Server REST Bulk API: This API performs several operations in a single call to reduce latency.

  • Tracing: Tracing has been removed from APM and is now part of the core product. Please refer to the tracing documentation for the new API usage and details.

5.11

New features

  • Data Export Service: The data export service allows you to download the result of MDX queries to a stream or directly export them to files in the server or in the cloud. Three formats are available: Arrow, csvTabular and csvPivotTable. The exports can be customized. A basic export order defines an MDX query to run with its context values and an output configuration that specifies the output format and, for an export to files, the file configuration.

  • Application Performance Monitoring (APM): ActivePivot APM is a solution for monitoring the health and performance of ActivePivot instances. It provides several features that ease support work and reduce the burden of maintaining and troubleshooting ActivePivot. The APM library provides the following main features:

    • Distributed log tracing: Makes it easy to identify all the logs related to an operation (e.g. query execution) across nodes.
    • Query monitoring: Monitor in-progress, successful or failed MDX queries, their associated execution logs, and their historical performance.
    • Cube size and usage monitoring: Display the trend of cube aggregates provider sizes over time.
    • Datastore monitoring: Display the trend of datastore sizes over time; Monitor the datastore transaction times; View successful and failed CSV loading, and their associated execution logs.
    • Netty monitoring: Monitor the size of data transferred between nodes, in terms of single query execution, all queries executed by a user, or all activities, etc.
    • Monitoring activities by user: Monitor activities (e.g. query execution) on a user level.
    • JVM monitoring: Overall JVM status, including CPU, heap and off-heap memory, threads, GC, etc.
  • Expose remote data cube physical addresses in distribution: The query cube now provides a method exposing the physical addresses of the remote servers hosting data cubes. This is done by providing endpoint information using the properties: ActiveViamProperty#DATA_CUBE_REST_ENDPOINT_PORT_PROPERTY and ActiveViamProperty#DATA_CUBE_REST_ENDPOINT_SUFFIX_PROPERTY. These properties can be used alone or cooperatively.

  • MdxQueryBouncer: The MdxQueryBouncer limits the number of concurrently running MDX queries. Query bouncing can be configured using the following properties:

    • ActiveViamProperty#MDX_CONCURRENT_QUERY_LIMIT_PROPERTY: Specifies the maximum number of running MDX queries. Note that this includes MDX query execution and updates, and MDX stream registration and updates.
    • ActiveViamProperty#MDX_CONCURRENT_QUERY_WAIT_LIMIT_PROPERTY: Specifies a timeout while waiting for a permit to execute an MDX query.

    The above limits are also configurable dynamically using the MBean MdxQueryBouncer.

  • Source Parsing Report: The ActivePivot data source API can now generate a parsing report detailing any error encountered during the parsing process. The method ISource#fetch(...) now returns a parsing report, provided that the property PARSING_REPORT_ENABLED is set to true and the underlying source supports such reporting.

    Note that only the CSV Source and JDBC Source provide such reporting.

    For more details, see the documentation of the CSV source and of the JDBC source.

  • Excel Add-in: The Excel add-in is now part of the ActivePivot core. This Excel extension allows users to:

    • see the MDX query created by Excel
    • perform drillthroughs
    • move from slicers to slicers
    • modify context values.

    The installer of the Excel add-in can be found on Artifactory. The context service has been integrated into the ActivePivot code base as well (see Spreadsheet service documentation).

  • Virtual Hierarchies: This experimental feature is available in 5.11.0 but is still in development. It is a new hierarchy implementation that reduces the memory footprint and speeds up the application. It can be used on high-cardinality hierarchies in exchange for a reduced set of features. These hierarchies are inherently unable to access their members, which restricts the operations that can be performed on them. The restrictions are:

    • Lead, Lag, PrevMember and NextMember functions are not allowed on such hierarchies.
    • Axis expressions including a virtual hierarchy must include the NON EMPTY keyword. In addition, if the expression describes a member expression, the full member path must be provided.
    • A virtual hierarchy must not be slicing.
    • Virtual hierarchy levels cannot be used as distributing fields.

    Note that although a query including a given hierarchy as virtual or not virtual will output the same results, internally the query execution may be slightly different.

  • Admin UI: This UI lets you browse the datastore.
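
The behavior described for the MdxQueryBouncer (a maximum number of concurrently running queries, plus a timeout while waiting for a permit) resembles a classic semaphore pattern. The sketch below illustrates that pattern with java.util.concurrent.Semaphore. It is not the ActivePivot implementation: the class name, the method names, and the choice of milliseconds for the wait limit are assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: a semaphore-based limiter, not the actual MdxQueryBouncer. */
final class QueryBouncerSketch {

    private final Semaphore permits;
    private final long waitLimitMs; // assumption: the wait limit is expressed in milliseconds

    QueryBouncerSketch(int concurrentQueryLimit, long waitLimitMs) {
        // Fair semaphore: waiting queries obtain permits in arrival order.
        this.permits = new Semaphore(concurrentQueryLimit, true);
        this.waitLimitMs = waitLimitMs;
    }

    /** Runs the query if a permit is obtained within the wait limit, otherwise rejects it. */
    <T> T execute(Callable<T> query) throws Exception {
        if (!permits.tryAcquire(waitLimitMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("No permit available within " + waitLimitMs + " ms");
        }
        try {
            return query.call();
        } finally {
            // Always give the permit back, even when the query fails.
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        QueryBouncerSketch bouncer = new QueryBouncerSketch(2, 500);
        String result = bouncer.execute(() -> "query result");
        System.out.println(result);
    }
}
```

Releasing the permit in a finally block keeps the limiter from leaking permits when a query throws.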

Improvements

  • Performance improvement in Post Processors and Copper Measures: All abstract post processor classes have been rewritten. The new versions have the suffix V2. The goal of this rewrite is to prevent the boxing of primitive values into Objects, and reduce the memory footprint of queries. Both the initialization and the evaluation phases of the post-processors were redesigned, and more details can be found in the migration notes. This also led to slight modifications of the Copper API.

  • Aliases on Datastore Queries: Fields of a Datastore query can now be aliased. The String representing the expression of the field can be replaced with a SelectionField containing the alias.

5.10

New features

  • User Defined Aggregate Functions on Measures: It is now possible for user-defined aggregate functions to work on measures (instead of facts). This can be achieved using the Copper API. The documentation is available here.
  • Copper Join hierarchy with multiple levels: The Copper API now lets you define Join hierarchies with several levels.
  • Option to set the visibility of hierarchies and dimensions through context values: the MDX context now has a feature that can override the visibility of a hierarchy or a dimension. This feature can be used through MdxContext::setHierarchyVisibility() or through the MDX context builder via IMdxContextBuilder::overrideHierarchyVisibility() (analogous methods exist for dimensions).
  • Copper Store Lookups: The Copper API lets you define store lookup measures that run a get-by-key query on the selected store.

Improvements

  • Chunks gained a new compression method: Frequency compression. When the Datastore detects a value repeated many times in a chunk, the chunk is compressed to store only the explicit values, while the frequently repeated value is recorded as the implicit value.
    The compression happens when a value accounts for more than x % of the chunk. x can be specified as a ratio through the property ActiveViamProperty#CHUNK_FREQUENCY_COMPRESSION_RATIO_PROPERTY. The compression can be enabled or disabled for each of the following five data types: int, float, long, double and object. This can be controlled through the property ActiveViamProperty#ENABLED_FREQUENCY_COMPRESSIONS_PROPERTY.
  • When a data node is being shut down, it will send a GoodbyeMessage to the query node which will immediately trigger an asynchronous task responsible for removing all subsequent contributions of the former. On success, a GoodbyeMessageApplicationEvent is generated. Otherwise, on failure of the removal task, a GoodbyeMessageApplicationFailureEvent is generated. This lets ActivePivot distinguish the clean removal of a data node from one that is momentarily unresponsive.
  • When executing a query, intermediary results that have been computed and that are not required by any remaining computations are now discarded and are available for garbage collection.
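
The idea behind frequency compression can be sketched as follows for an int chunk. This is a simplified illustration and not the Datastore's actual chunk layout: the class name and methods are hypothetical, the ratio is assumed to be a fraction between 0 and 1, and a HashMap stands in for whatever compact structure the real implementation uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: stores the explicit values and one implicit, frequently repeated value. */
final class FrequencyCompressionSketch {

    private final int size;
    private final int implicitValue;
    private final Map<Integer, Integer> explicitValues; // position -> value

    private FrequencyCompressionSketch(int size, int implicitValue, Map<Integer, Integer> explicitValues) {
        this.size = size;
        this.implicitValue = implicitValue;
        this.explicitValues = explicitValues;
    }

    /** Compresses the chunk if its most frequent value accounts for more than ratio of it. */
    static Optional<FrequencyCompressionSketch> tryCompress(int[] chunk, double ratio) {
        Map<Integer, Integer> counts = new HashMap<>();
        int mostFrequent = 0;
        int highestCount = 0;
        for (int value : chunk) {
            int count = counts.merge(value, 1, Integer::sum);
            if (count > highestCount) {
                mostFrequent = value;
                highestCount = count;
            }
        }
        if (chunk.length == 0 || (double) highestCount / chunk.length <= ratio) {
            return Optional.empty(); // no value is frequent enough: keep the chunk as is
        }
        Map<Integer, Integer> explicit = new HashMap<>();
        for (int position = 0; position < chunk.length; position++) {
            if (chunk[position] != mostFrequent) {
                explicit.put(position, chunk[position]);
            }
        }
        return Optional.of(new FrequencyCompressionSketch(chunk.length, mostFrequent, explicit));
    }

    /** Reads a value: explicit if one was stored at this position, implicit otherwise. */
    int read(int position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return explicitValues.getOrDefault(position, implicitValue);
    }

    int explicitCount() {
        return explicitValues.size();
    }

    public static void main(String[] args) {
        int[] chunk = {0, 0, 0, 7, 0, 0, 9, 0};
        tryCompress(chunk, 0.5).ifPresent(c ->
                System.out.println("explicit values stored: " + c.explicitCount()));
    }
}
```

The more dominant the repeated value, the fewer explicit entries remain, which is where the memory saving comes from.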

Miscellaneous

  • The Azure Cloud Source has migrated its dependency on the Azure Blob Storage SDK from version 8 to version 12. This migration came with a number of changes to the module's interface. For more information, see the migration notes.

5.9

New features

  • New Copper API: The Copper API, allowing you to easily define measures and hierarchies, has been overhauled. Documentation for the API is available here.

  • User Defined Aggregate Functions: These new functions can access several fields and store several intermediate values. They can be defined using the Copper API or by extending AUserDefinedAggregateFunction. The documentation is available here.

  • Option to conceal hierarchies and measures from the distributed application: A concealed hierarchy is only known in the data cube where it's located. The query node has no knowledge of this hierarchy. Concealing a hierarchy improves the stability of the cluster and shortens the discovery phase, by significantly reducing its impact on the network's bandwidth. We advise concealing hierarchies that have a very high cardinality. For more information, see the distributed documentation.

    Measures can also be concealed. A concealed measure is only known in the data cube where it's located. We recommend concealing measures that are not relevant to the query node, such as intermediate measures for calculations that do not need to be visible in the User Interface.

  • Option to disable the epoch level: The epoch level of the epoch dimension can now be enabled or disabled. By default, the epoch level is now disabled, only the branch level is enabled. To enable the epoch level, use IEpochDimensionDescription.setEpochLevelEnabled(boolean).

  • Export query plans: A new REST service allows you to perform a query and export its execution plan. The plan now contains the result of each part of the query for each Data Cube (if any), and each pass of the MDX query.

  • Limit the result size of a single GetAggregatesQuery (since 5.9.3): The result size of every retrieval, as well as the transient result size across retrievals within a given GetAggregatesQuery, can be limited in terms of number of points, i.e. locations. These limits can be set in the project configuration as a shared context value in the cube description, or dynamically at query time using the QueriesResultLimit helper methods. If not specified, no limit is applied. Note that the transient result size also includes the result size of intermediate retrievals, which are often required to compute the top retrievals (the retrievals appearing in the query plan) in complex queries.

Improvements

  • The CSV Source is now capable of detecting line ending at the byte level. This improves the parallelism and speed of the CSV Source when reading one large file, at the cost of not supporting a few uncommon character sets.

  • The performance of the MDX Engine has been improved by using fine-grained locking instead of coarse-grained locking.

  • Indexes are now more intensively used for Datastore queries. In previous versions, only two cases were covered:

    • an Index was used when the conditions on the index fields were set on a single value. E.g.: Index on fields A, B, C was used for a condition And(a = 1, b = 2, c = 3)
    • an Index was used for a single condition with multiple values. E.g.: Index on field F was used for the condition f IN [1, 2, 3]

    For other cases, Indexes were ignored. For example, despite an Index on fields A and B, the query with conditions And(a IN [1, 2], b = 3) could not be planned using Index lookup. It could possibly rely on an Index lookup for A or B, provided such an Index existed, and field scans for the rest.

    The new logic adds the following support:

    • an Index on any number of fields can be used if only one field is subject to a condition with multiple values. Such an Index will be used by pre-compiled queries and ad-hoc queries. E.g.: Index on fields A, B, C, D can be used for conditions And(a = 1, b = 2, c IN [3, 4], d = 5).
    • an Index on any number of fields can be used if no more than three of its fields are involved in conditions with multiple values. Such an Index can only be used by ad-hoc queries, because the query engine must be able to estimate the number of points generated by the conditions. E.g.: Index on fields A, B, C, D can be used for conditions And(a IN [1, 2], b = 3, c = 4, d IN [5, 6, 7]).

    The maximum limits for cross joins are configured by the properties ActiveViamProperty#MAX_LOOKUPS_ON_PRIMARY_INDEX and ActiveViamProperty#MAX_LOOKUPS_ON_SECONDARY_INDEX.
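
To see why the number of points matters, consider how conditions with multiple values expand into index keys: each combination of accepted values is one lookup, so the total is the product of the value counts per field. The sketch below illustrates this cross join and a maximum-lookups check. It is not the Datastore query planner; the class, the methods and the maxLookups parameter are hypothetical stand-ins for the limits mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Illustrative only: expanding multi-valued conditions into index lookup keys. */
final class IndexLookupSketch {

    /** Number of lookups = product of the number of accepted values for each indexed field. */
    static long estimateLookups(List<List<Integer>> acceptedValuesPerField) {
        long lookups = 1;
        for (List<Integer> values : acceptedValuesPerField) {
            lookups = Math.multiplyExact(lookups, (long) values.size());
        }
        return lookups;
    }

    /** Builds every index key, or returns empty when the cross join exceeds maxLookups. */
    static Optional<List<List<Integer>>> expandKeys(List<List<Integer>> acceptedValuesPerField, long maxLookups) {
        if (estimateLookups(acceptedValuesPerField) > maxLookups) {
            return Optional.empty(); // too many lookups: a planner would fall back to another strategy
        }
        List<List<Integer>> keys = new ArrayList<>();
        keys.add(new ArrayList<>());
        for (List<Integer> values : acceptedValuesPerField) {
            List<List<Integer>> extended = new ArrayList<>();
            for (List<Integer> prefix : keys) {
                for (Integer value : values) {
                    List<Integer> key = new ArrayList<>(prefix);
                    key.add(value);
                    extended.add(key);
                }
            }
            keys = extended;
        }
        return Optional.of(keys);
    }

    public static void main(String[] args) {
        // And(a IN [1, 2], b = 3, c = 4, d IN [5, 6, 7]) on an index over fields A, B, C, D
        List<List<Integer>> conditions = List.of(List.of(1, 2), List.of(3), List.of(4), List.of(5, 6, 7));
        System.out.println("lookups: " + estimateLookups(conditions));
        System.out.println("keys:    " + expandKeys(conditions, 10).orElse(List.of()));
    }
}
```

For the last example in the list above, the two values of a and the three values of d yield 2 × 3 = 6 index keys, which is why a cap on the cross join size is useful.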

Miscellaneous

  • Java 8 is no longer supported. ActivePivot 5.9 is compatible with Java 11.
  • The Sandbox has been converted to Spring Boot. Additionally, many features have been transformed into unit tests and are presented in the online documentation.

5.8

New features

  • JDK 11 support: we now provide two versions of ActivePivot 5.8: 5.8.x-jdk8 built with JDK8 and 5.8.x-jdk11 built with JDK11. If you plan to run your project with the latest JDK version, use 5.8.x-jdk11. For more information, see the dedicated page.

  • Cloud CSV Source: This source can fetch CSV files from major cloud providers.

  • New Mode of our Continuous Query Engine: It is now capable of producing real-time notifications that a given query has changed, without computing the new view.

  • Centralization of ActivePivot Properties: The properties of ActivePivot are now centralized and documented in the ActiveViamProperty class. For more information, see the associated developer guide page.