Dimensions and Hierarchies
Definition of Dimension
An OLAP cube is an array of data understood in terms of its different dimensions. OLAP is an acronym for Online Analytical Processing. It is a computer-based technique for analyzing business data in the search for business intelligence.
A cube can be considered as a generalization of a three-dimensional spreadsheet. For example, a company might wish to summarize financial data by product, by time-period, and by city to compare actual and budget expenses. Product, time, city and scenario (actual and budget) would be the data dimensions. Cube is a shortcut for multidimensional dataset, given that data can have an arbitrary number of dimensions. The term hyper-cube is sometimes used, especially for data with more than three dimensions.
Each cell of the cube holds a number that represents some measure of the business, such as sales, profits, expenses, budget and forecast.
OLAP data is typically stored in a star schema or snowflake schema in a relational data warehouse. In Atoti, the in-memory relational data storage is called the datastore. The data can then be analyzed through the cube: we look at the measures, organized by dimensions. Measures are derived from the numerical elements of the records. Those are aggregated (by sum, average, or other summarizing operations) to get a general overview on the data. Dimensions are the elements of the records that allow to classify the data: they are the criteria by which data is grouped and/or filtered before summarization.
Definition of Hierarchy
The elements of a dimension can be organized within hierarchies. A hierarchy is a set of parent-child relationships, where each parent typically summarizes its children. Parent elements can further be aggregated as the children of another parent.
For example May 2005's parent is Second Quarter 2005 which is in turn the child of Year 2005. Similarly, cities are the children of regions; products roll into product groups and individual expense items into types of expenditure.
Note that the dimension and hierarchy concepts exposed here are the same as defined by the XMLA and MDX standards, with a dimension being a collection of one or several hierarchies.
Consequently, it is possible (in either Excel or Atoti UI) to have hierarchies neatly grouped by parent dimensions, instead of just having a large flat list of analytical axes.
Definition of Level
A level is a specific depth within a hierarchy. If you take the example presented above:
May 2005
's parent isSecond Quarter 2005
which is in turn the child ofYear 2005
then the levels would be Year
, Quarter
, Month
.
Definition of Member
A member is a specific value at a given level of a hierarchy. For example, May 2005
.
A hierarchy often has a single general member, that can summarize all data: that member is called AllMember
.
If a hierarchy has no AllMember
(for example, a currency hierarchy where it makes no sense to compute an
all-currencies grand total), then the hierarchy is said to be slicing.
Different Types of Hierarchies
Hierarchies can be differentiated into four types:
- symmetrical (or balanced)
- asymmetrical (or unbalanced)
- ragged
- network
Symmetrical Hierarchies: Supported
The symmetrical, or balanced, hierarchy is the most common type.
In a balanced hierarchy, the parent of each member comes from immediately above that member. Additionally, the lowest levels on all branches of the hierarchy are the same (hence the balanced/symmetrical name).
The diagram below shows a symmetrical hierarchy. It is symmetrical because the termination of every branch is at Level 3.
Unbalanced Hierarchies: Supported
In an unbalanced hierarchy, the parent of each member comes from immediately above that member, as is the case for a symmetrical hierarchy. However, with an unbalanced hierarchy, not all the branches contain the same number of levels. For example, while the lowest level for one branch may be Level 2, other branches may have lowest branches of Level 3 or more. Such a hierarchy can also be referred to as an Asymmetrical Hierarchy.
The diagram below shows a (very simple) unbalanced hierarchy. It is unbalanced because Mercury and Venus do not possess any moons, so their branches terminate at Level 1, whereas Earth and Mars (who do have moons) have their branches extended to Level 2.
Ragged Hierarchies: Not Supported
In a ragged hierarchy, the parent of a member can come from any level above the level of the member itself, not just from the level immediately above.
A good example of such a hierarchy would be the following geographical one, involving the USA and Washington DC. It is ragged because Washington DC has USA as its parent. USA is two levels above Washington DC, and not in the level immediately above, as it has to be for the hierarchy to be symmetrical.
Ragged hierarchies are not supported in Atoti. You would add a state node at level 1 in the example to be able to use it as a hierarchy in Atoti.
Network Hierarchies: Not Supported
In a network hierarchy, nodes can contain more than one parent. A family tree is a common example of a network hierarchy.
Factless Hierarchies
Factless is a flag that can be set in AxisHierarchyDescription
to change how the members of the hierarchy are loaded.
The levels of a factless hierarchy use fields of the selection that can be found in joined tables rather than in the fact table. The hierarchy directly discovers its members from this joined table, making its feeding much faster than when it is fed from the facts.
All the values of the table will be found in the hierarchy, including those with no associated fact in the base table, thus explaining the term "factless".
By default, all hierarchies that can be factless will automatically set the factless
flag.(See ActivePivotDescription.setAutoFactlessHierarchies
).
This feature can be disabled globally, through the definition of the ActivePivotDescription
, or for a particular hierarchy, through AxisHierarchyDescription.setFactless(false)
. In this case, only the members with associated facts will be loaded in the hierarchy.
For a more detailed description take a look at the javadoc of IAxisHierarchyDescription.setFactless
.
An example is worth a thousand words:
In the diagram above, we have a fact table (STORE 1) whose records contain a CITY field. There are six distinct CITY values materialized (through existing records) in that fact table : Paris, London, Madrid, Sydney, Vienna and Oslo. In each record, there is another field containing a Measure. The measure for each record is represented in the diagram below by the numbers 'n1', 'n2', etc.
The records in this fact table reference those in another table (STORE 2), which we will call the "Country table", via the CITY field (read more about references). The Country table contains records that associate each city with the country it is in, which effectively defines the hierarchy for the countries and cities.
In the Country table, there is a record that associates Tokyo with JAPAN. However, if we considered the countries/cities hierarchy seen through the facts contributed to a cube that depends on this fact table, as there are no records in the fact table for Tokyo, this JAPAN/Tokyo combination would not show up in the hierarchy.
Now, let's imagine that you would always want to see the full hierarchy as defined in the Country table when browsing the cube.
Without factless hierarchies, this was dealt with by adding a 'fake' record (fact) in the fact table, as shown in the following update of the previous example:
However, the factless hierarchy concept offers a much better way to deal with such a business requirement, whereby you can make sure that the hierarchy seen when browsing the cube is fully defined by the content of a static data referential (such as the one in Country table in our previous example), as opposed to be defined through what is effectively materialized by the facts contributing to the cube.
In other words, when using factless hierarchies, the data is sent from all relevant tables (not just the fact table) to contribute to an Atoti cube. The full hierarchies with all relevant members, including those for which there are no facts contributed to the cube yet, are automatically created within the cube.
If the hierarchy in our previous example is activated as factless, then the hierarchy seen from the cube will be:
The illustration below shows how the facts from the different records in the base table (STORE 1) are pushed to the cube to be aggregated, while the records in the Country table (STORE 2) are used to create a hierarchy within the cube. This hierarchy contains all the relevant members from the Country table, including JAPAN (at Level 1) and Tokyo (at Level 2). This hierarchy is a factless hierarchy, because no records (i.e. facts) exist in the fact table for Tokyo.
Factless hierarchies are commonly used in connection with dates, to ensure that a full range, without gaps, is made available, even when there are no facts for a number of the required dates or data ranges.
Analysis Hierarchies
Analysis hierarchies allow to define a hierarchy structure (number of levels) and content (members on those levels) programmatically. The naming convention may be confusing for some people. By "Analysis Hierarchy" here we don't mean any hierarchy available in the cube for "analysis", but those special hierarchies that are not derived from facts.
The concept of "Analysis Hierarchy" is fairly advanced and Atoti offers several ways to simplify its implementation (see Copper Hierarchies).
Custom measures are necessary to make use of analysis hierarchies. The analysis hierarchy members can be viewed as parameters for a post-processor evaluation procedure.
Virtual hierarchies
A virtual hierarchy has both a smaller memory footprint and faster commit times than a (non-virtual) hierarchy. The main use case is a hierarchy with millions of members in an application where the memory footprint or the server start time is a concern. For example, a tradeId hierarchy. When a (non-virtual) hierarchy is created, it collects all the members from the database and stores them in the cube. This can be a problem when the hierarchy has millions of members (such as the tradeID), as this increases the memory usage and the loading time.
A virtual hierarchy is a way to implement a hierarchy without storing its members in the cube. This allows the virtual hierarchy to be as lightweight as possible. It is important to note that the reduced memory usage comes with some limitations. Operations like retrieving all the members of a virtual hierarchy on one of the levels are not possible. Usually this limitation will not have an impact on most of queries. This is because the queries do not rely on this to retrieve results.
To get the best experience, we strongly advise to use Atoti UI as a visualization tool.
To qualify a hierarchy as virtual, simply call the method virtual()
of the HierarchyBuilder
.
.withDimension("Virtual currency")
.withHierarchyOfSameName()
.virtual()
.withLevel("Virtual currency", "currency");
Here are some limitations of the virtual hierarchies:
- member navigation between hierarchy members, as performed in
Lead
andLag
measures, are not allowed - MDX expressions including a virtual hierarchy must include the
NON EMPTY
keyword. In addition, if the expression describes a member expression the full member path must be provided - A virtual hierarchy cannot be slicing
- Virtual hierarchy levels cannot be used as distributing fields.
Hierarchies in a Distributed Architecture Setup
The levels of the hierarchies from the data cubes are only merged if the hierarchies have the same name and belong to the same dimension.
If the hierarchies from the data cubes do not have the same name but belong to the same dimension, they will be in the same dimension in the query cube.
More details about distributed architecture can be found here.