Indexing the Stores

What is an index ?

An index allows better search performances. When a column is indexed, it allows knowing in which records a given value exists, without having to actually read all records in the column.

There are two types of indices based on their usage:

Unique index: an index that allows to ensure uniqueness of a given value. Used for the key fields of a store.
Secondary index: an index that allows repetition of the same value in different records. Used "only" for query performance.

Orthogonally to the unique/secondary nature of an index, there are two types of indices based on the structure of the data they represent:

Single column index: an index based on the value of a single column in a store
Composite index: an index based on the combination of values from several columns in a store

A unique index is automatically created over all the key fields of a store, allowing quick searching of a record by its key, and enforcing key uniqueness.

In some cases, queries might hit the datastore directly, for instance when using JUST_IN_TIME aggregates provider, or when project code is written to directly query the stores.

In such cases, you may need to add indices on other columns or groups of columns to improve performance.

Key Fields Index

When a store is created with one key field, a unique, single column index is automatically created for that one field.

When a store is created with more than one key field, a unique, composite index that encompasses all these key fields is automatically created. The composite key ensures uniqueness in the index and in the store.

The following example shows a key comprising of three key fields.

new StoreDescriptionBuilder().withStoreName("Store")
        .withField("ID_1").asKeyField()
        .withField("ID_2").asKeyField()
        .withField("ID_3").asKeyField()

As the unique key on this store is a combination of ID_1, ID_2, and ID_3, the composite index will only be used if all three are present in a query.

Indexing individual columns

If a single column is the key field of a store, it is automatically indexed. But one might also want to index an individual column, even if it is not a key field: indexing allows faster searches when a search is done directly on an indexed column.

The following example shows a field called "A", which isn't a key field but requires an index for faster searching.

new StoreDescriptionBuilder().withStoreName("Store")
        .withField("ID").asKeyField()
        .withField("A").indexed()

Contrary to the automatically created indices for key fields, the indices created with the indexed() method are secondary indices (do not ensure uniqueness).

If a key field needs to be indexed (for example, there are two key fields so a composite index is created, but one key field is often queried alone), it is also possible to use the the indexed() method:

new StoreDescriptionBuilder().withStoreName("Store")
        .withField("ID_1").asKeyField()
        .withField("ID_2").indexed().asKeyField()

Composite Column Index

When a group of columns are frequently queried together, you can define an index that combines these columns using the withIndexOn() method, as shown below.

new StoreDescriptionBuilder().withStoreName("Store")
            .withField("ID").asKeyField()
            .withField("A")
            .withField("B")
            .withField("C")
            .withField("D")
            .withIndexOn("A", "B", "C")

Queries that then specify all three values of A, B, and C will use this index.

Queries that only specify values A and C will not use this index, as a third value is required. If there are a lot of searches involving just A and C, it may be worth adding another composite index just for those two fields.

Logging Query Plans

When query performance needs to be investigated, you can set the server to log more detailed information as to what indices, if any, are being used.

The class com.qfs.store.query.impl.QueryCompiler can be (temporarily) set to the TRACE (logback) or FINER (jul) logging level to log such query plans.

The following sample log file output shows that a query has effectively done a table scan for the three columns:

Search on store Store, composed of:    
    Scan of field A
    Scan of field B
    Scan of field C

When analyzing typical usage, if such scans occur frequently, there is a good case to add a composite index on columns A, B, and C.

The output after adding such a composite index may then appear as follows, showing that the index is being used:

Secondary index lookup on (A, B, C)

Creating indices will incur a memory usage overhead, so as with traditional database design, they should only be added for the most commonly-executed queries.

Having too many indices could impact data loading times as there is also an overhead of maintaining those indices.

Automatic indices

ActivePivot automatically creates a unique index containing all the key fields.

It will also create a composite secondary index on all fields used in a reference between two stores.

For example, if Store-1 references Store-2 through fields A, B, and C, then a secondary composite index using A, B, and C is automatically created on Store-1. A unique index on fields A, B, C will also be created for Store-2.

Consequently, when analyzing queries for scans and indices used, you may see that a secondary index lookup is being performed even if no explicit index for those fields has been created.

You can override this automatic index creation by calling dontIndexOwner() on the store reference builder.

Calling dontIndexOwner() will have several effects:

It will reduce the memory footprint.
It will speed up commits in the owner store.
It will slow down the queries on those fields.
It will slow down the propagation of updates from the target store to the owner store, and hence slow down the commit when committing in the target store.

We recommend that you keep the automatic indices, unless you encounter any issues with them.

ActivePivot

Datastore

ETL

Loading data from the cloud

Copper API

Streaming API

Advanced APIs

Server Endpoints

MDX

ContentServer

Client/Server Communication

Data Access Control

Monitoring