Troubleshooting "My partitions are empty, why are they not dropped?"

Introduction

Applications often delete stale data before loading fresher data.

When deleting data, ActivePivot can either empty or drop partitions. Emptying a partition deletes its records but keeps its indexes and references so that the partition can be reused. Dropping a partition deletes all of its data (records, indexes and references), except, in some cases, the dictionary of a unique index. Partitions are dropped if and only if the deletion condition applies to value-partitioned fields.

Here is some information to help design the best deletion procedure.

printStoresSizes method

The printStoresSizes method prints the list of existing partitions and their sizes. Only existing partitions appear in the list. A partition listed with a size of zero contains no records but has not been dropped, so it can still retain memory (for example, for its indexes and references).

The following example shows that the partitions of the Base store are all empty but have not been dropped.

+----------------------------------------------------------------------------------------------------------------+
|                                                                                                                |
|  Store lengths at epoch 58                                                                                     |
|                                                                                                                |
+--------------+--------+--------------+--------------+--------------------------+-------------------------------+
|              |        |              |              |                          |                               |
|  Store name  |  Size  |  Max row id  |  Partitions  |  Partition sizes         |  Partition max row ids        |
|              |        |              |              |                          |                               |
+--------------+--------+--------------+--------------+--------------------------+-------------------------------+
|              |        |              |              |                          |                               |
|  Base        |  0     |  29000000    |  8           |  0, 0, 0, 0, 0, 0, 0, 0  |  list of partitions max size  |
|              |        |              |              |                          |                               |
+--------------+--------+--------------+--------------+--------------------------+-------------------------------+

The following example shows that the partitions of the Base store are all empty except the last one.

+------------------------------------------------------------------------------------------------------------------+
|                                                                                                                  |
|  Store lengths at epoch 58                                                                                       |
|                                                                                                                  |
+--------------+--------+--------------+--------------+----------------------------+-------------------------------+
|              |        |              |              |                            |                               |
|  Store name  |  Size  |  Max row id  |  Partitions  |  Partition sizes           |  Partition max row ids        |
|              |        |              |              |                            |                               |
+--------------+--------+--------------+--------------+----------------------------+-------------------------------+
|              |        |              |              |                            |                               |
|  Base        |  100   |  29000000    |  8           |  0, 0, 0, 0, 0, 0, 0, 100  |  list of partitions max size  |
|              |        |              |              |                            |                               |
+--------------+--------+--------------+--------------+----------------------------+-------------------------------+

The example below shows that the City store has no partitions: they were either never created or have been dropped.

+---------------------------------------------------------------------------------------------------+
|                                                                                                   |
|  Store lengths at epoch 3                                                                         |
|                                                                                                   |
+--------------+--------+--------------+--------------+-------------------+-------------------------+
|              |        |              |              |                   |                         |
|  Store name  |  Size  |  Max row id  |  Partitions  |  Partition sizes  |  Partition max row ids  |
|              |        |              |              |                   |                         |
+--------------+--------+--------------+--------------+-------------------+-------------------------+
|              |        |              |              |                   |                         |
|  City        |  0     |  0           |  0           |                   |                         |
|              |        |              |              |                   |                         |
+--------------+--------+--------------+--------------+-------------------+-------------------------+
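
To spot empty-but-not-dropped partitions without reading the table by eye, the "Partition sizes" cell can be parsed. The following is a minimal sketch in plain Java; it does not call any ActivePivot API, and the class and method names (PartitionSizes, parse, countEmpty) are hypothetical names chosen for this illustration. It assumes the cell is a comma-separated list of integers, as in the outputs above.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ActivePivot: interprets the
// "Partition sizes" cell printed in the tables above.
final class PartitionSizes {

    /** Parses a cell such as "0, 0, 100" into an array of partition sizes. */
    static long[] parse(String cell) {
        String trimmed = cell.trim();
        if (trimmed.isEmpty()) {
            // An empty cell means no partition exists (never created or all dropped).
            return new long[0];
        }
        return Arrays.stream(trimmed.split(","))
                .map(String::trim)
                .mapToLong(Long::parseLong)
                .toArray();
    }

    /** Counts the partitions that still exist but hold no records. */
    static long countEmpty(String cell) {
        return Arrays.stream(parse(cell)).filter(size -> size == 0).count();
    }

    public static void main(String[] args) {
        System.out.println(countEmpty("0, 0, 0, 0, 0, 0, 0, 0"));
        System.out.println(countEmpty("0, 0, 0, 0, 0, 0, 0, 100"));
        System.out.println(countEmpty(""));
    }
}
```

A non-zero count means that some partitions were emptied rather than dropped and may still hold indexes and references in memory.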

Partition deletions

Partitions can be dropped automatically by a removeWhere operation. They are dropped if and only if the removeWhere condition applies to value-partitioned fields.

If the dropped partition belongs to a target store (that is, a referenced store), all of its data is deleted, with one exception: the dictionary of the unique index on the referenced field is kept in a cache if and only if at least one existing partition of the owner store still points to it.

Consider a datastore with two stores, A and B, with a reference from A to B on field F1. A is value-partitioned by F1 and hash-partitioned by F2. B is value-partitioned by F1. In the initial state, there are four partitions in store A and two partitions in store B.

[Figure: Partitions initial state]

Example 1: a removeWhere on store A with F1 = 0 drops partitions 0 and 1 of store A.

[Figure: Partitions removeWhere on store A]

Example 2: a removeWhere on store B with F1 = 0 drops partition 0 of store B but keeps the unique index dictionary on F1 in a cache, because partitions 0 and 1 of store A still reference this partition.

[Figure: Partitions removeWhere on store B]

Example 3: a removeWhere on stores A and B with F1 = 0 drops partitions 0 and 1 of store A and partition 0 of store B without keeping the unique index dictionary, because no partition of store A references partition 0 of store B any more.

[Figure: Partitions removeWhere on stores A and B]
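
The three examples can be replayed with a small toy model. This is only a sketch of the rule described above, written in plain Java: it does not use the ActivePivot API, and the class PartitionDropModel and its methods are hypothetical. It assumes that partitions 0 and 1 of store A hold F1 = 0, partitions 2 and 3 hold F1 = 1, and that partition i of store B holds F1 = i.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (not ActivePivot code) of stores A and B from the examples.
final class PartitionDropModel {
    // Partition id -> F1 value held by that partition.
    final Map<Integer, Integer> storeA = new TreeMap<>();
    final Map<Integer, Integer> storeB = new TreeMap<>();
    // F1 values of dropped B partitions whose unique index dictionary is cached.
    final Set<Integer> cachedDictionaries = new HashSet<>();

    PartitionDropModel() {
        // A: value-partitioned by F1 (2 values) and hash-partitioned by F2 (2 buckets assumed).
        storeA.put(0, 0);
        storeA.put(1, 0);
        storeA.put(2, 1);
        storeA.put(3, 1);
        // B: value-partitioned by F1 only.
        storeB.put(0, 0);
        storeB.put(1, 1);
    }

    /** removeWhere on the owner store A: drops every partition holding this F1 value. */
    void removeWhereOnA(int f1) {
        storeA.values().removeIf(value -> value == f1);
    }

    /**
     * removeWhere on the target store B: drops the partition, and keeps its
     * dictionary in the cache only if an owner partition still points to it.
     */
    void removeWhereOnB(int f1) {
        boolean dropped = storeB.values().removeIf(value -> value == f1);
        if (dropped && storeA.containsValue(f1)) {
            cachedDictionaries.add(f1);
        }
    }

    public static void main(String[] args) {
        PartitionDropModel example2 = new PartitionDropModel();
        example2.removeWhereOnB(0);
        System.out.println("Example 2 cached dictionaries: " + example2.cachedDictionaries);

        PartitionDropModel example3 = new PartitionDropModel();
        example3.removeWhereOnA(0); // owner store first
        example3.removeWhereOnB(0);
        System.out.println("Example 3 cached dictionaries: " + example3.cachedDictionaries);
    }
}
```

In this model, Example 3 leaves nothing in the cache only because store A is cleaned before store B. What happens to an already cached dictionary when its owner partitions are dropped later is not covered by the description above, so the model does not represent it.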

Optimizing data deletions

When data is deleted (or loaded), the references between stores are updated whenever the owner store is updated. When deleting data from several stores, it is therefore best to start with the owner store of the reference: this avoids updating references needlessly and makes it easier to drop partitions. If the deletion runs periodically, it should target entire partitions rather than a few rows spread across several partitions. If the deletion transaction is large, it can also be split into one transaction per store, or even one per partition.
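
One way to apply the "owner store first" advice is to compute the deletion order from the references declared in the schema. The sketch below is plain Java with hypothetical names (DeletionOrder, ownersFirst); it does not call ActivePivot, and it assumes the references are given as a map from each owner store to the stores it references, with no cycles. The actual removeWhere calls, for instance one transaction per store, would then be issued in the returned order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ActivePivot.
final class DeletionOrder {

    /**
     * Returns the stores ordered so that every owner store comes before
     * the stores it references (a topological sort of the references).
     */
    static List<String> ownersFirst(List<String> stores, Map<String, List<String>> references) {
        // For each store, the number of owner stores not yet placed in the order.
        Map<String, Integer> pendingOwners = new LinkedHashMap<>();
        for (String store : stores) {
            pendingOwners.put(store, 0);
        }
        for (String owner : references.keySet()) {
            pendingOwners.putIfAbsent(owner, 0);
        }
        for (List<String> targets : references.values()) {
            for (String target : targets) {
                pendingOwners.merge(target, 1, Integer::sum);
            }
        }

        // Stores that no other store references can be cleaned first.
        Deque<String> ready = new ArrayDeque<>();
        pendingOwners.forEach((store, count) -> {
            if (count == 0) {
                ready.add(store);
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String owner = ready.poll();
            order.add(owner);
            for (String target : references.getOrDefault(owner, Collections.emptyList())) {
                // A target becomes ready once all of its owners have been placed.
                if (pendingOwners.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != pendingOwners.size()) {
            throw new IllegalArgumentException("References contain a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> references =
                Collections.singletonMap("A", Collections.singletonList("B"));
        System.out.println(ownersFirst(List.of("B", "A"), references));
    }
}
```

With the stores A and B of the earlier examples, where A references B, the order places A before B, which matches Example 3.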