Handling Dates
Introduction
Temporal types represent a real challenge for most softwares. From parsing to storing and displaying, date and time patterns constantly change. This article provides a general understanding on how to properly work with temporality in ActivePivot.
Within this article, "temporal types" is used as a generic term when detailing "Date" or "Time" does not matter.
Declaring Temporal Types within the Datastore
When defining a date field, the developer chooses the object type used to store the latter.
For instance, in order to store dates as LocalDate
objects, we use the ILiteralType.LOCAL_DATE
type in the dataType
argument of the field description, as shown in the example below:
public IStoreDescription storeDescription() {
return new StoreDescriptionBuilder()
.withStoreName("exampleStore")
.withField("date1", "localDate[yyyy-MM-dd]").asKeyField()
.withField("date2", "date[yyyyMMdd]")
.build();
}
Parsing
Carefully note the syntax used: dataType[parsingPattern]
.
ActivePivot uses its own parsers for temporal types. Two pieces of information are required:
- the type itself, which refers to the target java class, such as
LocalDate
orLocalDateTime
- the pattern used for the parsing process.
A common example is
localDate[MM-dd-yyyy]
for American dates.
The
ILiteralType
interface shows the data types available by default in ActivePivot. Note that usingdate
asdataType
will force the use of the legacyjava.util.Date
, and its associated formatterSimpleDateFormat
.the
dataType
"zonedDateTime" is available for projects in which specifying the time zone is relevant.
All date and time parsers implement the contract IVariableParser
, and new parsers can easily be
defined as plugin values using a similar syntax.
If the pattern is not specified by the developer, ActivePivot uses the default pattern it has associated with the dataType used. Default patterns all follow the norm ISO 8601.
When it comes to parsing "Times", ActivePivot adds a layer of flexibility to the DateTimeFormatter
by being lenient on specific fields,that is by providing default values for unspecified fields in the
parsed string.
For example, localDateTime[yyyy/MM/dd]
will successfully parse the string "2020/06/20" by creating
a LocalDateTime
whose inner LocalTime
is set to midnight.
ActivePivot controls leniency at two levels:
- In the pattern: optional fields can be surrounded with square brackets, or not specified at all.
- In the parsing method itself, through the definition of
TemporalQueries
.DateTimeFormatter#parseBest
's javadoc describes best how these are defined. Modifying this level of control will require the overriding of the parser'sdoParse
method.
ActivePivot always sets default values that make sense. Seconds and milliseconds default to zero. Dates default to Today. Times default to Midnight. TimeZones default to the system's time zone.
Storing in the Datastore
Temporal types are stored in Object Chunks within the Datastore, and thus reside in the Heap Memory, unless the developer chooses to dictionarize the field.
Temporal Objects imported from a data source
Temporal objects can be imported from any data source. Their associated documentation should be used as a reference. Note that parsers in a data source follow the same rules as the ones aforementioned. See the data sources documentation for more details.
Front End Formatting
Clients can format dates within their cube definitions so that the stringified dates sent to any graphical interface match a specific format.
This is done through the fluent builders. When adding a level, one can use withFormatter(String)
.
The format String given as argument must once again follow the aforementioned rules.
Dates in MDX
ActivePivot's MDX language implementation supports temporal types. A comprehensive suite of functions is available in MDX.
Operators
Operators such as <
, >=
behave as expected when used to compare dates and times.
Now() function
Returns the current date and time, to the millisecond.
Datediff(interval, date1, date2) function
Computes how many times the given interval can be found between the two given dates.
The interval argument must be one of "ms", "s", "n", "h", "d", "ww", "w", "m", "q", "y", "yyyy".
For example, given Datediff("d", "2020-12-24", "2020-12-31")
, the result would be 7 days.
Other MDX Functions
- LastPeriods: returns a set of members prior to and including the current member.
- PeriodsToDate: returns a set of periods from a given level within the time dimension, starting with the first period it finds and ending with the specified member.
The following example aggregates over the first 8 months of the calendar year:
Aggregate(
PeriodsToDate(
[Date].[Calendar].[Calendar Year],
[Date].[Calendar].[Month].[August 2003]
)
)
QTD (Quarter to Date): returns a set of members from the date and time dimension starting with the first day of the quarter up to the specified member.
WTD (Week to Date): returns a set of members from the date and time dimension starting with the first day of the week up to the specified member.
MTD (Month to Date): returns a set of members from the date and time dimension, starting with the first date of the month up to the specified member.
YTD (Year to Date): returns a set of members from the date and time dimension starting with the first day of the year up to the specified member.
Remarks
In this section, numbers are meant as rough estimates rather than absolute truths.
Storing Temporal Types within the Datastore
Developer's choices regarding temporal types in the Datastore can have a strong impact on memory usage and performance.
The developer should consider using two columns, one for the LocalDate, and the other for the LocalTime. This will allow the dictionarization of the date field, and add some flexibility to the queries run against the pivot.
Let's imagine a banking system storing 8 years of trading history (and round this to 3000 days).
This represents 3000 differents LocalDate
objects, each of them being stored (roughly) as:
- one int for the year (32 bits)
- two shorts for the day and month (16 * 2 bits)
Let's also say that 10k trades are performed every day.
If the Date field is dictionarized, the impact on memory is negligible:
- 3000
LocalDate
objects stored on Heap, which amounts to less than 1 MBits. - 3000 * 10k = 30M trades stored Offheap, in Integer Chunks. These can be compressed in ActivePivot. For the dictionarized Date field, these can be compressed using 11 bits (the closest power of 2 = 4096). This very roughly amounts to 300 MBits.
If the Date field is not dictionarized, one LocalDate
object is stored for each trade. That is 30M
LocalDate
objects stored in Object Chunks on Heap, for about 1.7 GBits total.
Now let's imagine that these trades are recorded with LocalDateTime
objects instead, with
precision to the second.
These objects each contain a LocalDate
(64 bits) and a LocalTime
, stored roughly as:
- three bytes for the hour, minute, second (8 * 3 bits)
- one int for the nanoseconds (32 bits) That amounts to 64 + 32 + 24 = 120 bits per temporal marker.
Following the same calculations, if the Date field is not dictionarized, that is 30M LocalDateTime
objects, stored similarly as before, for about 3.3 GBits of On Heap object Chunks.
Indexing this field would represent a grave mistake: the chances that two trades are performed at
the same second are small, and the index would be a 1 to 1 mapping between the LocalDateTime
objects stored on Heap (3.3 GBits again), and the bitmap (which has no value), stored OffHeap,
in Double Chunks (as we would index more than Integer.MAX_VALUE
objects).
These numbers might sound small, but they tend to be bigger in real life projects, with simulation data, and so on. They still illustrate how indexes can free Heap Memory in ActivePivot, which is intensively used at querying time. Offheap management is one of ActivePivot's strong points.
If needed, one can use two fields, one for a LocalDate, and one for a LocalTime, and only index the date.