Data Load Controller

This is the documentation for ActiveViam’s Data Load Controller library.

How to Get Started

The DLC is provided as a library that can be added as a Maven dependency. It is configured via Spring configuration, as described in Configuring the DLC.

If you are looking for how to interact with the DLC to load or unload data, start or stop messaging consumers, or check the status of a request, please refer to the User Guide.

What Is The Data Load Controller?

The Data Load Controller (DLC) is a component within ActivePivot that sits on top of the underlying data sources (e.g. CSV source, JDBC source, custom sources) and allows data for particular ‘source topics’ to be loaded and unloaded in a consolidated way.

From the DLC’s perspective, a caller simply states, for instance, “I want to fetch data for topics A and B” and specifies a scope for the fetch (e.g. a particular COB date), without caring about underlying implementation details such as which source is actually responsible for fetching topic A or topic B.
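To make the idea concrete, here is a minimal sketch of what such a request could carry. The names LoadRequestSketch, Request, loadForCobDate and the "CobDate" scope key are hypothetical and chosen for illustration only; they are not the DLC's actual classes, and UNLOAD is an assumed operation name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical request shape for illustration; not the DLC's actual classes.
final class LoadRequestSketch {

    // LOAD is named in this documentation; UNLOAD is an assumed counterpart.
    enum Operation { LOAD, UNLOAD }

    // A request names topics and a scope, never the source that serves them.
    record Request(Operation operation, List<String> topics, Map<String, String> scope) {
        Request {
            topics = List.copyOf(topics); // defensive, immutable copies
            scope = Map.copyOf(scope);
        }
    }

    // Convenience factory: load the given topics for one COB date.
    // The "CobDate" key is an assumption made for this sketch.
    static Request loadForCobDate(String cobDate, String... topics) {
        return new Request(Operation.LOAD, List.of(topics), Map.of("CobDate", cobDate));
    }

    public static void main(String[] args) {
        Request request = loadForCobDate("2024-01-31", "TopicA", "TopicB");
        System.out.println(request.operation() + " " + request.topics() + " scope=" + request.scope());
    }
}
```

Note that the request mentions only topics and a scope. Resolving each topic to the source that serves it is left to the controller.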

Moreover, the DLC is not only a local component; ActivePivot also exposes it as a web service. This is fundamental, because it keeps “data orchestration” (loading and unloading) outside of ActivePivot. ActivePivot is a computational framework, not a data orchestration framework.

An external data orchestrator, such as another application or a collection of Control-M jobs, is therefore developed and maintained by customers outside ActivePivot, and ActivePivot simply gives that orchestrator a way to tell it what to load or unload.

Diagram: where the DLC sits relative to the Data Connectors and the ActivePivot application.

Sources and Topics

The DLC is made up of Sources and Topics. A Source defines where the data comes from and how to interact with that data source. A Topic defines what data to load from that Source. A Source contains multiple Topics. Topics are globally unique: two Sources cannot contain the same Topic.
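The uniqueness rule can be sketched as a small registry that maps each Topic name to the single Source that owns it. TopicRegistrySketch and its methods are hypothetical names for illustration; the DLC's real configuration classes may look quite different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry illustrating the rule; not part of the DLC API.
final class TopicRegistrySketch {

    // Topic name -> name of the single Source that owns it.
    private final Map<String, String> sourceByTopic = new HashMap<>();

    // Registers a topic under a source; rejects a topic owned by another source.
    void register(String source, String topic) {
        String existing = sourceByTopic.putIfAbsent(topic, source);
        if (existing != null && !existing.equals(source)) {
            throw new IllegalStateException(
                "Topic '" + topic + "' already belongs to source '" + existing + "'");
        }
    }

    // Because topics are globally unique, a topic name alone identifies its source.
    Optional<String> sourceFor(String topic) {
        return Optional.ofNullable(sourceByTopic.get(topic));
    }

    public static void main(String[] args) {
        TopicRegistrySketch registry = new TopicRegistrySketch();
        registry.register("csvSource", "Trades");
        registry.register("jdbcSource", "Positions");
        System.out.println(registry.sourceFor("Trades").orElse("unknown"));
    }
}
```

Global uniqueness is what lets a caller ask for a topic by name without saying which source serves it.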

Source Types

The Data Load Controller allows you to load multiple types of files from the local file system or from the cloud. The DLC supports two types of data sources: Fetching and Messaging. A Fetching source loads an entire file in one go, while a Messaging source is used for data sources that stream their data.

Fetching Data Source

Fetching Data Sources are sources that can be called upon to load data in a single operation. When a LOAD Operation is initiated on a Fetching Source, the DLC loads a particular file in a single request. Once the file is loaded, it is safe to delete or move it, as the DLC no longer uses it. A LOAD Operation on a fetch Topic runs to completion: it does not finish until the entire file has been processed.

Our fetching data sources consist of:

Messaging Data Sources

Messaging Data Sources are sources where a connection needs to be open between a messenger and a consumer in order to get data from the source. A messaging channel is opened when a START_LISTEN Operation is executed. The channel remains open and continues to poll for new data until a STOP_LISTEN Operation is executed, which closes it.
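The lifecycle described above can be sketched as a tiny state machine. This is an illustrative model only: MessagingChannelSketch, publish and poll are hypothetical names, and an in-memory queue stands in for a real message broker. Only the START_LISTEN and STOP_LISTEN operation names come from this documentation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a messaging channel; not the DLC's implementation.
final class MessagingChannelSketch {

    enum Operation { START_LISTEN, STOP_LISTEN }

    private final Queue<String> pending = new ArrayDeque<>(); // stand-in for a broker
    private final List<String> consumed = new ArrayList<>();
    private boolean open;

    // START_LISTEN opens the channel, STOP_LISTEN closes it.
    void execute(Operation operation) {
        open = (operation == Operation.START_LISTEN);
    }

    // Simulates the messenger side pushing a message to the broker.
    void publish(String message) {
        pending.add(message);
    }

    // One polling pass: drains pending messages only while the channel is open.
    int poll() {
        if (!open) {
            return 0;
        }
        int count = 0;
        String message;
        while ((message = pending.poll()) != null) {
            consumed.add(message);
            count++;
        }
        return count;
    }

    boolean isOpen() {
        return open;
    }

    List<String> consumed() {
        return List.copyOf(consumed);
    }

    public static void main(String[] args) {
        MessagingChannelSketch channel = new MessagingChannelSketch();
        channel.execute(Operation.START_LISTEN);
        channel.publish("trade-1");
        channel.poll();
        channel.execute(Operation.STOP_LISTEN);
        System.out.println(channel.consumed());
    }
}
```

Unlike a fetch, nothing here "finishes" by itself: data is consumed for as long as the channel is open, and messages published while it is closed simply wait.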

Our messaging data sources consist of:

Event Monitoring

During the data loading and unloading phase, clients can monitor the status of a load and build monitoring dashboards to track events such as start and end times and any exceptions or failures. This section is mainly aimed at developers looking to tap into the event handling mechanism built into the DLC.

The DLC provides basic events, such as task summary, task completion, and task failure, along with event handlers for the supported data sources. These event handlers write the corresponding events to the DLC cache.
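As a rough illustration of the handler-writes-to-cache pattern, the sketch below records events per task in an in-memory map. Everything here is hypothetical (EventCacheSketch, Kind, Event, Handler); the map is only a stand-in for the DLC cache, whose real structure is not described in this section.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event cache for illustration; not the DLC's event API.
final class EventCacheSketch {

    // Event kinds mirror the basic events named in this section.
    enum Kind { TASK_SUMMARY, TASK_COMPLETION, TASK_FAILURE }

    record Event(String taskId, Kind kind, String detail) {}

    interface Handler {
        void onEvent(Event event);
    }

    // Stand-in for the DLC cache: events grouped by task id.
    private final Map<String, List<Event>> eventsByTask = new ConcurrentHashMap<>();

    // A handler that appends every event it receives to the cache.
    Handler cachingHandler() {
        return event -> eventsByTask
            .computeIfAbsent(event.taskId(), id -> new CopyOnWriteArrayList<>())
            .add(event);
    }

    List<Event> eventsFor(String taskId) {
        return List.copyOf(eventsByTask.getOrDefault(taskId, List.of()));
    }

    // A dashboard-style query: did this task report a failure?
    boolean hasFailed(String taskId) {
        return eventsFor(taskId).stream().anyMatch(e -> e.kind() == Kind.TASK_FAILURE);
    }

    public static void main(String[] args) {
        EventCacheSketch cache = new EventCacheSketch();
        Handler handler = cache.cachingHandler();
        handler.onEvent(new Event("task-1", Kind.TASK_SUMMARY, "2 topics requested"));
        handler.onEvent(new Event("task-1", Kind.TASK_FAILURE, "file not found"));
        System.out.println(cache.hasFailed("task-1"));
    }
}
```

A custom handler would implement the same single-method interface and could forward events elsewhere, for example to a log or a metrics system, instead of caching them.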

Event Tracing

The DLC uses the APM (ActivePivot’s Application Performance Monitoring library) to handle tracing of events. The library relies on openzipkin/brave for the tracing. All events are traced with a TraceId and a SpanId, which are unique hex strings such as “7a8c982a87b050f2”.

TraceContext

A TraceContext is added to HealthEvents through the APMHealthEventDispatcher. A TraceContext can be null if APM tracing has not been properly configured. When a DLC process is started, a Tracer is created, and all processes started from within that DLC process are traced with the same TraceId. A TraceContext contains three values:

TraceContext Property    Value
TraceId                  Globally unique ID. All tracers created under the parent tracer will contain the same TraceId.
ParentId                 Parent tracer’s SpanId. This is the SpanId of the parent tracer.
SpanId                   Globally unique ID of the current process. Every time a new tracer is created (even under a parent) this value will be unique.

Below we can see an example of how the TraceContext is used and tracked through child processes.

Example:

Here we can see the TraceContext in action. In the diagram below, two DLC requests are executed in the DLC. Each DLC request is given its own TaskTracer to trace all of its processes. Every time a new sub-process is created, a child tracer is created under the current tracer. All tracers for a DLC request have the same Trace ID, because every child tracer created from an existing tracer shares its Trace ID. Each child tracer also has a unique Span ID. The Span ID of the first tracer is the same as the Trace ID, because it is the root tracer. Additionally, each tracer’s Span ID is used as the Parent ID of its children’s TraceContexts. With this information we can trace an event back to its parent and vice versa.

Diagram: TraceContextDiagram
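The propagation rules above can be modelled in a few lines. This is a self-contained sketch, not the APM or Brave API: TraceContextSketch, root and childOf are hypothetical names, and the random 16-character hex ids merely imitate the format shown earlier.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of TraceContext propagation; not the APM or Brave API.
final class TraceContextSketch {

    // parentId is null for a root context.
    record TraceContext(String traceId, String parentId, String spanId) {}

    // Random 64-bit id rendered as 16 lowercase hex characters.
    private static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Root tracer: the SpanId doubles as the TraceId and there is no parent.
    static TraceContext root() {
        String id = newId();
        return new TraceContext(id, null, id);
    }

    // Child tracer: same TraceId, parent's SpanId as ParentId, fresh SpanId.
    static TraceContext childOf(TraceContext parent) {
        return new TraceContext(parent.traceId(), parent.spanId(), newId());
    }

    public static void main(String[] args) {
        TraceContext request = root();
        TraceContext subProcess = childOf(request);
        TraceContext nested = childOf(subProcess);
        System.out.println(request);
        System.out.println(subProcess);
        System.out.println(nested);
    }
}
```

Following ParentId links upward from any context eventually reaches the root, whose SpanId equals the shared TraceId. This is what makes it possible to walk from an event to its parent and back.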
