ActivePivot

  • 5.10.4

ETL in ActivePivot

ActivePivot's data source API enables developers to populate their Datastore(s) from multiple data sources. Currently, ActivePivot supports the following data source types:

  • CSV files
  • Parquet files
  • JDBC (Java Database Connectivity)
  • Java Objects

In addition to these source types, ActivePivot provides data fetching capabilities from the cloud using the Cloud Source API, which natively supports AWS, Azure, and Google Cloud.

The following diagram illustrates the Datastore data loading flow:

[Figure: Structure Overview]

Implementation details may vary depending on the type of source.

Extraction

First, ActivePivot's data sources orchestrate and perform the extraction phase, which consists of reading data from the desired sources. This step is responsible for creating Topics, which define the contents of a data source. A topic usually represents a particular type of data content, for instance a coherent business entity.

Next, Message Channels transform and feed data from a single source to a single store within the Datastore through transactions. Optionally, various data processing steps can be applied before data is committed into the Datastore. This includes performing various ETL (Extract, Transform, and Load) operations, using column calculators or custom tuple publishers.

A Channel also encloses two specialized objects:

  • a translator that converts source records to the store's format
  • a tuple publisher that processes and publishes records into the target store

Transformation

A message channel can listen to a data source and fetch data from it. An IMessageChannel is linked to a single ITopic and fed using ITranslators. The ITranslator contract simply translates an input object into an output object. Most notably, the TupleTranslator translates a line read from a CSV file into a tuple that can be fed to the IMessageChannel, and later added directly to a store.
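To make the translation step concrete, here is a minimal sketch of a translator that turns one CSV line into a tuple. It does not use the ActivePivot API: `Translator`, `CsvLineTranslator`, and `exampleTranslator` are hypothetical names introduced for illustration, and the three-column layout (label, quantity, rate) is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

public class TranslatorSketch {

    /** Hypothetical contract (not ActivePivot's ITranslator): one input in, one output out. */
    interface Translator<I, O> {
        O translate(I input);
    }

    /** Hypothetical CSV translator: splits a line and parses each field with a per-column parser. */
    static final class CsvLineTranslator implements Translator<String, Object[]> {
        private final String separatorRegex;
        private final List<Function<String, Object>> parsers;

        CsvLineTranslator(String separator, List<Function<String, Object>> parsers) {
            this.separatorRegex = Pattern.quote(separator); // treat the separator literally
            this.parsers = parsers;
        }

        @Override
        public Object[] translate(String line) {
            String[] fields = line.split(separatorRegex, -1); // -1 keeps trailing empty fields
            if (fields.length != parsers.size()) {
                throw new IllegalArgumentException(
                        "Expected " + parsers.size() + " fields but got " + fields.length);
            }
            Object[] tuple = new Object[fields.length];
            for (int i = 0; i < fields.length; i++) {
                tuple[i] = parsers.get(i).apply(fields[i].trim());
            }
            return tuple;
        }
    }

    /** Assumed layout for the example: a text label, an integer quantity, a decimal rate. */
    static CsvLineTranslator exampleTranslator() {
        List<Function<String, Object>> parsers =
                List.of(s -> s, Integer::valueOf, Double::valueOf);
        return new CsvLineTranslator(",", parsers);
    }

    public static void main(String[] args) {
        Object[] tuple = exampleTranslator().translate("EUR/USD,100,1.25");
        System.out.println(Arrays.toString(tuple)); // prints [EUR/USD, 100, 1.25]
    }
}
```

The resulting `Object[]` plays the role of the tuple that a channel would hand to its publisher.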

During the translation step, data can be enhanced: attributes can be added or modified. For example, data that is constant for an entire file, such as a date, can be stored in the file name and added to each record as an attribute using the FileNameCalculator.
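The file-name enrichment described above can be sketched in plain Java as follows. This is not the FileNameCalculator itself: the class and method names are hypothetical, and the sketch assumes the file name contains a date in `yyyy-MM-dd` form.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameDateSketch {

    // Assumption: the date appears in the file name as yyyy-MM-dd, e.g. trades_2021-03-15.csv
    private static final Pattern DATE_IN_NAME = Pattern.compile("(\\d{4}-\\d{2}-\\d{2})");

    /** Extracts the date that is constant for the whole file from its name. */
    static LocalDate dateFromFileName(String fileName) {
        Matcher matcher = DATE_IN_NAME.matcher(fileName);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No yyyy-MM-dd date in file name: " + fileName);
        }
        return LocalDate.parse(matcher.group(1)); // ISO-8601 is LocalDate's default format
    }

    /** Returns a copy of the tuple with the file's date appended as an extra attribute. */
    static Object[] withFileDate(Object[] tuple, String fileName) {
        Object[] enriched = Arrays.copyOf(tuple, tuple.length + 1);
        enriched[tuple.length] = dateFromFileName(fileName);
        return enriched;
    }

    public static void main(String[] args) {
        Object[] tuple = {"EUR/USD", 100, 1.25};
        Object[] enriched = withFileDate(tuple, "trades_2021-03-15.csv");
        System.out.println(Arrays.toString(enriched)); // prints [EUR/USD, 100, 1.25, 2021-03-15]
    }
}
```

Keeping the date in the file name avoids repeating the same value on every line of the file.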

A message channel appends chunks of data to a message before it is ready to be sent. Several message chunks can be created and filled concurrently from several threads, for higher performance.
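As a rough illustration of filling several chunks concurrently, the sketch below splits input lines into slices, lets a thread pool build one chunk per slice, and then assembles the chunks, in order, into a single message. The names (`ChunkedMessageSketch`, `fillChunks`) are hypothetical, and modelling a message as a list of chunks is an assumption made for the example, not ActivePivot's internal representation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedMessageSketch {

    /** Builds a "message" (a list of chunks) by filling each chunk on a worker thread. */
    static List<List<String>> fillChunks(List<String> lines, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                List<String> slice =
                        lines.subList(start, Math.min(start + chunkSize, lines.size()));
                // Each task fills its own chunk, so no chunk is shared between threads.
                futures.add(pool.submit(() -> {
                    List<String> chunk = new ArrayList<>();
                    for (String line : slice) {
                        chunk.add(line.trim()); // stand-in for the real per-record work
                    }
                    return chunk;
                }));
            }
            // Collect the chunks in submission order once each one is complete.
            List<List<String>> message = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                message.add(future.get());
            }
            return message;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while filling chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk could not be filled", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of(" a ", " b ", " c ", " d ", " e ");
        System.out.println(fillChunks(lines, 2, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```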

Loading

Lastly, tuple publishers conclude the data loading. ActivePivot provides two implementations of the ITuplePublisher contract:

  • TuplePublisher publishes the loaded tuples into the target stores as part of an existing transaction. Its publish method MUST be called within a started transaction.
  • AutoCommitTuplePublisher manages the transaction itself: its publish method starts the transaction and commits it. This is typically used for real-time updates, so that tuples are published as soon as they are available. It avoids a long transaction that spans multiple messages and cannot be committed until the entire load is over.
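The difference between the two publishing styles can be sketched with a toy store. Nothing here is the ActivePivot API: `Store`, `Publisher`, `PLAIN`, and `AUTO_COMMIT` are hypothetical names, and the in-memory transaction model is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.List;

public class PublisherSketch {

    /** Toy store with a single pending transaction (a simplification for illustration). */
    static final class Store {
        private final List<Object[]> committed = new ArrayList<>();
        private final List<Object[]> pending = new ArrayList<>();
        private boolean inTransaction;
        int commitCount;

        void startTransaction() {
            if (inTransaction) throw new IllegalStateException("Transaction already started");
            inTransaction = true;
        }

        void add(Object[] tuple) {
            if (!inTransaction) throw new IllegalStateException("No transaction started");
            pending.add(tuple);
        }

        void commitTransaction() {
            if (!inTransaction) throw new IllegalStateException("No transaction started");
            committed.addAll(pending);
            pending.clear();
            inTransaction = false;
            commitCount++;
        }

        void rollbackTransaction() {
            pending.clear();
            inTransaction = false;
        }

        int size() {
            return committed.size();
        }
    }

    /** Hypothetical publisher contract: push one message's tuples into a store. */
    interface Publisher {
        void publish(Store store, List<Object[]> tuples);
    }

    /** Plain style: the caller must have started the transaction. */
    static final Publisher PLAIN = (store, tuples) -> tuples.forEach(store::add);

    /** Auto-commit style: each publish call is its own transaction. */
    static final Publisher AUTO_COMMIT = (store, tuples) -> {
        store.startTransaction();
        try {
            PLAIN.publish(store, tuples);
            store.commitTransaction();
        } catch (RuntimeException e) {
            store.rollbackTransaction();
            throw e;
        }
    };

    public static void main(String[] args) {
        List<Object[]> first = List.of(new Object[] {"a"}, new Object[] {"b"});
        List<Object[]> second = List.of(new Object[] {"c"});

        Store bulk = new Store(); // bulk load: one transaction spans both messages
        bulk.startTransaction();
        PLAIN.publish(bulk, first);
        PLAIN.publish(bulk, second);
        bulk.commitTransaction();
        System.out.println(bulk.size() + " tuples, " + bulk.commitCount + " commit");

        Store live = new Store(); // real-time: every message is committed on arrival
        AUTO_COMMIT.publish(live, first);
        AUTO_COMMIT.publish(live, second);
        System.out.println(live.size() + " tuples, " + live.commitCount + " commits");
    }
}
```

In the bulk case nothing is visible until the single commit at the end, while in the auto-commit case each message becomes visible as soon as its own publish call returns.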
Copyright © 2021 ActiveViam