Getting Started
Supported sources
The DLC supports loading data from multiple sources into an Atoti datastore.
note
This milestone release supports Local CSV and Tuple sources.
The final release will add support for Avro, JDBC, Kafka, Parquet, RabbitMQ and loading files from cloud storage (AWS, Azure, GCP).
- Local CSV - loading data from local CSV files.
- Tuple - loading data generated by Java code.
Imports
Import the module appropriate for your data source; Spring Auto Configuration will then load the required configuration classes.
If you are not using Spring Auto Configuration, you can manually load the configuration classes.
Module Dependency
<dependency>
<groupId>com.activeviam.io</groupId>
<artifactId>data-connectors-csv-local</artifactId>
<version>${data.connectors.version}</version>
</dependency>
Spring Auto Configuration Classes
DataLoadControllerConfig.class,
LocalCsvSourceConfig.class
Module Dependency
<dependency>
<groupId>com.activeviam.io</groupId>
<artifactId>data-connectors-tuple</artifactId>
<version>${data.connectors.version}</version>
</dependency>
Spring Auto Configuration Classes
DataLoadControllerConfig.class,
TupleSourceConfig.class
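If you are not using Spring Auto Configuration, one way to register the classes listed above is Spring's @Import annotation. This is a sketch: the fully qualified package names are omitted and come from the data-connectors module.

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Import;

// Registers the DLC configuration classes explicitly when Spring
// Auto Configuration is not in use. Shown for the Local CSV module;
// swap LocalCsvSourceConfig for TupleSourceConfig when using the
// Tuple source.
@Configuration
@Import({
    DataLoadControllerConfig.class,
    LocalCsvSourceConfig.class
})
public class DlcManualImportConfig {
}
```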
Configuration
The DLC can be configured in Java via Spring Beans.
Most things can also be configured via Spring Boot’s Externalized Configuration, for example, using application.yml.
Sources
The first thing to configure is a source. A source is a location where data is loaded from.
This is also the minimum configuration for loading data into an Atoti Server.
Yaml Configuration
dlc:
local-csv-sources:
- source-name: localCsvSource
root-base-dir: '/data'
Java Configuration
@Bean
LocalCsvSourceDescription localCsvSource() {
return LocalCsvSourceDescription.builder("localCsvSource", "/data").build();
}
For more information on configuration see Local CSV Source doc.
note
No configuration is required as the DLC provides a default TupleSource.
For more information on configuration see Tuple Source doc.
Topics
A topic describes a specific collection of columnar data for loading into an Atoti Server.
A topic includes the source format as well as a target for the data to be loaded into; in a minimal configuration, both can be implicit.
note
The DLC also creates implicit topics.
The following configuration examples configure a topic called “trades”, which loads files matching the file pattern trades*.csv.
This configuration does not specify a target or a file format: the target is implicitly the store named “trades”, and the file format is implicitly the columns of that store.
Yaml Configuration
dlc:
csv-topics:
trades:
file-pattern: 'trades*.csv'
Java Configuration
@Bean
public CsvTopicDescription trades() {
return CsvTopicDescription.builder("trades")
.filePattern("trades*.csv")
.build();
}
For more information on configuration see CSV Topic doc.
This configuration specifies a tuple generator which produces tuples (arrays of objects) matching the store format.
This configuration does not specify a target or a tuple format: the target is implicitly the store named “trades”, and the tuple format is implicitly the columns of that store.
Yaml Configuration
You cannot define a tuple topic in YAML.
Java Configuration
@Bean
public TupleTopicDescription tupleTopic() {
return TupleTopicDescription.builder("trades", this::generateTuples)
.build();
}
For more information on configuration see Tuple Topic doc.
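The generateTuples method referenced in the Java configuration above is user code. Below is a minimal, self-contained sketch of such a generator, assuming a hypothetical trades store with columns (TradeId, Notional, AsOfDate); the exact functional interface expected by TupleTopicDescription.builder is not shown here, so check the Tuple Topic doc and adapt accordingly.

```java
import java.util.ArrayList;
import java.util.List;

public class TradesTupleGenerator {

    // Builds rows for a hypothetical "trades" store with columns
    // (TradeId, Notional, AsOfDate). Each tuple is an Object[] whose
    // positions match the store's column order.
    public static List<Object[]> generateTuples() {
        List<Object[]> tuples = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            tuples.add(new Object[] { "TRADE_" + i, 1_000.0 * i, "2024-01-0" + i });
        }
        return tuples;
    }
}
```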
Aliases
Within the DLC you can define aliases for topics; an alias groups multiple topics for loading and unloading.
Yaml Configuration
dlc:
aliases:
alias:
- trades
- sensitivities
Java Configuration
@Bean
DlcAliases aliases() {
Map<String, Set<String>> aliases = new HashMap<>();
aliases.put("alias", Set.of("trades", "sensitivities"));
return new DlcAliases(aliases);
}
For more information on configuration see Aliases doc.
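Since requests accept a list of topics or aliases, a defined alias can be used in place of the individual topic names. For example, given the configuration above, this request would load both the trades and sensitivities topics:

```
POST https://<hostname>:<port>/<app-context>/connectors/rest/dlc/v2
Content-Type: application/json
{
  "operation": "LOAD",
  "topics": [ "alias" ]
}
```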
DLC requests
Loading and unloading is initiated by sending requests to the DLC.
The DLC provides APIs for Java requests as well as REST requests.
Requests contain an operation and a list of topics or aliases.
Operations
The DLC comes with the following operations:
- LOAD - loads data into the datastore
note
This milestone release supports LOAD.
The final release will add support for UNLOAD, START_LISTEN, and STOP_LISTEN.
For more information on operations see Operation doc.
Request
A request contains an operation and a list of topics or aliases.
Additionally, a scope can be provided to a DLC request.
The request can also override existing topic configurations.
Load a topic Example
Given the above Local CSV configuration of the localCsvSource source and trades topic, the following request will load the data/trades1.csv and data/trades2.csv files into the trades store.
POST https://<hostname>:<port>/<app-context>/connectors/rest/dlc/v2
Content-Type: application/json
{
"operation": "LOAD",
"topics": [ "trades" ]
}
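From the command line, the same REST request can be sent with curl (a sketch; whether and how you authenticate depends on your application's security configuration, and the basic-auth flag below is only an illustration):

```shell
curl -X POST "https://<hostname>:<port>/<app-context>/connectors/rest/dlc/v2" \
  -H "Content-Type: application/json" \
  -u "<user>:<password>" \
  -d '{ "operation": "LOAD", "topics": [ "trades" ] }'
```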
DlcRequest.builder()
.operation(DefaultDlcOperationsConfig.LOAD)
.topics("trades")
.build();
Override Topic Example
We can override parts of a topic’s configuration at runtime by defining csvTopicOverrides and/or jdbcTopicOverrides in the request. The supplied topic properties override those of the existing topic. It is also possible to define an entirely new topic at request time by providing a topicName that does not already exist and does not match any datastore name.
Topic overrides exist only for the duration of the request and are not persisted.
A source can be configured to allow or disallow the use of topic overrides.
Given the above trades topic, we can override it at request time to specify a new file pattern to use. The following request will load the data/alternative_trades1.csv and data/alternative_trades2.csv files into the trades store.
POST https://<hostname>:<port>/<app-context>/connectors/rest/dlc/v2
Content-Type: application/json
{
"operation": "LOAD",
"csvTopicOverrides": {
"trades": {
"filePattern": "alternative_trades*.csv"
}
}
}
DlcRequest.builder()
.operation("LOAD")
.topicDescriptions(Set.of(
CsvTopicDescription.builder("trades")
.filePattern("alternative_trades*.csv")
.build()
))
.build();
New Topic Example
Here we will define a new topic that does not already exist.
We will define our new topic to load a file into the trades store. Since we are not overriding an existing topic, we need to define all aspects of the topic. We can re-use sinks and other pre-defined components in a new topic.
The name we provide for the topic does not matter and will not be persisted.
POST https://<hostname>:<port>/<app-context>/connectors/rest/dlc/v2
Content-Type: application/json
{
"operation": "LOAD",
"csvTopicOverrides": {
"newTopic": {
"filePattern": "alternative_trades*.csv",
"channels": [
{
"targetName": "trades"
}
]
}
}
}
DlcRequest.builder()
.operation("LOAD")
.topicOverrides(Set.of(
CsvTopicDescription.builder("newTopic")
.filePattern("alternative_trades*.csv")
.channel(ChannelDescription.builder(
AnonymousSinkDescription.of(
"trades",
scope -> new TuplePublisher<>(datastore, "trades")
)
).build()
)
.build()
))
.build();
Initial Load example
This is an example configuration class which configures data loading on application startup.
@Configuration
public class InitialDataLoadConfig {
@Bean
public ApplicationRunner initialConfigDataLoad(IDataLoadControllerService dlc) {
return args -> {
dlc.execute(
DlcRequest.builder()
.operation(DefaultDlcOperationsConfig.LOAD)
.topics("trades")
.build()
);
};
}
}
Load into a Branch
The DLC provides the ability to load directly into a specific branch, by specifying the branch in the DlcRequest.
The DLC opens a transaction on the specified branch through the transaction manager. Atoti will automatically create the branch (here, CustomBranchToLoadInto) if it does not exist:
note
The DLC does not manage deletion of branches.
var properties = new Properties();
properties.setProperty(ITransactionManager.BRANCH, "CustomBranchToLoadInto");
datastore.getTransactionManager().startTransaction(properties);
POST https://<hostname>:<port>/<app-context>/connectors/rest/dlc/v2
Content-Type: application/json
{
"operation": "LOAD",
"topics": [ "trades" ],
"branch": "CustomBranchToLoadInto"
}
DlcRequestDTO.builder()
.operation("LOAD")
.topics(Set.of("trades"))
.branch("CustomBranchToLoadInto")
.build();
Next Steps
The DLC offers more configurations than are covered in this guide. To learn more, read the following sections:
- Channels - for connecting topics to sinks and column calculators
- Column Calculators - for enriching data during loading
- Sinks - for defining where data is loaded to and using tuple publishers
- Custom Topic Ordering - for enforcing the order in which certain topics are processed
- Custom Operations - for extending the DLC with custom operations