Amazon S3 Cloud Source

This documentation page assumes you are already familiar with Amazon Simple Storage Service (Amazon S3).

The Amazon S3 Cloud Source relies on the AWS SDK for Java. Make sure you are familiar with this SDK before using the Amazon S3 Cloud Source.

Cloud Source to Amazon S3 concepts

Amazon Simple Storage Service uses two main concepts: S3 objects, which are the fundamental entities stored in Amazon S3, and buckets, which are containers that organize the S3 objects. In the cloud source, an object can be represented by an S3Entity or an S3EntityPath.

Entities

The AWS implementation of ICloudEntity is S3Entity. It is essentially a wrapper around an S3 object from the AWS SDK for Java.

Locating an entity

Entity paths

IS3EntityPath implements ICloudEntityPath. It is a reference to an S3 object.
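
To make the idea of a reference concrete, the sketch below models an S3 object reference as a bucket name plus an object key, which is how Amazon S3 itself identifies an object. The ObjectRef class and its methods are hypothetical names used for illustration only. They are not part of the cloud source API, and IS3EntityPath may expose different accessors.

```java
import java.util.Objects;

/** Hypothetical illustration only: NOT the cloud source's IS3EntityPath. */
final class ObjectRef {
    private final String bucket;
    private final String key;

    ObjectRef(String bucket, String key) {
        this.bucket = Objects.requireNonNull(bucket, "bucket");
        this.key = Objects.requireNonNull(key, "key");
    }

    String bucket() { return bucket; }

    String key() { return key; }

    /** Renders the reference in the common s3://bucket/key notation. */
    String toUri() { return "s3://" + bucket + "/" + key; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ObjectRef)) return false;
        ObjectRef that = (ObjectRef) other;
        return bucket.equals(that.bucket) && key.equals(that.key);
    }

    @Override
    public int hashCode() { return Objects.hash(bucket, key); }
}
```

A reference of this kind carries no data. It only says where the object lives, so the object content is fetched separately when it is needed.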

Directories

The AWS implementation of ICloudDirectory is S3CloudDirectory.

A directory is tied to a bucket. It contains all S3 objects whose names start with a certain prefix. For example, a directory on a given bucket with the prefix directory1/subdirectory2 would contain the first three of the following objects:

inside:
    directory1/subdirectory2/blob1.txt
    directory1/subdirectory2/blob2.txt
    directory1/subdirectory2/subdirectory3/blob3.txt

not inside:
    blob4.txt
    other_directory/blob5.txt
    directory1/blob6.txt

A directory with an empty prefix corresponds to the root of the bucket.
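
The prefix rule above can be sketched with plain string matching. This is a minimal illustration that runs on an in-memory list of object names and does not call Amazon S3. PrefixFilter and keysUnder are hypothetical names, not part of the cloud source API, and S3CloudDirectory may resolve its contents differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration, not part of the cloud source API. */
final class PrefixFilter {

    /** Returns the object names that start with the prefix, in input order. */
    static List<String> keysUnder(String prefix, List<String> keys) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "directory1/subdirectory2/blob1.txt",
                "directory1/subdirectory2/blob2.txt",
                "directory1/subdirectory2/subdirectory3/blob3.txt",
                "blob4.txt",
                "other_directory/blob5.txt",
                "directory1/blob6.txt");

        // Prints the three names listed under "inside" above.
        System.out.println(keysUnder("directory1/subdirectory2", keys));

        // An empty prefix matches every name, that is, the whole bucket: prints 6.
        System.out.println(keysUnder("", keys).size());
    }
}
```

Because the match is a plain string prefix, a name such as directory1/subdirectory2_old/blob.txt would also be selected by the prefix directory1/subdirectory2. Ending the prefix with a slash is one way to avoid that in this sketch.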

An AWS directory object can be constructed by specifying the AmazonS3 client, a bucket name and a prefix.

The AmazonS3 client holds the configuration of the connection to Amazon S3. It can be configured as follows using the AWS SDK for Java:

AmazonS3 client = AmazonS3Client.builder()
    .withCredentials(new DefaultAWSCredentialsProviderChain())
    .withRegion(Regions.EU_WEST_1)
    .withClientConfiguration(new ClientConfiguration().withMaxConnections(128))
    .build();

CSVDataProviderFactory

To read Amazon S3 objects with the CSV source, use the AwsCsvDataProviderFactory class to configure how the files are downloaded.
