This documentation page assumes you are already familiar with Amazon Simple Storage Service (Amazon S3). The Amazon S3 Cloud Source relies on the AWS SDK for Java, so make sure you are familiar with this SDK before using the Amazon S3 Cloud Source.
To use the AWS source, add the following lines to your pom.xml:
Cloud Source to Amazon S3 concepts
Amazon Simple Storage Service uses two main concepts: S3 objects, which are the fundamental entities stored in Amazon S3, and buckets, which are containers that organize S3 objects. In our cloud source, an object can be represented by an S3Entity or an S3EntityPath.
Entities
The AWS implementation of ICloudEntity is S3Entity. It is essentially a
wrapper around an S3 object from the AWS Simple Storage Service SDK.
Locating an entity
Entity paths
IS3EntityPath implements ICloudEntityPath. It is a reference to an S3 object.
Directories
The AWS implementation of ICloudDirectory is S3CloudDirectory.
A directory is tied to a bucket. It contains all S3 objects whose names start with a certain prefix.
For example, a directory on a given bucket with the prefix
directory1/subdirectory2 would contain the first three of the following objects:
A directory is specified by an AmazonS3 client, a bucket name and a prefix.
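The prefix rule for directories can be illustrated with a minimal plain-Java sketch. It does not use the cloud source or the AWS SDK: the class name, the method name and the object keys are all hypothetical, and it only assumes that membership is decided by a plain "starts with" test on the object name, as described above.

```java
import java.util.List;
import java.util.stream.Collectors;

class PrefixDirectorySketch {

    /** Returns the keys whose names start with the given prefix, in their original order. */
    public static List<String> keysUnder(List<String> keys, String prefix) {
        return keys.stream()
                .filter(key -> key.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical object keys stored in a single bucket (illustrative only).
        List<String> keys = List.of(
                "directory1/subdirectory2/file1.csv",
                "directory1/subdirectory2/file2.csv",
                "directory1/subdirectory2/subdirectory3/file3.csv",
                "directory1/file4.csv",
                "directory3/file5.csv");

        // Only the first three keys start with the prefix, so only they are printed.
        keysUnder(keys, "directory1/subdirectory2").forEach(System.out::println);
    }
}
```

Because the test is a simple string prefix match, objects nested in deeper "subdirectories" are also part of the directory, since S3 keys are flat names rather than a real folder hierarchy.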
The AmazonS3 client is the configuration of the connection to Amazon. It can be configured as follows using the
Amazon SDK.
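As an illustration, a client could be built along the following lines with the AWS SDK for Java 1.x builder API. This is a sketch under assumptions rather than the configuration this page originally showed: the wrapper class name, the region and the use of the default credentials provider chain are choices made for the example, and the AWS SDK must be on the classpath.

```java
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3ClientSketch {

    /** Builds an AmazonS3 client; region and credentials source are assumptions for this example. */
    public static AmazonS3 buildClient() {
        return AmazonS3ClientBuilder.standard()
                // Looks up credentials from environment variables, system properties,
                // the shared credentials file, or the instance profile, in that order.
                .withCredentials(new DefaultAWSCredentialsProviderChain())
                // Assumed region: use the one that hosts your bucket.
                .withRegion(Regions.EU_WEST_1)
                .build();
    }
}
```

The resulting client can then be passed wherever the cloud source expects an AmazonS3 instance, for example when creating a directory together with a bucket name and a prefix.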
CsvDataProviderFactory
To have the CSV source read Amazon S3 objects, you can use the AwsCsvDataProviderFactory class to
configure how the files are downloaded.
Configuration example: how to configure it with a CSV source
First, let’s define a generic CSV source configuration. This abstract configuration contains everything that is common to all the sources and can be used to switch from local to cloud sources easily. In this example we will create a CSV source which loads two topics from the cloud:
- a single CSV file “products.csv” for the products
- a folder “desks” for the desks table