# Amazon S3 Cloud Source

> This documentation page assumes you are already familiar with [Amazon Simple
> Storage Service (Amazon
> S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html).

The Amazon S3 Cloud Source relies on [AWS SDK for Java](https://github.com/aws/aws-sdk-java).
Make sure you are familiar with this SDK when using the Amazon S3 Cloud Source.

To use the Amazon S3 Cloud Source, add the following dependency to your `pom.xml`:

```xml
<dependency>
    <groupId>com.activeviam.source</groupId>
    <artifactId>cloud-source-aws</artifactId>
    <version>${atoti-server.version}</version>
</dependency>
```

## Cloud Source to Amazon S3 concepts

Amazon Simple Storage Service relies on two main concepts: S3 objects, which are the fundamental entities stored in
Amazon S3, and buckets, which are the containers that organize S3 objects.
In the Cloud Source, an object is represented either by an `S3Entity` or by an `S3EntityPath`.

### Entities

The AWS implementation of `ICloudEntity` is `S3Entity`. It is essentially a
wrapper around an *S3 object* from the AWS Simple Storage Service SDK.

### Locating an entity

#### Entity paths

`IS3EntityPath` implements `ICloudEntityPath`. It is a reference to an S3 object.
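Conceptually, a reference to an S3 object needs only two pieces of information: the bucket name and the object key. The record below is a hypothetical, JDK-only illustration of that idea; it is not `IS3EntityPath` and does not call the AWS SDK. It parses and formats the conventional `s3://bucket/key` notation and rejects references with an empty bucket or key.

```java
import java.util.Objects;

/**
 * Hypothetical value type (not part of Atoti or the AWS SDK): the minimum
 * needed to identify one S3 object, namely a bucket name and an object key.
 */
record S3ObjectRef(String bucket, String key) {

  S3ObjectRef {
    Objects.requireNonNull(bucket, "bucket");
    Objects.requireNonNull(key, "key");
    if (bucket.isEmpty() || key.isEmpty()) {
      throw new IllegalArgumentException("bucket and key must be non-empty");
    }
  }

  /** Parses a URI of the form s3://bucket/key; the key may itself contain '/'. */
  static S3ObjectRef parse(String uri) {
    final String scheme = "s3://";
    if (!uri.startsWith(scheme)) {
      throw new IllegalArgumentException("Not an s3:// URI: " + uri);
    }
    final String rest = uri.substring(scheme.length());
    final int slash = rest.indexOf('/');
    if (slash <= 0) {
      throw new IllegalArgumentException("Missing bucket or key: " + uri);
    }
    // Everything before the first '/' is the bucket, everything after is the key.
    return new S3ObjectRef(rest.substring(0, slash), rest.substring(slash + 1));
  }

  /** Formats this reference back as s3://bucket/key. */
  String toUri() {
    return "s3://" + bucket + "/" + key;
  }

  public static void main(String[] args) {
    S3ObjectRef ref = S3ObjectRef.parse("s3://myBucket/root/products.csv");
    System.out.println(ref.bucket()); // myBucket
    System.out.println(ref.key()); // root/products.csv
  }
}
```

Note that the key is a flat string: `root/products.csv` is a single key, and the `/` only looks like a directory separator. This is why directories in the Cloud Source are defined by key prefixes.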

#### Directories

The AWS implementation of `ICloudDirectory` is `S3CloudDirectory`.

A directory is tied to a bucket. It contains all S3 objects whose keys start with a given prefix.
For example, a directory on a given bucket with the prefix
`directory1/subdirectory2` would contain the first three of the following objects:

```yaml
inside:
    directory1/subdirectory2/blob1.txt
    directory1/subdirectory2/blob2.txt
    directory1/subdirectory2/subdirectory3/blob3.txt

not inside:
    blob4.txt
    other_directory/blob5.txt
    directory1/blob6.txt
```

A directory with an empty prefix corresponds to the root of the bucket.
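The membership rule above can be sketched in plain Java. This is an illustration of the rule as stated on this page, not Atoti's implementation: the hypothetical `PrefixDirectory` class treats an object as inside the directory when its key starts with the prefix, at any depth, and an empty prefix matches every key.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical illustration of the prefix rule (not Atoti's S3CloudDirectory):
 * a directory contains every object whose key starts with its prefix.
 */
final class PrefixDirectory {
  private final String prefix;

  PrefixDirectory(String prefix) {
    this.prefix = prefix;
  }

  /** True if the key falls inside this directory; an empty prefix matches every key. */
  boolean contains(String key) {
    return key.startsWith(prefix);
  }

  /** Keeps only the keys inside this directory, preserving their order. */
  List<String> filter(List<String> keys) {
    return keys.stream().filter(this::contains).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> keys = List.of(
        "directory1/subdirectory2/blob1.txt",
        "directory1/subdirectory2/blob2.txt",
        "directory1/subdirectory2/subdirectory3/blob3.txt",
        "blob4.txt",
        "other_directory/blob5.txt",
        "directory1/blob6.txt");
    // Prints the first three keys, including the nested blob3.txt.
    System.out.println(new PrefixDirectory("directory1/subdirectory2").filter(keys));
    // The root directory (empty prefix) contains everything.
    System.out.println(new PrefixDirectory("").filter(keys).size()); // 6
  }
}
```

S3's own `ListObjectsV2` request treats its `prefix` parameter the same way, as a plain string prefix, so a prefix without a trailing `/` would also match keys such as `directory1/subdirectory2_old/blob.txt`.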

An AWS directory object is constructed by specifying an
`S3Client`, a bucket name and a prefix.

The `S3Client` holds the configuration of the connection to Amazon S3, such as the region, the credentials and the
HTTP client. It can be built with the AWS SDK as follows:

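```java
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

final S3Client client =
    S3Client.builder()
        .region(Region.EU_WEST_1)
        // Resolve credentials from the SDK's default provider chain.
        .credentialsProvider(DefaultCredentialsProvider.create())
        // Allow up to 128 pooled HTTP connections.
        .httpClientBuilder(ApacheHttpClient.builder().maxConnections(128)) // Default is 50
        .build();
```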

## CsvDataProviderFactory

To configure how the CSV source downloads S3 objects, use the `AwsCsvDataProviderFactory` class.
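This page does not describe how `AwsCsvDataProviderFactory` downloads files internally. One common technique for large S3 objects is to fetch them as several byte ranges in parallel, since S3 `GetObject` supports the HTTP `Range` header. The sketch below assumes that technique and only computes the ranges; the `ByteRanges` class and the 10 MiB part size are hypothetical and do not reflect the factory's actual logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Atoti): splits an object into HTTP byte ranges. */
final class ByteRanges {
  private ByteRanges() {}

  /**
   * Returns Range header values ("bytes=first-last", both ends inclusive)
   * covering [0, objectSize) in parts of at most partSize bytes.
   */
  static List<String> split(long objectSize, long partSize) {
    if (objectSize < 0 || partSize <= 0) {
      throw new IllegalArgumentException("objectSize must be >= 0 and partSize > 0");
    }
    List<String> ranges = new ArrayList<>();
    for (long first = 0; first < objectSize; first += partSize) {
      // The last part may be shorter than partSize.
      long last = Math.min(first + partSize, objectSize) - 1;
      ranges.add("bytes=" + first + "-" + last);
    }
    return ranges;
  }

  public static void main(String[] args) {
    // A 25 MiB object split into 10 MiB parts gives three ranges, the last one shorter.
    System.out.println(split(25L * 1024 * 1024, 10L * 1024 * 1024));
  }
}
```

Each range could then be fetched by a separate worker thread, which is the kind of parallelism a setting such as `downloadThreadCount` in the configuration example below appears to control.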

## Configuration example: reading S3 objects with a CSV source

First, let's define a generic CSV source configuration.
This abstract configuration contains everything that is common to all sources, which makes it easy to switch between local and cloud sources.
In this example, we create a CSV source that loads two topics from the cloud:

* a single CSV file, `products.csv`, for the products
* a directory, `desks`, for the desks table

```java
public abstract class GenericCsvSourceConfig<I> {
  /** Creates a topic that reads a single file. */
  protected abstract ICsvTopic<I> createTopic(
      String topic, String fileName, CsvParserConfiguration parserConfig);

  /** Creates a topic that reads every file in a directory. */
  protected abstract ICsvTopic<I> createDirectoryTopic(
      String topic, String directory, CsvParserConfiguration parserConfig);

  /** Returns a CSV source configured with two topics. */
  public ICsvSource<I> csvSource() {
    final ICsvSource<I> csvSource = ICsvSource.<I>builder().build();
    // getParserConfiguration() is not shown in this snippet.
    final CsvParserConfiguration parserConfig = getParserConfiguration();
    // Single-file topic: products.csv
    final ICsvTopic<I> productsTopic = createTopic("PRODUCTS_TOPIC", "products.csv", parserConfig);
    csvSource.addTopic(productsTopic);
    // Directory topic: every file under desks
    final ICsvTopic<I> desksTopic = createDirectoryTopic("DESKS_TOPIC", "desks", parserConfig);
    csvSource.addTopic(desksTopic);
    return csvSource;
  }
}
```

Then, let's define the AWS-specific configuration:

```java
public class AwsSourceConfiguration extends GenericCsvSourceConfig<ICloudEntityPath<S3Entity>> {
  @Override
  protected ICsvTopic<ICloudEntityPath<S3Entity>> createTopic(
      final String topic, final String fileName, final CsvParserConfiguration parserConfig) {
    return new CloudEntityCsvTopic<>(
        topic, parserConfig, dataProviderFactory(), rootDirectory().getEntity(fileName));
  }
  @Override
  protected ICsvTopic<ICloudEntityPath<S3Entity>> createDirectoryTopic(
      final String topic, final String directory, final CsvParserConfiguration parserConfig) {
    return new CloudDirectoryCsvTopic<>(
        topic,
        parserConfig,
        dataProviderFactory(),
        rootDirectory().getSubDirectory(directory),
        null);
  }
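  /** Directory covering the objects under the prefix "root" in the bucket "myBucket". */
  public ICloudDirectory<S3Entity> rootDirectory() {
    // createClient() is not defined in this snippet; it should return an S3Client
    // such as the one built earlier on this page.
    return new S3CloudDirectory(createClient(), "myBucket", "root");
  }
  public ICloudCsvDataProviderFactory<S3Entity> dataProviderFactory() {
    // Download files using 10 threads.
    return AwsCsvDataProviderFactory.create(
        CloudFetchingConfig.builder().downloadThreadCount(10).build());
  }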
}
```
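To see which objects the two topics above would read, the sketch below resolves them against a sample listing of `myBucket`. It assumes, without confirmation from this page, that `getEntity` and `getSubDirectory` join the name to the root prefix `root` with a `/` separator, and it applies the key-prefix rule described earlier. It uses only JDK types; `ExampleLayout` and the desk file names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical model of the example configuration: bucket "myBucket", root prefix "root".
 * ASSUMPTION: names are joined to the root prefix with a "/" separator.
 */
final class ExampleLayout {
  static final String ROOT_PREFIX = "root";

  private ExampleLayout() {}

  /** Assumed key read by a single-file topic, e.g. "root/products.csv". */
  static String entityKey(String fileName) {
    return ROOT_PREFIX + "/" + fileName;
  }

  /** Assumed prefix of a directory topic, e.g. "root/desks". */
  static String subDirectoryPrefix(String directory) {
    return ROOT_PREFIX + "/" + directory;
  }

  /** Keys a directory topic would pick up: every key starting with its prefix. */
  static List<String> keysUnder(String prefix, List<String> bucketKeys) {
    return bucketKeys.stream().filter(key -> key.startsWith(prefix)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical bucket listing; the desk file names are made up for illustration.
    List<String> bucketKeys = List.of(
        "root/products.csv",
        "root/desks/desks_2023.csv",
        "root/desks/desks_2024.csv",
        "other/desks/desks_2023.csv");
    System.out.println("PRODUCTS_TOPIC reads " + entityKey("products.csv"));
    System.out.println("DESKS_TOPIC reads " + keysUnder(subDirectoryPrefix("desks"), bucketKeys));
  }
}
```

Under these assumptions, objects outside the `root` prefix, such as `other/desks/desks_2023.csv`, are never read, even though their names look similar.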
