Azure Cloud Source

This documentation page assumes you are already familiar with the general structure of Cloud Sources in ActivePivot as well as with Azure Blob Storage.

The Azure Cloud Source relies on Azure Blob Storage SDK 8 for Java. Make sure you are familiar with this SDK when using the Azure Cloud Source.

Cloud Source to Azure Blob Storage concepts

Entities

The Azure implementation of ICloudEntity is AzureEntity. It is essentially a wrapper around a blob client from the Azure Blob Storage SDK (e.g. CloudBlockBlob, CloudAppendBlob, ...). In Cloud Source terminology, the bucket of an entity is the Azure Blob Storage container (within its storage account) that holds the blob.

Entity paths

All ICloudEntityPath implementations for Azure Blob Storage implement IAzureEntityPath. Each one refers to a single blob in an Azure Blob Storage account. There are four implementations of IAzureEntityPath, each referencing a blob client implementation from the Azure Blob Storage SDK:

Cloud Source	Azure Blob Storage SDK
`CloudBlobPath`	`CloudBlob`
`BlockBlobPath`	`CloudBlockBlob`
`AppendBlobPath`	`CloudAppendBlob`
`PageBlobPath`	`CloudPageBlob`

CloudBlob is a blob-type-agnostic blob client that can be used to read a blob's content without needing to know the blob type beforehand. Blobs created using CloudBlobPath are created as block blobs by default.

Entity path limitations

The implementation of entity paths for Azure Blob Storage currently has the following limitations:

When uploading content of unknown length to a PageBlobPath, the size of the uploaded data must be a multiple of the page size used by the storage (512 bytes).
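The 512-byte constraint can be checked or worked around on the client before uploading to a page blob path. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and not part of the Cloud Source API, and zero-padding is only an assumption that suits formats tolerating trailing zero bytes.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the Cloud Source API. */
public final class PageBlobPadding {

    /** Page size of Azure page blobs, in bytes. */
    static final int PAGE_SIZE = 512;

    /** Rounds a length up to the next multiple of the page size. */
    static long paddedLength(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        return ((length + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
    }

    /** Returns a zero-padded copy whose size is a multiple of the page size. */
    static byte[] pad(byte[] data) {
        return Arrays.copyOf(data, (int) paddedLength(data.length));
    }

    public static void main(String[] args) {
        byte[] padded = pad(new byte[1000]);
        System.out.println(padded.length); // prints 1024
    }
}
```

Data that is already page-aligned is returned with its length unchanged, so the helper can be applied unconditionally.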

Directories

The Azure implementation for ICloudDirectory is represented by the IAzureCloudDirectory interface. This interface provides additional methods to explicitly request a blob client of a certain type:

IAzureEntityPath<CloudBlockBlob> getBlockBlob(String name);
IAzureEntityPath<CloudAppendBlob> getAppendBlob(String name);
IAzureEntityPath<CloudPageBlob> getPageBlob(String name);

The default implementation for IAzureCloudDirectory is AzureCloudDirectory.

A directory is tied to a container. It contains all blobs that have a certain prefix in their name (which follows Azure Blob Storage conventions for directories). For example, a directory on a certain container with the prefix directory1/subdirectory2 would contain the following blobs:

inside:
    directory1/subdirectory2/blob1.txt
    directory1/subdirectory2/blob2.txt
    directory1/subdirectory2/subdirectory3/blob3.txt

not inside:
    blob4.txt
    other_directory/blob5.txt
    directory1/blob6.txt

A directory with an empty prefix corresponds to the root of the container.
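The membership rule above can be sketched as a plain string check. This is an illustration written against the JDK only, with hypothetical names; it assumes that the prefix is matched on whole path segments separated by "/", and it is not how AzureCloudDirectory is necessarily implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of prefix-based directories. */
public final class DirectoryPrefix {

    /** Returns true if the blob name falls under the directory prefix. */
    static boolean contains(String prefix, String blobName) {
        if (prefix.isEmpty()) {
            return true; // an empty prefix is the root of the container
        }
        // Assumption: match whole path segments, so add a trailing delimiter.
        String normalized = prefix.endsWith("/") ? prefix : prefix + "/";
        return blobName.startsWith(normalized);
    }

    /** Keeps only the blob names located under the prefix. */
    static List<String> inside(String prefix, List<String> blobNames) {
        return blobNames.stream()
                .filter(name -> contains(prefix, name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "directory1/subdirectory2/blob1.txt",
                "directory1/subdirectory2/subdirectory3/blob3.txt",
                "blob4.txt",
                "directory1/blob6.txt");
        // Prints the first two names only.
        System.out.println(inside("directory1/subdirectory2", names));
    }
}
```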

An Azure directory object can be constructed by specifying the CloudBlobClient to use to communicate with the storage account and a container name, or by directly supplying the appropriate CloudBlobContainer.

Client-side encryption

To enable client-side encryption with the directory and path classes, set an encryption policy on the default request options of the underlying CloudBlobClient:

// key (IKey) wraps the content key on upload; keyResolver (IKeyResolver) unwraps it on download
BlobEncryptionPolicy policy = new BlobEncryptionPolicy(key, keyResolver);
// requests made through this client now use the encryption policy
client.getDefaultRequestOptions().setEncryptionPolicy(policy);
// use the client (Azure container names must be lowercase)
CloudBlobPath path = new CloudBlobPath(client, "mycontainer", "myBlob");

Either key or keyResolver can be null. The key enables uploading to client-side encrypted blobs, and the key resolver enables downloading from them.

Understanding client-side encryption

With client-side encryption, data is encrypted and decrypted by the client, using an encryption key that only the client knows. Data in transit over the network is therefore always encrypted, in addition to the protection offered by HTTPS when it is used.

When uploading data to a blob using client-side encryption, the data is first encrypted using a one-time symmetric encryption key (the content encryption key, or CEK). The client then encrypts the CEK itself using a key encryption key, or KEK, whose algorithm can be chosen and can be either symmetric or asymmetric. The wrapped CEK is sent with the encrypted data and stored in the blob metadata. The key wrapping operation is performed by an object implementing the IKey interface in the Azure Blob Storage SDK. The client must associate a String id with the key; this id is stored in the metadata as well. It enables the client to tell keys apart when different blobs are encrypted with different keys.

As an example, two blobs uploaded to a storage account using client-side encryption with two different KEKs would result in the following information being stored in the cloud:

secret_blob1.txt: {
  content: encrypted with CEK from metadata
  metadata: {
    CEK: encrypted with KEK "key1",
    KEK id: "key1"
  }
}

secret_blob2.txt: {
  content: encrypted with CEK from metadata
  metadata: {
    CEK: encrypted with KEK "key2",
    KEK id: "key2"
  }
}

When downloading data from an encrypted blob, the wrapped CEK is retrieved along with the encrypted data. The client then unwraps the CEK using its KEK and uses it to decrypt the blob data. This process is called key encryption key resolution in the Azure Blob Storage SDK and is performed by an object implementing the IKeyResolver interface.
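To make the CEK/KEK mechanism concrete, here is a minimal sketch of the same envelope scheme written with the JDK's javax.crypto classes only. It does not use the Azure Blob Storage SDK and does not reproduce its metadata format. The StoredBlob holder, the method names, the choice of an RSA key pair as KEK and the OAEP wrapping transformation are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Illustration only: envelope encryption with the JDK, not the Azure SDK code path. */
public final class EnvelopeEncryptionSketch {

    private static final String CONTENT_CIPHER = "AES/CBC/PKCS5Padding";
    private static final String WRAP_CIPHER = "RSA/ECB/OAEPWithSHA-1AndMGF1Padding";

    /** Hypothetical holder for what is stored for one blob. */
    static final class StoredBlob {
        final byte[] content;    // blob content, encrypted with the CEK
        final byte[] iv;         // initialization vector required by CBC
        final byte[] wrappedCek; // CEK encrypted with the KEK
        final String kekId;      // tells the reader which KEK to use

        StoredBlob(byte[] content, byte[] iv, byte[] wrappedCek, String kekId) {
            this.content = content;
            this.iv = iv;
            this.wrappedCek = wrappedCek;
            this.kekId = kekId;
        }
    }

    /** Encrypts the content with a one-time CEK, then wraps the CEK with the KEK. */
    static StoredBlob upload(byte[] plain, KeyPair kek, String kekId)
            throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        SecretKey cek = generator.generateKey();

        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance(CONTENT_CIPHER);
        aes.init(Cipher.ENCRYPT_MODE, cek, new IvParameterSpec(iv));
        byte[] content = aes.doFinal(plain);

        Cipher rsa = Cipher.getInstance(WRAP_CIPHER);
        rsa.init(Cipher.WRAP_MODE, kek.getPublic());
        return new StoredBlob(content, iv, rsa.wrap(cek), kekId);
    }

    /** Unwraps the CEK with the KEK, then decrypts the content. */
    static byte[] download(StoredBlob blob, KeyPair kek) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(WRAP_CIPHER);
        rsa.init(Cipher.UNWRAP_MODE, kek.getPrivate());
        SecretKey cek = (SecretKey) rsa.unwrap(blob.wrappedCek, "AES", Cipher.SECRET_KEY);

        Cipher aes = Cipher.getInstance(CONTENT_CIPHER);
        aes.init(Cipher.DECRYPT_MODE, cek, new IvParameterSpec(blob.iv));
        return aes.doFinal(blob.content);
    }

    /** Encrypts then decrypts a string with a freshly generated RSA KEK. */
    static String roundTrip(String text) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair kek = generator.generateKeyPair();
            StoredBlob stored = upload(text.getBytes(StandardCharsets.UTF_8), kek, "key1");
            return new String(download(stored, kek), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("secret data")); // prints "secret data"
    }
}
```

In this sketch the upload side needs only the public half of the KEK and the download side only the private half, which mirrors the way a key alone is enough to upload and a key resolver alone is enough to download.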

The Azure Blob Storage SDK provides basic IKey and IKeyResolver implementations that can be created from keys stored on Azure Key Vault, or directly from a java.security.KeyPair object.

The symmetric encryption algorithm used by the Azure Blob Storage SDK to encrypt or decrypt content is AES in Cipher Block Chaining (CBC) mode. See the Microsoft documentation for more details.
