ActivePivot 5.10.4

Azure Cloud Source

This documentation page assumes you are already familiar with the general structure of Cloud Sources in ActivePivot as well as with Azure Blob Storage.

The Azure Cloud Source relies on Azure Blob Storage SDK 12 for Java. Make sure you are familiar with this SDK when using the Azure Cloud Source.

Cloud Source to Azure Blob Storage concepts

Entities

The Azure implementation of ICloudEntity is AzureEntity. It is essentially a wrapper around a blob client from the Azure Blob Storage SDK (e.g. BlockBlobClient, AppendBlobClient, ...). In Cloud Source terminology, the "bucket" of an entity corresponds to the Azure Blob Container (and its Blob Storage account) that contains the blob.

Entity paths

ICloudEntityPath implementors for Azure Blob Storage all implement IAzureEntityPath. They refer to a single blob in an Azure Blob Storage account. There are five implementations for IAzureEntityPath that each reference a blob client implementation from the Azure Blob Storage SDK:

Cloud Source              Azure Blob Storage SDK
AzureBlobPath             BlobClient
AzureBlockBlobPath        BlockBlobClient
AzureAppendBlobPath       AppendBlobClient
AzurePageBlobPath         PageBlobClient
AzureEncryptedBlobPath    EncryptedBlobClient

BlobClient is a blob-type-agnostic blob client that can be used to read a blob's content without needing to know the blob type beforehand. Blobs created using AzureBlobPath are created as block blobs by default (following the behavior of BlobClient).

AzureEncryptedBlobPath is a specialized implementation that supports client-side encryption. See the Client-side encryption section for more details.

Entity path limitations

The implementation of entity paths for Azure Blob Storage currently has the following limitations:

  • Uploading content of unknown length is not supported for AzurePageBlobPath. This is due to a limitation with page blobs that requires the uploaded data's size be a multiple of the internal page size on the storage (512 bytes).
  • AzureEncryptedBlobPath only supports the uploading of client-side-encrypted block blobs. This is a limitation of the Azure Blob Storage SDK (EncryptedBlobClient has this same limitation). Downloading client-side-encrypted blobs of other types is supported.
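The page blob constraint above can be checked before attempting an upload. The following is a minimal sketch using only the JDK; the class PageBlobSizes and its methods are hypothetical helpers and are not part of the Azure Cloud Source or the Azure Blob Storage SDK. It only assumes the 512-byte page size stated above.

```java
/** Hypothetical helper: size checks for data destined for a page blob. */
final class PageBlobSizes {

    /** Page size stated in the limitation above. */
    static final long PAGE_SIZE = 512;

    /** Returns true if data of this length can be written to a page blob as-is. */
    static boolean isPageAligned(long length) {
        return length >= 0 && length % PAGE_SIZE == 0;
    }

    /** Smallest multiple of PAGE_SIZE that is greater than or equal to length. */
    static long paddedLength(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        return ((length + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(isPageAligned(1024)); // true
        System.out.println(isPageAligned(1000)); // false
        System.out.println(paddedLength(1000));  // 1024
    }
}
```

Because the length has to be known to perform this check, content of unknown length cannot be validated up front, which is consistent with the limitation described above.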

Directories

The Azure implementation for ICloudDirectory is represented by the IAzureCloudDirectory interface. This interface provides additional methods to explicitly request a blob client of a certain type:

IAzureEntityPath<BlockBlobClient> getBlockBlob(String name);
IAzureEntityPath<AppendBlobClient> getAppendBlob(String name);
IAzureEntityPath<PageBlobClient> getPageBlob(String name);

There are two implementations for IAzureCloudDirectory:

  • AzureCloudDirectory: base implementation
  • AzureEncryptedCloudDirectory: can be provided with a key encryption key and/or a key encryption key resolver to respectively write or read blobs using client-side encryption (for more details, see the Client-side encryption section)

A directory is tied to a container. It contains all blobs that have a certain prefix in their name (which follows Azure Blob Storage conventions for directories). For example, a directory on a certain container with the prefix directory1/subdirectory2 would contain the first three of the following blobs:

inside:
    directory1/subdirectory2/blob1.txt
    directory1/subdirectory2/blob2.txt
    directory1/subdirectory2/subdirectory3/blob3.txt

not inside:
    blob4.txt
    other_directory/blob5.txt
    directory1/blob6.txt

A directory with an empty prefix corresponds to the root of the container.
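The prefix rule above can be illustrated with plain string matching. This is a sketch of the naming convention only, not the Azure Cloud Source implementation: the class DirectoryPrefix is hypothetical, and treating the prefix as if it ended with a "/" delimiter is an assumption made here so that a sibling such as directory1/subdirectory2_old is not matched.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating prefix-based directory membership. */
final class DirectoryPrefix {

    /** Returns true if blobName belongs to the directory identified by prefix. */
    static boolean contains(String prefix, String blobName) {
        if (prefix.isEmpty()) {
            return true; // empty prefix: root of the container
        }
        // Assumption: compare against the prefix followed by the "/" delimiter.
        String normalized = prefix.endsWith("/") ? prefix : prefix + "/";
        return blobName.startsWith(normalized);
    }

    /** Keeps only the blob names that belong to the directory. */
    static List<String> filter(String prefix, List<String> blobNames) {
        return blobNames.stream()
                .filter(name -> contains(prefix, name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> blobs = List.of(
                "directory1/subdirectory2/blob1.txt",
                "directory1/subdirectory2/blob2.txt",
                "directory1/subdirectory2/subdirectory3/blob3.txt",
                "blob4.txt",
                "other_directory/blob5.txt",
                "directory1/blob6.txt");
        // Prints the first three names only.
        filter("directory1/subdirectory2", blobs).forEach(System.out::println);
    }
}
```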

An Azure directory object can be constructed by specifying the BlobServiceClient and a container name, or by directly supplying the appropriate BlobContainerClient.

Client-side encryption

The Azure Cloud Source provides specialized implementations of ICloudEntityPath and ICloudDirectory to support client-side encryption.

Internally, the Azure Cloud Source uses the Azure Blob Storage Cryptography module. AzureEncryptedBlobPath is a wrapper around EncryptedBlobClient that supports uploading and downloading blobs with client-side encryption.

Understanding client-side encryption

When using client-side encryption, data is encrypted and decrypted on the client side, meaning that the data transiting on the network is always encrypted (on top of the HTTPS protocol, if used), using an encryption key that is only known by the client.

When uploading data to a blob using client-side encryption, the data is first encrypted using a one-time, symmetric encryption key (the content encryption key, or CEK). The CEK is itself encrypted by the client using a key encryption key, or KEK, whose algorithm can be chosen and can be either symmetric or asymmetric. The wrapped CEK is sent and stored along with the encrypted data, in the blob metadata. The key wrapping operation is performed by an object implementing the AsyncKeyEncryptionKey interface in the Azure Blob Storage SDK. The client must associate a String id with the KEK; this id is stored in the blob metadata as well. This enables the client to distinguish between multiple keys when encrypting different blobs with different keys.

As an example, two blobs uploaded to a storage account with client-side encryption, each using a different KEK, would result in the following information being stored in the cloud:

secret_blob1.txt: {
  content: encrypted with CEK from metadata
  metadata: {
    CEK: encrypted with KEK "key1",
    KEK id: "key1"
  }
}

secret_blob2.txt: {
  content: encrypted with CEK from metadata
  metadata: {
    CEK: encrypted with KEK "key2",
    KEK id: "key2"
  }
}

When downloading data from an encrypted blob, the wrapped CEK is retrieved along with the encrypted data. The client can then unwrap that key using their KEK to decrypt the blob data. This process is labeled as key encryption key resolution in the Azure Blob Storage SDK and is performed by an object implementing the AsyncKeyEncryptionKeyResolver interface.
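The upload and download flow described above can be sketched with the JDK's own cryptography classes. This is an illustration of the wrapping scheme only, not the Azure Blob Storage SDK implementation: the class EnvelopeEncryptionSketch, the StoredBlob holder, the use of a symmetric AES KEK with the JDK "AESWrap" algorithm, the storage of an IV next to the content, and the Map standing in for a key resolver are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical illustration of CEK/KEK envelope encryption using only the JDK. */
final class EnvelopeEncryptionSketch {

    /** What would be stored remotely: encrypted content plus "metadata". */
    static final class StoredBlob {
        final byte[] content;    // encrypted with the CEK
        final byte[] iv;         // assumption: the CBC IV is kept with the blob
        final byte[] wrappedCek; // CEK encrypted with the KEK
        final String kekId;      // id of the KEK that wrapped the CEK

        StoredBlob(byte[] content, byte[] iv, byte[] wrappedCek, String kekId) {
            this.content = content;
            this.iv = iv;
            this.wrappedCek = wrappedCek;
            this.kekId = kekId;
        }
    }

    /** Generates a fresh 256-bit AES key (used for both CEKs and KEKs here). */
    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Upload side: encrypt with a one-time CEK, then wrap the CEK with the KEK. */
    static StoredBlob upload(byte[] plain, String kekId, SecretKey kek) {
        try {
            SecretKey cek = newAesKey();
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            Cipher contentCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            contentCipher.init(Cipher.ENCRYPT_MODE, cek, new IvParameterSpec(iv));
            byte[] encrypted = contentCipher.doFinal(plain);
            Cipher wrapper = Cipher.getInstance("AESWrap");
            wrapper.init(Cipher.WRAP_MODE, kek);
            return new StoredBlob(encrypted, iv, wrapper.wrap(cek), kekId);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Download side: resolve the KEK by id, unwrap the CEK, decrypt the content. */
    static byte[] download(StoredBlob blob, Map<String, SecretKey> keksById) {
        SecretKey kek = keksById.get(blob.kekId);
        if (kek == null) {
            throw new IllegalStateException("No KEK known for id " + blob.kekId);
        }
        try {
            Cipher unwrapper = Cipher.getInstance("AESWrap");
            unwrapper.init(Cipher.UNWRAP_MODE, kek);
            SecretKey cek = (SecretKey) unwrapper.unwrap(blob.wrappedCek, "AES", Cipher.SECRET_KEY);
            Cipher contentCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            contentCipher.init(Cipher.DECRYPT_MODE, cek, new IvParameterSpec(blob.iv));
            return contentCipher.doFinal(blob.content);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key1 = newAesKey();
        StoredBlob blob = upload("secret".getBytes(StandardCharsets.UTF_8), "key1", key1);
        byte[] plain = download(blob, Map.of("key1", key1));
        System.out.println(new String(plain, StandardCharsets.UTF_8)); // prints "secret"
    }
}
```

In this sketch the KEK never leaves the client: only the encrypted content, the wrapped CEK and the KEK id would be stored remotely, which mirrors the layout of secret_blob1.txt and secret_blob2.txt above.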

The Azure Key Vault Key client module provides basic AsyncKeyEncryptionKey and AsyncKeyEncryptionKeyResolver implementations that can be created from keys stored on Azure Key Vault, or directly from a java.security.KeyPair object.

See the classes KeyEncryptionKeyClientBuilder and LocalKeyEncryptionKeyClientBuilder in the aforementioned module.

The module is not included as part of the Azure Cloud Source dependencies.

The symmetric encryption algorithm used by the Azure Blob Storage SDK to encrypt or decrypt content is AES with Cipher Block Chaining (CBC). For more details, see the Microsoft documentation on client-side encryption.

Using client-side encryption

Client-side encryption in the Azure Cloud Source can be performed by using the dedicated specializations AzureEncryptedCloudDirectory and AzureEncryptedBlobPath.

Compared to their regular counterparts, their constructors accept two additional arguments: the aforementioned AsyncKeyEncryptionKey, used for uploading encrypted content, and AsyncKeyEncryptionKeyResolver, used for downloading it.

If the constructed object is used to either only perform uploading operations or only perform downloading operations, the argument corresponding to the unused operation can be set to null.

AzureEncryptedCloudDirectory

AzureEncryptedCloudDirectory behaves similarly to AzureCloudDirectory and is able to access non-encrypted blobs in the same way. When attempting to download a blob that was encrypted using client-side encryption, it will use the supplied AsyncKeyEncryptionKeyResolver to decrypt the downloaded content.

When used to create a path to a nonexistent blob, it will provide an AzureEncryptedBlobPath, which means that the uploaded data will be encrypted using the supplied AsyncKeyEncryptionKey.

AzureEncryptedBlobPath

AzureEncryptedBlobPath acts as a reference to an EncryptedBlobClient.

Much like the AzureEncryptedCloudDirectory, it is able to use the supplied AsyncKeyEncryptionKeyResolver or AsyncKeyEncryptionKey to respectively download or upload blobs with client-side encryption.

The Azure Blob Storage SDK only permits uploading data with client-side encryption for block blobs. As such, AzureEncryptedBlobPath has the same restriction and is only able to upload block blobs.

Downloading data from client-side-encrypted page blobs and append blobs (created through other means) is still possible through AzureEncryptedBlobPath.

Copyright © 2021 ActiveViam