Continuous GAQ REST API

What is the Continuous GAQ REST API?

The Continuous GAQ (GetAggregatesQuery) REST API is an experimental API that allows applications to run queries against a cube and receive continuous streaming updates as data changes. Results are delivered in Apache Arrow format.

Experimental API

This API is experimental and may change in future versions without notice.

Why use continuous queries?

Continuous queries provide:

  • Real-time updates: Receive data changes as they occur without polling
  • Incremental updates: Only affected cells are transmitted, not the entire result set
  • Persistent connections: Maintain a single connection for multiple queries

Prerequisites

Before using this API:

  • Authentication: All endpoints require authenticated users
  • GAQ familiarity: Understanding of GetAggregatesQuery concepts (measures, levels, coordinates)
  • Apache Arrow: Client-side Arrow deserialization capability

Overview

The Continuous GAQ REST API provides endpoints for creating publishers, subscribing to query streams, and managing continuous query subscriptions.

Key features

  • Apache Arrow streaming: Results are streamed in application/vnd.apache.arrow.stream format
  • Publisher-subscriber pattern: Publishers group queries with shared lifecycle management
  • User-based authentication: All operations require authenticated users
  • Query lifecycle management: Create publisher, subscribe, unsubscribe, and stop publishers

Usage workflow

A typical workflow for using the Continuous GAQ REST API:

  1. Create a publisher using /publisher/create to obtain a publisher ID
  2. Subscribe to queries using /subscribe with the publisher ID and query request
  3. Receive streaming updates in Apache Arrow format as data changes
  4. Unsubscribe from specific queries when no longer needed using /unsubscribe
  5. Stop the publisher when all queries are complete using /publisher/stop
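
As a rough illustration of these steps, here is a minimal Python sketch using the requests library. The host, cube name, and request bodies are placeholders; see the OpenAPI documentation for the actual request and response schemas.

import requests

# Base path from this page; host and cube name are placeholders.
BASE = "https://my-server/activeviam/pivot/rest/v10/cube/MyCube/queries/continuous-gaq"
AUTH = ("user", "password")  # every endpoint requires an authenticated user

# 1. Create a publisher and keep the returned ID.
publisher_id = requests.post(f"{BASE}/publisher/create", auth=AUTH).json()

# 2. Subscribe to a query; the response body is a continuous Arrow stream.
#    This request body is hypothetical -- the real schema is in the OpenAPI docs.
subscription = requests.post(
    f"{BASE}/subscribe",
    json={"publisherId": publisher_id, "query": {"measures": ["Value.SUM"]}},
    auth=AUTH,
    stream=True,  # keep the connection open to receive updates
)

# 3. Read Arrow IPC messages from subscription.raw (see "Reading results" below).

# 4-5. Clean up when done (request bodies again hypothetical).
requests.delete(f"{BASE}/unsubscribe", json={"publisherId": publisher_id}, auth=AUTH)
requests.delete(f"{BASE}/publisher/stop", json={"publisherId": publisher_id}, auth=AUTH)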

API endpoints

All endpoints are available under the base path:

/activeviam/pivot/rest/v10/cube/{cubeName}/queries/continuous-gaq

For detailed endpoint specifications, request/response schemas, and examples, see the Continuous GAQ Query REST API in the OpenAPI documentation.

Create publisher

Creates a new publisher for continuous GAQ streams. Clients must use the returned publisher ID to subscribe to GAQ streams.

Endpoint: POST /publisher/create

Response: JSON containing the publisher ID

Subscribe to GAQ

Subscribes to a GAQ query and returns streaming results in Apache Arrow format.

The publisher ID identifies the publisher object where the streamed query results are written. Multiple queries can share one publisher. This groups related queries under the same lifecycle.

Endpoint: POST /subscribe

Response: Streaming HTTP response body in Apache Arrow format (application/vnd.apache.arrow.stream)

Batch Size Configuration: The number of rows per batch when streaming GAQ results can be configured using the JVM property activeviam.gaq.arrow.batchsize.
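
For example, to set a batch size of 10,000 rows at JVM startup (the value is illustrative):

-Dactiveviam.gaq.arrow.batchsize=10000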

Unsubscribe from GAQ

Unsubscribes from a specific query on a publisher.

Endpoint: DELETE /unsubscribe

Stop publisher

Stops a publisher and unsubscribes all queries associated with it.

Endpoint: DELETE /publisher/stop

Authentication

All endpoints require authentication. The API uses the current authenticated user's credentials to execute queries and manage subscriptions. If the current thread is not authenticated, a ForbiddenAccessException is thrown.

Streaming result format

Results are streamed using the Apache Arrow IPC (Inter-Process Communication) format. The stream consists of multiple Arrow record batches, each representing either query results or failure notifications.

Apache Arrow streaming format

The API uses Apache Arrow's streaming format (application/vnd.apache.arrow.stream).

For more information about Apache Arrow, see the official documentation.

Stream structure

The continuous GAQ stream delivers results progressively as Arrow IPC messages.

Arrow record batch structure

Each Arrow IPC message contains a record batch made up of two parts: schema metadata and data columns.

Schema metadata (custom metadata)

Metadata embedded in the Arrow schema provides context about the batch:

Field          Description
publisherId    Publisher identifier
queryId        Unique query identifier
branchId       Cube branch name
epochId        Query execution epoch
updateId       Update sequence number (0, 1, 2...)
totalUpdates   Total number of updates in this batch
cellSetType    ADDED / REMOVED / FULL_REFRESH / EMPTY
chunkId        Chunk number within the update
totalChunks    Total chunks for this update

Cell set types:

  • ADDED: Cells added in this update (initial results or incremental additions)
  • REMOVED: Cells removed in this update
  • FULL_REFRESH: Complete result set (replaces all previous results)
  • EMPTY: Empty result set (no data)

Data columns

The record batch contains columns for:

  • Level columns: Level members
  • Measure columns: Aggregated values

Example Arrow Record Batch:

Currency (UTF-8)    Value (Float64)
"USD"               44.0
"EUR"               52.0
"GBP"               18.0

With Schema Metadata:

publisherId: "pub-1"
queryId: "query-123"
cellSetType: ADDED
updateId: 0
totalUpdates: 1

Reading results

To read the continuous Arrow stream:

  1. Connect to the streaming endpoint - Establish HTTP connection to /subscribe
  2. Read Arrow IPC messages sequentially - Each message contains one record batch
  3. Extract schema metadata - Read custom metadata from the schema to get:
    • publisherId: Publisher identifier
    • queryId: Query identifier for this specific query
    • epochId: Epoch number (increments with each update)
    • cellSetType: Type of update (ADDED, REMOVED, FULL_REFRESH, EMPTY)
    • updateId and totalUpdates: Position in multi-part updates
    • chunkId and totalChunks: Position in chunked results
    • error: Failure event information (if present)
  4. Check for failure events - If error key exists in metadata, handle the failure
  5. Extract data columns - Read level and measure vectors from the record batch
  6. Process based on cellSetType:
    • ADDED: Merge new cells into the result set or update existing cells
    • REMOVED: Remove cells from the result set
    • FULL_REFRESH: Replace entire result set with new data
    • EMPTY: No data in this batch
  7. Reassemble chunked results - Combine chunks with same updateId using chunkId
  8. Continue reading - Process subsequent messages as they arrive
  9. Handle completion - Connection closes when publisher is stopped
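
The Python sketch below, using pyarrow on the streaming response from the subscription example above, illustrates these steps. It reads the custom metadata once from the stream schema and uses the column names from the example batch ("Currency", "Value"); depending on how the server frames successive updates, a client may need to re-read metadata per message. Chunk reassembly (step 7) is sketched separately in the next section.

import pyarrow as pa

# Steps 1-2: open the Arrow IPC stream on the HTTP response body.
# `subscription` is the streaming response from the /subscribe call above.
reader = pa.ipc.open_stream(subscription.raw)

# Step 3: schema-level custom metadata (pyarrow exposes keys and values as bytes).
metadata = {k.decode(): v.decode() for k, v in (reader.schema.metadata or {}).items()}

# Step 4: failure events are signaled through an "error" metadata key.
if "error" in metadata:
    raise RuntimeError(f"Continuous GAQ query failed: {metadata['error']}")

result = {}  # current cell set: level member -> measure value

# Steps 5-8: process record batches as they arrive.
for batch in reader:
    columns = batch.to_pydict()  # column name -> list of values
    cells = dict(zip(columns["Currency"], columns["Value"]))

    cell_set_type = metadata.get("cellSetType")
    if cell_set_type == "FULL_REFRESH":
        result = cells            # replace the entire result set
    elif cell_set_type == "ADDED":
        result.update(cells)      # merge new or updated cells
    elif cell_set_type == "REMOVED":
        for key in cells:
            result.pop(key, None) # drop removed cells
    # EMPTY: nothing to apply

# Step 9: the loop ends when the publisher is stopped and the stream closes.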

For more information about Apache Arrow, see the official documentation.

Chunking for large results

When a single update contains many cells, it is split into multiple chunks based on the configured batch size (the activeviam.gaq.arrow.batchsize JVM property). Every chunk of an update carries the same updateId; clients combine chunks using chunkId and totalChunks.
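
A minimal reassembly sketch, assuming the decoded metadata dict from the reading example above:

# Buffer chunks per update and apply the update once all chunks have arrived.
pending = {}  # updateId -> {chunkId: cells}

def on_chunk(metadata, cells):
    update_id = int(metadata["updateId"])
    chunk_id = int(metadata["chunkId"])
    total_chunks = int(metadata["totalChunks"])
    pending.setdefault(update_id, {})[chunk_id] = cells
    if len(pending[update_id]) < total_chunks:
        return None  # still waiting for more chunks
    # All chunks received: merge them in chunkId order.
    merged = {}
    for _, part in sorted(pending.pop(update_id).items()):
        merged.update(part)
    return merged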

Multiple update types in one result

A single continuous GAQ result can contain multiple update types (for example, both ADDED and REMOVED cells). The parts of such an update share the same epochId and arrive in updateId order, with totalUpdates giving the number of parts.
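
A client might apply all parts of one such update together. A minimal sketch, assuming the parts have been collected in updateId order:

def apply_update(result, parts):
    # `parts` holds (cellSetType, cells) tuples in updateId order,
    # collected until totalUpdates parts have arrived.
    for cell_set_type, cells in parts:
        if cell_set_type == "REMOVED":
            for key in cells:
                result.pop(key, None)
        elif cell_set_type == "ADDED":
            result.update(cells)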

Failure event format

When a query execution fails, a failure event is sent instead of query results.

The failure event metadata contains:

  • errorClass: The type of error that occurred
  • message: Error message
  • stackTrace: Full stack trace of the error
  • queryId: The query that failed
  • streamId: The stream identifier

Example failure detection:

Schema Custom Metadata:
error: "FailureEvent{streamId=pub-123, queryId=query-456,
type=QueryExecutionException,
message=Query timeout exceeded}"

Complete example: data update flow

Step 1: Initial Query Result (subscription starts)

  • Metadata: queryId="q1", cellSetType=ADDED, epochId=1
  • Data: USD=44.0, EUR=52.0

Step 2: Data Change Event (new USD trade worth 100.0 added)

  • Metadata: queryId="q1", cellSetType=ADDED, epochId=2
  • Data: USD=144.0 (new aggregate value: 44+100)
  • Note: Only the affected cell (USD) is sent, not the entire result set

Step 3: Multiple Update (both additions and removals)

  • Batch 3a: updateId=0, totalUpdates=2, cellSetType=REMOVED, epochId=3 — EUR removed
  • Batch 3b: updateId=1, totalUpdates=2, cellSetType=ADDED, epochId=3 — GBP=30.0 added

Step 4: Full Refresh (complete result set replacement)

  • Metadata: queryId="q1", cellSetType=FULL_REFRESH, epochId=4
  • Data: USD=200.0, GBP=45.0, JPY=88.0 (complete current state)
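
Applying the per-type handling sketched under "Reading results", the client-side result set evolves as follows:

Step 1: ADDED            -> {"USD": 44.0, "EUR": 52.0}
Step 2: ADDED (USD)      -> {"USD": 144.0, "EUR": 52.0}
Step 3a: REMOVED (EUR)   -> {"USD": 144.0}
Step 3b: ADDED (GBP)     -> {"USD": 144.0, "GBP": 30.0}
Step 4: FULL_REFRESH     -> {"USD": 200.0, "GBP": 45.0, "JPY": 88.0}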

Connection limitations

When using HTTP/1.1, browsers and HTTP clients typically limit the number of simultaneous open connections to the same host; most browsers default to six concurrent connections per domain.

Since each active publisher maintains a persistent streaming connection, this means:

  • At most six publishers can run simultaneously in a browser environment
  • Once the limit is reached, additional subscription requests will be queued or blocked
  • Stopping a publisher releases its connection, allowing new publishers to be created

Recommendations:

  • Reuse publishers: Share publishers across related queries instead of creating multiple publishers
  • Stop unused publishers: Call /publisher/stop when queries are no longer needed
  • Use HTTP/2: HTTP/2 supports multiplexing, allowing many concurrent streams over a single connection
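
For example, a Python client could use the httpx library, which supports HTTP/2 when the optional h2 dependency is installed. The endpoint and request body reuse the placeholders from the earlier sketch, and HTTP/2 must also be enabled on the server.

import httpx

client = httpx.Client(http2=True)  # pip install "httpx[h2]"
with client.stream(
    "POST",
    f"{BASE}/subscribe",
    json={"publisherId": publisher_id, "query": {"measures": ["Value.SUM"]}},
    auth=AUTH,
) as response:
    for chunk in response.iter_bytes():
        ...  # feed chunks to an Arrow stream reader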