Skip to main content

How to use vectors

note

There are many ways to define vectors depending on the database, read the vectors documentation to know what is supported by your database.

Table with native vectors

If the table has native vectors, you can directly use them in a table without any transformation.

In ClickHouse, native vector are called Array. Let's create an example table with a few of them:

create table if not exists
TUTORIAL.NATIVE_VECTORS
(
`PRODUCT` String NOT NULL,
`QUANTITY` Array(Float64) NOT NULL
)
ENGINE = MergeTree()
ORDER BY (PRODUCT)
;
insert into TUTORIAL.NATIVE_VECTORS values
('product_1', [10, 20, 15, 25, 10]),
('product_2', [50, 65, 55, 30, 80]);

This table can directly be used:

final Table table = session.discoverTable("TUTORIAL", "NATIVE_VECTORS");

Table with one column per vector element

Let's create a table with 5 columns, each containing one element of a Quantity vector for a given product.

CREATE TRANSIENT TABLE IF NOT EXISTS MULTI_COLUMN_QUANTITY (
PRODUCT STRING NOT NULL,
QUANTITY_0 FLOAT NOT NULL,
QUANTITY_1 FLOAT NOT NULL,
QUANTITY_2 FLOAT NOT NULL,
QUANTITY_3 FLOAT NOT NULL,
QUANTITY_4 FLOAT NOT NULL);

INSERT INTO MULTI_COLUMN_QUANTITY SELECT 'product_1',10,20,15,25,10;
INSERT INTO MULTI_COLUMN_QUANTITY SELECT 'product_2',50,65,55,30,80;

The table looks like this in the database:

PRODUCTQUANTITY_0QUANTITY_1QUANTITY_2QUANTITY_3QUANTITY_4
product_11020152510
product_25065553080

When a table has at least 2 columns with names following the pattern prefix + delimiter + number, all these columns can be converted into a single vector column (numbers must be consecutive from 0 (or 1) to N).

You simply need to create a Table on this and call vectorize on it to transform this table with one column per vector element to a table with vectors. The vectorize method accepts either a list of prefixes or a delimiter to convert columns into a single vector.

final Table tableWithoutVector = session
.discoverTable(new SqlTableId(TEST_DATABASE_NAME, TUTORIAL_SCHEMA_NAME, "MULTI_COLUMN_QUANTITY"));
// Manually vectorize the table after discovery with a list of prefixes
final Table tableWithVector = tableWithoutVector.vectorizeTable(List.of("QUANTITY"));

final Table tableWithoutVector = session
.discoverTable(new SqlTableId(TEST_DATABASE_NAME, TUTORIAL_SCHEMA_NAME, "MULTI_COLUMN_QUANTITY"));
// Manually vectorize the table after discovery with a delimiter
final Table tableWithVector = tableWithoutVector.vectorizeTable("_");

In this example, both code snippets produce the same result.

PRODUCTQUANTITY
product_1[10,20,15,25,10]
product_2[50,65,55,30,80]

It is also possible to have this conversion done automatically. It can be enabled for the whole project by setting the ActiveViam property ActiveViamProperty.ENABLE_DIRECTQUERY_AUTOVECTORIZER to true.

With this property enabled, the auto-vectorizer will detect columns sharing the same prefix and will convert them into a vector.

The behavior of the auto-vectorization can be controlled by two more properties:

  • ActiveViamProperty.DIRECTQUERY_AUTOVECTORIZER_DELIMITER: sets the delimiter as defined above.
    There is no delimiter by default.
  • ActiveViamProperty.DIRECTQUERY_AUTOVECTORIZER_THRESHOLD: sets minimum number of columns with the same prefix to trigger the auto-vectorization.
    The default value is 50.

Read the properties documentation for more details about ActiveViam properties.

With the ActiveViam properties set to enable the autovectorizer, there is nothing more to do after the discovery.

PropertyRegistry.set(ActiveViamProperty.ENABLE_DIRECTQUERY_AUTOVECTORIZER, true);
PropertyRegistry.set(ActiveViamProperty.DIRECTQUERY_AUTOVECTORIZER_DELIMITER, "_");
PropertyRegistry.set(ActiveViamProperty.DIRECTQUERY_AUTOVECTORIZER_THRESHOLD, 5);

final Table table = session
.discoverTable(new SqlTableId(TEST_DATABASE_NAME, TUTORIAL_SCHEMA_NAME, "MULTI_COLUMN_QUANTITY"));

Table with one index and value per row

In this example we will use the following table:

PRODUCTINDEXQUANTITY
product_1010
product_1120
product_1215
product_1325
product_1410
product_2050
product_2165
product_2255
product_2330
product_2480

This table represents 2 vectors split over multiple lines.

Let's create the table and feed it.

CREATE TRANSIENT TABLE IF NOT EXISTS MULTI_ROW_QUANTITY ( PRODUCT STRING NOT NULL, INDEX INTEGER NOT NULL, QUANTITY FLOAT NOT NULL);

INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_1',0,10;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_1',1,20;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_1',2,15;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_1',3,25;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_1',4,10;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_2',0,50;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_2',1,65;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_2',2,55;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_2',3,30;
INSERT INTO MULTI_ROW_QUANTITY SELECT 'product_2',4,80;

You simply need to create a Table on this and call vectorize on it to transform this table with row by row values to a table with vectors.

final Table multiRowTable = session
.discoverTable(new SqlTableId(TEST_DATABASE_NAME, TUTORIAL_SCHEMA_NAME, "MULTI_ROW_QUANTITY"));
final Table tableWithVector = multiRowTable.vectorizeTable("INDEX", List.of("QUANTITY"));

You now have a table with a Vector column which will be equivalent to the following table:

PRODUCTQUANTITY
product_1[10,20,15,25,10]
product_2[50,65,55,30,80]