There are many ways to define vectors depending on the database, read the vectors documentation to know what is supported by your database.
Table with native vectors
If the table has native vectors, you can directly use them in a table without any transformation.- ClickHouse
- Databricks
In ClickHouse, native vectors are called This table can directly be used:
Array.
Let’s create an example table with a few of them:Table with one column per vector element
Let’s create a table with 5 columns, each containing one element of a Quantity vector for a given product.- BigQuery
- Databricks
- MS SQL
- Redshift
- Snowflake
- Synapse
| PRODUCT | QUANTITY_0 | QUANTITY_1 | QUANTITY_2 | QUANTITY_3 | QUANTITY_4 |
|---|---|---|---|---|---|
| product_1 | 10 | 20 | 15 | 25 | 10 |
| product_2 | 50 | 65 | 55 | 30 | 80 |
prefix + delimiter + number,
all these columns can be converted into a single vector column (numbers must be consecutive from 0 (or 1) to N).
You simply need to create a Table on this and call vectorize on it to transform this table with one column per vector element to a table with vectors.
The vectorize method accepts either a list of prefixes or a delimiter to convert columns into a single vector.
- BigQuery
- Databricks
- MS SQL
- Redshift
- Snowflake
- Synapse
| PRODUCT | QUANTITY |
|---|---|
| product_1 | [10,20,15,25,10] |
| product_2 | [50,65,55,30,80] |
useAutoVectorizer to true.
With this setting enabled, the auto-vectorizer will detect columns sharing the same prefix and will convert them into a vector.
The behavior of the auto-vectorization can be controlled by two more settings:
autoVectorizerDelimiter: sets the delimiter as defined above.
There is no delimiter by default.minThresholdForAutoVectorizer: sets minimum number of columns with the same prefix to trigger the auto-vectorization.
The default value is 50.
- BigQuery
- Databricks
- MS SQL
- Redshift
- Snowflake
- Synapse
Table with one index and value per row
In this example we will use the following table:| PRODUCT | INDEX | QUANTITY |
|---|---|---|
| product_1 | 0 | 10 |
| product_1 | 1 | 20 |
| product_1 | 2 | 15 |
| product_1 | 3 | 25 |
| product_1 | 4 | 10 |
| product_2 | 0 | 50 |
| product_2 | 1 | 65 |
| product_2 | 2 | 55 |
| product_2 | 3 | 30 |
| product_2 | 4 | 80 |
- BigQuery
- Databricks
- MS SQL
- Redshift
- Snowflake
- Synapse
vectorize on it to transform this table with row by row values to a table with vectors.
- BigQuery
- Databricks
- MS SQL
- Redshift
- Snowflake
- Synapse
| PRODUCT | QUANTITY |
|---|---|
| product_1 | [10,20,15,25,10] |
| product_2 | [50,65,55,30,80] |