atoti.Session.create_cube()

Session.create_cube(fact_table, name=None, *, mode='auto', filter=None, id_in_cluster=None, priority=None)

Create a cube based on the passed table.

Parameters:
  • fact_table (Table) – The table containing the facts of the cube.

  • name (str | None) – The name of the created cube. Defaults to the name of fact_table.

  • mode (Literal['auto', 'manual', 'no_measures']) –

    The cube creation mode:

    • auto: Creates hierarchies for every key column or non-numeric column of the table, and measures for every numeric column.

    • manual: Does not create any hierarchies or measures, apart from the default contributors.COUNT and update.TIMESTAMP measures.

    • no_measures: Creates the same hierarchies as auto but does not create any measures for the numeric columns.

    Example

    >>> table = session.create_table(
    ...     "Table",
    ...     data_types={"id": "String", "value": "double"},
    ... )
    >>> cube_auto = session.create_cube(table)
    >>> sorted(cube_auto.measures)
    ['contributors.COUNT', 'update.TIMESTAMP', 'value.MEAN', 'value.SUM']
    >>> list(cube_auto.hierarchies)
    [('Table', 'id')]
    >>> cube_no_measures = session.create_cube(
    ...     table, mode="no_measures"
    ... )
    >>> sorted(cube_no_measures.measures)
    ['contributors.COUNT', 'update.TIMESTAMP']
    >>> list(cube_no_measures.hierarchies)
    [('Table', 'id')]
    >>> cube_manual = session.create_cube(table, mode="manual")
    >>> sorted(cube_manual.measures)
    ['contributors.COUNT', 'update.TIMESTAMP']
    >>> list(cube_manual.hierarchies)
    []
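The three modes can be summarised with a small plain-Python model. This is only a sketch of the rules as worded above, not atoti's implementation: `plan_cube` and `NUMERIC_TYPES` are hypothetical names, the set of numeric type names is an assumption, and the `.MEAN`/`.SUM` suffixes are taken from the example output above.

```python
# Hypothetical model of the creation modes; it does not call atoti.
NUMERIC_TYPES = {"int", "long", "float", "double"}  # assumed numeric type names
DEFAULT_MEASURES = ["contributors.COUNT", "update.TIMESTAMP"]


def plan_cube(table_name, data_types, keys=(), mode="auto"):
    """Return (hierarchies, sorted measures) following the bullets literally."""
    if mode not in ("auto", "manual", "no_measures"):
        raise ValueError(f"unknown mode: {mode!r}")

    hierarchies = []
    measures = list(DEFAULT_MEASURES)
    if mode == "manual":
        # Nothing is created beyond the default measures.
        return hierarchies, sorted(measures)

    for column, data_type in data_types.items():
        is_numeric = data_type in NUMERIC_TYPES
        # Hierarchies: every key column or non-numeric column.
        if column in keys or not is_numeric:
            hierarchies.append((table_name, column))
        # Measures: every numeric column, in auto mode only (literal reading).
        if mode == "auto" and is_numeric:
            measures += [f"{column}.MEAN", f"{column}.SUM"]
    return hierarchies, sorted(measures)


columns = {"id": "String", "value": "double"}
for mode in ("auto", "no_measures", "manual"):
    print(mode, plan_cube("Table", columns, mode=mode))
```

With the table from the example, the model yields the same hierarchies and measures as the doctest above for each mode.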
    

  • filter (CubeFilterCondition | None) –

    If not None, only the rows of the database matching this condition are fed to the cube. With DirectQuery, this can also reduce costs, since the filter is applied to the queries executed on the external database to feed the cube.

    Example

    >>> df = pd.DataFrame(
    ...     columns=["Product"],
    ...     data=[
    ...         ("phone",),
    ...         ("watch",),
    ...         ("laptop",),
    ...     ],
    ... )
    >>> table = session.read_pandas(df, table_name="Filtered table")
    >>> cube = session.create_cube(table, "Default")
    >>> cube.query(
    ...     cube.measures["contributors.COUNT"],
    ...     levels=[cube.levels["Product"]],
    ... )
            contributors.COUNT
    Product
    laptop                   1
    phone                    1
    watch                    1
    >>> filtered_cube = session.create_cube(
    ...     table,
    ...     "Filtered",
    ...     filter=table["Product"].isin("watch", "laptop"),
    ... )
    >>> filtered_cube.query(
    ...     filtered_cube.measures["contributors.COUNT"],
    ...     levels=[filtered_cube.levels["Product"]],
    ... )
            contributors.COUNT
    Product
    laptop                   1
    watch                    1
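To illustrate why a filter can reduce costs with DirectQuery, the sketch below uses an in-memory SQLite database as a stand-in for the external database. The table name and the SQL statements are assumptions made for illustration; they are not the queries atoti actually generates.

```python
import sqlite3

# In-memory SQLite stands in for the external database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (Product TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("phone",), ("watch",), ("laptop",)],
)

# Without a filter, every row is read to feed the cube.
all_rows = conn.execute(
    "SELECT Product FROM products ORDER BY Product"
).fetchall()

# With a filter, the condition is part of the query itself,
# so the database only returns the matching rows.
filtered_rows = conn.execute(
    "SELECT Product FROM products WHERE Product IN (?, ?) ORDER BY Product",
    ("watch", "laptop"),
).fetchall()

print(len(all_rows), "rows without filter")
print(len(filtered_rows), "rows with filter")
conn.close()
```

Because the condition is evaluated by the database, the rows excluded by the filter are never transferred, which is where the saving would come from.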
    

  • id_in_cluster (str | None) – The human-friendly name used to identify this data cube in a cluster.

  • priority (Annotated[int, Field(gt=0)] | None) –

    The priority of this data cube when using distribution with atoti.QueryCubeDefinition.allow_data_duplication set to True. Data cubes with the lowest value are queried first.

    • If two data cubes have the same priority, one will be chosen at random.

    • If None, duplicated data is retrieved preferentially from the data cube with the fewest members in the distributing_levels.
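The selection rules above can be modelled in plain Python. This is a minimal sketch, not atoti's implementation: `DataCube` and `pick_data_cube` are hypothetical names, and the case where only some data cubes have a priority is not described above, so the sketch rejects it rather than guessing.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DataCube:
    # Hypothetical record used for illustration; not an atoti class.
    id_in_cluster: str
    priority: Optional[int]
    distributing_member_count: int


def pick_data_cube(
    cubes: Sequence[DataCube], rng: Optional[random.Random] = None
) -> DataCube:
    """Pick the data cube to query for duplicated data."""
    rng = rng or random.Random()
    priorities = [cube.priority for cube in cubes]

    if all(p is None for p in priorities):
        # No priority set: prefer the fewest members in the distributing levels.
        # Tie-breaking is unspecified above; min() keeps the first candidate.
        return min(cubes, key=lambda cube: cube.distributing_member_count)
    if any(p is None for p in priorities):
        # Mixed case is not covered by the description, so it is not modelled.
        raise ValueError("mixing None and integer priorities is not modelled")

    # Lowest value wins; ties are broken at random.
    lowest = min(priorities)
    tied = [cube for cube in cubes if cube.priority == lowest]
    return rng.choice(tied)


europe = DataCube("europe", priority=2, distributing_member_count=10)
asia = DataCube("asia", priority=1, distributing_member_count=50)
print(pick_data_cube([europe, asia]).id_in_cluster)
```

Here `asia` is picked because it has the lowest priority value, even though it holds more members.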

Return type:

Cube