Why the data journey matters
The data journey determines how efficiently Atoti ingests, prepares, and serves data for analysis. Each stage has direct consequences for performance and scalability:
- Extraction defines how quickly data enters the system
- Transformation ensures records are clean and analytically relevant before storage
- Loading establishes the indexes and structures used by the aggregation engine
- Partitioning and NUMA policies define how the workload scales across available cores and memory nodes
What happens during data extraction?
The first step of the data journey is to extract data from external sources and convert it into an in-memory format suitable for the datastore. Atoti supports a range of source types, including CSV files, Parquet files, JDBC databases, and cloud storage providers such as Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage. In the Atoti Java SDK, extraction is handled by dedicated threads that parse incoming data. For CSV files, Atoti uses a built-in CSV parser to read and interpret records during this phase.
What happens during data transformation?
After extraction, the data is transformed before it is loaded into the datastore. Transformation ensures that records are clean, consistent, and enriched with any additional context required for analysis. The Atoti Python SDK does not include methods for data transformation; if this step is required, it is handled with other Python tools and libraries before the data is loaded. In the Atoti Java SDK, two mechanisms handle data transformation:
- Column calculators
- Tuple publishers
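The Atoti Python SDK leaves transformation to other tools, so one option is to clean and enrich records with plain Python before loading them. The sketch below uses only the standard library and is an illustration, not an Atoti API. The column names, the `trade_id` deduplication key, and the derived `notional` column are all hypothetical. The final loading call is omitted because it depends on the Atoti version in use.

```python
import csv
import io
from datetime import date

# Hypothetical raw extract: column names and values are illustrative only.
RAW_CSV = """trade_id,desk,quantity,price,trade_date
T1, Rates ,100,2.5,2024-01-15
T2,FX,,1.1,2024-01-15
T3,equity,40,10,2024-01-16
T1, Rates ,100,2.5,2024-01-15
"""


def transform(raw_text):
    """Clean, deduplicate, and enrich rows before they are loaded."""
    seen_ids = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Clean: trim stray whitespace from every field.
        row = {key: value.strip() for key, value in row.items()}
        # Drop incomplete records rather than loading missing values.
        if not row["quantity"] or not row["price"]:
            continue
        # Deduplicate on the (hypothetical) business key.
        if row["trade_id"] in seen_ids:
            continue
        seen_ids.add(row["trade_id"])
        quantity = int(row["quantity"])
        price = float(row["price"])
        cleaned.append({
            "trade_id": row["trade_id"],
            # Normalise desk capitalisation, e.g. "equity" -> "Equity".
            "desk": row["desk"].title(),
            "quantity": quantity,
            "price": price,
            "trade_date": date.fromisoformat(row["trade_date"]),
            # Enrich: a derived column computed once, before loading.
            "notional": quantity * price,
        })
    return cleaned


rows = transform(RAW_CSV)
```

The derived `notional` field is computed once per record up front, roughly the role that column calculators play in the Java SDK. For large extracts, a dataframe library such as pandas or Polars would typically replace the row-by-row loop.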