Checkpoint and Restore an Atoti application

You can take a snapshot of your running Atoti application and restore it almost instantly using CRaC, instead of starting it from scratch.

What is CRaC

CRaC (Coordinated Restore at Checkpoint) is an OpenJDK project that significantly reduces Java application startup time. It lets you pause an application at a fully initialized, ready-to-serve state (a "checkpoint"), and later resume it from that point (a "restore") instead of restarting it.

CRaC operates by taking a snapshot of the running JVM, including memory, file descriptors, and threads. When you restore the application, it resumes from the same state it was in at the time of the checkpoint.

Why use CRaC

The primary motivation for using CRaC is fast startup and instant readiness. This is especially valuable when:

You want to speed up development iterations.
You need to quickly recover from a crash or restart.
You are working in environments where cold start times are too long.

A main use case is stopping a costly application while it is not in use and restoring it almost instantly when it is needed again. Since CRaC skips the application's startup time, you can use a shorter inactivity delay before stopping the application.

Constraints and limitations

Operating system: CRaC currently supports only Linux on x64 and ARM64 architectures.
JDK requirement: CRaC is not available in standard OpenJDK distributions. You need a CRaC-enabled distribution, such as Azul Zulu 21 with CRaC.
File system and environment consistency: Restoring a checkpoint assumes that the environment (e.g. network, file paths) remains consistent. The architecture, OS version, JDK, and application JAR file in the restore environment must be identical to those in the checkpoint environment, and any swapped vector files must exist at the same path. However, the two environments can differ in RAM, processors, and storage.
Spring Boot: In Atoti, the Spring Boot framework primarily manages CRaC integration. Spring Boot supports CRaC starting from version 3.2. You can find more detailed information in the official documentation.
Checkpoint limitations: Currently, CRaC does not support creating a second checkpoint after a restore; you must restart the application from scratch before creating a new checkpoint.
NUMA architecture: Off-heap memory might not be re-mapped to the correct NUMA nodes after restore. Therefore, we do not recommend using CRaC on a NUMA architecture.
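Because the restore environment must expose the same paths, a pre-flight check on the restore machine can catch a missing file before the restore is attempted. The sketch below is hypothetical: `RestorePathCheck` is not part of Atoti or CRaC, and it assumes you already know which paths the checkpointed application relies on (for example, where swapped vector files live). It only reports which of those paths are missing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check, run on the restore machine before launching the restore. */
public final class RestorePathCheck {

    private RestorePathCheck() {}

    /** Returns the expected paths that do not exist here; an empty list means all were found. */
    public static List<Path> missingPaths(List<Path> expected) {
        List<Path> missing = new ArrayList<>();
        for (Path path : expected) {
            if (!Files.exists(path)) {
                missing.add(path);
            }
        }
        return missing;
    }

    /** Usage: java RestorePathCheck.java <path>...  Exits with status 1 if any path is missing. */
    public static void main(String[] args) {
        List<Path> expected = new ArrayList<>();
        for (String arg : args) {
            expected.add(Path.of(arg));
        }
        List<Path> missing = missingPaths(expected);
        if (!missing.isEmpty()) {
            System.err.println("Missing in restore environment: " + missing);
            System.exit(1);
        }
        System.out.println("All " + expected.size() + " expected paths are present.");
    }
}
```

Running it as a separate step, before the restore command, keeps the check outside the restored JVM, whose state still reflects the checkpoint environment.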

CRaC use cases

What CRaC enables in a non-distributed setup

Imagine starting your Atoti application (a Spring Boot service) and waiting for it to load data, warm up caches, and initialize internal services. Instead of repeating this process every time the application starts, CRaC allows you to:

Start the application normally.
Checkpoint it once it is fully initialized and ready to serve queries.
Restore it from the checkpoint, instead of starting from scratch. This skips all the startup and warm-up time.

A simple story

Let’s say you start Atoti in the morning and it takes 20 minutes to load and prepare. Once it is ready, you checkpoint it. From then on, every time you need to redeploy, restart, or scale, you can restore from the checkpoint and be ready almost instantly.

What CRaC enables in a distributed setup

You can checkpoint and restore both query and data nodes in any order.

A checkpointed node behaves as if it were leaving the cluster: when you checkpoint a data node, its members are removed from the query node. The node rejoins the cluster after restore.

What happens during checkpoint and restore

Application pauses: During checkpointing, the application temporarily freezes while its state is being captured. You can choose whether to stop the application after checkpointing.

note

Checkpoint duration highly depends on storage throughput.

info

The checkpoint size is roughly the memory footprint of your application, including both on-heap and off-heap memory.
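Together, the two notes above give a back-of-envelope estimate: the checkpoint is about as large as the on-heap plus off-heap memory, and writing it is bounded by storage throughput. The sketch below assumes a purely throughput-bound write and ignores compression, engine overhead, and caching, so treat its result as a rough lower bound on checkpoint duration, not a prediction. The class name and the figures in `main` are illustrative.

```java
import java.time.Duration;

/** Hypothetical back-of-envelope estimate of checkpoint size and minimum write time. */
public final class CheckpointEstimate {

    static final long GIB = 1L << 30;

    private CheckpointEstimate() {}

    /** Approximate checkpoint size: on-heap plus off-heap memory in use. */
    public static long estimatedSizeBytes(long onHeapBytes, long offHeapBytes) {
        return Math.addExact(onHeapBytes, offHeapBytes);
    }

    /**
     * Lower bound on write time, assuming the checkpoint is written at a sustained
     * bytesPerSecond with no compression or engine overhead. Rounds up to whole ms.
     */
    public static Duration minimumWriteTime(long sizeBytes, long bytesPerSecond) {
        if (sizeBytes < 0 || bytesPerSecond <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and throughput > 0");
        }
        long scaled = Math.multiplyExact(sizeBytes, 1000L);
        long millis = (scaled + bytesPerSecond - 1) / bytesPerSecond; // ceiling division
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        long size = estimatedSizeBytes(8 * GIB, 56 * GIB); // illustrative memory figures
        Duration time = minimumWriteTime(size, GIB);       // storage sustaining 1 GiB/s
        System.out.println(size / GIB + " GiB, at least " + time.toSeconds() + " s");
        // prints "64 GiB, at least 64 s"
    }
}
```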

Queries during checkpoint: During checkpointing, the process pauses, so it is not queryable for a brief moment.

caution

Queries that were running during a checkpoint will most likely be interrupted.

Queries during restore: You cannot execute queries until the application is fully restored. Once restored, the state is identical to the moment of the checkpoint, so any off-heap data or in-memory structures remain intact.

info

Although the application restores almost instantly, not all data is immediately mapped back into memory. The first queries can therefore take around 10 to 20% longer, because they trigger the memory mapping of the data they read.

Network sessions: Open connections (such as WebSocket sessions or REST requests) do not survive the checkpoint; they are re-established after restore.
File watcher: The CSV source watcher service still works after restore.

CRaC and DirectQuery

CRaC is compatible with Atoti applications using DirectQuery, with most supported databases.

When checkpointing an application that relies on DirectQuery, all database connections are closed during the checkpoint phase and safely re-established upon restore. This allows the application to resume querying the database as if it had just started, while still benefiting from the fast startup provided by CRaC.

Supported databases

CRaC with DirectQuery is supported for all databases supported by Atoti, except:

Snowflake
BigQuery

These two databases are currently not compatible with CRaC when used through DirectQuery, as their connection models do not allow the connections to be safely closed and re-established across checkpoint and restore.

Recommendations

Ensure that no DirectQuery queries are running at checkpoint time, as they will be interrupted.

Verify that database endpoints, credentials, and network configuration remain unchanged between checkpoint and restore, as required by CRaC environment consistency.

For supported databases, no additional configuration is required beyond the standard CRaC and DirectQuery setup.

Guidelines

Follow these guidelines closely, as any failure during a checkpoint will cause the application to crash.

When to checkpoint

We highly recommend checkpointing when the application is in a fully initialized and stable state, meaning Spring initialization is complete and the system is ready to serve queries.

Avoid checkpointing while any of the following are in progress:

Queries, as they will most likely be interrupted.
Transactions, as any file that a data source has open during the checkpoint causes an exception and crashes the application.
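One way to respect these rules is to count in-flight work and only trigger a checkpoint when nothing is running. The sketch below is hypothetical: `CheckpointGuard` is not an Atoti or Spring API, and it assumes your own code calls `tryBegin()` and `end()` around every query and transaction it starts. New work is refused while a checkpoint is pending, so nothing can start between the idle check and the checkpoint itself.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical guard (not an Atoti or Spring API): counts in-flight queries and
 * transactions, and only runs a checkpoint action when none are running.
 */
public final class CheckpointGuard {

    private int inFlight;                 // queries and transactions currently running
    private boolean checkpointInProgress; // new work is refused while true

    /** Call before starting a query or transaction; returns false if work is refused. */
    public synchronized boolean tryBegin() {
        if (checkpointInProgress) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Call when work started with tryBegin() has finished. */
    public synchronized void end() {
        if (inFlight == 0) {
            throw new IllegalStateException("end() called without matching tryBegin()");
        }
        inFlight--;
        notifyAll(); // wake a checkpoint waiting for the application to become idle
    }

    /**
     * Refuses new work, waits up to timeoutMillis for in-flight work to drain, then runs
     * the checkpoint action. Returns false, without running it, on timeout or interrupt.
     */
    public boolean runWhenIdle(Runnable checkpoint, long timeoutMillis) {
        synchronized (this) {
            if (checkpointInProgress) {
                throw new IllegalStateException("checkpoint already in progress");
            }
            checkpointInProgress = true;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            try {
                while (inFlight > 0) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) {
                        checkpointInProgress = false; // give up and accept work again
                        return false;
                    }
                    TimeUnit.NANOSECONDS.timedWait(this, remaining);
                }
            } catch (InterruptedException e) {
                checkpointInProgress = false;
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        try {
            // In a real application this action could wrap org.crac.Core.checkpointRestore().
            checkpoint.run();
            return true;
        } finally {
            synchronized (this) {
                checkpointInProgress = false; // accept work again once the action returns
            }
        }
    }
}
```

The action runs outside the lock, so code executed during the checkpoint can still call `tryBegin()`, which simply returns `false` until the action completes.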

caution

An application cannot be checkpointed with CRaC while any resource, such as a file or a socket, is still open.

While Atoti and Spring Boot already handle most resources properly, you must manage the closing and reopening of your own resources to use CRaC, as described in the official documentation.
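The pattern for your own resources is to close them before the checkpoint and reopen them after restore. With the `org.crac` library, you would implement `org.crac.Resource` (whose `beforeCheckpoint` and `afterRestore` methods take a `Context` argument) and register the object with `Core.getGlobalContext().register(...)`. To keep the sketch below compilable without that dependency, it declares a local `CheckpointAware` interface with the same two callbacks; `AuditLog` and its file are hypothetical examples of an application-owned resource.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Local stand-in mirroring the two callbacks of org.crac.Resource (without the Context argument). */
interface CheckpointAware {
    void beforeCheckpoint() throws Exception;

    void afterRestore() throws Exception;
}

/** Hypothetical application-owned resource: an append-only audit log file. */
public final class AuditLog implements CheckpointAware, AutoCloseable {

    private final Path file;
    private FileChannel channel;

    public AuditLog(Path file) throws IOException {
        this.file = file;
        open();
    }

    private void open() throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    public synchronized void append(String line) throws IOException {
        if (channel == null) {
            throw new IllegalStateException("audit log is closed for checkpoint");
        }
        channel.write(ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8)));
    }

    public synchronized boolean isOpen() {
        return channel != null && channel.isOpen();
    }

    @Override
    public synchronized void beforeCheckpoint() throws IOException {
        close(); // an open file descriptor would make the checkpoint fail
    }

    @Override
    public synchronized void afterRestore() throws IOException {
        open(); // the file must exist at the same path in the restore environment
    }

    @Override
    public synchronized void close() throws IOException {
        if (channel != null) {
            channel.close();
            channel = null;
        }
    }
}
```

Writes attempted between the two callbacks fail fast with an `IllegalStateException` instead of touching a closed channel.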

See how to checkpoint an application with CRaC.

How to restore

To facilitate and optimize memory mapping, we highly recommend restoring an application in a clean environment.

As mentioned in the Constraints and limitations section, the environment must remain consistent with the one where you checkpointed the application.

note

The license is reset at checkpoint, then reloaded and checked at restore.

See how to restore an application with CRaC.

CRaC VM options

A component called the engine performs checkpointing and restoring. CRaC provides different engines and several Java command-line options that control its behavior.

See the comprehensive list of CRaC VM options.
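For orientation, the basic flow with Azul Zulu with CRaC is: start the JVM with `-XX:CRaCCheckpointTo=<dir>`, trigger the checkpoint with `jcmd <pid> JDK.checkpoint` once the application is ready, and later restore with `-XX:CRaCRestoreFrom=<dir>`. The sketch below only assembles those command lines; the `CracCommands` helper, directory, and JAR name are hypothetical, and engine-specific flags should be taken from the CRaC VM options list.

```java
import java.util.List;

/** Hypothetical helper that assembles the basic CRaC command lines for one application. */
public final class CracCommands {

    private CracCommands() {}

    /** Starts the application so that a later checkpoint is written to imageDir. */
    public static List<String> start(String jar, String imageDir) {
        return List.of("java", "-XX:CRaCCheckpointTo=" + imageDir, "-jar", jar);
    }

    /** Asks the running JVM with the given process id to checkpoint itself. */
    public static List<String> checkpoint(long pid) {
        return List.of("jcmd", Long.toString(pid), "JDK.checkpoint");
    }

    /** Restores from imageDir; the JAR is not passed again but must still exist at its path. */
    public static List<String> restore(String imageDir) {
        return List.of("java", "-XX:CRaCRestoreFrom=" + imageDir);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", start("atoti-app.jar", "/var/crac/atoti")));
        System.out.println(String.join(" ", checkpoint(12345)));
        System.out.println(String.join(" ", restore("/var/crac/atoti")));
        // prints:
        // java -XX:CRaCCheckpointTo=/var/crac/atoti -jar atoti-app.jar
        // jcmd 12345 JDK.checkpoint
        // java -XX:CRaCRestoreFrom=/var/crac/atoti
    }
}
```

Each list can be passed directly to `new ProcessBuilder(...)` if you script these steps from Java rather than from a shell.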
