Resilience documentation

Continuity vs. backup

The distinction that's load-bearing for evaluating any identity-resilience product.

Most products in the identity-resilience category are backup-and-recovery tools. They snapshot your IdP, store the snapshot somewhere safe, and restore it after the outage is over. That’s a useful capability. It is not the same thing as keeping authentication working during the outage.

This page exists because whether a product provides backup or continuity is the single most important question to settle before evaluating it against the requirements of an enterprise that cannot tolerate authentication downtime. The answer determines which category of architecture is being evaluated and which category of failure it actually addresses.

Backup is restore-after-the-fact

A backup-and-recovery model treats your identity provider as a system whose state you can periodically copy and later restore. When the provider goes down, you wait for it to come back, then restore the snapshot if data was lost in the process. The recovery target is the same provider, returned to working order.

Two properties follow:

Authentication is unavailable during the outage. Backup does not serve authentication requests. Applications fail to authenticate users for the duration of the provider-side problem, regardless of how recent the snapshot is.

The recovery time is the time to restore the provider. A four-hour outage of the upstream IdP is a four-hour authentication outage at every dependent application. The backup product affects what state survives the outage, not whether the outage happens.
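The two properties above can be put into a small model. This is a minimal sketch, not a description of any particular backup product: the names `BackupOutcome` and `backup_outcome` and the example figures are hypothetical, and the model ignores any extra time a restore itself might take.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupOutcome:
    auth_downtime_hours: float  # time applications cannot authenticate users
    state_lost_hours: float     # window of changes newer than the snapshot


def backup_outcome(provider_down_hours: float,
                   snapshot_age_hours: float,
                   data_lost: bool) -> BackupOutcome:
    """Illustrative model of a backup-and-recovery product during an IdP outage."""
    # Backup does not serve authentication requests, so downtime tracks the
    # provider outage no matter how fresh the snapshot is.
    auth_downtime = provider_down_hours
    # The snapshot only matters if the provider lost data: changes made after
    # the snapshot was taken are what a restore cannot bring back.
    state_lost = snapshot_age_hours if data_lost else 0.0
    return BackupOutcome(auth_downtime, state_lost)


# Hypothetical figures: the same four-hour outage with a fresh and a stale snapshot.
fresh = backup_outcome(provider_down_hours=4.0, snapshot_age_hours=0.25, data_lost=True)
stale = backup_outcome(provider_down_hours=4.0, snapshot_age_hours=24.0, data_lost=True)
```

In this sketch `fresh` and `stale` differ only in `state_lost_hours`. Both report the same four hours of authentication downtime, which is the point of the second property: snapshot frequency changes what survives, not how long logins fail.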

Backup is appropriate for one class of risks (accidental deletion, schema corruption, ransomware against the IdP) and inappropriate for the class of risks where the business cannot tolerate the application layer being down at all.

Continuity is a property of the live system

Authonomy Resilience is built for the second class. Instances serve traffic continuously; the failover path is exercised on every request, not just during disasters. When degradation happens, the system is already in a state to handle it. There is no scramble to restore from backup, and authentication continues to work for the populations the deployment is configured to serve.

Two properties follow:

Authentication keeps working through the outage. Applications continue to authenticate users for the duration of the provider’s degradation, served by the next healthy method on the ladder. The application does not know the upstream provider is unavailable.

Recovery from severance is replication catch-up, not reconstruction. When the upstream provider returns, the deployment’s local state catches up to the authoritative source. There is no separate restore phase, no consistency rebuild, and no dependency on the snapshot being current.
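One way to picture "the next healthy method on the ladder" is an ordered list of authentication methods that every request walks from the top. The sketch below is an assumption-laden illustration, not Authonomy Resilience's API: `AuthMethod`, `authenticate_via_ladder`, the method names, and the in-memory credential store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class AuthMethod:
    name: str
    is_healthy: Callable[[], bool]            # health probe for this rung
    authenticate: Callable[[str, str], bool]  # (username, credential) -> accepted?


def authenticate_via_ladder(ladder: Sequence[AuthMethod],
                            username: str,
                            credential: str) -> Tuple[str, bool]:
    """Serve the request from the first healthy method, top of the ladder first."""
    # The same loop runs on every request, so the failover path is exercised
    # continuously rather than only when the upstream provider is down.
    for method in ladder:
        if method.is_healthy():
            return method.name, method.authenticate(username, credential)
    raise RuntimeError("no healthy authentication method on the ladder")


# Hypothetical local state; plain-text comparison is for illustration only.
local_credentials = {"alice": "correct-horse"}

# Hypothetical two-rung ladder with the upstream provider currently unhealthy.
ladder = [
    AuthMethod("upstream-idp", lambda: False, lambda u, c: True),
    AuthMethod("local-replica", lambda: True,
               lambda u, c: local_credentials.get(u) == c),
]

served_by, accepted = authenticate_via_ladder(ladder, "alice", "correct-horse")
```

The caller receives the same kind of answer whichever rung served it, which is how the application can remain unaware that the upstream provider is unavailable.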

The choice is structural, not optional

No amount of configuration tuning makes a backup product behave like a continuity product, or vice versa. They are different architectures aimed at different failure shapes. Evaluating one against the requirements of the other is the most common analytical mistake in this category, and it is worth resolving explicitly before considering any vendor in detail.

If the cost of a four-hour login outage is larger than the cost of standing up redundancy, the requirement is continuity. If the cost is smaller, backup is sufficient. Authonomy Resilience is built for the first case, and is documented accordingly.
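The comparison above reduces to a single inequality. The following is a rough sketch under stated assumptions: the function names and dollar figures are hypothetical, outage cost is assumed to scale linearly with duration, and the tie case (which the text does not address) is treated as backup.

```python
def login_outage_cost(hours: float, cost_per_hour: float) -> float:
    """Assumed linear cost of a login outage; real costs may not scale linearly."""
    return hours * cost_per_hour


def required_model(outage_cost: float, redundancy_cost: float) -> str:
    """Return which requirement applies, per the rule stated in the text."""
    # Strictly larger outage cost means continuity; otherwise backup suffices.
    # Treating equal costs as "backup" is an assumption of this sketch.
    return "continuity" if outage_cost > redundancy_cost else "backup"


# Hypothetical figures for a four-hour outage.
high_stakes = required_model(login_outage_cost(4, 50_000), redundancy_cost=120_000)
low_stakes = required_model(login_outage_cost(4, 10_000), redundancy_cost=120_000)
```

With these illustrative numbers, `high_stakes` evaluates to continuity (200,000 against 120,000) and `low_stakes` to backup (40,000 against 120,000).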