Pick a deployment placement

Placement determines where the router runs: as a single instance, as one instance per site, or as a redundant set of instances per site. It is selected together with the failover composition.

Composition decides what the router selects among. Placement decides where the router runs. The two axes are orthogonal (any composition shape works at any placement), but they are selected together because they answer two parts of the same risk question: composition addresses which provider can fail, and placement addresses where the failure boundary sits.

This page describes the three placement patterns and the criteria for choosing between them.

Single instance

One Authonomy instance serves the customer’s applications.

What it survives: Any failure mode the configured composition addresses — provider outages, internet-path failures, WAN routing problems — provided the instance itself remains reachable from the applications.

What it doesn’t survive: A failure of the instance itself, or a network partition between the applications and the instance.

When to pick: Single-region deployments where the customer’s applications and identity dependencies sit in one failure domain. A typical SaaS architecture, where everything runs in one cloud region, fits this shape.

The single-instance placement is the simplest deployment and the right starting point when the failure modes that matter all sit upstream of the instance.

One instance per site or per failure domain

Multiple Authonomy instances, one at each site (a branch, a store, a facility) or each independent failure domain. Each instance is autonomous: it federates with the same external providers, holds its own canonical view of the population it serves, runs the router and the native floor, and writes its own audit. There is no coordination between instances at the request layer.

What it survives: The same failure modes the composition addresses, plus the case where the WAN between a site and the rest of the world is severed. The site-local instance keeps serving its applications against whichever methods remain reachable from the site — which, during a full WAN severance, is the native floor for users enrolled there.

What it doesn’t survive: A failure of the site-local instance itself. Such a failure takes the site’s authentication down until the instance is restored or the applications are reconfigured to use another instance.

When to pick: Deployments where individual sites can lose connectivity to the broader network independently. A retail estate with hundreds of stores fits this shape; a manufacturing facility whose production lines cannot tolerate a ten-minute zone-failover window does too. The placement decision is driven by the operational cost of a site going dark — if losing one store’s authentication for the duration of a WAN severance is unacceptable, the site needs its own instance.

This is the placement that makes the failover ladder’s bottom rung — native authentication, served against locally-held credential material — operationally meaningful. Without an instance at the site, native is reachable only from the central deployment, which is the same path that just severed.

A redundant set of instances per site

Multiple instances at the same site, sharing an HA-clustered database. The set behaves as one logical authentication endpoint for the site’s applications (typically behind a load balancer or a resolved name).

What it survives: Everything the per-site placement survives, plus the failure of any single instance in the set.

What it doesn’t survive: A site-wide infrastructure failure that takes out the entire site’s compute or the shared database.

When to pick: Sites where single-instance failure is itself an unacceptable failure mode. The shared database avoids the divergence problem that independent per-instance databases would have during a network partition between boxes in the same rack.

The redundant-set placement is operationally heavier than per-site — it adds the HA-clustered database and the load-balancing surface — and is appropriate where the cost of a site going dark for the time it takes to restart an instance exceeds that operational weight.
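
The survival properties of the three placements can be summarized as a lookup table. The following is a minimal sketch in Python, not part of the product: the failure categories, the placement keys, and the survives helper are hypothetical names introduced for illustration. The UPSTREAM category assumes the configured composition addresses the failure in question.

```python
from enum import Enum, auto


class Failure(Enum):
    """Hypothetical failure categories, named for this illustration only."""
    UPSTREAM = auto()            # provider outage, internet-path failure, WAN routing problem
    SITE_WAN_SEVERANCE = auto()  # a site loses its link to the rest of the world
    INSTANCE_LOSS = auto()       # one instance fails
    SITE_WIDE_LOSS = auto()      # the site's compute or shared database fails


# Which failures each placement survives, as described on this page.
SURVIVES = {
    "single_instance": frozenset({Failure.UPSTREAM}),
    "per_site": frozenset({Failure.UPSTREAM, Failure.SITE_WAN_SEVERANCE}),
    "redundant_set": frozenset({
        Failure.UPSTREAM,
        Failure.SITE_WAN_SEVERANCE,
        Failure.INSTANCE_LOSS,
    }),
}


def survives(placement: str, failure: Failure) -> bool:
    """Return True if the named placement keeps authenticating through the failure."""
    return failure in SURVIVES[placement]
```

Each placement's set is a superset of the previous one, which mirrors the way each pattern is described as surviving everything the lighter pattern survives plus one more failure. No placement in the table survives a site-wide loss.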

How to choose

The decision walks through three questions:

Can this site lose connectivity to the rest of the world independently? If no, single instance is correct. If yes, per-site placement at minimum.

Does the cost of a site going dark for the duration of a WAN severance exceed the cost of running an instance at the site? If yes, per-site placement. The native floor only serves the site if there’s an instance there to serve it.

Is the failure of a single instance at the site itself unacceptable? If yes, redundant set per site. If such a failure is recoverable within the site’s tolerance window, per-site placement is sufficient.

The placement decision is a function of the deployment’s risk posture, not a default. The platform supports the range; the deployment selects.
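
The three questions above can be written as a small decision function. This is a sketch under stated assumptions rather than a platform API: the Placement enum, the choose_placement function, and its parameter names are hypothetical. The page does not state the outcome when the second question is answered "no"; the sketch assumes the deployment then accepts the risk and stays with a single instance.

```python
from enum import Enum


class Placement(Enum):
    """Hypothetical labels for the three placement patterns."""
    SINGLE_INSTANCE = "single instance"
    PER_SITE = "one instance per site"
    REDUNDANT_SET = "redundant set per site"


def choose_placement(
    site_can_sever_independently: bool,
    dark_site_cost_exceeds_instance_cost: bool,
    instance_failure_unacceptable: bool,
) -> Placement:
    """Walk the three questions in order and return a placement."""
    # Question 1: can the site lose connectivity on its own?
    if not site_can_sever_independently:
        return Placement.SINGLE_INSTANCE
    # Question 2: is a dark site costlier than running an instance there?
    if not dark_site_cost_exceeds_instance_cost:
        # Assumption: the page leaves this branch open; the sketch treats
        # a dark site during severance as an accepted risk.
        return Placement.SINGLE_INSTANCE
    # Question 3: is losing the one site-local instance itself unacceptable?
    if instance_failure_unacceptable:
        return Placement.REDUNDANT_SET
    return Placement.PER_SITE
```

For example, a store that can lose its WAN link independently, cannot afford to go dark, but can tolerate an instance restart would call choose_placement(True, True, False) and receive Placement.PER_SITE.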

What happens at a site during severance

A site-deployed instance behaves exactly as the central instance would: the router walks the same ladder, evaluates the same per-method health, and dispatches to the first healthy method. What changes during severance is which methods are reachable from the site. A WAN severance shrinks the set of healthy methods to whatever the instance can reach locally, which (in the limiting case) is the native floor.

Crucially, applications at the site authenticate against the site-local instance at all times, not only during severance. There is no application-layer failover to perform when the WAN severs, because the application is already pointed at the site-local instance. The router decides per request which method serves; the application sees one stable endpoint.
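
The ladder walk described in this section can be sketched as follows. This is an illustration only: the method names, the LADDER ordering, and the reachable_from_site helper are hypothetical, and health is reduced to reachability for simplicity. A real health evaluation would consider more than reachability, and the native floor serves only users enrolled at the site.

```python
from typing import Callable, Optional, Sequence, Set

# Hypothetical ladder, ordered from most preferred method to the native floor.
LADDER = ("primary_provider", "secondary_provider", "native")


def select_method(
    ladder: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy method on the ladder, or None if none is healthy."""
    for method in ladder:
        if is_healthy(method):
            return method
    return None


def reachable_from_site(wan_up: bool) -> Set[str]:
    """Assumed reachability: external providers need the WAN, native does not."""
    return set(LADDER) if wan_up else {"native"}


# The same walk runs in both cases; only the set of healthy methods differs.
normal = select_method(LADDER, lambda m: m in reachable_from_site(True))
severed = select_method(LADDER, lambda m: m in reachable_from_site(False))
```

In this sketch the application never changes what it calls. The selection logic is identical before and during severance, and only the health predicate returns a smaller set, so the walk falls through to the bottom rung.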