Resilience documentation

Limitations and levers

The constraints a deployment inherits from the design — and the levers a deployment can engage against each.

Every design is shaped by a small number of load-bearing constraints. This page names the constraints a deployment inherits from Authonomy Resilience’s design, and the levers available to address each. They are limitations in the sense of “things the design does not do,” not in the sense of “things the design fails to do correctly” — each is a deliberate choice with named tradeoffs.

A deployment inside the operational envelope below operates to contract; a deployment whose requirements sit outside the envelope should either engage levers to bring the requirements inside, or treat the resiliency capability as insufficient for those requirements. The design does not pretend that every posture fits.

Limitations

Drift-window staleness. The deployment’s database is current relative to the customer’s authoritative identity source within the configured drift window, typically one to four hours and configurable per deployment. Severance extends effective staleness by the severance duration; the instance resumes converging on reconnect.
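
As a rough illustration of the staleness bound above, the following sketch adds the severance duration to the configured drift window. The function name and the example values are assumptions for illustration, not part of the platform.

```python
from datetime import timedelta


def worst_case_staleness(drift_window: timedelta,
                         severance: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on how far an instance may lag the authoritative source.

    Hypothetical helper: models the statement that severance extends
    effective staleness by the severance duration.
    """
    return drift_window + severance


# Example values (assumed): a four-hour drift window and a six-hour severance.
bound = worst_case_staleness(timedelta(hours=4), timedelta(hours=6))
print(bound)  # 10:00:00
```

A deployment could compare this bound against the longest staleness its risk posture tolerates when choosing a drift-window width.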

Revocation lag during severance. A revocation issued at the authoritative identity source, or a credential revocation written to the credential store, does not reach a severed instance until reconnect. A user whose deprovisioning or credential revocation has not yet propagated can continue to authenticate at the severed instance for the duration of the severance. This is the most important operational-risk statement in the design.

Credential writes fail closed during credential-store severance. Just-in-time enrollment, re-enrollment, credential recovery, password reset, and credential revocation all require the credential store or its authoritative write path to be reachable. Severed instances do not accept these writes locally for replay on reconnect. The trade-off is deliberate: a simpler reconciliation model in exchange for narrower availability of credential-management operations during severance.

Floor coverage builds with usage. Native authentication serves only subjects who have registered a locally-verifiable factor with Authonomy. Enrollment happens just-in-time at first IdP authentication; until a user has authenticated through Authonomy at least once, no credential material is registered and the floor cannot serve them. Day-1 coverage for newly-provisioned users requires pairing just-in-time enrollment with a forced first-login workflow at onboarding.

Site authentication is bounded by site instance availability. Applications at a site authenticate against the site-local instance; a failure of that instance leaves the site without authentication until it is restored or applications are reconfigured. Continuity through single-instance failure requires a redundant set of instances at the site.

Audit lives in the database. The audit trail is persisted in the deployment database alongside the rest of the instance state. In single-instance deployments that trail is local to the instance; in shared-database deployments it is a consolidated stream for the pool. There is no cross-instance audit aggregation at the request-path layer; aggregated audit outside the deployment database is an operational concern, not a request-path capability.

Bearer-token invalidation is bounded by token lifetime. Tokens validated at applications via cached JWKS remain valid until expiry; invalidation propagates through rotation, not through active revocation. Refresh-token revocation is immediate where the refresh path applies.
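
The lifetime bound can be made concrete with a small sketch. It assumes a validator that checks only the standard `exp` claim against its own clock (signature verification omitted); the function names and timestamps are hypothetical.

```python
def is_accepted(claims: dict, now: float) -> bool:
    """Local validation sketch: a token is accepted while now is before exp.

    Signature verification against cached JWKS is omitted for brevity.
    """
    return now < claims["exp"]


def residual_exposure(claims: dict, revoked_at: float) -> float:
    """Seconds a token stays acceptable after the subject is revoked upstream."""
    return max(0.0, claims["exp"] - revoked_at)


# Assumed example: a 15-minute token issued at t=1000, subject revoked at t=1300.
claims = {"iat": 1000, "exp": 1900}
print(is_accepted(claims, now=1500))               # True: still before exp
print(residual_exposure(claims, revoked_at=1300))  # 600
```

Under these assumptions, shortening token lifetime is the direct way to shrink the residual exposure for bearer tokens.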

Authoritative source integrity propagates. The customer’s primary identity provider is the system of record for identity. Corrupted lifecycle events emitted by a compromised or misconfigured authoritative source propagate to every replica within the drift window; the resiliency capability does not detect or filter authoritative-source compromise, nor does it attempt to. Integrity controls at the authoritative source — or in the HR/IGA pipeline upstream of it — are where a deployment’s defense against this case lives.

Credential storage is a deployment-shaped exposure surface. Authonomy is authoritative for native credentials; the credential store is their persistent home. By default credentials live in the database; optionally, they are externalized to a keystore. The exposure shapes differ across topologies — database compromise, wrapping-key compromise, or keystore compromise — and a deployment selects between them.

Clock agreement is a deployment-environment prerequisite. TOTP verification (time-step alignment between the authenticator and the instance) and token lifetime enforcement (the validator’s exp check against its own clock) both depend on reasonable clock agreement. The platform does not provide its own time source; an instance whose environment lacks reliable time sync can observe TOTP verification failures and token-expiry edge cases. Managed-service operation carries time sync as part of the managed surface; self-hosted operation treats reliable time as a deployment-environment prerequisite.
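
To show why clock agreement matters for TOTP, here is a minimal sketch of RFC 6238 code generation and a verifier that tolerates a small number of time steps. The `verify` helper, its `window` parameter, and the example timestamps are assumptions for illustration and do not describe the platform’s verifier.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1 and RFC 4226 dynamic truncation."""
    counter = int(unix_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, server_time: float,
           step: int = 30, window: int = 1) -> bool:
    """Accept a code that matches any time step within +/- window steps."""
    return any(
        hmac.compare_digest(totp(secret, server_time + d * step, step), submitted)
        for d in range(-window, window + 1)
    )


secret = b"12345678901234567890"            # RFC 6238 test secret
code = totp(secret, unix_time=1000)         # authenticator clock reads t=1000
ok_small_skew = verify(secret, code, server_time=1020)   # 20 s apart: in window
ok_large_skew = verify(secret, code, server_time=1120)   # 120 s apart: 4 steps off
```

With a one-step window, the 20-second skew still lands on a counter the verifier checks, while the 120-second skew puts the authenticator’s counter outside the checked range, so the code would be rejected barring a chance collision.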

Levers

Every constraint admits at least one lever a deployment can engage against it. The deployment selects values; this page describes the shape of each.

Drift-window width. A shorter window means faster propagation and higher sync traffic; a longer window means lighter sync load and a larger revocation-lag surface.

Severance tolerance. Permit an instance to degrade to refusing authentication after a bounded severance duration, rather than serving against increasingly stale state.
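
A minimal sketch of how such a bound might be evaluated follows. The function name, its parameters, and the twelve-hour limit are hypothetical; the platform’s actual configuration surface is not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_serve(severed_since: Optional[datetime],
                 now: datetime,
                 max_severance: Optional[timedelta]) -> bool:
    """Decide whether a (possibly severed) instance keeps authenticating.

    severed_since: when severance began, or None if the instance is connected.
    max_severance: the tolerance bound, or None to serve indefinitely.
    """
    if severed_since is None or max_severance is None:
        return True
    return now - severed_since <= max_severance


start = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
limit = timedelta(hours=12)  # assumed tolerance
print(should_serve(start, start + timedelta(hours=3), limit))   # True
print(should_serve(start, start + timedelta(hours=13), limit))  # False
```

Leaving the bound unset in this sketch corresponds to serving against increasingly stale state; setting it trades availability for a cap on revocation lag.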

Floor enrollment policy. Which populations are required to enroll a locally-verifiable factor at first IdP authentication. Bounds the floor’s coverage, the revocation-lag surface, and the credential-exposure surface together.

Factor catalog selection. Which factor types — WebAuthn, TOTP, password — the deployment offers users for floor enrollment. Bounds the floor’s coverage and the credential-exposure shape together.

Credential-store placement. Whether credential material lives in the per-instance database (default) or in an externalized keystore (HSM, cloud KMS). Shapes the compromise surface and the severance behavior.

Forced first-login at onboarding. A workflow trigger from the upstream identity provider prompting newly-provisioned users to authenticate through Authonomy on Day-1, completing factor enrollment before they begin work. The lever for closing the just-in-time enrollment gap for newly-provisioned populations.

Targeted sync. Operator-initiated push of a specific change — identity revocation, credential revocation when the relevant credential-store path is reachable, urgent onboarding — independent of the incremental cadence.

Site redundancy. A redundant set of instances at a site for continuity through single-instance failure.

The right lever settings are a function of the deployment’s risk posture. The platform supports the range; the deployment selects.

Operational tuning after deployment

Pre-production testing validates the contract; production operation refines thresholds, runbooks, and operating posture. Four classes of observation should be used to tune a deployment after launch:

Actual sync-lag distributions under the deployment’s authoritative source and network path. The platform’s contract names a drift-window bound; the deployment’s actual distribution (median, 95th, 99th percentile) is a function of the authoritative source’s change volume, the sync engine’s capacity, and the network between them.
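
One way to summarize observed sync lag is sketched below, using Python’s standard `statistics` module. The sample data, the function name, and the one-hour bound are assumptions; how lag samples are collected depends on the deployment’s own telemetry.

```python
import statistics


def summarize_sync_lag(lag_seconds, drift_window_seconds):
    """Return median, p95, p99, and max lag, plus a check against the bound."""
    if len(lag_seconds) < 2:
        raise ValueError("need at least two lag samples")
    # quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(lag_seconds, n=100, method="inclusive")
    worst = max(lag_seconds)
    return {
        "median": statistics.median(lag_seconds),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": worst,
        "within_bound": worst <= drift_window_seconds,
    }


# Synthetic samples (assumed): lag values of 0..100 seconds.
summary = summarize_sync_lag(list(range(0, 101)), drift_window_seconds=3600)
print(summary)
```

Tracking the tail percentiles rather than the median alone shows how close the deployment runs to its drift-window bound under real change volume.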

Probe false-positive rates under the deployment’s network conditions. The rate at which synthetic probes produce unhealthy observations that don’t reflect a true method outage is a function of the deployment’s network path and the probe thresholds. Stabilization windows are tuned against it.
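
The relationship between probe false positives and a stabilization window can be sketched as follows. This models the window as a count of consecutive unhealthy probes, which is an assumption for illustration; the platform may define stabilization differently, and both function names are hypothetical.

```python
def false_positive_rate(observations):
    """Fraction of probes reporting unhealthy while the method was actually up.

    observations: iterable of (probe_unhealthy, truly_down) boolean pairs.
    """
    while_up = [probe for probe, down in observations if not down]
    if not while_up:
        return 0.0
    return sum(while_up) / len(while_up)


def stabilized(probe_results, window):
    """Report unhealthy only after `window` consecutive unhealthy probes."""
    streak = 0
    states = []
    for unhealthy in probe_results:
        streak = streak + 1 if unhealthy else 0
        states.append(streak >= window)
    return states


# Assumed sample: one isolated blip, then a sustained run of failures.
probes = [True, False, True, True, True, False]
print(stabilized(probes, window=3))
# [False, False, False, False, True, False]
```

In this model a higher measured false-positive rate argues for a wider window, at the cost of slower detection of a true outage.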

Operator workflow around fallback events in the deployment’s actual identity patterns. The runbook describes the shape of operator response; the actual workflow — how often fallback fires, which signals the operator finds most informative, which runbook entries are most frequently consulted — becomes clearer in operation.

Long-horizon effects. Audit-buffer storage growth, sync-cursor wraparound, token-rotation crossing sync cycles — these are monitored in production over weeks and months. Pre-production operation produces an approximation; sustained operation refines it.

None of these blocks pre-production approval when the contract holds. Each is a reason for the deployment to retain observability into the platform’s behavior through initial operation and beyond.