Version v1.0 · dotnet
Reliability & Execution Model
Lease-based claiming, run lifecycle, retries, and durability guarantees.
DurableStack executes runs through a durable store-backed lifecycle with lease-based ownership.
The goal is predictable behavior during normal operation, transient failures, and process interruptions.
Run lifecycle
A run typically moves through:
- pending: queued and waiting to be claimed.
- leased: claimed by a worker and currently in execution.
- succeeded: execution completed successfully.
- failed: terminal failure after max attempts or non-retry path.
If a failure is retry-eligible, the run is re-scheduled as pending for a future retry time.
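The transitions above can be sketched as a small state machine. This is an illustrative model only: the `Run` record, the `RunStatus` enum, and the method names are hypothetical and are not DurableStack's actual types. It also assumes the attempt counter is incremented when a run is claimed.

```csharp
using System;

// Hypothetical model of the lifecycle; DurableStack's real types may differ.
var run = new Run(MaxAttempts: 3);

run = run.Claim();                                     // pending -> leased (attempt 1)
run = run.Fail(DateTimeOffset.UtcNow.AddSeconds(5));   // 1 < 3, so back to pending
Console.WriteLine($"{run.Status}, attempt {run.Attempt}");   // Pending, attempt 1

run = run.Claim().Succeed();                           // leased -> succeeded
Console.WriteLine(run.Status);                         // Succeeded

enum RunStatus { Pending, Leased, Succeeded, Failed }

record Run(int MaxAttempts, RunStatus Status = RunStatus.Pending,
           int Attempt = 0, DateTimeOffset? DueAt = null)
{
    public Run Claim()
    {
        Require(RunStatus.Pending);
        // Assumption: the attempt counter increases at claim time.
        return this with { Status = RunStatus.Leased, Attempt = Attempt + 1 };
    }

    public Run Succeed()
    {
        Require(RunStatus.Leased);
        return this with { Status = RunStatus.Succeeded };
    }

    public Run Fail(DateTimeOffset retryAt)
    {
        Require(RunStatus.Leased);
        // Retry-eligible failures return to pending with a future due time;
        // exhausted runs become terminally failed.
        return Attempt < MaxAttempts
            ? this with { Status = RunStatus.Pending, DueAt = retryAt }
            : this with { Status = RunStatus.Failed };
    }

    private void Require(RunStatus expected)
    {
        if (Status != expected)
            throw new InvalidOperationException($"Expected {expected} but run is {Status}.");
    }
}
```

Modeling the states as immutable values makes illegal transitions (for example, succeeding a run that was never claimed) fail loudly instead of silently corrupting state.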
Claim and execution flow
In each iteration of the processing loop:
- Worker claims due runs with a lease (ClaimDueRunsAsync).
- Worker emits claimed/started events.
- Job executes through the configured runner.
- On success, run is marked succeeded.
- On exception, retry eligibility is evaluated and the failure is recorded: the run is either re-scheduled for a retry or marked terminally failed.
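The flow above can be sketched as one pass of a worker loop against an in-memory store. Only the name ClaimDueRunsAsync comes from this page; its signature here, and the `InMemoryStore`, `Run`, and `RunJobAsync` names, are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory stand-ins; the real store, runner, and events differ.
var store = new InMemoryStore(new Run("send-email", 3), new Run("always-fails", 1));

foreach (var run in await store.ClaimDueRunsAsync("worker-1", TimeSpan.FromSeconds(30)))
{
    Console.WriteLine($"event: claimed {run.Id}");
    Console.WriteLine($"event: started {run.Id}");
    try
    {
        await RunJobAsync(run);               // stand-in for the configured runner
        run.Status = "succeeded";
    }
    catch (Exception ex)
    {
        // Retry-eligible runs go back to pending; exhausted runs fail terminally.
        run.Status = run.Attempt < run.MaxAttempts ? "pending" : "failed";
        Console.WriteLine($"error in {run.Id}: {ex.Message}");
    }
    Console.WriteLine($"{run.Id} -> {run.Status}");
}

static Task RunJobAsync(Run run) =>
    run.Id == "always-fails"
        ? Task.FromException(new InvalidOperationException("simulated failure"))
        : Task.CompletedTask;

class Run
{
    public Run(string id, int maxAttempts) { Id = id; MaxAttempts = maxAttempts; }
    public string Id { get; }
    public int MaxAttempts { get; }
    public int Attempt { get; set; }
    public string Status { get; set; } = "pending";
    public string LeasedBy { get; set; } = "";
    public DateTimeOffset LeaseExpiresAt { get; set; }
}

class InMemoryStore
{
    private readonly List<Run> _runs;
    public InMemoryStore(params Run[] runs) => _runs = runs.ToList();

    // Assumed signature: claims every pending run and stamps a lease on it.
    public Task<IReadOnlyList<Run>> ClaimDueRunsAsync(string workerName, TimeSpan lease)
    {
        var claimed = _runs.Where(r => r.Status == "pending").ToList();
        foreach (var r in claimed)
        {
            r.Status = "leased";
            r.Attempt++;
            r.LeasedBy = workerName;
            r.LeaseExpiresAt = DateTimeOffset.UtcNow + lease;
        }
        return Task.FromResult<IReadOnlyList<Run>>(claimed);
    }
}
```

In this sketch "send-email" ends as succeeded and "always-fails" ends as failed, because its single allowed attempt is already used up.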
Retry behavior
Retry eligibility is based on attempt count:
- Retry while Attempt < MaxAttempts.
- Transition to terminal failed when attempts are exhausted.
Delay calculation uses:
- per-job retry behavior (FixedDelay or Backoff)
- per-job initial delay when provided
- otherwise runtime defaults (options.RetryDelay, options.RetryMaxDelay)
- optional jitter (options.RetryJitterEnabled)
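A minimal sketch of how these inputs could combine into a delay. The option names mirror the ones listed above, but `RetryOptions`, `RetryBehavior`, and `ComputeDelay` are hypothetical types. The doubling formula, capping both behaviors at RetryMaxDelay, and the jitter range are all assumptions; this page does not specify DurableStack's exact math.

```csharp
using System;

var options = new RetryOptions(
    RetryDelay: TimeSpan.FromSeconds(2),
    RetryMaxDelay: TimeSpan.FromSeconds(30),
    RetryJitterEnabled: false);

// With jitter disabled the backoff delays are 2, 4, 8, 16, 30, 30 seconds.
for (int attempt = 1; attempt <= 6; attempt++)
{
    var delay = Retry.ComputeDelay(RetryBehavior.Backoff, attempt, null, options, new Random());
    Console.WriteLine($"attempt {attempt}: {delay.TotalSeconds}s");
}

enum RetryBehavior { FixedDelay, Backoff }

record RetryOptions(TimeSpan RetryDelay, TimeSpan RetryMaxDelay, bool RetryJitterEnabled);

static class Retry
{
    public static TimeSpan ComputeDelay(
        RetryBehavior behavior, int attempt, TimeSpan? jobInitialDelay,
        RetryOptions options, Random rng)
    {
        // A per-job initial delay wins over the runtime default.
        var initial = jobInitialDelay ?? options.RetryDelay;

        // Assumption: Backoff doubles the delay on each attempt.
        double seconds = behavior == RetryBehavior.FixedDelay
            ? initial.TotalSeconds
            : initial.TotalSeconds * Math.Pow(2, attempt - 1);

        // Assumption: the maximum delay caps both behaviors.
        seconds = Math.Min(seconds, options.RetryMaxDelay.TotalSeconds);

        // Assumption: jitter scales the delay into [50%, 100%) of its value.
        if (options.RetryJitterEnabled)
            seconds *= 0.5 + rng.NextDouble() * 0.5;

        return TimeSpan.FromSeconds(seconds);
    }
}
```

Jitter spreads out retries from runs that failed at the same moment, so they do not all hit a recovering dependency at once.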
Lease heartbeat and recovery
During execution, DurableStack extends the lease periodically.
- Heartbeat extension interval is half the lease duration (minimum 250ms).
- If a worker dies or stops extending the lease, the run becomes reclaimable after lease expiry.
This enables automatic recovery from worker interruption.
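The heartbeat rule can be sketched as follows. The interval calculation follows the rule stated above (half the lease, never below 250ms). The loop around it is a hypothetical illustration in which a counter stands in for the store call that extends the lease.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine(HeartbeatInterval(TimeSpan.FromSeconds(30)).TotalMilliseconds);       // 15000
Console.WriteLine(HeartbeatInterval(TimeSpan.FromMilliseconds(300)).TotalMilliseconds); // 250

var lease = TimeSpan.FromMilliseconds(600);
using var cts = new CancellationTokenSource();
int extensions = 0;

// Background heartbeat: runs until the job finishes and cancels it.
var heartbeat = Task.Run(async () =>
{
    try
    {
        while (true)
        {
            await Task.Delay(HeartbeatInterval(lease), cts.Token);
            Interlocked.Increment(ref extensions);   // stand-in for extending the lease in the store
        }
    }
    catch (OperationCanceledException)
    {
        // Expected when the job completes.
    }
});

await Task.Delay(1000);    // stand-in for the job body
cts.Cancel();              // stop heartbeating once the job is done
await heartbeat;
Console.WriteLine($"lease extended {extensions} time(s)");

// Half the lease duration, with a 250ms floor.
static TimeSpan HeartbeatInterval(TimeSpan leaseDuration)
{
    var half = TimeSpan.FromTicks(leaseDuration.Ticks / 2);
    var floor = TimeSpan.FromMilliseconds(250);
    return half > floor ? half : floor;
}
```

If the worker process dies, this loop dies with it, the lease stops being extended, and the run becomes reclaimable once the last extension expires.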
Provider-level concurrency model
Claiming is implemented with provider-specific concurrency primitives:
- PostgreSQL/MySQL: FOR UPDATE SKIP LOCKED
- SQL Server: lock hints such as UPDLOCK + READPAST
- SQLite: transactional select-and-update semantics
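As an illustration of the PostgreSQL variant, the statement below shows how a claim built on FOR UPDATE SKIP LOCKED might look when held as a C# constant. The table name, column names, and parameter names are hypothetical, and reclaiming expired leases in the same statement is an assumption. DurableStack's actual schema and SQL are not shown on this page.

```csharp
using System;

Console.WriteLine(ClaimSql.PostgreSql);

static class ClaimSql
{
    // Hypothetical schema: runs(id, status, due_at, leased_by, lease_expires_at).
    // The inner SELECT locks candidate rows and skips rows that another worker
    // has already locked, so concurrent workers never claim the same run.
    public const string PostgreSql = @"
UPDATE runs
SET status = 'leased',
    leased_by = @worker,
    lease_expires_at = now() + @lease
WHERE id IN (
    SELECT id
    FROM runs
    WHERE (status = 'pending' AND due_at <= now())
       OR (status = 'leased' AND lease_expires_at < now())
    ORDER BY due_at
    LIMIT @batch
    FOR UPDATE SKIP LOCKED
)
RETURNING id;";
}
```

Skipping locked rows rather than waiting on them lets several workers poll the same table without blocking one another.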
Practical guarantees
DurableStack is designed for effectively-once processing in normal operation with durable retry behavior.
Because a run can be re-attempted after a failure or a lease expiry, a handler may execute more than once for the same run, so handlers should be idempotent.
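One common way to make a handler idempotent is to record a unique key before performing the side effect and to skip the work when the key already exists. The sketch below uses an in-memory set as a stand-in for a durable table with a unique constraint. The handler name and the choice of the run id as the key are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

var processed = new HashSet<string>();   // stand-in for a durable table with a unique key
int charges = 0;

HandleChargeCustomer("run-42");
HandleChargeCustomer("run-42");          // simulated re-attempt of the same run
Console.WriteLine(charges);              // 1

void HandleChargeCustomer(string runId)
{
    // Add returns false when the key is already present, so a repeat is a no-op.
    if (!processed.Add(runId))
        return;

    charges++;                           // stand-in for the real side effect
}
```

In a real handler the key and the side effect should be committed atomically, for example in the same database transaction. Otherwise a crash between the two steps can still lose or duplicate work.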
Operational implications
- Keep WorkerName unique per process/container.
- Set lease duration to exceed normal execution time for common jobs.
- Use retries intentionally and monitor failure trends.
- Treat idempotency as a required contract for production handlers.
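As a small sketch of the first point, a worker name can be derived from values that differ between processes and containers. How the value is passed to DurableStack is not shown here, and the naming scheme itself is only a suggestion.

```csharp
using System;

// Host name (usually the container hostname when running in a container),
// plus process id, plus a random suffix to guard against pid reuse after restarts.
var workerName = $"{Environment.MachineName}-{Environment.ProcessId}-{Guid.NewGuid():N}";
Console.WriteLine(workerName);
```

A unique name per process makes it clear from the store which worker holds a given lease.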