Version v1.0 · dotnet

Reliability & Execution Model

Lease-based claiming, run lifecycle, retries, and durability guarantees.

DurableStack executes runs through a durable store-backed lifecycle with lease-based ownership.

The goal is predictable behavior during normal operation, transient failures, and process interruptions.

Run lifecycle

A run typically moves through:

  • pending: queued and waiting to be claimed.
  • leased: claimed by a worker and currently in execution.
  • succeeded: execution completed successfully.
  • failed: terminal failure, reached when attempts are exhausted or the failure is not retry-eligible.

If a failure is retry-eligible, the run is re-scheduled as pending for a future retry time.
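
As a rough illustration of the lifecycle above, the allowed transitions can be written as a small lookup. RunStatus and RunLifecycle are hypothetical names for this sketch, not DurableStack types, and reclaiming after lease expiry is left out because this page does not say how it is represented in the store.

```csharp
using System;

// Hypothetical names for illustration only; DurableStack's own types may differ.
public enum RunStatus { Pending, Leased, Succeeded, Failed }

public static class RunLifecycle
{
    // Returns true when the move from 'current' to 'next' matches the lifecycle described above.
    public static bool CanTransition(RunStatus current, RunStatus next) => (current, next) switch
    {
        (RunStatus.Pending, RunStatus.Leased) => true,    // claimed by a worker
        (RunStatus.Leased, RunStatus.Succeeded) => true,  // execution completed
        (RunStatus.Leased, RunStatus.Failed) => true,     // terminal failure
        (RunStatus.Leased, RunStatus.Pending) => true,    // retry-eligible failure, rescheduled
        _ => false,                                       // succeeded and failed are terminal
    };
}
```

In this sketch, succeeded and failed have no outgoing transitions, which is what makes them terminal.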

Claim and execution flow

On each iteration of the processing loop:

  • Worker claims due runs with a lease (ClaimDueRunsAsync).
  • Worker emits claimed/started events.
  • Job executes through the configured runner.
  • On success, run is marked succeeded.
  • On exception, retry eligibility is evaluated and the failure is recorded: a retry-eligible run returns to pending with a future retry time, otherwise the run becomes terminally failed.
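
The steps above can be sketched as a single loop iteration. Only the method name ClaimDueRunsAsync comes from this page; its signature here, the Run record, IRunStore, MarkSucceededAsync, MarkFailedAsync, and the in-memory store are assumptions made so the sketch is self-contained. Attempt is assumed to be the 1-based number of the current attempt.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins, not DurableStack's API.
public sealed record Run(string Id, int Attempt, int MaxAttempts);

public interface IRunStore
{
    Task<IReadOnlyList<Run>> ClaimDueRunsAsync(string workerName, TimeSpan lease, CancellationToken ct);
    Task MarkSucceededAsync(Run run, CancellationToken ct);
    // A null retryAt means the failure is terminal.
    Task MarkFailedAsync(Run run, Exception error, DateTimeOffset? retryAt, CancellationToken ct);
}

public static class WorkerLoop
{
    public static async Task ProcessOnceAsync(
        IRunStore store, Func<Run, CancellationToken, Task> runner,
        string workerName, TimeSpan lease, TimeSpan retryDelay, CancellationToken ct)
    {
        foreach (var run in await store.ClaimDueRunsAsync(workerName, lease, ct))
        {
            Console.WriteLine($"claimed/started {run.Id}"); // stand-in for event emission
            try
            {
                await runner(run, ct);
                await store.MarkSucceededAsync(run, ct);
            }
            catch (Exception ex)
            {
                // Retry while Attempt < MaxAttempts; otherwise record a terminal failure.
                DateTimeOffset? retryAt = run.Attempt < run.MaxAttempts
                    ? DateTimeOffset.UtcNow + retryDelay
                    : (DateTimeOffset?)null;
                await store.MarkFailedAsync(run, ex, retryAt, ct);
            }
        }
    }
}

// Minimal in-memory store used only to exercise the loop; it does no real leasing.
public sealed class InMemoryRunStore : IRunStore
{
    public Queue<Run> Due { get; } = new();
    public List<string> Log { get; } = new();

    public Task<IReadOnlyList<Run>> ClaimDueRunsAsync(string workerName, TimeSpan lease, CancellationToken ct)
    {
        IReadOnlyList<Run> claimed = Due.ToArray();
        Due.Clear();
        return Task.FromResult(claimed);
    }

    public Task MarkSucceededAsync(Run run, CancellationToken ct)
    {
        Log.Add($"{run.Id}:succeeded");
        return Task.CompletedTask;
    }

    public Task MarkFailedAsync(Run run, Exception error, DateTimeOffset? retryAt, CancellationToken ct)
    {
        Log.Add(retryAt is null ? $"{run.Id}:failed" : $"{run.Id}:retry");
        return Task.CompletedTask;
    }
}
```

The sketch omits the lease heartbeat and treats every exception as retry-eligible until attempts run out; a real worker would also distinguish the non-retry path mentioned in the lifecycle.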

Retry behavior

Retry eligibility is based on attempt count:

  • Retry while Attempt < MaxAttempts.
  • Transition to terminal failed when attempts are exhausted.

Delay calculation uses:

  • per-job retry behavior (FixedDelay or Backoff)
  • per-job initial delay when provided
  • otherwise runtime defaults (options.RetryDelay, options.RetryMaxDelay)
  • optional jitter (options.RetryJitterEnabled)
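
A minimal sketch of how these inputs could combine. This page does not state the backoff formula, whether the maximum applies to fixed delays, or the jitter range, so the sketch assumes that Backoff doubles the delay on each attempt, that only Backoff is capped at the maximum delay, and that jitter scales the result by a factor between 0.5 and 1.0. RetryPolicy, ShouldRetry, and ComputeDelay are hypothetical names.

```csharp
using System;

public enum RetryBehavior { FixedDelay, Backoff }

// Hypothetical helper; the formulas are assumptions, not DurableStack's implementation.
public static class RetryPolicy
{
    // Retry while Attempt < MaxAttempts.
    public static bool ShouldRetry(int attempt, int maxAttempts) => attempt < maxAttempts;

    public static TimeSpan ComputeDelay(
        RetryBehavior behavior, int attempt, TimeSpan? jobInitialDelay,
        TimeSpan defaultDelay, TimeSpan maxDelay, bool jitterEnabled, Random random)
    {
        // The per-job initial delay wins when provided; otherwise use the runtime default.
        double ms = (jobInitialDelay ?? defaultDelay).TotalMilliseconds;

        if (behavior == RetryBehavior.Backoff)
        {
            // Assumed: exponential doubling per attempt (attempt is 1-based), capped at maxDelay.
            int exponent = Math.Min(Math.Max(attempt - 1, 0), 30);
            ms = Math.Min(ms * Math.Pow(2, exponent), maxDelay.TotalMilliseconds);
        }

        if (jitterEnabled)
        {
            // Assumed: scale by a random factor in [0.5, 1.0) to spread retries out.
            ms *= 0.5 + 0.5 * random.NextDouble();
        }

        return TimeSpan.FromMilliseconds(ms);
    }
}
```

For example, under these assumptions a Backoff job with a 2 second initial delay waits 2, 4, then 8 seconds after attempts 1, 2, and 3 when jitter is off.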

Lease heartbeat and recovery

During execution, DurableStack extends the lease periodically.

  • The heartbeat interval is half the lease duration, with a minimum of 250 ms.
  • If a worker dies or stops extending its lease, the run becomes reclaimable once the lease expires.

This enables automatic recovery from worker interruption.
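
The interval rule above, together with one possible shape for the heartbeat loop, might look like the following. LeaseHeartbeat, IntervalFor, RunAsync, and the extendLease callback are hypothetical; how DurableStack actually schedules and persists lease extensions is not described on this page.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not DurableStack's API.
public static class LeaseHeartbeat
{
    public static readonly TimeSpan MinimumInterval = TimeSpan.FromMilliseconds(250);

    // Half the lease duration, but never less than 250 ms.
    public static TimeSpan IntervalFor(TimeSpan leaseDuration)
    {
        TimeSpan half = TimeSpan.FromTicks(leaseDuration.Ticks / 2);
        return half < MinimumInterval ? MinimumInterval : half;
    }

    // Calls extendLease once per interval until the token is cancelled.
    public static async Task RunAsync(
        TimeSpan leaseDuration, Func<CancellationToken, Task> extendLease, CancellationToken ct)
    {
        TimeSpan interval = IntervalFor(leaseDuration);
        try
        {
            while (true)
            {
                await Task.Delay(interval, ct);
                await extendLease(ct);
            }
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // The job finished or the worker is shutting down, so stop extending.
        }
    }
}
```

A worker would typically start RunAsync alongside the job and cancel the token when the job completes. If the process dies instead, no further extensions happen and the lease simply expires.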

Provider-level concurrency model

Claiming is implemented with provider-specific concurrency primitives:

  • PostgreSQL/MySQL: FOR UPDATE SKIP LOCKED
  • SQL Server: lock hints such as UPDLOCK + READPAST
  • SQLite: transactional select-and-update semantics
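
To make the primitives above concrete, here is a sketch of the kind of row-selection query each one implies. The table and column names (runs, id, status, due_at), the batch size of 10, and the ClaimSql helper are invented for illustration; they are not DurableStack's schema or the exact statements it issues.

```csharp
using System;

// Hypothetical queries against an invented 'runs' table; not DurableStack's actual SQL.
public static class ClaimSql
{
    public static string SelectDueForClaim(string provider) => provider switch
    {
        // Rows locked by another claimer are skipped instead of blocking.
        "postgres" or "mysql" =>
            @"SELECT id FROM runs
              WHERE status = 'pending' AND due_at <= CURRENT_TIMESTAMP
              ORDER BY due_at
              LIMIT 10
              FOR UPDATE SKIP LOCKED",

        // UPDLOCK takes update locks on selected rows; READPAST skips rows that are already locked.
        "sqlserver" =>
            @"SELECT TOP (10) id FROM runs WITH (UPDLOCK, READPAST)
              WHERE status = 'pending' AND due_at <= SYSUTCDATETIME()
              ORDER BY due_at",

        // SQLite has no row-level locking clause; run this inside a write
        // transaction (for example BEGIN IMMEDIATE) together with the UPDATE.
        "sqlite" =>
            @"SELECT id FROM runs
              WHERE status = 'pending' AND due_at <= CURRENT_TIMESTAMP
              ORDER BY due_at
              LIMIT 10",

        _ => throw new ArgumentException($"Unknown provider: {provider}", nameof(provider)),
    };
}
```

In each case the select would run in the same transaction as the update that marks the rows leased and records the lease expiry, so that two workers cannot claim the same run.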

Practical guarantees

DurableStack is designed for effectively-once processing in normal operation with durable retry behavior.

Because a run can be re-attempted after a failure or lease expiry, a handler may execute more than once for the same run, so handlers should be idempotent.
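
One common way to make a handler idempotent is to record a key per unit of work and skip the side effect when the key has already been seen. The sketch below keeps the keys in memory purely to show the shape; IdempotentHandler and the key format are hypothetical, and a production handler would need to store the key durably and commit it atomically with the side effect.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical handler; the in-memory key set does not survive a restart.
public sealed class IdempotentHandler
{
    private readonly ConcurrentDictionary<string, bool> _processed = new();
    private int _sideEffects;

    public int SideEffects => _sideEffects;

    // Returns true if the side effect ran, false if this key was already handled.
    public bool Handle(string idempotencyKey)
    {
        // TryAdd succeeds for exactly one caller per key, even under concurrency.
        if (!_processed.TryAdd(idempotencyKey, true))
        {
            return false;
        }

        Interlocked.Increment(ref _sideEffects); // stand-in for the real work
        return true;
    }
}
```

With this pattern, a second execution of the same run after a lease expiry becomes a no-op rather than a duplicate side effect.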

Operational implications

  • Keep WorkerName unique per process/container.
  • Set lease duration to exceed normal execution time for common jobs.
  • Use retries intentionally and monitor failure trends.
  • Treat idempotency as a required contract for production handlers.
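
For the first point, one simple way to build a name that is unique per process is to combine the host name, the process id, and a short random suffix. WorkerNames is a hypothetical helper; how the resulting value is assigned to WorkerName depends on DurableStack's configuration API, which is not shown on this page.

```csharp
using System;

// Hypothetical helper for generating a per-process worker name.
public static class WorkerNames
{
    public static string Create()
    {
        // The random suffix guards against collisions when containers share
        // a host name or every process runs as pid 1.
        string suffix = Guid.NewGuid().ToString("N").Substring(0, 8);
        return $"{Environment.MachineName}-{Environment.ProcessId}-{suffix}".ToLowerInvariant();
    }
}
```

Calling Create once at startup and reusing the value keeps the name stable for the life of the process.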