Compensation Patterns: When Rollback Isn't Possible
In a database transaction, rollback is simple: issue ROLLBACK and every uncommitted change disappears, as if it had never been made.
In a distributed system, there is no rollback. Once you've called an external API, that call happened. The email was sent. The payment was charged. The order was placed.
You can't undo these actions. But you can compensate for them.
The Saga Pattern
The saga pattern (originally described by Hector Garcia-Molina and Kenneth Salem in 1987) replaces a single distributed transaction with a sequence of local transactions, each paired with a compensating action.
Forward flow:
reserve_inventory → charge_payment → ship_order → send_confirmation
If ship_order fails:
refund_payment → release_inventory
(reverse order, skip what doesn't need compensation)
TIATON implements this automatically through skill compensators.
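To make the pattern concrete outside of any framework, here is a minimal hand-rolled sketch. It is not TIATON's implementation: the `Step` class, `run_saga`, and the lambda-based steps are hypothetical names chosen for illustration, and actions are assumed to signal failure by raising an exception.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]                       # raises on failure
    compensate: Optional[Callable[[dict], None]] = None  # None: nothing to undo


def run_saga(steps: List[Step], state: dict) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log = []
    completed = []
    for step in steps:
        try:
            step.action(state)
        except Exception as exc:
            log.append(f"{step.name}: FAILURE ({exc})")
            for done in reversed(completed):
                if done.compensate is None:
                    continue  # skip what doesn't need compensation
                done.compensate(state)
                log.append(f"compensated {done.name}")
            return log
        completed.append(step)
        log.append(f"{step.name}: SUCCESS")
    return log


def fail_shipping(state: dict) -> None:
    raise RuntimeError("carrier unavailable")


state = {}
saga = [
    Step("reserve_inventory",
         lambda s: s.update(reserved=True),
         lambda s: s.update(reserved=False)),
    Step("charge_payment",
         lambda s: s.update(charged=True),
         lambda s: s.update(charged=False)),
    Step("ship_order", fail_shipping),
    Step("send_confirmation", lambda s: s.update(confirmed=True)),
]
log = run_saga(saga, state)
print("\n".join(log))
```

The failing `ship_order` step never reaches the `completed` list, so only the two steps before it are compensated, most recent first. `send_confirmation` is never attempted.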
Declaring Compensators
Each skill can declare a compensator — a function that reverses its effects:
{
  "key": "charge_payment",
  "handler": "charge_customer",
  "compensator": "refund_customer",
  "requires": ["inventory_reserved"],
  "ensures": ["payment_charged"]
}
The compensator is a regular handler with the same signature:
def charge_customer(ctx, state):
    """Forward action: submit the charge as an asynchronous job."""
    payload = new("payments.v1.ChargeRequest", {
        "customer_id": state["customer_id"],
        "amount": state["order_total"],
        "currency": "USD",
        # Stable key so a retried charge is not applied twice
        "idempotency_key": state["order_id"] + "_charge",
    })
    return RUNNING, submit_job(
        "payments.v1.PaymentService/Charge", payload
    )


def on_charge_complete(ctx, state):
    """Store what the compensator will need later."""
    state["payment_id"] = ctx.event.result.payment_id
    state["charged_amount"] = ctx.event.result.amount
    return SUCCESS


def refund_customer(ctx, state):
    """Compensator: reverse the charge."""
    if not state.get("payment_id"):
        return SUCCESS  # Nothing to refund

    payload = new("payments.v1.RefundRequest", {
        "payment_id": state["payment_id"],
        "amount": state["charged_amount"],
        "reason": "Order processing failure - automatic compensation",
        # Assumes RefundRequest accepts a key like ChargeRequest does,
        # so a retried refund is not paid out twice
        "idempotency_key": state["order_id"] + "_refund",
    })
    return RUNNING, submit_job(
        "payments.v1.PaymentService/Refund", payload
    )
Automatic Compensation Cascade
When a skill fails, TIATON triggers compensation for all previously completed skills in reverse order:
Execution order:
1. validate_order → SUCCESS
2. reserve_inventory → SUCCESS (compensator: release_inventory)
3. charge_payment → SUCCESS (compensator: refund_payment)
4. ship_order → FAILURE ← failure here
Compensation (automatic, reverse order):
3. refund_payment → SUCCESS ← charge reversed
2. release_inventory → SUCCESS ← inventory freed
1. (no compensator) ← validation is stateless
You don't write this orchestration logic. The agent tracks CompletedSkills in execution order and reverses through them on failure.
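The bookkeeping behind that cascade can be pictured roughly as follows. This is a sketch under assumptions, not TIATON's internals: `compensation_plan` is a hypothetical helper, the declarations are reduced to the fields that matter here, and the handler name `reserve_items` is invented for the example.

```python
def compensation_plan(completed_skills, declarations):
    """Return compensator names for completed skills, most recent first."""
    plan = []
    for key in reversed(completed_skills):
        compensator = declarations[key].get("compensator")
        if compensator:  # stateless skills declare no compensator
            plan.append(compensator)
    return plan


# Skill declarations keyed by skill name (hypothetical, trimmed).
declarations = {
    "validate_order": {"handler": "validate_order"},
    "reserve_inventory": {"handler": "reserve_items",
                          "compensator": "release_inventory"},
    "charge_payment": {"handler": "charge_customer",
                       "compensator": "refund_payment"},
    "ship_order": {"handler": "ship_order"},
}

# ship_order failed, so it never made it into the completed list.
completed = ["validate_order", "reserve_inventory", "charge_payment"]
plan = compensation_plan(completed, declarations)
print(plan)
```

Keeping the completed list in execution order is what makes the reversal trivial: the plan is just that list read backwards, minus the skills that have nothing to undo.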
Compensation Strategies
Different scenarios need different approaches:
Full Reversal
The simplest case — undo everything:
def release_inventory(ctx, state):
    """Release all reserved items."""
    payload = new("inventory.v1.ReleaseRequest", {
        "reservation_id": state["reservation_id"],
    })
    return RUNNING, submit_job(
        "inventory.v1.InventoryService/Release", payload
    )
Partial Compensation
Sometimes the original action can't be erased, so you compensate with a different action that neutralizes its effect:
def void_credit_check(ctx, state):
    """Mark credit inquiry as voided (can't delete it, but can flag it)."""
    payload = new("credit.v1.VoidInquiryRequest", {
        "inquiry_id": state["credit_inquiry_id"],
        "reason": "Application cancelled due to processing error",
    })
    return RUNNING, submit_job(
        "credit.v1.CreditService/VoidInquiry", payload
    )
Notification-Based Compensation
When you can't reverse an action, notify the affected parties:
def notify_cancellation(ctx, state):
    """Can't un-send the approval email, but can send cancellation."""
    payload = new("notifications.v1.SendRequest", {
        "to": state["applicant_email"],
        "template": "application_cancelled",
        "data": {
            "reason": "Processing error - your application will be re-evaluated",
            "reference": state["application_id"],
        },
    })
    return RUNNING, submit_job(
        "notifications.v1.NotificationService/Send", payload
    )
The Audit Trail
Every compensation is recorded in the session trace:
{
  "tick": 5,
  "type": "compensation",
  "trigger": "ship_order FAILURE",
  "compensations": [
    {
      "skill": "charge_payment",
      "compensator": "refund_payment",
      "status": "success",
      "duration_ms": 1203,
      "result": {
        "refund_id": "ref_8841",
        "amount": 299.99
      }
    },
    {
      "skill": "reserve_inventory",
      "compensator": "release_inventory",
      "status": "success",
      "duration_ms": 45,
      "result": {
        "items_released": 3
      }
    }
  ]
}
No detective work. No manual investigation. The trace shows exactly what failed, what was compensated, and whether compensation succeeded.
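Because the trace is plain structured data, turning it into a report takes only a few lines. The sketch below assumes the entry shape shown above (trimmed to the fields it uses). `summarize` and `fully_compensated` are hypothetical helper names, not part of TIATON.

```python
import json

# Trimmed copy of the trace entry shown above.
TRACE_ENTRY = json.loads("""
{
  "tick": 5,
  "type": "compensation",
  "trigger": "ship_order FAILURE",
  "compensations": [
    {"skill": "charge_payment", "compensator": "refund_payment",
     "status": "success", "duration_ms": 1203},
    {"skill": "reserve_inventory", "compensator": "release_inventory",
     "status": "success", "duration_ms": 45}
  ]
}
""")


def summarize(entry):
    """Turn one compensation trace entry into human-readable lines."""
    lines = [f"tick {entry['tick']}: {entry['trigger']}"]
    for comp in entry["compensations"]:
        lines.append(
            f"  {comp['skill']} -> {comp['compensator']}: "
            f"{comp['status']} in {comp['duration_ms']} ms"
        )
    return lines


def fully_compensated(entry):
    """True only if every recorded compensation succeeded."""
    return all(c["status"] == "success" for c in entry["compensations"])


summary = summarize(TRACE_ENTRY)
print("\n".join(summary))
print("fully compensated:", fully_compensated(TRACE_ENTRY))
```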
When Compensation Fails
What if the compensator itself fails? TIATON records the failure and marks the session as requiring manual intervention:
{
  "status": "compensation_failed",
  "failed_compensations": [
    {
      "skill": "charge_payment",
      "compensator": "refund_payment",
      "error": "Payment provider timeout after 30s",
      "payment_id": "pay_7721",
      "amount": 299.99
    }
  ]
}
This creates an actionable alert: "Refund of $299.99 for payment pay_7721 failed — manual refund required." The system doesn't silently swallow the failure.
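One way such a record could be turned into alerts is sketched below. The function name, the message wording, and the idea of treating every field other than `skill`, `compensator`, and `error` as free-form context are all assumptions made for this example. How TIATON actually raises alerts is not specified here.

```python
CORE_FIELDS = {"skill", "compensator", "error"}


def manual_intervention_alerts(session):
    """Build one alert line per failed compensation in a session record."""
    if session.get("status") != "compensation_failed":
        return []
    alerts = []
    for failure in session.get("failed_compensations", []):
        # Anything beyond the core fields is context an operator will need.
        context = ", ".join(
            f"{key}={value}"
            for key, value in failure.items()
            if key not in CORE_FIELDS
        )
        alerts.append(
            f"MANUAL ACTION: {failure['compensator']} for "
            f"{failure['skill']} failed ({failure['error']}) [{context}]"
        )
    return alerts


session = {
    "status": "compensation_failed",
    "failed_compensations": [
        {
            "skill": "charge_payment",
            "compensator": "refund_payment",
            "error": "Payment provider timeout after 30s",
            "payment_id": "pay_7721",
            "amount": 299.99,
        }
    ],
}
alerts = manual_intervention_alerts(session)
print("\n".join(alerts))
```

Carrying the payment ID and amount in the failure record matters: the operator can issue the manual refund without reconstructing the session first.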
Design Principles
- Every side effect needs a compensator: if a skill calls an external system, declare how to reverse it.
- Compensators must be idempotent: retries mean they might be called more than once.
- Use idempotency keys: external APIs should handle duplicate compensation requests gracefully.
- Log everything: the compensation trail is as important as the execution trail.
- Accept imperfection: some compensations, such as notifications, are "best effort". That's okay, as long as the audit trail captures what was attempted.
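The idempotency principles are easiest to see with a failure that happens after the provider has already done the work. The sketch below uses a hypothetical in-memory `FakePaymentProvider` that processes the refund and then times out on the first call. The retry reuses the same key, so the refund is recorded only once. None of these names come from TIATON or from a real payment API.

```python
class FakePaymentProvider:
    """In-memory stand-in for an external payment API (hypothetical)."""

    def __init__(self, fail_first=0):
        self.refunds = {}  # idempotency_key -> refunded amount
        self.calls = 0
        self._fail_first = fail_first

    def refund(self, idempotency_key, amount):
        self.calls += 1
        # Apply the refund at most once per key.
        if idempotency_key not in self.refunds:
            self.refunds[idempotency_key] = amount
        if self.calls <= self._fail_first:
            # The refund went through, but the caller never hears about it.
            raise TimeoutError("provider timeout")
        return f"refund:{idempotency_key}"


def refund_with_retry(provider, order_id, amount, attempts=3):
    """Retry a refund, sending the same idempotency key on every attempt."""
    key = f"{order_id}_refund"
    last_error = None
    for _ in range(attempts):
        try:
            return provider.refund(key, amount)
        except TimeoutError as exc:
            last_error = exc
    raise RuntimeError(f"refund failed after {attempts} attempts") from last_error


provider = FakePaymentProvider(fail_first=1)
refund_id = refund_with_retry(provider, "order_42", 299.99)
total_refunded = sum(provider.refunds.values())
print(refund_id, provider.calls, total_refunded)
```

Had the key been regenerated on each attempt (for example from a timestamp), the second call would have looked like a new refund and the customer would have been paid twice.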
The goal isn't perfect rollback — that's impossible in distributed systems. The goal is predictable recovery with complete visibility.