Observability for Decisions: Beyond Logs

Your system logs say: "loan_application_123: rejected". Great. Why was it rejected?

You grep through more logs. You find "risk_score: 0.73, threshold: 0.70". Okay, the risk score was too high. But why was the risk score 0.73? Which rules contributed to it? Was this the correct threshold for this customer segment? When was this threshold last changed?

This is the observability gap. Traditional application monitoring tells you what happened. Decision observability tells you why it happened.

The Anatomy of a Decision Trace

In TIATON, every decision session produces a trace — a complete record of every step, every rule evaluation, and every state change.

A session trace contains (see the sketch after this list):

  • Input facts — the data that entered the system
  • Tick-by-tick execution — what the agent did at each step
  • DMN evaluations — which rules matched, which didn't, and why
  • State transitions — how the state changed after each skill
  • Timing data — how long each step took
  • Final decision — the outcome with full reasoning chain
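
Concretely, a trace has a predictable shape. Here is a minimal sketch of that structure as Python dataclasses, mirroring the example trace shown next; any names beyond the fields visible in the JSON are illustrative, not TIATON's actual schema:

from dataclasses import dataclass, field
from typing import Any

@dataclass
class Phase:
    tick: int                      # which step of the session this was
    skill: str                     # the skill the agent executed
    status: str                    # "success" or "failure"
    duration_ms: int               # how long the step took
    state_after: dict[str, Any]    # state snapshot after the skill ran
    dmn_evaluation: dict[str, Any] | None = None  # rule matches, if any

@dataclass
class SessionTrace:
    session_id: str
    agent_key: str
    status: str
    ticks: int
    duration_ms: int
    input_facts: dict[str, Any]    # the data that entered the system
    phases: list[Phase] = field(default_factory=list)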

Reading a Trace

Here's a simplified trace for a loan decision:

{
  "session_id": "sess_a7f3c9e2",
  "agent_key": "loan_processing",
  "status": "done",
  "ticks": 4,
  "duration_ms": 1847,
  "input_facts": {
    "applicant_name": "John Doe",
    "credit_score": 620,
    "annual_income": 45000,
    "loan_amount": 75000
  },
  "phases": [
    {
      "tick": 1,
      "skill": "validate_application",
      "status": "success",
      "duration_ms": 12,
      "state_after": {
        "application_validated": true
      }
    },
    {
      "tick": 2,
      "skill": "check_credit",
      "status": "success",
      "duration_ms": 834,
      "state_after": {
        "credit_checked": true,
        "credit_report_id": "cr_8821"
      }
    },
    {
      "tick": 3,
      "skill": "evaluate_risk",
      "status": "success",
      "duration_ms": 45,
      "dmn_evaluation": {
        "domain": "lending",
        "tables_evaluated": ["loan_eligibility", "risk_scoring"],
        "results": {
          "loan_eligibility": {
            "matched_rule": 3,
            "decision": "manual_review",
            "note": "Borderline — needs review"
          },
          "risk_scoring": {
            "matched_rule": 5,
            "risk_level": "medium",
            "risk_score": 0.62
          }
        }
      },
      "state_after": {
        "risk_evaluated": true,
        "decision": "manual_review"
      }
    },
    {
      "tick": 4,
      "skill": "notify_applicant",
      "status": "success",
      "duration_ms": 956,
      "state_after": {
        "applicant_notified": true,
        "notification_id": "ntf_3391"
      }
    }
  ]
}

An auditor opens this trace and sees: the applicant had a credit score of 620 and an annual income of 45,000. Rule 3 in the loan_eligibility table matched ("Borderline — needs review"), and the risk_scoring table returned a medium risk level with a score of 0.62. The final decision was manual_review.

No guessing. No log correlation. No "let me check with the dev team."
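
That reading can also be automated. The following sketch walks an exported trace and prints the reasoning chain; it assumes only the JSON structure shown above, and the file name is hypothetical:

import json

# Load a trace exported as JSON (file name is hypothetical).
with open("sess_a7f3c9e2.json") as f:
    trace = json.load(f)

print(f"Session {trace['session_id']} -> {trace['status']} in {trace['ticks']} ticks")

for phase in trace["phases"]:
    print(f"  tick {phase['tick']}: {phase['skill']} "
          f"({phase['status']}, {phase['duration_ms']} ms)")
    # Where a skill evaluated DMN tables, show which rules drove the decision.
    results = phase.get("dmn_evaluation", {}).get("results", {})
    for table, result in results.items():
        print(f"    {table}: rule {result['matched_rule']} matched -> {result}")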

The 30-Second Audit

The TIATON management UI makes this visual. For any session:

  1. Click the session in the list
  2. See the execution graph with phases
  3. Click any node — see the DMN evaluation, matched rules, state changes
  4. Export the trace as JSON for compliance records

What used to take a days-long compliance investigation now takes 30 seconds.
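
The export step is scriptable as well. As a rough sketch, assuming your deployment serves traces over HTTP at a hypothetical endpoint like /sessions/{id}/trace (the real path, port, and authentication depend on your TIATON installation):

import json
import urllib.request

SESSION_ID = "sess_a7f3c9e2"
# Hypothetical endpoint; check your deployment for the actual path and auth.
URL = f"http://localhost:8080/sessions/{SESSION_ID}/trace"

with urllib.request.urlopen(URL) as resp:
    trace = json.load(resp)

# Archive the full trace for compliance records.
with open(f"{SESSION_ID}.json", "w") as f:
    json.dump(trace, f, indent=2)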

Error Traces

Traces are even more valuable when things go wrong. When a skill fails and compensation triggers:

{
  "tick": 3,
  "skill": "open_position",
  "status": "failure",
  "error": "Market closed: NASDAQ after-hours rejected",
  "compensation": [
    {
      "skill": "reserve_margin",
      "compensator": "release_margin",
      "status": "success",
      "duration_ms": 23
    }
  ]
}

You see exactly what failed, what error occurred, and how the system cleaned up. The compensation trail is as visible as the happy path.
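
Error traces lend themselves to the same tooling. A small sketch, again assuming only the fields shown in the fragment above, that flags failed phases and their compensation trail:

def report_failures(trace: dict) -> None:
    """Print every failed phase and the compensators that cleaned up after it."""
    for phase in trace["phases"]:
        if phase["status"] != "failure":
            continue
        print(f"tick {phase['tick']}: {phase['skill']} failed: {phase['error']}")
        for comp in phase.get("compensation", []):
            print(f"  compensated {comp['skill']} via {comp['compensator']} "
                  f"({comp['status']}, {comp['duration_ms']} ms)")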

Connecting Rules to Outcomes

The real power is connecting the dots between rule versions and production outcomes.

"After we changed rule 5 in the risk_scoring table last Tuesday, what percentage of applications shifted from 'approve' to 'manual_review'?"

Because every session records which rule version was used, you can answer this directly. You can compare outcomes across rule versions and spot unintended consequences before they become problems.
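
As a sketch of that query over a batch of exported traces: the rule_version field below is hypothetical (the example trace above doesn't show where versions are recorded), so adjust the names to your deployment's trace schema:

from collections import Counter, defaultdict

def decision_mix_by_version(traces: list[dict], table: str) -> dict[str, Counter]:
    """Group final decisions by the version of one DMN table.

    Assumes each table result carries a hypothetical 'rule_version' field
    alongside 'decision'; both names may differ in your trace schema.
    """
    mix: dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        for phase in trace["phases"]:
            result = phase.get("dmn_evaluation", {}).get("results", {}).get(table)
            if result:
                mix[result["rule_version"]][result["decision"]] += 1
    return mix

# e.g. compare the 'approve' vs 'manual_review' share before and after a change:
# for version, counts in decision_mix_by_version(traces, "loan_eligibility").items():
#     total = sum(counts.values())
#     print(version, {d: f"{n / total:.0%}" for d, n in counts.items()})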

What Good Decision Observability Looks Like

Metric                                Typical System            With Decision Traces
Time to explain a decision            Hours to days             30 seconds
Time to find a rule-related bug       Days                      Minutes
Compliance audit preparation          Weeks                     Automated export
Impact analysis of rule changes       Manual + guesswork        Query across sessions
Root cause of rejected applications   Developer investigation   Click the session

This isn't just about compliance — though compliance teams love it. It's about building trust in automated decisions. When anyone in the organization can understand why a decision was made, they trust the system to make those decisions.