Behavior Trees: The Engine Behind Goal-Directed Agents

Under every TIATON agent is a behavior tree. Not a flowchart. Not a state machine. A behavior tree — the same data structure that game developers use to make NPCs decide whether to attack, flee, or patrol.

It turns out this model is remarkably well-suited for business process orchestration.

What Is a Behavior Tree?

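A behavior tree is a rooted tree (a directed acyclic graph in which every node except the root has exactly one parent). It is evaluated, or "ticked", from the root down, and each node returns one of three statuses: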

SUCCESS — the action completed
FAILURE — the action failed
RUNNING — the action is in progress (async)

Nodes compose through two main types:

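Sequence — runs children left to right. Stops at the first child that returns FAILURE (or RUNNING) and returns that status; returns SUCCESS only if every child succeeds.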

Sequence
├── ValidateInput    → SUCCESS ✓
├── CheckCredit      → SUCCESS ✓
├── EvaluateRisk     → FAILURE ✗ ← stops here
└── MakeDecision     (not reached)

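Fallback — runs children left to right. Stops at the first child that returns SUCCESS (or RUNNING) and returns that status; returns FAILURE only if every child fails.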

Fallback
├── TryPrimaryRoute  → FAILURE ✗
├── TryBackupRoute   → SUCCESS ✓ ← stops here
└── ManualEscalation (not reached)

These two composites give you if/else, try/catch, and sequential execution — all in a composable, declarative structure.
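The two composites can be sketched in a few lines of Go. This is a minimal illustration, not TIATON's internal/bt code: the Node interface with a single Tick method, the Leaf type, and the two run...Example helpers are hypothetical names introduced here. The examples reproduce the two diagrams above.

```go
package main

import "fmt"

// Status is what a node reports after one tick.
type Status int

const (
	Success Status = iota
	Failure
	Running
)

// Node is a hypothetical minimal interface for this sketch.
type Node interface {
	Tick() Status
}

// Leaf returns a fixed status and records whether it was ticked.
type Leaf struct {
	Name   string
	Result Status
	Ticked bool
}

func (l *Leaf) Tick() Status {
	l.Ticked = true
	return l.Result
}

// Sequence stops at the first child that does not succeed.
type Sequence struct{ Children []Node }

func (s Sequence) Tick() Status {
	for _, c := range s.Children {
		if st := c.Tick(); st != Success {
			return st // Failure or Running ends the pass
		}
	}
	return Success
}

// Fallback stops at the first child that does not fail.
type Fallback struct{ Children []Node }

func (f Fallback) Tick() Status {
	for _, c := range f.Children {
		if st := c.Tick(); st != Failure {
			return st // Success or Running ends the pass
		}
	}
	return Failure
}

// runSequenceExample mirrors the Sequence diagram above.
func runSequenceExample() (Status, bool) {
	last := &Leaf{Name: "MakeDecision", Result: Success}
	seq := Sequence{Children: []Node{
		&Leaf{Name: "ValidateInput", Result: Success},
		&Leaf{Name: "CheckCredit", Result: Success},
		&Leaf{Name: "EvaluateRisk", Result: Failure},
		last,
	}}
	return seq.Tick(), last.Ticked
}

// runFallbackExample mirrors the Fallback diagram above.
func runFallbackExample() (Status, bool) {
	last := &Leaf{Name: "ManualEscalation", Result: Success}
	fb := Fallback{Children: []Node{
		&Leaf{Name: "TryPrimaryRoute", Result: Failure},
		&Leaf{Name: "TryBackupRoute", Result: Success},
		last,
	}}
	return fb.Tick(), last.Ticked
}

func main() {
	st, reached := runSequenceExample()
	fmt.Println(st == Failure, reached)
	st, reached = runFallbackExample()
	fmt.Println(st == Success, reached)
}
```

Both composites are the same loop with the stopping condition inverted, which is why a Fallback reads as try/catch and a Sequence as a chain of guarded steps.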

Why Not State Machines?

State machines are the traditional choice for workflow engines. But they have a scaling problem:

States: [idle, validating, credit_check, risk_eval, deciding, approved, rejected, error]
Transitions: idle→validating, validating→credit_check, validating→error,
             credit_check→risk_eval, credit_check→error, risk_eval→deciding,
             risk_eval→error, deciding→approved, deciding→rejected, deciding→error

That's 10 transitions for 8 states. Now add retry logic, compensation, parallel execution, and timeout handling:

States: [idle, validating, credit_check, credit_check_retry,
         risk_eval, deciding, approved, rejected, error,
         compensating, compensating_credit, compensating_inventory,
         waiting_approval, approval_timeout, ...]
Transitions: 47 transitions and counting...

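In the worst case the number of transitions grows quadratically with the number of states, because any state may need an edge to any other (every step needs its own error, retry, and timeout edges). State machines become unmanageable for complex workflows.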

Behavior trees grow linearly. Adding a new step means adding one node. The composition handles the control flow.

TIATON's BT Implementation

The internal/bt package implements behavior trees in Go with extensions for persistence:

// Core types: the status a node reports after each tick
type Status int
const (
    Success Status = iota
    Failure
    Running
)

// Node interface
type Executor interface {
    Execute(ctx context.Context, view *View) Status
}

// Composite node constructors (signatures only; Node, View, and Policy
// are declared elsewhere in the package and are not shown here)
func Sequence(id string, children ...Node) Node
func Fallback(id string, children ...Node) Node
func ParallelWithMemory(id string, policy Policy, children ...Node) Node

The key extension is ParallelWithMemory — a parallel node that remembers which children have completed across ticks. This is what enables async skill execution:

ParallelWithMemory (RequireAll)
├── Skill: check_credit     → SUCCESS (completed tick 2)
├── Skill: verify_identity  → RUNNING (waiting for external service)
└── Skill: check_sanctions  → SUCCESS (completed tick 1)

Tree returns: RUNNING (one child still in progress)

Next tick, when the external service responds:

ParallelWithMemory (RequireAll)
├── Skill: check_credit     → (remembered SUCCESS)
├── Skill: verify_identity  → SUCCESS (event received)
└── Skill: check_sanctions  → (remembered SUCCESS)

Tree returns: SUCCESS (all children done)
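The memory behavior can be sketched as follows. This is an illustration under stated assumptions rather than the internal/bt implementation: parallelAll, scripted, and runParallelExample are hypothetical names, the node interface is reduced to a bare Tick method, and failing as soon as any child fails is an assumed reading of the RequireAll policy.

```go
package main

import "fmt"

type Status int

const (
	Success Status = iota
	Failure
	Running
)

// Node is a hypothetical minimal interface for this sketch.
type Node interface{ Tick() Status }

// scripted replays a fixed list of statuses, one per tick,
// and counts how many times it was ticked.
type scripted struct {
	results []Status
	calls   int
}

func (s *scripted) Tick() Status {
	i := s.calls
	if i >= len(s.results) {
		i = len(s.results) - 1
	}
	s.calls++
	return s.results[i]
}

// parallelAll remembers which children already succeeded and
// skips them on later ticks.
type parallelAll struct {
	children []Node
	done     []bool
}

func newParallelAll(children ...Node) *parallelAll {
	return &parallelAll{children: children, done: make([]bool, len(children))}
}

func (p *parallelAll) Tick() Status {
	result := Success
	for i, c := range p.children {
		if p.done[i] {
			continue // remembered SUCCESS: do not re-run
		}
		switch c.Tick() {
		case Success:
			p.done[i] = true
		case Failure:
			return Failure // assumed RequireAll behavior
		case Running:
			result = Running
		}
	}
	return result
}

// runParallelExample ticks the three-skill example three times and
// returns the status of each tick plus per-child tick counts.
func runParallelExample() ([]Status, []int) {
	credit := &scripted{results: []Status{Running, Success}}
	identity := &scripted{results: []Status{Running, Running, Success}}
	sanctions := &scripted{results: []Status{Success}}
	p := newParallelAll(credit, identity, sanctions)

	var statuses []Status
	for i := 0; i < 3; i++ {
		statuses = append(statuses, p.Tick())
	}
	return statuses, []int{credit.calls, identity.calls, sanctions.calls}
}

func main() {
	statuses, calls := runParallelExample()
	fmt.Println(statuses, calls)
}
```

The tick counts are the point of the sketch: a child that has already succeeded is never executed again, so a completed skill is not resubmitted while the node waits on its slower siblings.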

Session Persistence

The entire tree state serializes to JSON:

{
  "indexes": {
    "n1": { "status": "success" },
    "n2": { "status": "success" },
    "n3": { "status": "running", "composite_index": 1 },
    "n4": { "status": "success" },
    "n5": { "status": "running" }
  },
  "jobs": [
    {
      "job_id": "job_44f1",
      "node_id": "n5",
      "job_type": "identity.v1.VerificationService/Verify"
    }
  ]
}

Node IDs are deterministic: the same tree definition always produces the same IDs (n1, n2, n3, ...). This makes it possible to move a session between processes:

1. Serialize the session on server A.
2. Rebuild the tree on server B from the same definition.
3. Because the node IDs match, each saved state maps onto the correct node.
4. Resume execution where it left off.
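A round trip through the JSON shape shown above can be sketched with Go's encoding/json package. The Snapshot, NodeState, and Job types and the Save and Load helpers are hypothetical names chosen to match the field names in the example; the actual TIATON types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeState is the saved state of one node. CompositeIndex is a
// pointer so that index 0 can be told apart from "not present".
type NodeState struct {
	Status         string `json:"status"`
	CompositeIndex *int   `json:"composite_index,omitempty"`
}

// Job is an in-flight external call tied to a node ID.
type Job struct {
	JobID   string `json:"job_id"`
	NodeID  string `json:"node_id"`
	JobType string `json:"job_type"`
}

// Snapshot is the whole serialized session (hypothetical type).
type Snapshot struct {
	Indexes map[string]NodeState `json:"indexes"`
	Jobs    []Job                `json:"jobs"`
}

// Save serializes a snapshot, as server A would before handing off.
func Save(s Snapshot) ([]byte, error) {
	return json.Marshal(s)
}

// Load restores a snapshot, as server B would before resuming.
func Load(data []byte) (Snapshot, error) {
	var s Snapshot
	err := json.Unmarshal(data, &s)
	return s, err
}

// exampleSnapshot builds part of the session shown above.
func exampleSnapshot() Snapshot {
	one := 1
	return Snapshot{
		Indexes: map[string]NodeState{
			"n3": {Status: "running", CompositeIndex: &one},
			"n5": {Status: "running"},
		},
		Jobs: []Job{{
			JobID:   "job_44f1",
			NodeID:  "n5",
			JobType: "identity.v1.VerificationService/Verify",
		}},
	}
}

func main() {
	data, err := Save(exampleSnapshot())
	if err != nil {
		panic(err)
	}
	restored, err := Load(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.Indexes["n3"].Status, *restored.Indexes["n3"].CompositeIndex)
	fmt.Println(restored.Jobs[0].NodeID)
}
```

The snapshot carries no tree structure at all, only state keyed by node ID, which is why the rebuilt tree on the receiving side must assign the same IDs.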

The Agent Layer

The internal/bt/agent package builds goal-directed behavior on top of the behavior tree:

Agent Tick:
  1. Evaluate predicates (DMN + Starlark)
  2. Check which goals are unmet
  3. Find eligible skills (requires met, ensures not met)
  4. Calculate goal affinity (BFS backward)
  5. Select best skill (or parallel set)
  6. Build behavior tree nodes
  7. Execute one tick of the tree
  8. Process commands (job submissions, timer schedules)
  9. Record audit data

The agent doesn't hardcode the execution order. It re-evaluates the situation each tick and selects the best action based on current state.
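Steps 3 to 5 of the tick can be sketched as below. Everything here is an assumed interpretation for illustration: the Skill type, the map of boolean facts, and the reading of "goal affinity" as backward breadth-first distance from an unmet goal (lower is closer) are not taken from the internal/bt/agent code.

```go
package main

import "fmt"

// Skill declares what must hold before it runs (Requires) and what
// holds afterwards (Ensures). Hypothetical type for this sketch.
type Skill struct {
	Name     string
	Requires []string
	Ensures  []string
}

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// eligible: every requirement is met and at least one ensured
// predicate is still unmet.
func eligible(s Skill, facts map[string]bool) bool {
	for _, r := range s.Requires {
		if !facts[r] {
			return false
		}
	}
	for _, e := range s.Ensures {
		if !facts[e] {
			return true
		}
	}
	return false
}

// distances walks backward from the unmet goals, breadth first:
// a skill that ensures an unmet goal is at distance 0, a skill that
// ensures one of its unmet requirements is at distance 1, and so on.
func distances(skills []Skill, goals []string, facts map[string]bool) map[string]int {
	dist := map[string]int{}
	seen := map[string]bool{}
	var frontier []string
	for _, g := range goals {
		if !facts[g] && !seen[g] {
			seen[g] = true
			frontier = append(frontier, g)
		}
	}
	for depth := 0; len(frontier) > 0; depth++ {
		var next []string
		for _, pred := range frontier {
			for _, s := range skills {
				if _, ok := dist[s.Name]; ok || !contains(s.Ensures, pred) {
					continue
				}
				dist[s.Name] = depth
				for _, r := range s.Requires {
					if !facts[r] && !seen[r] {
						seen[r] = true
						next = append(next, r)
					}
				}
			}
		}
		frontier = next
	}
	return dist
}

// selectSkill picks the eligible skill closest to an unmet goal.
func selectSkill(skills []Skill, goals []string, facts map[string]bool) (string, bool) {
	dist := distances(skills, goals, facts)
	best, bestDist, found := "", 0, false
	for _, s := range skills {
		d, ok := dist[s.Name]
		if !ok || !eligible(s, facts) {
			continue
		}
		if !found || d < bestDist {
			best, bestDist, found = s.Name, d, true
		}
	}
	return best, found
}

// exampleSkills is a three-step chain ending in the goal predicate.
func exampleSkills() []Skill {
	return []Skill{
		{Name: "validate_input", Ensures: []string{"input_valid"}},
		{Name: "check_credit", Requires: []string{"input_valid"}, Ensures: []string{"credit_checked"}},
		{Name: "make_decision", Requires: []string{"credit_checked"}, Ensures: []string{"decision_made"}},
	}
}

func main() {
	goals := []string{"decision_made"}
	facts := map[string]bool{}
	for {
		name, ok := selectSkill(exampleSkills(), goals, facts)
		if !ok {
			break
		}
		fmt.Println("run:", name)
		for _, s := range exampleSkills() {
			if s.Name == name {
				for _, e := range s.Ensures {
					facts[e] = true // pretend the skill succeeded
				}
			}
		}
	}
}
```

Nothing in the skill list fixes the order validate_input, check_credit, make_decision. The order falls out of re-running the selection against the current facts on every tick.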

Deterministic Node IDs

This is subtle but critical. When you serialize a session and later rebuild the tree, the node IDs must match:

import "fmt"

type Controller struct {
    nodeSeq int  // deterministic counter, incremented once per node built
}

// nextNodeID returns n1, n2, n3, ... in call order
func (c *Controller) nextNodeID() string {
    c.nodeSeq++
    return fmt.Sprintf("n%d", c.nodeSeq)
}

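The same sequence of Build() calls yields the same IDs every time, so a session can resume in another process without storing the tree structure, only the state snapshot. The guarantee holds only while the tree definition is unchanged: inserting or reordering nodes shifts every later ID, and a snapshot taken before the change no longer maps onto the right nodes.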
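The property is easy to check with a small sketch built around the counter above. The Def type, the pre-order numbering in Build, and the buildIDs helper are assumptions made for illustration; the source only shows the counter itself.

```go
package main

import "fmt"

type Controller struct {
	nodeSeq int // deterministic counter
}

func (c *Controller) nextNodeID() string {
	c.nodeSeq++
	return fmt.Sprintf("n%d", c.nodeSeq)
}

// Def is a hypothetical tree definition: a name plus children.
type Def struct {
	Name     string
	Children []Def
}

// Build assigns IDs in pre-order (parent first, then children left
// to right) and records which definition node got which ID.
func (c *Controller) Build(d Def, out map[string]string) {
	id := c.nextNodeID()
	out[id] = d.Name
	for _, child := range d.Children {
		c.Build(child, out)
	}
}

// buildIDs builds a definition with a fresh Controller, as a new
// process would after a restart.
func buildIDs(d Def) map[string]string {
	out := map[string]string{}
	(&Controller{}).Build(d, out)
	return out
}

// exampleDef is the four-step Sequence used earlier in the article.
func exampleDef() Def {
	return Def{Name: "Sequence", Children: []Def{
		{Name: "ValidateInput"},
		{Name: "CheckCredit"},
		{Name: "EvaluateRisk"},
		{Name: "MakeDecision"},
	}}
}

func main() {
	a := buildIDs(exampleDef()) // "server A"
	b := buildIDs(exampleDef()) // "server B"
	for _, id := range []string{"n1", "n2", "n3", "n4", "n5"} {
		fmt.Println(id, a[id], a[id] == b[id])
	}
}
```

Because the IDs depend only on build order, two independent processes agree on them without exchanging anything beyond the tree definition.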

Why This Matters

Property	State Machine	Behavior Tree
Complexity growth	Up to O(n²) transitions	O(n) nodes
Parallel execution	Explicit fork/join	ParallelWithMemory
Error handling	Per-state handlers	Fallback composition
Async resume	Checkpoint + restore	Native RUNNING state
Reusability	States are unique	Subtrees are reusable
Determinism	Depends on implementation	Guaranteed by design

Behavior trees give TIATON a foundation that scales with workflow complexity while remaining predictable and debuggable. Every decision the agent makes traces back to the tree structure, the current state, and the predicate evaluations — all visible in the session trace.