AI Under Rule Control: The 80/20 Problem

Every AI pilot starts the same way. The demo is impressive. The prototype works. Leadership is excited. Then someone asks: "What's the accuracy?"

"About 80%."

"And the other 20%?"

Silence.

The Production Gap

80% accuracy sounds good until you do the math. If your system processes 10,000 decisions per day:

  • 8,000 correct decisions
  • 2,000 wrong decisions every single day

For loan approvals, that's 2,000 people getting wrong answers. For compliance checks, that's 2,000 potential violations. For fraud detection, that's 2,000 missed or false alerts.

No amount of prompt engineering fixes this. LLMs are probabilistic. They're brilliant at understanding context, extracting information, and generating text. They're terrible at consistent, deterministic decisions.

The Split: AI Extracts, Rules Decide

The solution isn't to remove AI. It's to use AI where it excels and rules where consistency matters.

┌─────────────────────────────────────────────┐
│                Input Document               │
│  "The applicant, John Doe, age 34, works    │
│   at Acme Corp earning $85,000/year..."     │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│             AI Extraction Layer             │
│  LLM extracts structured data:              │
│  { name: "John Doe", age: 34,               │
│    employer: "Acme Corp", income: 85000 }   │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│          Rule-Based Decision Layer          │
│  DMN tables evaluate:                       │
│  - Eligibility: age ≥ 18 ✓, income > 30k ✓  │
│  - Risk scoring: income/loan ratio = 1.7    │
│  - Decision: APPROVE (rule 5 matched)       │
└─────────────────────────────────────────────┘

AI handles the fuzzy part (understanding unstructured text). Rules handle the precise part (making the decision). The decision is always explainable because it maps to a specific rule in a versioned table.
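To make "a specific rule in a versioned table" concrete, here is what a minimal loan_eligibility decision table might look like. The rules and thresholds are illustrative: they match the numbers in the diagram above, not any real lending policy. With a FIRST hit policy, rules are tried top to bottom until one matches:

loan_eligibility (hit policy: FIRST, illustrative)

Rule   Age      Income       Income/Loan Ratio   Decision
1      < 18     -            -                   DECLINE
2      -        ≤ 30,000     -                   DECLINE
3      -        -            < 1.0               DECLINE
4      -        -            1.0 to 1.5          MANUAL_REVIEW
5      ≥ 18     > 30,000     > 1.5               APPROVE

John Doe (age 34, income $85,000, ratio 1.7) falls through rules 1-4 and lands on rule 5, which is exactly what "rule 5 matched" means in the diagram. Change a threshold and you get a new table version; old decisions still trace back to the version that made them.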

Implementation Pattern

In TIATON, this looks like:

load("//proto", "new")

def extract_application_data(ctx, state):
    """Use LLM to extract structured data from document."""
    payload = new("ai.v1.ExtractionRequest", {
        "document": state["raw_document"],
        "schema": "lending.v1.ApplicationData",
        "instructions": "Extract applicant details from this loan application.",
    })
    return RUNNING, submit_job("ai.v1.ExtractionService/Extract", payload)

def on_extraction_complete(ctx, state):
    """Store extracted data, proceed to rule-based evaluation."""
    result = ctx.event.result
    state["applicant_name"] = result.data.name
    state["age"] = result.data.age
    state["income"] = result.data.income
    state["employer"] = result.data.employer
    state["extraction_confidence"] = result.confidence
    return SUCCESS

def evaluate_with_rules(ctx, state):
    """DMN tables take over — deterministic, auditable decision."""
    # DMN evaluation is triggered automatically for this handler:
    # the predicate_domain evaluates all tables in the domain and
    # puts the results in ctx.facts.
    state["decision"] = ctx.facts["loan_eligibility"].decision
    state["decision_rule"] = ctx.facts["loan_eligibility"].matched_rule
    return SUCCESS

The LLM extraction might be 95% accurate. But the 5% it gets wrong runs into validation rules (confidence thresholds, range checks) before any decision is made. And the decision itself is always made by rules, never by the LLM.

Confidence Scoring

For fields where AI extraction is uncertain, you add validation:

def validate_extraction(ctx, state):
    if state["extraction_confidence"] < 0.85:
        state["needs_human_review"] = True
        state["review_reason"] = "Low extraction confidence"
        return SUCCESS  # Don't fail — route to manual review

    # Validate extracted values against rules
    if state["age"] < 18 or state["age"] > 120:
        state["needs_human_review"] = True
        state["review_reason"] = "Age out of range: " + str(state["age"])
        return SUCCESS

    return SUCCESS

Low confidence? Route to human review. Impossible values? Route to human review. Everything else goes through the rule engine.
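The routing itself can be one small handler. Here is a minimal sketch in the same style as the handlers above; the review.v1 service and its Enqueue method are assumptions for illustration, not a shipped TIATON API:

def route_application(ctx, state):
    """Send flagged cases to a reviewer; clean ones go on to the rule engine."""
    if state.get("needs_human_review"):
        # Hypothetical review service: swap in whatever queue or
        # task system your deployment actually uses.
        payload = new("review.v1.ReviewTask", {
            "reason": state["review_reason"],
            "applicant": state["applicant_name"],
        })
        return RUNNING, submit_job("review.v1.ReviewService/Enqueue", payload)
    # Nothing flagged: fall through to the DMN evaluation step.
    return SUCCESS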

The Compliance Conversation

When a regulator asks how you make decisions:

"We use AI to extract information from documents. The extracted data is validated against business rules. If extraction confidence is below 85%, the case goes to human review. All decisions are made by versioned decision tables. Every decision is traced and auditable."

Compare this to:

"We use an AI model that we fine-tuned on historical data. It usually gets it right."

The first answer gets you approved. The second gets you a follow-up investigation.
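The first answer holds up because every decision leaves a trace. The exact shape depends on your deployment; a trace record for the running example might look roughly like this (all field names and values here are illustrative, not a fixed schema):

{
  "decision": "APPROVE",
  "matched_rule": 5,
  "table": "loan_eligibility",
  "table_version": 12,
  "inputs": { "age": 34, "income": 85000, "income_loan_ratio": 1.7 },
  "extraction_confidence": 0.97,
  "timestamp": "2025-01-15T10:32:00Z"
}

Given a record like that, a reviewer can reconstruct the decision without rerunning anything: look up version 12 of the table, find rule 5, check the inputs.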

Architecture Summary

Layer        Technology       Purpose                      Accuracy
Extraction   LLM              Unstructured → structured    ~90-95%
Validation   Rules            Catch extraction errors      Deterministic
Decision     DMN Tables       Business logic               Deterministic
Routing      Agent            Orchestrate the flow         Deterministic
Audit        Decision Trace   Full explainability          Complete

AI does what AI does best. Rules do what rules do best. The result: a system where AI accelerates the process, but rules guarantee the outcome.