The role confusion at the heart of most AI agent architectures
Most conversations about AI agents conflate two fundamentally different capabilities: the ability to understand unstructured input and the ability to make decisions about what to do with it.
LLMs are remarkably good at the first. They can read a natural-language request and extract structured data: intent, entities, urgency, sentiment, relationships. They can take a document with no fixed format and pull out the fields that matter. They can interpret ambiguous instructions and map them to a finite set of categories.
LLMs are noticeably less reliable at the second. Given the same input twice, they may choose different actions. Given a boundary condition, they may invent a response that sounds plausible but violates business rules. Given conflicting instructions, they may prioritize whichever one appeared most recently in the prompt — not whichever one is actually more important.
These aren't bugs. They're consequences of how language models work: they generate statistically likely continuations of a prompt, not logically guaranteed outcomes. This is what makes them useful for understanding language and dangerous for making decisions with consequences.
The pattern: extract, then decide
The Extraction Pattern separates these two capabilities into distinct architectural layers:
Layer 1: LLM as extractor. The LLM receives unstructured input (text, document, conversation) and produces structured output: typed fields, classifications, extracted entities. Its contract is explicit: given this input, return these fields in this format. The output is validated against a schema before proceeding.
Layer 2: Rules as decision engine. The structured output feeds into explicit rules — decision tables, policy definitions, routing logic — that determine the action. These rules are versioned, testable, and auditable. With a deterministic rules engine, the same structured input and the same rule version produce the same decision.
The boundary between layers is a typed contract. The LLM doesn't know what the rules will do with its output. The rules don't care how the LLM arrived at its extraction. Each layer can be tested, replaced, or updated independently.
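The two layers and the contract between them can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the field names, allowed values, and rules below are assumptions invented for the example.

```python
from dataclasses import dataclass

# Layer 1 output: the typed contract between extractor and rules.
# Field names and values here are illustrative, not a prescribed schema.
@dataclass(frozen=True)
class Extraction:
    intent: str    # one of a finite set, e.g. "refund" or "question"
    amount: float
    urgency: str   # "low" or "high"

# Layer 2: an explicit, versioned decision table. The same Extraction
# under the same RULES_VERSION always produces the same decision.
RULES_VERSION = "1.0"

def decide(x: Extraction) -> str:
    if x.intent == "refund" and x.amount > 500:
        return "escalate_to_human"
    if x.intent == "refund":
        return "auto_refund"
    if x.urgency == "high":
        return "priority_queue"
    return "standard_queue"

# In production the LLM produces the Extraction; here it is hand-built.
print(decide(Extraction(intent="refund", amount=120.0, urgency="low")))
# prints "auto_refund"
```

Note that `decide` never sees the raw text: the rules operate only on the validated, typed extraction, which is what makes them independently testable and replaceable.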
This isn't a new idea. It's the same principle behind every well-designed system boundary: define the contract, validate the output, keep the concerns separated. What's new is applying this principle specifically to the LLM-rules boundary in AI agent architectures.
What this solves
The reproducibility problem. When an LLM makes a decision, reproducing the exact same decision with the exact same inputs is not guaranteed — model versions change, temperature settings vary, context window differences affect output. When rules make a decision, the same structured input and the same rule version always produce the same outcome. This is what "deterministic" means in practice: not "no AI involved," but "the decision path is fixed given the inputs."
The explainability problem. As discussed in depth in the context of GDPR Articles 13–15 and 22 and the EU AI Act's Article 86 (applicable from 2 August 2026 for Annex III high-risk systems with significant effects), intrinsic explainability (where the decision logic can be directly traced) is architecturally different from post-hoc explainability (where an approximation method tries to explain why a complex model produced a particular output). The Extraction Pattern keeps the LLM in the "understanding" role, where its contribution is a structured data extraction that can be displayed alongside the rule that acted on it. The explanation is traceable and auditable: "The LLM extracted X from the input. Rule Y (version Z) matched condition A, producing outcome B."
The testing problem. Testing an LLM's decision-making requires probabilistic evaluation: run the same input many times, check that the output is "usually" correct. Testing a rule requires deterministic assertion: this input, this rule, this output — pass or fail. The Extraction Pattern means you have two separate test suites: one for extraction quality (does the LLM correctly identify the fields?) and one for decision correctness (do the rules produce the right outcomes?). Each can be tested independently, with appropriate methodology.
The change management problem. When business rules need to change, the change is isolated to the rule layer. In many cases, the extraction prompt or model can remain unchanged — unless the contract itself changes (new fields, labels, or allowed values). When the LLM model is updated (or replaced with a different provider), the decision logic is unaffected as long as the extraction still conforms to the typed contract. Changes in one layer don't cascade into the other.
Where the LLM should — and shouldn't — operate
The Extraction Pattern doesn't eliminate the LLM. It defines where the LLM adds value and where it introduces risk.
LLM is well-suited for:
- Classifying unstructured input into predefined categories
- Extracting named entities, dates, amounts, and other structured fields from free text
- Interpreting ambiguous natural language and mapping it to a finite set of intents
- Summarizing or scoring content along defined dimensions
- Translating between formats (natural language → structured data)
LLM should not be the mechanism for:
- Choosing which action to take (rules should determine this based on extracted data)
- Evaluating policy compliance (explicit policies should be evaluated programmatically)
- Determining access or permissions (security decisions should be deterministic)
- Selecting from an unbounded set of options (the LLM should work within bounded sets defined by the system)
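The last point is worth making concrete: the system, not the model, owns the bounded set, and anything the model emits outside it is an explicit failure, never a new category. A small sketch (the intent names are illustrative):

```python
# The system defines the bounded set; the LLM only classifies into it.
ALLOWED_INTENTS = {"refund", "cancel", "question"}

def coerce_intent(llm_label: str) -> str:
    """Accept the LLM's classification only if it lands in the bounded set."""
    label = llm_label.strip().lower()
    if label not in ALLOWED_INTENTS:
        # Explicit failure path: an invented category must never reach the rules.
        raise ValueError(f"intent {llm_label!r} is outside the bounded set")
    return label

print(coerce_intent("Refund"))   # prints "refund"
```

A model that answers "expedited_refund_with_apology" fails loudly here instead of silently expanding the action space the rules were designed for.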
Google Cloud's 2025 retrospective noted that "agents designed by engineers often lack the tribal knowledge gained through experience in finance, legal, HR, and sales." The Extraction Pattern addresses this by not asking the LLM to have domain expertise in decision-making — only in language understanding. Domain expertise lives in the rules, which are written and maintained by domain experts.
The contract between layers
The critical engineering detail is the typed contract between the extraction layer and the decision layer.
A loosely defined contract — "the LLM returns some JSON" — reintroduces all the problems the pattern is trying to solve. If the fields are ambiguous, if the types aren't enforced, if the output isn't validated, then the rules are operating on unreliable input.
A well-defined contract specifies: these are the fields, these are the types, these are the valid values, these are the required vs. optional fields. The LLM's output is validated against this contract before it reaches the decision engine. If validation fails, the system handles it explicitly (retry, request clarification, reject) rather than passing malformed data to rules that weren't designed for it.
This is standard input validation — the same principle that underpins every API contract, every database schema, every type system. The difference is that the "client" producing the input happens to be a language model instead of a user or an upstream service.
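A minimal validation gate along these lines, using only the standard library (a real system would more likely use a schema library; the field names, types, and allowed values below are illustrative assumptions):

```python
import json

# The typed contract: fields, types, required vs. optional, valid values.
CONTRACT = {
    "intent": {"type": str, "required": True,
               "allowed": {"refund", "cancel", "question"}},
    "amount": {"type": (int, float), "required": False},
}

def validate(raw: str):
    """Return (True, payload) on success or (False, errors) on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return False, ["top-level value is not an object"]
    errors = []
    for field, spec in CONTRACT.items():
        if field not in data:
            if spec["required"]:
                errors.append(f"missing required field {field!r}")
            continue
        value = data[field]
        if not isinstance(value, spec["type"]):
            errors.append(f"field {field!r} has the wrong type")
        elif "allowed" in spec and value not in spec["allowed"]:
            errors.append(f"field {field!r} has a value outside the contract")
    if errors:
        return False, errors  # caller retries, asks for clarification, or rejects
    return True, data
```

The key property is that the failure branch is explicit: malformed output produces a list of reasons the caller can act on, rather than malformed data the rules were never designed to see.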
Practical implications
If you're building an AI-powered system that needs to make decisions with consequences — approval/rejection, routing, pricing, escalation — consider whether the LLM is the right component to be making those decisions, or whether it should be providing structured input to a decision engine that makes them.
The test is simple: can you trace every decision back to a specific, versioned rule and the specific data that triggered it? If yes, your system is auditable. If no — if the answer to "why did the system do this?" is "the model decided" — then you have an architecture that will struggle with every governance, compliance, and debugging challenge that production brings.
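That traceability test can be made mechanical: every decision emits a record tying the extraction to the versioned rule that acted on it. A sketch, with illustrative field names:

```python
from dataclasses import dataclass, asdict
import json

# One auditable record per decision. Field names are illustrative.
@dataclass(frozen=True)
class DecisionTrace:
    input_id: str      # which request this decision belongs to
    extracted: dict    # what the LLM pulled out of the input (Layer 1)
    rule_id: str       # the specific rule that matched (Layer 2)
    rule_version: str  # the versioned rule set in force
    outcome: str       # the action the rule produced

trace = DecisionTrace(
    input_id="req-123",
    extracted={"intent": "refund", "amount": 750.0},
    rule_id="refund-over-500",
    rule_version="1.0",
    outcome="escalate_to_human",
)
# "Why did the system do this?" answers itself from the record.
print(json.dumps(asdict(trace)))
```

With this record in a log, the answer to "why did the system do this?" is never "the model decided"; it is a specific rule, at a specific version, triggered by specific extracted data.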
The LLM is the bridge between human language and machine logic. The rules are the logic. The contract between them is the architecture.