[{"data":1,"prerenderedAt":1107},["ShallowReactive",2],{"blog-all":3},[4,176,361,507,645,781,945],{"id":5,"title":6,"body":7,"date":161,"description":162,"draft":163,"extension":164,"meta":165,"navigation":166,"path":167,"seo":168,"stem":169,"tags":170,"__hash__":175},"blog/blog/2026-03-03-kyc-routing-rules-problem.md","KYC Routing Is a Rules Problem, Not an AI Problem",{"type":8,"value":9,"toc":150},"minimark",[10,15,19,22,25,28,31,35,38,41,44,47,50,52,56,59,62,65,68,71,74,77,79,83,86,89,96,102,105,117,120,123,125,129,132,135,138,140,144,147],[11,12,14],"h2",{"id":13},"the-routing-decision-hiding-inside-every-kyc-process","The routing decision hiding inside every KYC process",[16,17,18],"p",{},"Every financial institution that onboards customers makes the same set of decisions hundreds or thousands of times a day. Is this customer low-risk, medium-risk, or high-risk? Which verification level applies? What documents are required? Does this case need enhanced due diligence, or can it go through simplified checks? Should it be routed to an analyst, or can it be approved automatically?",[16,20,21],{},"These are routing decisions. And they are governed by rules: FATF's risk-based approach, national AML/CFT regulations, internal risk appetite frameworks, and jurisdiction-specific requirements that can vary from country to country and sometimes from product to product.",[16,23,24],{},"FATF's Recommendations (adopted February 2012, updated regularly) require assessing ML/TF risk and applying a risk-based approach (Recommendation 1). In February 2025, FATF updated Recommendation 1 and its Interpretive Note, emphasising proportionality and simplified measures under the risk-based approach. 
The principle is straightforward: low-risk customers go through simplified due diligence, high-risk customers go through enhanced due diligence, and the institution is responsible for defining and justifying the tiers in between.",[16,26,27],{},"In practice, this means every KYC process is a decision tree. The inputs are customer data, jurisdiction, product type, transaction patterns, and screening results. The outputs are routing decisions: which verification path, which level of scrutiny, which documents, which approval flow. The logic connecting inputs to outputs is a set of rules.",[29,30],"hr",{},[11,32,34],{"id":33},"where-the-rules-actually-live","Where the rules actually live",[16,36,37],{},"In most financial institutions, these rules don't live in one place. They're spread across multiple systems, codebases, and sometimes spreadsheets.",[16,39,40],{},"The risk scoring model might be in one system. The document requirements might be configured in another. The jurisdiction-specific overrides might be hardcoded into the onboarding platform. The escalation thresholds might exist in a policy document that a compliance officer references manually. The PEP (Politically Exposed Person) screening rules might follow one logic in the initial onboarding and a different logic during periodic reviews.",[16,42,43],{},"Fenergo's Financial Crime Industry Trends 2025 research (survey conducted in August 2025) reports average annual spend on AML/KYC operations of $72.9 million per firm, with country-level breakdowns for the UK ($78.4M), US ($72.2M), and Singapore ($68.2M). Reported use of advanced AI tools in KYC/AML (as self-reported by survey respondents) surged from 42% in 2024 to 82% in 2025. Automation of periodic KYC reviews averaged roughly a third across respondents. 
The gap between AI adoption and actual process automation suggests that many institutions are adding AI capabilities on top of fragmented rule systems rather than addressing the rule layer itself.",[16,45,46],{},"Fenergo's KYC in 2022 survey (1,055 C-suite respondents) reports that two thirds of respondents said a single KYC review costs between $1,501 and $3,500. In Fenergo's Financial Crime Industry Trends 2025 research, UK corporate banks report onboarding times averaging more than six weeks. Signicat/11:FS research reports that 63% of consumers in Europe have abandoned a financial application in the past year, citing lengthy processes and too much information required as key reasons.",[16,48,49],{},"These numbers aren't a technology problem. They're a rules problem. The rules are scattered, undocumented, inconsistent across channels, and expensive to change.",[29,51],{},[11,53,55],{"id":54},"why-adding-ai-to-broken-rules-doesnt-help","Why adding AI to broken rules doesn't help",[16,57,58],{},"The current industry conversation focuses heavily on applying AI to KYC: AI-powered document verification, AI-driven risk scoring, AI-based transaction monitoring. And AI is genuinely useful for specific tasks within KYC. Extracting data from identity documents. Matching faces to photos. Detecting anomalies in transaction patterns. Screening names against sanctions lists with fuzzy matching.",[16,60,61],{},"But these are extraction and classification tasks. They take unstructured or semi-structured input and produce structured output: a verified name, a risk score, a match/no-match result, a list of extracted fields from a passport.",[16,63,64],{},"The routing decision that follows is different. Given this risk score, this jurisdiction, this product type, and these screening results, which verification path does this customer take? That's not a question for a language model or a classification algorithm. 
That's a question for a decision table: a set of explicit, versioned, testable rules that map inputs to routing outcomes.",[16,66,67],{},"When the routing logic lives inside an AI model, or is scattered across multiple systems with no central definition, several things break.",[16,69,70],{},"Auditors ask \"why was this customer routed to simplified due diligence?\" and the answer requires a developer to trace code paths across multiple services. Regulators expect the institution to demonstrate that its risk-based approach is consistently applied. In January 2026, Fenergo reported that penalties for AML/KYC, sanctions, and CDD failures totalled $3.8 billion globally in 2025, with enforcement activity shifting toward EMEA and APAC. The consequences of not being able to explain a routing decision are significant — aggregate enforcement actions in this domain are routinely measured in the billions.",[16,72,73],{},"Rule changes are slow. When a jurisdiction updates its requirements, or when the institution adjusts its risk appetite, the change needs to propagate through every system that implements part of the routing logic. If the routing rules are distributed across code, configuration, and documentation, a single policy change can take weeks to implement and validate.",[16,75,76],{},"Testing is incomplete. Most institutions test their KYC technology (does the document scanner work? does the name screening return results?) but don't systematically test their routing logic (given this specific combination of risk factors, does the system route to the correct verification path?). 
The technology works; the rules are untested.",[29,78],{},[11,80,82],{"id":81},"what-kyc-routing-as-a-managed-rule-set-looks-like","What KYC routing as a managed rule set looks like",[16,84,85],{},"The alternative is treating KYC routing decisions as what they are: a set of explicit business rules that can be authored, versioned, tested, and audited independently of the technology that feeds them data.",[16,87,88],{},"A decision table for KYC routing might look like this in practice:",[16,90,91,95],{},[92,93,94],"strong",{},"Inputs:"," customer type (individual/corporate), jurisdiction risk rating, product risk rating, PEP status, sanctions screening result, source of funds clarity.",[16,97,98,101],{},[92,99,100],{},"Outputs:"," verification tier (simplified/standard/enhanced), required documents, approval path (auto/analyst/senior analyst), periodic review frequency.",[16,103,104],{},"Each row in the table is a rule. Each rule has a version, an author, a timestamp, and a test case. When the rule changes, the change is tracked. When a customer is routed, the system records which rule version was applied.",[16,106,107,108,112,113,116],{},"This separation has practical consequences. When a regulator asks why a particular customer was routed to simplified due diligence, the answer is specific: \"Rule 47, version 3.2, authored by ",[109,110,111],"span",{},"compliance officer"," on ",[109,114,115],{},"date",", matched conditions A, B, C, and produced routing outcome X. Here is the test suite that validates this rule. Here is the diff from the previous version.\"",[16,118,119],{},"When a jurisdiction changes its requirements, the compliance team updates the relevant rules in the decision table. The change is tested against existing cases. It deploys. 
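As an illustrative sketch only, a decision table like the one described above can be represented as ordered rules over typed inputs. The field names, thresholds, and tiers below are hypothetical, not a real rule set:

```python
# Illustrative sketch only: a KYC routing decision table as data.
# Field names, thresholds, and tiers are hypothetical.
RULES = [
    # (rule_id, version, condition, outcome)
    ('R1', '1.0',
     lambda c: c['pep'] or c['jurisdiction_risk'] == 'high',
     {'tier': 'enhanced', 'approval': 'senior_analyst'}),
    ('R2', '1.0',
     lambda c: c['jurisdiction_risk'] == 'medium' or c['risk_score'] > 40,
     {'tier': 'standard', 'approval': 'analyst'}),
    ('R3', '1.0',
     lambda c: True,  # default row: everything else is low risk
     {'tier': 'simplified', 'approval': 'auto'}),
]

def route(customer):
    # First matching rule wins; the matched rule id and version are
    # recorded with the outcome so every decision stays traceable.
    for rule_id, version, condition, outcome in RULES:
        if condition(customer):
            return {'rule': rule_id, 'version': version, **outcome}
    raise ValueError('no rule matched')
```

Because each outcome carries the rule id and version that produced it, the audit answer sketched earlier (which rule, which version, which conditions) falls out of the data structure itself.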
The extraction layer (document verification, name screening, risk scoring) doesn't change because it doesn't contain routing logic.",[16,121,122],{},"When the institution changes its document verification vendor, or upgrades its AI-powered screening tool, the routing rules don't change because they operate on structured outputs (risk scores, match results, extracted fields), not on the technology that produced those outputs.",[29,124],{},[11,126,128],{"id":127},"the-extraction-routing-boundary","The extraction-routing boundary",[16,130,131],{},"This maps directly to the Extraction Pattern described in the context of AI agent architectures: the AI handles unstructured input (reading documents, matching faces, screening names) and produces structured output. The rules handle routing decisions based on that structured output.",[16,133,134],{},"The boundary between them is a typed contract. The AI components produce fields with defined types and valid values (risk_score: integer 1-100, pep_status: boolean, jurisdiction_risk: low/medium/high). The routing rules consume those fields and produce routing outcomes. Each side can be tested, updated, and replaced independently.",[16,136,137],{},"FATF's risk-based approach is consistent with this separation, even if it doesn't describe it in these terms. The expectation is that institutions can demonstrate their risk assessment methodology, show how it maps to due diligence measures, and prove that it is consistently applied. That's a description of a testable, auditable rule set, not an AI model.",[29,139],{},[11,141,143],{"id":142},"the-question-to-ask-your-compliance-team","The question to ask your compliance team",[16,145,146],{},"How many distinct routing rules does your KYC process actually contain? Not approximately, not \"it depends.\" How many explicit conditions map to how many distinct verification paths? Can you list them? Can you version them? 
Can you test them?",[16,148,149],{},"If the answer is \"we'd need to check with engineering,\" then your KYC routing logic is technical debt that happens to live in a regulated environment. And unlike most technical debt, this one sits in a domain where aggregate enforcement actions are routinely measured in the billions.",{"title":151,"searchDepth":152,"depth":152,"links":153},"",3,[154,156,157,158,159,160],{"id":13,"depth":155,"text":14},2,{"id":33,"depth":155,"text":34},{"id":54,"depth":155,"text":55},{"id":81,"depth":155,"text":82},{"id":127,"depth":155,"text":128},{"id":142,"depth":155,"text":143},"2026-03-03","Risk tiers, document requirements, verification paths — KYC routing is a set of explicit business rules. AI helps extract data, but the routing decision should be versioned, tested, and auditable.",false,"md",{},true,"/blog/2026-03-03-kyc-routing-rules-problem",{"title":6,"description":162},"blog/2026-03-03-kyc-routing-rules-problem",[171,172,173,174],"business-rules","compliance","fintech","industry-cases","hp8h8z-29QSKMIlSRbL32RP2EcqDrsPkG2hBiz8m_Vo",{"id":177,"title":178,"body":179,"date":350,"description":351,"draft":163,"extension":164,"meta":352,"navigation":166,"path":353,"seo":354,"stem":355,"tags":356,"__hash__":360},"blog/blog/2026-03-01-the-extraction-pattern.md","The Extraction Pattern: Using LLMs as Data Operators, Not Decision Makers",{"type":8,"value":180,"toc":342},[181,185,188,191,194,197,199,203,206,212,218,221,224,226,230,236,242,248,254,256,260,263,268,287,292,306,309,311,315,318,321,324,327,329,333,336,339],[11,182,184],{"id":183},"the-role-confusion-at-the-heart-of-most-ai-agent-architectures","The role confusion at the heart of most AI agent architectures",[16,186,187],{},"Most conversations about AI agents conflate two fundamentally different capabilities: the ability to understand unstructured input and the ability to make decisions about what to do with it.",[16,189,190],{},"LLMs are remarkably good at the first. 
They can read a natural-language request and extract structured data: intent, entities, urgency, sentiment, relationships. They can take a document with no fixed format and pull out the fields that matter. They can interpret ambiguous instructions and map them to a finite set of categories.",[16,192,193],{},"LLMs are noticeably less reliable at the second. Given the same input twice, they may choose different actions. Given a boundary condition, they may invent a response that sounds plausible but violates business rules. Given conflicting instructions, they may prioritize whichever one appeared most recently in the prompt — not whichever one is actually more important.",[16,195,196],{},"These aren't bugs. They're consequences of how language models work: they generate statistically likely continuations of a prompt, not logically guaranteed outcomes. This is what makes them useful for understanding language and dangerous for making decisions with consequences.",[29,198],{},[11,200,202],{"id":201},"the-pattern-extract-then-decide","The pattern: extract, then decide",[16,204,205],{},"The Extraction Pattern separates these two capabilities into distinct architectural layers:",[16,207,208,211],{},[92,209,210],{},"Layer 1: LLM as extractor."," The LLM receives unstructured input (text, document, conversation) and produces structured output: typed fields, classifications, extracted entities. Its contract is defined: given this input, return these fields in this format. The output is validated against a schema before proceeding.",[16,213,214,217],{},[92,215,216],{},"Layer 2: Rules as decision engine."," The structured output feeds into explicit rules — decision tables, policy definitions, routing logic — that determine the action. These rules are versioned, testable, and auditable. With a deterministic rules engine, the same structured input and the same rule version produce the same decision.",[16,219,220],{},"The boundary between layers is a typed contract. 
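A minimal sketch of what enforcing that contract might look like, assuming hypothetical field names and ranges:

```python
# Sketch of a typed contract between the extraction layer and the
# decision layer. Field names and valid ranges are assumptions.
CONTRACT = {
    'risk_score': lambda v: isinstance(v, int) and 1 <= v <= 100,
    'pep_status': lambda v: isinstance(v, bool),
    'jurisdiction_risk': lambda v: v in ('low', 'medium', 'high'),
}

def validate(extraction):
    # Reject the LLM output before it reaches the decision engine
    # if any required field is missing or carries an invalid value.
    errors = [field for field, is_valid in CONTRACT.items()
              if field not in extraction or not is_valid(extraction[field])]
    return (len(errors) == 0, errors)
```

If validation fails, the caller retries the extraction, requests clarification, or escalates, rather than passing malformed data to rules that were not designed for it.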
The LLM doesn't know what the rules will do with its output. The rules don't care how the LLM arrived at its extraction. Each layer can be tested, replaced, or updated independently.",[16,222,223],{},"This isn't a new idea. It's the same principle behind every well-designed system boundary: define the contract, validate the output, keep the concerns separated. What's new is applying this principle specifically to the LLM-rules boundary in AI agent architectures.",[29,225],{},[11,227,229],{"id":228},"what-this-solves","What this solves",[16,231,232,235],{},[92,233,234],{},"The reproducibility problem."," When an LLM makes a decision, reproducing the exact same decision with the exact same inputs is not guaranteed — model versions change, temperature settings vary, context window differences affect output. When rules make a decision, the same structured input and the same rule version always produce the same outcome. This is what \"deterministic\" means in practice: not \"no AI involved,\" but \"the decision path is fixed given the inputs.\"",[16,237,238,241],{},[92,239,240],{},"The explainability problem."," As discussed in depth in the context of GDPR Articles 13–15 and 22 and the EU AI Act's Article 86 (applicable from 2 August 2026 for Annex III high-risk systems with significant effects), intrinsic explainability (where the decision logic can be directly traced) is architecturally different from post-hoc explainability (where an approximation method tries to explain why a complex model produced a particular output). The Extraction Pattern keeps the LLM in the \"understanding\" role, where its contribution is a structured data extraction that can be displayed alongside the rule that acted on it. The explanation is traceable and auditable: \"The LLM extracted X from the input. 
Rule Y (version Z) matched condition A, producing outcome B.\"",[16,243,244,247],{},[92,245,246],{},"The testing problem."," Testing an LLM's decision-making requires probabilistic evaluation: run the same input many times, check that the output is \"usually\" correct. Testing a rule requires deterministic assertion: this input, this rule, this output — pass or fail. The Extraction Pattern means you have two separate test suites: one for extraction quality (does the LLM correctly identify the fields?) and one for decision correctness (do the rules produce the right outcomes?). Each can be tested independently, with appropriate methodology.",[16,249,250,253],{},[92,251,252],{},"The change management problem."," When business rules need to change, the change is isolated to the rule layer. In many cases, the extraction prompt or model can remain unchanged — unless the contract itself changes (new fields, labels, or allowed values). When the LLM model is updated (or replaced with a different provider), the decision logic is unaffected as long as the extraction still conforms to the typed contract. Changes in one layer don't cascade into the other.",[29,255],{},[11,257,259],{"id":258},"where-the-llm-should-and-shouldnt-operate","Where the LLM should — and shouldn't — operate",[16,261,262],{},"The Extraction Pattern doesn't eliminate the LLM. 
It defines where the LLM adds value and where it introduces risk.",[16,264,265],{},[92,266,267],{},"LLM is well-suited for:",[269,270,271,275,278,281,284],"ul",{},[272,273,274],"li",{},"Classifying unstructured input into predefined categories",[272,276,277],{},"Extracting named entities, dates, amounts, and other structured fields from free text",[272,279,280],{},"Interpreting ambiguous natural language and mapping it to a finite set of intents",[272,282,283],{},"Summarizing or scoring content along defined dimensions",[272,285,286],{},"Translating between formats (natural language → structured data)",[16,288,289],{},[92,290,291],{},"LLM should not be the mechanism for:",[269,293,294,297,300,303],{},[272,295,296],{},"Choosing which action to take (rules should determine this based on extracted data)",[272,298,299],{},"Evaluating policy compliance (explicit policies should be evaluated programmatically)",[272,301,302],{},"Determining access or permissions (security decisions should be deterministic)",[272,304,305],{},"Selecting from an unbounded set of options (the LLM should work within bounded sets defined by the system)",[16,307,308],{},"Google Cloud's 2025 retrospective noted that \"agents designed by engineers often lack the tribal knowledge gained through experience in finance, legal, HR, and sales.\" The Extraction Pattern addresses this by not asking the LLM to have domain expertise in decision-making — only in language understanding. Domain expertise lives in the rules, which are written and maintained by domain experts.",[29,310],{},[11,312,314],{"id":313},"the-contract-between-layers","The contract between layers",[16,316,317],{},"The critical engineering detail is the typed contract between the extraction layer and the decision layer.",[16,319,320],{},"A loosely defined contract — \"the LLM returns some JSON\" — reintroduces all the problems the pattern is trying to solve. 
If the fields are ambiguous, if the types aren't enforced, if the output isn't validated, then the rules are operating on unreliable input.",[16,322,323],{},"A well-defined contract specifies: these are the fields, these are the types, these are the valid values, these are the required vs. optional fields. The LLM's output is validated against this contract before it reaches the decision engine. If validation fails, the system handles it explicitly (retry, request clarification, reject) rather than passing malformed data to rules that weren't designed for it.",[16,325,326],{},"This is standard input validation — the same principle that underpins every API contract, every database schema, every type system. The difference is that the \"client\" producing the input happens to be a language model instead of a user or an upstream service.",[29,328],{},[11,330,332],{"id":331},"practical-implications","Practical implications",[16,334,335],{},"If you're building an AI-powered system that needs to make decisions with consequences — approval/rejection, routing, pricing, escalation — consider whether the LLM is the right component to be making those decisions, or whether it should be providing structured input to a decision engine that makes them.",[16,337,338],{},"The test is simple: can you trace every decision back to a specific, versioned rule and the specific data that triggered it? If yes, your system is auditable. If no — if the answer to \"why did the system do this?\" is \"the model decided\" — then you have an architecture that will struggle with every governance, compliance, and debugging challenge that production brings.",[16,340,341],{},"The LLM is the bridge between human language and machine logic. The rules are the logic. 
The contract between them is the architecture.",{"title":151,"searchDepth":152,"depth":152,"links":343},[344,345,346,347,348,349],{"id":183,"depth":155,"text":184},{"id":201,"depth":155,"text":202},{"id":228,"depth":155,"text":229},{"id":258,"depth":155,"text":259},{"id":313,"depth":155,"text":314},{"id":331,"depth":155,"text":332},"2026-03-01","LLMs extract structured data from unstructured input. Rules decide what to do with it. A typed contract between the two layers makes the system testable, auditable, and reproducible.",{},"/blog/2026-03-01-the-extraction-pattern",{"title":178,"description":351},"blog/2026-03-01-the-extraction-pattern",[357,358,359],"architecture","ai-agents","explainability","zcZBBCe5VjMrpyVlzCoAn-iLMm2705TokxQmQkeMxh4",{"id":362,"title":363,"body":364,"date":497,"description":498,"draft":163,"extension":164,"meta":499,"navigation":166,"path":500,"seo":501,"stem":502,"tags":503,"__hash__":506},"blog/blog/2026-02-28-the-rollback-problem.md","The Rollback Problem: What Happens When Step 4 Fails?",{"type":8,"value":365,"toc":490},[366,370,373,376,379,382,385,387,391,394,397,400,403,405,409,412,415,418,421,423,427,430,436,442,448,454,460,462,466,469,475,481,487],[11,367,369],{"id":368},"the-failure-scenario-nobody-designs-for","The failure scenario nobody designs for",[16,371,372],{},"Your automated process has five steps. Step 1 validates the input. Step 2 checks eligibility. Step 3 reserves a resource. Step 4 processes the payment. Step 5 sends the confirmation.",[16,374,375],{},"Step 4 fails.",[16,377,378],{},"Now what? Step 3 already reserved a resource that should be released. Step 2's eligibility check created a record. Step 1 logged the attempt. The system is in an inconsistent state — half-done work scattered across multiple services and databases.",[16,380,381],{},"In a monolithic application with a single database, you'd rely on a database transaction: if anything fails, everything rolls back. 
But in distributed systems — and especially in AI-agent-driven workflows that interact with multiple external services — there's no global transaction to rely on.",[16,383,384],{},"This is the rollback problem. It's not new. But with the rise of AI agents that autonomously execute multi-step workflows, it's become significantly more dangerous — because agents can initiate actions faster and across more systems than any manual process, and they don't naturally pause to ask \"can I undo this?\"",[29,386],{},[11,388,390],{"id":389},"a-well-documented-engineering-problem","A well-documented engineering problem",[16,392,393],{},"The Saga pattern, first described by Hector Garcia-Molina and Kenneth Salem in a 1987 Princeton University technical report (\"SAGAS\", TR-070-87), addresses exactly this scenario. Microsoft's Azure Architecture Center, AWS Prescriptive Guidance, and Chris Richardson's widely referenced microservices.io all document the same core concept: when a multi-step process can't rely on a single ACID transaction, each step must define a compensating transaction — a reverse operation that semantically undoes the work.",[16,395,396],{},"The key word is \"semantically.\" As Microsoft's Compensating Transaction pattern explains, you can't always roll back data changes with a simple database rollback; compensation is an application-specific process that applies business logic to undo previously completed work. You can't un-send an email. You can't un-call an API. But you can cancel a reservation, reverse a charge, or mark a record as void.",[16,398,399],{},"AWS documentation distinguishes platform-level failures (forward recovery via retry and continue) from application-level failures (backward recovery via compensating transactions). 
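As a minimal sketch of backward recovery, assuming hypothetical step names and operations, each step can declare its compensator before anything runs:

```python
# Minimal saga sketch: every step declares its compensator up front,
# and a failure triggers backward recovery in reverse order.
# Step names and operations here are hypothetical.
def run_saga(steps, state):
    done = []  # (name, compensator) for each completed step
    for name, action, compensator in steps:
        try:
            action(state)
            done.append((name, compensator))
        except Exception:
            # backward recovery: compensate completed steps in reverse
            for _, undo in reversed(done):
                undo(state)
            return ('compensated', [n for n, _ in reversed(done)])
    return ('completed', [n for n, _ in done])

# A two-step demo where the second step (the charge) fails:
state = {'reserved': False}
result = run_saga([
    ('reserve', lambda s: s.update(reserved=True),
                lambda s: s.update(reserved=False)),
    ('charge',  lambda s: 1 / 0,  # simulate a payment failure
                lambda s: None),
], state)
```

Here the failed charge needs no compensation itself; only the steps that actually completed (the reservation) are undone.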
The choice between them is a design decision that must be made per step, not per system.",[16,401,402],{},"Microsoft's Compensating Transaction pattern documentation adds an important nuance: \"A compensating transaction might not have to undo the work in the exact reverse order of the original operation. It might be possible to perform some of the undo steps in parallel.\" This matters for performance — if your process has 8 steps and step 6 fails, you may be able to compensate steps 3, 4, and 5 simultaneously rather than sequentially.",[29,404],{},[11,406,408],{"id":407},"why-ai-agents-make-this-worse","Why AI agents make this worse",[16,410,411],{},"Traditional workflows — built with tools like Camunda, Temporal, or AWS Step Functions — at least force developers to think about the execution path. When you draw a BPMN diagram or write a state machine, you see the steps, you think about failure modes, and (sometimes) you define compensation logic.",[16,413,414],{},"AI agents invert this. In most agent frameworks, the LLM decides which tools to call and in what order, dynamically, at runtime. There's no pre-defined execution graph. There's no explicit declaration of \"if this step fails, undo that step.\" The agent reasons about what to do next, and if something fails, it reasons about what to do about the failure — introducing another layer of non-determinism.",[16,416,417],{},"An Edstellar article offers an illustrative cascade-failure scenario: a mislabeled supplier risk rating triggers a contract termination; a mishandled email kicks off automated reactions across procurement, legal, and finance. 
The scenario is hypothetical, but the pattern is recognizable: unlike rule-based automation that halts at failure, agents can push forward and compound bad decisions without oversight.",[16,419,420],{},"In an IBM Think interview, Maryam Ashoori cited figures suggesting only about 19% of organizations focus on observability and monitoring of agents in production — implying many teams still lack mature tracing for agent workflows and may not detect when a failure leaves downstream systems in an inconsistent state.",[29,422],{},[11,424,426],{"id":425},"the-design-principles-that-work","The design principles that work",[16,428,429],{},"The Saga pattern literature converges on a set of principles that apply directly to any automated workflow — whether human-coded or AI-driven:",[16,431,432,435],{},[92,433,434],{},"1. Every step declares its compensator."," Before a step executes, the system knows how to undo it. This isn't an afterthought — it's a precondition for execution. Microsoft's documentation puts it plainly: \"Use this pattern only for operations that must be undone if they fail. If possible, design solutions to avoid the complexity of requiring compensating transactions.\" When that's not possible, the compensating logic should be defined upfront.",[16,437,438,441],{},[92,439,440],{},"2. Compensation runs in reverse order."," When step 4 fails, you compensate step 3, then step 2, then step 1 — or in parallel where dependencies allow. The Saga Execution Coordinator (described in the Baeldung architecture guide) inspects the saga log to identify impacted components and the correct compensation sequence.",[16,443,444,447],{},[92,445,446],{},"3. Compensating actions must be idempotent and retryable."," A compensation can itself fail. The system must be able to retry it without causing additional inconsistency. DevX's practical analysis puts it well: \"Compensating actions should be retryable, observable, and honest about what cannot be undone. 
Some actions are irreversible, and your system must handle that with follow-up workflows or human intervention.\"",[16,449,450,453],{},[92,451,452],{},"4. State is preserved between steps."," If the process pauses (waiting for an external response, waiting for human approval), it must preserve full state so it can resume without losing context. This is the \"durable execution\" concept that Temporal has popularized — but the principle is universal.",[16,455,456,459],{},[92,457,458],{},"5. The entire execution is traced."," A consulting write-up on saga implementation describes an organization where manual investigation and rollback of failed multi-service transactions took hours per incident. After introducing orchestration with a dedicated saga log and automated compensating transactions, recovery was reduced to minutes. The specific figures are anecdotal, but the pattern is consistent with the broader Saga literature: automated compensation with a trace log dramatically reduces mean time to recovery.",[29,461],{},[11,463,465],{"id":464},"what-this-means-for-production-ai-systems","What this means for production AI systems",[16,467,468],{},"If you're building or deploying AI agents that take actions across multiple systems, ask three questions:",[16,470,471,474],{},[92,472,473],{},"For every action the agent can take — what's the undo?"," If the agent can create a record, what cancels it? If it can call an external API, what reverses that call? If there's no undo, that action needs human approval gates, not autonomous execution.",[16,476,477,480],{},[92,478,479],{},"When something fails — does the system know what already happened?"," If step 4 fails, can the system enumerate steps 1-3 and their compensating actions? 
Or does failure mean \"call an engineer and figure out what state things are in\"?",[16,482,483,486],{},[92,484,485],{},"Is the failure recovery deterministic or does it depend on the AI's judgment?"," If the LLM decides how to recover from a failure, you've compounded one source of unpredictability with another. If compensating transactions are predefined and execute mechanically, recovery is reliable regardless of what caused the failure.",[16,488,489],{},"The Saga pattern exists precisely because distributed systems can't pretend they have global transactions. AI agents exist in an even more distributed, less predictable environment. The rollback problem isn't going away — it's scaling with every new agent you deploy.",{"title":151,"searchDepth":152,"depth":152,"links":491},[492,493,494,495,496],{"id":368,"depth":155,"text":369},{"id":389,"depth":155,"text":390},{"id":407,"depth":155,"text":408},{"id":425,"depth":155,"text":426},{"id":464,"depth":155,"text":465},"2026-02-28","Distributed workflows can't rely on global transactions. AI agents make this worse — they act fast, across many systems, with no built-in undo. 
The Saga pattern offers design principles that apply.",{},"/blog/2026-02-28-the-rollback-problem",{"title":363,"description":498},"blog/2026-02-28-the-rollback-problem",[504,505,357],"failure-recovery","orchestration","Z7mj_ntv2gOKgnJ8k4K80LjHpAenp1NREYgo6Wj52vo",{"id":508,"title":509,"body":510,"date":637,"description":638,"draft":163,"extension":164,"meta":639,"navigation":166,"path":640,"seo":641,"stem":642,"tags":643,"__hash__":644},"blog/blog/2026-02-27-business-rules-technical-debt.md","Your Business Rules Are Technical Debt You Don't Track",{"type":8,"value":511,"toc":630},[512,516,519,522,525,528,530,534,537,540,543,546,552,558,564,570,572,576,579,582,585,588,590,594,597,600,603,609,615,618,620,624,627],[11,513,515],{"id":514},"the-debt-that-doesnt-show-up-on-any-dashboard","The debt that doesn't show up on any dashboard",[16,517,518],{},"McKinsey describes technical debt as \"digital dark matter\": you can infer its impact, but you can't see or measure it. Their research found that some business units carry up to 58% additional hidden cost in their IT total cost of ownership due to accumulated technical debt. CISQ's 2022 Cost of Poor Software Quality Report estimates accumulated software technical debt in the US at approximately $1.52 trillion.",[16,520,521],{},"These numbers are well-known. What's less discussed is a specific category of this debt that lives outside the codebase, outside the infrastructure, and often outside anyone's awareness: the business rules that govern how your organization actually makes decisions.",[16,523,524],{},"Every business runs on rules. Pricing logic. Approval thresholds. Routing conditions. Eligibility criteria. Escalation policies. Risk classifications. These rules determine who gets approved, what gets flagged, how much gets charged, and where requests get routed. 
They are the operating logic of the business.",[16,526,527],{},"And in most organizations, they are scattered across codebases, spreadsheets, configuration files, and — most dangerously — in the institutional memory of individual employees.",[29,529],{},[11,531,533],{"id":532},"how-rules-become-invisible-debt","How rules become invisible debt",[16,535,536],{},"McKinsey's analysis of technical debt identifies a pattern that applies directly to business rules: \"temporary fixes that inevitably become permanent, solutions that become outdated, and one-off implementations to meet business priorities.\" One company they studied suspected 50+ legacy applications carried major debt — but analysis showed just 20 asset types drove the majority, and just 4 debt types drove 50-60% of the impact.",[16,538,539],{},"Business rules follow the same pattern, but with an additional complication: unlike code, rules are rarely treated as first-class engineering artifacts. They don't get version control. They don't get automated tests. They don't get code review. They live in if/else blocks buried in application code, in Excel files on someone's desktop, or in policy documents that haven't been updated since the person who wrote them left.",[16,541,542],{},"A Fintech Today article on \"process debt\" in financial institutions describes exactly this: \"legacy steps that are long divorced from their original rationale but remain in place because of the sunk IT costs and established routines built upon them.\" The article argues that process debt is \"more hidden than technical debt\" because it's \"deeply embedded in culture and institutional routines.\"",[16,544,545],{},"This creates a specific set of problems:",[16,547,548,551],{},[92,549,550],{},"The change bottleneck."," When a business rule lives in code, changing it requires a development ticket, code modification, code review, testing, and deployment. 
CodeScene's technical-debt whitepaper reports that, on average, 40–50% of development time is spent on unplanned work; rules buried in code can be one contributing factor. A business analyst identifies that a threshold needs to change from 50 to 75. The actual change is one number. The process to deploy that change can take days or weeks.",[16,553,554,557],{},[92,555,556],{},"The knowledge dependency."," When rules exist as tribal knowledge, the organization is one resignation away from not understanding its own decision logic. This isn't hypothetical — it's the most common pattern in enterprise software. MIT's Project NANDA report argues that the core barrier to scaling enterprise GenAI is learning: systems fail due to brittle workflows, lack of contextual learning, and misalignment with day-to-day operations. Rules that nobody can explain are rules that nobody can automate.",[16,559,560,563],{},[92,561,562],{},"The testing gap."," In software engineering, untested code is considered a liability. Yet most business rules have never been systematically tested. There's no coverage report for your pricing logic. There's no regression suite for your approval matrix. When a rule changes, the test is production — and the test subjects are real customers.",[16,565,566,569],{},[92,567,568],{},"The audit black hole."," In regulated industries, the question \"why did the system make this decision?\" must have a traceable answer. When rules live in code, the answer requires a developer to read the code, understand the execution path, and reconstruct the reasoning. When rules live in someone's head, there is no answer.",[29,571],{},[11,573,575],{"id":574},"the-compounding-effect-with-ai","The compounding effect with AI",[16,577,578],{},"This problem isn't new. 
But it has become significantly more urgent with the adoption of AI agents.",[16,580,581],{},"When business rules are clear, versioned, and testable, adding an AI component is relatively straightforward: the AI handles the unstructured part (interpreting a document, extracting data from natural language), and the rules handle the decision part. The boundary is clean.",[16,583,584],{},"When business rules are opaque — buried in code, scattered across systems, undocumented — adding AI doesn't solve the problem. It compounds it. Now you have an opaque AI model feeding into opaque business rules, producing outcomes that nobody can explain, trace, or reproduce. OneTrust's 2025 AI-Ready Governance Report surveyed 1,250 governance-focused IT decision-makers and found that 90% of advanced AI adopters said AI exposed the limits of their siloed or manual processes. Even among organizations still experimenting, 63% reported the same strain.",[16,586,587],{},"The Cyberhaven Labs 2026 AI Adoption & Risk Report reinforces this: AI adoption is becoming fragmented, with the highest usage often occurring in environments with the least mature governance and visibility. The problem isn't AI — it's the absence of structured, governable logic underneath it.",[29,589],{},[11,591,593],{"id":592},"what-rules-as-a-managed-artifact-looks-like","What \"rules as a managed artifact\" looks like",[16,595,596],{},"The alternative is treating business rules with the same engineering discipline that we apply to code: versioned, tested, auditable, and owned by the people who understand the business logic.",[16,598,599],{},"This isn't a new idea. The Decision Model and Notation (DMN) standard — published by the Object Management Group (OMG) — has existed for years and is used for decision modeling in regulated environments including banking, insurance, and financial services. 
DMN defines a format for decision tables that are readable by business analysts, executable by machines, and versionable like any other artifact.",[16,601,602],{},"The practical difference:",[16,604,605,608],{},[92,606,607],{},"With rules in code:"," Business analyst writes a requirements document → Developer interprets it into code → Code reviewer checks syntax, not business logic → QA tests the feature, not the rule → Rule goes to production → Nobody can trace which rule produced which outcome.",[16,610,611,614],{},[92,612,613],{},"With rules as managed artifacts:"," Business analyst writes or edits the rule directly in a decision table → Test suite runs automatically → Rule is versioned with author, timestamp, and diff → Rule deploys → Every decision in production traces back to a specific rule, a specific version, authored by a specific person.",[16,616,617],{},"The difference isn't just speed (McKinsey describes a case where this kind of analysis identified $200–300M in trackable benefits over 3–5 years). The difference is that the organization can answer the question: \"Why did the system do that?\" — with a precise, auditable answer.",[29,619],{},[11,621,623],{"id":622},"the-question-to-ask-yourself","The question to ask yourself",[16,625,626],{},"If someone on your team left tomorrow — the person who built the pricing logic, the approval workflow, the routing rules — could the rest of the team explain exactly how those rules work? Could they change a threshold without a code deployment? Could they show an auditor which rule, which version, with which conditions produced a specific outcome?",[16,628,629],{},"If the answer is no, you have business rule debt. 
And unlike code debt, nobody is tracking it.",{"title":151,"searchDepth":152,"depth":152,"links":631},[632,633,634,635,636],{"id":514,"depth":155,"text":515},{"id":532,"depth":155,"text":533},{"id":574,"depth":155,"text":575},{"id":592,"depth":155,"text":593},{"id":622,"depth":155,"text":623},"2026-02-27","Pricing logic, approval thresholds, routing conditions — scattered across code, spreadsheets, and institutional memory. The debt that no dashboard shows.",{},"/blog/2026-02-27-business-rules-technical-debt",{"title":509,"description":638},"blog/2026-02-27-business-rules-technical-debt",[171,172],"cw5qY8OhxlIxtuLgvT7xvJvjL82o27XgI-h3GYcYEV4",{"id":646,"title":647,"body":648,"date":773,"description":774,"draft":163,"extension":164,"meta":775,"navigation":166,"path":776,"seo":777,"stem":778,"tags":779,"__hash__":780},"blog/blog/2026-02-26-explainability-is-architecture-decision.md","Explainability Is an Architecture Decision, Not a Feature",{"type":8,"value":649,"toc":765},[650,654,657,660,662,666,669,672,675,678,680,684,687,693,699,702,705,707,711,714,720,732,735,737,741,744,747,750,752,756,759,762],[11,651,653],{"id":652},"you-cant-bolt-on-transparency-after-the-fact","You can't bolt on transparency after the fact",[16,655,656],{},"There's a growing assumption in enterprise AI that explainability is a feature you add. That you build the system first, then add an \"explain\" button. That there's a library, an API, or a wrapper that makes any AI system transparent.",[16,658,659],{},"This assumption is architecturally wrong — and the regulatory landscape is making the consequences real.",[29,661],{},[11,663,665],{"id":664},"what-the-law-actually-requires","What the law actually requires",[16,667,668],{},"The EU AI Act entered into force on 1 August 2024 and is being phased in, with key obligations applying from 2025, 2 August 2026, and for some systems 2 August 2027. 
Article 86 applies where a deployer takes a decision based on the output of an Annex III high-risk AI system (with specified exceptions) and that decision produces legal effects or similarly significantly affects the person's health, safety, or fundamental rights. In such cases, the affected individual has the right to obtain \"clear and meaningful explanations of the role of the AI system in the decision-making procedure and the main elements of the decision taken.\"",[16,670,671],{},"Separately, the GDPR already requires \"meaningful information about the logic involved\" in automated decision-making (Article 15(1)(h)). The Court of Justice of the European Union strengthened this on 27 February 2025 (Case C-203/22, Dun & Bradstreet Austria), establishing that simply communicating \"a complex mathematical formula, such as an algorithm\" is insufficient — organizations must explain \"the procedure and principles actually applied\" in a way the affected person can understand.",[16,673,674],{},"In the US, the Equal Credit Opportunity Act (Regulation B) has long required creditors to provide specific reasons for denial. California's AB 2013 (effective January 1, 2026) requires developers of public-use generative AI systems to publish a high-level summary of training datasets. California's SB 942 applies to covered providers (those with over 1,000,000 monthly users) and requires a free AI-detection tool along with manifest and latent disclosures for content created or altered by their generative AI systems. 
These aren't abstract policy discussions — they're enforceable requirements with compliance deadlines.",[16,676,677],{},"The practical question for any system architect is: when a regulator, auditor, or affected individual asks \"why did the system make this decision?\" — what does your architecture allow you to answer?",[29,679],{},[11,681,683],{"id":682},"two-kinds-of-explainability-and-why-it-matters","Two kinds of explainability — and why it matters",[16,685,686],{},"XAI literature commonly distinguishes two approaches to explainability. A paper in the proceedings of MultiMedia Modeling (MMM 2025) and a TechPolicy.Press analysis both examine this distinction in the context of EU regulation:",[16,688,689,692],{},[92,690,691],{},"Intrinsic explainability"," is possible when the model or decision system is simple enough that the relationship between inputs and outputs can be directly traced. A decision tree, a rule-based system, a decision table — these are intrinsically explainable. You can point to a specific rule, a specific condition, and say: \"This input matched this condition, which triggered this outcome.\" The explanation is exact, not approximate.",[16,694,695,698],{},[92,696,697],{},"Post-hoc explainability"," uses external methods (SHAP, LIME, Shapley Values) to approximate why a complex model produced a particular output. These methods are applied after the fact to models that are too complex to trace internally — which includes virtually all large language models. The TechPolicy.Press analysis states it plainly: \"Any insight gained is only an approximation of the model's actual reasoning path. There's no guarantee of the accuracy or consistency of post-hoc explanations. 
And the more complex the model, the less reliable the approximations.\"",[16,700,701],{},"The same analysis concludes: \"Post-hoc explanations fall short of providing the kind of protections that are possible when human decisions are contested.\"",[16,703,704],{},"This is not a philosophical distinction. It's an architectural one. If you delegate decisions to an LLM, explainability typically becomes post-hoc — approximate, potentially inconsistent, and vulnerable to regulatory challenge. If an LLM only extracts structured data and explicit rules decide, the decision logic is intrinsically explainable — though the extraction step itself still requires validation and logging.",[29,706],{},[11,708,710],{"id":709},"what-this-looks-like-in-practice","What this looks like in practice",[16,712,713],{},"Consider a common pattern: processing an incoming request (a support ticket, an application, a claim — the domain doesn't matter). The system needs to understand the request, classify it, and route it to the appropriate handler based on business rules.",[16,715,716,719],{},[92,717,718],{},"Architecture A: LLM decides.","\nThe LLM receives the request, determines the category, assesses urgency, and selects the routing destination. When asked \"why was this routed to Team X?\", the answer is: \"The model determined this was the best routing.\" To explain further, you'd need post-hoc analysis tools, and the explanation would be an approximation.",[16,721,722,725,726,112,729,731],{},[92,723,724],{},"Architecture B: LLM extracts, rules decide.","\nThe LLM receives the request and extracts structured data: category, urgency indicators, key entities. This structured data is then evaluated by explicit rules — a decision table that maps combinations of category, urgency, and entity type to routing destinations. When asked \"why was this routed to Team X?\", the answer is: \"The LLM classified this as category Y with urgency Z. 
Rule 7 in the routing table (version 3.2, last modified by ",[109,727,728],{},"person",[109,730,115],{},") specifies that category Y + urgency Z routes to Team X.\"",[16,733,734],{},"Same outcome. Same use of AI. Fundamentally different explainability. And the difference was determined at architecture time, not after deployment.",[29,736],{},[11,738,740],{"id":739},"the-cost-of-getting-this-wrong","The cost of getting this wrong",[16,742,743],{},"Italy's Garante fined OpenAI €15 million over GDPR breaches tied to ChatGPT's processing of personal data, including lack of an adequate legal basis and transparency/information failures. The FTC's \"Operation AI Comply\" targeted deceptive AI marketing practices. These enforcement actions establish a clear pattern: regulators expect documented controls, technical safeguards, and evidence of compliance.",[16,745,746],{},"EM360Tech's analysis of enterprise AI strategy captures the shift: \"AI auditability is now a design requirement. Inspection-readiness becomes the default posture. Enterprises will be expected to demonstrate AI accountability in a way that holds up under external scrutiny. That includes documentation, decision logs, model and data governance, and clarity on who is responsible for what.\"",[16,748,749],{},"An MDPI-published framework for engineering explainable AI systems explicitly argues for making \"transparency and compliance intrinsic to both development and operation\" rather than treating explainability as \"an isolated post-hoc output.\"",[29,751],{},[11,753,755],{"id":754},"the-architecture-decision","The architecture decision",[16,757,758],{},"Explainability isn't a compliance checkbox. 
It's a design constraint that shapes the entire system architecture.",[16,760,761],{},"If you're building a system where decisions need to be explained — to regulators, to auditors, to affected individuals, or even to your own team debugging a production issue — the question isn't \"which XAI library should we use?\" The question is: \"Where in my architecture do decisions happen, and can I trace them?\"",[16,763,764],{},"The answer to that question determines whether you can explain your system's decisions exactly — or only approximately. And that distinction is becoming the difference between compliant and non-compliant, auditable and unauditable, trustworthy and not.",{"title":151,"searchDepth":152,"depth":152,"links":766},[767,768,769,770,771,772],{"id":652,"depth":155,"text":653},{"id":664,"depth":155,"text":665},{"id":682,"depth":155,"text":683},{"id":709,"depth":155,"text":710},{"id":739,"depth":155,"text":740},{"id":754,"depth":155,"text":755},"2026-02-26","Intrinsic vs. post-hoc explainability is determined at design time. EU AI Act Article 86, GDPR, and CJEU rulings are making this an engineering constraint, not a policy discussion.",{},"/blog/2026-02-26-explainability-is-architecture-decision",{"title":647,"description":774},"blog/2026-02-26-explainability-is-architecture-decision",[359,172,357],"f0b3uFw5gylzRMrEZTK9_O6utlWzZEGtlQK3vUKvPL0",{"id":782,"title":783,"body":784,"date":936,"description":937,"draft":163,"extension":164,"meta":938,"navigation":166,"path":940,"seo":941,"stem":942,"tags":943,"__hash__":944},"blog/blog/2026-02-25-ai-under-rules-vs-ai-under-hope.md","AI Under Rules vs. 
AI Under Hope: Two Architectures for Production Agents",{"type":8,"value":785,"toc":928},[786,790,793,805,807,811,814,817,820,823,826,829,831,835,838,841,844,846,849,862,864,868,871,877,880,883,889,895,897,901,904,907,910,913,915,919,922,925],[11,787,789],{"id":788},"the-architecture-question-hiding-behind-every-ai-deployment","The architecture question hiding behind every AI deployment",[16,791,792],{},"When an enterprise deploys an AI agent, there's a fundamental design choice that determines everything downstream: who makes the decisions — the LLM or the rules?",[16,794,795,796,800,801,804],{},"This isn't a theoretical question. It determines whether the system can be audited, whether failures can be traced, whether decisions can be rolled back, and whether the whole thing can be explained to a regulator. It's the difference between a system that produces ",[797,798,799],"em",{},"predictable outcomes"," and a system that produces ",[797,802,803],{},"plausible outputs",".",[29,806],{},[11,808,810],{"id":809},"architecture-a-the-llm-decides","Architecture A: The LLM decides",[16,812,813],{},"In the most common agent architecture today, the LLM is the decision-maker. It receives a task, reasons about it, selects tools, determines the order of operations, and executes. Frameworks like LangChain, CrewAI, and AutoGen follow this pattern. The LLM is the brain, the planner, and the executor.",[16,815,816],{},"This works remarkably well in demos. It also works well for low-stakes, exploratory tasks where occasional errors are acceptable. 
The problem emerges when you need the same system to behave consistently, explain itself, and operate within defined boundaries.",[16,818,819],{},"The CIO.com reporting on agentic AI trust issues captures this: \"Most enterprises today can stand up an agent but very few can explain, constrain, and coordinate a swarm of them.\" A product manager at Writer (an agent-building platform) observed that many organizations view agents as similar to API calls with predictable outputs, when in reality they behave more like \"junior interns.\"",[16,821,822],{},"IBM's Maryam Ashoori described the failure mode precisely: \"If the model hallucinates and takes the wrong tool, and that tool has access to unauthorized data, then you have a data leak.\" The issue isn't hallucination per se — it's that a hallucination at the decision layer has operational consequences that are difficult to detect, trace, and reverse.",[16,824,825],{},"Google Cloud's 2025 review noted that \"deploying agents has become less a software problem and more a governance challenge. In complex workflows with multiple agents, it's difficult to isolate which agent drove success or caused failure.\"",[16,827,828],{},"The fundamental problem: when the LLM decides, the decision boundary is the model itself — opaque, non-deterministic, and version-dependent. You can observe what it did. You cannot guarantee what it will do.",[29,830],{},[11,832,834],{"id":833},"architecture-b-the-llm-extracts-rules-decide","Architecture B: The LLM extracts, rules decide",[16,836,837],{},"The alternative architecture separates two distinct functions: understanding and deciding.",[16,839,840],{},"The LLM handles understanding — interpreting natural language, extracting structured data from unstructured input, classifying intent, identifying entities. This is what LLMs are demonstrably good at. 
The output is structured data that conforms to a typed contract.",[16,842,843],{},"The decision engine handles deciding — applying explicit rules to that structured data to determine what action to take. Which rules fire. What the thresholds are. Where to route. Whether to approve or escalate. These rules exist as versioned artifacts — decision tables, policy definitions — that can be read, tested, and audited independently of any AI model.",[16,845,602],{},[16,847,848],{},"In Architecture A, when you ask \"why did the agent route this request to the fraud team?\", the honest answer is: \"The LLM determined this was the best action based on its training and the context provided.\" The explanation is a reconstruction — a plausible story about what the model probably considered.",[16,850,851,852,855,856,112,859,861],{},"In Architecture B, the answer is: \"The LLM extracted these fields from the input: ",[109,853,854],{},"amount: $47,000, country: Nigeria, account_age: 3 days",". Rule 12 in the fraud_screening table (v2.4) specifies that amount > $10,000 AND account_age \u003C 7 days triggers fraud review. The rule was last modified by ",[109,857,858],{},"analyst name",[109,860,115],{},".\" The explanation is a fact — traceable to a specific rule, a specific version, a specific person.",[29,863],{},[11,865,867],{"id":866},"why-this-distinction-matters-more-in-2026","Why this distinction matters more in 2026",[16,869,870],{},"Three converging forces make this architectural choice urgent:",[16,872,873,876],{},[92,874,875],{},"1. Regulatory requirements are being phased in."," The EU AI Act (Regulation 2024/1689) becomes broadly applicable from 2 August 2026, with earlier dates for specific chapters. 
Article 86 establishes a right — for individuals affected by decisions based on Annex III high-risk AI systems with significant effects on health, safety, or fundamental rights — to obtain from the deployer clear and meaningful explanations of the AI system's role and the main elements of the decision.",[16,878,879],{},"The distinction between intrinsic and post-hoc explainability matters here. A TechPolicy.Press analysis of the GDPR and AI Act argues that intrinsic methods (where decision logic can be traced directly) and post-hoc methods (approximations applied to opaque models) differ materially for contestability — and warns that post-hoc explanations can be unreliable, falling short of the protections possible when human decisions are contested.",[16,881,882],{},"When the model is the decision boundary, explanations are typically post-hoc — reconstructions of what the model probably considered. When decisions flow through explicit rules, explanations are intrinsic — traceable to a specific rule and version. As enforcement of the AI Act begins (first obligations already apply, with the bulk taking effect August 2026), this architectural distinction has direct compliance implications.",[16,884,885,888],{},[92,886,887],{},"2. Agent sprawl is creating governance chaos."," IBM reports that by late 2025, enterprises found themselves with dozens or hundreds of agents on different platforms, built by different teams. Deloitte found only 1 in 5 companies has mature governance for autonomous agents. When decisions are made by LLMs inside black-box agents, governing a fleet of them is practically impossible. When decisions are made by explicit rules that exist as versioned artifacts, governance becomes a version control problem — still hard, but tractable.",[16,890,891,894],{},[92,892,893],{},"3. Failure recovery requires determinism."," When an AI agent makes an incorrect decision, recovery depends on understanding exactly what happened and undoing the consequences. 
If the LLM chose to call a tool that triggered an irreversible action, and you can't explain why it chose that tool, you can't build systematic failure recovery. If rules made the decision, the failure trace is exact: this input, this rule, this version, this outcome. And the fix is equally exact: change the rule, test, deploy.",[29,896],{},[11,898,900],{"id":899},"this-isnt-anti-ai","This isn't \"anti-AI\"",[16,902,903],{},"Separating \"understand\" from \"decide\" doesn't reduce the role of AI. It focuses the AI on what it does best — handling ambiguity, interpreting natural language, extracting structure from chaos — while keeping the decision boundary explicit and auditable.",[16,905,906],{},"The LLM remains essential. Without it, you're back to rigid forms and structured input — which is where many enterprise systems are stuck. The LLM is what makes the system able to handle real-world, messy, human input.",[16,908,909],{},"But the LLM's output goes through a checkpoint — typed, structured, validated — before it affects any decision. This is the same principle as input validation in security: you don't trust raw user input, and you shouldn't trust raw model output in a system where decisions have consequences.",[16,911,912],{},"Cleanlab's survey of production AI teams found that more than half plan to focus on reducing hallucinations, and 42% of regulated enterprises plan to introduce manager features such as approvals and review controls. These are symptoms of the same problem: when the LLM is the decision-maker, you need extensive human oversight to compensate for its unpredictability. 
When rules make decisions, the oversight shifts from \"did the AI do something wrong?\" to \"are the rules correct?\" — a more tractable, more scalable question.",[29,914],{},[11,916,918],{"id":917},"the-choice","The choice",[16,920,921],{},"Every AI agent in production sits somewhere on a spectrum between \"the LLM decides everything\" and \"the LLM extracts, rules decide.\"",[16,923,924],{},"Many agent frameworks default to the left side of that spectrum — because it's faster to build, more impressive in demos, and requires less upfront design. The enterprises that report success in production, particularly in regulated environments, tend to move toward the right — because it's auditable, testable, and governable.",[16,926,927],{},"This isn't a technology preference. It's a risk management decision. And like all risk management decisions, it should be made deliberately, not by default.",{"title":151,"searchDepth":152,"depth":152,"links":929},[930,931,932,933,934,935],{"id":788,"depth":155,"text":789},{"id":809,"depth":155,"text":810},{"id":833,"depth":155,"text":834},{"id":866,"depth":155,"text":867},{"id":899,"depth":155,"text":900},{"id":917,"depth":155,"text":918},"2026-02-25","The LLM decides everything or the LLM extracts and rules decide. 
Two architectures, different audit, compliance, and failure recovery outcomes.",{"slug":939},"ai-under-rules-vs-ai-under-hope","/blog/2026-02-25-ai-under-rules-vs-ai-under-hope",{"title":783,"description":937},"blog/2026-02-25-ai-under-rules-vs-ai-under-hope",[358,357,359],"KyzyQVixbJQZzIQqorJsrq3lMl-zepUOZSJjl9vrGMc",{"id":946,"title":947,"body":948,"date":1097,"description":1098,"draft":163,"extension":164,"meta":1099,"navigation":166,"path":1100,"seo":1101,"stem":1102,"tags":1103,"__hash__":1106},"blog/blog/2026-02-22-why-95-percent-of-enterprise-ai-pilots-fail.md","Why 95% of Enterprise AI Pilots Fail — and It's Not the Model's Fault",{"type":8,"value":949,"toc":1089},[950,954,957,960,967,970,972,976,979,982,988,994,1000,1002,1006,1009,1012,1015,1018,1021,1023,1027,1030,1036,1042,1048,1054,1056,1060,1063,1066,1069,1071,1075],[11,951,953],{"id":952},"the-number-everyone-quotes-and-what-it-actually-means","The number everyone quotes — and what it actually means",[16,955,956],{},"MIT's \"GenAI Divide\" report, published in mid-2025, analyzed over 300 public AI deployments, conducted 52 executive interviews, and surveyed 153 senior leaders. The headline finding: only 5% of enterprise GenAI pilots delivered measurable P&L impact. The rest stalled, delivered nothing, or were quietly abandoned. (Important to note: this is about generative AI pilots broadly — not AI agents specifically. As we'll see below, the data on agents in production is even starker.)",[16,958,959],{},"That number has been repeated everywhere — and also challenged. Critics (including the Marketing AI Institute's Paul Roetzer) have rightly pointed out that the methodology mixes early-stage \"learning pilots\" with production failures, and that defining \"success\" purely through P&L impact within a short observation window paints an incomplete picture. 
MIT itself describes the findings as \"directionally accurate,\" noting they are based on interviews rather than official reporting.",[16,961,962,963,966],{},"But here's what's worth paying attention to, even if you adjust the number: the ",[797,964,965],{},"pattern"," MIT identified is consistent across every other study. Cleanlab surveyed 1,837 engineering and AI leaders in 2025 and found only 95 had AI agents live in production — roughly the same ratio. Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. S&P Global's Voice of the Enterprise survey (late 2024) found that 42% of companies had abandoned most of their AI initiatives, up from 17% the year prior. RAND Corporation notes that some estimates put AI project failure rates above 80% — about twice traditional IT projects.",[16,968,969],{},"Whether it's 80% or 95%, the signal is clear: the gap between a working demo and a reliable production system is where projects die.",[29,971],{},[11,973,975],{"id":974},"the-demo-worked-then-what","The demo worked. Then what?",[16,977,978],{},"A CIO quoted in MIT's research captures the pattern perfectly: \"We've seen dozens of demos this year. Maybe one or two are genuinely useful. The rest are wrappers or science projects.\"",[16,980,981],{},"The divide isn't about model quality. GPT-4, Claude, Gemini — they all produce impressive demos. The problem starts when you try to move from demo to production. MIT's report identifies three root causes:",[16,983,984,987],{},[92,985,986],{},"1. Brittle workflows."," AI pilots are usually built in isolation, on clean data, with cooperative inputs. Production means messy data, edge cases, concurrent users, external system dependencies, and failure modes nobody tested for. 
A corporate lawyer in the MIT study explained why she prefers ChatGPT over her firm's $50,000 contract analysis tool: the official tool \"provided rigid summaries with limited customization options.\" The problem wasn't the AI model — it was everything around it.",[16,989,990,993],{},[92,991,992],{},"2. No learning loop."," MIT's central finding is that \"most GenAI systems do not retain feedback, adapt to context, or improve over time.\" This is a critical observation. A system that makes the same mistakes repeatedly, that doesn't learn from corrections, that requires the same context explained in every session — this isn't a tool, it's a burden. Users noticed. Workers from over 90% of the surveyed companies reported regular use of personal AI tools for work — but abandoned enterprise tools that couldn't keep up.",[16,995,996,999],{},[92,997,998],{},"3. Misalignment with operations."," More than half of generative AI budgets go to sales and marketing tools, yet MIT found the biggest ROI in back-office automation — document processing, compliance workflows, internal operations. Companies invest where it's visible, not where it's valuable. The result: flashy pilots with no operational fit.",[29,1001],{},[11,1003,1005],{"id":1004},"the-governance-gap-nobody-planned-for","The governance gap nobody planned for",[16,1007,1008],{},"Beyond MIT's findings, a separate and equally important pattern has emerged. As enterprises scale from one pilot to many, governance becomes the bottleneck.",[16,1010,1011],{},"Deloitte's State of AI in the Enterprise survey (2025-2026, 3,235 senior leaders across 24 countries) found that only one in five companies has a mature model for governance of autonomous AI agents. 
In an IBM Think interview, Maryam Ashoori cited figures suggesting only about 19% of organizations focus on monitoring and observability of AI agents in production.",[16,1013,1014],{},"Fortune's reporting from enterprise AI conferences captures what this looks like on the ground. Kathleen Peters, Chief Innovation Officer at Experian, described the core question companies struggle with: \"If something goes wrong, if there's a hallucination, if there's a power outage — what can we fall back to?\" Many enterprises that have deployed agents still struggle to move from knowledge retrieval to action-oriented autonomy — keeping humans in the loop for every action, which limits the efficiency gains agents were supposed to deliver.",[16,1016,1017],{},"The World Economic Forum's December 2025 analysis named the \"trust deficit\" as one of three critical barriers to agentic AI adoption, alongside infrastructure and data gaps. Their assessment: \"AI models are non-deterministic, so they can behave unpredictably, and their deployment across multi-cloud, multi-agent environments introduces new risks and vulnerabilities.\"",[16,1019,1020],{},"IBM's Maryam Ashoori described the shift in enterprise focus: \"What enterprises are dealing with now is managing and governing a collection of agents. That has become an issue.\" By late 2025, enterprises found themselves with dozens or even hundreds of agents running across different platforms, built by different teams, under different assumptions. Building was easy. Running at scale was not.",[29,1022],{},[11,1024,1026],{"id":1025},"what-the-5-did-differently","What the 5% did differently",[16,1028,1029],{},"MIT's data reveals a clear pattern among the companies that succeeded:",[16,1031,1032,1035],{},[92,1033,1034],{},"They bought instead of built — and partnered instead of going solo."," Vendor partnerships succeeded about 67% of the time, while internal builds succeeded only about 33%. 
This doesn't mean outsourcing everything — it means not reinventing infrastructure. The successful 5% focused their engineering effort on domain-specific workflow integration, not on building generic AI capabilities.",[16,1037,1038,1041],{},[92,1039,1040],{},"They started with back-office operations, not customer-facing demos."," The highest ROI came from eliminating business process outsourcing, cutting external agency costs, and streamlining internal operations. Case studies in the MIT report showed $2–10M in annual savings from replacing outsourced support and document review.",[16,1043,1044,1047],{},[92,1045,1046],{},"They empowered line managers, not central AI labs."," Successful adoption happened when budget holders and domain managers surfaced problems, vetted tools, and led rollouts — rather than waiting for a centralized AI team to identify use cases.",[16,1049,1050,1053],{},[92,1051,1052],{},"They designed for failure."," Successful organizations ran pilots in real workflows (not controlled demos), expected breakdowns, used those breakdowns to improve governance, training, and security — and only then scaled. As the report puts it: \"Organizations that cross the GenAI Divide welcome small, early, contained failures.\"",[29,1055],{},[11,1057,1059],{"id":1058},"the-architectural-question-behind-the-governance-question","The architectural question behind the governance question",[16,1061,1062],{},"Every governance challenge traces back to an architecture decision. If your AI agent is a black box — if you can't explain why it chose action A over action B, if you can't replay a failed session, if you can't roll back a bad decision — then governance isn't just hard. It's impossible.",[16,1064,1065],{},"The question isn't whether to deploy AI agents. 
The question is: when something goes wrong (and it will), can your system explain what happened, undo the damage, and show an auditor exactly which rule, which version, which data led to the outcome?",[16,1067,1068],{},"That's not a governance policy question. That's an architecture question. And it needs to be answered before the first agent goes into production — not after the first audit failure.",[29,1070],{},[11,1072,1074],{"id":1073},"key-takeaways","Key takeaways",[269,1076,1077,1080,1083,1086],{},[272,1078,1079],{},"The pilot-to-production gap is real and well-documented across multiple studies, not just MIT's headline number.",[272,1081,1082],{},"The bottleneck is operational, not technological: brittle workflows, no feedback loops, misalignment with where value actually lives.",[272,1084,1085],{},"Governance maturity lags far behind deployment speed — only ~20% of companies have mature oversight models for AI agents.",[272,1087,1088],{},"The companies that succeed design for failure from day one, start in back-office operations, and focus on workflow integration over model sophistication.",{"title":151,"searchDepth":152,"depth":152,"links":1090},[1091,1092,1093,1094,1095,1096],{"id":952,"depth":155,"text":953},{"id":974,"depth":155,"text":975},{"id":1004,"depth":155,"text":1005},{"id":1025,"depth":155,"text":1026},{"id":1058,"depth":155,"text":1059},{"id":1073,"depth":155,"text":1074},"2026-02-22","MIT, Gartner, and S&P Global data point to the same pattern: the gap between demo and production kills AI projects. What the 5% that succeed do differently.",{},"/blog/2026-02-22-why-95-percent-of-enterprise-ai-pilots-fail",{"title":947,"description":1098},"blog/2026-02-22-why-95-percent-of-enterprise-ai-pilots-fail",[358,1104,1105],"governance","observability","ibIjSHGjvezi0RzrPoBNyjuSMs2FJGesBetRW6H_DG4",1772500485120]