EnAct Practice · Discipline 01

AI Agent
Design.

A Chatbot Answers. An Agent Completes. The Discipline Between Them Is Engineered, Not Prompted.

The agentic stack is built upward from a single unit — one agent that reliably perceives its context, decides what to do, acts on that decision, and knows when the work is finished. Everything above that layer — orchestration, multi-agent coordination, full workflow automation — compounds the behaviour of this underlying unit. A fragile agent produces a fragile system, regardless of how well the layers above it are engineered. Agent design is the discipline of making that unit robust enough to be trusted in production. It is not a prompt. It is an architecture.

Why most enterprise agents
fail quietly.

There is a recognisable pattern in agent deployments that do not survive their first quarter in production. The agent performs well in development. The team celebrates the demo. The system is pushed live. For the first few days, it works. Then, unexplained behaviours begin to surface — the agent loops on a particular class of input, invents a tool call that does not exist, misinterprets the state of a long-running task, calls the wrong API with the right intent, or exits a reasoning chain one step before it would have succeeded. Each failure is individually minor. In aggregate, they erode trust in the system faster than any competitor ever could.

The cause is almost never the model. The cause is that the agent was treated as a sophisticated prompt rather than an engineered system.

Frontier models are capable of most enterprise agentic tasks; in the failed deployments, the model was never the missing piece. There was no explicit reasoning pattern. No defined memory architecture. No state management layer. No stopping criteria. No input validation. No guardrail on action scope. No observability for the reasoning trace. No versioned definition of what the agent is supposed to be.

Agent design closes all of those gaps — not individually, but as a single coherent discipline. The question "what is the agent?" is answered in engineering terms before the model is invoked for the first time.

What Entiovi means
by AI Agent Design.

Agent design, in Entiovi's service context, is the deliberate specification of a production-grade single agent across seven dimensions — the cognitive architecture, the memory model, the planner, the tool interface, the output contract, the guardrail envelope, and the operational profile. Each dimension is a design decision with reliability, latency, and cost implications. None of them are optional. All of them are versioned, documented, and testable.

The output of an agent design engagement is not a clever system prompt. It is an agent specification — a document and a reference implementation that defines precisely how the agent reasons, what it remembers, which tools it can invoke, under what conditions it must escalate, how it handles failure, and how its behaviour will be evaluated in production. This specification becomes the single source of truth for everything the orchestration, workflow, and tool layers are built around.

It is also the boundary where accountability lives. An organisation that has only an agent prompt cannot meaningfully ask whether the agent is performing as designed — there is no design to check against. An organisation that has an agent specification can measure, audit, regress-test, and improve the agent as a first-class engineered component of its stack.

The anatomy of
a production agent.

Every agent Entiovi builds is constructed from the same six engineered components, which cover the first six design dimensions; the seventh, the operational profile, is fixed by the architecture axes that follow. The design choices within each component are what make the agent fit its task.

01

Cognitive architecture

How the agent reasons

The reasoning pattern is the most consequential design choice. A ReAct agent interleaves thought, action, and observation in a single loop — fast, flexible, and well-suited to tasks with short, adaptive reasoning chains. A Plan-and-Execute agent produces a full plan before any action, then executes it step by step — stronger on tasks requiring global coherence but heavier on latency and cost. Reflexion adds a self-critique step after each attempt — stronger on tasks where failure modes can be identified by the agent itself. Entiovi's practice treats these as distinct architectural patterns selected against task characteristics, not as labels applied after the fact.
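The ReAct pattern described above can be sketched as a bounded loop. This is a minimal illustration, not a production implementation: `call_model`, the tool table, and the step budget are hypothetical stand-ins for a real model call and tool layer.

```python
# Minimal ReAct-style loop sketch: the agent alternates thought, action, and
# observation until the model signals completion or the step budget runs out.
# `call_model` and TOOLS are hypothetical stand-ins, not a real API.

def call_model(history):
    # Stand-in for an LLM call: returns (thought, action, argument).
    # Here it "decides" to look a value up, then finish on the observation.
    if any(step[0] == "observe" for step in history):
        return ("I have the value", "finish", history[-1][1])
    return ("I need the value", "lookup", "answer")

TOOLS = {"lookup": lambda key: {"answer": 42}[key]}

def react(max_steps=5):
    history = []
    for _ in range(max_steps):                  # bounded reasoning budget
        thought, action, arg = call_model(history)
        history.append(("think", thought))
        if action == "finish":                  # explicit stopping criterion
            return arg
        observation = TOOLS[action](arg)        # act, then observe
        history.append(("observe", observation))
    raise RuntimeError("step budget exhausted") # fail loudly, never silently
```

Note the two properties that make the loop production-safe: an explicit stopping criterion and a hard step budget, so the agent can neither loop forever nor exit ambiguously.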

02

Memory

What the agent remembers, and for how long

Memory is not a single thing. Working memory holds the active reasoning state for the current task. Episodic memory holds a structured record of past interactions relevant to the current user, case, or session. Semantic memory holds learned facts about the organisation, the domain, and the user that persist across sessions. Each memory type has a different storage layer, retrieval mechanism, expiry policy, and access control profile. Conflating them produces agents that either forget things they should remember or remember things they should have forgotten.
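The separation of the three memory layers can be made concrete in a few lines. The storage shapes, field names, and TTL values below are illustrative assumptions; real deployments would back each layer with a different store and access-control profile.

```python
import time

# Sketch of the three memory layers with distinct retention policies.
# Backends, names, and TTLs here are illustrative assumptions only.

class AgentMemory:
    def __init__(self, episodic_ttl=3600):
        self.working = {}        # active reasoning state; cleared per task
        self.episodic = []       # (timestamp, record) pairs; expire by TTL
        self.semantic = {}       # durable facts; persist across sessions
        self.episodic_ttl = episodic_ttl

    def remember_episode(self, record, now=None):
        if now is None:
            now = time.time()
        self.episodic.append((now, record))

    def recall_episodes(self, now=None):
        if now is None:
            now = time.time()
        # Expiry enforced at read time: stale episodes are dropped.
        self.episodic = [(t, r) for t, r in self.episodic
                         if now - t < self.episodic_ttl]
        return [r for _, r in self.episodic]

    def end_task(self):
        self.working.clear()     # working memory does not outlive the task
```

Keeping the expiry policy inside the episodic layer, and the clear-on-completion rule inside the working layer, is exactly the conflation-avoidance the paragraph above describes.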

03

Planning

How the agent decomposes the task

For tasks that require more than a handful of steps, implicit planning inside the reasoning loop is unreliable. An explicit planner — producing a structured plan that the agent then executes, revises, or regenerates in response to observations — consistently outperforms it. Hierarchical planners, where a high-level planner emits sub-goals that a lower-level agent executes, are used where the task depth warrants the overhead.
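An explicit plan-then-execute loop, with replanning against what has already been observed, can be sketched as follows. `make_plan` and the step executors are hypothetical stand-ins for model calls.

```python
# Sketch of an explicit planner loop: produce a structured plan, execute it
# step by step, and regenerate the remaining plan after each observation.
# `make_plan` and `execute` are hypothetical stand-ins for model calls.

def make_plan(goal, done=()):
    steps = ["gather", "analyse", "draft"]
    return [s for s in steps if s not in done]   # replan = remaining steps

def execute(step):
    return {"gather": "data", "analyse": "findings", "draft": "report"}[step]

def plan_and_execute(goal):
    done, results = [], {}
    plan = make_plan(goal)
    while plan:
        step = plan[0]
        results[step] = execute(step)
        done.append(step)
        plan = make_plan(goal, done)             # revise against observations
    return results
```

The point of the pattern is that the plan is a first-class artefact: it can be logged, inspected, and regenerated, rather than living implicitly inside a reasoning trace.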

04

Tool interface

How the agent acts

Tool selection, argument generation, and result interpretation are three distinct engineering problems. Each requires structured schemas, validation layers, and a tool registry that the agent cannot silently circumvent. Tool-use reliability is rarely about model capability — it is about whether the tool definitions are tight, the schemas are strict, and the agent is constrained to invoke tools it has been explicitly granted.
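A registry the agent cannot silently circumvent looks roughly like this in code. The tool name, schema shape, and enum constraint are illustrative assumptions; the principle is that validation happens in the runtime, not in the prompt.

```python
# Sketch of a tool registry enforced in code rather than in a prompt: the
# agent can only invoke registered tools, and arguments are validated against
# a strict schema (types plus enum constraints) before execution.

REGISTRY = {}

def register(name, schema, fn):
    REGISTRY[name] = (schema, fn)

def invoke(name, args):
    if name not in REGISTRY:
        raise PermissionError(f"tool not granted: {name}")   # no silent bypass
    schema, fn = REGISTRY[name]
    for field, spec in schema.items():
        if field not in args or not isinstance(args[field], spec["type"]):
            raise ValueError(f"bad argument: {field}")
        if "enum" in spec and args[field] not in spec["enum"]:
            raise ValueError(f"out-of-range value: {field}")
    return fn(**args)

# Hypothetical tool definition with a strict argument schema:
register(
    "refund",
    {"amount": {"type": int}, "currency": {"type": str, "enum": ["EUR", "USD"]}},
    lambda amount, currency: f"refunded {amount} {currency}",
)
```

A hallucinated tool name fails at the registry; a hallucinated argument fails at the schema. Neither failure ever reaches the tool itself.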

05

Output contract

What the agent is required to produce

Free-form agent outputs are incompatible with downstream systems. Every production agent has a defined output contract — structured, schema-validated, and enforced. The agent is not considered finished until its final output conforms to that contract. Output validation is treated as part of the agent, not as a post-processing step.
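Treating validation as part of the agent, rather than as post-processing, means the loop does not terminate until a draft conforms. The contract fields and attempt budget below are illustrative assumptions.

```python
# Sketch of an output contract enforced inside the agent loop: the agent's
# final answer must validate against a schema before it counts as finished.
# The contract shape and the retry budget are illustrative assumptions.

CONTRACT = {"status": str, "case_id": str, "resolution": str}

def validate(output):
    return (isinstance(output, dict)
            and set(output) == set(CONTRACT)
            and all(isinstance(output[k], t) for k, t in CONTRACT.items()))

def finish(draft_outputs, max_attempts=3):
    # A conforming draft ends the loop; non-conforming drafts are rejected
    # and the agent tries again, within a bounded attempt budget.
    for _, draft in zip(range(max_attempts), draft_outputs):
        if validate(draft):
            return draft
    raise ValueError("no contract-conforming output within attempt budget")
```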

06

Guardrail envelope

What the agent is not permitted to do

A production agent has a defined action envelope: permitted topics, permitted tools, permitted data scopes, permitted output formats, and permitted escalation paths. The envelope is enforced at multiple layers — in the system prompt, in the tool access layer, in the output validator, and in the human-in-the-loop checkpoints. Agents that rely on a single layer of constraint have no envelope at all; they have a suggestion.
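One of those enforcement layers, the tool-access check, can be sketched as a runtime decision that routes anything outside the envelope to the escalation path. The envelope contents and names are illustrative assumptions.

```python
# Sketch of an action envelope checked at the tool-access layer, independent
# of whatever the system prompt says. Envelope contents are assumptions.

ENVELOPE = {
    "tools": {"search_kb", "draft_reply"},
    "data_scopes": {"tier1_cases"},
    "escalation": "human_queue",
}

def authorise(action, tool, scope):
    # Even if the model proposes an out-of-envelope action, the runtime
    # refuses it and routes the case to the defined escalation path.
    if tool not in ENVELOPE["tools"] or scope not in ENVELOPE["data_scopes"]:
        return ("escalate", ENVELOPE["escalation"])
    return ("allow", action)
```

Because this check sits below the model, a prompt-injection attack that convinces the model to attempt a forbidden tool still ends in escalation, not execution.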

Architecture considerations —
selecting the right pattern.

Architecture selection is a commercial decision before it is a technical one. The trade-offs are measurable, and the choice directly shapes the agent's reliability, latency, and unit economics. Entiovi evaluates every agent design across five architectural axes.

AXIS 01

Reasoning depth versus latency

A deeper reasoning pattern produces better outcomes on complex tasks at the cost of response time. Customer-facing agents have latency ceilings; back-office agents often do not. The architecture must fit the ceiling.

AXIS 02

Determinism versus adaptivity

Some processes demand deterministic execution — the same input must produce the same action, every time. Others benefit from an adaptive agent that reasons about the specifics of each case. Hybrid architectures — deterministic outer workflow with adaptive inner reasoning — are frequently the right answer in regulated environments.

AXIS 03

Statelessness versus continuity

Agents handling independent queries can be stateless; agents handling a multi-turn case, a long-running process, or a relationship with a user need continuity. Continuity requires explicit state persistence, careful state loading, and defensive handling of stale or corrupted state.
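Defensive handling of stale or corrupted state can be made concrete: state is checksummed on save and rejected on load when it fails integrity or freshness checks. Field names and the staleness window below are illustrative assumptions.

```python
import hashlib
import json

# Sketch of defensive state loading for a continuity-bearing agent: state is
# checksummed and rejected when stale or corrupted rather than trusted
# blindly. Field names and the staleness window are assumptions.

def save_state(state, now):
    body = json.dumps(state, sort_keys=True)
    return {"body": body, "saved_at": now,
            "digest": hashlib.sha256(body.encode()).hexdigest()}

def load_state(record, now, max_age=3600):
    body = record["body"]
    if hashlib.sha256(body.encode()).hexdigest() != record["digest"]:
        return None                 # corrupted: start fresh, do not guess
    if now - record["saved_at"] > max_age:
        return None                 # stale: the world has moved on
    return json.loads(body)
```

Returning `None` forces the caller to make an explicit fresh-start decision, which is safer than silently resuming from a state the agent can no longer trust.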

AXIS 04

Single-model versus routed

A single-model agent is simpler to build and operate. A routed agent — cheap fast model for easy cases, frontier model for hard ones, specialist model for specific intents — is more cost-efficient at scale and often more reliable. The cost threshold at which routing pays back is a calculation Entiovi runs before it is a pattern Entiovi applies.
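The routing decision itself is a small, cheap classifier in front of the model tiers. The tier names, per-call costs, and difficulty heuristic below are hypothetical; a real router would use a learned or rules-based intent classifier.

```python
# Sketch of a routed agent: a cheap heuristic picks the model tier per
# request. Tier names, prices, and the heuristic are assumptions.

MODELS = {
    "fast":     {"cost_per_call": 0.001},
    "frontier": {"cost_per_call": 0.030},
}

def route(request):
    # Hypothetical heuristic: long or multi-question requests go to the
    # frontier tier; everything else takes the cheap path.
    hard = len(request.split()) > 50 or request.count("?") > 1
    return "frontier" if hard else "fast"

def expected_cost(requests):
    # The payback calculation: total spend under routing, comparable against
    # sending every request to the frontier tier.
    return sum(MODELS[route(r)]["cost_per_call"] for r in requests)
```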

AXIS 05

Controlled autonomy versus supervised autonomy

Agents can be granted autonomy in bounded envelopes — cheap actions, reversible actions, low-consequence actions. Higher-consequence actions — approving a payment, sending a regulated communication, modifying a production record — require a supervision model: human-in-the-loop, four-eyes review, or agent-on-agent review. The autonomy model is decided per action, not per agent.
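A per-action supervision model is, in the runtime, a lookup table with a safe default. The action classes and supervision modes below are illustrative assumptions.

```python
# Sketch of a per-action supervision policy encoded in the runtime, not left
# to the model's judgement. Action classes and modes are assumptions.

POLICY = {
    "send_internal_note":  "autonomous",      # cheap, reversible
    "issue_refund":        "human_in_loop",   # high consequence
    "send_regulated_mail": "four_eyes",       # irreversible, regulated
}

def supervision_for(action_class):
    # Unknown action classes default to the strictest mode, never to autonomy.
    return POLICY.get(action_class, "four_eyes")
```

The default matters most: an action class nobody classified should fail toward supervision, not toward autonomy.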

Research perspective

What the field has established
about agent reliability.

Self-consistency improves agent outputs more reliably than model scale for bounded tasks

Running the same reasoning process multiple times and taking a consensus answer consistently outperforms single-shot reasoning on tasks where the model's judgement is the limiting factor — at a manageable multiple of the inference cost.
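The consensus mechanism is simple to state precisely: sample the same reasoning process several times and take the majority answer. The noisy sampler below is a toy stand-in for repeated model calls.

```python
from collections import Counter

# Sketch of self-consistency voting: sample the same reasoning process n
# times and take the consensus answer. `sample_answer` stands in for a
# repeated model call; the noisy sampler below is a toy example.

def self_consistent(sample_answer, n=5):
    answers = [sample_answer(i) for i in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n          # answer plus agreement ratio

# A toy sampler that is right 3 times out of 5:
noisy = lambda i: "42" if i % 2 == 0 else str(40 + i)
```

The agreement ratio is worth returning alongside the answer: a low ratio is itself a useful signal that the task sits outside the model's reliable range.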

Structured tool definitions reduce hallucinated tool calls substantially

Research and production experience both confirm that strict schemas, enum constraints on arguments, and explicit tool descriptions reduce invalid tool calls by more than half compared to loosely defined tool interfaces.

Reflection steps improve agent success rates on long-horizon tasks

Adding an explicit critique-and-revise step after each attempt — where the agent evaluates its own output against the original objective — produces material gains on tasks that require multi-step planning, particularly when combined with a bounded retry budget.
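The critique-and-revise loop with a bounded retry budget can be sketched directly. `attempt` and `critique` are hypothetical stand-ins for model calls; the toy task has the "model" omit a citation until the critic asks for one.

```python
# Sketch of a critique-and-revise (Reflexion-style) loop with a bounded
# retry budget. `attempt` and `critique` stand in for model calls.

def reflexion(attempt, critique, budget=3):
    feedback = None
    for trial in range(budget):
        output = attempt(feedback)
        ok, feedback = critique(output)
        if ok:                        # self-critique accepts the output
            return output, trial + 1
    raise RuntimeError("retry budget exhausted")

# Toy task: the "model" forgets a citation until the critic asks for one.
attempt = lambda fb: "claim [source]" if fb else "claim"
critique = lambda out: ("[source]" in out, "add a citation")
```

The budget is the part that makes this production-safe: without it, a critic the agent can never satisfy becomes an infinite loop.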

Agent observability is the strongest predictor of production reliability

Deployments with trace-level reasoning logs, tool-call records, and token-level cost accounting are corrected in hours when behaviour drifts. Deployments without them are corrected in weeks, if at all.
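A trace layer of this kind is structurally simple: every reasoning step, tool call, and cost entry appended to a queryable, replayable record. The event shape and field names below are assumptions, not a reference format.

```python
import json
import time

# Sketch of trace-level logging: every reasoning step, tool call, and cost
# entry goes into a structured, queryable record. Field names are assumptions.

class Trace:
    def __init__(self, run_id):
        self.run_id, self.events = run_id, []

    def log(self, kind, **fields):
        self.events.append({"run_id": self.run_id, "kind": kind,
                            "ts": time.time(), **fields})

    def cost(self):
        # Token-level cost accounting across the whole run.
        return sum(e.get("tokens", 0) for e in self.events)

    def export(self):
        # One JSON object per line: queryable, diffable, replayable.
        return "\n".join(json.dumps(e) for e in self.events)

trace = Trace("run-001")
trace.log("thought", text="check the runbook", tokens=120)
trace.log("tool_call", tool="search_kb", tokens=40)
```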

Where well-designed agents make
the sharpest difference.

Customer operations

Scoped case resolution

An agent designed around a bounded set of resolution paths, a tight tool envelope, and a clear escalation contract reliably closes the majority of Tier-1 cases without human intervention, and hands off cleanly on the minority that fall outside its envelope.

Finance operations

Exception handling at transaction volume

Invoice matching, expense-policy review, and reconciliation exceptions carry a long tail of cases that rules engines cannot resolve. A well-designed agent with access to the relevant contracts, policies, and historical decisions clears the middle band of exceptions that previously required human review, leaving only the genuinely novel cases for specialists.

IT service management

Structured diagnostic loops

An L1 diagnostic agent designed with a defined reasoning pattern, a restricted tool set (logs, monitoring data, CMDB, runbooks), and explicit escalation triggers resolves a meaningful share of incidents before a human engineer is paged — and hands the rest upward with a structured diagnostic summary.

Sales operations

Research and drafting loops

An agent that can traverse internal CRM data, enrichment sources, and public web content to produce structured pre-call briefs is a different product from a copilot that summarises a single article. The former is an agent. The latter is an assistant. Designing for the former requires an explicit multi-tool reasoning architecture.

Regulatory and compliance operations

Bounded research agents

Where the task is to find, synthesise, and cite evidence across a defined knowledge domain, an agent with a constrained envelope, mandatory source citation, and a structured output contract outperforms both unassisted humans and unconstrained AI tools — with a traceable audit record for every output.

Six decisions that determine
whether an agent survives production.

Every agent design engagement Entiovi runs converges on the same six questions. Organisations that answer them clearly at the start of the build ship agents that survive the quarter. Organisations that defer these questions ship agents that do not.

Q1

What exactly is this agent responsible for — and what is explicitly outside its scope?

The sharper the answer, the more reliable the agent. Scope that is defined negatively — the agent will not attempt X, will not speak about Y, will not invoke tool Z — is as important as scope defined positively.

Q2

How will the agent's success be measured at the individual action level?

"The agent performs well" is not a success metric. Task completion rate, first-pass tool-invocation success rate, correct-escalation rate, mean cost per resolved case, and mean latency to completion — each measurable on every single invocation — are.

Q3

What does the agent's memory look like across its three layers?

Working memory strategy, episodic memory scope and retention, and semantic memory boundaries are design decisions that shape behaviour more than the choice of model. Deferring them produces agents that surprise their designers in unflattering ways.

Q4

Which tools is the agent authorised to invoke — and which are explicitly forbidden?

The tool envelope is a security boundary, a cost boundary, and a reliability boundary simultaneously. A tool registry enforced at the infrastructure level — not merely described in a prompt — is a requirement, not an optimisation.

Q5

What is the supervision model for each class of action the agent can take?

Low-consequence, reversible actions can be autonomous. High-consequence or irreversible actions require supervision. The decision is made per action class, not per agent, and is encoded into the agent's runtime, not left to the model's judgement.

Q6

How will the agent's behaviour be observed, regressed, and improved over time?

Every reasoning trace, every tool call, every output, every escalation — logged, queryable, and available for replay. Without this, the agent is a black box. With it, the agent is an improving asset.

Proof points
93% first-pass tool-invocation success rate after agent redesign around a strict tool schema and a structured output contract — up from 64% on the same workload under a loosely defined prompt-based agent.
8.2 minutes → 1.4 minutes mean task completion time on a Tier-1 support resolution workflow following the introduction of an explicit planner and a bounded reasoning budget, with resolution quality independently verified.
81% of cases resolved end-to-end without human intervention, with an escalation-accuracy rate of 96% on the remaining 19%, measured over eight weeks of production operation.
Zero out-of-envelope actions in 14 weeks of production operation on a regulated back-office agent, achieved through multi-layer guardrail enforcement and red-team evaluation prior to deployment.

How Entiovi engages.

Phase 01 · 2–3 weeks

Discovery and agent scoping

A structured assessment of the target workflow, the boundary of the agent's responsibility, the tools and data surfaces it will operate against, and the supervision model for each action class. Delivered as an agent specification — the foundation for everything that follows.

Phase 02 · 3–5 weeks

Architecture design and reference build

Cognitive architecture selection, memory architecture design, planner configuration, tool interface specification, output contract definition, and guardrail envelope construction. A reference implementation of the agent is built and evaluated against a task-specific test suite derived from operational data.

Phase 03 · 3–6 weeks

Production hardening

Observability instrumentation, cost and latency SLO configuration, failure-mode handling, state persistence layer, red-team evaluation against prompt injection and tool-misuse classes, and human-in-the-loop workflow integration. The agent is moved from a working reference build to a production-grade component.

Phase 04 · 2–3 weeks

Deployment and handover

Integration into the target system, staged rollout with canary traffic, operational runbook authoring, and operator training. The organisation receives an agent it understands and can operate — with a specification, a test suite, and a monitoring surface that make continued improvement straightforward.

Phase 05 · Continuous

Ongoing agent stewardship

Behavioural monitoring, periodic red-teaming, model substitution as better or cheaper models become available, and specification updates as the operational context evolves. Agents that are actively stewarded compound in value. Agents that are not, degrade.

The foundation
everything else rests on.

The agentic stack is only as reliable as the agent at its base. An organisation investing in orchestration, multi-agent systems, or autonomous workflows without first establishing how its single agents are designed and operated is building upward on a shifting foundation. The discipline of agent design is what makes the rest of the stack worth building. It is the difference between an impressive demo and a system an enterprise can run a business on.

Entiovi's team will assess, in a structured two-week engagement, whether a given workflow is ready for agent deployment — and exactly what the agent should be designed to do, remember, invoke, and escalate.

Entiovi · Rigel Practice · Discipline 01