AI Engine Integration (Generic Standard)

This document defines generic engineering standards for implementing an AI engine that:

  • Runs structured LLM calls with typed outputs (Pydantic models).
  • Supports a multi-agent architecture.
  • Exposes an extensible tooling layer (function calls).
  • Provides optional streaming progress/status updates for a better UX.

This page intentionally avoids product/domain specifics. It focuses on architecture and contracts.

Goals

  • One engine is the single entry point for model calls (consistent retries, logging, and safety defaults).
  • Typed outputs everywhere: each agent returns a Pydantic model so downstream code remains deterministic.
  • Tool gating: only mount tools when needed to reduce latency and error surface.
  • Prompt modularity: prompts are files, composed from reusable sections + per-agent templates.
  • Testability: agents and tools can be tested independently (prompt mapping, tool behavior, orchestration wiring).

Standard module layout

ai_engine/
├── core.py                   # AIEngine + config (model selection, retries) + tool mounting
├── deps.py                   # dependency container passed into tool contexts / agent runs
├── progress.py               # optional progress tracker for streaming status events
├── utils.py                  # prompt loading + shared helpers (e.g., embeddings, template rendering)
├── prompts/                  # reusable prompt sections shared across agents
├── agents/                   # specialized agents (one directory per agent)
│   └── <agent_name>/
│       ├── prompts/          # sys/user prompt templates
│       ├── generate.py       # reads prompt files + injects placeholders
│       ├── schemas.py        # Pydantic response models
│       ├── response.py       # calls AIEngine.generate(...) with schema + flags
│       ├── process.py        # orchestration (prepare inputs, call response, post-process)
│       └── tools.py          # optional agent-local helpers (avoid duplicating shared tools)
└── tools/                    # shared tools grouped by category
    └── <tool_category>/
        ├── tools.py
        └── prompts/
            └── tool_prompt.txt

Core engine contract (core.py)

AIEngineConfig

Expose a config object that controls at least:

  • model identifier (string)
  • retry count (int)

Keep it small and stable: config should be easy to override at call sites.
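
A minimal sketch of such a config as a frozen dataclass (the field names are illustrative, not prescribed):

from dataclasses import dataclass

@dataclass(frozen=True)
class AIEngineConfig:
    model: str           # provider model identifier string
    retries: int = 2     # retry count for transient or validation failures

Because the dataclass is frozen, call sites can override individual fields per call with dataclasses.replace(config, model=...) without mutating the shared default.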

AIEngine.generate(...)

Implement a single async method that:

  • Accepts system_prompt, user_prompt, and result_type (a Pydantic model class).
  • Accepts a typed dependency object (see deps.py) so tools can access shared context.
  • Allows feature flags for prompt sections and tool mounting (e.g., include knowledge tools, include pricing tools, include “important instructions”, etc.).
  • Returns either:
    • an instance of result_type (the common case), or
    • a full run result object when callers need tool history or debug info (optional).
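
A hedged sketch of the method signature, building on the AIEngineConfig and Deps sketches elsewhere on this page; the flag names mirror the examples above and are not prescriptive:

from __future__ import annotations

from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

class AIEngine:
    def __init__(self, config: AIEngineConfig) -> None:   # AIEngineConfig: see sketch above
        self.config = config

    async def generate(
        self,
        system_prompt: str,
        user_prompt: str,
        result_type: Type[T],
        deps: Deps,                                # typed dependency container (deps.py)
        *,
        include_knowledge_tools: bool = False,     # tool-gating flags (illustrative names)
        include_pricing_tools: bool = False,
        include_important_instructions: bool = False,
        return_full_result: bool = False,          # return the full run result for debugging
    ) -> T:
        # 1. Compose the final system prompt from reusable sections + flags.
        # 2. Configure the underlying agent with result_type as the structured output schema.
        # 3. Mount tool categories according to the include_*_tools flags.
        # 4. Run with self.config.retries and return the validated result_type instance
        #    (or the full run result when return_full_result is True).
        raise NotImplementedError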

Prompt composition inside the engine

Standard pattern:

  • Compose a final system prompt by concatenating reusable prompt sections:
    • a “job role” or “agent role” template
    • optional tenant/org context
    • process/agent-specific system prompt
    • optional tool guidelines (one per tool category)
    • optional extra instructions

Prefer file-based templates for each section so changes are reviewable and testable.
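
A minimal sketch of this composition, assuming file-based sections under ai_engine/prompts/ (the file and parameter names are illustrative):

from pathlib import Path
from typing import Optional

PROMPTS_DIR = Path("ai_engine/prompts")

def compose_system_prompt(
    agent_system_prompt: str,
    *,
    org_context: Optional[str] = None,
    tool_guidelines: Optional[list[str]] = None,
    extra_instructions: Optional[str] = None,
) -> str:
    # Reusable "agent role" section shared by every agent (illustrative file name).
    sections = [(PROMPTS_DIR / "agent_role.txt").read_text(encoding="utf-8")]
    if org_context:
        sections.append(org_context)            # optional tenant/org context
    sections.append(agent_system_prompt)        # process/agent-specific system prompt
    sections.extend(tool_guidelines or [])      # one guideline block per mounted tool category
    if extra_instructions:
        sections.append(extra_instructions)
    # Join sections with blank lines so each remains reviewable as a separate file.
    return "\n\n".join(s.strip() for s in sections if s and s.strip())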

Tool mounting

Tools should be attached to the agent conditionally:

  • Each tool category provides an attach_to_agent(agent) helper.
  • The engine attaches tool categories based on include_*_tools flags.
  • Tools should read the typed dependencies object from the run context rather than global state.
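
A sketch of the gating logic, assuming each category module exposes attach_to_agent(agent) as described above; the module paths and flag names are illustrative:

def mount_tools(agent, *, include_knowledge_tools: bool = False, include_pricing_tools: bool = False) -> None:
    # Import lazily so unused tool categories add no import cost or side effects.
    if include_knowledge_tools:
        from ai_engine.tools.knowledge.tools import attach_to_agent as attach_knowledge
        attach_knowledge(agent)
    if include_pricing_tools:
        from ai_engine.tools.pricing.tools import attach_to_agent as attach_pricing
        attach_pricing(agent)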

Dependency injection (deps.py)

Create a typed dependency container (dataclass) that can carry:

  • database session (optional, for tools that query storage)
  • provider client (optional, for embeddings or secondary model calls)
  • tenant/org id (or equivalent scope identifier)
  • conversation/transcript ids (optional, for auditing and retrieval)
  • process metadata (optional: versioning, intent/task ids, etc.)
  • progress tracker (optional, for streaming status)

Standards:

  • Keep the dependency surface minimal; add new fields only when a real tool/agent requires it.
  • Favor optional fields with safe behavior when missing (tools return empty/default results instead of raising).
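
A sketch of the container mirroring the fields listed above; every field is optional with a None default so tools can degrade gracefully (the field and type names are assumptions):

from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Deps:
    db_session: Optional[Any] = None            # database session for storage-backed tools
    provider_client: Optional[Any] = None       # embeddings / secondary model calls
    org_id: Optional[str] = None                # tenant/org scope identifier
    conversation_id: Optional[str] = None       # auditing and retrieval
    transcript_id: Optional[str] = None         # auditing and retrieval
    process_meta: Optional[dict] = None         # versioning, intent/task ids, etc.
    progress: Optional[ProgressTracker] = None  # streaming status (see progress.py sketch)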

Prompt and template standards

  • Store prompt files as plain text under:
    • ai_engine/prompts/ for reusable shared sections
    • ai_engine/agents/<agent>/prompts/ for agent-specific templates
    • ai_engine/tools/<category>/prompts/tool_prompt.txt for tool instructions
  • Load prompt files with async IO.
  • Inject variables using a deterministic placeholder strategy (e.g., [...] markers replaced in generate.py).
  • When prompts require runtime fields from persistent configuration, render templates via a dedicated template renderer utility.
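
A minimal sketch of async prompt loading and [...]-style placeholder injection; asyncio.to_thread is used here in place of a dedicated async file library, and the file and placeholder names are illustrative:

import asyncio
from pathlib import Path

async def load_prompt(path: Path) -> str:
    # Prompt files are small; read them off the event loop without extra dependencies.
    return await asyncio.to_thread(path.read_text, encoding="utf-8")

def inject(template: str, values: dict[str, str]) -> str:
    # Deterministic replacement of [PLACEHOLDER] markers.
    for key, value in values.items():
        template = template.replace(f"[{key}]", value)
    return template

async def build_user_prompt(agent_dir: Path, question: str) -> str:
    template = await load_prompt(agent_dir / "prompts" / "user_prompt.txt")
    return inject(template, {"QUESTION": question})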

Tool standards (tools/<category>/tools.py)

Tool interface

Each tool should:

  • Be a single-purpose function or small set of functions.
  • Accept a typed RunContext[Deps] and explicit parameters (type annotated).
  • Return a predictable JSON-serializable shape (lists/dicts of primitives).
  • Validate inputs early (empty query, missing deps) and return safe defaults.
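
A sketch of one such tool, following the RunContext[Deps] convention above (pydantic-ai style); fetch_snippets is a hypothetical data-access helper and the tool name is illustrative:

from __future__ import annotations

from pydantic_ai import RunContext   # RunContext[Deps] as referenced above

async def search_knowledge(ctx: RunContext[Deps], query: str, limit: int = 5) -> list[dict]:
    """Return up to `limit` knowledge snippets matching `query`."""
    # Validate early and return a safe default instead of raising.
    if not query or not query.strip() or ctx.deps.db_session is None:
        return []
    # `fetch_snippets` is a hypothetical data-access helper used for illustration.
    rows = await fetch_snippets(ctx.deps.db_session, query.strip(), limit)
    # Keep the return shape stable and JSON-serializable.
    return [{"id": r["id"], "title": r["title"], "snippet": r["snippet"]} for r in rows]

def attach_to_agent(agent) -> None:
    # Register the tool on the given agent (assumes a pydantic-ai-style agent.tool hook).
    agent.tool(search_knowledge)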

Tool prompt (prompts/tool_prompt.txt)

Keep tool prompts short and structured:

  • Tool name
  • Purpose
  • When to use
  • How to call
  • Return shape
  • Rules / constraints
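
An illustrative tool_prompt.txt following this structure (the tool and its rules are invented for the example):

Tool name: search_knowledge
Purpose: retrieve short knowledge snippets relevant to the current question.
When to use: the user asks something that likely exists in stored knowledge.
How to call: search_knowledge(query: str, limit: int = 5)
Return shape: list of {id, title, snippet}; empty list when nothing matches.
Rules: call at most twice per request; never invent snippets that were not returned.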

Optional: streaming progress (progress.py)

For interactive UX, expose progress updates during processing:

  • Maintain a per-request progress tracker (e.g., an asyncio.Queue).
  • Provide a helper like emit_status("...") that tools/agents can call.
  • Surface status updates via SSE (or equivalent) as:
    • status events (short messages)
    • complete event (final payload)
    • error event (structured error)

Guidelines:

  • Progress messages should be short and meaningful (milestones, not spam).
  • Progress tracking should be in-memory and best-effort (never block core processing).
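
A minimal in-memory sketch of such a tracker; the class name, sentinel value, and queue size are assumptions:

import asyncio

class ProgressTracker:
    """Per-request, in-memory status channel; best-effort by design."""

    _COMPLETE = "__complete__"   # illustrative sentinel marking the end of the stream

    def __init__(self) -> None:
        self.queue: "asyncio.Queue[str]" = asyncio.Queue(maxsize=100)

    def emit_status(self, message: str) -> None:
        # Never block or fail core processing because of progress reporting.
        try:
            self.queue.put_nowait(message)
        except asyncio.QueueFull:
            pass

    def complete(self) -> None:
        self.emit_status(self._COMPLETE)

    async def stream(self):
        # Consumed by the SSE endpoint; yields short status messages until completion.
        while True:
            message = await self.queue.get()
            if message == self._COMPLETE:
                break
            yield message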

Testing standards

  • Agent tests:
    • Verify prompt generation replaces placeholders correctly.
    • Verify response parsing returns the expected Pydantic model shape.
  • Tool tests:
    • Verify tools handle missing deps and invalid inputs safely.
    • Verify return shapes are stable.
  • Orchestration tests:
    • Verify tool gating (which tools mount under which flags).
    • Verify progress emission doesn’t leak tasks or state across requests.
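
A hedged example of these tests in pytest, referring to the inject, Deps, and search_knowledge sketches above (the async test assumes pytest-asyncio or an equivalent plugin):

from types import SimpleNamespace

import pytest

def test_placeholder_injection():
    # Prompt generation must replace every [PLACEHOLDER] deterministically.
    template = "Answer the question: [QUESTION]"
    rendered = inject(template, {"QUESTION": "What is the refund policy?"})
    assert rendered == "Answer the question: What is the refund policy?"

@pytest.mark.asyncio
async def test_tool_returns_safe_default_without_deps():
    # Tools must not raise when optional dependencies are missing.
    ctx = SimpleNamespace(deps=Deps())   # lightweight stand-in for RunContext[Deps]
    assert await search_knowledge(ctx, "anything") == []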