Concepts overview
The agent backend has a small number of moving parts. Once they click, everything else in the API is just a different verb on one of these objects.
The object graph
User (OAuth-issued JWT)
└─ Session (a conversation)
   ├─ Extensions (tools, RAG, MCP providers)
   ├─ Messages (turns: user, assistant, system)
   │  └─ Content blocks (text, tool_use, tool_result)
   └─ Tool Calls (async function invocations)
      └─ Heartbeats (keep-alive for long-running work)

Models (factory functions: agent, generator, …)
└─ System prompt (composed at request time)
   ├─ Product prompt
   ├─ Tenant prompt (overrides product)
   └─ User prompt (overrides tenant)

Quota (per-account cost ceiling in cents)

- A session owns its messages, extensions, and any tool calls those messages produced.
- Extensions configure per-session capabilities: client-supplied tools, RAG document retrieval, and MCP providers.
- A model is a factory that bundles a system prompt, a tool set, and inference params. Different endpoints route to different models, and you can register your own.
- Prompts resolve hierarchically — most specific wins.
- Tools come from two sources: client-supplied definitions and the external MCP gateway.
- Quotas gate inference cost across the whole account.
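To make the "most specific wins" rule concrete, here is a minimal sketch of prompt resolution. The `PromptLayers` type and the `resolveSystemPrompt` function are hypothetical names used for illustration only; the platform's real composition logic may merge layers differently.

```typescript
// Hypothetical shape: each level may or may not define a prompt.
interface PromptLayers {
  product?: string;
  tenant?: string;
  user?: string;
}

// Most specific wins: a user prompt overrides the tenant prompt,
// and a tenant prompt overrides the product prompt.
// Note: `??` only skips null/undefined, so an empty string counts as "set".
function resolveSystemPrompt(layers: PromptLayers): string | undefined {
  return layers.user ?? layers.tenant ?? layers.product;
}

// No user-level prompt here, so the tenant prompt is selected.
const resolved = resolveSystemPrompt({
  product: "You are a warehouse assistant.",
  tenant: "You are Acme's warehouse assistant.",
});
console.log(resolved); // prints "You are Acme's warehouse assistant."
```

This sketch treats each layer as a full replacement. If the real system concatenates layers instead of replacing them, the function body would change but the ordering (product, then tenant, then user) would stay the same.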
Lifecycle of a request
When you POST /v1/messages/:sessionId with a user message, the server:
- Authenticates the request (OAuth-issued JWT, Bearer header or Authorization cookie) and resolves the user, tenant, and account.
- Loads the session and its message history.
- Resolves the system prompt by stacking product → tenant → user overrides.
- Loads tools from two sources: the session's extensions.tools and the MCP gateway (for enabled providers).
- If RAG is configured, embeds the query and retrieves grounding documents from the configured indexes.
- Calls Bedrock with the conversation, tools, and retrieved context.
- Either streams tokens back (/stream) or buffers and returns the whole response (default).
- If the model decided to call a tool, persists a tool call with status PENDING and returns it to the caller for fulfillment.
- Records token usage against the account's quota.
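The final step, recording usage against a ceiling expressed in cents, can be sketched as follows. Only the idea of a per-account ceiling in cents comes from this page; the type names, the per-million-token rates, and the round-up policy are all assumptions made for illustration.

```typescript
// Hypothetical types; the real quota schema is not shown on this page.
interface Usage { inputTokens: number; outputTokens: number; }
interface Rates { inputCentsPerMillion: number; outputCentsPerMillion: number; }
interface AccountQuota { ceilingCents: number; spentCents: number; }

// Convert token counts to whole cents. Rounding up is an assumed policy
// that keeps the ledger in integers and never under-counts a request.
function costInCents(usage: Usage, rates: Rates): number {
  const raw =
    (usage.inputTokens * rates.inputCentsPerMillion +
      usage.outputTokens * rates.outputCentsPerMillion) / 1e6;
  return Math.ceil(raw);
}

// Return a new quota record with the request's cost added.
function recordUsage(quota: AccountQuota, cents: number): AccountQuota {
  return { ...quota, spentCents: quota.spentCents + cents };
}

// An account may run inference only while it is below its ceiling.
function hasBudget(quota: AccountQuota): boolean {
  return quota.spentCents < quota.ceilingCents;
}

// Illustrative rates only, not real pricing.
const rates: Rates = { inputCentsPerMillion: 300, outputCentsPerMillion: 1500 };
const before: AccountQuota = { ceilingCents: 10000, spentCents: 9998 };
const cost = costInCents({ inputTokens: 2000, outputTokens: 500 }, rates);
const after = recordUsage(before, cost);
console.log(cost, hasBudget(after)); // prints "2 false"
```

Keeping the ledger in integer cents avoids accumulating floating-point drift across many small requests, at the price of slightly over-counting each one.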
Info
Tool calls are explicitly two-phase: the model asks for a tool, the caller executes it, and only then is the result fed back into the conversation. See Tools for the full handshake.
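The two-phase handshake can be pictured as a small state machine. In the sketch below, only the PENDING status and the tool_result block type come from this page; the COMPLETED status, the field names (including `tool_use_id`), and both functions are hypothetical. See Tools for the actual protocol.

```typescript
// Only PENDING is named in this page; COMPLETED is an assumed terminal state.
type ToolCallStatus = "PENDING" | "COMPLETED";

interface ToolCall {
  id: string;
  name: string;
  input: unknown;
  status: ToolCallStatus;
}

// Assumed shape of the block that carries the result back to the model.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

// Phase 1: the model asked for a tool, so the server persists a PENDING call
// and hands it to the caller.
function requestToolCall(id: string, name: string, input: unknown): ToolCall {
  return { id, name, input, status: "PENDING" };
}

// Phase 2: the caller ran the tool and reports the result. Only a PENDING
// call can be fulfilled, which guards against double submission.
function fulfillToolCall(
  call: ToolCall,
  result: string,
): { call: ToolCall; block: ToolResultBlock } {
  if (call.status !== "PENDING") {
    throw new Error(`tool call ${call.id} is already ${call.status}`);
  }
  return {
    call: { ...call, status: "COMPLETED" },
    block: { type: "tool_result", tool_use_id: call.id, content: result },
  };
}

const pending = requestToolCall("tc_1", "lookup_item", { sku: "A-100" });
const done = fulfillToolCall(pending, JSON.stringify({ onHand: 42 }));
console.log(done.call.status, done.block.type); // prints "COMPLETED tool_result"
```

Separating the request from the fulfillment is what lets the caller run the tool in its own environment, with its own credentials, before the conversation continues.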
What's pluggable
Everything that touches the outside world is configuration, not code:
- Auth — OAuth-only, with the identity provider plugged in via the @rfsmart/oauth-consumer library. No API-key flow.
- MCP gateway — MCP_GATEWAY_DOMAIN points at the platform's own gateway; the gateway itself carries a pluggable provider registry. NetSuite is the only provider enabled today.
- Models — register your own factory; pick one per endpoint.
- Products — multi-tenancy key used in prompt/dashboard/quick-prompt routes. Register a product per consuming surface.
- Foundation model — Claude 4.5 family on Bedrock today (Sonnet, Opus, Haiku variants); the model definition controls which.
- RAG indexes — any vector index can be registered and scoped per session via tag filters.
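As an illustration of "register your own factory; pick one per endpoint", here is a minimal sketch of a model registry. The `ModelBundle` fields mirror the description of a model as a system prompt, a tool set, and inference params, but every identifier below (`registerModel`, `createModel`, the parameter names) is hypothetical and not the platform's actual API.

```typescript
// Assumed bundle shape: system prompt + tool set + inference params.
interface ModelBundle {
  systemPrompt: string;
  tools: string[];
  inferenceParams: { temperature: number; maxTokens: number };
}

// A factory builds a bundle at request time from per-request context.
type ModelFactory = (ctx: { product: string }) => ModelBundle;

const registry = new Map<string, ModelFactory>();

// Register a factory under a unique name; duplicates are rejected.
function registerModel(name: string, factory: ModelFactory): void {
  if (registry.has(name)) {
    throw new Error(`model "${name}" is already registered`);
  }
  registry.set(name, factory);
}

// An endpoint picks one model by name and builds its bundle.
function createModel(name: string, ctx: { product: string }): ModelBundle {
  const factory = registry.get(name);
  if (!factory) {
    throw new Error(`unknown model "${name}"`);
  }
  return factory(ctx);
}

// "agent" appears in the object graph above; its contents here are made up.
registerModel("agent", (ctx) => ({
  systemPrompt: `You are the assistant for ${ctx.product}.`,
  tools: ["lookup_item"],
  inferenceParams: { temperature: 0.2, maxTokens: 1024 },
}));

const bundle = createModel("agent", { product: "inventory" });
console.log(bundle.systemPrompt); // prints "You are the assistant for inventory."
```

Because the factory runs per request, the bundle can depend on request context such as the product, which fits the note that the system prompt is composed at request time.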
Where to dig deeper
- Sessions — how conversations are created and torn down.
- Messages — message shape, streaming, content blocks.
- Tools — the async tool-call protocol with heartbeats and tool definitions.
- Extensions — per-session capabilities: tools, RAG, and MCP providers.
- Models — what a model bundle looks like and which models ship with the platform.