Chapter 11: Reliability
Retries, rate limits, cancellation, and structured logging keep the agent useful when providers fail, users interrupt work, or usage starts to scale.
1. Error Recovery & Retries
The Problem
API calls fail. Your model provider can return 429 (rate limit), 500 (server error), or just time out. Right now, one failed streamText() call crashes the entire agent.
The Fix
Wrap LLM calls with exponential backoff:
Create a helper file:
Edit src/agent/retry.ts:
export async function withRetry<T>(
fn: () => Promise<T>,
maxRetries: number = 3,
baseDelay: number = 1000,
): Promise<T> {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
const err = error as Error & { status?: number };
// Don't retry client errors (400, 401, 403) — they won't succeed
if (err.status && err.status >= 400 && err.status < 500 && err.status !== 429) {
throw error;
}
if (attempt === maxRetries) throw error;
// Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of random jitter,
// so many clients recovering at once don't retry in lockstep.
const delay = baseDelay * Math.pow(2, attempt) + Math.random() * 1000;
await new Promise((resolve) => setTimeout(resolve, delay));
}
}
throw new Error("Unreachable");
}
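A quick sanity check for the helper, in the same style as the minimal tests later in this chapter. The snippet is illustrative: it simulates a function that fails twice with a retryable status, then succeeds, and uses a tiny baseDelay so the test runs fast:
npx tsx --eval '
import { withRetry } from "./src/agent/retry.ts";

let attempts = 0;
const flaky = async () => {
  attempts++;
  if (attempts < 3) {
    const err = new Error("simulated 500") as Error & { status?: number };
    err.status = 500;
    throw err;
  }
  return "ok";
};

withRetry(flaky, 3, 10).then((value) => {
  // Expected output: ok after 3 attempts (two retryable failures, then success)
  console.log(value, "after", attempts, "attempts");
});
'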
Apply it to every LLM call:
Edit src/agent/run.ts:
import { withRetry } from "./retry.ts";

const result = await withRetry(async () =>
streamText({
model: provider.chat(MODEL_NAME),
messages,
tools: modelTools,
})
);
Keep using the model-facing modelTools from Chapter 4 here. Retries should repeat the model request, not accidentally execute real tools inside streamText().
Going Further
- Use the AI SDK’s built-in retry options where available (streamText accepts a maxRetries setting)
- Implement circuit breakers — if the API fails 5 times in a row, stop trying and tell the user (sketched after this list)
- Log every retry with timestamps so you can correlate with provider outages
- Set per-call timeouts (don’t let a single request hang forever)
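A minimal circuit-breaker sketch to pair with withRetry; the class name and thresholds here are illustrative, not part of the chapter’s codebase. The retry helper handles transient blips, while the breaker handles sustained outages by failing fast during a cooldown window. For per-call timeouts, Node’s built-in AbortSignal.timeout(ms) can be passed as the abortSignal shown in the cancellation section below.
export class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;

  constructor(
    private threshold = 5, // consecutive failures before the breaker opens
    private cooldownMs = 30_000, // how long to fail fast before trying again
  ) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("Provider unavailable: circuit breaker is open");
      }
      this.openedAt = null; // half-open: let one trial call through
    }
    try {
      const result = await fn();
      this.consecutiveFailures = 0; // success closes the breaker
      return result;
    } catch (error) {
      this.consecutiveFailures++;
      if (this.consecutiveFailures >= this.threshold) {
        this.openedAt = Date.now();
      }
      throw error;
    }
  }
}
The two compose naturally: breaker.run(() => withRetry(...)) retries transient errors inside a single attempt, and opens the breaker only when whole attempts keep failing.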
2. Rate Limiting & Cost Controls
The Problem
An agent in a loop can burn through API credits fast. A runaway loop (tool fails → agent retries → fails again → retries) could cost hundreds of dollars before anyone notices.
The Fix
We already track context usage in src/agent/context:
- tokenEstimator.ts estimates how many tokens are in the message history.
- modelLimits.ts compares that estimate against the model context window.
- run.ts reports context percentage and triggers compaction when needed.
That answers:
Are we close to the model's context window?
Rate limiting and cost controls answer a different question:
Is this agent spending too much, looping too long, or calling too many tools?
Keep those production guardrails in a separate helper so src/agent/context stays focused on context-window management.
Create a usage tracker:
Edit src/agent/usage.ts:
export interface UsageLimits {
maxTokensPerConversation: number;
maxToolCallsPerTurn: number;
maxLoopIterationsPerTurn: number;
maxCostPerConversation: number; // in dollars
}
export const DEFAULT_USAGE_LIMITS: UsageLimits = {
maxTokensPerConversation: 500_000,
maxToolCallsPerTurn: 10,
maxLoopIterationsPerTurn: 50,
maxCostPerConversation: 5.00,
};
export class UsageTracker {
private totalTokens = 0;
private totalCost = 0;
private toolCallsThisTurn = 0;
private loopIterationsThisTurn = 0;
constructor(private limits: UsageLimits) {}
startTurn(): void {
this.toolCallsThisTurn = 0;
this.loopIterationsThisTurn = 0;
}
addTokens(count: number, isOutput: boolean): void {
this.totalTokens += count;
// Approximate cost (illustrative rates: $5/M input, $15/M output; adjust per model)
const rate = isOutput ? 0.000015 : 0.000005; // dollars per token
this.totalCost += count * rate;
}
addToolCall(): void {
this.toolCallsThisTurn++;
}
addIteration(): void {
this.loopIterationsThisTurn++;
}
check(): { ok: boolean; reason?: string } {
if (this.totalTokens > this.limits.maxTokensPerConversation) {
return { ok: false, reason: `Token limit exceeded (${this.totalTokens})` };
}
if (this.toolCallsThisTurn > this.limits.maxToolCallsPerTurn) {
return { ok: false, reason: `Tool call limit exceeded (${this.toolCallsThisTurn})` };
}
if (this.loopIterationsThisTurn > this.limits.maxLoopIterationsPerTurn) {
return { ok: false, reason: `Loop iteration limit exceeded (${this.loopIterationsThisTurn})` };
}
if (this.totalCost > this.limits.maxCostPerConversation) {
return { ok: false, reason: `Cost limit exceeded ($${this.totalCost.toFixed(2)})` };
}
return { ok: true };
}
}
This tracker intentionally mixes two scopes:
- totalTokens and totalCost persist across the whole conversation.
- toolCallsThisTurn and loopIterationsThisTurn reset for each user turn.
That gives you the useful production behavior: stop one runaway turn, but also stop a long conversation if total cost keeps accumulating.
Create the tracker in the UI so it survives across multiple calls to runAgent.
Edit src/ui/App.tsx:
import { useRef } from "react";
import { DEFAULT_USAGE_LIMITS, UsageTracker } from "../agent/usage.ts";
function App() {
const usageTrackerRef = useRef(new UsageTracker(DEFAULT_USAGE_LIMITS));
// ...
const newHistory = await runAgent(
input,
conversationHistory,
callbacks,
usageTrackerRef.current,
);
}
Then accept the tracker in the agent loop:
Edit src/agent/run.ts:
import type { UsageTracker } from "./usage.ts";
function withoutSystemMessages(messages: ModelMessage[]): ModelMessage[] {
return messages.filter((message) => message.role !== "system");
}
export async function runAgent(
userMessage: string,
conversationHistory: ModelMessage[],
callbacks: AgentCallbacks,
usageTracker: UsageTracker,
): Promise<ModelMessage[]> {
let workingHistory = withoutSystemMessages(
filterCompatibleMessages(conversationHistory),
);
usageTracker.startTurn();
const initialLimitCheck = usageTracker.check();
if (!initialLimitCheck.ok) {
const stopMessage = `\n[Agent stopped: ${initialLimitCheck.reason}]`;
callbacks.onToken(stopMessage);
callbacks.onComplete(stopMessage);
return withoutSystemMessages([
...workingHistory,
{ role: "user", content: userMessage },
{ role: "assistant", content: stopMessage.trim() },
]);
}
// Now it is safe to do LLM-backed compaction if needed.
// ...
let fullResponse = "";
while (true) {
usageTracker.addIteration();
const limitCheck = usageTracker.check();
if (!limitCheck.ok) {
const stopMessage = `\n[Agent stopped: ${limitCheck.reason}]`;
callbacks.onToken(stopMessage);
fullResponse += stopMessage;
break;
}
const result = await withRetry(async () =>
streamText({
model: provider.chat(MODEL_NAME),
messages,
tools: modelTools,
})
);
// ... stream text and collect tool calls
const usage = await result.usage;
usageTracker.addTokens(usage.inputTokens ?? 0, false);
usageTracker.addTokens(usage.outputTokens ?? 0, true);
for (const tc of toolCalls) {
const approved = await callbacks.onToolApproval(tc.toolName, tc.args);
if (!approved) {
break;
}
usageTracker.addToolCall();
const toolLimitCheck = usageTracker.check();
if (!toolLimitCheck.ok) {
const stopMessage = `\n[Agent stopped: ${toolLimitCheck.reason}]`;
callbacks.onToken(stopMessage);
fullResponse += stopMessage;
break;
}
// ... execute each approved tool
}
}
}
UsageTracker is capitalized because it is a class. The instance is named usageTracker because variables use lower camel case.
The important thing is that every tracked counter must be updated where the event happens:
- Call startTurn() once per user turn, before the agent loop starts.
- Call check() before any LLM-backed compaction or generation work.
- Call addIteration() once per agent loop iteration.
- Call addTokens(...) after an LLM response reports usage.
- Call addToolCall() after approval, when a tool call is about to be executed, then check immediately before running it.
Minimal Test
First test the tracker itself without calling an LLM:
npx tsx --eval '
import { UsageTracker } from "./src/agent/usage.ts";
const tracker = new UsageTracker({
maxTokensPerConversation: 100,
maxToolCallsPerTurn: 1,
maxLoopIterationsPerTurn: 2,
maxCostPerConversation: 1,
});
tracker.startTurn();
console.log("start", tracker.check());
tracker.addToolCall();
console.log("one tool", tracker.check());
tracker.addToolCall();
console.log("two tools", tracker.check());
tracker.startTurn();
console.log("new turn", tracker.check());
tracker.addTokens(101, false);
console.log("tokens", tracker.check());
'
Expected shape:
start { ok: true }
one tool { ok: true }
two tools { ok: false, reason: 'Tool call limit exceeded (2)' }
new turn { ok: true }
tokens { ok: false, reason: 'Token limit exceeded (101)' }
Then do a tiny integration test for the tool-call guard.
Temporarily lower the limit in src/agent/usage.ts:
maxToolCallsPerTurn: 0,
Run the app:
npm run start
Ask:
Run pwd
Expected result: after you approve the tool call, the agent should print something like:
[Agent stopped: Tool call limit exceeded (1)]
Because the limit is 0, the first approved tool call is counted, checked immediately, and blocked before the command executes.
Finally test conversation-level accumulation.
Temporarily lower the token limit in src/agent/usage.ts:
maxTokensPerConversation: 1,
Run the app:
npm run start
Send one normal message:
hi
Then send a second message:
hi again
Expected result: the second turn should stop immediately with something like:
[Agent stopped: Token limit exceeded (...)]
This confirms UsageTracker is stored outside runAgent, so token/cost usage survives across multiple turns in the same UI session.
After testing, restore the normal limits.
Going Further
- Per-user and per-organization limits
- Daily/monthly budget caps with email alerts
- Show cost estimates to users before expensive operations
- Implement token budgets per tool call (truncate large file reads; sketched after this list)
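A minimal sketch of the last item, assuming the rough four-characters-per-token heuristic; swap in the tokenEstimator.ts estimate from src/agent/context if you want consistency with compaction. The helper name and budget constant are illustrative:
// Truncate oversized tool output before it enters the message history.
const MAX_TOOL_RESULT_TOKENS = 2_000;

export function truncateToolResult(result: string): string {
  const maxChars = MAX_TOOL_RESULT_TOKENS * 4; // ~4 characters per token
  if (result.length <= maxChars) return result;
  const dropped = result.length - maxChars;
  return (
    result.slice(0, maxChars) +
    `\n[... truncated ${dropped} characters to stay under the tool output budget]`
  );
}
Call it on the tool result right before appending the tool message, so a huge file read costs a bounded number of tokens instead of flooding the context.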
3. Cancellation
The Problem
The user asks the agent to do something, then realizes it’s wrong.
Ctrl+C can kill the whole Node process, but production agents need a gentler option: cancel the current model/tool run, clean up UI state, and return control to the prompt without corrupting the session.
The Fix
Use an AbortController. The controller lives in the UI, and its signal is passed into the agent runner.
Add signal support to the agent runner:
Edit src/agent/run.ts:
export async function runAgent(
  userMessage: string,
  conversationHistory: ModelMessage[],
  callbacks: AgentCallbacks,
  usageTracker: UsageTracker,
  signal?: AbortSignal, // NEW
): Promise<ModelMessage[]> {
// ...
while (true) {
// Check for cancellation at the top of each loop
if (signal?.aborted) {
callbacks.onToken("\n[Cancelled by user]");
break;
}
const result = await withRetry(async () =>
  streamText({
    model: provider.chat(MODEL_NAME),
    messages,
    tools: modelTools,
    abortSignal: signal, // NEW: pass the signal to the AI SDK
  }),
);
// ...
}
}
In the UI, wire Ctrl+C to the abort controller.
First, disable Ink’s default Ctrl+C exit behavior in the entry files. Otherwise Ink exits the app before your useInput handler gets a chance to cancel the active run.
Edit src/index.ts:
render(React.createElement(App), {
exitOnCtrlC: false,
});
Edit src/cli.ts:
render(React.createElement(App), {
exitOnCtrlC: false,
});
Then import useInput if App.tsx does not already import it:
import { Box, Text, useApp, useInput } from "ink";
Then add cancellation state near the other useState calls inside App:
Edit src/ui/App.tsx:
const [abortController, setAbortController] = useState<AbortController | null>(null);
Add the Ctrl+C handler inside the App component, after the state declarations and before handleSubmit:
useInput((input, key) => {
if (key.ctrl && input === "c") {
if (abortController) {
abortController.abort();
} else {
exit();
}
}
});
Finally, create the controller inside handleSubmit, immediately before the runAgent(...) call. Do not put this at the top level of the component:
const controller = new AbortController();
setAbortController(controller);
try {
const newHistory = await runAgent(
userInput,
conversationHistory,
{
onToken: (token) => {
setStreamingText((prev) => prev + token);
},
onToolCallStart: (name, args) => {
// existing callback body
},
onToolCallEnd: (name, result) => {
// existing callback body
},
onComplete: (response) => {
// existing callback body
},
onToolApproval: (name, args) => {
// existing callback body
},
onTokenUsage: (usage) => {
setTokenUsage(usage);
},
},
usageTrackerRef.current,
controller.signal,
);
setConversationHistory(newHistory);
} finally {
setAbortController(null);
setIsLoading(false);
}
The placement matters:
- exitOnCtrlC: false belongs in the Ink render(...) options, so the app, not Ink, decides what Ctrl+C means.
- useState belongs at the top of App, next to the other state.
- useInput belongs inside App, but outside handleSubmit.
- new AbortController() belongs inside handleSubmit, right before the current runAgent(...) call.
- controller.signal is passed as the last argument to runAgent, after the usage tracker.
- The Ctrl+C handler only calls abort(). It does not clear loading state directly.
- finally clears the controller and loading state after runAgent actually unwinds.
Minimal Test
Run the app:
npm run start
Submit a prompt that takes a moment:
help me draft something 50 words
While the UI shows Thinking..., press Ctrl+C.
Expected behavior:
- The app does not immediately exit.
- The current run is cancelled.
- The input prompt becomes usable again.
- Pressing Ctrl+C again while idle exits the app.
Going Further
This is basic cancellation. It gives the UI a way to ask the active model request to stop, but it does not make every part of the agent fully cancellation-safe.
The remaining hardening is inside runAgent and tools:
- Check signal.aborted inside the streaming loop, not only at the top of the outer agent loop.
- Treat abort errors from result.fullStream as cancellation, not normal failures.
- Avoid waiting on result.finishReason, result.usage, or result.response after cancellation.
- Resolve pending tool approvals when cancellation happens.
- Pass cancellation into long-running tools, especially shell commands and code execution (sketched below).
Those are production hardening steps. The minimal version above is enough to distinguish “cancel this run” from “exit the whole app,” which is the first behavior users expect.
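As one example of the tool-side hardening, here is a sketch of passing the signal into a shell tool. runShellTool is a hypothetical wrapper, not the chapter’s executeTool; the real mechanism is child_process.exec’s signal and timeout options, which kill the child process when the signal fires or the timeout elapses.
import { exec } from "node:child_process";
import { promisify } from "node:util";

const execAsync = promisify(exec);

// Hypothetical shell tool wrapper: forwarding the run's AbortSignal means
// Ctrl+C also kills the child process, not just the model request.
async function runShellTool(command: string, signal?: AbortSignal): Promise<string> {
  try {
    const { stdout } = await execAsync(command, { signal, timeout: 30_000 });
    return stdout;
  } catch (error) {
    const err = error as Error & { name?: string };
    // Abort surfaces as an AbortError; report cancellation, not failure.
    if (err.name === "AbortError") return "[Command cancelled]";
    throw error;
  }
}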
4. Structured Logging
The Problem
When something goes wrong in production, console.log isn’t enough. You need to know which conversation, which tool call, what inputs, what the LLM decided, and why.
The Fix
Create a small JSONL logger, then wire it into runAgent.
JSONL means “one JSON object per line.” It is easy to append, stream, grep, and import into other tools later.
Edit src/agent/logger.ts:
import { appendFileSync, mkdirSync } from "node:fs";
type LogEvent =
| "agent_run_started"
| "agent_run_completed"
| "llm_call_started"
| "llm_call_completed"
| "tool_call"
| "tool_execution_started"
| "tool_result"
| "approval"
| "error";
interface LogEntry {
timestamp: string;
conversationId: string;
runId: string;
event: LogEvent;
data: Record<string, unknown>;
}
export class AgentLogger {
private entries: LogEntry[] = [];
private logPath = ".agent/logs/agent.jsonl";
constructor(
private conversationId: string,
private runId: string,
) {
mkdirSync(".agent/logs", { recursive: true });
}
log(event: LogEvent, data: Record<string, unknown> = {}): void {
const entry: LogEntry = {
timestamp: new Date().toISOString(),
conversationId: this.conversationId,
runId: this.runId,
event,
data,
};
this.entries.push(entry);
appendFileSync(this.logPath, JSON.stringify(entry) + "\n");
}
logToolCall(name: string, args: unknown): void {
this.log("tool_call", { toolName: name, args });
}
logToolExecutionStarted(name: string, args: unknown): void {
this.log("tool_execution_started", { toolName: name, args });
}
logToolResult(name: string, result: string, durationMs: number): void {
this.log("tool_result", {
toolName: name,
resultLength: result.length,
durationMs,
});
}
logError(error: Error, context: string): void {
this.log("error", {
message: error.message,
stack: error.stack,
context,
});
}
}
This logger is intentionally boring. It writes local JSONL, creates the directory if needed, and includes both a conversationId and a per-turn runId.
Wire It Into runAgent
Edit src/agent/run.ts:
Add the imports:
import { randomUUID } from "node:crypto";
import { AgentLogger } from "./logger.ts";
Create a logger near the top of runAgent:
export async function runAgent(
userMessage: string,
conversationHistory: ModelMessage[],
callbacks: AgentCallbacks,
usageTracker: UsageTracker,
signal?: AbortSignal,
): Promise<ModelMessage[]> {
const logger = new AgentLogger("default", randomUUID());
logger.log("agent_run_started", {
model: MODEL_NAME,
historyLength: conversationHistory.length,
userMessageLength: userMessage.length,
});
try {
// existing runAgent logic goes here
} catch (error) {
logger.logError(error as Error, "runAgent");
throw error;
}
}
In the real file, do not delete the existing runAgent body. Add the logger, log agent_run_started, and wrap the existing body in the try block so failures are logged before they are re-thrown to the UI.
For now, "default" matches the saved conversation id used by the app. Later, if you support multiple conversations, pass the real conversation id into runAgent instead.
Log The Model Call
Before streamText, log that the model request is starting:
logger.log("llm_call_started", {
model: MODEL_NAME,
messageCount: messages.length,
});
const result = await withRetry(async () =>
streamText({
model: provider.chat(MODEL_NAME),
messages,
tools: modelTools,
abortSignal: signal,
}),
);
After usage is available, log the result:
const finishReason = await result.finishReason;
const usage = await result.usage;
usageTracker.addTokens(usage.inputTokens ?? 0, false);
usageTracker.addTokens(usage.outputTokens ?? 0, true);
logger.log("llm_call_completed", {
finishReason,
inputTokens: usage.inputTokens ?? 0,
outputTokens: usage.outputTokens ?? 0,
toolCallCount: toolCalls.length,
});
Log Tool Calls And Approvals
When the stream reports a tool call, log it at the same place you notify the UI:
if (chunk.type === "tool-call") {
const input = "input" in chunk ? chunk.input : {};
toolCalls.push({
toolCallId: chunk.toolCallId,
toolName: chunk.toolName,
args: input as Record<string, unknown>,
});
logger.logToolCall(chunk.toolName, input);
callbacks.onToolCallStart(chunk.toolName, input);
}
When asking for human approval, log whether the tool was approved:
const approved = await callbacks.onToolApproval(tc.toolName, tc.args);
logger.log("approval", {
toolName: tc.toolName,
approved,
});
if (!approved) {
rejected = true;
break;
}
Around executeTool, measure how long the real tool took:
const toolStart = Date.now();
const toolResult = await executeTool(tc.toolName, tc.args);
const durationMs = Date.now() - toolStart;
logger.logToolResult(tc.toolName, toolResult, durationMs);
callbacks.onToolCallEnd(tc.toolName, toolResult);
At the end of the run, log completion:
callbacks.onComplete(fullResponse);
logger.log("agent_run_completed", {
responseLength: fullResponse.length,
messageCount: messages.length,
});
return withoutSystemMessages(messages);
Minimal Test
Run the app:
npm run start
Ask for something that uses either the model or a tool. Then inspect the log:
tail -n 20 .agent/logs/agent.jsonl
You should see events like:
{"timestamp":"...","conversationId":"default","runId":"...","event":"agent_run_started","data":{"model":"...","historyLength":0,"userMessageLength":24}}
{"timestamp":"...","conversationId":"default","runId":"...","event":"llm_call_started","data":{"model":"...","messageCount":2}}
{"timestamp":"...","conversationId":"default","runId":"...","event":"llm_call_completed","data":{"finishReason":"stop","inputTokens":123,"outputTokens":45,"toolCallCount":0}}
{"timestamp":"...","conversationId":"default","runId":"...","event":"agent_run_completed","data":{"responseLength":280,"messageCount":3}}
Privacy Note
This version logs metadata, lengths, tool names, and tool arguments. In a real product, be careful with raw tool arguments because they may contain file paths, secrets, or user content. A stronger production logger would redact sensitive fields before writing them.
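A minimal redaction sketch, with an illustrative key list you should adapt to your own tools, that could be applied to arguments before logToolCall writes them:
// Replace likely-sensitive values in tool arguments before logging.
// The key list is illustrative; extend it for your own tools.
const SENSITIVE_KEYS = ["apikey", "token", "password", "secret", "authorization"];

function redactArgs(args: unknown): unknown {
  if (typeof args !== "object" || args === null) return args;
  return Object.fromEntries(
    Object.entries(args as Record<string, unknown>).map(([key, value]) => [
      key,
      SENSITIVE_KEYS.some((s) => key.toLowerCase().includes(s))
        ? "[REDACTED]"
        : value,
    ]),
  );
}
Then log redactArgs(input) instead of the raw input wherever tool arguments are written.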
Next: Chapter 12: Memory →