Chapter 14: Tooling and Tests
Production agents need tool output limits, safe parallelism, and real integration tests so tool behavior stays reliable beyond mocked evals.
1. Tool Result Size Limits
The Problem
readFile on a 10MB log file returns the entire content. That’s ~2.7 million tokens — far more than any context window. The API call fails or the conversation becomes unusable.
The Fix
Create an agent-level helper for formatting tool output before it goes back into the model:
Edit src/agent/toolResults.ts:
export const MAX_TOOL_RESULT_LENGTH = 50_000; // ~13k tokens
export function truncateResult(
  result: string,
  maxLength: number = MAX_TOOL_RESULT_LENGTH,
): string {
  if (result.length <= maxLength) return result;

  const half = Math.floor(maxLength / 2);
  const truncatedLines = result
    .slice(half, result.length - half)
    .split("\n").length;

  return (
    result.slice(0, half) +
    `\n\n... [${truncatedLines} lines truncated] ...\n\n` +
    result.slice(result.length - half)
  );
}
This file lives next to run.ts because it is not a tool implementation. It is agent-loop infrastructure for controlling what tool results are allowed back into the conversation.
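For orientation, the layout this chapter assumes looks roughly like this, limited to the files discussed here:

src/agent/
  run.ts            // agent loop (truncation applied here)
  toolResults.ts    // result-size limits (this file)
  executeTool.ts    // executable tool registry
  logger.ts         // JSONL event logging
  tools/
    file.ts         // readFile, writeFile, listFiles, ...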
Apply to every tool result before adding to messages:
Edit src/agent/run.ts:
import { truncateResult } from "./toolResults.ts";
// ...
const rawToolResult = await executeTool(tc.toolName, tc.args);
const toolResult = truncateResult(rawToolResult);
Use toolResult for callbacks.onToolCallEnd(...), conversation history, and anything sent back to the model. Keep rawToolResult only if you need full local logs or debugging output.
This belongs on the real execution path after approval. The model still receives modelTools; only the agent loop calls executable tools and prepares their results for history.
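If your codebase does not already split the registry this way, here is a minimal sketch of the idea, assuming an executableTools map built from the tools in src/agent/tools/; the exact names and shapes are illustrative, not the project's real types:

// Executable registry: the agent loop dispatches on this after approval.
const executableTools = { readFile, writeFile, listFiles };

// Schema-only view for the model: same names and schemas, no execute,
// so streamText() can never run a tool on its own.
const modelTools = Object.fromEntries(
  Object.entries(executableTools).map(([name, t]) => [
    name,
    { description: t.description, inputSchema: t.inputSchema },
  ]),
);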
For file tools specifically, add pagination:
Edit src/agent/tools/file.ts:
export const readFile = tool({
  description: "Read file contents. For large files, use offset and limit.",
  inputSchema: z.object({
    path: z.string(),
    offset: z.number().optional().describe("Line number to start from"),
    limit: z.number().optional().describe("Max lines to read").default(200),
  }),
  execute: async ({
    path: filePath,
    offset = 0,
    limit = 200,
  }: {
    path: string;
    offset?: number;
    limit?: number;
  }) => {
    const content = await fs.readFile(filePath, "utf-8");
    const lines = content.split("\n");
    const slice = lines.slice(offset, offset + limit);
    const totalLines = lines.length;

    let result = slice.join("\n");
    if (offset + slice.length < totalLines) {
      result += `\n\n[Showing lines ${offset + 1}-${offset + slice.length} of ${totalLines}. Use offset to read more.]`;
    }
    return result;
  },
});
Minimal Test
Create a large mock Markdown file to check file-tool pagination:
node -e 'let s="# Large Test\n\n"; for (let i=1;i<=250;i++) s += `## Section ${i}\n${"x".repeat(400)}\n\n`; require("fs").writeFileSync("large-test.md", s)'
Call the readFile tool directly:
node --import tsx/esm -e 'const { executeTool } = await import("./src/agent/executeTool.ts"); const result = await executeTool("readFile", { path: "large-test.md", limit: 200 }); console.log(result.split("\n").slice(-2).join("\n"));'
You should see a pagination footer:
[Showing lines 1-200 of 753. Use offset to read more.]
Check the next page:
node --import tsx/esm -e 'const { executeTool } = await import("./src/agent/executeTool.ts"); const result = await executeTool("readFile", { path: "large-test.md", offset: 200, limit: 200 }); console.log(result.split("\n").slice(-2).join("\n"));'
Expected footer:
[Showing lines 201-400 of 753. Use offset to read more.]
This confirms the file tool is slicing results with limit and offset. To test truncateResult specifically, use a tool result that is still larger than MAX_TOOL_RESULT_LENGTH after pagination, or temporarily lower MAX_TOOL_RESULT_LENGTH.
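Since truncateResult accepts maxLength as an argument, you can also exercise it directly without lowering the constant. A quick throwaway check, run from the project root with npx tsx:

// check-truncate.ts: verify oversized results get the truncation marker.
import { truncateResult } from "./src/agent/toolResults.ts";

const out = truncateResult("line\n".repeat(5_000), 1_000);
console.log(out.length);                      // ~1,000 chars plus the marker
console.log(out.includes("lines truncated")); // true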
2. Parallel Tool Execution
The Problem
When the LLM requests multiple tool calls in one turn (e.g., read three files), we execute them sequentially. This is unnecessarily slow — file reads are independent.
The Fix
Use one shared helper to execute each approved tool call, then add a small scheduler around it.
For background on why this shape mirrors larger coding agents, see the Tool Orchestration Reference.
Edit src/agent/run.ts:
const CONCURRENCY_SAFE_TOOLS = new Set(["readFile", "listFiles", "webSearch"]);

function isConcurrencySafe(tc: ToolCallInfo): boolean {
  return CONCURRENCY_SAFE_TOOLS.has(tc.toolName);
}

type ToolBatch = {
  isConcurrencySafe: boolean;
  toolCalls: ToolCallInfo[];
};

function partitionToolCalls(toolCalls: ToolCallInfo[]): ToolBatch[] {
  const batches: ToolBatch[] = [];
  for (const tc of toolCalls) {
    const safe = isConcurrencySafe(tc);
    const last = batches[batches.length - 1];
    if (safe && last?.isConcurrencySafe) {
      last.toolCalls.push(tc);
    } else {
      batches.push({ isConcurrencySafe: safe, toolCalls: [tc] });
    }
  }
  return batches;
}
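To make the batching concrete: a turn that requests readFile, readFile, writeFile, readFile partitions into three batches, so the two leading reads can run together, the write runs alone, and the trailing read follows:

partitionToolCalls(toolCalls);
// [
//   { isConcurrencySafe: true,  toolCalls: [readFile, readFile] },
//   { isConcurrencySafe: false, toolCalls: [writeFile] },
//   { isConcurrencySafe: true,  toolCalls: [readFile] },
// ]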
Then extract the shared execution work into one helper inside runAgent, near the tool loop. This helper should use the executable tool registry, not the schema-only modelTools passed to streamText().
If your logger does not have this event yet, add "tool_execution_started" to the LogEvent union and add this method to src/agent/logger.ts:
logToolExecutionStarted(name: string, args: unknown): void {
  this.log("tool_execution_started", { toolName: name, args });
}
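The union change itself is one line; the existing members shown here stand in for whatever your LogEvent type already lists:

type LogEvent =
  | "tool_result"
  | "approval"
  // ...your existing events...
  | "tool_execution_started";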
async function executeApprovedToolCall(
  tc: ToolCallInfo,
): Promise<ModelMessage> {
  usageTracker.addToolCall();
  const toolLimitCheck = usageTracker.check();
  if (!toolLimitCheck.ok) {
    throw new Error(toolLimitCheck.reason);
  }

  const toolStart = Date.now();
  logger.logToolExecutionStarted(tc.toolName, tc.args);

  const rawToolResult = await executeTool(tc.toolName, tc.args);
  const toolResult = truncateResult(rawToolResult);

  const durationMs = Date.now() - toolStart;
  logger.logToolResult(tc.toolName, toolResult, durationMs);

  previousToolResults.push(toolResult);
  callbacks.onToolCallEnd(tc.toolName, toolResult);

  const wrappedToolResult = wrapToolResult(tc.toolName, toolResult);
  return {
    role: "tool",
    content: [
      {
        type: "tool-result",
        toolCallId: tc.toolCallId,
        toolName: tc.toolName,
        output: { type: "text", value: wrappedToolResult },
      },
    ],
  };
}
Now replace the old sequential for (const tc of toolCalls) block with batched execution:
let rejected = false;

for (const batch of partitionToolCalls(toolCalls)) {
  const approvedToolCalls: ToolCallInfo[] = [];

  // Keep validation and approval sequential so the user sees one clear decision
  // at a time, even when execution can run in parallel later.
  for (const tc of batch.toolCalls) {
    const validation = validateToolCall(
      tc.toolName,
      tc.args,
      previousToolResults,
    );
    if (!validation.valid) {
      const stopMessage = `\n[Tool blocked: ${validation.reason}]`;
      callbacks.onToken(stopMessage);
      fullResponse += stopMessage;
      rejected = true;
      break;
    }

    const approved = await callbacks.onToolApproval(tc.toolName, tc.args);
    logger.log("approval", { toolName: tc.toolName, approved });
    if (!approved) {
      rejected = true;
      break;
    }
    approvedToolCalls.push(tc);
  }
  if (rejected) break;

  try {
    if (batch.isConcurrencySafe) {
      const toolMessages = await Promise.all(
        approvedToolCalls.map(executeApprovedToolCall),
      );
      messages.push(...toolMessages);
      reportTokenUsage();
    } else {
      for (const tc of approvedToolCalls) {
        const toolMessage = await executeApprovedToolCall(tc);
        messages.push(toolMessage);
        reportTokenUsage();
      }
    }
  } catch (error) {
    const err = error as Error;
    const stopMessage = `\n[Agent stopped: ${err.message}]`;
    callbacks.onToken(stopMessage);
    fullResponse += stopMessage;
    rejected = true;
    break;
  }
}

if (rejected) {
  break;
}
This gives you the production shape used by larger coding agents:
- consecutive read-only tools can run together
- write/delete/shell tools run alone and in order
- every path still uses the same truncation, logging, wrapping, usage tracking, and history updates
- permission prompts stay sequential, so the UI does not need to handle multiple approval dialogs at once
If you later auto-approve read-only tools, you can skip onToolApproval for batch.isConcurrencySafe, but keep the shared execution helper.
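A minimal sketch of that variant, swapping the approval line inside the loop above:

const approved = batch.isConcurrencySafe
  ? true // auto-approve read-only tools; the approval log line still runs
  : await callbacks.onToolApproval(tc.toolName, tc.args);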
Minimal Test
Create two small files:
printf "A\n%.0s" {1..500} > parallel-a.md
printf "B\n%.0s" {1..500} > parallel-b.md
Start the app and ask:
Read parallel-a.md and parallel-b.md in one turn.
Approve both readFile calls if prompted. Then check .agent/logs/agent.jsonl.
For a parallel-safe batch, you should see both tool executions start before either one finishes:
tool_execution_started readFile parallel-a.md
tool_execution_started readFile parallel-b.md
tool_result readFile parallel-a.md
tool_result readFile parallel-b.md
That ordering is the useful signal. It means the runtime started the safe reads together instead of waiting for the first result before starting the second.
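To check the ordering mechanically instead of by eye, a throwaway script works. This assumes each JSONL line is an object with an event field, as the logger.log calls above suggest; adjust the field names to your logger's actual output:

import fs from "node:fs";

const kinds = fs
  .readFileSync(".agent/logs/agent.jsonl", "utf-8")
  .trim()
  .split("\n")
  .map((line) => JSON.parse(line).event)
  .filter((e) => e === "tool_execution_started" || e === "tool_result")
  .slice(-4); // the last batch: two starts, then two results if parallel

// Parallel execution: both starts come before the first result.
console.log(kinds.lastIndexOf("tool_execution_started") < kinds.indexOf("tool_result"));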
3. Real Tool Testing
The Problem
Our evals use mocked tools. That’s good for testing LLM behavior, but it doesn’t test whether tools actually work. What if readFile breaks on Windows paths? What if runCommand hangs on certain inputs?
The Fix
Add integration tests alongside mock-based evals. Keep these in tests/, not evals/: evals measure whether the model chooses the right behavior, while these tests check that the real tool implementation works without involving the model.
Install a small test runner:
npm install -D vitest
Add a test script to package.json:
{
  "scripts": {
    "test": "vitest run"
  }
}
Create an integration test file:
Edit tests/file-tools.test.ts:
import { describe, it, expect, afterEach } from "vitest";
import fs from "fs/promises";
import { executeTool } from "../src/agent/executeTool.ts";

describe("file tools (integration)", () => {
  const testDir = ".agent-test";

  afterEach(async () => {
    // Clean up test files
    await fs.rm(testDir, { recursive: true, force: true });
  });

  it("writeFile creates parent directories", async () => {
    const filePath = `${testDir}/deep/nested/file.txt`;
    const result = await executeTool("writeFile", {
      path: filePath,
      content: "hello",
    });
    expect(result).toContain("Successfully wrote");

    const content = await fs.readFile(filePath, "utf-8");
    expect(content).toBe("hello");
  });

  it("readFile returns error for missing file", async () => {
    const result = await executeTool("readFile", {
      path: `${testDir}/missing.txt`,
    });
    expect(result).toContain("File not found");
  });

  it("runCommand captures stderr", async () => {
    const result = await executeTool("runCommand", {
      command: "ls /nonexistent 2>&1",
    });
    expect(result).toContain("No such file");
  });
});
Run it:
npm test