What I Learned Building My Own Mini Coding Agent
Coding agents like Claude Code can feel like magic at first. So I wrote a tiny one in TypeScript to peek behind the curtain. It turns out the trick is mostly a while loop and a few well-described tools.
Part of being a good engineer, I think, is being willing to dig one or two layers below the thing we are responsible for. If we work on the frontend, it pays to know roughly how the API we are calling actually serves a request, what the database underneath is doing, and how the app gets shipped to production once we merge. We do not have to be experts at every layer, but we should not treat any of them as a black box either. Black boxes are where bad assumptions hide.
Coding agents like Cursor and Claude Code started feeling like a black box to me pretty quickly. These tools can read the codebase, navigate around, edit files, run commands, and somehow stitch all of that into something that feels like working with another engineer. It is genuinely impressive, and the first few times we use one it can feel like there must be something exotic going on under the hood.
Then I came across Mihail Eric's article on writing Claude Code in 200 lines of code. The thesis was the same one that always tempts me when something feels magical: the magic is probably overstated, and it is probably worth seeing for ourselves. So I sat down and wrote a tiny version of my own in TypeScript, mini-coding-agent. What I learned is the kind of thing that is obvious in retrospect and a little freeing once it clicks.
If you'd rather just read the code, it's here:
The Agent Is Just a While Loop
The first thing that stood out is how unceremonious the "agent" part is. There is no clever scheduler, no graph, no hidden state machine. The whole loop fits on one screen:
while (true) {
  const response = await client.messages.create({
    model,
    max_tokens: 4096, // required by the Messages API; the exact limit is a judgment call
    system: SYSTEM_PROMPT,
    tools,
    messages,
  });
  // Record the assistant turn whether or not it contains tool requests.
  messages.push({ role: 'assistant', content: response.content });
  const toolUseBlocks = response.content.filter(
    (block) => block.type === 'tool_use'
  );
  if (toolUseBlocks.length === 0) {
    // No tool requests: the model has answered, so we are done.
    break;
  }
  // The model asked for one or more tools.
  const toolResults = [];
  for (const toolUse of toolUseBlocks) {
    const result = executeTool(toolUse.name, toolUse.input);
    toolResults.push({
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: JSON.stringify(result),
    });
  }
  // Feed the results back as the next message, then loop.
  messages.push({ role: 'user', content: toolResults });
}
We send the conversation so far to the model, and the model either talks to us or asks to use a tool. When it asks, we call executeTool to run that tool locally, wrap what it returns in a tool_result, and push that back onto messages as the next turn. We will get to executeTool further down; here it is enough that it runs the requested tool and hands back a result. Then we loop. That is the entire shape of the thing. The "agentic" behavior is an emergent property of doing this over and over until the model is satisfied and stops asking for tools.
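To see the shape of the loop without an API key, here is a minimal sketch that swaps the real client for a scripted stand-in. Every name in it (Block, Message, ModelFn, runLoop, scriptedModel, fakeTool) is hypothetical and exists only for this illustration, and the maxTurns cap is an assumption that the loop above does not have.

```typescript
// SDK-free model of the agent loop. All names here are hypothetical.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string };

type Message = { role: "user" | "assistant"; content: Block[] };
type ModelFn = (messages: Message[]) => Promise<Block[]>;
type ToolFn = (name: string, input: Record<string, unknown>) => unknown;

async function runLoop(
  callModel: ModelFn,
  runTool: ToolFn,
  messages: Message[],
  maxTurns = 10 // safety cap; an assumption of this sketch
): Promise<Message[]> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const content = await callModel(messages);
    messages.push({ role: "assistant", content });

    const toolUses = content.filter(
      (b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use"
    );
    if (toolUses.length === 0) break; // plain answer: the model is done

    // Run each requested tool locally and send the results back as one user turn.
    const results = toolUses.map(
      (t): Block => ({
        type: "tool_result",
        tool_use_id: t.id,
        content: JSON.stringify(runTool(t.name, t.input)),
      })
    );
    messages.push({ role: "user", content: results });
  }
  return messages;
}

// Scripted stand-in for the model: asks for one tool, then answers in text.
const scriptedModel: ModelFn = async (messages) =>
  messages.length === 1
    ? [{ type: "tool_use", id: "t1", name: "read_file", input: { path: "a.txt" } }]
    : [{ type: "text", text: "done" }];

// Stand-in tool runner that just echoes what it was asked to do.
const fakeTool: ToolFn = (name, input) => ({ tool: name, input });
```

Starting from a single user message, this run produces four messages in total: the user prompt, the assistant's tool request, the tool result sent back as a user turn, and the assistant's final text.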
The LLM Does Not Run the Tools
This is the part that genuinely reframed how I think about agents. The model never touches the filesystem. When it decides it wants to read a file, what it sends back is, structurally, a request: a JSON-shaped block that names a tool and supplies arguments. Something like this:
{
  "type": "tool_use",
  "name": "read_file",
  "input": { "path": "src/app.ts" }
}
That is it. It is our code, the agent we wrote, that decides to actually open the file, read its contents, and feed them back to the model as the next message. The LLM is reasoning about what it would like to happen. We are the ones who let it happen.
Once that lands, a lot of what we associate with "agent capability" stops being about the model and starts being about us. What tools do we expose? How do we describe them? What do we let the model touch, and what do we gate behind a confirmation? The model is powerful, but the agent's behavior is something we shape.
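Because every tool call passes through our code, gating one behind a confirmation can be a thin wrapper. The sketch below is one way that might look, not how mini-coding-agent does it: withConfirmation, Approver, and GATED_TOOLS are hypothetical names, and which tools count as risky is a policy choice assumed here for illustration.

```typescript
// Hypothetical names throughout: ToolFn, Approver, GATED_TOOLS, withConfirmation.
type ToolResult = Record<string, unknown>;
type ToolFn = (name: string, input: Record<string, unknown>) => ToolResult;
type Approver = (name: string, input: Record<string, unknown>) => boolean;

// Which tools need approval is a policy decision; this set is an assumption.
const GATED_TOOLS = new Set(["edit_file", "delete_file", "bash"]);

function withConfirmation(run: ToolFn, approve: Approver): ToolFn {
  return (name, input) => {
    if (GATED_TOOLS.has(name) && !approve(name, input)) {
      // Report the refusal as data so the model can adapt on its next turn.
      return { error: `User declined to run ${name}` };
    }
    return run(name, input);
  };
}

// Usage with stand-ins: a runner that records calls and an approver that always says no.
const calls: string[] = [];
const recordingRunner: ToolFn = (name) => {
  calls.push(name);
  return { ok: true };
};
const guarded = withConfirmation(recordingRunner, () => false);
```

In a real agent the approver would prompt the user; the point is only that the decision lives in our code, not in the model.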
Tools Are Mostly a Description
Registering a tool with the model turned out to be much less ceremony than I expected. It is essentially a name, a sentence or two telling the model when to reach for it, and a JSON Schema describing the inputs:
{
  name: "read_file",
  description:
    "Read the full contents of a file. Use this to inspect existing code or text.",
  input_schema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Path to the file to read" },
    },
    required: ["path"],
  },
}
That schema is everything the model sees. The actual work happens in a regular function on our side, which is just as plain as we would expect:
import * as fs from 'fs';

function readFile(filePath: string): Record<string, unknown> {
  try {
    // resolveAbsPath is a helper that is not shown in this snippet.
    const abs = resolveAbsPath(filePath);
    const content = fs.readFileSync(abs, 'utf-8');
    return { path: abs, content };
  } catch (err: any) {
    // Return the failure as data so the model can see what went wrong.
    return { error: `Failed to read file: ${err.message}` };
  }
}
When a tool_use block comes back from the model with name: "read_file", we look at the input object, pull out path, call readFile(path), and append whatever it returns to the conversation as the next message. No framework, no glue layer. Just a function call.
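The readFile function leans on a resolveAbsPath helper that the snippet does not show. As a rough guess at what such a helper could do, the sketch below resolves a path against a project root and rejects anything that escapes it. The root parameter and the escape check are assumptions of this sketch; the real helper may simply resolve the path and nothing more.

```typescript
import * as path from "path";

// Hypothetical stand-in for the resolveAbsPath helper used by readFile.
// The `root` parameter and the escape check are assumptions of this sketch.
function resolveAbsPath(filePath: string, root: string = process.cwd()): string {
  const absRoot = path.resolve(root);
  const abs = path.resolve(absRoot, filePath);
  const rel = path.relative(absRoot, abs);
  // A relative path that climbs with ".." (or is absolute, e.g. another drive
  // on Windows) points outside the root.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes project root: ${filePath}`);
  }
  return abs;
}
```

Since readFile wraps its body in a try/catch, a rejected path would come back to the model as an ordinary { error } result instead of crashing the loop.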
What stood out is how much weight the description field carries. The model is using it to decide whether this tool is the right one for the situation. Vague descriptions lead to the model picking the wrong tool, or skipping a tool we expected it to use. Treating each description as a small piece of prompt engineering, instead of an afterthought, made the agent noticeably more reliable.
Registering the Tools With the SDK
A schema on its own does nothing. To make the model aware of a tool, we collect every schema into an array and hand it to the SDK on each request. The Anthropic SDK gives us a type for that array, Anthropic.Messages.Tool[], so a structurally malformed definition, such as one missing name or input_schema, fails at compile time instead of confusing the model at runtime:
import Anthropic from "@anthropic-ai/sdk";

export const tools: Anthropic.Messages.Tool[] = [
  {
    name: "read_file",
    description: "Read the full contents of a file. ...",
    input_schema: {
      /* the same object we saw above */
    },
  },
  // list_files, edit_file, delete_file, run_gh_command, bash
];
That tools array is the same one the loop hands to client.messages.create through the tools field back at the start. The model reads it on every turn to decide what it is allowed to ask for. That single field is the whole registration step. There is no setup call to make and no plugin to install.
One Function Routes Every Tool
So far we have a schema the model reads and a function that does the work. The piece in between is a small router that maps a tool name to the function that handles it. In mini-coding-agent that is a single executeTool, a switch over the name the model sent back:
export function executeTool(
  name: string,
  input: Record<string, any>
): Record<string, unknown> {
  switch (name) {
    case "read_file":
      return readFile(input.path);
    case "list_files":
      return listFiles(input.path);
    case "edit_file":
      return editFile(input.path, input.old_str, input.new_str);
    case "delete_file":
      return deleteFile(input.path);
    case "run_gh_command":
      return runGhCommand(input.args);
    case "bash":
      return runBashCommand(input.command);
    default:
      return { error: `Unknown tool: ${name}` };
  }
}
This is the seam between the loop and the tools. The loop knows nothing about reading files or running commands. It pulls the name and input off each tool_use block, calls executeTool, and appends whatever comes back as the next message. Adding a capability is two small edits: register the schema in the tools list, and add a case here that forwards the input to a function.
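The same seam can also be written as a lookup table instead of a switch. This is an alternative sketch, not the router mini-coding-agent uses: the handlers map, the stub handlers, and the word_count tool are all hypothetical, and the try/catch around each call is an extra I added so one failing tool cannot crash the loop.

```typescript
// Hypothetical alternative to the switch: a Map from tool name to handler.
type ToolInput = Record<string, any>;
type ToolResult = Record<string, unknown>;
type ToolHandler = (input: ToolInput) => ToolResult;

// Stub handlers so the sketch runs on its own; real ones would touch the filesystem.
const handlers = new Map<string, ToolHandler>([
  ["read_file", (input) => ({ path: input.path, content: "(stub)" })],
  ["list_files", (input) => ({ path: input.path, files: [] })],
]);

function executeTool(name: string, input: ToolInput): ToolResult {
  const handler = handlers.get(name);
  if (!handler) return { error: `Unknown tool: ${name}` };
  try {
    return handler(input);
  } catch (err) {
    // Surface failures to the model as data instead of crashing the loop.
    const message = err instanceof Error ? err.message : String(err);
    return { error: `Tool ${name} failed: ${message}` };
  }
}

// Adding a capability is one new entry here, plus its schema in the tools array.
handlers.set("word_count", (input) => ({
  words: String(input.text).split(/\s+/).filter(Boolean).length,
}));
```

The switch has the advantage of being greppable and explicit about argument names; the map makes it easier to keep a tool's schema and handler side by side. Either way, the loop itself stays unchanged.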
Three Tools Is Most of What We Need
For something genuinely useful, we really only need three tools:
- read_file, so the model can see our code
- list_files, so it can navigate the project
- edit_file, so it can create and change files
Production agents like Claude Code add more, of course: grep, bash, web search, and so on. They smooth out the experience and unlock workflows that read, list, and edit cannot reach on their own. But the leap from "no tools" to "read, list, edit" is the big one. Everything after that is incremental.
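We already saw readFile; for completeness, here is a rough sketch of what the other two handlers could look like. The conventions are assumptions, not necessarily what mini-coding-agent does: an empty old_str writes the whole file (creating it if needed), and otherwise only the first occurrence of old_str is replaced. The returned action strings are likewise made up for this sketch.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical sketches of the list_files and edit_file handlers.
function listFiles(dirPath: string): Record<string, unknown> {
  try {
    const entries = fs.readdirSync(dirPath, { withFileTypes: true });
    return {
      path: dirPath,
      files: entries.map((e) => ({ name: e.name, type: e.isDirectory() ? "dir" : "file" })),
    };
  } catch (err) {
    return { error: `Failed to list files: ${(err as Error).message}` };
  }
}

function editFile(filePath: string, oldStr: string, newStr: string): Record<string, unknown> {
  try {
    if (oldStr === "") {
      // Assumed convention: empty old_str means "write the whole file".
      fs.mkdirSync(path.dirname(filePath), { recursive: true });
      fs.writeFileSync(filePath, newStr, "utf-8");
      return { path: filePath, action: "written" };
    }
    const original = fs.readFileSync(filePath, "utf-8");
    if (!original.includes(oldStr)) {
      return { path: filePath, action: "old_str not found" };
    }
    // A replacer function keeps "$&"-style sequences in newStr from being interpreted.
    fs.writeFileSync(filePath, original.replace(oldStr, () => newStr), "utf-8");
    return { path: filePath, action: "edited" };
  } catch (err) {
    return { error: `Failed to edit file: ${(err as Error).message}` };
  }
}

// Usage against a throwaway directory: create a file, then change one word in it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "agent-sketch-"));
const file = path.join(dir, "notes.txt");
editFile(file, "", "hello world");
editFile(file, "world", "agent");
```

Reporting "old_str not found" as a result, instead of throwing, gives the model a chance to re-read the file and try again with the right text.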
Closing Thoughts
The point of building the toy was never to replace the real thing. The real ones are better, and they should be. The point was to stop thinking of them as a black box. Once we have seen that the loop is a loop, that the model is asking and our code is doing, it gets a lot easier to reason about why an agent did or did not do what we wanted, and how to nudge it. Less "I wonder why it ignored that file" and more "right, the description on that tool was probably too thin."
That is the part I keep coming back to. Understanding things one layer down does not always change what we build. But it almost always changes how confidently we use what other people built.