
What I Learned Building My Own Mini Coding Agent

Coding agents like Claude Code can feel like magic at first. So I wrote a tiny one in TypeScript to peek behind the curtain. It turns out the trick is mostly a while loop and a few well-described tools.

Michael Movsesov

Part of being a good engineer, I think, is being willing to dig one or two layers below the thing we are responsible for. If we work on the frontend, it pays to know roughly how the API we are calling actually serves a request, what the database underneath is doing, and how the app gets shipped to production once we merge. We do not have to be experts at every layer, but we should not treat any of them as a black box either. Black boxes are where bad assumptions hide.

Coding agents like Cursor and Claude Code started feeling like a black box to me pretty quickly. These tools can read the codebase, navigate around, edit files, run commands, and somehow stitch all of that into something that feels like working with another engineer. It is genuinely impressive, and the first few times we use one it can feel like there must be something exotic going on under the hood.

Then I came across Mihail Eric's article on writing Claude Code in 200 lines of code. The thesis was the same one that always tempts me when something feels magical: the magic is probably overstated, and it is probably worth seeing for ourselves. So I sat down and wrote a tiny version of my own in TypeScript, mini-coding-agent. What I learned is the kind of thing that is obvious in retrospect and a little freeing once it clicks.

If you'd rather just read the code, it's here:

Repository: michaelmov/mini-coding-agent (TypeScript)

The Agent Is Just a While Loop

The first thing that stood out is how unceremonious the "agent" part is. There is no clever scheduler, no graph, no hidden state machine. The whole loop fits on one screen:

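while (true) {
  // Send the whole conversation so far; the reply holds text and/or tool requests.
  const response = await client.messages.create({
    model,
    max_tokens: 4096, // required by the Messages API; this value is an assumption
    system: SYSTEM_PROMPT,
    tools,
    messages,
  });

  const toolUseBlocks = response.content.filter(
    (block) => block.type === 'tool_use'
  );

  // No tool requests means the model is done: record its reply and stop.
  if (toolUseBlocks.length === 0) {
    messages.push({ role: 'assistant', content: response.content });
    break;
  }

  // run the tools, append the results to messages, loop again
}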

We send the conversation so far to the model, the model either talks to us or asks to use a tool, and if it asked for a tool we run it locally and feed the result back as the next message. Then we loop. That is the entire shape of the thing. The "agentic" behavior is an emergent property of doing this over and over until the model is satisfied and stops asking for tools.
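
The last step in the loop, running the tools and appending the results, can be sketched roughly as follows. This is a minimal sketch under assumptions, not the repo's actual code: `runAgentTurn`, `ModelCall`, and `ToolHandler` are hypothetical names, and the model call is passed in as a function so the example runs without the SDK. The message shapes (an assistant turn containing `tool_use` blocks, answered by a user turn of `tool_result` blocks keyed by `tool_use_id`) follow the Anthropic Messages API.

```typescript
// Local types mirroring the content-block shapes used by the Messages API.
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type Message = {
  role: 'user' | 'assistant';
  content: string | Array<TextBlock | ToolUseBlock | ToolResultBlock>;
};

// Hypothetical names: a stand-in for client.messages.create, and a local tool function.
type ModelCall = (messages: Message[]) => Promise<{ content: Array<TextBlock | ToolUseBlock> }>;
type ToolHandler = (input: Record<string, unknown>) => Record<string, unknown>;

async function runAgentTurn(
  callModel: ModelCall,
  handlers: Record<string, ToolHandler>,
  messages: Message[],
): Promise<void> {
  while (true) {
    const response = await callModel(messages);
    // Record the assistant turn whether or not it asks for tools.
    messages.push({ role: 'assistant', content: response.content });

    const toolUseBlocks = response.content.filter(
      (block): block is ToolUseBlock => block.type === 'tool_use',
    );
    if (toolUseBlocks.length === 0) break;

    // Run each requested tool locally and pair the result with the request id.
    const results = toolUseBlocks.map((block): ToolResultBlock => {
      const handler = handlers[block.name];
      const output = handler ? handler(block.input) : { error: `Unknown tool: ${block.name}` };
      return { type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(output) };
    });

    // Tool results go back to the model as the next user message.
    messages.push({ role: 'user', content: results });
  }
}
```

With a fake `callModel` that asks for one tool and then answers in plain text, the conversation ends up with four messages: the user prompt, the tool request, the tool result, and the final reply.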

The LLM Does Not Run the Tools

This is the part that genuinely reframed how I think about agents. The model never touches the filesystem. When it decides it wants to read a file, what it sends back is, structurally, a request: a JSON-shaped block that names a tool and supplies arguments. Something like this:

{
  "type": "tool_use",
  "name": "read_file",
  "input": { "path": "src/app.ts" }
}

That is it. It is our code, the agent we wrote, that decides to actually open the file, read its contents, and feed them back to the model as the next message. The LLM is reasoning about what it would like to happen. We are the ones who let it happen.

Once that lands, a lot of what we associate with "agent capability" stops being about the model and starts being about us. What tools do we expose? How do we describe them? What do we let the model touch, and what do we gate behind a confirmation? The model is powerful, but the agent's behavior is something we shape.
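
One way to picture gating a tool behind a confirmation is a thin wrapper around tool execution. This is a sketch under assumptions rather than anything from the repo: `DESTRUCTIVE_TOOLS`, `executeWithGate`, and the `confirm` callback are hypothetical names, and the choice of which tools count as destructive is ours to make.

```typescript
type ToolInput = Record<string, unknown>;
type ToolResult = Record<string, unknown>;
type ToolFn = (input: ToolInput) => ToolResult;
// Hypothetical callback: returns true if the user approves the call.
type Confirm = (toolName: string, input: ToolInput) => boolean;

// Assumption: tools that modify the disk need approval; read-only tools do not.
const DESTRUCTIVE_TOOLS = new Set(['edit_file']);

function executeWithGate(
  toolName: string,
  input: ToolInput,
  tools: Record<string, ToolFn>,
  confirm: Confirm,
): ToolResult {
  const tool = tools[toolName];
  if (!tool) return { error: `Unknown tool: ${toolName}` };

  if (DESTRUCTIVE_TOOLS.has(toolName) && !confirm(toolName, input)) {
    // The refusal is returned as data so the model can see it and change course.
    return { error: `User declined to run ${toolName}` };
  }
  return tool(input);
}
```

In a real CLI the `confirm` step would likely be an asynchronous prompt. It is kept synchronous here only to keep the sketch short.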

Tools Are Mostly a Description

Registering a tool with the model turned out to be much less ceremony than I expected. It is essentially a name, a sentence or two telling the model when to reach for it, and a JSON Schema describing the inputs:

{
  name: "read_file",
  description:
    "Read the full contents of a file. Use this to inspect existing code or text.",
  input_schema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Path to the file to read" },
    },
    required: ["path"],
  },
}

That definition (name, description, and input schema) is everything the model sees. The actual work happens in a regular function on our side, which is just as plain as we would expect:

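import * as fs from 'node:fs';

// resolveAbsPath is a helper that is not shown in this excerpt.
function readFile(filePath: string): Record<string, unknown> {
  try {
    const abs = resolveAbsPath(filePath);
    const content = fs.readFileSync(abs, 'utf-8');
    return { path: abs, content };
  } catch (err: unknown) {
    // Return the failure as data so the model can read it and adjust.
    const message = err instanceof Error ? err.message : String(err);
    return { error: `Failed to read file: ${message}` };
  }
}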

When a tool_use block comes back from the model with name: "read_file", we look at the input object, pull out path, call readFile(path), and append whatever it returns to the conversation as the next message. No framework, no glue layer. Just a function call.
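
That hand-off might look something like the sketch below. The helper name `handleToolUse` is hypothetical, and `readFile` is passed in as a parameter so the snippet stands on its own. The one thing it adds to the description above is a type check, on the assumption that model-generated input should be validated before we act on it.

```typescript
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type ReadFile = (path: string) => Record<string, unknown>;

// Hypothetical helper: turn one tool_use block into one tool_result block.
function handleToolUse(block: ToolUseBlock, readFile: ReadFile): ToolResultBlock {
  let output: Record<string, unknown>;

  if (block.name === 'read_file') {
    // The input is model-generated JSON, so check it before trusting it.
    const path = block.input['path'];
    output =
      typeof path === 'string'
        ? readFile(path)
        : { error: 'read_file requires a string "path"' };
  } else {
    output = { error: `Unknown tool: ${block.name}` };
  }

  // The result is serialized and tied back to the request by its id.
  return { type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(output) };
}
```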

What stood out is how much weight the description field carries. The model is using it to decide whether this tool is the right one for the situation. Vague descriptions lead to the model picking the wrong tool, or skipping a tool we expected it to use. Treating each description as a small piece of prompt engineering, instead of an afterthought, made the agent noticeably more reliable.
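
To make the contrast concrete, here are two definitions of the same tool. Both are hypothetical illustrations, not taken from the repo, and the wording of the second is only one guess at what "specific enough" looks like. The schema is identical, and only the description changes.

```typescript
type ToolDefinition = {
  name: string;
  description: string;
  input_schema: {
    type: 'object';
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
};

const listFilesSchema: ToolDefinition['input_schema'] = {
  type: 'object',
  properties: {
    path: { type: 'string', description: 'Directory to list, relative to the project root' },
  },
  required: ['path'],
};

// Thin: says what the tool is, but not when the model should reach for it.
const vagueListFiles: ToolDefinition = {
  name: 'list_files',
  description: 'Lists files.',
  input_schema: listFilesSchema,
};

// Fuller: says what comes back and in which situations the tool is the right choice.
const specificListFiles: ToolDefinition = {
  name: 'list_files',
  description:
    'List the files and directories directly inside a directory. ' +
    'Use this to explore the project layout before reading or editing a file whose path you are not sure of.',
  input_schema: listFilesSchema,
};
```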

Three Tools Is Most of What We Need

For something genuinely useful, we really only need three tools:

  • read_file so the model can see our code
  • list_files so it can navigate the project
  • edit_file so it can create and change files
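
For a sense of scale, the other two tools can be about as small as read_file. The sketch below is one possible shape, not the repo's implementation: the function names, the return fields, and the convention that an empty `oldStr` means "create the file" are all assumptions made for illustration.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

function describeError(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Hypothetical sketch: list the entries directly inside a directory.
function listFiles(dirPath: string): Record<string, unknown> {
  try {
    const abs = path.resolve(dirPath);
    const entries = fs.readdirSync(abs, { withFileTypes: true }).map((entry) => ({
      name: entry.name,
      type: entry.isDirectory() ? 'dir' : 'file',
    }));
    return { path: abs, entries };
  } catch (err: unknown) {
    return { error: `Failed to list files: ${describeError(err)}` };
  }
}

// Hypothetical sketch: an empty oldStr creates (or overwrites) the file;
// otherwise the first occurrence of oldStr is replaced with newStr.
function editFile(filePath: string, oldStr: string, newStr: string): Record<string, unknown> {
  try {
    const abs = path.resolve(filePath);
    if (oldStr === '') {
      fs.mkdirSync(path.dirname(abs), { recursive: true });
      fs.writeFileSync(abs, newStr, 'utf-8');
      return { path: abs, action: 'created' };
    }
    const original = fs.readFileSync(abs, 'utf-8');
    if (!original.includes(oldStr)) {
      return { path: abs, error: 'oldStr not found in file' };
    }
    // A replacer function keeps "$&"-style sequences in newStr literal.
    fs.writeFileSync(abs, original.replace(oldStr, () => newStr), 'utf-8');
    return { path: abs, action: 'edited' };
  } catch (err: unknown) {
    return { error: `Failed to edit file: ${describeError(err)}` };
  }
}
```

Returning errors as plain objects, the same way readFile does, lets the model see a failed edit and retry with different arguments.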

Production agents like Claude Code add more, of course: grep, bash, web search, and so on. They smooth out the experience and unlock workflows that read, list, and edit cannot reach on their own. But the leap from "no tools" to "read, list, edit" is the big one. Everything after that is incremental.

Closing Thoughts

The point of building the toy was never to replace the real thing. The real ones are better, and they should be. The point was to stop thinking of them as a black box. Once we have seen that the loop is a loop, that the model is asking and our code is doing, it gets a lot easier to reason about why an agent did or did not do what we wanted, and how to nudge it. Less "I wonder why it ignored that file" and more "right, the description on that tool was probably too thin."

That is the part I keep coming back to. Understanding things one layer down does not always change what we build. But it almost always changes how confidently we use what other people built.