Every AI Coding Tool: A Solo Developer's 14-Month Odyssey (Part 2)

2026-05-06 · 11 min read

The Moment I Stopped Chasing Tools

Part 1 of this series covered the frantic period - the monthly tool rotation, the eternal migration of dotfiles, the dawning realisation that swapping editors every six weeks was not, in fact, a productivity strategy. If you haven’t read it, the short version is: I tried everything, nothing stuck, and by January 2026 I was running OpenCode with a hand-maintained Backlog.md file as my task management system.

That was rock bottom. Not because OpenCode was bad, but because I finally understood that the tool wasn’t the problem. I was.

Every time a new AI coding assistant launched, I’d spend a week migrating my workflow, lose a week of momentum, gain maybe a 10% improvement in one area, lose 30% in another, and start eyeing the next shiny thing. The problem wasn’t that these tools were bad. The problem was that I was treating them like products to consume rather than collaborators to onboard.

January: The Workspace Gets Its Own Repo

The first real shift happened in January 2026. I extracted my workspace configuration into its own repository. Not the application code - that lived in its module repos. The workspace itself: the directory structure, the task management conventions, the workflow documentation.

This sounds boring. It was transformative.

Before this, my “system” was whatever configuration happened to live inside my editor’s settings directory. When I switched tools, the system vanished. Extracting it meant the system was portable. The tool could change; the workflow persisted.

I created an .opencode directory with skills - reusable prompt patterns for common workflows. TDD. Brainstorming. Code review. These weren’t tool-specific features. They were text files describing how I wanted to work, written in a way that any sufficiently capable AI model could follow.

The skills were crude. A markdown file called brainstorming.md that said things like “explore three different approaches before recommending one” and “present tradeoffs as a table.” But they worked. For the first time, switching tools didn’t mean retraining.

March: Claude Code and the End of the Search

In March 2026, I installed Claude Code. I also installed Tessl, a dependency documentation tool that generates “tiles” - structured reference docs for your actual dependency versions. No more hallucinated API signatures for Mantine components or Zod schemas. The AI reads from tiles that describe the exact version you have installed.

But the real change wasn’t the tool. It was what I built around it.

Claude Code reads a CLAUDE.md file when it starts. Mine is simple: it points to AGENTS.md. That’s it. One line. The actual system lives in a tool-agnostic file that any AI agent can read. If Claude Code vanishes tomorrow, the AGENTS.md file walks out the door with me.

The same week, I migrated the entire development environment into Nix. A flake.nix with devenv that reproduces every tool, every MCP server, every shell configuration. Any machine, same setup. The Nix devshell doesn’t just install dependencies - it configures the AI tooling. MCP servers for Playwright (browser testing), ClickUp (task management), email, calendar. All declared, all reproducible.

This is the stack that stuck. Not because Claude Code is perfect, but because the system around it is tool-agnostic enough to survive the next migration, whenever that comes.

The AGENTS.md That Actually Works

My AGENTS.md file is around 480 lines. That sounds absurd until you read it and realise it’s not instructions for a tool - it’s a collaboration protocol between two engineers who happen to be different species.

It opens with this:

You are a collaborating engineer, not an assistant. Act accordingly.

That sentence does more work than every tool configuration I’ve ever written combined. It sets the relationship. Not “do what I say.” Not “help me with this.” Instead: think through problems together, challenge weak reasoning, surface assumptions, stop when stuck.

The file has sections that read like an engineering team’s operating manual:

Coding style. No new comments. DRY - search the codebase before writing any helper. Prefer elegance. YAGNI. Surgical changes: every modified line should trace directly to the task.

Goal-driven execution. Transform vague requests into verifiable goals. “Fix the bug” becomes “write a test that reproduces it, then make it pass.” State a step-then-verify plan for multi-step work. Verify before claiming done - run the command, check the output, don’t say “this should work.”

Research protocol. “Two strikes and you research.” If the first fix doesn’t work, the second attempt must be informed by documentation, not another guess. Cite your sources. Don’t confuse familiarity with knowledge.

CI safeguards. Ship CI changes separately from feature code. One change per CI cycle when debugging. Validate assumptions about the runner environment.

These aren’t AI-specific rules. They’re engineering discipline I should have been applying to my own work all along. Writing them down for an AI collaborator forced me to articulate standards I’d been keeping vague and aspirational.
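To make the “fix the bug” rule concrete, here is a minimal sketch in Python (chosen purely for illustration). The slugify helper and the bug it once had are hypothetical, not taken from my codebase. The point is the order of work: the test that reproduces the report is written first, and the fix is only “done” once that test passes.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical helper: turn a task title into a branch-safe slug."""
    # An earlier (imagined) version used title.replace(" ", "-"), which turned
    # repeated spaces into "--". Splitting on whitespace collapses them.
    return "-".join(title.lower().split())


class SlugifyRegressionTest(unittest.TestCase):
    def test_collapses_repeated_spaces(self):
        # Written before the fix, to reproduce the reported bug.
        self.assertEqual(slugify("User  search"), "user-search")


def regression_suite_passes() -> bool:
    """Run the regression test and report the verified result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyRegressionTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling regression_suite_passes() is the “run the command, check the output” step: the claim that the bug is fixed rests on a result, not on a guess.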

Subagent-Driven Development

Here’s where it gets genuinely novel. The AGENTS.md mandates what I call subagent-driven development, and it changed how I think about the relationship between planning and execution.

The model is simple: there’s a main agent (the orchestrator) and subagents (the implementors). The orchestrator plans, dispatches work, reviews results, and manages the git workflow. Subagents write code, run tests, fix bugs - they do the implementation work in isolation and report back.

The rules are strict:

All implementation tasks must be delegated to a subagent
Subagents must not spawn other subagents - only the orchestrator dispatches
Each task gets its own fresh subagent
Subagents stage changes but do not commit - the orchestrator reviews the diff first

Why? Because AI coding tools are wildly capable at writing code and embarrassingly bad at knowing when to stop. A subagent will happily “improve” adjacent code, refactor things that aren’t broken, add helpful comments you didn’t ask for, and commit everything in one giant bundle. Constraining them to “implement this specific thing in this specific worktree and report back” produces dramatically better results than “here’s a feature, go build it.”
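The rules above are enforced by prose in AGENTS.md, not by code, but they can be written down as a checkable structure. The sketch below is only an illustration under that assumption: the Dispatch fields and the violations function are hypothetical names, not part of Claude Code or any other tool.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical description of one unit of work handed to a subagent."""
    task_id: str                      # e.g. "ENG-42"
    worktree: str                     # the only directory the subagent may touch
    instructions: str                 # the specific thing to implement
    issued_by: str = "orchestrator"   # who is dispatching
    may_commit: bool = False          # subagents stage, they do not commit
    may_spawn: bool = False           # subagents never spawn subagents


def violations(dispatch: Dispatch, active_task_ids: Set[str]) -> List[str]:
    """Return the rules a dispatch would break; an empty list means it is fine."""
    problems = []
    if dispatch.issued_by != "orchestrator":
        problems.append("only the orchestrator dispatches")
    if dispatch.may_spawn:
        problems.append("subagents must not spawn other subagents")
    if dispatch.may_commit:
        problems.append("subagents stage changes but do not commit")
    if dispatch.task_id in active_task_ids:
        problems.append("each task gets its own fresh subagent")
    return problems
```

For example, violations(Dispatch("ENG-42", "worktrees/ENG-42-user-search", "Add the search endpoint"), set()) returns an empty list, while a dispatch issued by another subagent, or one that is allowed to commit, comes back with the broken rule named.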

The orchestrator’s job is equally defined:

Review the diff (don’t trust the report alone)
Run quality gates (lint, typecheck, tests)
Validate the pattern before scaling - if dispatching five subagents for batch work, validate one or two results first
Check for environment mismatches

That last one bit me repeatedly. Subagents default to writing jsdom-style assertions when the code actually runs in a real browser. They assume GitHub-hosted runners when we use self-hosted ARM64 machines. Every assumption needs to be explicit in the dispatch prompt.
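The quality-gate step can be sketched as a small runner that stops at the first failure. This is an illustration, not my actual tooling: the gate commands below are stand-ins that call the Python interpreter so the example runs anywhere, and a real setup would substitute the project's own lint, typecheck, and test commands.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple

Gate = Tuple[str, Sequence[str]]  # (gate name, command to run)


def first_failing_gate(gates: Sequence[Gate]) -> Optional[str]:
    """Run gates in order; return the name of the first failure, or None."""
    for name, command in gates:
        completed = subprocess.run(list(command), capture_output=True, text=True)
        if completed.returncode != 0:
            return name  # stop here: later gates are not worth running yet
    return None


# Stand-in commands (assumptions): each simply exits with a fixed status.
DEMO_GATES = [
    ("lint", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("typecheck", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("tests", [sys.executable, "-c", "raise SystemExit(1)"]),
]
```

With the stand-in gates, first_failing_gate(DEMO_GATES) reports "tests". The orchestrator only moves on to the diff review when the result is None.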

Git Worktrees as Isolation Boundaries

The subagent model requires isolation. You can’t have three subagents working in the same git checkout without them stomping on each other’s changes.

Git worktrees solve this perfectly. Each task gets its own worktree under a worktrees/ directory at the workspace root. Each worktree has its own branch. Subagents work entirely within their assigned worktree. They can’t accidentally modify main. They can’t see each other’s work-in-progress.

~/workspace/
  modules/
    platform/               # stays on main, always clean
  worktrees/
    ENG-42-user-search/     # feature branch worktree
    ENG-57-cors-fix/        # another feature branch worktree

The worktree lifecycle is managed by a skill - a markdown file that the AI reads before starting any development work. It enforces the workflow: clarify the work type (task or hotfix), pull latest on main, create the worktree, and verify that the worktree isn’t in a detached HEAD state. That last check covers a gotcha that cost me hours: subagents sometimes detach from their branch and commit to nowhere.
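The skill itself is prose, but the git commands behind it are few. Here is a rough sketch of the create-and-verify steps, assuming the branch is named after the worktree directory (my assumption for this example) and that paths are expanded before anything is executed. The function names are hypothetical; the git subcommands (worktree add -b, symbolic-ref --quiet) are standard.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List


def worktree_path(workspace: str, task_id: str, slug: str) -> PurePosixPath:
    """Where a task's worktree lives, e.g. <workspace>/worktrees/ENG-42-user-search."""
    return PurePosixPath(workspace) / "worktrees" / f"{task_id}-{slug}"


def add_worktree_command(repo: str, path: PurePosixPath, base: str = "main") -> List[str]:
    """Build the git command that creates the worktree on a new branch cut from base."""
    branch = path.name  # assumption: branch name mirrors the directory name
    return ["git", "-C", repo, "worktree", "add", "-b", branch, str(path), base]


def is_detached(worktree: str) -> bool:
    """True if HEAD in the worktree is not on a branch.

    `git symbolic-ref --quiet HEAD` exits non-zero for a detached HEAD. It also
    exits non-zero if the path is not a repository, which is equally a reason
    to stop before letting a subagent commit.
    """
    result = subprocess.run(
        ["git", "-C", worktree, "symbolic-ref", "--quiet", "HEAD"],
        capture_output=True,
    )
    return result.returncode != 0
```

Building the command as a list keeps it easy to inspect before it runs, which suits a workflow where every step is supposed to be verified.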

The combination of worktrees + subagents means I can have multiple features in flight simultaneously without context pollution. The orchestrator dispatches work to isolated environments, reviews results, and promotes changes to PRs. It’s not unlike managing a small team, except the team members are stateless and need very explicit instructions.

Skills: Reusable Prompt Patterns

The .claude/skills/ directory contains twelve skills at last count. Each one is a markdown file that describes a specific workflow: how to use git worktrees, how to run E2E tests, how to brainstorm features, how to triage Sentry errors, how to fix a broken deploy pipeline.

These aren’t plugins or code. They’re structured text. A skill called staging-e2e describes how to authenticate as a test user, launch a Playwright browser, exercise the application, and file ClickUp tasks for any issues found. The AI reads it, follows the protocol, and produces consistent results every time.

The skill abstraction was the key insight that survived every tool migration. When I moved from OpenCode to Claude Code, the skills came with me. They’re just markdown. Any AI model that can read a file can follow them.

Separately, I use Tessl tiles for dependency documentation and a library called Superpowers for cross-project skills - TDD, systematic debugging, code review patterns. The Superpowers skills are more generic: “when encountering a bug, reproduce it first, then bisect to find the root cause.” The workspace skills are specific: “when running E2E tests in this project, fetch credentials from SSM, start the dev server on port 3000, and use Playwright MCP for browser interaction.”

The ClickUp Integration

The most surprising part of the system is the task management integration. The AI doesn’t just write code - it reads tasks from ClickUp, follows acceptance criteria checklists, ticks items as they’re satisfied, and only marks a task complete when every criterion is resolved.

The workflow looks like this:

I create a plan doc in ClickUp (brainstorming, design spec, implementation plan)
I break the plan into tasks with acceptance criteria checklists
The orchestrator reads the task, dispatches subagents for implementation
Subagents implement, the orchestrator ticks checklist items as they’re verified
When all items are resolved, the task is marked complete

This closes the loop between planning and execution in a way that no amount of inline comments or PR descriptions ever achieved. The task is the source of truth. The AI reads it before starting and updates it as work progresses.
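The completion rule in that workflow is simple enough to model in a few lines. The sketch below is an in-memory illustration only: it does not call the ClickUp API, and the Task and Criterion classes and the example criteria are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Criterion:
    text: str
    resolved: bool = False


@dataclass
class Task:
    name: str
    criteria: List[Criterion] = field(default_factory=list)
    status: str = "open"

    def tick(self, text: str) -> None:
        """Mark one verified criterion; complete the task only when all are resolved."""
        for criterion in self.criteria:
            if criterion.text == text:
                criterion.resolved = True
                break
        else:
            raise KeyError(f"no such criterion: {text}")
        if all(c.resolved for c in self.criteria):
            self.status = "complete"
```

A task with two criteria stays "open" after the first tick and flips to "complete" only after the second, which is the behaviour I want from the orchestrator.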

What I Actually Learned

After fourteen months of tool-hopping and system-building, here’s what I know:

The tool matters less than you think. Claude Code is excellent, but the reason it works for me is the system I built around it. The AGENTS.md, the skills, the worktree workflow, the ClickUp integration. If I had to switch to a different AI coding tool tomorrow, 80% of my productivity would transfer because it lives in tool-agnostic text files.

Write it down for the AI and you write it down for yourself. Every rule in AGENTS.md is something I should have been doing before AI tools existed. “Verify before claiming done.” “Two strikes and you research.” “Ship CI changes separately.” Articulating these for an AI collaborator forced me to be honest about my own discipline gaps.

Isolation is everything. The single biggest improvement to AI-assisted development was giving each task its own worktree. No context pollution. No accidental changes to the wrong branch. No “I was just trying to help” refactors of unrelated code. Worktrees impose boundaries that keep both human and AI focused.

Subagents need explicit constraints, not autonomy. The temptation is to give the AI more freedom. “Here’s the feature, figure it out.” This produces spectacular demos and terrible pull requests. Constraining subagents to specific tasks in specific worktrees with specific acceptance criteria produces boring, reliable, reviewable work. Which is what you actually want.

Reproducibility is the foundation. The Nix devshell made everything else possible. Without it, every new machine was a half-day of “install this, configure that, why is this MCP server not connecting.” With it, nix develop and you’re working. The AI tooling is as reproducible as the application tooling.

The Current State

As of today, this is the stack: Claude Code as the interface, AGENTS.md as the collaboration protocol, twelve workspace skills, Tessl tiles for dependency docs, Superpowers for cross-project patterns, MCP servers for browser testing and task management, git worktrees for isolation, and a Nix devshell that reproduces all of it.

We shipped to production two days ago. The platform serves real users. The commit history has Co-Authored-By: Claude trailers on most commits. The AI reads tasks, writes code, runs tests, and updates task status. I plan, review, and make judgment calls.

It took fourteen months and a half-dozen tool migrations to arrive at something this simple. The irony is that the final system barely depends on the specific tool. It’s a set of text files that describe how to work together. The AI reads them. We get things done.

If there’s a Part 3 to this series, I suspect it won’t be about another tool migration. It’ll be about what happens when the system evolves - when the AGENTS.md gets long enough to need its own refactoring, when the skills library needs governance, when the subagent model hits scaling limits. But that’s a future problem.

Right now, for the first time in over a year, I’m not looking at what’s next. I’m just building.

Every AI Coding Tool I Tried (And Why I Kept Ditching Them): Part 2

2026-05-06 · 8 min read

The Tool Wasn’t the Problem

Part 1 of this series covered six months of tool-hopping: Cursor, Kiro, OpenSpec, OpenCode, the BMAD Method. Every few weeks, a new arrival full of promise. Every few weeks, a quiet removal.

By January 2026, I was running a hand-maintained backlog document as my task system. That was the low point. Not because the setup was terrible — it was actually pretty functional — but because I’d finally understood something I’d been avoiding: I was the problem.

Every time a new AI coding tool launched, I’d spend a week migrating my workflow, lose momentum, gain a 10% improvement in one thing, lose 30% in another, and start eyeing the next release. I was treating AI tools like products to consume. What I needed was to treat them like collaborators to onboard.

The First Real Shift: Extract the System

The first meaningful change happened in January 2026. I pulled my workspace configuration out of the project repository and into its own repository. Not the application code, which stayed where it was. The system itself: the conventions, the workflow documentation, the prompt patterns I’d developed.

This sounds dull. It was transformative.

Before this, my “system” was whatever happened to live in my editor’s settings folder. When I switched tools, the system disappeared with it. Making the system portable meant it could survive any future migration. The tool could change; the way of working wouldn’t.

March: Claude Code and the End of the Search

In March 2026, I installed Claude Code — an AI assistant that runs in your terminal and can read files, run commands, and work through tasks autonomously.

The actual switch wasn’t dramatic. What was different this time was what I built around it.

Claude Code reads a file called CLAUDE.md when it starts. Mine contains one line: it points to AGENTS.md. That’s it. The entire collaboration protocol lives in a tool-agnostic document that any AI assistant can read. If Claude Code disappears tomorrow, the document walks out the door with me.

Around the same time, I also moved the entire development environment into a declarative configuration system — one file that, when you run it on any machine, installs every tool, every helper service, every configuration needed to work on the project. Previously, setting up a new machine was a half-day of “install this, configure that, why isn’t this connecting.” Now it’s one command.

The Document That Actually Works

My AGENTS.md file is around 480 lines. That sounds excessive until you understand what it is: not instructions for a tool, but a collaboration protocol between two engineers who happen to be different species.

It opens with:

You are a collaborating engineer, not an assistant. Act accordingly.

That sentence does more work than every tool configuration I’ve ever written. It sets the relationship: think through problems together, challenge weak reasoning, surface assumptions, stop when stuck.

The rest of the document reads like an engineering team’s operating manual:

How we write code. No unnecessary comments. Don’t repeat yourself — search the codebase before writing any helper function that might already exist. Every change should trace directly to the task at hand.

How we define done. Transform vague requests into verifiable goals. “Fix the bug” becomes “write a test that reproduces it, then make it pass.” Don’t say “this should work” — run the command, check the output, then say “this works.”

How we handle being stuck. If the first fix doesn’t work and the second attempt would just be another guess, stop and do proper research first. Read the documentation. Cite sources.

These aren’t rules I invented for AI. They’re engineering discipline I should have been applying to my own work all along. Writing them down for a collaborator forced me to be honest about the gaps in my own habits.
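The rule about being stuck can be expressed as a tiny gate. This is a sketch of my reading of it, with hypothetical names: the first attempt is free, and after any failed attempt the next one is only allowed once at least one source has been cited.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FixLog:
    """Hypothetical record of attempts at one bug."""
    failed_attempts: int = 0
    sources: List[str] = field(default_factory=list)

    def record_failure(self) -> None:
        self.failed_attempts += 1

    def cite(self, source: str) -> None:
        """Record documentation that was actually read."""
        self.sources.append(source)

    def may_attempt(self) -> bool:
        # No failures yet: go ahead. After a failure: research comes first.
        return self.failed_attempts == 0 or bool(self.sources)
```

After one record_failure(), may_attempt() returns False until cite() has been called, which is the point where guessing stops and reading starts.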

Dividing the Work

Here’s the piece of the system that surprised me most by how well it works.

When working on a task, there are two roles: an orchestrator and implementors. The orchestrator plans the work, breaks it into chunks, reviews the results, and handles version control. The implementors do the actual coding — they write the code, run the tests, fix the bugs.

In practice: I plan with the AI, then send subagents (separate AI instances) to implement specific pieces in isolation. The rules are strict. Each subagent gets one clearly defined task. They can’t go off and improve adjacent code they noticed. They work in a separate copy of the codebase so they can’t accidentally touch anything else. They stage their changes but don’t commit — I review the diff first.

Why all the constraints? Because AI coding tools are extremely capable at writing code and extremely bad at knowing when to stop. Left to their own devices, they’ll “help” by refactoring things you didn’t ask them to touch, adding comments you don’t want, and committing everything in one unreviewable pile. Giving each subagent a narrow scope with hard boundaries produces boring, predictable, reviewable work, which is exactly what you want.

Isolated Working Copies

The subagent model requires proper isolation. You can’t have multiple AI instances working in the same copy of the codebase without them stepping on each other.

Git worktrees solve this. A worktree is a separate directory with a different branch checked out that shares the underlying repository. Each task gets its own worktree, its own branch, and its own subagent. They can’t see each other’s work in progress and can’t accidentally modify the main codebase. An explicit check also stops them from ending up on a detached HEAD and committing to nowhere (which happened to me several times before I made the check a rule).

The result: I can have multiple features being worked on simultaneously without any of them interfering with each other.

Reusable Prompt Patterns

There’s a folder called .claude/skills/ that contains twelve files at the moment. Each one is a markdown document describing a specific workflow: how to run end-to-end tests, how to investigate an error report, how to set up a new feature branch, how to triage a broken deployment.

These aren’t code. They’re structured text. When I need to do one of these things, the AI reads the relevant file and follows the protocol. Consistent results every time.

When I migrated from OpenCode to Claude Code, every skill came with me. They’re just files.

The Task Management Loop

The part of the system I least expected to work as well as it does: the AI reads tasks from our project management tool, follows the acceptance criteria written there, and ticks items off as they’re satisfied.

The workflow: I write a plan, break it into tasks with specific completion criteria, and then the orchestrator reads those tasks, dispatches subagents to implement them, and marks criteria as satisfied when they’re verified. When everything on the list is checked off, the task is marked complete.

This closes a loop that no amount of inline code comments or pull request descriptions ever managed to close for me. The task is the source of truth from start to finish.

What 14 Months Actually Taught Me

The tool matters less than the system around it. Claude Code is excellent, but the reason things work now is AGENTS.md, the skills library, the worktree workflow, and the task management integration. If I had to switch to a different AI tool tomorrow, 80% of my productivity would transfer because it lives in plain text files that any capable AI can read.

Writing rules for an AI means writing rules for yourself. Every principle in AGENTS.md is something I should have been doing before AI tools existed. Articulating them for a collaborator forced me to be honest about where my own discipline had been vague and aspirational.

Isolation is everything. The single biggest improvement was giving each task its own isolated working environment. No context bleeding between tasks. No accidental changes to the wrong branch. No AI helpfully improving things you didn’t ask it to touch.

Give AI narrow scope, not broad autonomy. The temptation is to say “here’s the feature, go build it.” This produces impressive demos and terrible pull requests. Narrow tasks, explicit constraints, and defined acceptance criteria produce boring, reliable, reviewable work. Which is what you actually want.

Reproducibility is the foundation. Without a declarative environment setup, every new machine was a productivity black hole. With it, you’re working in minutes.

Where Things Stand

As of today: Claude Code as the interface, AGENTS.md as the collaboration protocol, twelve workspace skills, helper tools for dependency documentation, browser testing and task management integrations, isolated worktrees for each task, and a declarative environment that reproduces all of it on any machine.

We shipped to production two days ago. The platform serves real users. Most commits have Co-Authored-By: Claude in them. The AI reads tasks, writes code, runs tests, and updates task status. I plan, review, and make judgment calls.

It took 14 months and too many migrations to arrive at something this simple. The irony is that the final system barely depends on the specific tool. It’s a set of text files describing how to work together. The AI reads them. We get things done.

For the first time in over a year, I’m not looking at what’s next. I’m just building.