
Every AI Coding Tool: A Solo Developer's 14-Month Odyssey (Part 2)


The Moment I Stopped Chasing Tools

Part 1 of this series covered the frantic period - the monthly tool rotation, the eternal migration of dotfiles, the dawning realisation that swapping editors every six weeks was not, in fact, a productivity strategy. If you haven’t read it, the short version is: I tried everything, nothing stuck, and by January 2026 I was running OpenCode with a hand-maintained Backlog.md file as my task management system.

That was rock bottom. Not because OpenCode was bad, but because I finally understood that the tool wasn’t the problem. I was.

Every time a new AI coding assistant launched, I’d spend a week migrating my workflow, lose a week of momentum, gain maybe a 10% improvement in one area, lose 30% in another, and start eyeing the next shiny thing. The problem wasn’t that these tools were bad. The problem was that I was treating them like products to consume rather than collaborators to onboard.

January: The Workspace Gets Its Own Repo

The first real shift happened in January 2026. I extracted my workspace configuration into its own repository. Not the application code - that lived in its module repos. The workspace itself: the directory structure, the task management conventions, the workflow documentation.

This sounds boring. It was transformative.

Before this, my “system” was whatever configuration happened to live inside my editor’s settings directory. When I switched tools, the system vanished. Extracting it meant the system was portable. The tool could change; the workflow persisted.

I created an .opencode directory with skills - reusable prompt patterns for common workflows. TDD. Brainstorming. Code review. These weren’t tool-specific features. They were text files describing how I wanted to work, written in a way that any sufficiently capable AI model could follow.

The skills were crude. A markdown file called brainstorming.md that said things like “explore three different approaches before recommending one” and “present tradeoffs as a table.” But they worked. For the first time, switching tools didn’t mean retraining.

In March 2026, I installed Claude Code. I also installed Tessl, a dependency documentation tool that generates “tiles” - structured reference docs for your actual dependency versions. No more hallucinated API signatures for Mantine components or Zod schemas. The AI reads from tiles that describe the exact version you have installed.

But the real change wasn’t the tool. It was what I built around it.

Claude Code’s CLAUDE.md file is simple: it points to AGENTS.md. That’s it. One line. The actual system lives in a tool-agnostic file that any AI agent can read. If Claude Code vanishes tomorrow, the AGENTS.md file walks out the door with me.

The same week, I migrated the entire development environment into Nix. A flake.nix with devenv that reproduces every tool, every MCP server, every shell configuration. Any machine, same setup. The Nix devshell doesn’t just install dependencies - it configures the AI tooling. MCP servers for Playwright (browser testing), ClickUp (task management), email, calendar. All declared, all reproducible.

This is the stack that stuck. Not because Claude Code is perfect, but because the system around it is tool-agnostic enough to survive the next migration, whenever that comes.

The AGENTS.md That Actually Works

My AGENTS.md file is around 480 lines. That sounds absurd until you read it and realise it’s not instructions for a tool - it’s a collaboration protocol between two engineers who happen to be different species.

It opens with this:

You are a collaborating engineer, not an assistant. Act accordingly.

That sentence does more work than every tool configuration I’ve ever written combined. It sets the relationship. Not “do what I say.” Not “help me with this.” Instead: think through problems together, challenge weak reasoning, surface assumptions, stop when stuck.

The file has sections that read like an engineering team’s operating manual:

Coding style. No new comments. DRY - search the codebase before writing any helper. Prefer elegance. YAGNI. Surgical changes: every modified line should trace directly to the task.

Goal-driven execution. Transform vague requests into verifiable goals. “Fix the bug” becomes “write a test that reproduces it, then make it pass.” State a step-then-verify plan for multi-step work. Verify before claiming done - run the command, check the output, don’t say “this should work.”
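A minimal sketch of what “write a test that reproduces it, then make it pass” can look like. The slugify helper and its bug are hypothetical, invented purely for illustration; only the order of the steps comes from the rule above.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical helper. The imagined buggy version only replaced spaces,
    so punctuation leaked into slugs; this is the version after the fix."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


class SlugifyRegressionTest(unittest.TestCase):
    # Step 1: written first, while the bug still existed, to reproduce it.
    def test_punctuation_does_not_leak_into_slug(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    # Step 2: guard the behaviour that already worked before the fix.
    def test_plain_words_still_work(self):
        self.assertEqual(slugify("user search"), "user-search")
```

Running the file with python -m unittest is the “verify before claiming done” step: the goal is met when the command passes, not when the code looks right.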

Research protocol. “Two strikes and you research.” If the first fix doesn’t work, the second attempt must be informed by documentation, not another guess. Cite your sources. Don’t confuse familiarity with knowledge.

CI safeguards. Ship CI changes separately from feature code. One change per CI cycle when debugging. Validate assumptions about the runner environment.

These aren’t AI-specific rules. They’re engineering discipline I should have been applying to my own work all along. Writing them down for an AI collaborator forced me to articulate standards I’d been keeping vague and aspirational.

Subagent-Driven Development

Here’s where it gets genuinely novel. The AGENTS.md mandates what I call subagent-driven development, and it changed how I think about the relationship between planning and execution.

The model is simple: there’s a main agent (the orchestrator) and subagents (the implementors). The orchestrator plans, dispatches work, reviews results, and manages the git workflow. Subagents write code, run tests, fix bugs - they do the implementation work in isolation and report back.

The rules are strict:

  • All implementation tasks must be delegated to a subagent
  • Subagents must not spawn other subagents - only the orchestrator dispatches
  • Each task gets its own fresh subagent
  • Subagents stage changes but do not commit - the orchestrator reviews the diff first

Why? Because AI coding tools are wildly capable at writing code and embarrassingly bad at knowing when to stop. A subagent will happily “improve” adjacent code, refactor things that aren’t broken, add helpful comments you didn’t ask for, and commit everything in one giant bundle. Constraining them to “implement this specific thing in this specific worktree and report back” produces dramatically better results than “here’s a feature, go build it.”
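One way to keep those constraints explicit is to build every dispatch prompt from a structured record instead of free text. This is a sketch under assumptions: SubagentTask, build_dispatch_prompt, the constraint wording, and the example goal are all hypothetical, not the layout used in the real AGENTS.md.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubagentTask:
    """Hypothetical record of everything one fresh subagent is told."""
    task_id: str
    goal: str
    worktree: str
    environment_notes: list[str] = field(default_factory=list)


# Illustrative wording of the constraints, phrased for the subagent.
CONSTRAINTS = (
    "Work only inside the assigned worktree.",
    "Do not spawn other subagents.",
    "Stage your changes; do not commit.",
    "Change only what the goal requires, then report back.",
)


def build_dispatch_prompt(task: SubagentTask) -> str:
    """Render one task as a prompt with the constraints always attached."""
    lines = [
        f"Task {task.task_id}: {task.goal}",
        f"Worktree: {task.worktree}",
        "",
        "Constraints:",
        *[f"- {rule}" for rule in CONSTRAINTS],
    ]
    if task.environment_notes:
        # Spell out the environment so the subagent does not guess it.
        lines += ["", "Environment (do not assume otherwise):"]
        lines += [f"- {note}" for note in task.environment_notes]
    return "\n".join(lines)


# Example dispatch; the goal text is made up for illustration.
prompt = build_dispatch_prompt(SubagentTask(
    task_id="ENG-57",
    goal="Fix the CORS preflight failure",
    worktree="worktrees/ENG-57-cors-fix",
    environment_notes=["Tests run in a real browser, not jsdom"],
))
print(prompt)
```

Because the constraints live in one constant, a subagent can't receive a prompt that silently omits them.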

The orchestrator’s job is equally defined:

  • Review the diff (don’t trust the report alone)
  • Run quality gates (lint, typecheck, tests)
  • Validate the pattern before scaling - if dispatching five subagents for batch work, validate one or two results first
  • Check for environment mismatches

That last one bit me repeatedly. Subagents default to writing jsdom-style assertions when the code actually runs in a real browser. They assume GitHub-hosted runners when we use self-hosted ARM64 machines. Every assumption needs to be explicit in the dispatch prompt.
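The quality-gate step in the review list above can be sketched as a small runner that stops at the first failure. The gate commands here are placeholders that run anywhere Python does; the real lint, typecheck, and test commands aren't named in this post, so substitute your own.

```python
import subprocess
import sys
from typing import Sequence


def run_quality_gates(
    gates: dict[str, Sequence[str]], cwd: str = "."
) -> tuple[bool, str]:
    """Run each gate in order; return (passed, name of first failing gate or '')."""
    for name, command in gates.items():
        result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
        if result.returncode != 0:
            # Show the real output: a subagent's "tests pass" report is not evidence.
            print(f"[{name}] failed:\n{result.stdout}{result.stderr}")
            return False, name
    return True, ""


# Placeholder gates (assumptions); the last one fails on purpose.
EXAMPLE_GATES = {
    "lint": [sys.executable, "-c", "print('lint ok')"],
    "typecheck": [sys.executable, "-c", "print('types ok')"],
    "tests": [sys.executable, "-c", "import sys; sys.exit(1)"],
}

passed, failed_gate = run_quality_gates(EXAMPLE_GATES)
```

Passing the worktree path as cwd keeps the gates pointed at the subagent's staged changes rather than at main.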

Git Worktrees as Isolation Boundaries

The subagent model requires isolation. You can’t have three subagents working in the same git checkout without them stomping on each other’s changes.

Git worktrees solve this cleanly. Each task gets its own worktree under a worktrees/ directory at the workspace root, and each worktree has its own branch (Git won’t check out the same branch in two worktrees at once). Subagents work entirely within their assigned worktree, so their edits never touch the main checkout, and uncommitted work in one worktree is invisible to the others.

~/workspace/
  modules/
    platform/          # stays on main, always clean
  worktrees/
    ENG-42-user-search/   # feature branch worktree
    ENG-57-cors-fix/      # another feature branch worktree

The worktree lifecycle is managed by a skill - a markdown file that the AI reads before starting any development work. It enforces the workflow: clarify the work type (task or hotfix), pull latest on main, create the worktree, and verify that the worktree’s HEAD is attached to its branch rather than detached. That last check exists because of a gotcha that cost me hours: subagents sometimes end up on a detached HEAD, and their commits land on no branch at all.
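A minimal sketch of the create-and-verify part of that lifecycle, using two standard Git commands: git worktree add -b to create the branch and worktree together, and git symbolic-ref -q HEAD, which exits non-zero when HEAD is detached. The function names are hypothetical, and the “pull latest on main” step is left out for brevity.

```python
import subprocess
from pathlib import Path


def worktree_add_command(worktrees_dir: Path, branch: str, base: str = "main") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for a new task branch."""
    return ["git", "worktree", "add", "-b", branch, str(worktrees_dir / branch), base]


def is_detached(worktree: Path) -> bool:
    """True when HEAD in `worktree` is not a symbolic ref to a branch.

    Also returns True if `worktree` is not inside a Git repository at all,
    since the command fails in that case too.
    """
    result = subprocess.run(
        ["git", "symbolic-ref", "-q", "HEAD"],
        cwd=worktree, capture_output=True, text=True,
    )
    return result.returncode != 0


def create_task_worktree(repo: Path, worktrees_dir: Path, branch: str) -> Path:
    """Create the worktree from `repo`, then refuse to continue on a detached HEAD."""
    # Use an absolute worktrees_dir; a relative one would be resolved against `repo`.
    subprocess.run(worktree_add_command(worktrees_dir, branch), cwd=repo, check=True)
    path = worktrees_dir / branch
    if is_detached(path):
        raise RuntimeError(f"{path} is on a detached HEAD; check out {branch} first")
    return path
```

The same is_detached check is worth repeating just before the orchestrator reviews a diff, since the detachment can happen mid-task.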

The combination of worktrees + subagents means I can have multiple features in flight simultaneously without context pollution. The orchestrator dispatches work to isolated environments, reviews results, and promotes changes to PRs. It’s not unlike managing a small team, except the team members are stateless and need very explicit instructions.

Skills: Reusable Prompt Patterns

The .claude/skills/ directory contains twelve skills at last count. Each one is a markdown file that describes a specific workflow: how to use git worktrees, how to run E2E tests, how to brainstorm features, how to triage Sentry errors, how to fix a broken deploy pipeline.

These aren’t plugins or code. They’re structured text. A skill called staging-e2e describes how to authenticate as a test user, launch a Playwright browser, exercise the application, and file ClickUp tasks for any issues found. The AI reads it, follows the protocol, and produces consistent results every time.

The skill abstraction was the key insight that survived every tool migration. When I moved from OpenCode to Claude Code, the skills came with me. They’re just markdown. Any AI model that can read a file can follow them.
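To make “they’re just markdown” concrete, here is a sketch of how little it takes for any tool to discover the skills. It assumes one markdown file per skill directly inside the skills directory, with the skill’s title on its first non-empty line; that layout and the list_skills name are assumptions for illustration.

```python
from pathlib import Path


def list_skills(skills_dir: Path) -> dict[str, str]:
    """Map each skill name (the file stem) to the first non-empty line of its file."""
    skills: dict[str, str] = {}
    for path in sorted(skills_dir.glob("*.md")):
        lines = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
        first = next((line for line in lines if line), "")
        # Drop leading markdown heading markers, if any.
        skills[path.stem] = first.lstrip("# ").strip()
    return skills
```

Nothing here depends on which AI tool is in use, which is the point: the index is the filesystem.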

Separately, I use Tessl tiles for dependency documentation and a library called Superpowers for cross-project skills - TDD, systematic debugging, code review patterns. The Superpowers skills are more generic: “when encountering a bug, reproduce it first, then bisect to find the root cause.” The workspace skills are specific: “when running E2E tests in this project, fetch credentials from SSM, start the dev server on port 3000, and use Playwright MCP for browser interaction.”

The ClickUp Integration

The most surprising part of the system is the task management integration. The AI doesn’t just write code - it reads tasks from ClickUp, follows acceptance criteria checklists, ticks items as they’re satisfied, and only marks a task complete when every criterion is resolved.

The workflow looks like this:

  1. I create a plan doc in ClickUp (brainstorming, design spec, implementation plan)
  2. I break the plan into tasks with acceptance criteria checklists
  3. The orchestrator reads the task and dispatches subagents for implementation
  4. Subagents implement; the orchestrator ticks checklist items as they’re verified
  5. When all items are resolved, the task is marked complete

This closes the loop between planning and execution in a way that no amount of inline comments or PR descriptions ever achieved. The task is the source of truth. The AI reads it before starting and updates it as work progresses.
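The completion rule in that workflow can be sketched as a tiny in-memory model. This is not the ClickUp API: Task, Criterion, and the example criteria are hypothetical stand-ins, and no network calls are made. It only shows the gating logic, where a task can’t be marked complete while any criterion is unresolved.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    text: str
    resolved: bool = False


@dataclass
class Task:
    """In-memory stand-in for a task with an acceptance-criteria checklist."""
    name: str
    criteria: list[Criterion] = field(default_factory=list)
    status: str = "in progress"

    def tick(self, text: str) -> None:
        """Mark one criterion resolved once the orchestrator has verified it."""
        for criterion in self.criteria:
            if criterion.text == text:
                criterion.resolved = True
                return
        raise KeyError(f"no such criterion: {text}")

    def try_complete(self) -> bool:
        """Set status to complete only when every criterion is resolved."""
        # A task with no criteria is never auto-completed: nothing was verified.
        if self.criteria and all(c.resolved for c in self.criteria):
            self.status = "complete"
            return True
        return False


# Example with made-up criteria.
task = Task("ENG-42 user search", [
    Criterion("search returns matching users"),
    Criterion("empty query is handled"),
])
task.tick("search returns matching users")
still_open = not task.try_complete()
```

Refusing to complete a task with an empty checklist is a design choice of this sketch; it keeps “done” tied to something that was actually verified.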

What I Actually Learned

After fourteen months of tool-hopping and system-building, here’s what I know:

The tool matters less than you think. Claude Code is excellent, but the reason it works for me is the system I built around it: the AGENTS.md, the skills, the worktree workflow, the ClickUp integration. If I had to switch to a different AI coding tool tomorrow, I’d estimate that 80% of my productivity would transfer, because it lives in tool-agnostic text files.

Write it down for the AI and you write it down for yourself. Every rule in AGENTS.md is something I should have been doing before AI tools existed. “Verify before claiming done.” “Two strikes and you research.” “Ship CI changes separately.” Articulating these for an AI collaborator forced me to be honest about my own discipline gaps.

Isolation is everything. The single biggest improvement to AI-assisted development was giving each task its own worktree. No context pollution. No accidental changes to the wrong branch. No “I was just trying to help” refactors of unrelated code. Worktrees impose boundaries that keep both human and AI focused.

Subagents need explicit constraints, not autonomy. The temptation is to give the AI more freedom. “Here’s the feature, figure it out.” This produces spectacular demos and terrible pull requests. Constraining subagents to specific tasks in specific worktrees with specific acceptance criteria produces boring, reliable, reviewable work. Which is what you actually want.

Reproducibility is the foundation. The Nix devshell made everything else possible. Without it, every new machine was a half-day of “install this, configure that, why is this MCP server not connecting.” With it, I run nix develop and I’m working. The AI tooling is as reproducible as the application tooling.

The Current State

As of today, this is the stack: Claude Code as the interface, AGENTS.md as the collaboration protocol, twelve workspace skills, Tessl tiles for dependency docs, Superpowers for cross-project patterns, MCP servers for browser testing and task management, git worktrees for isolation, and a Nix devshell that reproduces all of it.

We shipped to production two days ago. The platform serves real users. Most commits in the history carry a Co-Authored-By: Claude trailer. The AI reads tasks, writes code, runs tests, and updates task status. I plan, review, and make judgment calls.

It took fourteen months and a half-dozen tool migrations to arrive at something this simple. The irony is that the final system barely depends on the specific tool. It’s a set of text files that describe how to work together. The AI reads them. We get things done.

If there’s a Part 3 to this series, I suspect it won’t be about another tool migration. It’ll be about what happens when the system evolves - when the AGENTS.md gets long enough to need its own refactoring, when the skills library needs governance, when the subagent model hits scaling limits. But that’s a future problem.

Right now, for the first time in over a year, I’m not looking at what’s next. I’m just building.