Autopilot: Claude as a Self-Directed Intern

Most of the time you spend with an AI coding agent, you are the bottleneck: it does one thing, then stops and waits for you. Autopilot is a skill that takes you out of that seat. The agent picks the work, does it, verifies it, commits it, and picks the next thing, while you review the stream of commits on your own schedule. Here is how it works, why it is safe to leave running, and why it does not just bury you in busywork.

Sascha Becker

Author

Most of the time you spend with an AI coding agent, you are the bottleneck. You hand it a task, it does the task, then it stops and waits. What next? You answer. It does the next thing and stops again. The agent is fast. The loop around it is slow, because you sit at the top of every turn of it.

Autopilot is a skill that takes you out of that seat. You hand off the project instead of the task, and the agent runs its own loop: pick the work, do it, prove it works, commit it, pick the next thing. You review the stream of commits when it suits you instead of approving each step.

The design problem is simple to state. An agent that never stops is easy. An agent that never stops and is still worth leaving alone is the hard part, and everything below is how it gets there.

The loop

Autopilot puts the agent in a mode where it works without waiting for direction. Each iteration is the same seven steps:

1. Survey. Read the README, the project instructions, recent commits, the tests, the open TODO and FIXME markers. Run the build and tests to learn the baseline. If there is a UI, run it and look at the real screens. You cannot improve what you have not seen.
2. Pick one improvement. Generate candidates first, then rank them. One concern per iteration.
3. Scope it small enough to finish and review in one sitting. If it cannot be, take the smallest valuable slice and leave the rest as a note.
4. Do it in the idiom of the surrounding code, matching the project's conventions rather than your own preferences.
5. Verify. Run the tests, the build, the linter. For anything user-facing, run the app and look at the rendered result, including its loading, empty, and error states. A green build never tells you a screen looks right.
6. Commit on its own, with a message that says what changed and why. One iteration, one reviewable commit.
7. Loop. Pick the next thing. Do not ask permission to continue.

The seventh step is the one that matters and the one a model resists hardest. Stopping to ask feels safe and polite. It is also the single behavior that makes autonomy worthless. Most of what the skill does is keep the agent moving through step seven without letting it run off a cliff.
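To make the shape of the loop concrete, here is a minimal sketch in Python. It is not the skill's implementation (the skill is instructions for an agent, not a program). Every name in it (`Improvement`, `RunLog`, `autopilot_loop`, `pick_next`) is hypothetical, and both the iteration cap and the handling of failed verification are assumptions added so the example terminates and stays honest.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Improvement:
    """One scoped change. Hypothetical type for illustration only."""
    title: str
    apply: Callable[[], None]   # makes the change in the working tree
    verify: Callable[[], bool]  # tests, build, linter, a look at the UI


@dataclass
class RunLog:
    commits: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def autopilot_loop(
    pick_next: Callable[[], Optional[Improvement]],
    max_iterations: int,  # assumption: a cap so the sketch always ends
) -> RunLog:
    log = RunLog()
    for _ in range(max_iterations):
        # Steps 1-3: survey, pick one improvement, scope it.
        improvement = pick_next()
        if improvement is None:
            break  # a fresh survey found nothing safe left to do
        # Step 4: do it.
        improvement.apply()
        # Step 5: verify before anything is committed.
        if improvement.verify():
            # Step 6: one iteration, one commit.
            log.commits.append(improvement.title)
        else:
            # Assumption: unverified work is noted, never committed.
            log.notes.append(f"dropped, failed verification: {improvement.title}")
        # Step 7: loop. No prompt, no "shall I continue?".
    return log
```

Note what is missing: there is no branch that waits for input. The only exit besides the cap is a survey that comes back empty.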

Why it is safe to leave running

Two rules make unattended looping sane:

1. Every iteration leaves the project better than it found it and is independently reviewable. Small, focused, committed on its own. You are never handed one giant diff to untangle.
2. The agent never crosses a dangerous line silently. When it hits one, it writes the decision down and moves to the next safe thing. A blocked avenue never stops the loop.

The second rule rests on a hard boundary between work the agent does freely and work it has to flag first.

Do freely (additive, reversible, local)	Flag first, then keep going (never do silently)
Fix bugs, with a test that proves the fix	Anything destructive: deleting, migrations, history rewrites
Add tests, tighten error handling and validation	Anything outward-facing: publishing, deploying, pushing shared branches
Improve docs, comments, dev tooling	Public API or contract changes, major dependency bumps
Refactor while preserving behavior	User-facing redesigns that rest on a judgment call
Objective UI fixes: labels, focus order, contrast	Anything touching secrets, auth, payments, production data

When the agent hits a flag-first item, it records a short proposal (what it is, why, and the risk) and keeps going on safe work. You come back to a list of decisions waiting for you, never to a surprise.
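The boundary can be read as a small gate in front of every piece of work. The sketch below is one way to picture it, assuming a hypothetical `Category` enum that condenses the table and a hypothetical `Proposal` record with the three fields named above. The skill itself expresses this as prose rules, not code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Category(Enum):
    # Do freely: additive, reversible, local.
    BUG_FIX = "bug fix"
    TESTS = "tests"
    DOCS = "docs"
    REFACTOR = "refactor"
    # Flag first: never done silently.
    DESTRUCTIVE = "destructive"
    OUTWARD_FACING = "outward-facing"
    API_CHANGE = "public API change"
    SENSITIVE = "secrets, auth, payments, production data"


FLAG_FIRST = {
    Category.DESTRUCTIVE,
    Category.OUTWARD_FACING,
    Category.API_CHANGE,
    Category.SENSITIVE,
}


@dataclass
class Proposal:
    what: str
    why: str
    risk: str


def may_proceed(
    category: Category,
    what: str,
    why: str,
    risk: str,
    proposals: List[Proposal],
) -> bool:
    """Return True for free work; otherwise record a proposal and return False."""
    if category in FLAG_FIRST:
        proposals.append(Proposal(what=what, why=why, risk=risk))
        return False  # the caller moves on to the next safe thing
    return True
```

The return value never raises and never blocks. A flagged item costs the loop one list append, which is the point: a closed door is written down, not waited at.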

Why it is not just busywork

The real risk with an eager agent is not danger. It is noise: a hundred commits that technically change something and improve nothing. Most of autopilot's rules exist to fight that.

Stabilize before you build. A broken build, a failing or flaky test, or a live bug preempts everything else. The agent is not allowed to stack features on a foundation that is on fire. Green first, then the field is open.
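As a sketch, the stabilize-first rule is a preemption check that runs before any improvement is considered. The `Baseline` fields and function names below are hypothetical, and the order among the blockers (build, then failing tests, then flaky tests, then live bugs) is an assumption. The rule above only says that all of them come before everything else.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Baseline:
    """What the survey found. Hypothetical structure."""
    build_ok: bool = True
    failing_tests: List[str] = field(default_factory=list)
    flaky_tests: List[str] = field(default_factory=list)
    live_bugs: List[str] = field(default_factory=list)


def stabilization_work(baseline: Baseline) -> List[str]:
    """List everything that must be fixed before new work starts."""
    work: List[str] = []
    if not baseline.build_ok:
        work.append("fix the build")
    work += [f"fix failing test: {t}" for t in baseline.failing_tests]
    work += [f"fix flaky test: {t}" for t in baseline.flaky_tests]
    work += [f"fix live bug: {b}" for b in baseline.live_bugs]
    return work


def next_task(baseline: Baseline, improvements: List[str]) -> Optional[str]:
    """Blockers preempt improvements; green first, then the field is open."""
    blockers = stabilization_work(baseline)
    if blockers:
        return blockers[0]
    return improvements[0] if improvements else None
```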

Generate, then rank. Instead of a checklist, the agent generates candidate improvements from the viewpoints of everyone who depends on the project, then ranks them on four axes: leverage (impact over effort), alignment (the project's evident direction, not the agent's taste), confidence (it knows the area well enough to be right), and reversibility (easy to undo if wrong). Anything weak on two of the last three is dropped, however tempting. That combination is exactly where autonomous work does damage.
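Here is a rough sketch of that rubric. The four axes and the drop rule come from the paragraph above. The 0-to-1 scale, the `WEAK` threshold of 0.5, and the unweighted sum used for ordering are all assumptions made for the example, since an agent applies this rubric by judgment and not with numbers.

```python
from dataclasses import dataclass
from typing import List

WEAK = 0.5  # assumed cutoff: a score below this counts as "weak"


@dataclass
class Candidate:
    title: str
    leverage: float       # impact over effort
    alignment: float      # fits the project's evident direction
    confidence: float     # the agent knows the area well enough
    reversibility: float  # easy to undo if wrong

    def weak_count(self) -> int:
        """How many of the last three axes are weak."""
        last_three = (self.alignment, self.confidence, self.reversibility)
        return sum(score < WEAK for score in last_three)

    def score(self) -> float:
        # Assumption: a plain sum; the rubric does not prescribe weights.
        return self.leverage + self.alignment + self.confidence + self.reversibility


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Drop anything weak on two of the last three axes, then sort best first."""
    kept = [c for c in candidates if c.weak_count() < 2]
    return sorted(kept, key=lambda c: c.score(), reverse=True)
```

The drop happens before the sort, so high leverage cannot buy a candidate back in. A tempting rewrite that the agent is unsure about and cannot easily undo never reaches the ranking at all.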

Self-check before committing. Before each commit the agent checks itself against the intern it could become. Is it rewriting working code for taste? Inventing an abstraction for a single caller? Padding coverage with tests for trivial getters while a real path stays bare? Generating docs nobody asked for while the actual bug sits there? If a senior reviewer would sigh instead of nod, the work is dropped and it picks again.

The one habit that quietly kills it

The menu. The agent stops, lists five options, and asks which to pursue. The moment it writes "tell me which of these to do," the autonomy is over. Such a list almost always already contains a safe, high-value option: an accessibility fix, a test, a doc, a clean refactor. Autopilot's rule is to take the best one now, note the rest, and keep moving. It only truly stops once a fresh survey shows every remaining option is a genuine judgment call, which is rare.
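The rule is small enough to write down as a function. This is a sketch under stated assumptions: `Option`, its `value` score, and its `judgment_call` flag are hypothetical stand-ins for what the agent weighs when it drafts such a list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Option:
    title: str
    value: float         # hypothetical score from the ranking step
    judgment_call: bool  # True if it genuinely needs a human decision


def resolve_menu(options: List[Option]) -> Tuple[Optional[Option], List[str]]:
    """Take the best safe option now and note the rest.

    Returns (None, notes) only when every option is a judgment call,
    which is the one case where stopping is the right move.
    """
    safe = [o for o in options if not o.judgment_call]
    if not safe:
        return None, [o.title for o in options]
    best = max(safe, key=lambda o: o.value)
    notes = [o.title for o in options if o is not best]
    return best, notes
```

The function has no branch that returns the whole list as a question. Either something gets done, or every remaining item really was a call only you can make.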

What you get back

In practice it changes the shape of a session. You hand off a repo before a meeting, or overnight, with something as loose as "work on your own for a while," or you invoke /autopilot. You come back to a run of small commits, each one a finished, verified change with a message that says what and why. You read them at your pace. The good ones you keep. The rare wrong one you revert in isolation, precisely because it was committed on its own. Time that used to be dead waiting becomes a queue of work to review.

What it will not do is architect your system or make the product calls. Those are the judgment calls it is built to flag and leave for you. What it removes is the long tail of obvious, low-risk, valuable work that never gets done because it was never worth booking a session for: the missing test, the broken empty state, the stale doc, the unhandled error, the accessibility gap. An intern who clears that backlog on their own, and knows exactly when to stop and ask, is worth a lot.

Try it

Autopilot installs into any agent that speaks the skills.sh format (Claude Code, Cursor, Codex, Cline, Windsurf, OpenCode):

```bash
npx skills@latest add saschb2b/skills --skill autopilot
```

Then hand a project off open-endedly and let it loop.

There's a skill for this

The full loop, the in-bounds boundary, and the candidate-ranking rubric live as an agent skill. Install it with npx skills@latest add saschb2b/skills --skill autopilot or read the full autopilot skill.

One aside, because it is unusual for me. I normally write the post first and distill a skill from it afterward. Autopilot went the other way: I built the skill, ran it for weeks, and wrote this afterward. A loop is something you have to run before you can describe it, so for once the skill came first and the post is the field report.
