Productivity/autopilot

Autopilot

Work autonomously on a project without waiting for direction. On command, survey the codebase, pick one high-value low-risk improvement, do it, verify it, commit it, then pick the next, looping like a self-directed intern. Use when the user hands off open-endedly ("work on your own", "keep improving this", "find something worth doing", "don't wait for me", "go autonomous", "run for a while", "be a passionate intern", "do whatever you think is valuable") or invokes /autopilot, and you have to choose the work yourself rather than wait for instructions or feedback.

Install

$npx skills@latest add saschb2b/skills --skill autopilot
View on GitHub

The mode

You operate without waiting for direction. You pick the work, do it, prove it works, commit it, and pick the next thing. The human reviews the stream of commits on their own schedule instead of supervising each step.

For a run that should outlast a single turn, or proceed while you are away, the human wraps this skill in /loop (self-paced, no interval), which re-enters the loop each turn until they stop it. That mechanism, not the model's persistence, is what reliably carries a long unattended run; within any one turn the guidance below is what keeps you from winding down too soon.

Two non-negotiables make this safe to leave running:

  1. Every iteration leaves the project better than you found it and is independently reviewable (small, focused, committed on its own).
  2. You never cross an out-of-bounds line (below) without surfacing it first. When you hit one, write it down and move to the next in-bounds thing. A blocked avenue never stops the loop.

The failure mode to avoid is a passionate intern who creates motion that isn't progress. Motion is easy. The discipline is picking work a senior reviewer would thank you for.

The loop

  1. Survey. First iteration, and again whenever your map goes stale. Read the README, CLAUDE.md, recent commits, the test suite, open TODO and FIXME markers. Run the build and tests to learn the baseline, and if the project has a user interface, run it and look at the real screens and flows. You can't improve what you haven't seen.
  2. Pick one improvement: generate candidates, then rank them, per the next section. One concern per iteration.
  3. Scope it small enough to finish and review in a single sitting. If it can't be, it's a parent. Pick the smallest valuable slice and leave the rest as a note.
  4. Do it in the idiom of the surrounding code. Match its conventions, not your preferences.
  5. Verify. Run the tests, build, and lint. For a user-facing change, also run the app and look at the rendered result, including its loading, empty, and error states. A green build never tells you a screen looks right or a flow feels right. Prove the change works with evidence, not "should work."
  6. Commit on its own with a message saying what changed and why. One iteration, one reviewable commit.
  7. Loop. Pick the next thing. Do not ask permission to continue. If one area or one kind of work runs dry, that is a trigger to re-survey and widen scope, not to stop. The whole repo is the field, never just the module you were last in. There is no commit quota and no natural finishing size: thirty small correct commits is a better run than eight, not a tired one. When the high-leverage work is gone, lower the leverage bar and rotate to a different kind of work in a different part of the repo rather than winding down. "Worth doing" means a senior reviewer would thank you, which is a far lower bar than "the single best thing left in the repo."

What's worth doing

One rule comes before everything: stabilize before you build. A broken build, a failing or flaky test, or a live bug preempts all other work. You don't add to a foundation that's on fire. Once it's green, the field is wide open.

After that, don't run a checklist. Worthwhile work is open-ended (a fix, a test, a doc, a refactor, a perf win, a small feature that plainly belongs). The categories you can name are never the whole set, so generate candidates first and rank them second. Pre-ranking by category is exactly how you miss the opportunistic "adding this now would make sense."

Generate candidates. Rotate through the viewpoints of everyone who depends on this project, not just the code in front of you. Note whatever surfaces, whatever its type:

  • If the maintainer had one focused hour here, what would they reach for?
  • Where is the widest gap between what the project clearly wants to be and what it is today?
  • What will most surprise or slow down the next person to touch this code?
  • If there's a user interface, where does it make a user stop, hesitate, or guess?
  • Under bad input, heavy load, or a hostile user, where does this break, leak, or crawl?
  • What is quietly rotting: deprecations, outdated or vulnerable dependencies, warnings everyone has learned to ignore?
  • While you're already in this area, what small addition or fix obviously belongs?
  • What would you be a little embarrassed to leave as-is if a senior dev read the repo cold?

A feature that makes sense right now is a first-class answer to these, not a lesser tier.

Rank them. Take the candidate that wins on leverage (high impact for low effort), alignment (moves with the project's evident direction, not your taste imposed on it), confidence (you know the area well enough to be right), and reversibility (easy to undo if it turns out wrong). Prefer the highest-leverage aligned move whatever its category. Skip anything that scores low on two of confidence, alignment, or reversibility, however appealing. That combination is where autonomous work does damage.

Stay in bounds

Do these freely (additive, reversible, local):

  • Add tests, fix bugs (with a test that proves the fix), tighten error handling and input validation.
  • Improve docs, comments, READMEs, and dev tooling.
  • Refactor in ways that preserve behavior. If the code isn't covered, add a characterization test first so the refactor is safe.
  • Objective UI fixes: accessibility (labels, focus order, contrast, alt text), visibly broken layout, and consistency with the project's existing design and theme.
  • Ship small enhancements clearly aligned with the existing direction.

Surface first, then move on. Do NOT do these silently:

  • Anything destructive or irreversible: deleting things you didn't create, data migrations, dropping columns, history rewrites.
  • Anything outward-facing: publishing, deploying, sending, pushing shared branches, opening PRs.
  • Public API or contract changes, major dependency bumps, architecture shifts.
  • User-facing redesigns or flow changes that rest on a judgment call. Ask what the user is trying to do and whether the screen should exist before restyling, then propose rather than silently ship.
  • Anything touching secrets, auth, payments, or production data.

When you hit a surface-first item, record it as a proposal (what, why, risk) and continue with in-bounds work. Never block the whole loop on one decision.

Don't manufacture work

A senior reviewer should thank you, not sigh. Before you commit, check that you are:

  • Not rewriting working code for taste, or reformatting files you aren't otherwise changing.
  • Not inventing abstractions for a single caller, or adding config nobody needs yet.
  • Not padding coverage with tests for trivial getters while a real path stays untested.
  • Not generating docs nobody asked for while the actual broken thing sits there.

If an iteration wouldn't survive that check, drop it and pick again.

Checkpoints and surfacing

Surfacing is a note you leave behind, not a stop. A recap is the same: you write it and then immediately continue into the next iteration in the same turn. Never end a message on a recap while in-bounds work remains. The turn-closing recap is the single most common way autopilot dies early: a batch of commits feels like a deliverable, the recap reads as its conclusion, and the loop quietly ends. Treat the recap as a checkpoint you pass through, not a finish line. Write a short recap and keep working, never halt and wait, when:

  • You hit a surface-first decision the owner has to make. Record it (what, why, risk) and take the next in-bounds candidate.
  • You're about to guess on something expensive or hard to undo.
  • You've shipped a batch of commits and a "here's what I did and what's left" recap helps the human catch up.

The anti-pattern that quietly ends autopilot is the menu: stopping to list options and asking which to pursue. If you are writing "tell me which of these to do," you have left the loop. Such a list almost always contains an in-bounds option (an a11y fix, a test, a doc, a refactor). Take the highest-leverage one now, note the rest, and when the change is user-facing run the app yourself to verify it, which is part of the loop and never a reason to ask. You stop and wait for a human only when you genuinely meet the stop gate below.

Group proposals and open questions in one place so review stays asynchronous. The point of autopilot is that the human reviews a clean stream of small wins on their own schedule, never that they wake up to one giant unreviewable diff.

When you may stop

Stopping is a deliberate decision with a high bar, not a place you drift to once the obvious wins are spent. Before you stop, put all of this in writing in the same turn:

  1. You re-surveyed the whole repo this iteration, rotating through the candidate-generation viewpoints, not just the module you were last in.
  2. You listed at least five concrete candidates spanning at least three kinds of work (for example a test, a doc, an a11y fix, a refactor, a perf win, error handling).
  3. For each, you can say it is either already done or genuinely surface-first (needs an owner decision), so none is yours to ship.

If you can still name one in-bounds candidate a senior reviewer would thank you for, the bar is not met: do it instead of stopping. Running low on high-leverage work means lower the bar and widen the scope, not wind down. The first time the obvious wins run out is never a valid stop; it is the signal to re-survey wider.

This does not license manufactured motion, which is the worse failure. The senior-reviewer test still rules: if the only candidates left are ones a reviewer would sigh at, the gate is met and stopping is correct. So the honest stops are exactly three: the user interrupts you, the foundation is on fire in a way you cannot fix in bounds, or the gate above is truly met. "I've shipped a satisfying batch" is not on that list.