Error Recovery After Failed Command

Pilot lab: recovery after command failure and partial evidence

A seeded operational lab that evaluates whether the agent can recover after a failed command, revise its plan, and stay useful without hiding uncertainty.

April 6, 2026 · 1 min read · Low reviewer burden

Product

Operational polish matters when the workflow turns a blocked command into a clean fallback instead of dead time.

Model

Model quality shows up in whether the system can revise its assumptions after new evidence contradicts the first plan.

Workflow Outcome

A useful workflow outcome leaves the user informed and still moving forward after the failure, not waiting for manual rescue.

Systems and versions

Codex: Seeded pilot configuration

Claude Code: Seeded pilot configuration

Environment

Tool-assisted terminal workflow with one or more failed commands, incomplete intermediate evidence, and a continuing task objective.

Prompt or task

Recover from a failed or blocked command, explain what changed, and continue making progress without inventing facts or derailing the task.
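One way to picture the task is a small harness that tries a command, records exactly how it failed, and moves to a fallback while keeping the failure visible. The sketch below is illustrative only: the `Attempt` record, `try_command`, and `run_with_fallback` are hypothetical names, not part of the pilot configuration, and the candidate commands are placeholders.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Attempt:
    """Outcome of one command attempt (hypothetical record type)."""
    argv: List[str]
    ok: bool
    reason: str  # short, factual description of what was observed


def try_command(argv: Sequence[str], timeout: float = 10.0) -> Attempt:
    """Run one command and describe the outcome without guessing at causes."""
    argv = list(argv)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return Attempt(argv, False, "executable not found")
    except PermissionError:
        return Attempt(argv, False, "permission denied")
    except subprocess.TimeoutExpired:
        return Attempt(argv, False, f"timed out after {timeout}s")
    if proc.returncode != 0:
        return Attempt(argv, False, f"exit code {proc.returncode}")
    return Attempt(argv, True, "succeeded")


def run_with_fallback(candidates: Sequence[Sequence[str]]) -> List[Attempt]:
    """Try each candidate in order and stop at the first success.

    Every attempt is kept in the returned list so failures stay visible.
    """
    attempts: List[Attempt] = []
    for argv in candidates:
        attempt = try_command(argv)
        attempts.append(attempt)
        if attempt.ok:
            break
    return attempts


if __name__ == "__main__":
    history = run_with_fallback([
        ["definitely-not-a-real-tool-xyz", "--version"],  # placeholder, expected to fail
        [sys.executable, "--version"],                    # placeholder fallback
    ])
    for a in history:
        print("ok  " if a.ok else "FAIL", " ".join(a.argv), "->", a.reason)
```

Returning the full attempt history, rather than only the final result, is the design choice that matters here: it lets the agent explain what changed between the first plan and the fallback instead of reporting only the step that worked.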

Why this task matters

Real work includes blocked package installs, permission errors, timeouts, missing files, and incomplete local context.

The publication should value agents that:

acknowledge failure precisely,
choose a sane fallback,
and communicate the remaining uncertainty.

An agent that hides a failed step behind confident narration creates more risk than one that pauses and adjusts.
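The three behaviours listed above could be captured in a structured report that separates what was observed from what is still unknown. This is a minimal sketch under assumptions: `RecoveryReport` and its field names are hypothetical, and the example values are invented for illustration rather than taken from any lab run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecoveryReport:
    """Hypothetical structure for reporting a recovery to the user."""
    failed_step: str      # what was attempted
    observed_error: str   # what was actually observed, verbatim where possible
    fallback: str         # what the agent did instead
    verified: List[str] = field(default_factory=list)    # claims backed by evidence
    unverified: List[str] = field(default_factory=list)  # remaining uncertainty

    def render(self) -> str:
        """Format the report so failure and uncertainty are stated explicitly."""
        lines = [
            f"Failed step: {self.failed_step}",
            f"Observed error: {self.observed_error}",
            f"Fallback taken: {self.fallback}",
            "Verified:",
        ]
        lines += [f"  - {item}" for item in self.verified] or ["  - (nothing yet)"]
        lines.append("Still uncertain:")
        lines += [f"  - {item}" for item in self.unverified] or ["  - (nothing noted)"]
        return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; not output from a real session.
    report = RecoveryReport(
        failed_step="install a dependency with the package manager",
        observed_error="command timed out",
        fallback="used the copy of the dependency already present locally",
        verified=["the local copy imports without error"],
        unverified=["whether the local copy matches the required version"],
    )
    print(report.render())
```

Keeping `verified` and `unverified` as separate fields makes it harder for confident narration to absorb a failed step: an empty evidence list is rendered explicitly rather than omitted.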