Public Same-Prompt CLI Build

Public result: same todo CLI prompt across Claude Code and Codex

A public benchmark folder with generated Node.js todo CLI implementations from Claude Code and Codex using the same prompt.

May 26, 2026 · 1 min read · Low reviewer burden

Product

Codex produced a modular CLI/store split with injectable file paths; Claude Code Sonnet produced a smaller implementation built around a single todo module; Claude Code Haiku used a class-based store.

Model

The same CLI prompt left room for different design choices around persistence, state ownership, and testability.

Workflow Outcome

This source is useful because visitors can inspect complete generated projects, not only screenshots or summaries.

Same-prompt generated results

Each panel shows the generated output for the same prompt, links to the original code, and keeps source credit visible.

Same prompt

Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.

Source reuse

Apache-2.0

Licensed source can be mirrored in benchmark pages with attribution and links back to the original generated projects.

Codex

Codex result


A modular split between the CLI and the store. The command runner resolves the todo file from the TODO_FILE environment variable or, failing that, from the working directory, which makes tests easier to isolate.

Raw generated code: Codex CLI, Codex todo store, Codex CLI tests

Representative code

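const path = require("node:path");

// TODO_FILE from the environment takes precedence; otherwise use todos.json in cwd.
function resolveTodoFile(cwd) {
  return process.env.TODO_FILE || path.join(cwd, "todos.json");
}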

File resolution excerpt
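The precedence in this excerpt can be checked without touching the real environment. The sketch below is a variant, not the generated code: it adds a hypothetical second parameter, `env` (defaulting to `process.env`), so both branches can be exercised directly from a test.

```javascript
const path = require("node:path");

// Variant of the excerpt: `env` is a hypothetical extra parameter.
// The generated code reads process.env directly.
function resolveTodoFile(cwd, env = process.env) {
  // An explicit TODO_FILE wins; otherwise use todos.json in the working directory.
  return env.TODO_FILE || path.join(cwd, "todos.json");
}

console.log(resolveTodoFile("/tmp/project", {}));
console.log(resolveTodoFile("/tmp/project", { TODO_FILE: "/tmp/other.json" }));
```

Because the fallback uses `||`, an empty `TODO_FILE` value also falls back to the working directory.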

Claude Code Sonnet

Claude Code Sonnet result


A compact implementation centered on one todo module, with the todos.json data file stored beside the generated script and a test file that calls the module directly.

Raw generated code: Claude Sonnet todo implementation, Claude Sonnet tests

Representative code

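const fs = require("fs");
const path = require("path");
// Data file lives next to this module, not in the caller's working directory.
const TODO_FILE = path.join(__dirname, "todos.json");
function loadTodos() {
  if (!fs.existsSync(TODO_FILE)) return [];
  return JSON.parse(fs.readFileSync(TODO_FILE, "utf8"));
}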

Single-module persistence excerpt

Claude Code Haiku

Claude Code Haiku result


A class-based store that wraps load/save behavior and keeps CLI parsing separate from the persistence object.

Raw generated code: Claude Haiku CLI, Claude Haiku store, Claude Haiku store tests

Representative code

class TodoStore {
  constructor() {
    // Load persisted todos once; loadTodos() is a method of this class not shown in the excerpt.
    this.todos = this.loadTodos();
  }
}

Class store excerpt
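The Haiku excerpt only shows the constructor. As a rough illustration of the class-based shape, here is a minimal sketch, not the generated `src/store.js`: it assumes a constructor that takes the file path (the generated constructor takes no arguments), and the `saveTodos`, `add`, and `complete` methods are hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical class-based store; only loadTodos is named in the excerpt.
class TodoStore {
  constructor(filePath) {
    this.filePath = filePath;
    // State is loaded once and then owned by the instance.
    this.todos = this.loadTodos();
  }

  loadTodos() {
    if (!fs.existsSync(this.filePath)) return [];
    return JSON.parse(fs.readFileSync(this.filePath, "utf8"));
  }

  saveTodos() {
    fs.writeFileSync(this.filePath, JSON.stringify(this.todos, null, 2));
  }

  add(task) {
    // Next id is one more than the largest existing id (1 for an empty list).
    const id = this.todos.reduce((max, t) => Math.max(max, t.id), 0) + 1;
    const todo = { id, task, completed: false };
    this.todos.push(todo);
    this.saveTodos();
    return todo;
  }

  complete(id) {
    const todo = this.todos.find((t) => t.id === id);
    if (!todo) return null;
    todo.completed = true;
    this.saveTodos();
    return todo;
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "todo-store-"));
const file = path.join(dir, "todos.json");
const store = new TodoStore(file);
store.add("Buy milk");
store.complete(1);

// A fresh instance reads the persisted state back from disk.
const reloaded = new TodoStore(file);
console.log(reloaded.todos);
```

Because each instance caches `this.todos`, two store objects pointing at the same file can drift apart, so a reviewer has to reason about when state is reloaded.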

Generated file structure

The same prompt produced different project boundaries. This is the first code-level difference visitors should see before reading individual files.

Codex

bin/todo.js
src/cli.js
src/todoStore.js
test/cli.test.js
package.json
codex-log.txt

Claude Code Sonnet

todo.js
todo.test.js
cli.js
README.md
package.json
claude-code-sonnet-log.txt

Claude Code Haiku

src/cli.js
src/store.js
test/store.test.js
package.json
claude-code-haiku-log.txt

Code differences

These panels mirror licensed generated code snippets and explain the coding choices made for the same prompt.

Persistence boundary

Codex isolates JSON persistence behind an explicit file path, while Claude Code Sonnet keeps persistence in a single todo module beside the generated script.

Codex

todoStore.js

examples/agent-comparison/codex/src/todoStore.js


const fs = require('node:fs');

function loadTodos(filePath) {
  if (!fs.existsSync(filePath)) {
    return [];
  }

  const raw = fs.readFileSync(filePath, 'utf8').trim();
  if (!raw) {
    return [];
  }

  let todos;
  try {
    todos = JSON.parse(raw);
  } catch (error) {
    throw new Error(`Failed to parse todo data from ${filePath}.`);
  }

  if (!Array.isArray(todos)) {
    throw new Error(`Todo data at ${filePath} is invalid.`);
  }

  return todos;
}

Claude Code Sonnet

todo.js

examples/agent-comparison/claude-code-sonnet/todo.js


const fs = require('fs');
const path = require('path');

const TODO_FILE = path.join(__dirname, 'todos.json');

function loadTodos() {
  try {
    const data = fs.readFileSync(TODO_FILE, 'utf8');
    return JSON.parse(data);
  } catch (error) {
    return [];
  }
}

function saveTodos(todos) {
  fs.writeFileSync(TODO_FILE, JSON.stringify(todos, null, 2));
}

Difference notes

Codex accepts the todo file path as a dependency, which makes tests and alternate working directories easier to isolate.
Claude Code Sonnet uses a fixed todos.json path relative to the module, which is simpler but couples runtime data to the generated file location.
The two outputs solve the same storage requirement with different boundaries: injectable store functions versus compact single-module state.
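To illustrate why an injectable path helps isolation, here is a minimal sketch in the style of the Codex store, where every function takes `filePath` first. The `addTodo(filePath, text)` call shape matches its use in the Codex `cli.js` excerpt, but the function bodies, the `text` field name, and the `tempTodoFile` helper are assumptions, not the generated `todoStore.js`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical store functions: the data file is always passed in, never global.
function loadTodos(filePath) {
  if (!fs.existsSync(filePath)) return [];
  return JSON.parse(fs.readFileSync(filePath, "utf8"));
}

function saveTodos(filePath, todos) {
  fs.writeFileSync(filePath, JSON.stringify(todos, null, 2));
}

function addTodo(filePath, text) {
  const todos = loadTodos(filePath);
  const id = todos.reduce((max, t) => Math.max(max, t.id), 0) + 1;
  const todo = { id, text, completed: false };
  todos.push(todo);
  saveTodos(filePath, todos);
  return todo;
}

// Each test gets its own temporary file, so runs cannot see each other's data.
function tempTodoFile() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "todo-test-"));
  return path.join(dir, "todos.json");
}

const fileA = tempTodoFile();
const fileB = tempTodoFile();
addTodo(fileA, "Buy milk");
addTodo(fileA, "Walk dog");
const firstInB = addTodo(fileB, "Read book");
console.log(firstInB.id, loadTodos(fileA).length);
```

With a fixed `__dirname`-relative path, as in the Sonnet module, the two "tests" above would share one file and the id sequence would depend on test order.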

Command runner and file resolution

Codex makes command execution callable as a function with injected IO streams and a working directory. The Sonnet output keeps command handling closer to the generated script.

Codex

cli.js

examples/agent-comparison/codex/src/cli.js


const path = require('node:path');
const { addTodo, listTodos, completeTodo, deleteTodo } = require('./todoStore');

function resolveTodoFile(cwd) {
  return process.env.TODO_FILE || path.join(cwd, 'todos.json');
}

function run(argv, io = { stdout: process.stdout, stderr: process.stderr }, cwd = process.cwd()) {
  const [command, ...args] = argv;
  const filePath = resolveTodoFile(cwd);

  if (!command) {
    printHelp(io.stderr);
    return 1;
  }

  if (command === 'add') {
    const text = args.join(' ').trim();
    if (!text) {
      io.stderr.write('Error: todo text is required.\n');
      printHelp(io.stderr);
      return 1;
    }

    const todo = addTodo(filePath, text);
    io.stdout.write(`Added todo ${todo.id}.\n`);
    return 0;
  }

  // printHelp() and the list, complete, and delete branches are not shown in this excerpt.
}

Claude Code Sonnet

todo.js

examples/agent-comparison/claude-code-sonnet/todo.js


function addTodo(task) {
  const todos = loadTodos();
  const newTodo = {
    id: todos.length > 0 ? Math.max(...todos.map(t => t.id)) + 1 : 1,
    task,
    completed: false
  };
  todos.push(newTodo);
  saveTodos(todos);
  return newTodo;
}

function completeTodo(id) {
  const todos = loadTodos();
  const todo = todos.find(t => t.id === id);
  if (!todo) {
    return null;
  }
  todo.completed = true;
  saveTodos(todos);
  return todo;
}

Difference notes

Codex exposes a run(argv, io, cwd) function, which supports direct unit or subprocess-style tests.
Codex resolves TODO_FILE from the environment before falling back to the current working directory.
This makes the Codex output more verbose, but it also gives the reviewer a clearer testing seam.
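The `run(argv, io, cwd)` seam lets a test call the command runner in-process and capture output in memory instead of spawning Node. The sketch below assumes a cut-down, hypothetical `run` that handles only `add` and keeps todos in an in-memory `Map` keyed by `cwd` rather than a JSON file; it shows the calling pattern, not the generated `cli.js`.

```javascript
// Minimal stream stand-in: anything with write(string) can serve as io.stdout or io.stderr.
function captureStream() {
  const chunks = [];
  return {
    write: (s) => {
      chunks.push(s);
      return true;
    },
    text: () => chunks.join(""),
  };
}

// Hypothetical in-memory storage, keyed by working directory.
const memoryStore = new Map();

// Same (argv, io, cwd) shape as the Codex excerpt; everything else is simplified.
function run(argv, io = { stdout: process.stdout, stderr: process.stderr }, cwd = process.cwd()) {
  const [command, ...args] = argv;
  const todos = memoryStore.get(cwd) ?? [];
  memoryStore.set(cwd, todos);

  if (command === "add") {
    const text = args.join(" ").trim();
    if (!text) {
      io.stderr.write("Error: todo text is required.\n");
      return 1;
    }
    const id = todos.reduce((max, t) => Math.max(max, t.id), 0) + 1;
    todos.push({ id, text, completed: false });
    io.stdout.write(`Added todo ${id}.\n`);
    return 0;
  }

  io.stderr.write(`Unknown command: ${command ?? "(none)"}\n`);
  return 1;
}

const out = captureStream();
const err = captureStream();
const okStatus = run(["add", "Buy", "milk"], { stdout: out, stderr: err }, "/tmp/a");
const badStatus = run(["add"], { stdout: out, stderr: err }, "/tmp/a");
console.log(okStatus, badStatus, JSON.stringify(out.text()), JSON.stringify(err.text()));
```

Because `io` only needs a `write` method, the same runner can write to `process.stdout` in normal use and to string buffers in tests.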

Test strategy

Codex tests the CLI entrypoint (bin/todo.js) through subprocess calls with temporary data files. Claude Code Sonnet tests the todo module directly.

Codex

cli.test.js

examples/agent-comparison/codex/test/cli.test.js


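const assert = require('node:assert/strict');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
// Assumes the built-in node:test runner; the excerpt does not show these imports.
const test = require('node:test');
const { spawnSync } = require('node:child_process');

const binPath = path.resolve(__dirname, '..', 'bin', 'todo.js');

// makeTempDir() is not shown in the excerpt; this is an assumed implementation.
function makeTempDir() {
  return fs.mkdtempSync(path.join(os.tmpdir(), 'todo-cli-'));
}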

function runCli(args, cwd, todoFileName = 'todos.json') {
  return spawnSync(process.execPath, [binPath, ...args], {
    cwd,
    env: {
      ...process.env,
      TODO_FILE: path.join(cwd, todoFileName)
    },
    encoding: 'utf8'
  });
}

test('add and list todos', () => {
  const cwd = makeTempDir();

  const add = runCli(['add', 'Buy milk'], cwd);
  assert.equal(add.status, 0);
  assert.match(add.stdout, /Added todo 1\./);
});

Claude Code Sonnet

todo.test.js

examples/agent-comparison/claude-code-sonnet/todo.test.js


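// describe, test, beforeEach, and expect are Jest-style globals.
const fs = require('fs');
const path = require('path');
const { addTodo, listTodos, completeTodo, deleteTodo, loadTodos, saveTodos } = require('./todo');

// Not shown in the excerpt; assumed to be the todos.json that todo.js writes beside itself.
const TEST_TODO_FILE = path.join(__dirname, 'todos.json');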

describe('Todo App', () => {
  beforeEach(() => {
    if (fs.existsSync(TEST_TODO_FILE)) {
      fs.unlinkSync(TEST_TODO_FILE);
    }
  });

  describe('addTodo', () => {
    test('should add a new todo', () => {
      const todo = addTodo('Test task');
      expect(todo.id).toBe(1);
      expect(todo.task).toBe('Test task');
      expect(todo.completed).toBe(false);
    });
  });
});

Difference notes

Codex exercises command behavior closer to how a user runs the CLI.
Claude Code Sonnet exercises domain functions directly, which is smaller but less representative of the command entrypoint.
The difference is useful for visitors because it shows how generated code can vary in reviewer burden even when features match.

Systems and versions

Claude Code Sonnet: public folder examples/agent-comparison/claude-code-sonnet in Agent-Field/SWE-AF

Claude Code Haiku: public folder examples/agent-comparison/claude-code-haiku in Agent-Field/SWE-AF

Codex: public folder examples/agent-comparison/codex in Agent-Field/SWE-AF

Environment

Generated Node.js CLI todo app benchmark with JSON persistence, tests, and public source folders for each agent run.

Prompt or task

Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.

Source credit

Original work: Agent-Field/SWE-AF.

AgentScope is not the original creator of this benchmark. This page credits the source project and links visitors to the public generated folders and raw code.

What visitors can learn

This result set is useful because it shows how a small prompt still leaves important engineering choices open.

Codex exposes a test-friendly boundary around file storage. Claude Code Sonnet is simpler and easier to read in one file. Claude Code Haiku adds an object-oriented store. Those choices are not just style; they affect test setup, error handling, and how easily a reviewer can reason about state.
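The error-handling difference is visible in the two `loadTodos` excerpts above: the Codex version throws on unparsable or non-array JSON, while the Sonnet version catches every error and returns an empty list. The sketch below reduces both to pure functions over a string (the names `strictParse` and `lenientParse` are hypothetical) to show what each does with a truncated file.

```javascript
// Mirrors the Codex loadTodos: empty input is an empty list, bad JSON is an error.
function strictParse(raw, filePath = "todos.json") {
  const trimmed = raw.trim();
  if (!trimmed) return [];
  let todos;
  try {
    todos = JSON.parse(trimmed);
  } catch {
    throw new Error(`Failed to parse todo data from ${filePath}.`);
  }
  if (!Array.isArray(todos)) throw new Error(`Todo data at ${filePath} is invalid.`);
  return todos;
}

// Mirrors the Sonnet loadTodos: any parse failure silently becomes an empty list.
function lenientParse(raw) {
  try {
    return JSON.parse(raw);
  } catch {
    return [];
  }
}

// Example of a truncated write.
const corrupted = '[{"id": 1, "task": "Buy milk", "completed": false';

let strictError = null;
try {
  strictParse(corrupted);
} catch (error) {
  strictError = error.message;
}
const lenientResult = lenientParse(corrupted);
console.log(strictError, lenientResult);
```

With the lenient pattern, the next `saveTodos` call writes the new list over the unreadable file, so the earlier todos are lost without an error message; the strict pattern stops and reports the problem instead.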

Rubric

Same prompt is documented
Generated projects are linked directly
Tests and persistence choices are inspectable
Credit to benchmark owner is visible

Artifacts

Original repository: Agent-Field/SWE-AF (repo)
Original benchmark folder (benchmark)
Codex generated project (repo)
Claude Code Sonnet generated project (repo)
Claude Code Haiku generated project (repo)

Limitations

The scoring framework belongs to the original SWE-AF repository; AgentScope is only crediting and summarizing the public artifacts.
The benchmark includes additional SWE-AF and MiniMax outputs beyond the Codex and Claude Code examples highlighted here.

Evidence

benchmark · May 26, 2026

Credit: original benchmark repository by Agent-Field, with public same-prompt Claude Code and Codex generated projects.

Agent-Field/SWE-AF

benchmark · May 26, 2026

The benchmark assets folder contains generated projects, logs, tests, and output folders for the compared agent runs.

Agent comparison folder

benchmark · May 26, 2026

The README documents the shared todo CLI prompt and reproduction commands for Claude Code and Codex.

SWE-AF benchmark prompt