Public Same-Prompt CLI Build

Public result: same todo CLI prompt across Claude Code and Codex

A public benchmark folder with generated Node.js todo CLI implementations from Claude Code and Codex using the same prompt.

May 26, 2026 · 1 min read · Low reviewer burden
Product

Codex produced a modular CLI/store split with injectable file paths; Claude Code Sonnet produced a smaller single-file implementation; Claude Code Haiku used a class-based store.

Model

The same CLI prompt left room for different design choices around persistence, state ownership, and testability.

Workflow Outcome

This source is useful because visitors can inspect complete generated projects, not only screenshots or summaries.

Same-prompt generated results

Each panel shows the generated output for the same prompt, links to the original code, and keeps source credit visible.

Same prompt

Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.

Source reuse

Licensed source can be mirrored in benchmark pages with attribution and links back to the original generated projects.

Codex

Codex result

A modular CLI and store split. The command runner can resolve the todo file through environment or working-directory context, which makes tests easier to isolate.

Representative code
function resolveTodoFile(cwd) {
  return process.env.TODO_FILE || path.join(cwd, "todos.json");
}
File resolution excerpt

Claude Code Sonnet

Claude Code Sonnet result

A compact implementation centered on one todo module, with persistence stored beside the generated script and a direct test file.

Representative code
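const TODO_FILE = path.join(__dirname, "todos.json");
function loadTodos() {
  // A missing or unreadable file is treated as an empty list.
  try {
    return JSON.parse(fs.readFileSync(TODO_FILE, "utf8"));
  } catch (error) {
    return [];
  }
}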
Single-module persistence excerpt

Generated file structure

The same prompt produced different project boundaries. This is the first code-level difference visitors should see before reading individual files.

Codex
  • bin/todo.js
  • src/cli.js
  • src/todoStore.js
  • test/cli.test.js
  • package.json
  • codex-log.txt
Claude Code Sonnet
  • todo.js
  • todo.test.js
  • cli.js
  • README.md
  • package.json
  • claude-code-sonnet-log.txt
Claude Code Haiku
  • src/cli.js
  • src/store.js
  • test/store.test.js
  • package.json
  • claude-code-haiku-log.txt

Code differences

These panels mirror licensed generated code snippets and explain the coding choices made for the same prompt.

Persistence boundary

Codex isolates JSON persistence behind an explicit file path, while Claude Code Sonnet keeps persistence in a single todo module beside the generated script.

Codex
todoStore.js
examples/agent-comparison/codex/src/todoStore.js
const fs = require('node:fs');

function loadTodos(filePath) {
  if (!fs.existsSync(filePath)) {
    return [];
  }

  const raw = fs.readFileSync(filePath, 'utf8').trim();
  if (!raw) {
    return [];
  }

  let todos;
  try {
    todos = JSON.parse(raw);
  } catch (error) {
    throw new Error(`Failed to parse todo data from ${filePath}.`);
  }

  if (!Array.isArray(todos)) {
    throw new Error(`Todo data at ${filePath} is invalid.`);
  }

  return todos;
}
Claude Code Sonnet
todo.js
examples/agent-comparison/claude-code-sonnet/todo.js
const fs = require('fs');
const path = require('path');

const TODO_FILE = path.join(__dirname, 'todos.json');

function loadTodos() {
  try {
    const data = fs.readFileSync(TODO_FILE, 'utf8');
    return JSON.parse(data);
  } catch (error) {
    return [];
  }
}

function saveTodos(todos) {
  fs.writeFileSync(TODO_FILE, JSON.stringify(todos, null, 2));
}
Difference notes
  • Codex accepts the todo file path as a dependency, which makes tests and alternate working directories easier to isolate.
  • Claude Code Sonnet uses a fixed todos.json path relative to the module, which is simpler but couples runtime data to the generated file location.
  • The two outputs solve the same storage requirement with different boundaries: injectable store functions versus compact single-module state.
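
The isolation benefit described in the notes above can be sketched as follows. This is a minimal illustration, not the Codex implementation: loadTodos is trimmed from the mirrored excerpt, while saveTodos(filePath, todos) and withTempStore are hypothetical names introduced here because the page does not show the Codex save path or any test helper.

```javascript
// Sketch only: a reduced store in the style of the Codex excerpt above.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Same shape as the mirrored loadTodos(filePath), without the validation steps.
function loadTodos(filePath) {
  if (!fs.existsSync(filePath)) return [];
  const raw = fs.readFileSync(filePath, 'utf8').trim();
  return raw ? JSON.parse(raw) : [];
}

// Hypothetical: the Codex save function is not shown on this page.
function saveTodos(filePath, todos) {
  fs.writeFileSync(filePath, JSON.stringify(todos, null, 2));
}

// Hypothetical helper: every call gets its own throwaway directory,
// so two runs can never see each other's todos.json.
function withTempStore(callback) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'todo-sketch-'));
  try {
    return callback(path.join(dir, 'todos.json'));
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const first = withTempStore((filePath) => {
  saveTodos(filePath, [{ id: 1, text: 'Buy milk', completed: false }]);
  return loadTodos(filePath);
});
const second = withTempStore((filePath) => loadTodos(filePath));

console.log(first.length, second.length); // prints "1 0"
```

With a fixed module-relative path, the same isolation would need the test to delete or swap the real todos.json between cases.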

Command runner and file resolution

Codex makes command execution callable with injected IO and working-directory context. The Claude Code Sonnet excerpt shows domain functions that read and write the module-level todos.json directly, with no injected path or IO.

Codex
cli.js
examples/agent-comparison/codex/src/cli.js
const path = require('node:path');
const { addTodo, listTodos, completeTodo, deleteTodo } = require('./todoStore');

function resolveTodoFile(cwd) {
  return process.env.TODO_FILE || path.join(cwd, 'todos.json');
}

function run(argv, io = { stdout: process.stdout, stderr: process.stderr }, cwd = process.cwd()) {
  const [command, ...args] = argv;
  const filePath = resolveTodoFile(cwd);

  if (!command) {
    printHelp(io.stderr);
    return 1;
  }

  if (command === 'add') {
    const text = args.join(' ').trim();
    if (!text) {
      io.stderr.write('Error: todo text is required.\n');
      printHelp(io.stderr);
      return 1;
    }

    const todo = addTodo(filePath, text);
    io.stdout.write(`Added todo ${todo.id}.\n`);
    return 0;
  }
}
Claude Code Sonnet
todo.js
examples/agent-comparison/claude-code-sonnet/todo.js
function addTodo(task) {
  const todos = loadTodos();
  const newTodo = {
    id: todos.length > 0 ? Math.max(...todos.map(t => t.id)) + 1 : 1,
    task,
    completed: false
  };
  todos.push(newTodo);
  saveTodos(todos);
  return newTodo;
}

function completeTodo(id) {
  const todos = loadTodos();
  const todo = todos.find(t => t.id === id);
  if (!todo) {
    return null;
  }
  todo.completed = true;
  saveTodos(todos);
  return todo;
}
Difference notes
  • Codex exposes a run(argv, io, cwd) function, which supports direct unit or subprocess-style tests.
  • Codex resolves TODO_FILE from the environment before falling back to the current working directory.
  • This makes the Codex output more verbose, but it also gives the reviewer a clearer testing seam.
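
One way to read the testing seam in the notes above is that run can be called in-process with stand-in streams. The sketch below assumes a reduced, hypothetical run that only reports which file it would use; makeFakeIo and the usage string are also illustrative and are not taken from the Codex project. resolveTodoFile is copied from the mirrored excerpt.

```javascript
const path = require('node:path');

// Copied from the mirrored Codex excerpt.
function resolveTodoFile(cwd) {
  return process.env.TODO_FILE || path.join(cwd, 'todos.json');
}

// Hypothetical, reduced run(): it reports the resolved file instead of touching the store.
function run(argv, io, cwd) {
  const [command] = argv;
  if (!command) {
    io.stderr.write('Usage: todo <command>\n');
    return 1;
  }
  io.stdout.write(`${command} -> ${resolveTodoFile(cwd)}\n`);
  return 0;
}

// Hypothetical fake IO: any object with write() is enough for this sketch.
function makeFakeIo() {
  const out = [];
  const err = [];
  return {
    io: {
      stdout: { write: (chunk) => out.push(chunk) },
      stderr: { write: (chunk) => err.push(chunk) },
    },
    out,
    err,
  };
}

// Remember the caller's TODO_FILE so the sketch leaves the environment unchanged.
const savedTodoFile = process.env.TODO_FILE;
delete process.env.TODO_FILE;

const missing = makeFakeIo();
const missingStatus = run([], missing.io, '/tmp/project');

const listed = makeFakeIo();
const listedStatus = run(['list'], listed.io, '/tmp/project');

// The environment variable takes precedence over the working directory.
process.env.TODO_FILE = '/tmp/override.json';
const overridden = makeFakeIo();
run(['list'], overridden.io, '/tmp/project');

if (savedTodoFile === undefined) {
  delete process.env.TODO_FILE;
} else {
  process.env.TODO_FILE = savedTodoFile;
}

console.log(missingStatus, listedStatus, overridden.out[0].trim());
```

Because exit codes and output are plain return values and captured strings, assertions need no subprocess and no real terminal.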

Test strategy

Codex tests the CLI entrypoint (bin/todo.js) through subprocess calls and temporary data files. Claude Code Sonnet tests the todo module directly.

Codex
cli.test.js
examples/agent-comparison/codex/test/cli.test.js
const { spawnSync } = require('node:child_process');

const binPath = path.resolve(__dirname, '..', 'bin', 'todo.js');

function runCli(args, cwd, todoFileName = 'todos.json') {
  return spawnSync(process.execPath, [binPath, ...args], {
    cwd,
    env: {
      ...process.env,
      TODO_FILE: path.join(cwd, todoFileName)
    },
    encoding: 'utf8'
  });
}

test('add and list todos', () => {
  const cwd = makeTempDir();

  const add = runCli(['add', 'Buy milk'], cwd);
  assert.equal(add.status, 0);
  assert.match(add.stdout, /Added todo 1\./);
});
Claude Code Sonnet
todo.test.js
examples/agent-comparison/claude-code-sonnet/todo.test.js
const { addTodo, listTodos, completeTodo, deleteTodo, loadTodos, saveTodos } = require('./todo');

describe('Todo App', () => {
  beforeEach(() => {
    if (fs.existsSync(TEST_TODO_FILE)) {
      fs.unlinkSync(TEST_TODO_FILE);
    }
  });

  describe('addTodo', () => {
    test('should add a new todo', () => {
      const todo = addTodo('Test task');
      expect(todo.id).toBe(1);
      expect(todo.task).toBe('Test task');
      expect(todo.completed).toBe(false);
    });
  });
});
Difference notes
  • Codex exercises command behavior closer to how a user runs the CLI.
  • Claude Code Sonnet exercises domain functions directly, which keeps the test file smaller but is less representative of the command entrypoint.
  • The difference is useful for visitors because it shows how generated code can vary in reviewer burden even when features match.
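
The Codex excerpt calls makeTempDir() without showing it. The sketch below is one plausible shape for that helper and for the subprocess pattern around it, under stated assumptions: makeTempDir here is a hypothetical implementation, and an inline script stands in for bin/todo.js so the example runs on its own.

```javascript
const { spawnSync } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: a fresh directory per test, created under the OS temp dir.
function makeTempDir() {
  return fs.mkdtempSync(path.join(os.tmpdir(), 'todo-cli-test-'));
}

// Stand-in for bin/todo.js: echoes the data file the child process was told to use.
const inlineScript = "process.stdout.write(process.env.TODO_FILE || '')";

// Mirrors the runCli shape from the excerpt: child process, per-test cwd and TODO_FILE.
function runInline(cwd, todoFileName = 'todos.json') {
  return spawnSync(process.execPath, ['-e', inlineScript], {
    cwd,
    env: { ...process.env, TODO_FILE: path.join(cwd, todoFileName) },
    encoding: 'utf8',
  });
}

const cwd = makeTempDir();
const result = runInline(cwd);
const sawIsolatedFile = result.stdout === path.join(cwd, 'todos.json');

// Clean up so repeated runs do not accumulate temp directories.
fs.rmSync(cwd, { recursive: true, force: true });

console.log(result.status, sawIsolatedFile);
```

The trade-off is cost: each assertion pays for a process start, whereas direct module tests run in-process but skip argument parsing and exit codes.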

Systems and versions

  • Claude Code Sonnet: public folder Agent-Field/SWE-AF, examples/agent-comparison/claude-code-sonnet
  • Claude Code Haiku: public folder Agent-Field/SWE-AF, examples/agent-comparison/claude-code-haiku
  • Codex: public folder Agent-Field/SWE-AF, examples/agent-comparison/codex

Environment

Generated Node.js CLI todo app benchmark with JSON persistence, tests, and public source folders for each agent run.

Prompt or task

Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.

Source credit

Original work: Agent-Field/SWE-AF.

AgentScope is not the original creator of this benchmark. This page credits the source project and links visitors to the public generated folders and raw code.

What visitors can learn

This result set is useful because it shows how a small prompt still leaves important engineering choices open.

Codex exposes a test-friendly boundary around file storage. Claude Code Sonnet is simpler and easier to read in one file. Claude Code Haiku adds an object-oriented store. Those choices are not just style; they affect test setup, error handling, and how easily a reviewer can reason about state.
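
The error-handling difference can be made concrete with a corrupt data file. The sketch below adapts both mirrored loaders, with assumptions labeled: loadStrict and loadLenient are illustrative names, the strict version omits the array check, and the lenient version takes a filePath parameter here (the Sonnet original reads a fixed TODO_FILE) so that both can run against the same temporary file.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Codex-style: parse failures are reported to the caller.
function loadStrict(filePath) {
  if (!fs.existsSync(filePath)) return [];
  const raw = fs.readFileSync(filePath, 'utf8').trim();
  if (!raw) return [];
  try {
    return JSON.parse(raw);
  } catch (error) {
    throw new Error(`Failed to parse todo data from ${filePath}.`);
  }
}

// Sonnet-style: any read or parse failure becomes an empty list.
// Assumption: filePath is a parameter only for this sketch.
function loadLenient(filePath) {
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch (error) {
    return [];
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'todo-corrupt-'));
const filePath = path.join(dir, 'todos.json');

// Simulate a truncated write: the JSON array is never closed.
fs.writeFileSync(filePath, '[{"id": 1, "task": "Buy milk"');

let strictError = null;
try {
  loadStrict(filePath);
} catch (error) {
  strictError = error;
}

const lenientResult = loadLenient(filePath);

fs.rmSync(dir, { recursive: true, force: true });

console.log(strictError !== null, lenientResult.length); // prints "true 0"
```

With the lenient loader, a later add followed by a save would write a fresh array over the unreadable file. The strict loader stops first, which leaves the reviewer an explicit failure to handle.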