Latest Changes

Short notes on product, model, and workflow changes that are worth revisiting.

Product: Workflow changes can alter practical usability without any dramatic model benchmark movement.

Why workflow notes should trigger retests before headline takes

Product-level workflow changes can alter real usefulness even when the underlying model story appears mostly unchanged.

Codex, Claude Code
April 14, 2026
Model: Big context claims need to survive real repo tasks, not just marketing language or isolated prompt demos.

Context claims only matter when they survive repo-scale tasks

Long-context positioning is useful only if the system maintains structure, scope control, and reviewer trust on actual engineering work.

GPT models, Claude models
April 10, 2026
Methodology: Mixing incomparable layers makes benchmark moves look clearer than they really are and hides the source of improvements or regressions.

Three scoreboards are a feature, not a reporting inconvenience

A serious publication should not merge product quality, model quality, and workflow outcome quality into one synthetic score.

Codex, Claude Code, GPT models, Claude models
April 5, 2026