$ tail -f stew/log

sonnet 5, the hedgehog test

Anthropic shipped four models in a month. the pricing page couldn't tell me which one to use, so I asked them to draw a hedgehog instead.

four models in about a month, Opus 4.8, Fable 5, and Sonnet 5 among them. the pitch on Sonnet 5 looks clean on paper: close to Opus on benchmarks, forty percent cheaper. except it ships with a new tokenizer that turns the same text into roughly 30% more tokens, and up to 42% more for plain English. same rate per mile, longer route.
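to put rough numbers on that, here's a back-of-envelope sketch. it assumes the "forty percent cheaper" figure is a per-token price, and that the model you're comparing against still uses the older tokenizer; both are my assumptions, not something the pricing page spells out. the function name is made up for illustration.

```python
def effective_cost_ratio(price_ratio: float, token_inflation: float) -> float:
    """Bill for the same text, relative to the baseline model.

    price_ratio: per-token price relative to baseline (0.60 = forty percent cheaper)
    token_inflation: fractional increase in token count (0.30 = 30% more tokens)
    """
    return price_ratio * (1.0 + token_inflation)


PRICE_RATIO = 0.60  # "forty percent cheaper", assumed to be per token

# The two inflation figures quoted above: typical text and plain English.
for label, inflation in [("typical", 0.30), ("plain English", 0.42)]:
    ratio = effective_cost_ratio(PRICE_RATIO, inflation)
    print(f"{label}: {ratio:.3f}x the baseline bill, {1 - ratio:.1%} saved")
```

under those assumptions the discount shrinks from forty percent to roughly 22% on typical text and about 15% on plain English. it also ignores effort level, which changes how many tokens the model spends in the first place.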

that's not a number you can feel. so I ran a dumber, more honest test: I asked every model (Sonnet 5, Opus 4.8, Fable 5, and old Sonnet 4.6 as a baseline) to draw an SVG hedgehog playing a violin, at a few effort levels each. borrowed the idea from Simon Willison's pelican-on-a-bicycle benchmark. a hedgehog playing a violin is a genuinely hard scene to reason about: bow against strings, four limbs, spines everywhere. and unlike a coding benchmark, you can just look at the result and know if it's lying to you.
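the shape of a test like this is small enough to sketch. this is a minimal, hypothetical harness, not the script behind these results: `ask_model` is a stand-in for whatever API client you use, the prompt wording is a guess, and the grid below is illustrative. the only automated check is whether the output parses as SVG at all; the judging still happens by eye.

```python
import xml.etree.ElementTree as ET
from itertools import product

PROMPT = "Draw an SVG of a hedgehog playing a violin."  # wording is a guess


def is_plausible_svg(text: str) -> bool:
    """True if text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Namespaced roots look like "{http://www.w3.org/2000/svg}svg".
    return root.tag.rsplit("}", 1)[-1] == "svg"


def run_grid(ask_model, models, efforts):
    """Call ask_model(model, effort, prompt) for every pair and keep the output."""
    results = {}
    for model, effort in product(models, efforts):
        svg = ask_model(model, effort, PROMPT)
        results[(model, effort)] = (svg, is_plausible_svg(svg))
    return results


# Stand-in so the sketch runs offline; a real ask_model would call an API.
def fake_model(model, effort, prompt):
    return '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'


results = run_grid(fake_model, ["Sonnet 5", "Sonnet 4.6"], ["medium", "high"])
for (model, effort), (_, ok) in results.items():
    print(model, effort, "parses" if ok else "broken")
```

a drawing that parses can still be a terrible hedgehog, which is the whole point of looking at it.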

nine hedgehogs, ranked worst to best, judged and scored: the full comparison →

the short version: Sonnet 5 on medium effort came dead last. Sonnet 5 on high effort came third. same model, same prompt — the entire pricing story, drawn in hedgehogs.

more on this — and the actual verdict — soon. for now, the receipts are live.