Anthropic has to keep revising its technical interview test as Claude improves
What Happened
AI cheating is already wreaking havoc at schools and universities around the world, so there's some irony in AI labs now facing it themselves: Anthropic has had to keep revising its technical interview test because Claude keeps passing it. The company is, at least, uniquely well-equipped to handle the problem.
Our Take
Anthropic can't keep a consistent technical interview baseline because its own model keeps acing it. That's not a problem to fix by making the test harder; it's a signal that Claude is already operating at a level where it handles mid-level engineering questions.
The irony of 'AI cheating in school' hitting the lab that built the tool is hard to miss. But the more interesting signal is evaluation velocity: Anthropic is reshuffling its benchmarks monthly.
That's the real story here.
What To Do
If you're building LLM-based hiring tools, assume your difficulty baseline shifts faster than you expect, and re-calibrate against each new model release rather than on a fixed schedule. One way to automate that check is sketched below.
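Here is a minimal sketch of what such a drift check could look like: periodically re-run your interview question bank against the current model and flag when the pass rate climbs high enough that the test no longer discriminates between candidates and the model. Everything here is hypothetical (`Question`, `grade_fn`, `check_baseline_drift` are illustrative names, not any real API); plug in your own question bank and grading call.

```python
# Hypothetical baseline-drift check for an LLM-assisted hiring pipeline.
# Swap in your own question bank and a grade_fn that queries the latest model.

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Question:
    prompt: str
    reference_answer: str


def check_baseline_drift(
    questions: Sequence[Question],
    grade_fn: Callable[[Question], bool],  # True if the current model passes
    max_pass_rate: float = 0.5,  # above this, the test no longer discriminates
) -> bool:
    """Re-run the interview question bank against the current model.

    Returns True when the model's pass rate exceeds max_pass_rate,
    i.e. the question set needs revising before the next hiring round.
    """
    passes = sum(grade_fn(q) for q in questions)
    return passes / len(questions) > max_pass_rate


# Usage sketch: run this against every new model release, not just yearly.
# if check_baseline_drift(question_bank, grade_with_latest_model):
#     rotate_in_fresh_questions()
```

The design choice worth stressing: trigger the check on model releases, not calendar dates, since that's the clock the baseline actually moves on.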