TechCrunch

Anthropic has to keep revising its technical interview test as Claude improves

Read the full article on TechCrunch

What Happened

The issue of AI cheating is already wreaking havoc at schools and universities around the world, so it's ironic that AI labs now have to deal with it too. But Anthropic is uniquely well-equipped to handle the problem.

Our Take

Anthropic can't keep a consistent technical interview baseline because its own model keeps acing it. That's not a problem to fix by making the test harder; it's a signal that Claude is already at a level where it handles mid-level engineering questions.

The irony of 'AI cheating in school' hitting the lab that built the cheating tool isn't lost on anyone. But what's actually interesting is what this says about evaluation velocity: they're reshuffling benchmarks monthly.

That's the real story here.

What To Do

If you're building LLM hiring tools, assume the baseline shifts faster than you think.
