A new AI benchmark tests whether chatbots protect human well-being
What Happened
Most AI benchmarks measure intelligence and instruction-following rather than psychological safety. Humane Bench evaluates models on core principles of human flourishing: prioritizing user well-being and respecting user attention.
Our Take
Here's the thing: measuring intelligence is hard enough. Benchmarking "human flourishing" is basically impossible. Every person's definition of well-being is different; what stresses you out might energize someone else. Humane Bench wants to catch models that manipulate users or drain their attention, which is good in theory. In practice, it will probably end up measuring something vague and politically contentious instead.
The problem is real, though. Most benchmarks measure instruction-following, not whether a model makes your life better or worse. Just don't expect Humane Bench to solve that; it will mostly make the problem look more measurable than it is.
What To Do
Read the actual methodology before trusting the scores; that's where the real biases hide.
