Crescendo AI

OpenAI launches GPT-5.4 with 1M-token context window

Read the full article: OpenAI Launches GPT-5.4 on Crescendo AI

What Happened

OpenAI released GPT-5.4 on March 5, 2026, featuring a 1-million-token context window and native multi-step autonomous workflow execution. The model scored 75% on the OSWorld-V desktop task benchmark, surpassing average human performance. The release brings GPT to parity with Claude on context-window size and marks a significant step for autonomous agent reliability.

Our Take

Honestly, the 1M context isn't the story here — Claude's had that for a while and we barely use the full window anyway. The 75% on OSWorld-V is the number that actually matters.

That benchmark isn't made up of toy problems. It's navigating real desktop software: clicking through apps, filling forms, dealing with state. To us, 75% means reliable enough to actually deploy, not just demo.

Here's the thing about the workflow execution piece: every major lab is shipping this now, and the differentiation is going to be in the failure modes, not the success cases. How does it recover? What does it do when a UI changes?

The context window being at parity means the Claude vs GPT decision just got harder again. We've been defaulting to Claude for long-document work, and that edge is gone.

For us, this means autonomous QA testing and Playwright-driven workflows are back on the table using GPT. We shelved them six months ago because the reliability wasn't there.

What To Do

Run your existing Playwright or desktop automation test suite against GPT-5.4's function-calling API and benchmark the pass rate against your current setup. The OSWorld-V jump suggests real-world reliability may have crossed a threshold.
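The pass-rate comparison suggested above can be sketched roughly as follows. This is a minimal illustration, not a prescribed harness: it assumes you have already run the same suite twice and collected each run as a mapping of test name to pass/fail. The names `pass_rate`, `compare_runs`, and `Comparison`, along with the sample test names, are hypothetical, and nothing here calls a model API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Comparison:
    baseline_rate: float
    candidate_rate: float
    regressions: list[str]  # passed on the baseline, failed on the candidate
    fixes: list[str]        # failed on the baseline, passed on the candidate


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of tests that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> Comparison:
    """Compare two runs, restricted to tests that appear in both."""
    shared = sorted(baseline.keys() & candidate.keys())
    base = {name: baseline[name] for name in shared}
    cand = {name: candidate[name] for name in shared}
    return Comparison(
        baseline_rate=pass_rate(base),
        candidate_rate=pass_rate(cand),
        regressions=[n for n in shared if base[n] and not cand[n]],
        fixes=[n for n in shared if not base[n] and cand[n]],
    )


if __name__ == "__main__":
    # Hypothetical results; replace with your own suite's output.
    current_setup = {"login": True, "checkout": False, "search": True, "export": False}
    gpt_run = {"login": True, "checkout": True, "search": False, "export": True}

    report = compare_runs(current_setup, gpt_run)
    print(f"baseline:  {report.baseline_rate:.0%}")
    print(f"candidate: {report.candidate_rate:.0%}")
    print("regressions:", report.regressions)
    print("fixes:", report.fixes)
```

The regression list is arguably the more useful output than the headline rate: it points at the tests where the new setup fails on something the old one handled, which is where the failure-mode questions get answered.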
