Explainer · 5 min read

Context Windows Explained for People Who Don't Write Code

Every AI product you use has an invisible limit that decides how smart it feels. It is called the context window, and understanding it will change how you evaluate AI tools, write prompts, and make product decisions.

By Abhishek Sharma · Head of Engineering @ Fordel Studios

You have probably had this experience: you are three messages into a conversation with ChatGPT or Claude, and suddenly it forgets something you told it two minutes ago. You repeat yourself. It apologises. You wonder if the tool is broken.

It is not broken. You hit the context window.

What Is a Context Window?

Imagine you are in a meeting with a brilliant consultant. This consultant can process information faster than any human, but there is a catch: they can only look at a fixed number of pages at once. If your meeting notes exceed that page count, the consultant literally cannot see the earliest pages anymore.

That is a context window. It is measured in tokens, each roughly three-quarters of a word. When someone says a model has a "128K context window," they mean it can hold about 96,000 words in its working memory at one time. That sounds like a lot until you paste in a 40-page contract and a product spec, then start asking follow-up questions.

  • 128K tokens: GPT-4o's context window. About 96,000 words, or roughly 200 pages of text.
  • 1M tokens: Claude Opus 4's context window. About 750,000 words, an entire codebase or a shelf of documents.

The critical thing to understand: everything counts against this limit. Your question, the AI's response, any documents you uploaded, the system instructions your engineering team wrote behind the scenes — all of it eats into the same fixed budget.
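To make that budget concrete, here is a back-of-the-envelope sketch in Python. The conversion ratio is the rough rule of thumb from above (one token is about three-quarters of a word), not real tokenizer output, and the document sizes are invented for illustration:

```python
# Rough token budget for a single request, using the rule of thumb
# that one token is about three-quarters of a word (approximate only;
# a real tokenizer would give exact counts).
WORDS_PER_TOKEN = 0.75

def words_to_tokens(words: int) -> int:
    """Approximate token count for a given word count."""
    return round(words / WORDS_PER_TOKEN)

window = 128_000                           # a 128K context window
system_prompt = words_to_tokens(1_500)     # hidden instructions from your team
contract = words_to_tokens(20_000)         # a 40-page contract
spec = words_to_tokens(5_000)              # a product spec
question = words_to_tokens(100)            # your actual question

used = system_prompt + contract + spec + question
remaining = window - used
print(f"used: {used} tokens, remaining: {remaining} tokens")
```

Notice that your 100-word question is a rounding error next to the documents: the budget is mostly consumed before the conversation even starts.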

···

Why Does This Matter for Your Product Decisions?

Context windows are not just a technical detail. They directly shape what your AI product can and cannot do.

If you are building a customer support chatbot, the context window determines how long a conversation can go before the bot starts losing track of what the customer said. If you are building a document analysis tool, it determines whether you can feed in the full contract or need to chop it into pieces and hope nothing falls through the cracks.

Here is where it gets expensive: larger context windows cost more money per API call. Sending 100,000 tokens through Claude costs roughly 10 to 30 times more than sending 10,000 tokens. Every product decision about "how much context do we include" is simultaneously a budget decision.

Every product decision about how much context to include is simultaneously a budget decision.
Abhishek Sharma
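The arithmetic behind that budget point is worth seeing once. A sketch using an illustrative input price of $3 per million tokens (real prices vary by model and change often; the 10-to-30x range above also reflects output tokens and longer responses, which this ignores):

```python
# Illustrative pricing only: input tokens at $3 per million, a plausible
# order of magnitude for frontier models. Check your provider's actual
# pricing page before budgeting.
PRICE_PER_MILLION_INPUT = 3.00

def input_cost(tokens: int) -> float:
    """Dollar cost of the input side of one API call."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT

small = input_cost(10_000)    # a lean, retrieval-trimmed request
large = input_cost(100_000)   # stuffing whole documents in

print(f"10K tokens:  ${small:.4f} per call")
print(f"100K tokens: ${large:.4f} per call")
print(f"daily gap at 10,000 calls: ${(large - small) * 10_000:,.2f}")
```

Fractions of a cent per call look harmless until you multiply by call volume; that last line is where the budget surprise lives.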

This is why your engineering team keeps talking about something called RAG, or Retrieval-Augmented Generation. Instead of stuffing everything into the context window, RAG fetches only the relevant pieces from a database and sends just those to the model. It is like giving the consultant a research assistant who pulls the right three pages instead of dumping the full filing cabinet on the desk.
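To show the shape of that idea, here is a toy retriever in Python. It scores chunks by simple word overlap with the question; a real RAG system would use embeddings and a vector database, and the contract snippets here are made up:

```python
# Minimal RAG-style retrieval: score stored chunks against the
# question and send only the top match to the model, instead of
# stuffing every document into the context window.
def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, strip trailing punctuation."""
    return {w.strip(".,?!") for w in text.lower().split()}

def score(question: str, chunk: str) -> int:
    """Count words the question and chunk have in common."""
    return len(tokenize(question) & tokenize(chunk))

documents = [
    "Termination requires 90 days written notice by either party.",
    "The logo must appear in the top-left corner of every page.",
    "Payment is due within 30 days of invoice receipt.",
]

question = "What is the notice period for termination?"

# Keep the best-matching chunk, not the whole filing cabinet.
top_chunks = sorted(documents, key=lambda c: score(question, c), reverse=True)[:1]
prompt = f"Context: {top_chunks[0]}\n\nQuestion: {question}"
print(prompt)
```

The model now sees one relevant sentence instead of three documents, which is the whole trick: spend the context budget on what the question actually needs.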

···

Why Does the AI Forget Things Mid-Conversation?

When your conversation exceeds the context window, the AI does not gracefully summarise what it lost. It simply cannot see it anymore. The earliest messages vanish from its awareness like the beginning of a scroll that rolled off the table.

This creates a pattern that confuses most non-technical users: the AI seems smart for the first few exchanges, then gradually gets worse. It is not getting tired or distracted. It is running out of room.

Some products handle this by silently summarising older messages and keeping only the summary. Others just truncate from the beginning. A few premium features, like Claude's Projects or ChatGPT's memory, try to persist key facts across sessions. But none of these are magic — they are all engineering trade-offs your team is making behind the scenes.

Why Your AI Chatbot Gets Worse Over Time
  • Every message (yours and the AI's) adds to the running total
  • System prompts and uploaded documents consume tokens before you type anything
  • When the window fills up, the oldest messages are silently dropped or summarised
  • The AI has no awareness that it lost information — it does not know what it does not know
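The drop-the-oldest strategy in the list above can be sketched in a few lines of Python. The count_tokens helper is a crude stand-in (a real system would use the model's own tokenizer), and the budget is artificially small so the truncation is visible:

```python
# Sliding-window truncation: when the conversation exceeds the budget,
# drop the oldest user/assistant turns but always keep the system prompt.
def count_tokens(message: dict) -> int:
    # Crude stand-in: roughly one token per four characters.
    return max(1, len(message["content"]) // 4)

def fit_to_window(messages: list[dict], budget: int) -> list[dict]:
    system, turns = messages[0], messages[1:]
    kept = list(turns)
    while kept and count_tokens(system) + sum(map(count_tokens, kept)) > budget:
        kept.pop(0)  # the oldest turn silently vanishes
    return [system] + kept

history = [
    {"role": "system", "content": "You are a support agent for Acme."},
    {"role": "user", "content": "My order #123 arrived damaged."},
    {"role": "assistant", "content": "Sorry to hear that! Can you share a photo?"},
    {"role": "user", "content": "Sure, just sent it. What happens next?"},
]

trimmed = fit_to_window(history, budget=25)
```

Run this and the damaged-order message is gone: the model would now answer "What happens next?" with no idea what "it" refers to, which is exactly the forgetting behaviour users notice.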
···

How Do Different Models Compare?

Context window sizes vary dramatically across models, and the differences matter more than most benchmark scores your team shows you.

As of April 2026, GPT-4o offers 128K tokens. Claude Opus 4 offers up to 1 million tokens. Google's Gemini 2.5 Pro offers 1 million tokens as well. Open-source models like Llama 4 vary between 128K and 10 million depending on the version.

But raw numbers are misleading. A model with a 1 million token window does not necessarily use all of it well. Research consistently shows that most models recall information near the beginning and end of their context best, with a dip in the middle where recall drops. This is called the "lost in the middle" problem, and it means a 1M window is not simply eight times better than a 128K window.
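Teams measure this with "needle in a haystack" probes: plant one invented fact at different depths in filler text and check whether the model can retrieve it. A sketch that only builds the probes (the needle, filler, and question are all made up; wiring in a real API call is left out):

```python
# Build "needle in a haystack" probes: plant one fact at varying depths
# in filler text, then (with a real model API) ask for it back and score
# the answers. Here we only construct the prompts.
NEEDLE = "The launch code for Project Kestrel is 7-4-1-9."
FILLER = "The quarterly report was filed on time. " * 500  # ~4,000 words of noise

def build_probe(depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + NEEDLE + " " + FILLER[cut:]
    return haystack + "\nQuestion: What is the launch code for Project Kestrel?"

# Probe the start, middle, and end of the context.
probes = {depth: build_probe(depth) for depth in (0.0, 0.5, 1.0)}
# Feed each probe to your model and check whether "7-4-1-9" comes back;
# the middle (0.5) placement is where recall typically dips.
```

If your product depends on long documents, running a probe like this against your own model and context lengths is cheap insurance, which is exactly what the fifth question below asks your team to do.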

···

What Should You Ask Your Engineering Team?

Now that you understand what context windows are, here are the questions that will make you a better product leader in AI conversations.

Five Questions for Your Next AI Product Review
  • What is our effective context budget after system prompts and instructions are loaded?
  • Are we using RAG or stuffing everything into the context window, and what are the trade-offs?
  • How do we handle conversations that exceed the context limit — truncation, summarisation, or something else?
  • What is our cost per conversation at current context usage, and how does that scale?
  • Have we tested recall accuracy on information in the middle of long contexts, not just the beginning and end?

These questions do three things: they show your engineers you understand the real constraints, they surface cost risks before they become budget surprises, and they push toward better architecture decisions early.

···

Is This Going to Get Better?

Yes, but not as fast as the marketing suggests. Context windows have grown roughly 8x in the past 18 months, from 32K to 256K as a common baseline. Research teams at Anthropic, Google, and several open-source labs are actively working on more efficient attention mechanisms that could push practical limits much further.

But the fundamental trade-off — more context means more cost and more potential for the model to lose focus — is not going away. Even if context windows reach 10 million tokens, your product will still need a strategy for what goes in and what stays out. The consultant gets a bigger desk, but they still need someone organising the papers.

The consultant gets a bigger desk, but they still need someone organising the papers.

The best AI products in 2026 are not the ones with the biggest context windows. They are the ones with the smartest strategy for what goes into the window and when. That is a product decision, not an engineering one. And now you have the vocabulary to make it.
