Every client we talk to in late 2025 wants AI-powered search. They assume making their search bar smarter is a weekend project. It is not. We have built semantic search into eleven production applications, and the gap between a demo and a production system is enormous.
The demo version is easy. Chunk your documents, generate embeddings with OpenAI text-embedding-3-small, store them in Pinecone, and run a cosine similarity search. Takes a day to build. Works beautifully with fifty documents. Falls apart with fifty thousand.
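To make the retrieval step of that demo concrete, here is a minimal sketch of brute-force cosine similarity search. It assumes the embeddings already exist; the tiny hand-written vectors and the `top_k` helper are illustrative stand-ins, not the output of text-embedding-3-small or the Pinecone API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Score every stored vector against the query and keep the best k.

    `index` maps a chunk id to its embedding (hypothetical in-memory store).
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings", made up for illustration only.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.0], index, k=2)
```

Scanning every vector like this is fine for fifty documents. At larger sizes a vector store would normally replace the linear scan with an approximate nearest-neighbor index.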
The first problem is relevance. Pure vector similarity returns results that are semantically related to the query but do not answer it. A search for "cancel my subscription" returns articles about billing cycles and account management rather than the cancellation steps. The fix is hybrid search: combine vector similarity with BM25 keyword matching, then pass the merged candidates through a reranker model. We use Cohere rerank at about one cent per thousand queries. The relevance improvement is dramatic.
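One common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The article does not say which fusion method we use, so treat this as a sketch of one reasonable option. The document ids are made up, and the smoothing constant `k=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "cancel my subscription".
vector_ranking = ["billing-cycles", "cancel-subscription", "account-settings"]
bm25_ranking = ["cancel-subscription", "refund-policy", "billing-cycles"]

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
```

In this toy case the cancellation article moves to the top because both retrievers rank it highly. The fused list is what would then go to the reranker, which reorders only a short candidate set instead of the whole corpus.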
The second problem is chunking. Splitting by paragraph loses context. A paragraph saying "see the pricing above" is useless once it is separated from the text it points to. We use hierarchical chunking, where each chunk carries metadata about its parent section and document. This costs more storage but improves relevance by about fifteen percentage points.
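A minimal sketch of what a chunk with parent metadata might look like follows. The `Chunk` fields, the `(heading, body)` input shape, and the choice to prepend the document and section titles to the text before embedding are all assumptions made for illustration, not a description of our exact schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    doc_title: str
    section: str   # heading of the parent section
    position: int  # order of the chunk within the document
    text: str

    def embedding_text(self) -> str:
        """Text sent to the embedding model, prefixed with parent context."""
        return f"{self.doc_title} > {self.section}\n{self.text}"


def chunk_document(doc_id, doc_title, sections):
    """Split each section into paragraph chunks that remember their parents.

    `sections` is a list of (heading, body) pairs; paragraphs in a body
    are separated by blank lines.
    """
    chunks = []
    for heading, body in sections:
        for paragraph in body.split("\n\n"):
            paragraph = paragraph.strip()
            if paragraph:
                chunks.append(
                    Chunk(doc_id, doc_title, heading, len(chunks), paragraph)
                )
    return chunks


# Made-up document for illustration.
chunks = chunk_document(
    "doc-1",
    "Plans",
    [("Pricing", "The Pro plan is billed monthly.\n\nSee the pricing above."),
     ("FAQ", "")],
)
```

With this shape, the "see the pricing above" paragraph still carries `Plans > Pricing` with it, so it can be matched against pricing queries and traced back to its section.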
The third problem is cost at scale. One client had eighty thousand articles, with content changing every day. Re-embedding everything daily would cost forty dollars. We implemented incremental embedding with content hashes, so only articles whose content actually changed are re-embedded, dropping daily costs to two dollars.
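The core of incremental embedding can be sketched in a few lines: hash each article, compare against the hash stored at the last run, and embed only the mismatches. The function names and the in-memory dictionaries below are hypothetical. In practice the stored hashes would live alongside the vectors in the database.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of an article's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_changed(articles, stored_hashes):
    """Return {article_id: new_hash} for articles that need re-embedding.

    `articles` maps id -> current text; `stored_hashes` maps id -> the hash
    recorded the last time that article was embedded. New articles have no
    stored hash, so they are always selected.
    """
    changed = {}
    for article_id, text in articles.items():
        digest = content_hash(text)
        if stored_hashes.get(article_id) != digest:
            changed[article_id] = digest
    return changed


# Made-up example: one edited article, one untouched, one new.
stored = {"a": content_hash("old text"), "b": content_hash("same text")}
current = {"a": "new text", "b": "same text", "c": "brand new article"}

to_embed = select_changed(current, stored)
```

After the selected articles are embedded, their new hashes replace the stored ones. Hashing per chunk instead of per article would cut the work further when only part of a long article changes.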
The fourth problem, the one nobody discusses, is evaluation. We build test datasets of two hundred to five hundred query-result pairs rated by humans and benchmark after every pipeline change. Without this you are flying blind.
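A benchmark harness for this can be small. The sketch below assumes the human ratings have been reduced to a set of relevant document ids per query, and it reports hit rate and mean reciprocal rank at `k`. Those two metrics are our choice for the example and not necessarily the ones a given project should track. The `search` callable and the sample data are hypothetical.

```python
def evaluate(search, labeled_queries, k=5):
    """Score a search function against human-labeled queries.

    `search` takes a query string and returns a ranked list of document ids.
    `labeled_queries` maps each query to the set of ids judged relevant.
    """
    if not labeled_queries:
        raise ValueError("labeled_queries must not be empty")

    hits = 0
    reciprocal_ranks = []
    for query, relevant_ids in labeled_queries.items():
        results = search(query)[:k]
        # Rank (1-based) of the first relevant result, if any.
        rank = next(
            (i for i, doc_id in enumerate(results, start=1)
             if doc_id in relevant_ids),
            None,
        )
        if rank is None:
            reciprocal_ranks.append(0.0)
        else:
            hits += 1
            reciprocal_ranks.append(1.0 / rank)

    total = len(labeled_queries)
    return {"hit_rate": hits / total, "mrr": sum(reciprocal_ranks) / total}


# Made-up labels and a canned search function, for illustration only.
labeled = {"cancel my subscription": {"cancel-subscription"},
           "change billing date": {"billing-cycles"}}
canned = {"cancel my subscription": ["cancel-subscription", "refund-policy"],
          "change billing date": ["account-settings", "billing-cycles"]}

report = evaluate(lambda q: canned[q], labeled, k=5)
```

Running the same labeled set before and after a pipeline change turns "it feels better" into two numbers that can be compared.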
Our production stack: PostgreSQL with pgvector, Cohere for reranking, OpenAI text-embedding-3-small for embeddings. Total cost for one hundred thousand documents serving one thousand queries per day: roughly ninety dollars per month. A fraction of what Algolia costs, with significantly better semantic understanding.
AI search is not plug-and-play. But done right, it transforms content-heavy applications. The investment in proper chunking, hybrid retrieval, and evaluation pays for itself within months.
About the Author
Fordel Studios
AI-native app development for startups and growing teams. 14+ years of experience shipping production software.