Building AI-Powered Search That Actually Works in Production

Every client we talk to in late 2025 wants AI-powered search. They assume making their search bar smarter is a weekend project. It is not. We have built semantic search into eleven production applications, and the gap between a demo and a production system is enormous.

The demo version is easy. Chunk your documents, generate embeddings with OpenAI text-embedding-3-small, store them in Pinecone, and do cosine similarity search. Takes a day to build. Works beautifully with fifty documents. Falls apart with fifty thousand.
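The demo pipeline described above can be sketched in a few lines. This is a minimal illustration, not our production code: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model call, and the in-memory list stands in for a vector database. The function names are ours, chosen for the example.

```python
import math
from collections import Counter


def chunk_by_paragraph(text):
    """Naive chunking: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: sparse word counts.

    A real system would call an embedding API here and get a dense vector.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[token] for token, weight in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, k=3):
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    doc = (
        "Billing runs monthly on the first.\n\n"
        "To cancel your subscription open account settings.\n\n"
        "Contact support for refunds."
    )
    for score, chunk in search("cancel subscription", chunk_by_paragraph(doc), k=2):
        print(round(score, 3), chunk)
```

Everything here scans every chunk on every query, which is exactly why it feels fine at fifty documents and not at fifty thousand.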

The first problem is relevance. Pure vector similarity returns results that are semantically related but do not answer the query. A search for "cancel my subscription" returns results about billing cycles and account management instead of the cancellation steps. The fix is hybrid search: combine vector similarity with BM25 keyword matching, then pass the merged candidates through a reranker model. We use Cohere Rerank, which lists at about two dollars per thousand searches. The relevance improvement is dramatic.
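One common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF as the fusion step; that choice, the constant `k=60`, and the document ids are illustrative assumptions rather than a description of our exact pipeline. The fused list is what you would then hand to a reranker.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with ranks starting at 1. Ties are broken by document id so the
    output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical results for the query "cancel my subscription".
    vector_hits = ["billing-cycles", "cancel-subscription", "account-settings"]
    bm25_hits = ["cancel-subscription", "refund-policy", "billing-cycles"]
    candidates = reciprocal_rank_fusion([vector_hits, bm25_hits])
    print(candidates)  # pass the top candidates to the reranker
```

RRF only looks at ranks, not raw scores, so it avoids having to normalize cosine similarities against BM25 scores, which live on very different scales.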

The second problem is chunking. Splitting by paragraph loses context. A paragraph saying "see the pricing above" is useless alone. We use hierarchical chunking where each chunk carries metadata about its parent section and document. Costs more storage but improves relevance by about fifteen percentage points.
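A minimal sketch of the idea, assuming documents arrive already split into titled sections. The `Chunk` class, its field names, and the `title > section` prefix format are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str   # parent document
    section: str     # parent section heading
    position: int    # paragraph index within the section
    text: str        # the paragraph itself

    def embedding_text(self):
        """Text sent to the embedding model: parent context plus paragraph."""
        return f"{self.doc_title} > {self.section}\n{self.text}"


def chunk_document(doc_title, sections):
    """Split (heading, body) sections into paragraph chunks that keep
    a pointer back to their parent section and document."""
    chunks = []
    for heading, body in sections:
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        for position, paragraph in enumerate(paragraphs):
            chunks.append(Chunk(doc_title, heading, position, paragraph))
    return chunks


if __name__ == "__main__":
    sections = [
        ("Pricing", "The Pro plan is billed monthly.\n\nSee the pricing above for annual discounts."),
        ("Cancellation", "You can cancel from account settings."),
    ]
    for chunk in chunk_document("Billing FAQ", sections):
        print(repr(chunk.embedding_text()))
```

With the prefix in place, a paragraph like "see the pricing above" is embedded alongside the words "Billing FAQ" and "Pricing", and the stored metadata lets you fetch sibling paragraphs at query time.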

The third problem is cost at scale. One client had eighty thousand articles, some of which changed every day. Re-embedding everything daily would cost forty dollars a day. We implemented incremental embedding keyed on content hashes, so only articles whose content actually changed are re-embedded, dropping daily costs to two dollars.
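The hash check itself is small. The sketch below assumes articles are available as an id-to-text mapping and that the previous run's hashes are stored somewhere durable; here both are plain dictionaries, and `plan_reembedding` is a hypothetical name for the example.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of an article's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembedding(articles, stored_hashes):
    """Compare current articles against the hashes from the last run.

    Returns (changed, removed):
      changed: {article_id: new_hash} for new or modified articles
      removed: ids that were embedded before but no longer exist
    The caller should persist the new hashes only after the embeddings
    have been written successfully.
    """
    changed = {}
    for article_id, text in articles.items():
        new_hash = content_hash(text)
        if stored_hashes.get(article_id) != new_hash:
            changed[article_id] = new_hash
    removed = [aid for aid in stored_hashes if aid not in articles]
    return changed, removed


if __name__ == "__main__":
    stored = {"a1": content_hash("old text"), "a2": content_hash("same text")}
    current = {"a1": "new text", "a2": "same text", "a3": "brand new"}
    changed, removed = plan_reembedding(current, stored)
    print(sorted(changed), removed)
```

In practice you may want to hash a normalized form of the text (whitespace collapsed, markup stripped) so that cosmetic edits do not trigger a re-embed.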

The fourth problem, the one nobody discusses, is evaluation. We build test datasets of two hundred to five hundred query-result pairs rated by humans and rerun the benchmark after every pipeline change. Without this you are flying blind.
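A benchmark harness along these lines can be very small. The sketch below assumes the labeled set is a list of (query, set of relevant document ids) pairs and that the pipeline is exposed as a function returning ranked ids. Recall@k and mean reciprocal rank are our illustrative metric choices here; the names `evaluate` and `search_fn` are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(search_fn, labeled_queries, k=5):
    """Run every labeled query through the pipeline and average the metrics."""
    if not labeled_queries:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls, ranks = [], []
    for query, relevant in labeled_queries:
        retrieved = search_fn(query)
        recalls.append(recall_at_k(retrieved, relevant, k))
        ranks.append(reciprocal_rank(retrieved, relevant))
    n = len(labeled_queries)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(ranks) / n}


if __name__ == "__main__":
    # Hypothetical canned results standing in for a real search pipeline.
    canned = {"cancel": ["x", "a"], "refund": ["b", "y", "z"]}
    labeled = [("cancel", {"a"}), ("refund", {"b", "c"})]
    print(evaluate(lambda q: canned[q], labeled, k=2))
```

Storing the metrics from each run next to the commit that produced them makes regressions visible as soon as a chunking or retrieval change lands.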

Our production stack: PostgreSQL with pgvector, Cohere for reranking, OpenAI text-embedding-3-small for embeddings. Total cost for one hundred thousand documents serving one thousand queries per day: roughly ninety dollars per month. A fraction of what Algolia costs, with significantly better semantic understanding.

AI search is not plug-and-play. But done right, it transforms content-heavy applications. The investment in proper chunking, hybrid retrieval, and evaluation pays for itself within months.
