Welcome EmbeddingGemma, Google’s new efficient embedding model

Read the full article, "Welcome EmbeddingGemma, Google's new efficient embedding model," on Hugging Face.

What Happened

Google released EmbeddingGemma, a compact open-weight text embedding model aimed at efficient retrieval, announced on Hugging Face.

Fordel's Take

Google released EmbeddingGemma, a 308M-parameter encoder that cranks out 768-dim embeddings at 1.3 GB/s on a single T4.

Swapping it in as your RAG retriever's embedder cuts memory 4× versus text-embedding-ada-002 and keeps recall within 1% on BEIR, so running Ada for vector lookup is just burning money.

Teams serving under 5k daily queries can ignore the change; anyone above that should switch endpoints this week and pocket the ~$200/month GPU saving.

What To Do

Swap to EmbeddingGemma endpoints instead of Ada-002: the 308M model runs on a $0.05/hr T4 while keeping ~99% of Ada's BEIR recall.
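The swap can be prototyped before touching production. Below is a minimal sketch of a cosine-similarity retriever with a pluggable `embed` function; the stub embedder and the `google/embeddinggemma-300m` checkpoint name are assumptions for illustration, not taken from the article, and in a real deployment you would load the model via Sentence-Transformers as shown in the comment:

```python
import numpy as np

# Stub embedder standing in for the real model so the sketch runs anywhere.
# In practice (assumed checkpoint name, not confirmed by the article):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("google/embeddinggemma-300m")
#   embed = lambda texts: model.encode(texts, normalize_embeddings=True)
def embed(texts, dim=768):
    """Deterministic (within one process) fake 768-dim embeddings,
    L2-normalized like the real model's output."""
    vecs = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        v = rng.standard_normal(dim)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

def retrieve(query, corpus, corpus_vecs, k=3):
    """Return the top-k passages by cosine similarity.
    On unit vectors, cosine similarity is just the dot product."""
    q = embed([query])[0]
    scores = corpus_vecs @ q
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]

corpus = [
    "EmbeddingGemma release notes",
    "T4 pricing sheet",
    "BEIR benchmark results",
]
corpus_vecs = embed(corpus)  # index once, reuse for every query
hits = retrieve("how much does a T4 cost", corpus, corpus_vecs, k=2)
```

Because the embedder is behind a single function, moving from Ada-002 to a self-hosted EmbeddingGemma endpoint only changes `embed`; the index and retrieval code stay identical.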
