Hugging Face

The Reformer - Pushing the limits of language modeling

Read the full article on Hugging Face.

What Happened

Hugging Face published a walkthrough of Google's Reformer, which combines LSH attention and reversible residual layers to make very long sequences tractable for transformers.

Fordel's Take

Google's Reformer replaced quadratic self-attention with LSH attention, cutting complexity from O(n²) to O(n log n). Reversible residual layers eliminated per-layer activation caching, making 1M-token sequences trainable on a single GPU.
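Both ideas can be illustrated in a few lines. Below is a minimal numpy sketch, not the paper's implementation: `lsh_buckets` shows the hashing step behind LSH attention (vectors landing in the same bucket become each other's only attention candidates, avoiding the full O(n²) comparison), and the `rev_forward`/`rev_backward_inputs` pair shows why reversible residual layers need no per-layer activation cache. All names and the toy `F`/`G` functions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_buckets(x, n_planes=4):
    """Hash each row of x with random hyperplanes; rows with the same
    sign pattern share a bucket and would attend only to each other."""
    planes = rng.standard_normal((x.shape[-1], n_planes))
    bits = (x @ planes) > 0                    # (n, n_planes) sign pattern
    return bits @ (1 << np.arange(n_planes))   # integer bucket id per row

# Reversible residual pair: y1 = x1 + F(x2), y2 = x2 + G(y1).
# F and G stand in for attention / feed-forward sublayers.
F = lambda t: np.tanh(t)
G = lambda t: 0.5 * t

def rev_forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_backward_inputs(y1, y2):
    # The inputs are recomputed exactly from the outputs during the
    # backward pass, so activations never need to be stored per layer.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

The inversion is exact by construction, which is what lets memory stay constant in depth: each layer rebuilds its own inputs on the way back instead of caching them.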

Most RAG pipelines still chunk at 512 tokens because teams default to standard transformers, not because short chunks improve retrieval quality. Longformer and BigBird productionized the Reformer's efficient-attention ideas for practical document lengths. Running GPT-4 Turbo at 128K context for full-document retrieval costs roughly 40x more per document than an efficient-attention model built for the job. Chunking is a workaround, not an architecture decision.

What To Do

Use Longformer or BigBird for full-document classification instead of chunking into GPT-4: efficient-attention models handle 4K–16K-token documents at a fraction of the per-token cost.
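What makes these models cheap is their sparse attention pattern: each token attends only to a local window, plus a handful of global tokens (like a CLS token) that see everything. Here is a minimal numpy sketch of that pattern, assuming a boolean mask convention where `True` means "may attend"; the function name and defaults are illustrative, not the Longformer library API.

```python
import numpy as np

def sparse_attention_mask(seq_len, window=2, global_idx=(0,)):
    """Longformer-style pattern: local sliding window plus global tokens.

    True entries in mask[i, j] mean token i may attend to token j.
    Total allowed pairs grow as O(seq_len * window), not O(seq_len**2).
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True          # local window around token i
    for g in global_idx:
        mask[g, :] = True              # global token attends everywhere
        mask[:, g] = True              # everyone attends to the global token
    return mask
```

For a 4K-token document this mask stays near-linear in allowed pairs, which is the source of the per-token cost gap versus a dense 128K-context model.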

