How to generate text: using different decoding methods for language generation with Transformers
What Happened
Hugging Face published "How to generate text", a guide to the different decoding methods for language generation available in the Transformers library.
Fordel's Take
HuggingFace Transformers exposes five decoding strategies — greedy, beam search, top-k, top-p, and contrastive search — each producing different output distributions from identical model weights. The default varies by framework and wrapper.
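To make the difference concrete, here is a minimal pure-Python sketch of how greedy, top-k, and nucleus (top-p) selection behave over the same logits. The toy logits and helper names are illustrative, not the Transformers implementation; top-k and top-p return candidate sets from which a token would then be sampled.

```python
import math

def greedy(logits):
    """Greedy decoding step: always pick the single highest-logit token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_candidates(logits, k=2):
    """Top-k filtering: keep only the k highest-logit tokens."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]

def top_p_candidates(logits, p=0.9):
    """Nucleus (top-p) filtering: keep the smallest set of tokens whose
    cumulative probability reaches p; sampling then happens within it."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return kept

logits = [2.0, 1.0, 0.5, -1.0]  # toy next-token scores for a 4-token vocab
greedy(logits)                   # index 0: the argmax, every time
top_k_candidates(logits, k=2)    # [0, 1]: fixed-size candidate set
top_p_candidates(logits, p=0.9)  # adaptive set covering 90% of the mass
```

Same weights, same logits; only the selection rule changes, which is why the output distributions diverge.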
Most RAG pipelines ship with whatever the SDK default is. Beam search with width 5 costs roughly 5x the inference compute of greedy decoding, for marginal coherence gains on retrieval tasks. Defaulting to beam search because it sounds rigorous is cargo-cult engineering.
Agent builders doing structured JSON extraction: use greedy decoding (temperature=0). Summarization or copy tasks: top-p=0.9. Switch the decoding config before switching models; it's the cheaper and faster experiment.
What To Do
Use greedy decoding (temperature=0) for structured agent outputs instead of beam search: beam search multiplies inference cost with no coherence benefit on constrained JSON tasks.