Optimum+ONNX Runtime - Easier, Faster training for your Hugging Face models
What Happened
Fordel's Take
Hugging Face's Optimum library now surfaces ONNX Runtime as a first-class training backend, letting you swap `Trainer` for `ORTTrainer` (and `TrainingArguments` for `ORTTrainingArguments`) with a one-line import change. No graph rewrites, no custom kernels: under the hood it exports your model to ONNX during training and hands execution to ONNX Runtime's fused ops.
On A100s, ORTTrainer cuts BERT fine-tuning time by roughly 35% with zero accuracy delta. Most teams still default to vanilla PyTorch for fine-tuning because it's familiar, and that habit leaves real GPU-hours on the table. It matters most if you're fine-tuning on spot instances, where wall-clock time maps directly to cost.
Teams running weekly fine-tune cycles on models like DistilBERT or RoBERTa should pilot ORTTrainer immediately. If you're doing one-off fine-tunes on a fixed budget, skip it — setup overhead isn't worth it.
What To Do
Use `ORTTrainer` instead of `Trainer` for recurring fine-tune jobs because the 30-35% speedup compounds into real cost reduction on spot GPU billing.
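To see how the speedup compounds, a back-of-envelope cost model; every number here (hours per run, weekly cadence, spot rate) is an illustrative assumption, not a figure from the article:

```python
# Back-of-envelope annual cost model; all inputs are illustrative assumptions.
def annual_gpu_cost(hours_per_run: float, runs_per_year: int,
                    spot_rate_per_hour: float) -> float:
    """Total yearly GPU spend for a recurring fine-tune job."""
    return hours_per_run * runs_per_year * spot_rate_per_hour

# Assumed: 4 GPU-hours per fine-tune, weekly cadence, $1.10/hr spot rate.
baseline = annual_gpu_cost(hours_per_run=4.0, runs_per_year=52,
                           spot_rate_per_hour=1.10)
# Same job at ~35% less wall-clock time with ORTTrainer.
with_ort = annual_gpu_cost(hours_per_run=4.0 * 0.65, runs_per_year=52,
                           spot_rate_per_hour=1.10)

print(f"baseline: ${baseline:.2f}/yr")   # $228.80/yr
print(f"with ORT: ${with_ort:.2f}/yr")   # $148.72/yr
print(f"saved:    ${baseline - with_ort:.2f}/yr")  # $80.08/yr per GPU
```

That's per single GPU on a small model; scale the inputs to your fleet and model size to estimate your own break-even against the one-time setup cost.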