
Best Speech-to-Text Models for Linux in 2026

Parakeet TDT V3, Whisper, Cohere Transcribe — a clear breakdown of which model fits your hardware and workflow.

The model shapes everything

Latency, accuracy, GPU requirements, language support, privacy — all of it flows from your model choice. The good news: in 2026, every tier is genuinely excellent. The question is matching the model to your hardware and workflow.

For an objective view of where models stand, the Open ASR Leaderboard on Hugging Face tracks Word Error Rate (WER) across standardised benchmarks. It's the most reliable public signal for comparing transcription accuracy across models.

Cohere Transcribe

Best for: maximum accuracy, long-form content, technical vocabulary.

Cohere Transcribe currently sits at #1 on the Open ASR Leaderboard — 5.42 average WER versus 7.44 for Whisper large-v3. For developers dictating technical language, API names, and command-line flags, that gap is meaningful.

Cohere Transcribe benchmark results vs Whisper large-v3
| Benchmark | Cohere Transcribe | Whisper large-v3 |
| --- | --- | --- |
| LibriSpeech clean | 1.25 | 2.7 |
| LibriSpeech other | 2.37 | 5.2 |
| TedLium | 2.49 | 4.2 |
| SPGISpeech | 3.08 | 4.7 |
| VoxPopuli | 5.87 | 9.1 |
| Gigaspeech | 9.33 | 10.3 |
| Earnings22 | 10.84 | 12.7 |
| Average WER | 5.42 | 7.44 |

Lower WER is better. Full benchmark details at the Cohere Transcribe release blog.

  • Runs locally — ~4 GB VRAM (bfloat16) or CPU with ~8 GB RAM
  • 14 languages including English, German, French, Spanish, Japanese, Korean
  • Gated model — requires a free HuggingFace token to download
  • Privacy: fully local
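Because the model is gated, you authenticate once with a free Hugging Face token before downloading. A minimal sketch using the `huggingface_hub` library — the repo id in the usage comment is a placeholder, not the real model id:

```python
from huggingface_hub import snapshot_download

def fetch_gated_model(repo_id: str, token: str) -> str:
    # Requires that you've already accepted the model's licence on the Hub.
    # Returns the local cache directory containing the downloaded weights.
    return snapshot_download(repo_id=repo_id, token=token)

# e.g. fetch_gated_model("CohereLabs/<transcribe-repo>", token="hf_...")
```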

Parakeet TDT V3

Best for: fast local English dictation on GPU or CPU.

NVIDIA's Parakeet TDT V3 is state of the art for English transcription: sub-second latency on a mid-range GPU, and accuracy that handles natural speech, technical vocabulary, and fast delivery without breaking a sweat. hyprwhspr runs Parakeet through the onnx-asr runtime, which lets a model that typically demands a large GPU run surprisingly well on CPU with minimal accuracy trade-off.

  • Requires: ~1 GB RAM (CPU) or VRAM (GPU)
  • Languages: English (optimised)
  • Latency: under 500ms on GPU, 1–2 seconds on modern CPU
  • Privacy: fully local

If you're running a local LLM alongside hyprwhspr and need to conserve VRAM, Parakeet on CPU is the right call — you get excellent accuracy without competing for GPU memory.

Whisper family

Best for: multiple languages, or when flexibility matters more than raw speed.

OpenAI's Whisper is the most battle-tested open-source transcription model. It comes in five sizes — tiny through large-v3 — so you can tune the accuracy/speed trade-off to your hardware. It's genuinely strong across 90+ languages, and it can translate directly to English from any of them.

  • Requires: GPU for large variants, CPU workable with tiny/base
  • Languages: 90+
  • Latency: varies by size — base.en is fast, large-v3 is thorough
  • Privacy: fully local

faster-whisper

Best for: multilingual on constrained hardware, or CPU-only setups needing Whisper.

faster-whisper is a highly optimised Whisper implementation that runs large-v3-turbo in ~3.1 GB VRAM (INT8) versus ~6 GB for float16. If you need Whisper's multilingual support but don't have a beefy GPU, faster-whisper closes the gap. It also offers faster CPU inference than whisper.cpp for users without a GPU.

  • Requires: ~3 GB VRAM (INT8) or CPU with ~8 GB RAM
  • Languages: 90+ (full Whisper language support)
  • Latency: 1–3 seconds depending on hardware and model size
  • Privacy: fully local

Cloud options

When privacy isn't a constraint and you want the fastest possible turnaround — or you're dictating long documents where every word counts — cloud backends are available. Groq's Whisper inference is extremely fast. OpenAI's API is the most widely compatible. Cohere's cloud API brings the same leaderboard-leading accuracy without local hardware requirements.

All cloud options require an API key and send audio to a third-party server. The case for keeping it local is worth reading if you're undecided.

Quick reference

| Use case | Model |
| --- | --- |
| Maximum accuracy | Cohere Transcribe |
| Fast local English (GPU) | Parakeet TDT V3 |
| Fast local English (CPU) | Parakeet TDT V3 via onnx-asr |
| Multiple languages | Whisper large-v3 |
| Multilingual, constrained VRAM | faster-whisper |
| Fastest cloud | Cohere Transcribe |
| Custom / self-hosted | Any OpenAI-compatible API |
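The decision logic above can be expressed as a toy selection helper — the thresholds and labels are illustrative, not taken from hyprwhspr itself:

```python
def pick_model(needs_multilingual: bool, local_only: bool, vram_gb: float) -> str:
    """Toy helper mirroring the quick-reference table; thresholds are illustrative."""
    if not local_only:
        return "Cohere Transcribe (cloud)"
    if needs_multilingual:
        # Whisper large-v3 wants a real GPU; faster-whisper's INT8 build doesn't.
        return "Whisper large-v3" if vram_gb >= 6 else "faster-whisper (INT8)"
    # English-only: Parakeet either way, on CPU when VRAM is scarce.
    return "Parakeet TDT V3" if vram_gb >= 1 else "Parakeet TDT V3 via onnx-asr (CPU)"
```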

Switching is easy

hyprwhspr supports all of these backends. The setup wizard walks you through the initial choice based on your hardware. Changing later is one config edit and a service restart — no reinstall, no data migration. If you're just getting started, see the full setup guide for a walkthrough of the complete stack.

Ready to try it? Free, open source, runs on any Linux with Wayland.
Install hyprwhspr →