
Best Speech-to-Text Models for Linux in 2026

Parakeet TDT V3, Whisper, Cohere Transcribe — a clear breakdown of which model fits your hardware and workflow.

The model shapes everything

Latency, accuracy, GPU requirements, language support, privacy — all of it flows from your model choice. The good news: in 2026, every tier is genuinely excellent. The question is matching the model to your hardware and workflow.

For an objective view of where models stand, the Open ASR Leaderboard on Hugging Face tracks Word Error Rate (WER) across standardised benchmarks. It's the most reliable public signal for comparing transcription accuracy across models.

Cohere Transcribe

Best for: maximum accuracy, long-form content, technical vocabulary.

Cohere Transcribe currently sits at #1 on the Open ASR Leaderboard — 5.42 average WER versus 7.44 for Whisper large-v3. For developers dictating technical language, API names, and command-line flags, that gap is meaningful.

Cohere Transcribe benchmark results vs Whisper large-v3
| Benchmark | Cohere Transcribe | Whisper large-v3 |
| --- | --- | --- |
| LibriSpeech clean | 1.25 | 2.7 |
| LibriSpeech other | 2.37 | 5.2 |
| TedLium | 2.49 | 4.2 |
| SPGISpeech | 3.08 | 4.7 |
| VoxPopuli | 5.87 | 9.1 |
| Gigaspeech | 9.33 | 10.3 |
| Earnings22 | 10.84 | 12.7 |
| Average WER | 5.42 | 7.44 |

Lower WER is better. Full benchmark details at the Cohere Transcribe release blog.

  • Runs locally — ~4 GB VRAM (bfloat16) or CPU with ~8 GB RAM
  • 14 languages including English, German, French, Spanish, Japanese, Korean
  • Gated model — requires a free HuggingFace token to download
  • Privacy: fully local
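Because the model is gated, you authenticate once with a free Hugging Face token before downloading. A minimal sketch using the `huggingface_hub` library — the repo id in the usage comment is a placeholder, not the real model id:

```python
from huggingface_hub import snapshot_download

def fetch_gated_model(repo_id: str, token: str) -> str:
    # Requires that you've already accepted the model's licence on the Hub.
    # Returns the local cache directory containing the downloaded weights.
    return snapshot_download(repo_id=repo_id, token=token)

# e.g. fetch_gated_model("CohereLabs/<transcribe-repo>", token="hf_...")
```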

Parakeet TDT V3

Best for: fast local English dictation on GPU or CPU.

NVIDIA's Parakeet TDT V3 is state of the art for English transcription: sub-second latency on a mid-range GPU, and accuracy that handles natural speech, technical vocabulary, and fast delivery without breaking a sweat. hyprwhspr runs Parakeet through the onnx-asr runtime, which lets a model that typically demands a large GPU run surprisingly well on CPU with minimal accuracy trade-off.

  • Requires: ~1 GB RAM (CPU) or VRAM (GPU)
  • Languages: English (optimised)
  • Latency: under 500ms on GPU, 1–2 seconds on modern CPU
  • Privacy: fully local

If you're running a local LLM alongside hyprwhspr and need to conserve VRAM, Parakeet on CPU is the right call — you get excellent accuracy without competing for GPU memory.

Whisper family

Best for: multiple languages, or when flexibility matters more than raw speed.

OpenAI's Whisper is the most battle-tested open-source transcription model. It comes in five sizes — tiny through large-v3 — so you can tune the accuracy/speed trade-off to your hardware. It's genuinely strong across 90+ languages, and it can translate directly to English from any of them.

  • Requires: GPU for large variants, CPU workable with tiny/base
  • Languages: 90+
  • Latency: varies by size — base.en is fast, large-v3 is thorough
  • Privacy: fully local

faster-whisper

Best for: multilingual on constrained hardware, or CPU-only setups needing Whisper.

faster-whisper is a highly optimised Whisper implementation that runs large-v3-turbo in ~3.1 GB VRAM (INT8) versus ~6 GB for float16. If you need Whisper's multilingual support but don't have a beefy GPU, faster-whisper closes the gap. It also offers faster CPU inference than whisper.cpp for users without a GPU.

  • Requires: ~3 GB VRAM (INT8) or CPU with ~8 GB RAM
  • Languages: 90+ (full Whisper language support)
  • Latency: 1–3 seconds depending on hardware and model size
  • Privacy: fully local

Cloud options

When privacy isn't a constraint and you want the fastest possible turnaround — or you're dictating long documents where every word counts — cloud backends are available. Groq's Whisper inference is extremely fast. OpenAI's API is the most widely compatible. Cohere's cloud API brings the same leaderboard-leading accuracy without local hardware requirements.

All cloud options require an API key and send audio to a third-party server. The case for keeping it local is worth reading if you're undecided.

Quick reference

| Use case | Model |
| --- | --- |
| Maximum accuracy | Cohere Transcribe |
| Fast local English (GPU) | Parakeet TDT V3 |
| Fast local English (CPU) | Parakeet TDT V3 via onnx-asr |
| Multiple languages | Whisper large-v3 |
| Multilingual, constrained VRAM | faster-whisper |
| Fastest cloud | Cohere Transcribe |
| Custom / self-hosted | Any OpenAI-compatible API |
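The decision logic above can be expressed as a toy selection helper — the thresholds and labels are illustrative, not taken from hyprwhspr itself:

```python
def pick_model(needs_multilingual: bool, local_only: bool, vram_gb: float) -> str:
    """Toy helper mirroring the quick-reference table; thresholds are illustrative."""
    if not local_only:
        return "Cohere Transcribe (cloud)"
    if needs_multilingual:
        # Whisper large-v3 wants a real GPU; faster-whisper's INT8 build doesn't.
        return "Whisper large-v3" if vram_gb >= 6 else "faster-whisper (INT8)"
    # English-only: Parakeet either way, on CPU when VRAM is scarce.
    return "Parakeet TDT V3" if vram_gb >= 1 else "Parakeet TDT V3 via onnx-asr (CPU)"
```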

Switching is easy

hyprwhspr supports all of these backends. The setup wizard walks you through the initial choice based on your hardware. Changing later is one config edit and a service restart — no reinstall, no data migration. If you're just getting started, see the full setup guide for a walkthrough of the complete stack.

Ready to try it? Free, open source, runs on any Linux with Wayland.
Install hyprwhspr →