$ hyprwhspr
System-wide speech‑to‑text.
Local models. Secure cloud providers. Fully featured. Top performance.
── Features
Cutting-edge local
Ships with support for Parakeet TDT V3, Cohere Transcribe, and the full Whisper family. On CPU? onnx-asr delivers wild speeds without a GPU. Models stay hot in memory.
GPU-intelligent
Auto-detects NVIDIA CUDA, AMD/Intel Vulkan, or falls back to CPU. Unload the model from VRAM on demand — free resources, then reload instantly without restarting the service.
Private by default
Local inference means nothing leaves your machine. Credentials for cloud providers (Gemini, OpenAI, ElevenLabs, and more) are stored securely and never touch config files.
Your voice stays on your machine.
Local inference by default — no data ever leaves without your say-so.
Audio ducking
System volume steps down while you record, back up when you're done.
Four recording modes
Toggle, push-to-talk, auto (tap vs. hold), and long-form with pause and save.
Paste anywhere
Text injects into any active buffer via ydotool. Auto-submit optional.
Themed visualizer
Mic-OSD overlay that auto-matches your Omarchy theme. Looks great.
Waybar tray
Live status indicator: idle, recording, processing, error — all at a glance.
Multi-lingual
Strong performance across many languages. Optional translate-to-English mode.
Text processing
Word overrides, filler word removal, symbol replacements, custom prompts.
WebSocket streaming
Stream in near real time via Google Gemini, ElevenLabs, OpenAI, or similar services.
Works everywhere
Hyprland, GNOME, KDE Plasma, Sway — any Wayland compositor with systemd.
Free and open source, forever. For the people!
MIT licensed. No subscriptions, no telemetry, no monetization — ever.
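As a rough illustration of the text processing feature listed above (word overrides, filler word removal, symbol replacements), here is a minimal Python sketch. The dictionaries, the filler list, and the function names are hypothetical and are not hyprwhspr's actual configuration keys or code. The sketch only shows the general shape such a post-processing pass could take.

```python
import re

# Hypothetical settings for illustration only; not hyprwhspr's real config keys.
WORD_OVERRIDES = {"hyper whisper": "hyprwhspr", "way bar": "Waybar"}
FILLER_WORDS = ("um", "uh", "erm")
SYMBOL_REPLACEMENTS = {"at sign": "@", "open paren": "("}


def replace_phrases(text, mapping):
    """Replace each spoken phrase with its written form, ignoring case."""
    for spoken, written in mapping.items():
        pattern = r"\b" + re.escape(spoken) + r"\b"
        # A callable replacement keeps backslashes in `written` literal.
        text = re.sub(pattern, lambda _m, w=written: w, text, flags=re.IGNORECASE)
    return text


def remove_fillers(text, fillers=FILLER_WORDS):
    """Drop whole-word fillers plus an optional trailing comma and whitespace."""
    alternatives = "|".join(re.escape(f) for f in fillers)
    text = re.sub(r"\b(?:" + alternatives + r")\b,?\s*", "", text, flags=re.IGNORECASE)
    # Collapse any doubled spaces left behind and trim the ends.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


def postprocess(text):
    """Run the three passes in a fixed order: fillers, overrides, symbols."""
    text = remove_fillers(text)
    text = replace_phrases(text, WORD_OVERRIDES)
    text = replace_phrases(text, SYMBOL_REPLACEMENTS)
    return text
```

With these example settings, `postprocess("Um, open the way bar config")` returns `"open the Waybar config"`. Word boundaries keep a filler such as "um" from matching inside "umbrella".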
── Install
── Writing
Voice Typing on Linux in 2026
Speech-to-text on Linux has been broken for years. Local models and Wayland maturity have finally changed that.
Read →
models
Best Speech-to-Text Models for Linux
Parakeet TDT V3, Whisper, onnx-asr, Cohere — which model fits your hardware and workflow.
Read →
opinion
Dictation is the Future of Programming
The keyboard made sense when code was the output. Now that language is the input, your voice is faster.
Read →
── FAQ
Why hyprwhspr over other Linux speech tools?
Model breadth and performance. hyprwhspr supports more local backends than any comparable tool (Cohere Transcribe, pywhispercpp, faster-whisper, onnx-asr, and Parakeet TDT V3), plus a full range of cloud APIs. The model stays hot in memory, so transcription starts right away rather than waiting on a multi-second load at every invocation. It's also the most fully featured: toggle, push-to-talk, auto, and long-form recording modes; Waybar integration; live model unload for VRAM management; and evdev-based hotkeys that work on any Wayland compositor, including Hyprland, GNOME, KDE, and Sway.
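The "hot in memory" idea, together with on-demand unload, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not hyprwhspr's implementation: `HotModel`, `fake_loader`, and `load_count` are hypothetical names, and the loader stands in for whatever expensive call a real backend would make.

```python
import threading


class HotModel:
    """Keeps one loaded model in memory between transcriptions (illustrative)."""

    def __init__(self, loader):
        self._loader = loader          # callable that performs the expensive load
        self._model = None
        self._lock = threading.Lock()  # guards load/unload across threads
        self.load_count = 0            # how many times the loader actually ran

    def get(self):
        """Return the cached model, loading it only if it is not resident."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
                self.load_count += 1
            return self._model

    def unload(self):
        """Drop the reference; the next get() triggers a fresh load."""
        with self._lock:
            self._model = None


def fake_loader():
    # Stand-in for a real backend's model object (assumption for the sketch).
    return object()
```

Repeated `get()` calls reuse the same object, so only the first call pays the load cost. After `unload()`, the next `get()` loads again. In a real backend, dropping the Python reference may not be enough to release VRAM. The runtime may need its own explicit cleanup step.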
Why Python and not Rust?
The ecosystem is Python. It's the right tool for the job. Every transcription model that matters (Whisper, Parakeet, Cohere Transcribe) runs on a native runtime such as PyTorch, ONNX Runtime, CTranslate2, or whisper.cpp. These are C++ and CUDA at the compute layer, with Python as the interface. The actual inference isn't happening in hyprwhspr's code; it's happening in those runtimes. Rewriting the orchestration in Rust would add significant complexity without touching the performance-critical path.
Does it cost anything?
No. The default setup runs a local model — no API key, no account, no ongoing cost. Cloud backends are available for users who want maximum accuracy without local hardware requirements, but they're explicitly opt-in and require you to supply your own API key.
Can I pay you?
If you'd like to give something back, donate to a local wildlife charity — that would be most excellent. If you're set on buying me a coffee, I appreciate that: ko-fi.com/goodroot.
Why no precompiled binary?
By design. The AUR builds from source — that's the philosophy, and it's the right one. A precompiled binary is a black box: no audit trail, no way to verify what you're actually running. For a tool that sits on your microphone and injects text into every application on your system, that's not a tradeoff worth making. Read the code, build it yourself, know what's on your machine.
Is my audio private?
With the default local backend, audio never leaves your machine. Nothing is logged, no account is required, and no third party is involved. Cloud backends send audio to the provider you configure — that tradeoff is documented and the choice is yours.