Linux · Wayland

$ hyprwhspr

System-wide speech‑to‑text.
Local models. Secure cloud providers. Fully featured. Top performance.

hyprwhspr ● idle ⏺ recording ◌ processing ✓ done
❯  Open a pull request for the feature branch.
❯  Write unit tests for the auth module.
❯  Deploy to staging and monitor the logs.

── Features

Cutting-edge local

Ships with support for Parakeet TDT V3, Cohere Transcribe, and the full Whisper family. On a CPU-only machine, onnx-asr delivers remarkably fast transcription without a GPU. Models stay hot in memory.

GPU-intelligent

Auto-detects NVIDIA CUDA or AMD/Intel Vulkan and falls back to CPU when neither is available. Unload the model from VRAM on demand to free resources, then reload it without restarting the service.
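The fallback order can be pictured as a priority list where the first usable accelerator wins. The sketch below is hypothetical: it treats `nvidia-smi` or `vulkaninfo` being on `PATH` as a stand-in for a working CUDA or Vulkan stack, which is an assumption for illustration and not necessarily how hyprwhspr detects hardware.

```python
import shutil
from typing import Callable, List, Optional, Tuple

Probe = Callable[[], bool]

def on_path(tool: str) -> Probe:
    """Hypothetical probe: treat a vendor tool on PATH as a sign the stack exists."""
    return lambda: shutil.which(tool) is not None

# Ordered by preference; the first probe that succeeds wins, and CPU always does.
DEFAULT_PROBES: List[Tuple[str, Probe]] = [
    ("cuda", on_path("nvidia-smi")),
    ("vulkan", on_path("vulkaninfo")),
    ("cpu", lambda: True),
]

def pick_backend(probes: Optional[List[Tuple[str, Probe]]] = None) -> str:
    """Return the name of the first available backend in priority order."""
    if probes is None:
        probes = DEFAULT_PROBES
    for name, available in probes:
        if available():
            return name
    raise RuntimeError("no usable backend")  # unreachable while a CPU entry exists
```

Keeping the probes injectable makes the priority logic easy to test; a production check would more likely ask the inference runtime which devices it can see than look for command-line tools.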

Private by default

Local inference means nothing leaves your machine. Cloud provider credentials — Gemini, OpenAI, ElevenLabs, and more — are stored securely and never touch config files.

Your voice stays on your machine.

Local inference by default — no data ever leaves without your say-so.

Privacy policy ↗
🦆

Audio ducking

System volume steps down while you record, back up when you're done.
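Stepping the volume down and restoring it afterwards maps naturally onto a context manager, which guarantees the restore even if recording fails partway. `FakeMixer` and the 0.3 default factor are hypothetical; a real version would read and set the system volume through the audio server.

```python
from contextlib import contextmanager

class FakeMixer:
    """Hypothetical stand-in for whatever controls system output volume."""
    def __init__(self, volume: float = 1.0) -> None:
        self.volume = volume

@contextmanager
def ducked(mixer: FakeMixer, factor: float = 0.3):
    """Scale the volume by `factor` while the block runs, then restore it."""
    original = mixer.volume
    mixer.volume = original * factor
    try:
        yield mixer
    finally:
        mixer.volume = original  # runs on normal exit and on exceptions
```

Usage: wrap the recording call in `with ducked(mixer):` so playback comes back up however the recording ends.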

Five recording modes

Toggle, push-to-talk, auto (tap vs. hold), continuous (auto-flush on silence), and long-form with pause and save.
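Two of these modes hinge on simple timing and energy rules. The sketch below shows one plausible reading: auto mode classifies a key press by how long it was held, and continuous mode emits a segment once it sees a run of quiet frames. The 0.4 s cutoff, the RMS threshold, and the three-frame silence window are hypothetical values, not hyprwhspr's defaults.

```python
import math
from typing import Iterable, Iterator, List

Frame = List[float]  # one short chunk of mono samples in [-1.0, 1.0]

HOLD_THRESHOLD_S = 0.4  # hypothetical tap/hold cutoff in seconds

def classify_press(pressed_at: float, released_at: float) -> str:
    """Auto mode: a short press is a tap, a long one is a hold."""
    return "hold" if released_at - pressed_at >= HOLD_THRESHOLD_S else "tap"

def rms(frame: Frame) -> float:
    """Root-mean-square energy of a frame; 0.0 for an empty frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def flush_on_silence(frames: Iterable[Frame], threshold: float = 0.01,
                     quiet_to_flush: int = 3) -> Iterator[List[Frame]]:
    """Continuous mode: yield each speech segment after `quiet_to_flush` quiet frames."""
    segment: List[Frame] = []
    quiet = 0
    for frame in frames:
        segment.append(frame)
        quiet = quiet + 1 if rms(frame) < threshold else 0
        if quiet >= quiet_to_flush:
            speech = segment[:-quiet]  # drop the trailing silence
            if speech:                 # skip segments that were silence only
                yield speech
            segment, quiet = [], 0
    tail = segment[:len(segment) - quiet]  # flush remaining speech at end of stream
    if tail:
        yield tail
```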

Paste anywhere

Injects text into the focused window via wtype or ydotool. Per-app paste rules and optional auto-submit.
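One way to model per-app paste rules is a lookup from window class to a paste keystroke, with a default for everything else. The `PasteRule` type, the window classes, and the `auto_submit` flag below are hypothetical illustrations, not hyprwhspr's configuration format; the terminal entries reflect the common convention that terminal emulators paste with Ctrl+Shift+V.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PasteRule:
    keys: str                  # keystroke that triggers paste in the target app
    auto_submit: bool = False  # hypothetical: press Enter after pasting

# Hypothetical rules keyed by lowercase window class.
RULES = {
    "kitty": PasteRule("ctrl+shift+v"),
    "alacritty": PasteRule("ctrl+shift+v"),
    "slack": PasteRule("ctrl+v", auto_submit=True),
}
DEFAULT_RULE = PasteRule("ctrl+v")

def rule_for(window_class: str) -> PasteRule:
    """Return the paste rule for a window class, falling back to the default."""
    return RULES.get(window_class.lower(), DEFAULT_RULE)
```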

Themed visualizer

Mic OSD that auto-matches your Omarchy theme — an animated overlay on Hyprland, Sway, and KDE, or desktop notifications on GNOME. Looks great.

Waybar tray

Live status indicator: idle, recording, processing, error — all at a glance.

Multilingual

Strong performance across many languages. Optional translate-to-English mode.

Text processing

Word overrides, filler word removal, symbol replacements, custom prompts.
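These steps compose naturally as a small text pipeline: apply word overrides, strip fillers, then swap spoken symbol names for characters. The settings and function names below are hypothetical examples for illustration; hyprwhspr's actual config keys and ordering may differ.

```python
import re

# Hypothetical settings for illustration; real config keys may differ.
WORD_OVERRIDES = {"hyper whisper": "hyprwhspr", "way bar": "Waybar"}
FILLER_WORDS = ["um", "uh", "er"]
SYMBOLS = {"open paren": "(", "close paren": ")"}

def replace_phrases(text: str, table: dict) -> str:
    """Whole-word, case-insensitive replacement, longest phrases first."""
    for phrase in sorted(table, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function replacement avoids treating backslashes in the value as escapes.
        text = re.sub(pattern, lambda _m, r=table[phrase]: r, text, flags=re.IGNORECASE)
    return text

def postprocess(text: str) -> str:
    text = replace_phrases(text, WORD_OVERRIDES)
    fillers = "|".join(map(re.escape, FILLER_WORDS))
    # Drop each filler word along with an optional trailing comma and whitespace.
    text = re.sub(rf"\b(?:{fillers})\b,?\s*", "", text, flags=re.IGNORECASE)
    text = replace_phrases(text, SYMBOLS)
    # Collapse doubled whitespace left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", text).strip()
```

Running overrides before filler removal means a multi-word override is matched before any of its words could be stripped.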

WebSocket streaming

Stream audio in near real time to Google Gemini, ElevenLabs, OpenAI, or similar services.

Works everywhere

Hyprland, GNOME, KDE Plasma, Sway — any Wayland compositor with systemd.

Self-healing

Recovers automatically from suspend/resume, USB mic unplug, and keyboard hotplug — and warns when the mic is muted.

Post-transcription hook

Pipe every transcription through your own shell command before it pastes — reformat, route, or log it.
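A hook like this typically writes the transcript to the command's standard input and reads the replacement text from its standard output. The sketch below assumes that contract, a 5-second timeout, and falling back to the original text on failure; `run_hook` is a hypothetical name, and hyprwhspr's own hook interface may differ.

```python
import subprocess
from typing import Optional

def run_hook(text: str, command: Optional[str], timeout: float = 5.0) -> str:
    """Pipe text to a shell command on stdin and return its stdout.

    Falls back to the untouched text if no hook is set, or if the command
    fails or times out, so a broken hook never swallows a transcription.
    """
    if not command:
        return text
    try:
        result = subprocess.run(
            command,
            shell=True,           # the hook is a user-supplied shell snippet
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,           # a non-zero exit raises CalledProcessError
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return text
    return result.stdout.rstrip("\n")
```

For example, `run_hook("hello", "tr a-z A-Z")` uppercases the transcript before it is pasted.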

$

Scriptable control

CLI start, stop, toggle, and cancel commands let you wire it to KDE, GNOME, sxhkd, or Espanso, in addition to the built-in evdev hotkeys.
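The four CLI verbs can be read as transitions between the idle, recording, and processing states shown in the status indicator. This is a hypothetical model for illustration: the state names come from this page, while the transition rules (for example, that cancel discards audio and returns to idle) are assumptions.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"

def apply(state: State, command: str) -> State:
    """Hypothetical mapping of CLI commands to recorder state transitions."""
    if command == "start":
        return State.RECORDING if state is State.IDLE else state
    if command == "stop":
        # Stopping hands the captured audio off for transcription.
        return State.PROCESSING if state is State.RECORDING else state
    if command == "toggle":
        return apply(state, "stop" if state is State.RECORDING else "start")
    if command == "cancel":
        # Cancelling discards the audio instead of transcribing it.
        return State.IDLE if state is State.RECORDING else state
    raise ValueError(f"unknown command: {command}")
```

Treating start and stop as no-ops in the wrong state keeps repeated hotkey presses harmless.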

Free and open source, forever. For the people!

MIT licensed. No subscriptions, no telemetry, no monetization — ever.

MIT License ↗

── Install

Arch Linux
# stable
$ yay -S hyprwhspr
# bleeding edge
$ yay -S hyprwhspr-git
Debian · Ubuntu · Fedora · openSUSE
# clone
$ git clone https://github.com/goodroot/hyprwhspr.git ~/hyprwhspr
# install deps, then set up
$ cd ~/hyprwhspr && ./scripts/install-deps.sh
$ ./bin/hyprwhspr setup

── FAQ

Why hyprwhspr over other Linux speech tools?

Pair it with a capable GPU and you get top-end Linux dictation, and it's excellent without one, too.

hyprwhspr is built for accuracy and speed: it runs the top local models (Parakeet TDT V3, Cohere Transcribe, and the full Whisper family) on backends such as faster-whisper and onnx-asr, and keeps them hot in memory, so transcription starts right away instead of waiting on a multi-second model load. GPU acceleration (NVIDIA CUDA, AMD/Intel Vulkan) is auto-detected; on CPU, onnx-asr is genuinely fast and accurate, a first-class path rather than a fallback.

It supports more local backends than any comparable tool, plus a full range of cloud APIs, and it's well featured: five recording modes, per-app paste rules, Waybar integration, live model unload for VRAM, and evdev hotkeys on any Wayland compositor — Hyprland, GNOME, KDE, Sway. Actively maintained.

Why Python and not Rust?

The ecosystem is Python. It's the right tool for the job. Every transcription model that matters — Whisper, Parakeet, Cohere Transcribe — runs on PyTorch or ONNX Runtime, both of which are C++ and CUDA at the compute layer with Python as the interface. The actual inference isn't happening in hyprwhspr's code — it's happening in those runtimes. Rewriting the orchestration in Rust would add significant complexity without touching the performance-critical path.

Does it cost anything?

No. The default setup runs a local model — no API key, no account, no ongoing cost. Cloud backends are available for users who want maximum accuracy without local hardware requirements, but they're explicitly opt-in and require you to supply your own API key.

Can I pay you?

If you'd like to give something back, donate to a local wildlife charity — that would be most excellent. If you're set on buying me a coffee, I appreciate that: ko-fi.com/goodroot.

Why no precompiled binary?

By design. The AUR builds from source — that's the philosophy, and it's the right one. A precompiled binary is a black box: no audit trail, no way to verify what you're actually running. For a tool that sits on your microphone and injects text into every application on your system, that's not a tradeoff worth making. Read the code, build it yourself, know what's on your machine.

Is my audio private?

With the default local backend, audio never leaves your machine. Nothing is logged, no account is required, and no third party is involved. Cloud backends send audio to the provider you configure — that tradeoff is documented and the choice is yours.