Linux · Wayland

$ hyprwhspr

System-wide speech‑to‑text.
Local models. Secure cloud providers. Fully featured. Top performance.

hyprwhspr ● idle ⏺ recording ◌ processing ✓ done
❯  Open a pull request for the feature branch.
❯  Write unit tests for the auth module.
❯  Deploy to staging and monitor the logs.

── Features

Cutting-edge local

Ships with support for Parakeet TDT V3, Cohere Transcribe, and the full Whisper family. On CPU? onnx-asr delivers wild speeds without a GPU. Models stay hot in memory.

GPU-intelligent

Auto-detects NVIDIA CUDA, AMD/Intel Vulkan, or falls back to CPU. Unload the model from VRAM on demand — free resources, then reload instantly without restarting the service.
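The fallback order described above can be sketched roughly as follows. This is an illustration only: the function name `pick_backend` and the use of `nvidia-smi` and `vulkaninfo` on PATH as probes are assumptions, not hyprwhspr's actual detection logic.

```python
import shutil
from typing import Callable, Optional


def pick_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "cuda", "vulkan", or "cpu", trying the fastest option first.

    Assumption: a probe tool found on PATH is treated as a proxy for a
    usable device. A real implementation would query the driver directly.
    """
    if which("nvidia-smi"):   # NVIDIA driver utilities present
        return "cuda"
    if which("vulkaninfo"):   # Vulkan tooling present (AMD/Intel path)
        return "vulkan"
    return "cpu"              # nothing found: fall back to CPU
```

Passing `which` as a parameter keeps the probe swappable, so the ordering can be checked without any GPU installed.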

Private by default

Local inference means nothing leaves your machine. Credentials for cloud providers (Gemini, OpenAI, ElevenLabs, and more) are stored securely and never touch config files.

Your voice stays on your machine.

Local inference by default — no data ever leaves without your say-so.

Privacy policy ↗

Audio ducking

System volume steps down while you record, back up when you're done.
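One way to picture ducking is a context manager that lowers the volume on entry and restores it on exit. The sketch below is hypothetical: `ducked`, `get_volume`, `set_volume`, and the default `factor` are illustrative names and values, and the callables stand in for whatever audio backend is in use.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ducked(get_volume: Callable[[], float],
           set_volume: Callable[[float], None],
           factor: float = 0.3) -> Iterator[None]:
    """Scale the volume by `factor` while the block runs, then restore it."""
    original = get_volume()
    set_volume(original * factor)
    try:
        yield
    finally:
        # Restore the original level even if recording raises.
        set_volume(original)
```

The `finally` clause is the important part: a failed recording should never leave the system stuck at the ducked volume.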

Four recording modes

Toggle, push-to-talk, auto (tap vs. hold), and long-form with pause and save.
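The tap-versus-hold decision in auto mode comes down to how long the hotkey was held. A minimal sketch, assuming the decision is made on key release and that 0.4 seconds is the cutoff (both the threshold and the names `Action` and `classify_press` are hypothetical):

```python
from enum import Enum


class Action(Enum):
    TOGGLE = "toggle"              # short tap: keep recording until the next tap
    PUSH_TO_TALK = "push_to_talk"  # long hold: stop recording on release


def classify_press(held_seconds: float, hold_threshold: float = 0.4) -> Action:
    """Classify a completed key press by its duration."""
    if held_seconds < 0:
        raise ValueError("held_seconds must be non-negative")
    if held_seconds >= hold_threshold:
        return Action.PUSH_TO_TALK
    return Action.TOGGLE
```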

Paste anywhere

Text injects into any active buffer via ydotool. Auto-submit optional.
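As a rough illustration, injection amounts to handing the transcript to `ydotool type`, optionally followed by an Enter key press. The helper below only builds the command lines and does not run them. `injection_commands` is a hypothetical name, and the `28:1 28:0` key syntax assumes ydotool 1.x, where 28 is the Linux keycode for Enter; older releases use a different syntax.

```python
from typing import List


def injection_commands(text: str, auto_submit: bool = False) -> List[List[str]]:
    """Build argv lists for typing `text`, plus Enter if auto_submit is set."""
    commands = [["ydotool", "type", text]]
    if auto_submit:
        # Assumption: ydotool 1.x "keycode:state" syntax (1 = press, 0 = release).
        commands.append(["ydotool", "key", "28:1", "28:0"])
    return commands
```

Each list could then be passed to `subprocess.run(cmd, check=True)`. Using argv lists rather than a shell string avoids quoting problems with arbitrary dictated text.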

Themed visualizer

Mic-OSD overlay that auto-matches your Omarchy theme. Looks great.

Waybar tray

Live status indicator: idle, recording, processing, error — all at a glance.
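Waybar custom modules configured with `"return-type": "json"` read a JSON object with fields such as `text`, `tooltip`, and `class`. A status script along these lines could feed the tray. The icon mapping, the error icon, and the function name `waybar_payload` are assumptions made for illustration:

```python
import json

# Hypothetical icon mapping; the error icon in particular is a guess.
ICONS = {"idle": "●", "recording": "⏺", "processing": "◌", "error": "✗"}


def waybar_payload(state: str) -> str:
    """Return one line of JSON that a Waybar custom module can consume."""
    if state not in ICONS:
        raise ValueError(f"unknown state: {state}")
    payload = {
        "text": ICONS[state],
        "tooltip": f"hyprwhspr: {state}",
        "class": state,  # lets the Waybar stylesheet color each state
    }
    return json.dumps(payload, ensure_ascii=False)
```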

Multi-lingual

Strong performance across many languages. Optional translate-to-English mode.

Text processing

Word overrides, filler word removal, symbol replacements, custom prompts.
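To make these steps concrete, here is a small sketch of a post-processing pass that removes filler words, applies word overrides, and replaces spoken symbols. The function name, argument names, and processing order are assumptions, not hyprwhspr's real pipeline or config schema.

```python
import re
from typing import Dict, Iterable


def postprocess(text: str,
                overrides: Dict[str, str],
                fillers: Iterable[str],
                symbols: Dict[str, str]) -> str:
    """Clean a transcript: drop fillers, fix words, swap spoken symbols."""
    # 1. Filler words: remove whole-word matches and any trailing comma/space.
    for filler in fillers:
        text = re.sub(rf"\b{re.escape(filler)}\b,?\s*", "", text, flags=re.IGNORECASE)
    # 2. Word overrides: replace misheard words with the preferred spelling.
    for wrong, right in overrides.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", lambda m, r=right: r, text,
                      flags=re.IGNORECASE)
    # 3. Symbol replacements: a spoken name becomes the symbol, with the
    #    preceding space removed so punctuation attaches to the word before it.
    for spoken, symbol in symbols.items():
        text = re.sub(rf"\s*\b{re.escape(spoken)}\b", lambda m, s=symbol: s, text,
                      flags=re.IGNORECASE)
    # Collapse any doubled spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()
```

For example, `postprocess("um deploy to hyper land period", {"hyper land": "Hyprland"}, ["um", "uh"], {"period": "."})` returns `"deploy to Hyprland."`.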

WebSocket streaming

Stream in near real time via Google Gemini, ElevenLabs, OpenAI, or similar services.

Works everywhere

Hyprland, GNOME, KDE Plasma, Sway — any Wayland compositor with systemd.

Free and open source, forever. For the people!

MIT licensed. No subscriptions, no telemetry, no monetization — ever.

MIT License ↗

── Install

Arch Linux
# stable
$ yay -S hyprwhspr
# bleeding edge
$ yay -S hyprwhspr-git

Debian · Ubuntu · Fedora · openSUSE
# clone
$ git clone https://github.com/goodroot/hyprwhspr.git ~/hyprwhspr
# install deps, then set up
$ cd ~/hyprwhspr && ./scripts/install-deps.sh
$ ./bin/hyprwhspr setup

── FAQ

Why hyprwhspr over other Linux speech tools?

Model breadth and performance. hyprwhspr supports more local backends than any comparable tool (Cohere Transcribe, pywhispercpp, faster-whisper, onnx-asr, and Parakeet TDT V3), plus a full range of cloud APIs. The model stays hot in memory, so transcription starts immediately instead of waiting on a multi-second model load at every invocation. It's also the most fully featured: toggle, push-to-talk, auto, and long-form recording modes; Waybar integration; live model unload for VRAM management; and evdev-based hotkeys that work on any Wayland compositor, including Hyprland, GNOME, KDE, and Sway.
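The "hot in memory" idea can be sketched as a holder that loads the model once, reuses it for every request, and drops it on demand. `HotModel` and its `loader` callable are hypothetical stand-ins for illustration. Real backends may also need explicit cleanup to release VRAM.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class HotModel(Generic[T]):
    """Load a model lazily, keep it resident, and allow unloading."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self.loads = 0  # counts how many times the expensive load ran

    def get(self) -> T:
        # Pay the load cost only when no model is resident.
        if self._model is None:
            self._model = self._loader()
            self.loads += 1
        return self._model

    def unload(self) -> None:
        # Drop the reference; the next get() reloads without a restart.
        self._model = None
```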

Why Python and not Rust?

The ecosystem is Python. It's the right tool for the job. Every transcription model that matters — Whisper, Parakeet, Cohere Transcribe — runs on PyTorch or ONNX Runtime, both of which are C++ and CUDA at the compute layer with Python as the interface. The actual inference isn't happening in hyprwhspr's code — it's happening in those runtimes. Rewriting the orchestration in Rust would add significant complexity without touching the performance-critical path.

Does it cost anything?

No. The default setup runs a local model — no API key, no account, no ongoing cost. Cloud backends are available for users who want maximum accuracy without local hardware requirements, but they're explicitly opt-in and require you to supply your own API key.

Can I pay you?

If you'd like to give something back, donate to a local wildlife charity — that would be most excellent. If you're set on buying me a coffee, I appreciate that: ko-fi.com/goodroot.

Why no precompiled binary?

By design. The AUR builds from source — that's the philosophy, and it's the right one. A precompiled binary is a black box: no audit trail, no way to verify what you're actually running. For a tool that sits on your microphone and injects text into every application on your system, that's not a tradeoff worth making. Read the code, build it yourself, know what's on your machine.

Is my audio private?

With the default local backend, audio never leaves your machine. Nothing is logged, no account is required, and no third party is involved. Cloud backends send audio to the provider you configure — that tradeoff is documented and the choice is yours.