Voice & speech AI — what's accelerating
This is a smaller, more honest edition. The voice-and-speech bucket is thin, and keyword matching swept in several repos where voice is a side feature, not the point. Below are the ones genuinely built for synthesizing and cloning speech; the rest are named and set aside.
Top mover
"The open-source AI voice studio. Clone, dictate, create." A full local voice workspace built on Qwen3-TTS with CUDA and MLX backends, combining cloning, dictation, and generation in one app rather than shipping a bare model checkpoint. The studio framing is likely why it's the fastest mover here: it's usable end-to-end, not just weights.
---
The speech stack
Microsoft's "open-source frontier voice AI." Backing from a major lab is the signal — it pulls open speech synthesis toward the quality bar previously held by closed APIs, and the star base reflects that trust.
A tokenizer-free TTS model for multilingual speech generation, creative voice design, and true-to-life cloning. It surfaced in the image/video bucket by accident, but it's squarely a speech model — a strong one — so it earns a place here instead.
---
Voice as a feature, not the point
These climbed fast but aren't voice/speech tools — voice is a bolt-on, so they're named and set aside rather than ranked:
- Alishahryar1/free-claude-code · ⭐32,687 · ↑255.4/day. A free Claude Code access wrapper that happens to support voice; it's a coding-agent client, not speech AI.
- hugohe3/ppt-master · ⭐24,748 · ↑139.0/day. Generates editable PowerPoint decks with voiced speaker notes: a slide tool with TTS attached, not a voice engine.
- slopus/happy · ⭐21,644 · ↑67.0/day. A mobile/web client for Codex and Claude Code with realtime voice as one feature among many.
- mudler/LocalAI · ⭐46,705 · ↑39.7/day. A general local-inference engine that runs voice alongside LLMs, vision, and image generation; capable, but voice isn't its focus.
- blakeblackshear/frigate · ⭐33,560 · ↑12.5/day. An NVR for camera object detection. Not a speech tool at all: a pure keyword false positive, dropped.
---
How this was made
Live GitHub pull, bucketed by voice/speech keywords; each repo was verified as not archived and recently pushed, ranked by stars/day, then curated hard for fit. The bucket was small and noisy, so rather than pad it, mismatches were named and set aside and one mis-filed speech model was pulled in from a neighboring bucket. Star counts were pulled at publish time; they move daily, so re-verify before reposting.
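The selection steps above can be sketched in Python. This is a minimal illustration, not the pipeline Accelbrief actually runs: the `Repo` fields, the keyword list, the 30-day freshness window, and the reading of stars/day as star growth between two snapshots are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical keyword list; the edition does not publish the keywords it used.
VOICE_KEYWORDS = ("voice", "speech", "tts", "dictation")

@dataclass
class Repo:
    name: str
    description: str
    archived: bool
    pushed_at: datetime   # time of the most recent push
    stars_now: int        # star count at publish time
    stars_before: int     # star count at an earlier snapshot (assumed)
    days_between: float   # days between the two snapshots (assumed)

def matches_bucket(repo: Repo) -> bool:
    """Naive substring match over the repo name and description."""
    text = f"{repo.name} {repo.description}".lower()
    return any(keyword in text for keyword in VOICE_KEYWORDS)

def stars_per_day(repo: Repo) -> float:
    """Star growth between the two snapshots, averaged per day."""
    if repo.days_between <= 0:
        raise ValueError("snapshots must be a positive number of days apart")
    return (repo.stars_now - repo.stars_before) / repo.days_between

def rank_bucket(repos: list[Repo], now: datetime, max_idle_days: int = 30) -> list[Repo]:
    """Keep keyword matches that are not archived and were pushed recently, fastest first."""
    cutoff = now - timedelta(days=max_idle_days)
    eligible = [
        r for r in repos
        if matches_bucket(r) and not r.archived and r.pushed_at >= cutoff
    ]
    return sorted(eligible, key=stars_per_day, reverse=True)
```

Plain substring matching is deliberately naive here, and it shows why the manual curation pass matters: a description mentioning "voiced speaker notes" clears the keyword filter just as easily as a TTS model does.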
Accelbrief · catch acceleration, not stars