Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs

OmniVoice Studio — How to Use It

What Is OmniVoice Studio?

OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine. No API keys, no cloud account, no subscription required.

646 languages supported for TTS via the default OmniVoice engine
99 languages for transcription via WhisperX
Available on macOS, Windows, and Linux
GPU is optional — full pipeline runs on CPU
Free for personal, educational, and research use (FSL-1.1-ALv2)


System Requirements

A GPU is optional. Without one, TTS runs approximately 3× slower on CPU. With ≤8 GB VRAM, TTS automatically offloads to CPU during transcription — no config needed.

Component | Minimum | Recommended
OS | Windows 10 / macOS 12+ / Ubuntu 20.04+ | Any modern 64-bit OS
RAM | 8 GB | 16 GB+
VRAM | 4 GB (auto-offloads) | 8 GB+ (RTX 3060+)
Disk | 10 GB free | 20 GB+ SSD
Python | 3.10+ | 3.11–3.12
GPU | Optional | CUDA / MPS / ROCm
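
As a rough pre-flight check against the minimums in the table above, a standard-library-only sketch might look like the following. The thresholds come from the table; the function name check_minimums is illustrative and not part of OmniVoice Studio. RAM and VRAM are left out because the Python standard library has no portable call for them.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)     # "Python 3.10+" from the requirements table
MIN_FREE_DISK_GB = 10    # "10 GB free" from the requirements table


def check_minimums(python_version=None, free_disk_gb=None, path="."):
    """Return a list of human-readable problems; an empty list means OK.

    Both values default to what the running interpreter and the disk
    holding `path` report; pass explicit values to test the logic.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if free_disk_gb is None:
        free_disk_gb = shutil.disk_usage(path).free / 1024**3

    problems = []
    if tuple(python_version) < MIN_PYTHON:
        major, minor = python_version
        problems.append(f"Python {major}.{minor} is older than 3.10")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"only {free_disk_gb:.1f} GB free, need {MIN_FREE_DISK_GB} GB"
        )
    return problems


if __name__ == "__main__":
    for line in check_minimums() or ["minimum Python and disk checks passed"]:
        print(line)
```

Run it from the directory where you plan to clone the repository so the disk check looks at the right volume.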


Installation

The project recommends running from source. Install three prerequisites first: ffmpeg, Bun (JS runtime), and uv (Python package manager).

git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev

Frontend loads at http://localhost:5173 | API runs on port 8000.
Model weights download automatically on first generation.

Pre-built installers are also available (macOS DMG, Windows MSI, Linux AppImage and .deb); see the Releases page on GitHub.
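
Before running from source, it can help to confirm that the prerequisites named above are actually on PATH. The following is a minimal sketch using only the standard library; the helper name missing_tools is illustrative, and git is included only because the clone step needs it.

```python
import shutil

# Tools named in the installation section; git is needed for the clone step.
REQUIRED_TOOLS = ("git", "ffmpeg", "bun", "uv")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

The which parameter exists only so the lookup can be swapped out in tests; in normal use the default shutil.which is what you want.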


Voice Cloning

Voice cloning uses zero-shot learning — it clones a voice from a clip as short as 3 seconds, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.

Go to the Voice Clone tab in the UI
Upload or record an audio clip of the target voice (3 seconds is enough)
Enter your text and select a target language (646 available)
Click Generate — output is saved to your project library

Voice Gallery: Search YouTube, browse categories, and download reference clips directly inside the app to build your voice library.
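
If you prepare reference clips outside the app, a quick length check avoids uploading a clip shorter than the 3 seconds mentioned above. This sketch assumes an uncompressed PCM WAV file and uses only Python's wave module; the function names are illustrative, and the app itself may accept other formats.

```python
import wave

MIN_CLIP_SECONDS = 3.0  # shortest reference clip mentioned in this section


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, minimum=MIN_CLIP_SECONDS):
    """True if the clip at `path` meets the minimum reference length."""
    return wav_duration_seconds(path) >= minimum


def write_silent_wav(path, seconds, rate=16000):
    """Write a mono 16-bit silent WAV; handy for trying the check above."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
```

For example, is_long_enough("reference.wav") returns False for a 2-second recording, where reference.wav is a placeholder file name.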


Video Dubbing

The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs isolates vocals so the original background audio is preserved in the final export.

Go to the Dub tab — paste a YouTube URL or upload a local file
WhisperX transcribes speech with word-level alignment
Select a target language; translation runs automatically
TTS engine re-voices the transcript; Demucs preserves background audio
Export the final MP4 with dubbed audio mixed in
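
To make the final mux step concrete, here is a sketch of how dubbed audio can be combined with the original video using ffmpeg, which is already one of the prerequisites. The source does not show the exact invocation OmniVoice Studio uses, so treat this as one reasonable way to do it; the function names and file names are placeholders.

```python
import subprocess


def build_mux_command(video_in, dubbed_audio, video_out):
    """Build an ffmpeg argv that keeps the video stream and swaps the audio."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_in,          # input 0: original video
        "-i", dubbed_audio,      # input 1: dubbed speech mixed with background
        "-map", "0:v:0",         # take the video stream from input 0
        "-map", "1:a:0",         # take the audio stream from input 1
        "-c:v", "copy",          # do not re-encode the video
        "-c:a", "aac",           # encode audio for the MP4 container
        "-shortest",             # stop at the shorter of the two streams
        video_out,
    ]


def mux(video_in, dubbed_audio, video_out):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(
        build_mux_command(video_in, dubbed_audio, video_out), check=True
    )
```

Copying the video stream instead of re-encoding it keeps the export fast and avoids any loss in picture quality.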

Batch Queue: Drop up to 50 videos and walk away. Each job has its own progress bar tracking through the full pipeline.


Dictation & Speaker Diarization

Dictation works system-wide from any application. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.

Press ⌘+⇧+Space (macOS) to open the floating dictation widget
Speech streams via WebSocket and auto-pastes into the active input field
Upload a multi-speaker file to the Diarization tab
Pyannote identifies who said what; each speaker gets an auto-extracted voice profile
Assign a TTS voice per speaker for per-speaker dubbing

Hugging Face token required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.
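
The "who said what" step comes down to matching timed words from the transcript against timed speaker turns from the diarizer. A common approach, sketched below, labels each word with the speaker whose turn overlaps it the most. This is not necessarily how Pyannote or WhisperX do it internally; the Word and Turn types and the speaker labels are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn are labeled None.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, word.text))
    return labeled
```

Grouping consecutive words with the same label then yields per-speaker segments, which is the shape needed to assign one TTS voice per speaker.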


TTS Engines

Six TTS engines are built in. Switch via Settings → TTS Engine or the environment variable OMNIVOICE_TTS_BACKEND, for example OMNIVOICE_TTS_BACKEND=cosyvoice

Engine | Languages | Clone | Platform
OmniVoice (default) | 600+ | ✓ | CUDA / MPS / CPU
CosyVoice 3 | 9 + 18 dialects | ✓ | CUDA / MPS / CPU
MLX-Audio | Multi | Varies | Apple Silicon only
VoxCPM2 | 30 | ✓ | CUDA / MPS / CPU
MOSS-TTS-Nano | 20 | ✓ | CUDA / CPU
KittenTTS | English | ✗ | CPU only

Custom engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. This takes roughly 50 lines of Python.
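
The subclass-and-register pattern described above can be sketched as follows. The real TTSBackend interface is not shown in this article, so the base class, the synthesize signature, and the registry here are stand-ins written for illustration; check backend/services/tts_backend.py for the actual method names. Only the OMNIVOICE_TTS_BACKEND variable comes from the source.

```python
import math
import os
import struct
from abc import ABC, abstractmethod


class TTSBackend(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    name: str

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return 16-bit mono PCM audio for `text` (hypothetical signature)."""


class BeepBackend(TTSBackend):
    """Toy engine: emits a 440 Hz tone lasting 50 ms per character."""

    name = "beep"
    sample_rate = 16000

    def synthesize(self, text: str, language: str) -> bytes:
        n_samples = (self.sample_rate // 20) * len(text)  # 50 ms per character
        samples = (
            int(12000 * math.sin(2 * math.pi * 440 * i / self.sample_rate))
            for i in range(n_samples)
        )
        return b"".join(struct.pack("<h", s) for s in samples)


# Stand-in for the registry: maps a backend name to its class.
_REGISTRY = {BeepBackend.name: BeepBackend}


def get_backend(default="beep"):
    """Pick a backend from OMNIVOICE_TTS_BACKEND, falling back to `default`."""
    name = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
```

Keeping engine selection behind a name-to-class mapping is what lets a setting or an environment variable switch engines without touching the calling code.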


MCP Server & Resources

OmniVoice Studio ships a built-in MCP Server, exposing voice and dubbing capabilities to any MCP-compatible client — Claude, Cursor, or your own tooling — without opening the desktop UI.

The MCP server starts alongside the FastAPI backend when you run bun dev
Point your MCP client at the local server to access all endpoints
AudioSeal (Meta) embeds an invisible neural watermark in all generated audio for AI provenance
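
For those writing their own tooling, MCP messages are JSON-RPC 2.0, and tools/list is the standard MCP method for discovering what a server offers. The sketch below only builds the request body. The article does not state the server's URL, transport, or tool names, so those are deliberately left out, and a real client would also perform the MCP initialize handshake first.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requests carry a unique id


def jsonrpc_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request, the message format MCP is built on."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


if __name__ == "__main__":
    # Ask the server which tools it exposes.
    print(jsonrpc_request("tools/list"))
```

In practice an MCP client library or an MCP-aware application handles this framing for you; the sketch is only meant to show what travels over the wire.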

GitHub: github.com/debpalash/OmniVoice-Studio
Install docs: docs/install/ (macos / windows / linux / docker)
Troubleshooting: docs/install/troubleshooting.md
Discord: discord.gg/bzQavDfVV9
