    Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support

    February 15, 2026 · 4 Mins Read
    The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint.

    Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions.

    The Architecture: LFM2 and NanoCodec

    Kani-TTS-2 follows the ‘Audio-as-Language’ philosophy. The model does not use traditional mel-spectrogram pipelines. Instead, it converts raw audio into discrete tokens using a neural codec.

    The system relies on a two-stage process:

  • The Language Backbone: The model is built on LiquidAI’s LFM2 (350M) architecture. This backbone generates ‘audio intent’ by predicting the next audio tokens. Because Liquid Foundation Models (LFMs) are designed for efficiency, they provide a faster alternative to standard transformers.
  • The Neural Codec: NVIDIA’s NanoCodec turns those tokens into 22kHz waveforms.

    By combining these stages, the model captures human-like prosody (the rhythm and intonation of speech) without the ‘robotic’ artifacts found in older TTS systems.
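
    The two-stage data flow can be sketched in miniature. This is an illustrative toy, not the actual Kani-TTS-2 code: `lm_backbone` and `codec_decode` are stand-ins for LFM2 and NanoCodec, and the codebook size and frame rate are assumed values chosen only to make the sketch runnable.

```python
# Toy sketch of the 'Audio-as-Language' pipeline: text -> discrete audio
# tokens -> waveform. All functions and constants here are illustrative
# stand-ins, NOT the real LFM2 / NanoCodec implementations.

from typing import List

CODEBOOK_SIZE = 1024   # assumed codec vocabulary size (illustrative)
FRAME_RATE_HZ = 50     # assumed token frames per second of audio (illustrative)

def lm_backbone(text: str, seconds: float) -> List[int]:
    """Stand-in for the LFM2 backbone: autoregressively 'predicts'
    one discrete audio token per codec frame."""
    n_frames = int(seconds * FRAME_RATE_HZ)
    # A real model samples from a learned distribution; here we derive
    # deterministic pseudo-tokens from the text so the sketch runs.
    return [hash((text, i)) % CODEBOOK_SIZE for i in range(n_frames)]

def codec_decode(tokens: List[int], sample_rate: int = 22050) -> List[float]:
    """Stand-in for the neural codec: maps each token frame to a
    fixed-size chunk of audio samples."""
    samples_per_frame = sample_rate // FRAME_RATE_HZ
    wave: List[float] = []
    for tok in tokens:
        amplitude = (tok / CODEBOOK_SIZE) * 2 - 1  # map token to [-1, 1)
        wave.extend([amplitude] * samples_per_frame)
    return wave

tokens = lm_backbone("Hello from Kani-TTS-2", seconds=2.0)
waveform = codec_decode(tokens)
print(len(tokens), len(waveform))  # 100 token frames -> 44100 samples
```

    The point of the sketch is the interface: the backbone never touches waveforms directly, it only emits token IDs, and the codec alone is responsible for turning those IDs into audio.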

    Efficiency: 10,000 Hours in 6 Hours

    The training metrics for Kani-TTS-2 are a masterclass in optimization. The English model was trained on 10,000 hours of high-quality speech data.

    While that scale is impressive, the speed of training is the real story. The research team trained the model in only 6 hours on a cluster of 8 NVIDIA H100 GPUs. This suggests that massive datasets no longer require weeks of compute time when paired with efficient architectures like LFM2.

    Zero-Shot Voice Cloning and Performance

    The standout feature for developers is zero-shot voice cloning. Unlike traditional models that require fine-tuning for new voices, Kani-TTS-2 uses speaker embeddings.

    • How it works: You provide a short reference audio clip.
    • The result: The model extracts the unique characteristics of that voice and applies them to the generated text instantly.
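
    The embedding flow above can be sketched as follows. This is a toy illustration of the general speaker-embedding idea, not the model’s actual encoder: the real system learns a neural speaker encoder, whereas here the ‘embedding’ is just summary statistics of the reference clip, chosen so the sketch is self-contained.

```python
# Toy sketch of zero-shot voice cloning: reference clip -> fixed-size
# speaker embedding -> conditions generation. Illustrative only; the
# real model uses a learned neural speaker encoder.

from statistics import mean, pstdev
from typing import List, Tuple

def extract_speaker_embedding(reference_clip: List[float]) -> Tuple[float, float]:
    """Toy stand-in for a speaker encoder: summarizes the clip as
    (mean, spread). A real embedding is a learned high-dim vector."""
    return (mean(reference_clip), pstdev(reference_clip))

def synthesize(text: str, embedding: Tuple[float, float]) -> List[float]:
    """Toy stand-in for generation: every output value is conditioned
    on the speaker embedding, so the 'voice' follows the reference."""
    offset, scale = embedding
    return [offset + scale * ((ord(c) % 16) / 16) for c in text]

reference = [0.1, -0.2, 0.3, 0.0, -0.1]   # pretend reference audio samples
emb = extract_speaker_embedding(reference)
audio_a = synthesize("hello", emb)
audio_b = synthesize("hello", emb)
assert audio_a == audio_b  # same voice + same text -> same output, no fine-tuning
```

    The key property the sketch demonstrates is that no weights change between speakers: swapping voices means swapping one small embedding, which is why cloning is ‘zero-shot’.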

    From a deployment perspective, the model is highly accessible:

    • Parameter Count: 400M (0.4B) parameters.
    • Speed: It features a Real-Time Factor (RTF) of 0.2. This means it can generate 10 seconds of speech in roughly 2 seconds.
    • Hardware: It requires only 3GB of VRAM, making it compatible with consumer-grade GPUs like the RTX 3060 or 4050.
    • License: Released under the Apache 2.0 license, allowing for commercial use.
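
    The Real-Time Factor arithmetic behind the speed claim is simple enough to show directly. The helper function below is just a named version of that arithmetic, not part of any published API:

```python
# Real-Time Factor (RTF) = generation time / audio duration.
# An RTF of 0.2 means 0.2 s of compute per 1 s of audio, so
# 10 s of speech takes about 2 s to generate (5x faster than real time).

def generation_time(audio_seconds: float, rtf: float = 0.2) -> float:
    """Seconds of compute needed to synthesize `audio_seconds` of speech."""
    return audio_seconds * rtf

print(generation_time(10.0))  # -> 2.0 seconds of compute
print(1.0 / 0.2)              # -> 5.0x real-time throughput
```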

    Key Takeaways

    • Efficient Architecture: The model uses a 400M parameter backbone based on LiquidAI’s LFM2 (350M). This ‘Audio-as-Language’ approach treats speech as discrete tokens, allowing for faster processing and more human-like intonation compared to traditional architectures.
    • Rapid Training at Scale: Kani-TTS-2-EN was trained on 10,000 hours of high-quality speech data in just 6 hours using 8 NVIDIA H100 GPUs.
    • Instant Zero-Shot Cloning: There is no need for fine-tuning to replicate a specific voice. By providing a short reference audio clip, the model uses speaker embeddings to instantly synthesize text in the target speaker’s voice.
    • High Performance on Edge Hardware: With a Real-Time Factor (RTF) of 0.2, the model can generate 10 seconds of audio in approximately 2 seconds. It requires only 3GB of VRAM, making it fully functional on consumer-grade GPUs like the RTX 3060.
    • Developer-Friendly Licensing: Released under the Apache 2.0 license, Kani-TTS-2 is ready for commercial integration. It offers a local-first, low-latency alternative to expensive closed-source TTS APIs.

    Check out the model weights on Hugging Face.

    Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
