Builds·15 min read·Apr 28, 2026

VoxCPM2: Open-Source TTS, Voice Cloning & Voice Design

A free, open-source 2B-parameter TTS model. Clone voices, design new voices from text descriptions, and generate studio-quality audio. No API key.

VoxCPM2 is a free, open-source text-to-speech model from OpenBMB. It's a 2B-parameter model trained on more than 2 million hours of multilingual audio, and it can clone voices, design voices from text descriptions, and generate 48kHz studio-quality audio. No API key. No monthly fee. Runs on your own machine.

What Makes VoxCPM2 Different

Many TTS systems first convert speech into discrete tokens (entries from a fixed vocabulary of audio units), and that quantization step can limit naturalness. VoxCPM2 skips tokenization entirely: it uses a tokenizer-free diffusion architecture that generates speech directly in a continuous latent space.
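To make the token-versus-continuous trade-off concrete, here is a toy illustration. It is not VoxCPM2's code or architecture. It pushes a smooth signal through a finite codebook (the discrete-token route) and measures the rounding error that introduces; a continuous latent avoids this particular quantization step. The sine "latent", the evenly spaced codebook, and the function names are illustrative assumptions.

```python
import math

def nearest_token(value, codebook):
    """Map one continuous value to the closest entry in a finite codebook (a 'token')."""
    return min(codebook, key=lambda c: abs(c - value))

def quantization_error(codebook_size, n_frames=256):
    """Mean absolute error from forcing a smooth signal through a discrete codebook."""
    # Toy 1-D stand-in for continuous speech features: one sine period every 64 frames.
    latent = [math.sin(2 * math.pi * t / 64) for t in range(n_frames)]
    # Evenly spaced codebook on [-1, 1]; real audio tokenizers learn theirs from data.
    codebook = [-1 + 2 * i / (codebook_size - 1) for i in range(codebook_size)]
    tokens = [nearest_token(x, codebook) for x in latent]
    return sum(abs(x - q) for x, q in zip(latent, tokens)) / n_frames

# Larger vocabularies shrink the rounding error, but it never reaches zero.
for size in (4, 16, 64):
    print(f"{size:>3}-entry codebook: mean abs error {quantization_error(size):.4f}")
```

The error falls as the codebook grows, yet a discrete vocabulary always leaves some rounding behind; generating in a continuous space sidesteps that floor, at the cost of needing a generator (here, diffusion) that can produce real-valued outputs.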

  • 2B parameters: bigger than most open-source TTS models
  • 30 languages: no language tag needed; the language is detected automatically
  • 48kHz output: studio quality, no external upsampler
  • ~8GB VRAM
  • Apache 2.0 license: fully open, commercial use allowed
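A back-of-envelope sketch of how "2B parameters" relates to the ~8GB VRAM figure, and what "48kHz" means in raw samples. The precisions are assumptions (the guide doesn't say which dtype VoxCPM2 loads in), the estimate covers weights only, and 16-bit mono PCM is an assumed output format; the function names are hypothetical.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

def weight_memory_gib(n_params, bytes_per_param):
    """Memory for the raw weights only: no activations, caches, or CUDA overhead."""
    return n_params * bytes_per_param / GIB

def pcm_bytes(seconds, sample_rate=48_000, bytes_per_sample=2, channels=1):
    """Uncompressed PCM size (e.g. 16-bit mono WAV data, header excluded)."""
    return round(seconds * sample_rate) * bytes_per_sample * channels

N_PARAMS = 2_000_000_000  # "2B parameters", taken as a round number

for label, width in (("fp32", 4), ("bf16/fp16", 2)):
    print(f"{label:>9}: ~{weight_memory_gib(N_PARAMS, width):.1f} GiB of weights")

# 48 kHz means 48,000 samples for every second of output audio.
print(f"10 s at 48 kHz, 16-bit mono: {pcm_bytes(10):,} bytes")
```

At half precision the weights alone come to roughly 3.7 GiB, which leaves room within an ~8GB budget for activations and decoding work; treat this as a sanity check, not a measurement.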

Requirements

  • Python 3.10 to 3.12 (3.13 is not supported yet)
  • PyTorch 2.5.0+
  • CUDA 12.0+ with an NVIDIA GPU (CPU inference is possible but very slow)
  • ~8GB VRAM
  • ~10GB disk space for the model weights
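A quick pre-flight check against the requirements above, using the standard library plus an optional `import torch` (skipped if PyTorch isn't installed yet). The thresholds come from the list; the helper names are hypothetical. Note that `torch.version.cuda` reports the CUDA version PyTorch was built for, which is not the same thing as the installed driver version.

```python
import re
import sys

def version_tuple(version):
    """'2.5.1+cu121' -> (2, 5, 1): drop local tags/pre-release suffixes, pad to 3 parts."""
    core = version.split("+", 1)[0]
    nums = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        nums.append(int(match.group()))
    return tuple(nums + [0] * (3 - len(nums)))

def python_supported(info):
    """The guide lists Python 3.10 to 3.12; 3.13 is not supported yet."""
    return (3, 10) <= tuple(info[:2]) <= (3, 12)

def torch_supported(version):
    """The guide asks for PyTorch 2.5.0 or newer."""
    return version_tuple(version) >= (2, 5, 0)

def report():
    py = sys.version_info
    ok = python_supported(py)
    print(f"Python {py.major}.{py.minor}: {'ok' if ok else 'unsupported (need 3.10-3.12)'}")
    try:
        import torch  # optional: only checked if it is already installed
    except ImportError:
        print("PyTorch: not installed yet")
        return
    ok = torch_supported(torch.__version__)
    print(f"PyTorch {torch.__version__}: {'ok' if ok else 'too old (need 2.5.0+)'}")
    # torch.version.cuda is None on CPU-only builds of PyTorch.
    cuda = torch.version.cuda
    if cuda is None or not torch.cuda.is_available():
        print("CUDA: no usable NVIDIA GPU found (CPU inference is very slow)")
        return
    ok = version_tuple(cuda) >= (12, 0, 0)
    props = torch.cuda.get_device_properties(0)
    vram = props.total_memory / 1024 ** 3
    print(f"CUDA {cuda} ({'ok' if ok else 'need 12.0+'}), {props.name}, {vram:.1f} GiB VRAM")

if __name__ == "__main__":
    report()
```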

Installation

Install the package with pip:

pip install voxcpm

That's it. The model weights download automatically from Hugging Face the first time you run it (~8GB, so expect the first load to take a few minutes).
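Because the first run pulls roughly 8GB, it can help to confirm where the weights will land and that the disk has room. This sketch assumes voxcpm downloads through huggingface_hub's standard cache (the guide only says the weights come from Hugging Face). The lookup order mirrors huggingface_hub's environment variables as I understand them: `HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`, defaulting to `~/.cache/huggingface/hub`. The helper names are hypothetical.

```python
import os
import shutil

def hf_hub_cache(env=None):
    """Default huggingface_hub download cache, honoring the usual override variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return os.path.expanduser(env["HF_HUB_CACHE"])
    base = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    home = env.get("HF_HOME") or os.path.join(base, "huggingface")
    return os.path.join(os.path.expanduser(home), "hub")

def free_gib(path):
    """Free space on the filesystem holding `path`; walks up to the nearest existing parent."""
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return shutil.disk_usage(path).free / 1024 ** 3

NEEDED_GIB = 10  # the guide's ~10GB disk estimate for the model weights

if __name__ == "__main__":
    cache = hf_hub_cache()
    free = free_gib(cache)
    verdict = "looks sufficient" if free >= NEEDED_GIB else "may be too little"
    print(f"Download cache: {cache}")
    print(f"Free space: {free:.1f} GiB ({verdict} for ~{NEEDED_GIB} GB)")
```

If the default drive is tight, setting `HF_HOME` to a path on a larger disk before the first run moves the whole Hugging Face cache there.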
