Resemble AI Chatterbox
Official
Generate expressive, natural speech with emotion control and voice cloning.
Model Overview
Chatterbox is a production-grade, open-source Text-to-Speech (TTS) model that generates expressive and natural-sounding speech. It stands out with its unique emotion control capabilities and the ability to perform instant voice cloning from short audio samples. It also features built-in watermarking for responsible AI.
Best At
- Generating high-quality, natural-sounding speech from text.
- Voice cloning from short audio samples for a personalized touch.
- Fine-tuning speech expressiveness through emotion and exaggeration controls.
- Applications requiring natural voiceovers like memes, videos, games, and AI agents.
Limitations / Not Good At
- Extreme exaggeration values can produce unstable output.
- Not designed for generating music or sound effects.
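One way to guard against unstable output is to clamp the exaggeration value before passing it to the node. The sketch below is only illustrative: the bounds `EXAGGERATION_MIN` and `EXAGGERATION_MAX` and the helper `safe_exaggeration` are assumptions made for this example, not limits documented for the node, so adjust them to whatever range proves stable in your own tests.

```python
import warnings

# Assumed bounds for illustration only; the node does not document a hard range.
EXAGGERATION_MIN = 0.0
EXAGGERATION_MAX = 1.0


def safe_exaggeration(value: float,
                      lo: float = EXAGGERATION_MIN,
                      hi: float = EXAGGERATION_MAX) -> float:
    """Clamp an exaggeration value into [lo, hi] and warn if it was changed."""
    clamped = max(lo, min(hi, value))
    if clamped != value:
        warnings.warn(
            f"exaggeration {value} is outside [{lo}, {hi}]; using {clamped}"
        )
    return clamped


# 0.5 is the documented neutral value and passes through unchanged.
print(safe_exaggeration(0.5))
# An out-of-range request is pulled back to the assumed upper bound.
print(safe_exaggeration(3.0))
```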
Ideal Use Cases
- Creating voiceovers for videos and presentations.
- Developing AI agents and chatbots with natural conversational voices.
- Generating audio content for games and interactive media.
- Rapid prototyping of voice applications using voice cloning.
- Podcasting and audiobook narration.
Input & Output Format
- Input: Text prompt (string), optional audio prompt (string URI for voice cloning), and various numerical parameters for fine-tuning (exaggeration, cfg_weight, temperature, seed).
- Output: Synthesized speech as an audio file (string URI).
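To make the input shape concrete, here is a minimal sketch that collects the inputs described above into a plain dictionary. The class `ChatterboxInputs`, its field names, and the `to_payload` method are hypothetical and chosen for this example; they are not the node's actual API. The default values mirror the defaults listed for this node.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChatterboxInputs:
    """Hypothetical container mirroring the node's documented inputs."""
    prompt: str                         # text to synthesize
    audio_prompt: Optional[str] = None  # URI of reference audio for voice cloning
    exaggeration: float = 0.5           # 0.5 is documented as neutral
    cfg_weight: float = 0.5             # CFG/pace weight
    temperature: float = 0.8
    seed: int = -1                      # default listed for the node

    def to_payload(self) -> dict:
        """Return the inputs as a dict, omitting the audio prompt when unset."""
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty text")
        payload = asdict(self)
        if payload["audio_prompt"] is None:
            del payload["audio_prompt"]
        return payload


inputs = ChatterboxInputs(prompt="Hello from Chatterbox!", exaggeration=0.7)
print(json.dumps(inputs.to_payload(), indent=2))
```

Keeping the optional audio prompt out of the payload when it is unset keeps a plain text-to-speech request distinct from a voice-cloning request.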
Performance Notes
- Offers ultra-low latency (under 200 ms) for production use.
- Outputs are watermarked using Resemble AI's Perth (Perceptual Threshold) Watermarker, which is robust against audio editing and compression.
Inputs (2)
- Prompt (String): Text to synthesize. Multi Input; Min: 0, Max: 100.
- Audio Prompt (String, optional): Path to the reference audio file. Min: 0, Max: 100.
Parameters (5)
- Seed (Number): Seed (0 for random). Default: -1.
- Prompt (String): Text to synthesize. Default: (empty).
- CFG Weight (Number): CFG/Pace weight. Default: 0.5.
- Temperature (Number): Temperature. Default: 0.8.
- Exaggeration (Number): Exaggeration (Neutral = 0.5, extreme values can be unstable). Default: 0.5.
Outputs (1)
- Output (Inferred): Output
Type: Node
Status: Official
Package: Nodespell AI
Category: AI / Audio / Resemble Ai
Input: Text, Audio
Output: Audio
Keywords: Text To Speech, Voice Cloning, Length Control