Resemble AI Chatterbox

Official

Generate expressive, natural speech with emotion control and voice cloning.

Nodespell AI
AI / Audio / Resemble Ai

Model Overview

Chatterbox is a production-grade, open-source Text-to-Speech (TTS) model that generates expressive, natural-sounding speech. It stands out for its emotion control and its ability to clone a voice instantly from a short audio sample. Its output is also watermarked, which supports responsible use of AI-generated audio.

Best At

  • Generating high-quality, natural-sounding speech from text.
  • Voice cloning from short audio samples for a personalized touch.
  • Fine-tuning speech expressiveness through emotion and exaggeration controls.
  • Applications requiring natural voiceovers like memes, videos, games, and AI agents.

Limitations / Not Good At

  • Extreme values for exaggeration can produce unstable results.
  • Not designed for generating music or sound effects.

Ideal Use Cases

  • Creating voiceovers for videos and presentations.
  • Developing AI agents and chatbots with natural conversational voices.
  • Generating audio content for games and interactive media.
  • Rapid prototyping of voice applications using voice cloning.
  • Podcasting and audiobook narration.

Input & Output Format

  • Input: Text prompt (string), optional audio prompt (string URI for voice cloning), and various numerical parameters for fine-tuning (exaggeration, cfg_weight, temperature, seed).
  • Output: Synthesized speech as an audio file (string URI).
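
The inputs listed above can be gathered into a single payload before they are handed to the node. The following is a minimal sketch, not an official Nodespell or Resemble AI client: the `ChatterboxInputs` class and the `prompt` and `audio_prompt` key names are assumptions made for illustration, while the numeric parameter names and defaults are taken from this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChatterboxInputs:
    """Hypothetical container for the node's inputs (not an official type)."""

    prompt: str                         # text to synthesize
    audio_prompt: Optional[str] = None  # URI of reference audio for voice cloning
    exaggeration: float = 0.5           # 0.5 is documented as neutral
    cfg_weight: float = 0.5             # CFG/pace weight
    temperature: float = 0.8
    seed: int = -1                      # default listed on this page

    def to_payload(self) -> Dict[str, Any]:
        """Return a plain dict, leaving out the audio prompt when it is not set."""
        if not self.prompt.strip():
            raise ValueError("prompt must contain text to synthesize")
        payload: Dict[str, Any] = {
            "prompt": self.prompt,
            "exaggeration": self.exaggeration,
            "cfg_weight": self.cfg_weight,
            "temperature": self.temperature,
            "seed": self.seed,
        }
        if self.audio_prompt is not None:
            payload["audio_prompt"] = self.audio_prompt
        return payload


payload = ChatterboxInputs(prompt="Hello from Chatterbox.").to_payload()
print(payload)
```

Leaving the audio prompt out of the payload when it is unset mirrors its optional status; whether the node expects a missing key or an empty string is not stated here, so treat that choice as an assumption.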

Performance Notes

  • Offers ultra-low latency (under 200 ms), suitable for production use.
  • Outputs are watermarked using Resemble AI's Perth (Perceptual Threshold) Watermarker, which is robust against audio editing and compression.

Inputs (2)

Prompt

String

Text to synthesize

Multi Input, Min: 0, Max: 100

Audio Prompt

String

Path to the reference audio file (Optional)

Min: 0, Max: 100

Parameters (5)

Seed

Number

Seed (0 for random)

Default: -1

Prompt

String

Text to synthesize

Default:

CFG Weight

Number

CFG/Pace weight

Default: 0.5

Temperature

Number

Temperature

Default: 0.8

Exaggeration

Number

Exaggeration (Neutral = 0.5, extreme values can be unstable)

Default: 0.5
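
Because extreme exaggeration values are described as potentially unstable, a caller may want to flag values far from the neutral 0.5 before synthesis. This is a minimal sketch under stated assumptions: the page gives no numeric bounds, so the `WARN_DISTANCE` threshold and the `check_exaggeration` helper are hypothetical and should be tuned by listening to actual output.

```python
NEUTRAL_EXAGGERATION = 0.5  # neutral value documented for this parameter
WARN_DISTANCE = 0.5         # hypothetical threshold, not taken from the node's documentation


def check_exaggeration(value: float) -> str:
    """Return "ok", or a warning when the value is far from neutral."""
    distance = abs(value - NEUTRAL_EXAGGERATION)
    if distance > WARN_DISTANCE:
        return (
            f"warning: exaggeration {value} is {distance:.2f} "
            "from neutral and may be unstable"
        )
    return "ok"


for candidate in (0.5, 0.7, 2.0):
    print(candidate, check_exaggeration(candidate))
```

The helper only reports; it does not clamp the value, so the decision to proceed with an unusual setting stays with the caller.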

Outputs (1)

Output

Inferred

Output

Nodespell

Type

Node

Status

Official

Package

Nodespell AI

Category

AI / Audio / Resemble Ai

Input

Text, Audio

Output

Audio

Keywords

Text To Speech, Voice Cloning, Length Control