Resemble AI Chatterbox

Official

Generate expressive, natural speech with emotion control and voice cloning.

Nodespell AI
AI / Audio / Resemble Ai

Model Overview

Chatterbox is a production-grade, open-source Text-to-Speech (TTS) model that generates expressive and natural-sounding speech. It stands out with its unique emotion control capabilities and the ability to perform instant voice cloning from short audio samples. It also features built-in watermarking for responsible AI.

Best At

  • Generating high-quality, natural-sounding speech from text.
  • Voice cloning from short audio samples for a personalized touch.
  • Fine-tuning speech expressiveness through emotion and exaggeration controls.
  • Producing natural voiceovers for memes, videos, games, and AI agents.

Limitations / Not Good At

  • Extreme values for exaggeration can lead to unstable results.
  • Not designed for generating music or sound effects.

Ideal Use Cases

  • Creating voiceovers for videos and presentations.
  • Developing AI agents and chatbots with natural conversational voices.
  • Generating audio content for games and interactive media.
  • Rapid prototyping of voice applications using voice cloning.
  • Podcasting and audiobook narration.

Input & Output Format

  • Input: Text prompt (string), optional audio prompt (string URI for voice cloning), and various numerical parameters for fine-tuning (exaggeration, cfg_weight, temperature, seed).
  • Output: Synthesized speech as an audio file (string URI).
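The inputs listed above can be sketched as a small Python container. This is only an illustration: the class name `ChatterboxInputs` and the method `to_payload` are hypothetical and not part of any Nodespell or Resemble AI API. The field names follow the parameter names on this page, and the defaults are the ones listed under Parameters (including the listed seed default of -1).

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ChatterboxInputs:
    """Hypothetical container mirroring the inputs listed on this page."""

    prompt: str                         # text to synthesize
    audio_prompt: Optional[str] = None  # URI of reference audio for voice cloning
    exaggeration: float = 0.5           # 0.5 is described as neutral
    cfg_weight: float = 0.5             # CFG/pace weight
    temperature: float = 0.8
    seed: int = -1                      # default as listed on this page

    def to_payload(self) -> dict:
        """Return a plain dict, leaving out audio_prompt when it is not set."""
        if not self.prompt.strip():
            raise ValueError("prompt must contain text to synthesize")
        payload = asdict(self)
        if self.audio_prompt is None:
            del payload["audio_prompt"]
        return payload


# Plain text-to-speech request (no voice cloning), overriding two defaults.
request = ChatterboxInputs(
    prompt="On this wall are the lighting references for the storm sequence.",
    temperature=0.7,
    exaggeration=0.45,
)
payload = request.to_payload()
```

Omitting `audio_prompt` rather than sending an empty value keeps the optional input truly optional; how the node itself treats a missing reference audio is not specified here.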

Performance Notes

  • Offers ultra-low latency (under 200 ms) for production use.
  • Outputs are watermarked using Resemble AI's Perth (Perceptual Threshold) Watermarker, which is robust against audio editing and compression.

Model Examples (3)

Example 01 of 03: Gallery guide continuation

Voice-clone continuation from a calm exhibition guide sample.

  • Prompt: On this wall are the lighting references for the storm sequence. Each blue-gray pass was timed to keep the actors readable without losing the feeling of incoming weather.
  • Audio Prompt: Example input
  • CFG Weight: 0.5
  • Temperature: 0.7
  • Exaggeration: 0.45
  • Tags: tts, voice-clone

Inputs (2)

  • Prompt (String): Text to synthesize. Multi Input. Min: 0, Max: 100.
  • Audio Prompt (String): Path to the reference audio file (optional). Min: 0, Max: 100.

Parameters (5)

  • Seed (Number): Seed (0 for random). Default: -1
  • Prompt (String): Text to synthesize. Default: (empty)
  • CFG Weight (Number): CFG/Pace weight. Default: 0.5
  • Temperature (Number): Temperature. Default: 0.8
  • Exaggeration (Number): Exaggeration (Neutral = 0.5, extreme values can be unstable). Default: 0.5

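Because the Exaggeration parameter is described as neutral at 0.5 and potentially unstable at extreme values, a caller may want to flag values that stray far from neutral before submitting a job. The sketch below is one way to do that. The function name `check_exaggeration` and the `max_offset` tolerance are assumptions made for illustration; this page does not state where instability begins.

```python
from typing import List

NEUTRAL_EXAGGERATION = 0.5  # neutral value given in the parameter description


def check_exaggeration(value: float, max_offset: float = 0.3) -> List[str]:
    """Return warnings for an exaggeration value far from neutral.

    max_offset is an assumed, caller-chosen tolerance, not a documented limit.
    """
    warnings: List[str] = []
    offset = abs(value - NEUTRAL_EXAGGERATION)
    if offset > max_offset:
        warnings.append(
            f"exaggeration={value} is {offset:.2f} away from neutral "
            f"({NEUTRAL_EXAGGERATION}); results may be unstable"
        )
    return warnings


# The example value used on this page stays close to neutral.
example_warnings = check_exaggeration(0.45)
# A deliberately extreme value triggers a warning under the assumed tolerance.
extreme_warnings = check_exaggeration(1.5)
```

Returning warnings instead of raising leaves the decision to the caller, which fits a parameter that is allowed to be extreme but carries a stability risk.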

Outputs (1)

  • Output (Inferred): Output

Creator: Nodespell (London)

Type: Node
Status: Official
Package: Nodespell AI
Category: AI / Audio / Resemble Ai
Input: Text, Audio
Output: Audio
Keywords: Text To Speech, Voice Cloning, Length Control