Resemble AI Chatterbox

Official

Generate expressive, natural speech with emotion control and voice cloning.

Nodespell AI

Model Overview

Chatterbox is a production-grade, open-source Text-to-Speech (TTS) model that generates expressive and natural-sounding speech. It stands out with its unique emotion control capabilities and the ability to perform instant voice cloning from short audio samples. It also features built-in watermarking for responsible AI.

Best At

Generating high-quality, natural-sounding speech from text.
Voice cloning from short audio samples for a personalized touch.
Fine-tuning speech expressiveness through emotion and exaggeration controls.
Applications requiring natural voiceovers like memes, videos, games, and AI agents.

Limitations / Not Good At

Extreme exaggeration values can produce unstable output.
Not designed for generating music or sound effects.

Ideal Use Cases

Creating voiceovers for videos and presentations.
Developing AI agents and chatbots with natural conversational voices.
Generating audio content for games and interactive media.
Rapid prototyping of voice applications using voice cloning.
Podcasting and audiobook narration.

Input & Output Format

Input: Text prompt (string), optional audio prompt (string URI for voice cloning), and various numerical parameters for fine-tuning (exaggeration, cfg_weight, temperature, seed).
Output: Synthesized speech as an audio file (string URI).
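
As a rough illustration of the input and output shapes described above, the Python sketch below assembles a request payload and checks that a returned value looks like a URI. The payload field names (prompt, audio_prompt) and both helper functions are hypothetical, chosen to mirror the parameter labels on this page; this is not the actual Nodespell or Resemble AI API.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Tuning parameters named in the input description above.
ALLOWED_PARAMS = {"exaggeration", "cfg_weight", "temperature", "seed"}


def build_request(
    prompt: str,
    audio_prompt: Optional[str] = None,
    **params: float,
) -> Dict[str, Any]:
    """Assemble a request payload (hypothetical field names, for illustration)."""
    if not prompt.strip():
        raise ValueError("prompt must be non-empty text")
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload: Dict[str, Any] = {"prompt": prompt, **params}
    # The audio prompt is optional; include it only when voice cloning is wanted.
    if audio_prompt is not None:
        payload["audio_prompt"] = audio_prompt
    return payload


def is_audio_uri(value: str) -> bool:
    """Return True if the value parses as a URI with a scheme and a host or path."""
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)
```

Rejecting unknown parameter names early keeps a typo such as cfgweight from being silently ignored.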

Performance Notes

Offers ultra-low latency (under 200 ms) for production use.
Outputs are watermarked using Resemble AI's Perth (Perceptual Threshold) Watermarker, which is robust against audio editing and compression.

Model Examples (3)

Example 01

Gallery guide continuation

Voice-clone continuation from a calm exhibition guide sample.

Source Inputs (2)

Prompt

On this wall are the lighting references for the storm sequence. Each blue-gray pass was timed to keep the actors readable without losing the feeling of incoming weather.

Audio Prompt

Parameters (4)

Prompt

On this wall are the lighting references for the storm sequence. Each blue-gray pass was timed to keep the actors readable without losing the feeling of incoming weather.

Cfg Weight

0.5

Temperature

0.7

Exaggeration

0.45

tts, voice-clone

Inputs (2)

Prompt

String

Text to synthesize

Multi Input, Min: 0, Max: 100

Audio Prompt

String

Path to the reference audio file (Optional)

Min: 0, Max: 100

Parameters (5)

Seed

Number

Seed (0 for random)

Default: -1

Prompt

String

Text to synthesize

Default:

CFG Weight

Number

CFG/Pace weight

Default: 0.5

Temperature

Number

Temperature

Default: 0.8

Exaggeration

Number

Exaggeration (Neutral = 0.5, extreme values can be unstable)

Default: 0.5
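
The defaults listed above can be collected in one place. The Python sketch below is a minimal illustration: the ChatterboxParams class is hypothetical, and the UNSTABLE_DISTANCE threshold is an assumption made for this example only, since the page says extreme exaggeration values can be unstable without giving a numeric limit.

```python
from dataclasses import asdict, dataclass
from typing import List

NEUTRAL_EXAGGERATION = 0.5  # "Neutral = 0.5" per the parameter description
UNSTABLE_DISTANCE = 0.5     # ASSUMPTION: illustrative threshold, not a documented limit


@dataclass(frozen=True)
class ChatterboxParams:
    """Numeric parameters with the defaults shown in the listing above."""

    seed: int = -1
    cfg_weight: float = 0.5
    temperature: float = 0.8
    exaggeration: float = 0.5

    def warnings(self) -> List[str]:
        """Flag settings that sit far from the neutral exaggeration value."""
        notes: List[str] = []
        if abs(self.exaggeration - NEUTRAL_EXAGGERATION) > UNSTABLE_DISTANCE:
            notes.append(
                f"exaggeration={self.exaggeration} is far from neutral "
                f"({NEUTRAL_EXAGGERATION}); output may be unstable"
            )
        return notes


# Settings from Example 01: only temperature and exaggeration differ from defaults.
example = ChatterboxParams(temperature=0.7, exaggeration=0.45)
print(asdict(example))
print(example.warnings())
```

Keeping the dataclass frozen means a parameter set cannot be changed after a run, which makes it easier to record exactly which settings produced a given audio file.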

Outputs (1)

Output

Inferred

Output

Nodespell

London

Type

Node

Status

Official

Package

Nodespell AI

Keywords

Text To Speech, Voice Cloning, Length Control
