Minimax Speech 02 Turbo

Official

Real-time Text-to-Audio synthesis with emotional expression and multilingual support.

Nodespell AI
AI / Audio / Minimax

Model Overview

Minimax Speech 02 Turbo is a Text-to-Audio (T2A) model designed for real-time applications. It offers high-quality voice synthesis, a wide range of emotional expressions, and extensive multilingual support.

Best At

This model excels at generating speech for real-time applications where low latency is crucial. It also produces varied emotional tones and supports more than 30 languages with native accents.

Limitations / Not Good At

Because it is optimized for speed, the 'turbo' version may not match the fidelity of specialized high-definition models for applications such as audiobooks. Long inputs (the limit is 5000 characters) may add some latency.
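Since long inputs may add latency and a single request is capped at 5000 characters, one option is to split long text into smaller pieces before sending it. The following is a minimal sketch of that idea, not part of the node itself: the function name `split_for_tts` is hypothetical, and splitting at sentence-ending punctuation is an assumption about what sounds natural.

```python
import re

MAX_CHARS = 5000  # per-request limit stated in the Text parameter description


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit < 1:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the Text parameter in its own run. A smaller `limit` than 5000 may be worth trying when latency matters more than the number of requests.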

Ideal Use Cases

  • Real-time voice assistants and chatbots 🤖
  • Dynamic character voices for games 🎮
  • Instantaneous audio feedback in applications
  • Live narration for streams or events
  • Multilingual customer support audio

Input & Output Format

Text prompt → Audio file (URI)

Performance Notes

The model is designed for low latency, which suits real-time interaction. Controls for speed, pitch, volume, and emotion let you fine-tune the output.

Inputs (1)

Text

String

Text to convert to speech

Multi Input, Min: 0, Max: 100

Parameters (11)

Text

String

Text to convert to speech. Each character counts as 1 token, and the maximum is 5000 characters. Insert <#x#> between words to add a pause of x seconds (0.01 to 99.99), for example <#0.5#>.

Default: (empty)
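The pause syntax described for the Text parameter can be generated and checked with a few small helpers. This is a sketch under assumptions: the helper names are hypothetical, and formatting the duration with two decimals and surrounding the marker with spaces are choices made here, not requirements stated by the node.

```python
import re

# Matches a pause marker such as <#0.5#>; the number is the pause in seconds.
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def pause(seconds: float) -> str:
    """Return a pause marker, checking the documented 0.01-99.99 s range."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def join_with_pauses(parts: list[str], seconds: float) -> str:
    """Join text fragments with the same pause between each pair."""
    return f" {pause(seconds)} ".join(parts)


def pause_durations(text: str) -> list[float]:
    """List the pause durations (in seconds) found in a text, in order."""
    return [float(value) for value in PAUSE_MARKER.findall(text)]
```

For example, `join_with_pauses(["Hello", "world"], 1.5)` produces a string with a 1.5 second pause between the two words, which can be used as the Text value.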

Pitch

Number

Speech pitch

Default: 0

Speed

Number

Speech speed

Default: 1

Volume

Number

Speech volume

Default: 1

Bitrate

Number

Bitrate for the generated speech

Default: 128000

Channel

String

Number of audio channels

Default: mono

Emotion

String

Speech emotion

Default: auto

Voice Id

String

Desired voice ID. Use a voice ID you have trained (https://replicate.com/minimax/voice-cloning), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl

Default: Wise_Woman

Sample Rate

Number

Sample rate for the generated speech

Default: 32000

Language Boost

String

Enhance recognition of specific languages and dialects

Default: None

English Normalization

Boolean

Enable English text normalization for better number reading (slightly increases latency)

Default: false
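
The eleven parameters and their defaults can be collected in one place before they are handed to the node. The following sketch assumes snake_case field names (for example `voice_id`, `sample_rate`), which are not confirmed by this page, and uses a hypothetical `SpeechParams` class; only the defaults and the 5000 character limit come from the descriptions above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpeechParams:
    """Defaults mirror the parameter list; field names are assumed, not official."""
    text: str = ""
    pitch: float = 0
    speed: float = 1
    volume: float = 1
    bitrate: int = 128000
    channel: str = "mono"
    emotion: str = "auto"
    voice_id: str = "Wise_Woman"
    sample_rate: int = 32000
    language_boost: str = "None"  # the page lists the default as the string None
    english_normalization: bool = False

    def __post_init__(self) -> None:
        # The only hard limit stated on the page is the text length.
        if len(self.text) > 5000:
            raise ValueError("text exceeds the 5000 character maximum")

    def to_input(self) -> dict:
        """Return all eleven parameters as a plain dictionary."""
        return asdict(self)
```

For example, `SpeechParams(text="Hello", emotion="auto", speed=1.2).to_input()` yields a dictionary with every parameter filled in, so overrides stay explicit and everything else falls back to the documented default.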
Outputs (1)

Output

Inferred

Output

Nodespell

Type

Node

Status

Official

Package

Nodespell AI

Category

AI / Audio / Minimax

Input

Text

Output

Audio

Keywords

Text To Speech, Voice Cloning, Real Time