Minimax Speech 02 Turbo

Official

Real-time Text-to-Audio synthesis with emotional expression and multilingual support.

Nodespell AI
AI / Audio / Minimax

Model Overview

A powerful Text-to-Audio (T2A) model designed for real-time applications, offering high-quality voice synthesis, a wide range of emotional expressions, and extensive multilingual capabilities.

Best At

This model excels at generating speech for real-time applications where low latency is crucial. It is also capable of producing a wide range of emotional tones and supports over 30 languages with native accents.

Limitations / Not Good At

While optimized for speed, the 'turbo' version may not match the fidelity of specialized high-definition models for applications such as audiobooks. Long inputs approaching the 5000-character maximum may also add some latency.
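For text longer than the per-request maximum, one option is to split it at sentence boundaries and synthesize each piece separately. The sketch below assumes a 5000-character limit per request and a naive regex sentence splitter; `chunk_text` is a hypothetical helper, not part of this node.

```python
import re

# Per-request limit stated in the Text parameter description.
MAX_CHARS = 5000


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most max_chars, preferring sentence ends.

    Hypothetical helper. Sentences are detected naively: '.', '!' or '?'
    followed by whitespace. A single sentence longer than max_chars is
    hard-split at the character limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Try to append the sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own request and the resulting audio files joined; how to join them, and whether to add a pause between them, is left to the caller.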

Ideal Use Cases

  • Real-time voice assistants and chatbots 🤖
  • Dynamic character voices for games 🎮
  • Instantaneous audio feedback in applications
  • Live narration for streams or events
  • Multilingual customer support audio

Input & Output Format

Text prompt → Audio file (URI)

Performance Notes

Designed for low latency, making it ideal for real-time interactions. Offers controls for speed, pitch, volume, and emotion to fine-tune the output.
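As a rough illustration of these controls, the sketch below assembles an input payload from the defaults listed under Parameters. The snake_case keys (`voice_id`, `sample_rate`, and so on) and the `build_speech_input` helper are assumptions for illustration, not a confirmed request schema; Language Boost is left out because its accepted values are not listed here.

```python
from typing import Any

# Defaults as listed in this node's Parameters section.
# Key names are assumed snake_case versions of the displayed labels.
DEFAULTS: dict[str, Any] = {
    "voice_id": "Wise_Woman",
    "emotion": "auto",
    "speed": 1,
    "pitch": 0,
    "volume": 1,
    "channel": "mono",
    "sample_rate": 32000,
    "bitrate": 128000,
    "english_normalization": False,
}


def build_speech_input(text: str, **overrides: Any) -> dict[str, Any]:
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > 5000:
        raise ValueError("text exceeds the 5000-character limit")
    # Reject parameter names that are not in the assumed schema.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    return {"text": text, **DEFAULTS, **overrides}
```

For example, `build_speech_input("Hello", voice_id="Deep_Voice_Man", emotion="neutral")` keeps every other control at its default.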

Model Examples (4)

Example 01

Prestige-series teaser

Trailer-style narration for a dramatic series promo.

Source Inputs (1)
Text

At first they called it an accident. Then the dailies came back. Every frame showed the same door, open three inches wider than before. This autumn, the footage tells its own story.

Parameters (9)
Text: same as the source input above
Voice Id: Deep_Voice_Man
Emotion: neutral
Speed: 1
Pitch: 0
Volume: 1
Channel: mono
Sample Rate: 32000
Bitrate: 128000
Tags: tts, low-latency
Inputs (1)

Text

String

Text to convert to speech

Multi-input (min: 0, max: 100)
Parameters (11)

Text

String

Text to convert to speech. Each character counts as 1 token, up to a maximum of 5000 characters. Insert <#x#> between words to add a pause of x seconds (0.01-99.99 s).

Default:
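The pause markup and the character-based token count can be illustrated with a small helper. This sketch assumes markers are written with two decimal places and placed directly between phrases without extra spaces, and that the marker characters themselves count toward the token total; `pause`, `join_with_pauses`, and `token_count` are hypothetical names.

```python
def pause(seconds: float) -> str:
    """Return a <#x#> pause marker for the given duration (hypothetical helper)."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    # Two decimal places is an assumption based on the documented range.
    return f"<#{seconds:.2f}#>"


def join_with_pauses(phrases: list[str], seconds: float) -> str:
    """Join phrases with the same pause marker between each pair."""
    return pause(seconds).join(phrases)


def token_count(text: str) -> int:
    """One token per character, as stated for the Text parameter."""
    return len(text)
```

For example, `join_with_pauses(["At first.", "Then."], 1.5)` yields `At first.<#1.50#>Then.`, which is 22 characters and therefore 22 tokens under the stated rule.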

Pitch

Number

Speech pitch

Default: 0

Speed

Number

Speech speed

Default: 1

Volume

Number

Speech volume

Default: 1

Bitrate

Number

Bitrate of the generated audio, in bits per second

Default: 128000
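Assuming the bitrate is in bits per second and the output is a constant-bitrate compressed format (such as MP3), the file size can be estimated as bitrate × duration / 8. Container headers and metadata are ignored, so real files will differ slightly; `estimated_size_bytes` is a hypothetical helper.

```python
def estimated_size_bytes(duration_s: float, bitrate_bps: int = 128_000) -> int:
    """Approximate size of constant-bitrate audio: bits per second * seconds / 8.

    Hypothetical helper; ignores headers, metadata and any variable bitrate.
    """
    return round(bitrate_bps * duration_s / 8)
```

At the default 128000 bps, a 10-second clip comes to about 160,000 bytes. Under this assumption the sample rate does not enter the estimate; it would matter only for uncompressed output.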

Channel

String

Audio channel configuration (e.g. mono)

Default: mono

Emotion

String

Speech emotion

Default: auto

Voice Id

String

Desired voice ID. Use a voice ID you have trained (https://replicate.com/minimax/voice-cloning), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl

Default: Wise_Woman
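A client could check whether a voice ID refers to one of the system voices listed above or to a custom trained voice. This sketch assumes case-sensitive matching. It does not verify that a custom ID actually exists on the account, and `voice_kind` is a hypothetical helper.

```python
# System voice IDs as listed in the Voice Id description.
SYSTEM_VOICES = frozenset({
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
})


def voice_kind(voice_id: str) -> str:
    """Classify a voice ID as 'system' or 'custom' (assumed case-sensitive)."""
    if not voice_id:
        raise ValueError("voice_id must not be empty")
    return "system" if voice_id in SYSTEM_VOICES else "custom"
```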

Sample Rate

Number

Sample rate of the generated audio, in Hz

Default: 32000

Language Boost

String

Enhance recognition of specific languages and dialects

Default: None

English Normalization

Boolean

Enable English text normalization for better number reading (slightly increases latency)

Default: false
Outputs (1)

Output

Inferred

Output

Nodespell

London

Building the future. Join us!

Type

Node

Status

Official

Package

Nodespell AI

Category

AI / Audio / Minimax

Input

Text

Output

Audio

Keywords

Text To Speech, Voice Cloning, Real Time