Gemini 3.1 Flash TTS

Name: Gemini 3.1 Flash TTS
Author: Nodespell Team
Status: Official
Package: Nodespell AI
Category: AI / Audio / Google

Generate expressive speech from text with optional multi-speaker voice configuration.


Model Overview

Gemini 3.1 Flash TTS converts text prompts into speech audio. It supports single-speaker synthesis as well as multi-speaker scripts, in which each line's speaker prefix (for example, 'Host' in 'Host: Welcome back') must match a configured speaker alias.
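To make the alias-matching rule concrete, here is a minimal Python sketch that splits a script into (alias, text) turns and reports any prefix that has no configured speaker alias. The helper names split_turns and find_unmatched_prefixes, and the assumption that each turn is a single line of the form 'Alias: text', are illustrative only; they are modeled on the 'Host: Welcome back' example and are not the node's actual parser.

```python
import re

# Hypothetical pattern: assumes a turn is one line of the form "Alias: text",
# where the alias is letters and digits with no spaces. Not the node's real parser.
TURN_PATTERN = re.compile(r"^([A-Za-z0-9]+):\s*(.*)$")


def split_turns(script: str) -> list[tuple[str, str]]:
    """Return (alias, text) pairs for every line that starts with an alias prefix."""
    turns = []
    for line in script.splitlines():
        match = TURN_PATTERN.match(line.strip())
        if match:
            turns.append((match.group(1), match.group(2)))
    return turns


def find_unmatched_prefixes(script: str, aliases: set[str]) -> set[str]:
    """Return prefixes used in the script that have no configured speaker alias."""
    return {alias for alias, _ in split_turns(script)} - aliases


script = """Host: Welcome back to the show.
Guest: Thanks for having me.
Narrator: Meanwhile, downstairs, the lights went out."""

print(find_unmatched_prefixes(script, {"Host", "Guest"}))  # {'Narrator'}
```

Lines without a recognized prefix are skipped by this sketch, and ordinary prose that happens to contain a colon (for example 'Note: keep going') would be misread as a speaker turn. That is one reason inconsistently tagged scripts may need editing before synthesis.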

Best At

Dialogue-style text-to-speech with named speakers.
Expressive narration controlled by natural-language style instructions.
Speech prompts that include inline delivery cues for pacing, tone, and emotion.

Limitations / Not Good At

Speaker aliases must match the prefixes used in the prompt.
The single-speaker voice setting is ignored when speaker groups are configured.
Very long or inconsistently tagged scripts may need editing before synthesis.

Ideal Use Cases

Podcast or interview drafts with two or more named speakers.
Narration, explainer audio, and character dialogue.
Prototyping multilingual or expressive speech workflows from text.

Input & Output Format

Input: required prompt, optional style instructions, optional single-speaker voice, optional language code, optional temperature, and zero or more speaker groups.
Output: generated speech as WAV audio, returned in the response.

Performance Notes

Multi-speaker synthesis is enabled only when at least one speaker group is present.
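The mode-selection rule above, together with the note that the single-speaker voice is ignored when speaker groups are configured, can be modeled in a few lines. This is a hypothetical Python sketch: the Speaker and TTSConfig classes and the resolve_mode function are illustrative names, not part of the node. Only the defaults (Kore) and the voice names Kore and Alnilam come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    alias: str            # must match a prefix in the prompt, e.g. "Host"
    voice: str = "Kore"   # documented default for a speaker's voice


@dataclass
class TTSConfig:
    prompt: str
    voice: str = "Kore"   # single-speaker voice; ignored when speakers are present
    speakers: list[Speaker] = field(default_factory=list)


def resolve_mode(config: TTSConfig) -> tuple[str, list[str]]:
    """Return the synthesis mode and the voice presets that would be used."""
    if config.speakers:
        # At least one speaker group: multi-speaker mode, top-level voice ignored.
        return "multi-speaker", [s.voice for s in config.speakers]
    # No speaker groups: single-speaker synthesis with the top-level voice.
    return "single-speaker", [config.voice]


single = TTSConfig(prompt="At first they called it an accident.", voice="Alnilam")
multi = TTSConfig(
    prompt="Host: Welcome back\nGuest: Thanks for having me.",
    voice="Alnilam",
    speakers=[Speaker("Host"), Speaker("Guest")],
)

print(resolve_mode(single))  # ('single-speaker', ['Alnilam'])
print(resolve_mode(multi))   # ('multi-speaker', ['Kore', 'Kore'])
```

In the second configuration, Alnilam is set as the single-speaker voice but never used, because two speaker groups are present.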

Model Examples (4)


Example 01

Prestige-series teaser

Trailer-style narration for a dramatic series promo.


Source Inputs (1)

Prompt

At first they called it an accident. Then the dailies came back. Every frame showed the same door, open three inches wider than before. This autumn, the footage tells its own story.

Parameters (5)

Prompt

At first they called it an accident. Then the dailies came back. Every frame showed the same door, open three inches wider than before. This autumn, the footage tells its own story.

Style Instructions

Measured male trailer narration with restrained menace, clear enunciation, and deliberate breath control.

Voice

Alnilam

Language Code

English (US)

Temperature

0.7

Tags: tts, trailer, narration


Inputs (1)

Prompt

String

Text to convert to speech. Use speaker prefixes that match configured speaker aliases for multi-speaker scripts.

Multi input (min: 0, max: 100)

Parameters (8)

Prompt

String

Text to convert to speech. Use speaker prefixes that match configured speaker aliases for multi-speaker scripts.

Required

Default:

Style Instructions

String

Optional natural-language instructions for tone, pace, accent, emotion, or delivery style.

Default: Say the following.

Voice

String

Voice preset for single-speaker synthesis. Ignored when speaker groups are configured.

Default: Kore

Language Code

String

Optional language hint, such as English (US), Japanese (Japan), or Chinese Mandarin (China). Clear the field to let the language be auto-detected.

Default: English (US)

Speakers

Inferred

Repeatable speaker configurations for multi-speaker synthesis. Each speaker alias must match a prefix used in the prompt.

Speaker Alias

String

Alias used in the prompt, for example 'Host' in 'Host: Welcome back'. Use alphanumeric text without spaces.

Required

Default:

Voice

String

Voice preset for this speaker.

Required

Default: Kore

Temperature

Number

Controls delivery variation. Lower values are more predictable; higher values are more varied.

Default: 1
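The Speaker Alias rule (alphanumeric text without spaces) is easy to check before running a workflow. The following Python sketch is a hypothetical pre-flight check, not part of the node; it assumes that "alphanumeric" means ASCII letters and digits only, which this page does not state explicitly. Empty aliases are also rejected, since the field is required.

```python
def is_valid_alias(alias: str) -> bool:
    """Check the documented alias rule: alphanumeric text without spaces.

    Assumption: "alphanumeric" is read as ASCII letters and digits only.
    An empty string fails because str.isalnum() returns False for it.
    """
    return alias.isascii() and alias.isalnum()


def invalid_aliases(aliases: list[str]) -> list[str]:
    """Return the aliases that break the rule, in their original order."""
    return [alias for alias in aliases if not is_valid_alias(alias)]


print(invalid_aliases(["Host", "Guest 2", ""]))  # ['Guest 2', '']
```

A stricter check could also confirm that every alias appears as a prefix in the prompt, since unmatched aliases and prefixes are a common cause of multi-speaker scripts not synthesizing as intended.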

Outputs (1)

Audio

Inferred

Generated speech audio.


Type: Node


Keywords: Text To Speech, Multimodal Generation, Style Control, Length Control
