Gemini 3.1 Flash TTS

Official

Generate expressive speech from text with optional multi-speaker voice configuration.

Nodespell AI
AI / Audio / Google

Model Overview

Gemini 3.1 Flash TTS converts text prompts into speech audio. It supports single-speaker synthesis and multi-speaker scripts, in which each line begins with a prefix that matches a configured speaker alias (for example, 'Host' in 'Host: Welcome back').

Best At

  • Dialogue-style text-to-speech with named speakers.
  • Expressive narration controlled by natural-language style instructions.
  • Speech prompts that include inline delivery cues for pacing, tone, and emotion.

Limitations / Not Good At

  • Speaker aliases must match the prefixes used in the prompt.
  • The single-speaker voice setting is ignored when speaker groups are configured.
  • Very long or inconsistently tagged scripts may need editing before synthesis.
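
Because mismatched aliases are easy to miss in a long script, a pre-flight check can help before synthesis. The sketch below is not part of the node; it assumes a speaker line has the form 'Alias: text' with an alphanumeric alias and that matching is case-sensitive. The function name and regular expression are illustrative only.

```python
import re

# Assumption: a speaker line looks like "Alias: text", with an alphanumeric alias.
PREFIX_RE = re.compile(r"^\s*([A-Za-z0-9]+):\s")


def unmatched_prefixes(prompt: str, aliases: set[str]) -> list[str]:
    """Return prefixes used in the prompt that have no configured alias.

    Prefixes are reported once each, in the order they first appear.
    Matching is case-sensitive, which is an assumption about the node.
    """
    missing: list[str] = []
    for line in prompt.splitlines():
        match = PREFIX_RE.match(line)
        if not match:
            continue  # untagged line; nothing to check
        prefix = match.group(1)
        if prefix not in aliases and prefix not in missing:
            missing.append(prefix)
    return missing


script = "Host: Welcome back.\nGuest: Thanks for having me.\nhost: One more thing."
print(unmatched_prefixes(script, {"Host", "Guest"}))  # ['host']
```

Any ordinary line that happens to start with a word and a colon (such as 'Note: ...') is also reported, so treat the result as a list of lines to review rather than as definite errors.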

Ideal Use Cases

  • Podcast or interview drafts with two or more named speakers.
  • Narration, explainer audio, and character dialogue.
  • Prototyping multilingual or expressive speech workflows from text.

Input & Output Format

  • Input: required prompt, optional style instructions, optional single-speaker voice, optional language code, optional temperature, and zero or more speaker groups.
  • Output: generated speech as WAV audio, returned in the response.
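
Since the output is WAV audio, its basic properties can be inspected with Python's standard `wave` module once the bytes have been saved or downloaded. The sketch below assumes the audio is uncompressed PCM, which is what `wave` reads; the `make_silence` helper and its 24000 Hz rate only produce stand-in data for the demonstration and say nothing about the node's actual sample rate.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read header fields from in-memory WAV bytes (PCM assumed)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silence(seconds: float, rate: int = 24000) -> bytes:
    """Build a silent mono 16-bit WAV as stand-in data (rate is arbitrary)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


info = describe_wav(make_silence(0.5))
print(info["duration_s"])  # 0.5
```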

Performance Notes

Multi-speaker synthesis is enabled only when at least one speaker group is present.
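
The selection rule can be sketched as follows. This is a hypothetical model of the node's inputs, not its implementation: the class and field names are invented for illustration, the defaults are the ones listed on this page, and 'VoiceA' and 'VoiceB' are placeholder voice names.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    alias: str
    voice: str = "Kore"


@dataclass
class TtsInputs:
    # Field names are hypothetical; defaults mirror the documented ones.
    prompt: str
    style_instructions: str = "Say the following."
    voice: str = "Kore"
    language_code: str = "English (US)"
    temperature: float = 1.0
    speakers: list[Speaker] = field(default_factory=list)


def resolve_voices(inputs: TtsInputs) -> dict:
    """Pick single- or multi-speaker mode as described above."""
    if inputs.speakers:
        # At least one speaker group: the single-speaker voice is ignored.
        return {
            "mode": "multi",
            "voices": {s.alias: s.voice for s in inputs.speakers},
        }
    return {"mode": "single", "voice": inputs.voice}


solo = TtsInputs(prompt="Hello there.")
duo = TtsInputs(
    prompt="Host: Welcome back.\nGuest: Glad to be here.",
    speakers=[Speaker("Host", "VoiceA"), Speaker("Guest", "VoiceB")],
)
print(resolve_voices(solo))  # {'mode': 'single', 'voice': 'Kore'}
print(resolve_voices(duo)["mode"])  # multi
```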

Inputs (1)

Prompt

String

Text to convert to speech. Use speaker prefixes that match configured speaker aliases for multi-speaker scripts.

Multi Input
Min: 0
Max: 100

Parameters (8)

Prompt

String

Text to convert to speech. Use speaker prefixes that match configured speaker aliases for multi-speaker scripts.

Required
Default:

Style Instructions

String

Optional natural-language instructions for tone, pace, accent, emotion, or delivery style.

Default: Say the following.

Voice

String

Voice preset for single-speaker synthesis. Ignored when speaker groups are configured.

Default: Kore

Language Code

String

Optional language hint, such as English (US), Japanese (Japan), or Chinese Mandarin (China). Leave empty for auto-detect.

Default: English (US)

Speakers

Inferred

Repeatable speaker configs for multi-speaker synthesis. Speaker aliases must match prefixes in the prompt.

Speaker Alias

String

Alias used in the prompt, for example 'Host' in 'Host: Welcome back'. Use alphanumeric text without spaces.

Required
Default:
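
The alias rule can be checked ahead of time with a small helper. This is a sketch that reads 'alphanumeric text without spaces' as ASCII letters and digits only; the node may accept a wider character set, and the function name is illustrative.

```python
def is_valid_alias(alias: str) -> bool:
    """True for a non-empty alias made only of ASCII letters and digits."""
    # str.isalnum() is False for the empty string and for any string
    # containing spaces or punctuation; isascii() narrows it to ASCII.
    return alias.isascii() and alias.isalnum()


for candidate in ["Host", "Guest2", "Guest 2", ""]:
    print(repr(candidate), is_valid_alias(candidate))
```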

Voice

String

Voice preset for this speaker.

Required
Default: Kore

Temperature

Number

Controls delivery variation. Lower values are more predictable; higher values are more varied.

Default: 1

Outputs (1)

Audio

Inferred

Generated speech audio.

Nodespell Team

Type

Node

Status

Official

Package

Nodespell AI

Category

AI / Audio / Google

Input

Text

Output

Audio

Keywords

Text To Speech, Multimodal Generation, Style Control, Length Control