Gemini 3.1 Flash TTS

Official

Generate expressive speech from text with Gemini 3.1 Flash TTS, including optional multi-speaker voice configuration.

Nodespell AI
AI / Audio / Google

Model Overview

Gemini 3.1 Flash TTS converts text prompts into speech audio and supports inline delivery cues such as [laughing], [whispering], and [short pause]. It can synthesize with a single voice, or with a list of speakers whose configured aliases match the line prefixes in the prompt.
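
To see which cues a script carries before synthesis, the bracketed tags can be pulled out with a regular expression. This is a minimal sketch: `find_audio_tags` is a hypothetical helper, it assumes every cue is written as [some text], and it does not know which tags the model actually honors.

```python
import re

# Hypothetical helper: assumes each inline cue is written as [some text].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_audio_tags(prompt: str) -> list[str]:
    """Return the bracketed delivery cues in the order they appear."""
    return [tag.strip() for tag in TAG_PATTERN.findall(prompt)]


script = "Host: Welcome back. [short pause] DrChen: [excited] Gemini TTS is here."
print(find_audio_tags(script))  # ['short pause', 'excited']
```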

Best At

  • Dialogue-style text-to-speech with named speakers.
  • Expressive narration controlled by natural-language style instructions.
  • Speech prompts that use inline audio tags for pacing, tone, and delivery.

Limitations / Not Good At

  • Speaker aliases must match the prefixes used in the prompt.
  • The single-speaker voice setting is ignored when speaker groups are provided.
  • Very long or inconsistently tagged scripts may need editing before synthesis.
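
A mismatch between prompt prefixes and configured aliases can be caught before submitting a script. The sketch below is illustrative only: `check_speaker_aliases` is a hypothetical helper, and it assumes one speaker turn per line in the form `Alias: text`, so turns that share a line will not be detected.

```python
import re

# Assumption: each speaker turn starts its own line as "Alias: text".
PREFIX_PATTERN = re.compile(r"^\s*([A-Za-z0-9]+):", re.MULTILINE)


def check_speaker_aliases(prompt: str, aliases: list[str]) -> dict[str, set[str]]:
    """Compare line prefixes in the prompt with the configured speaker aliases."""
    prefixes = set(PREFIX_PATTERN.findall(prompt))
    configured = set(aliases)
    return {
        # Prefixes used in the prompt that have no speaker config.
        "unconfigured_prefixes": prefixes - configured,
        # Configured aliases that never appear as a prefix.
        "unused_aliases": configured - prefixes,
    }


script = "Host: Welcome back.\nDrChen: [excited] Thanks for having me."
print(check_speaker_aliases(script, ["Host", "Guest"]))
# {'unconfigured_prefixes': {'DrChen'}, 'unused_aliases': {'Guest'}}
```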

Ideal Use Cases

  • Podcast or interview drafts with two or more named speakers.
  • Narration, explainer audio, and character dialogue.
  • Prototyping multilingual or expressive speech workflows from text.

Input & Output Format

  • Input: a required prompt; optional style_instructions, single-speaker voice, language_code, and temperature; output_format; and zero or more speaker groups, each with a speaker_id and a voice.
  • Output: the generated audio, returned in the response.
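
The input shape can be illustrated by assembling a request as a plain Python dict. This is a sketch under assumptions: `build_tts_request` is hypothetical, and its keys reuse the field names listed above rather than a confirmed fal or Gemini API schema. It encodes the documented rule that the single-speaker voice is ignored once speaker groups are present.

```python
from typing import Any, Optional


def build_tts_request(
    prompt: str,
    speakers: Optional[list[tuple[str, str]]] = None,
    voice: str = "Kore",
    style_instructions: str = "",
    language_code: str = "",
    temperature: float = 1.0,
    output_format: str = "mp3",
) -> dict[str, Any]:
    """Assemble a request dict; key names are assumed, not a confirmed API schema."""
    request: dict[str, Any] = {
        "prompt": prompt,
        "temperature": temperature,
        "output_format": output_format,
    }
    # Empty optional strings are omitted (an empty language_code means auto-detect).
    if style_instructions:
        request["style_instructions"] = style_instructions
    if language_code:
        request["language_code"] = language_code
    if speakers:
        # Multi-speaker mode: the single-speaker voice is ignored, so it is not sent.
        request["speakers"] = [
            {"speaker_id": alias, "voice": speaker_voice}
            for alias, speaker_voice in speakers
        ]
    else:
        request["voice"] = voice
    return request
```

For example, `build_tts_request("Host: Welcome back.\nDrChen: Thanks.", speakers=[("Host", "Kore"), ("DrChen", "Puck")])` returns a dict with a `speakers` list and no top-level `voice`; `Puck` is assumed here to be an available preset.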

Performance Notes

  • Fal bills this model per 1,000 input characters, so cost grows with the length of the prompt.
  • Multi-speaker synthesis is enabled only when at least one speaker group is present.
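
Because billing is per 1,000 input characters, a rough estimate is simple arithmetic. This sketch assumes only the prompt is billed (inline tags such as [excited] included), counts characters with Python's `len`, and leaves the rate to the caller because this page does not state it; the `round_up` flag models an assumed billing in whole 1,000-character blocks.

```python
import math


def estimate_cost(prompt: str, price_per_1000_chars: float, round_up: bool = False) -> float:
    """Estimate synthesis cost from prompt length at a caller-supplied rate."""
    units = len(prompt) / 1000
    if round_up:
        # Assumption: partial blocks are billed as full 1,000-character blocks.
        units = math.ceil(units)
    return units * price_per_1000_chars
```

Check the current rate on fal's pricing page before relying on any estimate.
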

Inputs (1)

Prompt

String

Text to convert to speech. Use speaker prefixes that match configured speaker aliases for multi-speaker scripts.

Multi Input (Min: 0, Max: 100)

Parameters (9)

Prompt

String

Text to convert to speech. Use speaker prefixes that match configured speaker aliases for multi-speaker scripts.

Required
Default: Host: Welcome back. DrChen: [excited] Gemini TTS can now generate expressive multi-speaker dialogue from a script.

Style Instructions

String

Optional natural-language instructions for tone, pace, accent, emotion, or delivery style.

Default:

Voice

String

Voice preset for single-speaker synthesis. Ignored when speaker groups are configured.

Default: Kore

Language Code

String

Optional language hint, such as English (US), Japanese (Japan), or Chinese Mandarin (China). Leave empty to auto-detect the language.

Default:

Speakers

Inferred

Repeatable speaker configs for multi-speaker synthesis. Speaker aliases must match prefixes in the prompt.

Speaker Alias

String

Alias used in the prompt, for example 'Host' in 'Host: Welcome back'. Use alphanumeric text without spaces.

Required
Default:

Voice

String

Voice preset for this speaker.

Required
Default: Kore
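
The alias rule ("alphanumeric text without spaces") can be checked up front. A minimal sketch, assuming "alphanumeric" means ASCII letters and digits only; `is_valid_alias` is a hypothetical helper, and the node itself may accept a wider character set.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits, no spaces or punctuation.
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_alias(alias: str) -> bool:
    """Return True if the whole alias consists of ASCII letters and digits."""
    return ALIAS_PATTERN.fullmatch(alias) is not None


print(is_valid_alias("DrChen"))   # True
print(is_valid_alias("Dr Chen"))  # False
```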

Temperature

Number

Controls delivery variation. Lower values are more predictable; higher values are more varied.

Default: 1

Output Format

String

Generated audio file format.

Default: mp3

Outputs (1)

Audio

Inferred

Generated speech audio.

Nodespell Team

Type

Node

Status

Official

Package

Nodespell AI

Category

AI / Audio / Google

Input

Text

Output

Audio

Keywords

Text To Speech, Multimodal Generation, Style Control, Length Control