Gemini 3.1 Flash TTS
Generate expressive speech from text with optional multi-speaker voice configuration.
Model Overview
Gemini 3.1 Flash TTS converts text prompts into speech audio. It supports single-speaker synthesis and multi-speaker scripts where prompt line prefixes match configured speaker aliases.
Best At
- Dialogue-style text-to-speech with named speakers.
- Expressive narration controlled by natural-language style instructions.
- Speech prompts that include inline delivery cues for pacing, tone, and emotion.
Limitations / Not Good At
- Speaker aliases must match the prefixes used in the prompt.
- The single-speaker voice setting is ignored when speaker groups are configured.
- Very long or inconsistently tagged scripts may need editing before synthesis.
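The alias-matching requirement can be checked before synthesis. The following is a minimal sketch, assuming prompt lines take the form `Alias: text` (as in the 'Host: Welcome back' example); `find_unmatched_prefixes` is a hypothetical helper and not part of the node itself.

```python
import re

# Assumed line format: "<Alias>: <text>", with an alphanumeric alias.
PREFIX_RE = re.compile(r"^([A-Za-z0-9]+):\s")


def find_unmatched_prefixes(prompt, aliases):
    """Return speaker prefixes used in the prompt that have no configured alias."""
    known = set(aliases)
    unmatched = []
    for line in prompt.splitlines():
        match = PREFIX_RE.match(line.strip())
        if not match:
            continue  # line has no speaker prefix; nothing to check
        prefix = match.group(1)
        if prefix not in known and prefix not in unmatched:
            unmatched.append(prefix)
    return unmatched


script = (
    "Host: Welcome back.\n"
    "Guest: Thanks for having me.\n"
    "Narrator: They shook hands."
)
print(find_unmatched_prefixes(script, ["Host", "Guest"]))  # ['Narrator']
```

An empty result suggests every prefixed line maps to a configured speaker; any name it returns would need either a matching speaker group or an edit to the script.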
Ideal Use Cases
- Podcast or interview drafts with two or more named speakers.
- Narration, explainer audio, and character dialogue.
- Prototyping multilingual or expressive speech workflows from text.
Input & Output Format
- Input: required prompt, optional style instructions, optional single-speaker voice, optional language code, optional temperature, and zero or more speaker groups.
- Output: generated WAV audio returned in the response.
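Because the output is WAV audio, its basic properties can be inspected with Python's standard `wave` module. This is a sketch only: how the WAV bytes are obtained from the node is not documented here, so a short silent clip is generated as a stand-in, and the 24000 Hz sample rate is a placeholder rather than a documented value.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read basic properties from in-memory WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }


def make_silent_wav(seconds: float = 1.0, rate: int = 24000) -> bytes:
    """Build a silent mono 16-bit WAV clip (stand-in for real node output)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(describe_wav(make_silent_wav(0.5)))
```

In practice, the bytes passed to `describe_wav` would come from the node's Audio output instead of `make_silent_wav`.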
Performance Notes
Multi-speaker synthesis is enabled only when at least one speaker group is present.
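The selection rule above, together with the fact that the single-speaker voice is ignored once speaker groups exist, can be sketched as follows. The class and function names (`SpeakerGroup`, `TtsInputs`, `plan_synthesis`) are hypothetical and only model the inputs described on this page; the default values mirror those shown for the fields and are assumptions, as is the second voice name `Puck`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeakerGroup:
    alias: str  # must match a prefix used in the prompt
    voice: str  # voice preset for this speaker


@dataclass
class TtsInputs:
    prompt: str
    style_instructions: str = ""
    voice: str = "Kore"        # assumed default, taken from the value shown on the page
    language_code: str = ""    # empty means auto-detect
    temperature: float = 1.0   # assumed default
    speakers: List[SpeakerGroup] = field(default_factory=list)


def plan_synthesis(inputs: TtsInputs) -> dict:
    """Pick the synthesis mode: multi-speaker only if a speaker group exists."""
    if inputs.speakers:
        # The single-speaker voice is ignored in this branch.
        return {
            "mode": "multi-speaker",
            "voices": {s.alias: s.voice for s in inputs.speakers},
        }
    return {"mode": "single-speaker", "voice": inputs.voice}


dialogue = TtsInputs(
    prompt="Host: Welcome back.\nGuest: Glad to be here.",
    speakers=[SpeakerGroup("Host", "Kore"), SpeakerGroup("Guest", "Puck")],
)
print(plan_synthesis(dialogue))
```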
Prompt
String. Text to convert to speech. For multi-speaker scripts, use speaker prefixes that match the configured speaker aliases.
Style Instructions
String. Optional natural-language instructions for tone, pace, accent, emotion, or delivery style.
Default: Say the following.
Voice
String. Voice preset for single-speaker synthesis. Ignored when speaker groups are configured.
Default: Kore
Language Code
String. Optional language hint, such as English (US), Japanese (Japan), or Chinese Mandarin (China). Leave empty for auto-detect.
Default: English (US)
Speakers
Inferred. Repeatable speaker configs for multi-speaker synthesis. Speaker aliases must match prefixes in the prompt.
Speaker Alias
String. Alias used in the prompt, for example 'Host' in 'Host: Welcome back'. Use alphanumeric text without spaces.
Voice
String. Voice preset for this speaker.
Default: Kore
Temperature
Number. Controls delivery variation. Lower values are more predictable; higher values are more varied.
Default: 1
Audio
Inferred. Generated speech audio.
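The Speaker Alias rule (alphanumeric text without spaces) can be checked before a speaker group is added. A minimal sketch, assuming Python's `str.isalnum` is an acceptable reading of "alphanumeric"; note that it also accepts non-ASCII letters and digits, and `is_valid_alias` is a hypothetical helper.

```python
def is_valid_alias(alias: str) -> bool:
    """True if the alias is non-empty and contains only letters and digits."""
    # str.isalnum() is False for the empty string and for any string that
    # contains spaces or punctuation.
    return alias.isalnum()


for candidate in ["Host", "Guest1", "Host 2", "Dr.Lee", ""]:
    print(repr(candidate), is_valid_alias(candidate))
```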
Nodespell Team
Type
Node
Status
Official
Package
Nodespell AI
Category
AI / Audio / Google