Kokoro 82M

Official

High-quality, efficient text-to-speech model (82M parameters) based on StyleTTS2.

Model Overview

A text-to-speech (TTS) model that generates natural-sounding speech from text, based on the StyleTTS2 architecture with 82 million parameters.

Best At

Generating clear, expressive speech in multiple languages, including American English, British English, French, Hindi, Italian, and Japanese. At 82M parameters, it balances output quality against inference cost.

Limitations / Not Good At

Although the model supports multiple languages, voice quality and availability vary by language. Some languages have fewer voices or fewer hours of training audio, which can make the synthesized speech sound less natural.

Ideal Use Cases

Creating voiceovers for videos and presentations
Generating audiobooks or podcast segments
Developing interactive voice response (IVR) systems
Accessibility tools for content creators
Prototyping voice applications

Input & Output Format

Input: Text (string), Voice (string, optional), Speed (number, optional)
Output: URI of the generated audio file (e.g., WAV)

Performance Notes

The model's small size (82M parameters) makes it fast and inexpensive to run. Long text inputs are split automatically, so they do not need to be chunked by hand.
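
This page does not describe how the automatic splitting works. As an illustration only, the sketch below shows one common approach: greedily packing whole sentences into chunks under a character limit. The function name `split_text` and the `max_chars` limit are assumptions made for this example, not part of the node.

```python
import re


def split_text(text: str, max_chars: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Hypothetical helper: the real splitter used by the node is not documented.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Either the sentence fits, or it is a single over-long sentence
            # that is kept whole rather than cut mid-sentence.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = (
        "At first they called it an accident. Then the dailies came back. "
        "Every frame showed the same door, open three inches wider than before."
    )
    for chunk in split_text(sample, max_chars=80):
        print(repr(chunk))
```

Splitting on sentence boundaries keeps each chunk a natural unit of speech, which tends to matter more for TTS than hitting an exact chunk size.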

Model Examples (3)

Example 01

Prestige-series teaser

Trailer-style narration for a dramatic series promo.

Parameters (3)

Text

At first they called it an accident. Then the dailies came back. Every frame showed the same door, open three inches wider than before. This autumn, the footage tells its own story.

Voice

bm_george

Speed

0.95

Tags: tts, trailer

Inputs (1)

Text

String

Text to convert to speech

Multi Input, Min: 0, Max: 100

Parameters (3)

Text

String

Text input (long text is automatically split)

Default: (empty)

Speed

Number

Speech speed multiplier (0.5 = half speed, 2.0 = double speed)

Default: 1
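
To make the multiplier concrete, the sketch below estimates how Speed changes clip length. It assumes the multiplier scales speaking rate linearly, so that duration is roughly the normal-speed duration divided by Speed; the page does not confirm this, and the function name `estimated_duration` is hypothetical.

```python
def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length at a given speed multiplier.

    Assumes duration scales as 1 / speed (an assumption, not documented).
    """
    if speed <= 0:
        raise ValueError("speed must be a positive multiplier")
    return base_seconds / speed


if __name__ == "__main__":
    # A clip that takes 10 s at the default speed of 1:
    for speed in (0.5, 1.0, 2.0):
        print(speed, estimated_duration(10.0, speed))
```

Under that assumption, a Speed slightly below 1 lengthens the clip by a correspondingly small amount.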

Voice

String

Voice to use for synthesis

Default: af_bella
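
The three parameters and their defaults can be collected in a small container, as in the sketch below. The class `KokoroParams` and the helper `describe_voice` are hypothetical names for this example and not part of any Nodespell API. The voice-ID reading (first letter for language, second for gender) follows the naming used in the upstream Kokoro voice list and is an assumption here, since this page does not state it; only the prefixes seen on this page are mapped.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class KokoroParams:
    """Hypothetical container mirroring the three documented parameters."""

    text: str = ""           # the page lists no default value for Text
    speed: float = 1.0       # documented default: 1
    voice: str = "af_bella"  # documented default: af_bella

    def __post_init__(self) -> None:
        # The page gives 0.5 and 2.0 as examples, not as limits, so only
        # non-positive multipliers are rejected here.
        if self.speed <= 0:
            raise ValueError("speed must be a positive multiplier")
        if not self.voice:
            raise ValueError("voice must be a non-empty string")


# Assumed convention, not stated on this page: "<language><gender>_<name>".
LANGUAGES = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice: str) -> str:
    """Return a readable description of a voice ID, or "unknown"."""
    prefix, _, name = voice.partition("_")
    if len(prefix) != 2 or not name:
        return "unknown"
    language = LANGUAGES.get(prefix[0])
    gender = GENDERS.get(prefix[1])
    if language is None or gender is None:
        return "unknown"
    return f"{language}, {gender} ({name})"


if __name__ == "__main__":
    params = KokoroParams(text="Hello there.", voice="bm_george", speed=0.95)
    print(asdict(params))
    print(describe_voice(params.voice))
```

Freezing the dataclass keeps a validated parameter set from being changed after construction.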

Outputs (1)

Output

Inferred

Output

Nodespell

London

nodespell.com nodespell.app NodespellAI

Type

Node

Status

Official

Package

Nodespell AI

Keywords

Text To Speech
