MMAudio V2

Official

AI model that synthesizes high-quality audio from video content, enabling seamless video-to-audio transformation.

Nodespell AI
AI / Audio / Mmaudio

Model Overview

MMAudio V2 generates audio from video content, optionally guided by a text prompt. It analyzes the visual information in the input and produces audio that fits what is on screen and stays consistent over time.

Best At

  • Generating high-fidelity audio that matches visual elements in videos.
  • Synchronizing generated sounds with events in the video.
  • Synthesizing environmental sounds and action-to-sound mappings.
  • Adding audio to silent films or enhancing existing video audio.

Limitations / Not Good At

  • Processing time increases with video length.
  • Complex acoustic environments or rapid scene changes may reduce output quality or require additional processing.
  • Output quality is dependent on the clarity and content of the input video.
  • Unique or highly specific sound effects might need specialized handling.

Ideal Use Cases

  • Film and video post-production to add sound effects or ambient audio.
  • Silent film restoration projects.
  • Enhancing educational videos with background sounds.
  • Creating soundscapes for games and VR experiences.
  • Improving accessibility of video content.

Input & Output Format

Input: Optional video file or image file (image-to-audio is experimental), text prompt, negative prompt, duration, seed, number of inference steps, and CFG strength.
Output: Audio file (URI).

Performance Notes

  • Processing time scales with video length and complexity.
  • Performance can vary with rapid scene changes in the input video.

Inputs (3)

Prompt

String

Text prompt for generated audio

Multi Input, Min: 0, Max: 100

Video

String

Optional video file for video-to-audio generation

Min: 0, Max: 100

Image

String

Optional image file for image-to-audio generation (experimental)

Min: 0, Max: 100

Parameters (6)

Seed

Number

Random seed. Use -1 or leave blank to randomize the seed

Default: -1
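
The seed rule above (-1 or blank means "randomize") can be sketched in a few lines. This is only an illustration of the described behavior, not the node's actual implementation; the function name `resolve_seed` and the upper bound `MAX_SEED` are assumptions, since the page does not state the valid seed range.

```python
import random
from typing import Optional

# Assumed upper bound for random seeds; the page does not document the range.
MAX_SEED = 2**32 - 1


def resolve_seed(seed: Optional[int] = -1) -> int:
    """Return a concrete seed, drawing a random one when seed is -1 or None (blank)."""
    if seed is None or seed == -1:
        return random.randint(0, MAX_SEED)
    return seed


fixed = resolve_seed(1234)  # an explicit seed is passed through unchanged
fresh = resolve_seed(-1)    # -1 yields a newly drawn seed on each call
```

Passing the same explicit seed with otherwise identical settings is the usual way to make a generation repeatable, assuming the node is deterministic for a fixed seed.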

Prompt

String

Text prompt for generated audio

Default:

Duration

Number

Duration of output in seconds

Default: 8

Num Steps

Number

Number of inference steps

Default: 25

CFG Strength

Number

Guidance strength (CFG)

Default: 4.5
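
For readers unfamiliar with guidance strength, the following sketch shows the standard classifier-free guidance (CFG) blend on plain Python lists. It assumes this node follows the common formulation; the page does not describe its internals, and the helper name `apply_cfg` is hypothetical. In many systems the negative prompt feeds the "unconditional" branch, which may or may not be the case here.

```python
from typing import List, Sequence


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], strength: float) -> List[float]:
    """Blend conditional and unconditional predictions: uncond + strength * (cond - uncond)."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + strength * (c - u) for c, u in zip(cond, uncond)]


# With strength 1.0 the result equals the conditional prediction;
# larger values push the result further away from the unconditional one.
guided = apply_cfg([2.0], [1.0], 4.5)
```

Under this formulation, higher CFG strength makes the output follow the prompt more closely, typically at some cost in variety.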

Negative Prompt

String

Negative prompt to avoid certain sounds

Default: music
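
As a quick reference, the inputs and parameter defaults listed above can be collected into one structure. This is a minimal sketch: the class name `MMAudioParams`, the snake_case field names, and the `to_payload` helper are hypothetical and do not reflect a documented Nodespell API. The empty default for the prompt and the validation bounds are also assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MMAudioParams:
    prompt: str = ""                # page shows an empty default (assumed empty string)
    video: Optional[str] = None     # optional video file reference
    image: Optional[str] = None     # optional image file reference (experimental)
    seed: int = -1                  # -1 means "randomize"
    duration: float = 8             # output length in seconds
    num_steps: int = 25             # number of inference steps
    cfg_strength: float = 4.5       # guidance strength (CFG)
    negative_prompt: str = "music"  # sounds to avoid

    def to_payload(self) -> dict:
        """Validate (assumed bounds) and return a dict without unset optional inputs."""
        if self.duration <= 0:
            raise ValueError("duration must be positive")
        if self.num_steps < 1:
            raise ValueError("num_steps must be at least 1")
        return {key: value for key, value in asdict(self).items() if value is not None}


payload = MMAudioParams(prompt="rain on a tin roof").to_payload()
```

Only the values are taken from the page; how they are actually passed to the node depends on the workflow editor.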

Outputs (1)

Output

Inferred

Output

Type

Node

Status

Official

Package

Nodespell AI

Category

AI / Audio / Mmaudio

Input

Video, Text

Output

Audio

Keywords

Video Edit, Sound Effect Generation, Audio Enhancement, Multimodal Generation, Conditional Generation, Length Control