Seedance 2.0 Reference To Video

Official

Seedance 2.0 reference-to-video generation using text plus optional image, video, and audio references.

Nodespell AI
AI / Video / Bytedance

Model Overview

Seedance 2.0 Reference To Video generates short video clips from a prompt and optional reference media. It accepts image, video, and audio references; outputs at 480p, 720p, or 1080p; supports durations of 4 to 15 seconds; and offers flexible aspect ratios and optional generated audio.

Best At

  • Multimodal video generation with reference images, videos, or audio.
  • Prompt-directed clips that borrow subject, style, or motion cues from supplied media.
  • Reference-heavy creative work where output quality matters more than fast iteration.

Limitations / Not Good At

  • Reference files must be aligned with prompt instructions to avoid conflicting guidance.
  • Video-reference pricing includes input video duration as well as output duration.
  • 1080p provides the highest fidelity, but costs more than 720p and 480p.

Ideal Use Cases

  • Creating video variations from visual and audio reference sets.
  • Producing short campaign, storyboard, and concept clips with multimodal guidance.
  • Generating final candidates after testing reference direction in faster modes.

Input & Output Format

  • Input: required prompt; optional image_urls, video_urls, audio_urls, aspect_ratio, duration, resolution, generate_audio, and seed.
  • Output: the URI of the generated video, returned in the response.
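
As a rough illustration of the input fields listed above, the following sketch assembles them into a plain dictionary. The field names and defaults come from this page; the helper name `build_request` and the choice to omit empty reference lists are assumptions, and no actual API call is made.

```python
from typing import Any, Optional, Sequence


def build_request(
    prompt: str,
    image_urls: Optional[Sequence[str]] = None,
    video_urls: Optional[Sequence[str]] = None,
    audio_urls: Optional[Sequence[str]] = None,
    aspect_ratio: str = "auto",
    duration: str = "auto",
    resolution: str = "720p",
    generate_audio: bool = True,
    seed: int = -1,
) -> dict[str, Any]:
    """Hypothetical helper: collect the node's inputs into one dict."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    payload: dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
        "generate_audio": generate_audio,
        "seed": seed,
    }
    # Assumption: optional reference lists are sent only when non-empty.
    references = (
        ("image_urls", image_urls),
        ("video_urls", video_urls),
        ("audio_urls", audio_urls),
    )
    for key, urls in references:
        if urls:
            payload[key] = list(urls)
    return payload


if __name__ == "__main__":
    print(build_request("A fox runs through snow", image_urls=["https://example.com/fox.png"]))
```

Keeping the payload as a plain dictionary makes it easy to inspect or log before handing it to whatever client actually submits the job.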

Performance Notes

  • Pricing scales with generated video size and duration.
  • Runs with video references also account for input video duration, with Fal applying its video-input multiplier.

Inputs (4)

Prompt

String

Text prompt describing the generated video and how to use references.

Required, Multi Input, Min: 0, Max: 100

Reference Images

String

Reference images for subject, style, or scene guidance. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12.

Multi Input, Min: 0, Max: 100

Reference Videos

String

Reference videos for motion, style, or scene guidance. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Resolution must be between ~480p and ~720p. Total files across all modalities must not exceed 12.

Multi Input, Min: 0, Max: 100
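
The video-reference limits can be checked before submitting a run. The sketch below is a minimal pre-flight check under stated assumptions: the `VideoRef` type and `check_video_refs` function are hypothetical, durations and sizes are supplied by the caller, "50 MB" is taken to mean 50,000,000 bytes, and the approximate resolution range is not checked because the page gives no exact bounds.

```python
from dataclasses import dataclass

MAX_VIDEOS = 3
MIN_TOTAL_SECONDS = 2.0
MAX_TOTAL_SECONDS = 15.0
MAX_TOTAL_BYTES = 50 * 1000 * 1000  # assumption: decimal megabytes
ALLOWED_EXTENSIONS = (".mp4", ".mov")


@dataclass
class VideoRef:
    """Hypothetical description of one reference video."""
    name: str
    seconds: float
    size_bytes: int


def check_video_refs(videos: list[VideoRef]) -> list[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems: list[str] = []
    if not videos:
        return problems  # video references are optional
    if len(videos) > MAX_VIDEOS:
        problems.append(f"too many videos: {len(videos)} > {MAX_VIDEOS}")
    for video in videos:
        if not video.name.lower().endswith(ALLOWED_EXTENSIONS):
            problems.append(f"unsupported format: {video.name}")
    total_seconds = sum(v.seconds for v in videos)
    if not MIN_TOTAL_SECONDS <= total_seconds <= MAX_TOTAL_SECONDS:
        problems.append(f"combined duration {total_seconds}s is outside 2-15s")
    total_bytes = sum(v.size_bytes for v in videos)
    if total_bytes >= MAX_TOTAL_BYTES:
        problems.append(f"combined size {total_bytes} bytes is not under 50 MB")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.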

Reference Audio

String

Reference audio for audio-guided video generation. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. At least one reference image or video is required when audio is provided. Total files across all modalities must not exceed 12.

Multi Input, Min: 0, Max: 100
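
The per-modality and cross-modality limits described for the three reference inputs can be expressed as a small check. This is only a sketch: the function name `check_reference_counts` is hypothetical, and it covers file counts and the audio dependency only, not file sizes, formats, or durations.

```python
from typing import Sequence

MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_TOTAL_FILES = 12


def check_reference_counts(
    image_urls: Sequence[str],
    video_urls: Sequence[str],
    audio_urls: Sequence[str],
) -> list[str]:
    """Return a list of problems with the reference counts; empty means OK."""
    problems: list[str] = []
    if len(image_urls) > MAX_IMAGES:
        problems.append(f"at most {MAX_IMAGES} reference images are allowed")
    if len(video_urls) > MAX_VIDEOS:
        problems.append(f"at most {MAX_VIDEOS} reference videos are allowed")
    if len(audio_urls) > MAX_AUDIO:
        problems.append(f"at most {MAX_AUDIO} reference audio files are allowed")
    total = len(image_urls) + len(video_urls) + len(audio_urls)
    if total > MAX_TOTAL_FILES:
        problems.append(f"total files {total} exceeds {MAX_TOTAL_FILES}")
    # Audio references need at least one image or video alongside them.
    if audio_urls and not (image_urls or video_urls):
        problems.append("audio requires at least one reference image or video")
    return problems
```

Note that the per-modality maximums add up to 15, so a set can satisfy each individual limit and still exceed the overall cap of 12 files.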

Parameters (6)

Prompt

String

Text prompt describing the generated video and how to use references.

Required
Default:

Duration

String

Video duration in seconds, or auto to let the model decide.

Default: auto

Resolution

String

Video resolution. 480p is cheaper; 720p balances cost and detail; 1080p is highest quality.

Default: 720p

Aspect Ratio

String

Aspect ratio of the generated video.

Default: auto

Generate Audio

Boolean

Generate synchronized audio for the video.

Default: true

Seed

Number

Random seed. Leave at -1 for a random result.
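
The parameter values above can be sanity-checked with a few small helpers. This is a sketch under assumptions: the helper names are hypothetical, and a numeric duration is assumed to be a whole number of seconds written as a string, within the 4-15 second range given in the Model Overview.

```python
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
MIN_SECONDS = 4
MAX_SECONDS = 15
RANDOM_SEED = -1  # the page says -1 requests a random result


def is_valid_duration(value: str) -> bool:
    """Accept "auto" or a whole number of seconds between 4 and 15."""
    text = value.strip().lower()
    if text == "auto":
        return True
    # Assumption: durations are whole seconds given as ASCII digits.
    if not (text.isascii() and text.isdigit()):
        return False
    return MIN_SECONDS <= int(text) <= MAX_SECONDS


def is_valid_resolution(value: str) -> bool:
    """Accept only the three resolutions listed on this page."""
    return value in ALLOWED_RESOLUTIONS


def is_random_seed(seed: int) -> bool:
    """True when the seed is left at -1, meaning a random result."""
    return seed == RANDOM_SEED
```

For example, `is_valid_duration("auto")` and `is_valid_duration("8")` are both accepted, while `is_valid_duration("3")` is rejected as too short.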

Outputs (1)

Output

Inferred

Generated video output.

Nodespell Team

Type: Node
Status: Official
Package: Nodespell AI
Category: AI / Video / Bytedance
Input: Text, Image, Video, Audio
Output: Video

Keywords (7)

Video Generation, Multimodal Generation, Conditional Generation, Prompt Conditioning, Aspect Control, Resolution Control