Seedance 2.0 Reference To Video
Seedance 2.0 reference-to-video generation using text plus optional image, video, and audio references.
Model Overview
Seedance 2.0 Reference To Video generates short video clips from a prompt and optional reference media. It supports image, video, and audio references; 480p, 720p, and 1080p output; durations from 4 to 15 seconds; flexible aspect ratios; and optional generated audio.
Best At
- Multimodal video generation with reference images, videos, or audio.
- Prompt-directed clips that borrow subject, style, or motion cues from supplied media.
- Reference-heavy creative work where output quality matters more than fast iteration.
Limitations / Not Good At
- Reference files must be aligned with prompt instructions to avoid conflicting guidance.
- Video-reference pricing includes input video duration as well as output duration.
- 1080p provides the highest fidelity, but costs more than 720p and 480p.
Ideal Use Cases
- Creating video variations from visual and audio reference sets.
- Producing short campaign, storyboard, and concept clips with multimodal guidance.
- Generating final candidates after testing reference direction in faster modes.
Input & Output Format
- Input: required prompt; optional image_urls, video_urls, audio_urls, aspect_ratio, duration, resolution, generate_audio, and seed.
- Output: generated video URI returned in the response.
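The field list above maps onto a single request object. The sketch below is a hypothetical illustration: build_request is not part of the node or any published client, and it assumes the node accepts a flat JSON-style object keyed by the field names listed here. Unset optional fields are left out so the node's own defaults apply.

```python
from typing import Any, Optional


def build_request(
    prompt: str,
    image_urls: Optional[list[str]] = None,
    video_urls: Optional[list[str]] = None,
    audio_urls: Optional[list[str]] = None,
    aspect_ratio: Optional[str] = None,
    duration: Optional[str] = None,
    resolution: Optional[str] = None,
    generate_audio: Optional[bool] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: prompt is required, every other field is optional."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    payload: dict[str, Any] = {"prompt": prompt}
    optional_fields = {
        "image_urls": image_urls,
        "video_urls": video_urls,
        "audio_urls": audio_urls,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
        "generate_audio": generate_audio,
        "seed": seed,
    }
    # Omit unset fields so the node's defaults apply instead of explicit nulls.
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload


# Example: one reference image, 1080p output, fixed seed.
request = build_request(
    "The woman from the reference photo walks through a rainy neon street",
    image_urls=["https://example.com/ref.png"],
    resolution="1080p",
    seed=42,
)
print(request)
```

Dropping unset keys rather than sending null values keeps the request minimal and avoids overriding defaults by accident.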
Performance Notes
- Pricing scales with generated video size and duration.
- Runs with video references are also billed for input video duration, with Fal applying its video-input multiplier.
Prompt
Type: String. Text prompt describing the generated video and how to use the references.
Reference Images
Type: String. Reference images for subject, style, or scene guidance. Supported formats: JPEG, PNG, WebP. Max 30 MB per image; up to 9 images. Total files across all modalities must not exceed 12.
Reference Videos
Type: String. Reference videos for motion, style, or scene guidance. Supported formats: MP4, MOV. Up to 3 videos; combined duration must be between 2 and 15 seconds, and total size must be under 50 MB. Resolution must be between ~480p and ~720p. Total files across all modalities must not exceed 12.
Reference Audio
Type: String. Reference audio for audio-guided video generation. Supported formats: MP3, WAV. Up to 3 files; combined duration must not exceed 15 seconds. Max 15 MB per file. At least one reference image or video is required when audio is provided. Total files across all modalities must not exceed 12.
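The reference limits above can be checked client-side before a run is submitted. This is a minimal sketch under stated assumptions: RefFile and check_references are hypothetical names, file sizes and durations are assumed to be known already, MB is taken as 1024 × 1024 bytes (the node does not say which definition it uses), and format and resolution checks are left out.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: binary megabytes


@dataclass
class RefFile:
    kind: str                # "image", "video", or "audio"
    size_bytes: int
    duration_s: float = 0.0  # only meaningful for video and audio


def check_references(files: list[RefFile]) -> list[str]:
    """Return limit violations; an empty list means the set passes these checks."""
    errors: list[str] = []
    images = [f for f in files if f.kind == "image"]
    videos = [f for f in files if f.kind == "video"]
    audios = [f for f in files if f.kind == "audio"]

    if len(files) > 12:
        errors.append("more than 12 reference files in total")

    if len(images) > 9:
        errors.append("more than 9 reference images")
    if any(f.size_bytes > 30 * MB for f in images):
        errors.append("an image exceeds 30 MB")

    if len(videos) > 3:
        errors.append("more than 3 reference videos")
    if videos:
        if not 2 <= sum(f.duration_s for f in videos) <= 15:
            errors.append("combined video duration must be 2-15 s")
        if sum(f.size_bytes for f in videos) >= 50 * MB:
            errors.append("combined video size must be under 50 MB")

    if len(audios) > 3:
        errors.append("more than 3 reference audio files")
    if audios:
        if sum(f.duration_s for f in audios) > 15:
            errors.append("combined audio duration exceeds 15 s")
        if any(f.size_bytes > 15 * MB for f in audios):
            errors.append("an audio file exceeds 15 MB")
        # Audio alone is not enough: it needs a visual reference.
        if not images and not videos:
            errors.append("audio needs at least one image or video reference")

    return errors


print(check_references([RefFile("image", 2 * MB), RefFile("video", 10 * MB, 5.0)]))
print(check_references([RefFile("audio", 1 * MB, 5.0)]))
```

Collecting every violation instead of stopping at the first one lets a caller report all problems with a reference set in a single pass.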
Duration
Type: String. Video duration in seconds (4 to 15), or auto to let the model decide.
Default: auto
Resolution
Type: String. Video resolution. 480p is cheapest; 720p balances cost and detail; 1080p gives the highest quality.
Default: 720p
Aspect Ratio
Type: String. Aspect ratio of the generated video.
Default: auto
Generate Audio
Type: Boolean. Generate synchronized audio for the video.
Default: true
Seed
Type: Number. Random seed. Leave at -1 for a random result.
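The defaults and ranges above can be merged and checked in one place before a run. This is a hypothetical sketch: resolve_settings is an illustrative helper, it assumes duration is given in whole seconds, and because the node lists Duration as a String it sends numeric durations as text. Aspect ratio values are not enumerated on this page, so they pass through unchecked.

```python
DEFAULTS = {
    "duration": "auto",
    "resolution": "720p",
    "aspect_ratio": "auto",
    "generate_audio": True,
    "seed": -1,  # -1 requests a random seed
}
RESOLUTIONS = {"480p", "720p", "1080p"}


def resolve_settings(**overrides):
    """Merge caller overrides onto the node defaults and check basic ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    settings = {**DEFAULTS, **overrides}  # new dict; DEFAULTS is not mutated

    duration = settings["duration"]
    if duration != "auto":
        seconds = int(duration)  # assumption: whole seconds only
        if not 4 <= seconds <= 15:
            raise ValueError("duration must be 'auto' or 4-15 seconds")
        # Duration is typed as String on the node, so send the number as text.
        settings["duration"] = str(seconds)

    if settings["resolution"] not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    return settings


print(resolve_settings())
print(resolve_settings(duration=8, resolution="480p"))
```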
Output
Type: Inferred. Generated video output.
Nodespell Team
Type: Node
Status: Official
Package: Nodespell AI
Category: AI / Video / Bytedance