Veo 3.1

Official

Improved video generation model with higher fidelity, context-aware audio, and support for image references and frame interpolation.

Nodespell AI
AI / Video / Google

Model Overview

A text-to-video generator that creates high-fidelity videos with context-aware audio. It builds on Veo 3 with improved quality.

Best At

  • Generating high-quality videos from text descriptions.
  • Maintaining subject consistency when using reference images.
  • Producing smooth video transitions via last-frame interpolation.
  • Creating natural audio that matches the generated video content.

Limitations / Not Good At

  • Reference images work only with the 16:9 aspect ratio and an 8-second duration.
  • The last frame is ignored when reference images are provided.
  • Output is always a video file; no separate image or audio output.
  • Input images should match the chosen aspect ratio: ideally 1280x720 for 16:9 or 720x1280 for 9:16.

Ideal Use Cases

  • Marketing and product demos with consistent branding.
  • Social media video creation (short vertical or horizontal videos).
  • Smooth transitions from images to video content.
  • Videos with contextually relevant background vocals or sounds.

Input & Output Format

  • Input: Text prompt (required) combined with optional parameters: aspect ratio, duration, starting image, last frame, reference images, negative prompt, resolution, and audio generation flag.
  • Output: URI pointing to the generated MP4 video file.

Performance Notes

  • High-quality video generation requires substantial compute resources.
  • Generation may take longer than for other modalities because of the cost of video synthesis.
Inputs (4)

Prompt

String

Text prompt for video generation

Multi Input, Min: 0, Max: 100

Image

String

Input image to start generating from. Ideal images are 16:9 or 9:16 and 1280x720 or 720x1280, depending on the aspect ratio you choose.

Min: 0, Max: 100
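
As a minimal sketch of the sizing guidance above, the helper below maps the chosen aspect ratio to the ideal input dimensions and checks an image against them. The names `IDEAL_SIZES`, `ideal_image_size`, and `is_ideal_image` are hypothetical and not part of the node; only the two ratios and the two sizes come from the Image description.

```python
# Ideal input sizes per aspect ratio, taken from the Image input description.
# The names below are illustrative assumptions, not part of the node itself.
IDEAL_SIZES = {
    "16:9": (1280, 720),
    "9:16": (720, 1280),
}


def ideal_image_size(aspect_ratio: str) -> tuple[int, int]:
    """Return the ideal (width, height) for the given aspect ratio."""
    try:
        return IDEAL_SIZES[aspect_ratio]
    except KeyError:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}") from None


def is_ideal_image(width: int, height: int, aspect_ratio: str) -> bool:
    """True when the image dimensions match the ideal size exactly."""
    return (width, height) == ideal_image_size(aspect_ratio)
```

An image that fails this check may still be accepted by the node; the description only calls these sizes ideal, so resizing or cropping beforehand is a precaution rather than a requirement.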

Last Frame

String

Ending image for interpolation. When provided with an input image, creates a transition between the two images.

Min: 0, Max: 100

Reference Images

String

1 to 3 reference images for subject-consistent generation (reference-to-video, or R2V). Reference images only work with 16:9 aspect ratio and 8-second duration. Last frame is ignored if reference images are provided.

Multi Input, Min: 0, Max: 100
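
The reference-image rules above can be checked before a run. The sketch below is one possible pre-flight check, assuming a hypothetical helper named `effective_last_frame`; raising an error on an invalid combination is this sketch's choice, since the node's actual behavior in that case is not documented here.

```python
from typing import Optional, Sequence


def effective_last_frame(
    reference_images: Optional[Sequence[str]],
    aspect_ratio: str = "16:9",
    duration: int = 8,
    last_frame: Optional[str] = None,
) -> Optional[str]:
    """Check the R2V constraints and return the last frame that takes effect.

    Hypothetical helper: the rules come from the Reference Images
    description, but the function itself is not part of the node.
    """
    refs = list(reference_images or [])
    if not refs:
        # No reference images: the last frame (if any) is used as given.
        return last_frame
    if len(refs) > 3:
        raise ValueError("at most 3 reference images are supported")
    if aspect_ratio != "16:9":
        raise ValueError("reference images only work with the 16:9 aspect ratio")
    if duration != 8:
        raise ValueError("reference images only work with an 8-second duration")
    # The last frame is ignored when reference images are provided.
    return None
```

For example, `effective_last_frame(["ref.png"], last_frame="end.png")` returns `None`, mirroring the rule that the last frame is dropped in reference-to-video mode.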
Parameters (7)

Seed

Number

Random seed. Omit for random generations.

Default: -1

Prompt

String

Text prompt for video generation

Default: (empty)

Duration

Number

Video duration in seconds

Default: 8

Resolution

String

Resolution of the generated video

Default: 720p

Aspect Ratio

String

Video aspect ratio

Default: 16:9

Generate Audio

Boolean

Generate audio with the video

Default: true

Negative Prompt

String

Description of what to exclude from the generated video

Default: (empty)
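
The seven parameters and their defaults can be kept together in one place. The sketch below assumes a hypothetical `Veo31Params` dataclass with snake_case field names chosen for illustration; the node's real parameter keys may differ, and only the default values are taken from the list above.

```python
from dataclasses import asdict, dataclass


@dataclass
class Veo31Params:
    """Hypothetical container mirroring the documented parameter defaults."""

    prompt: str = ""            # Text prompt for video generation
    seed: int = -1              # Documented default; omit for random generations
    duration: int = 8           # Video duration in seconds
    resolution: str = "720p"    # Resolution of the generated video
    aspect_ratio: str = "16:9"  # Video aspect ratio
    generate_audio: bool = True # Generate audio with the video
    negative_prompt: str = ""   # What to exclude from the generated video


# Override only what differs from the defaults.
params = Veo31Params(
    prompt="A paper boat drifting down a rainy street",
    aspect_ratio="9:16",
)
settings = asdict(params)  # plain dict, e.g. for logging or serialization
```

Keeping the defaults in a single structure makes it easy to see which settings a given run changed.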
Outputs (1)

Output

Inferred

URI pointing to the generated MP4 video file.

Type: Node

Status: Official

Package: Nodespell AI

Category: AI / Video / Google

Input: Text, Image

Output: Video

Keywords: Video Generation, Aspect Control, Resolution Control, Length Control