Grok Imagine Reference To Video
Generate videos guided by one or more reference images using xAI’s Grok Imagine video model.
Model Overview
Grok Imagine R2V is a reference-to-video workflow. Instead of treating an image as the first frame, it uses one or more reference images as visual direction for style, subjects, and composition while the prompt describes the motion and scene.
Best At
- Character and style consistency across a new generated clip.
- Combining multiple visual references into one coherent moving scene.
- Prompt-guided video generation where reference images should influence the outcome without locking the first frame.
Limitations / Not Good At
- It is not image-to-video and does not accept source videos.
- Resolution currently tops out at 720p.
- Pricing scales linearly with output duration, so longer clips cost more even when the references stay the same.
Ideal Use Cases
- Giving a generated video the look of specific character sheets or mood boards.
- Combining several reference stills into one motion concept.
- Style-directed short-form video ideation.
Input & Output Format
- Input: required prompt plus required reference_images; optional aspect_ratio, duration, and resolution.
- Output: generated video asset returned in the response.
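The input contract above can be sketched as a small payload builder. This is a minimal illustration, not an official client: `build_r2v_input` is a hypothetical helper, passing reference images as a list of URL strings is an assumption, and the fallback values (16:9, 8 seconds, 480p) are taken from the parameter listing on this page on the assumption that they are the defaults.

```python
from typing import Any

# Assumed defaults, read from the parameter listing on this page.
DEFAULTS: dict[str, Any] = {
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "480p",
}

# The page states that resolution currently tops out at 720p.
MAX_HEIGHT = 720


def build_r2v_input(prompt: str, reference_images: list[str], **options: Any) -> dict[str, Any]:
    """Assemble an input payload (hypothetical helper, not part of any SDK)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not reference_images:
        raise ValueError("at least one reference image is required")

    # Reject option names that the page does not list.
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")

    payload: dict[str, Any] = {
        "prompt": prompt,
        # Assumption: references are passed as URL strings.
        "reference_images": list(reference_images),
    }
    payload.update({**DEFAULTS, **options})

    # Enforce the documented 720p ceiling, e.g. "1080p" -> 1080.
    height = int(str(payload["resolution"]).rstrip("p"))
    if height > MAX_HEIGHT:
        raise ValueError(f"resolution above {MAX_HEIGHT}p is not supported")
    return payload


example = build_r2v_input(
    "The character walks through a rainy neon street",
    ["https://example.com/character-sheet.png"],
    duration=4,
)
```

Validating before submission keeps a malformed request from consuming a billed run; only the keys named on this page are accepted.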
Performance Notes
- Replicate bills this model per second of output video.
- Shorter clips are the fastest way to iterate when refining references and motion prompts.
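Because billing is per second of output, the cost of an iteration session can be estimated with simple arithmetic. The sketch below assumes strictly linear pricing as described above; `estimate_cost` is a hypothetical helper and the rate shown is a placeholder, not the model's actual price.

```python
def estimate_cost(duration_seconds: float, rate_per_second: float, clips: int = 1) -> float:
    """Estimate total cost under linear per-second billing (hypothetical helper)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if rate_per_second < 0:
        raise ValueError("rate_per_second must not be negative")
    if clips < 1:
        raise ValueError("clips must be at least 1")
    return duration_seconds * rate_per_second * clips


# Placeholder rate for illustration only; check the current price before relying on it.
PLACEHOLDER_RATE = 0.05

# Ten 4-second drafts versus ten 8-second drafts at the same rate.
short_drafts = estimate_cost(4, PLACEHOLDER_RATE, clips=10)
long_drafts = estimate_cost(8, PLACEHOLDER_RATE, clips=10)
print(f"short drafts: {short_drafts:.2f}, long drafts: {long_drafts:.2f}")
```

Under this assumption, halving the clip length halves the cost of each draft, which is why short clips suit the refinement phase.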
Prompt
String. Text prompt describing the video to generate.
Reference Images
String. Reference images used as style and content guidance.
Aspect Ratio
String. Aspect ratio of the generated video. Default: 16:9.
Duration
Number. Duration of the video in seconds. Default: 8.
Resolution
String. Resolution of the generated video. Default: 480p.
Output
Inferred. Generated video output.
Nodespell Team
Type
Node
Status
Official
Package
Nodespell AI
Category
AI / Video / Xai