Core concepts

These are the building blocks every JSON2Video request shares. Understanding them is enough to read and write any movie JSON.

  • Movie. The final video output. A movie is a JSON document describing the video's resolution, scenes, and movie-level elements that overlay every scene.
  • Scene. A segment of the movie. A movie contains one or more scenes, played back-to-back. Scenes cannot overlap; each renders independently and is then chained into the final output.
  • Element. The atomic content unit inside a scene or movie. Element types include image, video, text, audio, voice, subtitles, component, html, and audiogram.
  • Duration and timing. Every element has a start (when it appears) and a duration (how long it plays). duration accepts a number of seconds, -1 for "natural duration of the asset", or -2 for "match the container".
  • Coordinates. Positioning works like HTML/CSS: the canvas origin (0, 0) is the top-left corner, x increases to the right, and y increases downward. The canvas width and height match the movie's resolution.
  • Layering. Elements stack in the order they appear in the JSON array. Later elements paint on top of earlier ones, the same way HTML siblings stack along the z-axis.
  • Caching. The renderer caches downloaded assets and intermediate scene renders to speed up repeat requests and reduce cost. Set cache: false on an element to bypass the cache for that asset.
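
To make the concepts above concrete, here is a minimal sketch in Python that assembles a movie document and serializes it to JSON. It is an illustration under stated assumptions, not a reference payload: the field names (resolution, scenes, elements, type, src, x, y, start, duration, cache) mirror the terms used above, but the "full-hd" value and the example.com URLs are placeholders, and resolve_duration and paint_order are hypothetical helpers that only restate the rules described here. Check the exact schema against the API reference before sending a request.

```python
import json

# Hypothetical movie document. The URLs are placeholders and the
# "full-hd" resolution value is an assumption, not a confirmed constant.
movie = {
    "resolution": "full-hd",
    "scenes": [
        {
            "elements": [
                # First in the array, so it is painted first (bottom layer).
                {"type": "video", "src": "https://example.com/clip.mp4",
                 "start": 0, "duration": -1},  # -1: natural duration of the asset
                # Later in the array, so it paints on top of the video.
                {"type": "image", "src": "https://example.com/logo.png",
                 "x": 40, "y": 40,             # offset from the top-left origin
                 "start": 1, "duration": 3,
                 "cache": False},              # bypass the cache for this asset
            ]
        }
    ],
    # Movie-level elements overlay every scene.
    "elements": [
        {"type": "audio", "src": "https://example.com/music.mp3",
         "duration": -2},                      # -2: match the container
    ],
}


def resolve_duration(duration, asset_seconds, container_seconds):
    """Hypothetical helper (not part of the API): turn the duration
    conventions described above into a concrete number of seconds."""
    if duration == -1:
        return asset_seconds
    if duration == -2:
        return container_seconds
    return duration


def paint_order(elements):
    """Hypothetical helper: element types from bottom to top.
    Array order is stacking order, so the list is returned as-is."""
    return [element["type"] for element in elements]


# Serialize the document; this string is what a request body would carry.
payload = json.dumps(movie, indent=2)
```

Building the document as a plain dict keeps the nesting visible: a movie holds scenes, scenes hold elements, and movie-level elements sit beside the scenes array rather than inside it.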
