2. Images, videos & audios

In chapter 1 you rendered a single still image with a text overlay. This chapter adds the other two asset element types, video and audio, and introduces element-level timing with start and duration. By the end you will have a short reel that shows two photos followed by a video clip, with background music underneath.

Prerequisites: chapter 1. You should know what a scene and an element are, and how to submit a movie with POST /v2/movies.

Step 1: Start from chapter 1

The starting point is the final JSON from chapter 1: one image plus a text overlay.

{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "duration": 8
        },
        {
          "type": "text",
          "text": "FOR SALE — 123 Oak Street",
          "duration": 8
        }
      ]
    }
  ]
}

Step 2: Add a second image, sequenced

We want two photos shown back-to-back inside the same scene: the exterior shot first, then the kitchen. Use start to delay the second image. The exterior runs from 0 to 4 s; the kitchen runs from 4 to 8 s.

{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "text",
          "text": "FOR SALE — 123 Oak Street",
          "duration": 8
        }
      ]
    }
  ]
}

The text element has no start, so it defaults to 0 and is visible across the full 8 seconds.
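The start values in this step are running totals of the durations that come before them. The following is a minimal Python sketch of that arithmetic, assuming the elements are plain dictionaries shaped like the JSON above. The helper name sequence_elements is hypothetical and is not part of the JSON2Video API.

```python
def sequence_elements(elements):
    """Return copies of `elements` placed back-to-back.

    Each element must carry a numeric "duration". The "start" of each
    element is the sum of the durations of the elements before it.
    """
    sequenced = []
    cursor = 0  # time in seconds at which the next element begins
    for element in elements:
        # Copy the element so the caller's dictionaries are left untouched.
        sequenced.append(dict(element, start=cursor))
        cursor += element["duration"]
    return sequenced


photos = [
    {
        "type": "image",
        "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
        "duration": 4,
    },
    {
        "type": "image",
        "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
        "duration": 4,
    },
]

for element in sequence_elements(photos):
    # Print the slot each photo occupies: its start and its end time.
    print(element["start"], element["start"] + element["duration"])
```

For the two photos this prints the slots 0 to 4 and 4 to 8, matching the values written by hand in the JSON.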

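Step 3: Add a short video clip

A video element is declared like an image element, except that src points to an MP4, WebM, or MOV file. Add a 5-second drone shot starting at 8 s, which extends the scene to 13 seconds, and raise the text overlay's duration to 13 so the label stays on screen throughout.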

{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "video",
          "src": "https://cdn.json2video.com/assets/videos/sample-house-drone.mp4",
          "start": 8,
          "duration": 5
        },
        {
          "type": "text",
          "text": "FOR SALE — 123 Oak Street",
          "duration": 13
        }
      ]
    }
  ]
}

Tip: if your source video is longer than duration, JSON2Video trims it. If it is shorter, the last frame freezes. Use the video element's loop property when you want continuous playback over a longer slot. See Video element.
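The trim-or-freeze rule in the tip comes down to comparing two numbers. The Python sketch below only restates that rule for planning purposes. The helper describe_fit is hypothetical, the clip lengths are made-up examples, and nothing here inspects the actual file or calls JSON2Video.

```python
def describe_fit(source_seconds, slot_seconds):
    """Describe how a clip of `source_seconds` fills a slot of `slot_seconds`.

    Mirrors the rule in the tip: a longer source is trimmed, a shorter
    one holds its last frame for the remainder of the slot.
    """
    if source_seconds > slot_seconds:
        return {"action": "trim", "seconds": source_seconds - slot_seconds}
    if source_seconds < slot_seconds:
        return {"action": "freeze", "seconds": slot_seconds - source_seconds}
    return {"action": "exact", "seconds": 0}


# A hypothetical 12-second drone clip placed in the 5-second slot from step 3.
print(describe_fit(12, 5))

# A hypothetical 3-second clip placed in the same slot.
print(describe_fit(3, 5))
```

In the first case 7 seconds of the source go unused. In the second the last frame is held for 2 seconds, which is the situation where the loop property may be the better choice.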

Step 4: Add background music

Audio plays in parallel with whatever else is on screen. Drop a music track at the movie level so it spans the whole video (not just a single scene). Movie-level elements live in a top-level elements array, separate from scene-level elements.

{
  "resolution": "full-hd",
  "elements": [
    {
      "type": "audio",
      "src": "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
      "volume": 0.4
    }
  ],
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "video",
          "src": "https://cdn.json2video.com/assets/videos/sample-house-drone.mp4",
          "start": 8,
          "duration": 5
        },
        {
          "type": "text",
          "text": "FOR SALE — 123 Oak Street",
          "duration": 13
        }
      ]
    }
  ]
}

volume: 0.4 keeps the music in the background so a future voice-over (chapter 7) sits cleanly on top.
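If you assemble the movie JSON in code, the same placement rule applies: the audio element goes in the top-level elements array, not inside a scene. The sketch below is one way to do that in Python. The helper with_background_music is hypothetical, and the 0 to 1 range check follows the volume range stated in this chapter's summary.

```python
import copy
import json


def with_background_music(movie, src, volume=0.4):
    """Return a copy of `movie` with an audio element at the movie level."""
    if not 0 <= volume <= 1:
        raise ValueError("volume is expected to be between 0 and 1")
    result = copy.deepcopy(movie)
    # Movie-level elements live in the top-level "elements" array,
    # next to "scenes", not inside any individual scene.
    result.setdefault("elements", []).append(
        {"type": "audio", "src": src, "volume": volume}
    )
    return result


# A stripped-down movie with one empty scene, used only for illustration.
movie = {"resolution": "full-hd", "scenes": [{"elements": []}]}

scored = with_background_music(
    movie,
    "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
)
print(json.dumps(scored, indent=2))
```

Returning a copy keeps the original movie dictionary reusable, for example to render one version with music and one without.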

The complete final JSON

{
  "resolution": "full-hd",
  "elements": [
    {
      "type": "audio",
      "src": "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
      "volume": 0.4
    }
  ],
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "video",
          "src": "https://cdn.json2video.com/assets/videos/sample-house-drone.mp4",
          "start": 8,
          "duration": 5
        },
        {
          "type": "text",
          "text": "FOR SALE — 123 Oak Street",
          "duration": 13
        }
      ]
    }
  ]
}

Expected output

A 13-second 1920×1080 MP4: 4 s exterior photo, 4 s kitchen photo, 5 s drone video, with quiet uplifting music throughout and the "FOR SALE — 123 Oak Street" label always on screen. Sample render: tutorial-02.mp4 (placeholder).
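The 13-second figure can be checked against the JSON before rendering: the scene runs until its last element ends. The following is a small Python sketch of that check, assuming every element declares a duration and that a missing start counts as 0, as described in step 2. The helper scene_length is hypothetical and does not call the JSON2Video API.

```python
def scene_length(scene):
    """Return the time in seconds at which the last element of `scene` ends."""
    return max(
        element.get("start", 0) + element["duration"]
        for element in scene["elements"]
    )


# The scene from the final JSON, with src and text omitted for brevity.
scene = {
    "elements": [
        {"type": "image", "start": 0, "duration": 4},
        {"type": "image", "start": 4, "duration": 4},
        {"type": "video", "start": 8, "duration": 5},
        {"type": "text", "duration": 13},
    ]
}

print(scene_length(scene))  # 13
```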

What you learned

  • start and duration let you sequence multiple elements inside a single scene.
  • video elements work like image elements but play a clip instead of showing a still.
  • Movie-level elements span every scene, which suits background music that runs for the whole video.
  • audio.volume is a float between 0 and 1.
