2. Images, videos & audios
In chapter 1 you rendered a single still image with a text overlay. This chapter adds the other two asset element types, video and audio, and introduces element-level timing with start and duration. By the end you will have a short reel that shows two photos, then a short video clip, with background music underneath.
Prerequisites: chapter 1. You should know what a scene and an element are, and how to submit a movie with POST /v2/movies.
Step 1: Start from chapter 1
The starting point is the final JSON from chapter 1: one image plus a text overlay.
{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "duration": 8
        },
        {
          "type": "text",
          "text": "FOR SALE - 123 Oak Street",
          "duration": 8
        }
      ]
    }
  ]
}
Step 2: Add a second image, sequenced
We want two photos shown back-to-back inside the same scene: the exterior shot first, then the kitchen. Use start to delay the second image. The exterior runs from 0 to 4 s; the kitchen runs from 4 to 8 s.
{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "text",
          "text": "FOR SALE - 123 Oak Street",
          "duration": 8
        }
      ]
    }
  ]
}
The text element has no start, so it defaults to 0 and is visible across the full 8 seconds.
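If you generate the JSON from a script, the start values do not have to be typed by hand. The following is a minimal Python sketch of that idea, not part of the JSON2Video API: the helper name sequence is hypothetical, and it simply sets each element's start to the sum of the durations before it.

```python
from typing import Any


def sequence(clips: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the clips with "start" set so they play back-to-back."""
    timed = []
    cursor = 0  # running offset, in seconds, from the start of the scene
    for clip in clips:
        timed.append({**clip, "start": cursor})
        cursor += clip["duration"]
    return timed


photos = sequence([
    {
        "type": "image",
        "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
        "duration": 4,
    },
    {
        "type": "image",
        "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
        "duration": 4,
    },
])

for photo in photos:
    print(photo["start"], photo["duration"])
```

The exterior gets start 0 and the kitchen gets start 4, matching the values written by hand in this step.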
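Step 3: Add a short video clip
A video element works like an image element, except that its source is an MP4, WebM, or MOV file. Add a 5-second drone shot at the end, which extends the scene to 13 seconds. The text element's duration grows from 8 to 13 so the label stays on screen for the whole scene.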
{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "video",
          "src": "https://cdn.json2video.com/assets/videos/sample-house-drone.mp4",
          "start": 8,
          "duration": 5
        },
        {
          "type": "text",
          "text": "FOR SALE - 123 Oak Street",
          "duration": 13
        }
      ]
    }
  ]
}
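The 13-second figure is just the end time of the last element to finish. A short Python sketch of that arithmetic follows. It assumes a scene lasts until its longest-running element ends, and the helper name scene_length is hypothetical; src values are omitted for brevity.

```python
def scene_length(elements: list[dict]) -> float:
    """Latest end time among the elements: max(start + duration)."""
    # An element without "start" begins at 0, as described in step 2.
    return max(e.get("start", 0) + e["duration"] for e in elements)


elements = [
    {"type": "image", "start": 0, "duration": 4},  # exterior: 0 to 4 s
    {"type": "image", "start": 4, "duration": 4},  # kitchen: 4 to 8 s
    {"type": "video", "start": 8, "duration": 5},  # drone: 8 to 13 s
    {"type": "text", "duration": 13},              # label: 0 to 13 s
]

print(scene_length(elements))  # 13
```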
Tip: if your source video is longer than duration, JSON2Video trims it. If it is shorter, the last frame freezes. Use the video element's loop property when you want continuous playback over a longer slot. See Video element.
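To make the two cases in the tip concrete, here is a small Python sketch that only models the rule as stated above. It does not call JSON2Video, and the function name slot_behaviour is hypothetical.

```python
def slot_behaviour(source_seconds: float, duration: float) -> tuple[str, float]:
    """Describe how a clip of source_seconds fills a slot of duration seconds.

    Returns a label plus the number of seconds trimmed or frozen.
    """
    if source_seconds > duration:
        return ("trimmed", source_seconds - duration)
    if source_seconds < duration:
        return ("last frame frozen", duration - source_seconds)
    return ("plays in full", 0)


print(slot_behaviour(12, 5))  # ('trimmed', 7)
print(slot_behaviour(3, 5))   # ('last frame frozen', 2)
print(slot_behaviour(5, 5))   # ('plays in full', 0)
```

A 12-second drone clip in the 5-second slot from this step would lose its last 7 seconds; a 3-second clip would hold its final frame for 2 seconds.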
Step 4: Add background music
Audio plays in parallel with whatever else is on screen. Drop a music track at the movie level so it spans the whole video (not just a single scene). Movie-level elements live in a top-level elements array, separate from scene-level elements.
{
  "resolution": "full-hd",
  "elements": [
    {
      "type": "audio",
      "src": "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
      "volume": 0.4
    }
  ],
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "video",
          "src": "https://cdn.json2video.com/assets/videos/sample-house-drone.mp4",
          "start": 8,
          "duration": 5
        },
        {
          "type": "text",
          "text": "FOR SALE - 123 Oak Street",
          "duration": 13
        }
      ]
    }
  ]
}
volume: 0.4 keeps the music in the background so a future voice-over (chapter 7) sits cleanly on top.
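When the movie is assembled in code, the split between the two elements arrays is easy to see. The sketch below is one possible way to do it in Python. The helper with_background_music is hypothetical, and the scene contents are left empty to keep the example short.

```python
import json


def with_background_music(movie: dict, src: str, volume: float = 0.4) -> dict:
    """Return a copy of the movie with an audio element added at movie level."""
    track = {"type": "audio", "src": src, "volume": volume}
    # The track goes in the top-level "elements" array, not inside a scene.
    return {**movie, "elements": [*movie.get("elements", []), track]}


movie = {
    "resolution": "full-hd",
    "scenes": [{"elements": []}],  # scene-level elements omitted here
}
movie = with_background_music(
    movie,
    "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
)

print(json.dumps(movie, indent=2))
```

The scenes array is left untouched, so the same helper works no matter how many scenes the movie has.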
The complete final JSON
{
  "resolution": "full-hd",
  "elements": [
    {
      "type": "audio",
      "src": "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
      "volume": 0.4
    }
  ],
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg",
          "start": 0,
          "duration": 4
        },
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg",
          "start": 4,
          "duration": 4
        },
        {
          "type": "video",
          "src": "https://cdn.json2video.com/assets/videos/sample-house-drone.mp4",
          "start": 8,
          "duration": 5
        },
        {
          "type": "text",
          "text": "FOR SALE - 123 Oak Street",
          "duration": 13
        }
      ]
    }
  ]
}
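As in chapter 1, the JSON is rendered by submitting it with POST /v2/movies. The Python sketch below prepares such a request using only the standard library and does not send it. The host https://api.json2video.com, the x-api-key header name, and the YOUR_API_KEY placeholder are assumptions not stated in this chapter, so check them against chapter 1 and your account before use. The movie here is shortened to a single element.

```python
import json
import urllib.request

API_URL = "https://api.json2video.com/v2/movies"  # assumed host; path from the text
API_KEY = "YOUR_API_KEY"  # placeholder, replace with your own key


def build_request(movie: dict, api_key: str = API_KEY) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the movie JSON."""
    body = json.dumps(movie).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed header name
        },
        method="POST",
    )


movie = {
    "resolution": "full-hd",
    "scenes": [{"elements": [{"type": "text", "text": "FOR SALE", "duration": 13}]}],
}
request = build_request(movie)
print(request.get_method(), request.full_url)

# To actually submit the movie, uncomment the next line:
# response = urllib.request.urlopen(request)
```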
Expected output
A 13-second 1920×1080 MP4: 4 s exterior photo, 4 s kitchen photo, 5 s drone video, with quiet uplifting music throughout and the "FOR SALE - 123 Oak Street" label always on screen. Sample render: tutorial-02.mp4 (placeholder).
What you learned
- start and duration let you sequence multiple elements inside a single scene.
- video elements work like image elements but consume a clip.
- Movie-level elements overlay every scene, which is perfect for background music that spans the whole video.
- audio.volume is a float between 0 and 1.
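If your volume values come from user input or a spreadsheet, a small guard can catch out-of-range numbers before the JSON is submitted. This Python sketch takes the 0 to 1 range from the summary above; the function name check_volume is hypothetical.

```python
def check_volume(volume: float) -> float:
    """Return the volume as a float, or raise if it falls outside 0 to 1."""
    if not 0 <= volume <= 1:
        raise ValueError(f"volume must be between 0 and 1, got {volume}")
    return float(volume)


audio = {
    "type": "audio",
    "src": "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
    "volume": check_volume(0.4),
}
print(audio["volume"])  # 0.4
```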
Previous chapter / Next chapter
← 1. Your first video · 3. Multiple scenes & transitions →