8. Automatic subtitles
Most social platforms autoplay video with sound off. Subtitles double watch time. This chapter adds a subtitles element that transcribes the chapter-7 voice-over automatically, with no SRT file required.
Prerequisites: chapter 7. The listing must already include a voice element; the subtitles element transcribes it.
Step 1: The simplest subtitles element
A bare subtitles element transcribes the audio track of the movie automatically. There can be only one subtitles element per movie, and it always lives at movie level.
{
  "type": "subtitles"
}
That's it. Add it to the top-level elements array and the renderer:
- Mixes the voice + audio tracks.
- Runs speech-to-text on the result.
- Burns the captions onto the canvas in a default style.
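As a minimal sketch of the placement, the bare element sits in the top-level elements array next to the voice element it transcribes. The shortened voice text and the single-scene layout are illustrative assumptions, not the chapter-7 listing itself:

```json
{
  "elements": [
    {
      "type": "voice",
      "text": "Welcome to 123 Oak Street.",
      "voice": "en-US-EmmaMultilingualNeural"
    },
    {
      "type": "subtitles"
    }
  ],
  "scenes": [
    {
      "duration": 4,
      "elements": [
        { "type": "image", "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg" }
      ]
    }
  ]
}
```

Note that the subtitles element is not nested inside any scene; it belongs to the movie as a whole.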
Step 2: Customise the look
Subtitles styling lives in a settings object. The most-used keys are style, font-family, font-size, font-color, outline-color, position, and all-caps.
{
  "type": "subtitles",
  "settings": {
    "style": "boxed-word",
    "font-family": "Inter",
    "font-size": 90,
    "font-color": "#FFFFFF",
    "outline-color": "#000000",
    "position": "bottom-center",
    "all-caps": true,
    "box-color": "#0E7C66"
  }
}
style ranges from classic (simple text overlay) to boxed-word (modern social-style with a coloured box behind the current word). See the Subtitles element reference for all styles and settings.
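For comparison, a sketch of the same element using the classic style might look like the following. The smaller font-size and the omission of box-color are assumptions made for illustration (a plain text overlay presumably has no box to colour); check the Subtitles element reference for which keys each style honours:

```json
{
  "type": "subtitles",
  "settings": {
    "style": "classic",
    "font-family": "Inter",
    "font-size": 70,
    "font-color": "#FFFFFF",
    "outline-color": "#000000",
    "position": "bottom-center",
    "all-caps": false
  }
}
```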
Step 3: Specify the language (optional)
The transcription engine auto-detects language by default. If you want to be explicit (or speed up the model), set language:
{
  "type": "subtitles",
  "language": "en",
  "settings": {
    "style": "boxed-word",
    "font-family": "Inter",
    "font-size": 90,
    "font-color": "#FFFFFF",
    "outline-color": "#000000",
    "position": "bottom-center",
    "all-caps": true,
    "box-color": "#0E7C66"
  }
}
language accepts ISO 639-1 codes (en, es, fr, …).
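As an illustration, a Spanish voice-over could be paired with an explicit language code as sketched below. The Spanish text is made up for this example, and it is an assumption that the multilingual voice used elsewhere in this tutorial reads Spanish text acceptably; swap in a dedicated Spanish voice if it does not:

```json
{
  "elements": [
    {
      "type": "voice",
      "text": "Bienvenidos a 123 Oak Street, una casa de cuatro dormitorios.",
      "voice": "en-US-EmmaMultilingualNeural"
    },
    {
      "type": "subtitles",
      "language": "es"
    }
  ]
}
```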
Step 4: Mind the room labels at the bottom-left
The chapter-4 room labels live at bottom-left. Subtitles at bottom-center are usually far enough away that they don't clash, but to rule out any overlap the final JSON moves the room labels to top-left, which keeps the bottom band reserved for subtitles.
The complete final JSON
{
  "resolution": "full-hd",
  "elements": [
    {
      "type": "audio",
      "src": "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
      "volume": 0.4
    },
    {
      "type": "voice",
      "text": "Welcome to 123 Oak Street, a four-bedroom craftsman home, listed at $849,000.",
      "voice": "en-US-EmmaMultilingualNeural",
      "start": 1.5
    },
    {
      "type": "subtitles",
      "language": "en",
      "settings": {
        "style": "boxed-word",
        "font-family": "Inter",
        "font-size": 90,
        "font-color": "#FFFFFF",
        "outline-color": "#000000",
        "position": "bottom-center",
        "all-caps": true,
        "box-color": "#0E7C66"
      }
    },
    {
      "type": "html",
      "tailwind": true,
      "wait": 0.5,
      "html": "<div class='inline-flex items-center gap-2 px-6 py-4 rounded-xl bg-emerald-700 text-white text-5xl font-bold shadow-lg'>💰 $849,000</div>",
      "position": "bottom-right",
      "x": -60,
      "y": -60,
      "start": 4,
      "duration": 12
    }
  ],
  "scenes": [
    {
      "duration": 4,
      "elements": [
        {
          "type": "component",
          "component": "basic/000",
          "settings": { "headline": "FOR SALE", "subline": "123 Oak Street" }
        }
      ]
    },
    {
      "duration": 4,
      "transition": { "style": "fade", "duration": 0.5 },
      "elements": [
        { "type": "image", "src": "https://cdn.json2video.com/assets/images/sample-house-front.jpg" },
        { "type": "text", "text": "Exterior", "position": "top-left", "x": 60, "y": 60 }
      ]
    },
    {
      "duration": 4,
      "transition": { "style": "fade", "duration": 0.5 },
      "elements": [
        { "type": "image", "src": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg" },
        { "type": "text", "text": "Chef's Kitchen", "position": "top-left", "x": 60, "y": 60 }
      ]
    },
    {
      "duration": 4,
      "transition": { "style": "fade", "duration": 0.5 },
      "elements": [
        { "type": "image", "src": "https://cdn.json2video.com/assets/images/sample-house-bedroom.jpg" },
        { "type": "text", "text": "Master Bedroom", "position": "top-left", "x": 60, "y": 60 }
      ]
    }
  ]
}
(Room labels moved to top-left to keep the bottom band reserved for subtitles.)
Expected output
The chapter-7 listing with bold, green-boxed subtitles appearing word by word at the bottom, room labels at top-left, and the price tag at bottom-right. Sample render: tutorial-08.mp4 (placeholder).
What you learned
- type: subtitles auto-transcribes the audio track of the movie.
- Only one subtitles element per movie; it always sits at movie level.
- settings.style switches between classic captions and modern boxed-word styles.
- language is optional: set it to skip auto-detection and pick a specific transcription model.
Going further
You can also supply subtitles from a pre-existing SRT/VTT/ASS file with captions: "https://…". This is useful for translated subtitle tracks the engine cannot generate yet. See the Subtitles reference.
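A hedged sketch of a file-based subtitles element follows. It assumes that captions is set directly on the subtitles element and that the settings object still controls styling; the tutorial does not confirm either, so verify against the Subtitles reference. The URL is a hypothetical placeholder, not a real asset:

```json
{
  "type": "subtitles",
  "captions": "https://example.com/subtitles/listing-es.srt",
  "settings": {
    "style": "classic",
    "position": "bottom-center"
  }
}
```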