Automatic subtitles
A video with auto-generated subtitles burned in. The transcription is generated from an AI voiceover, then displayed as styled captions synced to the audio. No manual transcription required.
Complete JSON
{
  "comment": "Video with auto-generated subtitles from AI voice",
  "resolution": "instagram-portrait",
  "quality": "high",
  "scenes": [
    {
      "elements": [
        {
          "type": "video",
          "src": "https://cdn.json2video.com/assets/samples/loop-bg.mp4",
          "fit": "cover",
          "volume": 0
        },
        {
          "type": "voice",
          "model": "elevenlabs-flash-v2-5",
          "voice": "Rachel",
          "text": "JSON2Video makes it easy to add subtitles to any video. The transcription runs automatically — you just pick the style, and the engine handles the rest."
        },
        {
          "type": "subtitles",
          "settings": {
            "style": "classic-progressive",
            "position": "bottom-center",
            "max-words-per-line": 3,
            "all-caps": true,
            "font-size": 65,
            "font-weight": "900",
            "color": "white",
            "outline-color": "black",
            "outline-width": 4,
            "highlight-color": "#FF6B00"
          }
        }
      ]
    }
  ]
}
How it works
The scene contains three layered elements:
- A video background with volume: 0 to mute its native audio (since the voice element will be the audio track).
- A voice element with an AI-generated voiceover. The scene's duration auto-matches the voice clip length.
- A subtitles element that automatically transcribes whatever audio is in the scene; in this case, the voice element above.
The subtitles transcription runs on the rendered audio, so it works equally well for AI voice, uploaded narration, or any audio source. The engine uses Whisper-grade transcription by default; the result is highly accurate for clear English audio and supports many languages.
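As a rough illustration of the three-layer structure, the sketch below assembles the same payload in Python. The helper name build_subtitled_movie is hypothetical and not part of any JSON2Video SDK; the code only builds and serializes the JSON and does not send it to the API.

```python
import json


def build_subtitled_movie(background_url: str, narration: str) -> dict:
    """Assemble the movie payload as a plain dict.

    Hypothetical helper for illustration; it only builds JSON and does not
    call the JSON2Video API.
    """
    elements = [
        # 1. Muted background video, so the voiceover is the only audio track.
        {"type": "video", "src": background_url, "fit": "cover", "volume": 0},
        # 2. AI voiceover; model and voice names are copied from the example.
        {
            "type": "voice",
            "model": "elevenlabs-flash-v2-5",
            "voice": "Rachel",
            "text": narration,
        },
        # 3. Subtitles transcribed from whatever audio the scene contains.
        {"type": "subtitles", "settings": {"style": "classic-progressive"}},
    ]
    return {
        "resolution": "instagram-portrait",
        "quality": "high",
        "scenes": [{"elements": elements}],
    }


movie = build_subtitled_movie(
    "https://cdn.json2video.com/assets/samples/loop-bg.mp4",
    "JSON2Video makes it easy to add subtitles to any video.",
)
payload = json.dumps(movie, indent=2)
print(payload)
```

Because the subtitles element carries no reference to the voice element, the order and content of the other elements can change without touching the subtitle settings.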
Subtitle style
Setting style to "classic-progressive" shows words one by one and highlights the current word, a caption style popular with social-media creators. Alternative styles:
- "classic": fixed multi-word lines, no per-word highlight.
- "karaoke": colour fills each word as it's spoken.
- "subtitle": minimalist bottom-of-frame text, no styling.
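If you generate payloads in code, switching styles is a one-field change. The sketch below is a minimal, hypothetical helper (with_style is not part of any SDK) that swaps the style on a subtitles element and rejects names other than the four mentioned on this page; the engine may accept additional styles not listed here.

```python
import copy

# Styles named on this page; the engine may support others.
KNOWN_STYLES = {"classic-progressive", "classic", "karaoke", "subtitle"}


def with_style(subtitles: dict, style: str) -> dict:
    """Return a copy of a subtitles element with a different style."""
    if style not in KNOWN_STYLES:
        raise ValueError(f"unknown subtitle style: {style!r}")
    updated = copy.deepcopy(subtitles)  # leave the caller's dict untouched
    updated.setdefault("settings", {})["style"] = style
    return updated


original = {
    "type": "subtitles",
    "settings": {"style": "classic-progressive", "all-caps": True},
}
karaoke = with_style(original, "karaoke")
print(karaoke["settings"]["style"])   # karaoke
print(original["settings"]["style"])  # classic-progressive
```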
Style settings
- max-words-per-line: 3 keeps each caption short, which is easier to read on a vertical phone screen.
- all-caps: true is a stylistic choice popular in social content.
- outline-color and outline-width make the text legible against busy video backgrounds.
- highlight-color: "#FF6B00" colours the currently spoken word; set it to your brand accent.
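To get a feel for what max-words-per-line and all-caps do to the on-screen text, the sketch below chunks a sentence the same way. It is an illustration only: the function name caption_lines is made up here, and the real engine works from word timings in the transcription, which this sketch ignores.

```python
def caption_lines(text: str, max_words_per_line: int = 3, all_caps: bool = True) -> list[str]:
    """Split text into caption lines of at most max_words_per_line words.

    Illustration only; this is not the engine's implementation.
    """
    if max_words_per_line < 1:
        raise ValueError("max_words_per_line must be at least 1")
    words = text.upper().split() if all_caps else text.split()
    return [
        " ".join(words[i:i + max_words_per_line])
        for i in range(0, len(words), max_words_per_line)
    ]


for line in caption_lines("JSON2Video makes it easy to add subtitles to any video."):
    print(line)
# JSON2VIDEO MAKES IT
# EASY TO ADD
# SUBTITLES TO ANY
# VIDEO.
```

A ten-word sentence becomes four short captions, which shows why a low word limit suits narrow portrait frames.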
For the full property list, see the subtitles element reference.
Translating to a different language
The transcription detects the language of the audio automatically. To force a specific language or translate the captions, pass the language and translate-to settings:
{
  "type": "subtitles",
  "settings": {
    "language": "es",
    "translate-to": "en",
    "style": "classic-progressive"
  }
}