Audiogram
A podcast-style audiogram: a static cover image, an animated waveform synced to the audio, episode title text, and burned-in subtitles for accessibility. Vertical 9:16 — the standard share format for Reels, Stories, and Shorts.
Complete JSON
{
  "comment": "Podcast audiogram with cover, waveform, title, subtitles",
  "resolution": "instagram-portrait",
  "quality": "high",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/samples/podcast-cover.jpg",
          "fit": "cover"
        },
        {
          "type": "audio",
          "src": "https://cdn.json2video.com/assets/samples/podcast-clip.mp3"
        },
        {
          "type": "text",
          "text": "Episode 42\nBuilding in public",
          "y": 150,
          "style": "002",
          "settings": {
            "font-size": "75px",
            "color": "white",
            "font-weight": "800",
            "text-align": "center",
            "line-height": "85px"
          }
        },
        {
          "type": "audiogram",
          "color": "#FF6B00",
          "amplitude": 6,
          "y": 1400,
          "height": 200,
          "width": 1000,
          "x": 40
        },
        {
          "type": "subtitles",
          "settings": {
            "style": "classic",
            "position": "bottom-center",
            "max-words-per-line": 4,
            "font-size": 50,
            "all-caps": false
          }
        }
      ]
    }
  ]
}
How it works
The scene has no explicit duration, so it inherits the length of its longest element, which here is the audio clip. However long podcast-clip.mp3 runs, the video runs for exactly the same time.
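The duration rule can be sketched in a few lines of Python. This is only an illustration of the "longest element wins" behaviour described here, not JSON2Video's implementation; the function name, the 47.5-second clip length, and the 0.0 fallback for a scene with no timed elements are all assumptions.

```python
from typing import Mapping, Optional


def scene_duration(element_durations: Mapping[str, Optional[float]]) -> float:
    """Length of a scene with no explicit duration: its longest timed element.

    Elements with no length of their own (image, text, ...) are passed as None.
    Returning 0.0 when nothing is timed is an assumption made for this sketch.
    """
    timed = [d for d in element_durations.values() if d is not None]
    return max(timed) if timed else 0.0


# Hypothetical clip length in seconds; only the audio has an intrinsic duration.
elements = {
    "image": None,
    "audio": 47.5,
    "text": None,
    "audiogram": None,
    "subtitles": None,
}
print(scene_duration(elements))  # 47.5
```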
The image element fills the canvas as the visual backdrop. fit: "cover" scales the cover art until it fills the whole 1080×1920 frame and crops whatever overflows; a square cover (1080×1080) is scaled up to 1920×1920, so its left and right edges are cropped. To show square cover art uncropped, position it explicitly with y: 420 (which centres a 1080-pixel-tall image vertically) and constrain its height instead.
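The arithmetic behind both options can be checked with a short Python sketch. It assumes fit: "cover" follows the usual cover semantics (scale uniformly until the frame is filled, then crop the overflow symmetrically); the function and field names are hypothetical helpers, not part of the JSON2Video API.

```python
from typing import NamedTuple


class CoverFit(NamedTuple):
    scaled_w: int
    scaled_h: int
    crop_x: int  # pixels trimmed from EACH of the left and right edges
    crop_y: int  # pixels trimmed from EACH of the top and bottom edges


def cover_fit(src_w: int, src_h: int, frame_w: int = 1080, frame_h: int = 1920) -> CoverFit:
    """Scale the image uniformly until it covers the frame, then crop the overflow."""
    scale = max(frame_w / src_w, frame_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    return CoverFit(scaled_w, scaled_h, (scaled_w - frame_w) // 2, (scaled_h - frame_h) // 2)


def centred_y(image_h: int, frame_h: int = 1920) -> int:
    """Top offset that vertically centres an image of the given height."""
    return (frame_h - image_h) // 2


print(cover_fit(1080, 1080))  # CoverFit(scaled_w=1920, scaled_h=1920, crop_x=420, crop_y=0)
print(centred_y(1080))        # 420
```

The second call is where the y: 420 value comes from: (1920 - 1080) / 2.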
The audiogram element renders an animated waveform synced to whatever audio is playing in the scene — it doesn't own the audio, it visualises it. Properties:
color: bar colour. Match your brand accent.
amplitude: how reactive the bars are to audio loudness. Start at 5-6; increase for quiet recordings, decrease for loud or compressed ones.
height, width, x, y: position and dimensions in pixels.
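If you generate the payload from code, the waveform's x offset can be derived instead of hard-coded. The sketch below is a hypothetical Python helper (not a JSON2Video SDK function) that assumes a 1080-pixel-wide frame and centres the waveform horizontally; with the values from the template it reproduces the audiogram element exactly.

```python
import json


def audiogram_element(width, height, y, color="#FF6B00", amplitude=6, frame_w=1080):
    """Build an audiogram element dict, centred horizontally in the frame."""
    if width > frame_w:
        raise ValueError("waveform is wider than the frame")
    return {
        "type": "audiogram",
        "color": color,
        "amplitude": amplitude,
        "y": y,
        "height": height,
        "width": width,
        "x": (frame_w - width) // 2,  # equal margin on both sides
    }


# Same geometry as the template: a 1000px-wide waveform leaves a 40px margin per side.
print(json.dumps(audiogram_element(1000, 200, 1400), indent=2))
```

Passing amplitude=8 for a quiet recording, or amplitude=4 for a heavily compressed one, follows the tuning advice above; those exact values are illustrative only.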
The subtitles element generates timed text from the audio via JSON2Video's automatic transcription. style: "classic", max-words-per-line: 4, and position: "bottom-center" produce the standard reel-friendly caption look. The default model settings are a good starting point: Whisper runs on the audio source automatically, so no extra configuration is needed.
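To get a feel for what max-words-per-line: 4 does to a transcript, here is a rough Python illustration. It only groups words into lines of at most four; the real service also has word timings to work with and may break lines differently, so treat this as an approximation of the visual effect. The function name and the sample sentence are made up for the example.

```python
from typing import List


def caption_lines(transcript: str, max_words_per_line: int = 4) -> List[str]:
    """Split a transcript into caption lines of at most max_words_per_line words."""
    if max_words_per_line < 1:
        raise ValueError("max_words_per_line must be at least 1")
    words = transcript.split()
    return [
        " ".join(words[i:i + max_words_per_line])
        for i in range(0, len(words), max_words_per_line)
    ]


sample = "welcome back to the show today we talk about building in public"
for line in caption_lines(sample):
    print(line)
# welcome back to the
# show today we talk
# about building in public
```

Short lines like these stay readable at font-size 50 on a 1080-pixel-wide frame, which is why a low word count suits vertical video.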
Cost note
Subtitles consume credits for the transcription pass. If you'll re-render the same audio multiple times during development, the cache means you only pay for transcription once — see the caching deep-dive.