Archived docs Get your API Key
Get started
Tutorials
Guides
Reference
Help for AI agents
🤖 AI Assistant

Audiogram

A podcast-style audiogram: a static cover image, an animated waveform synced to the audio, episode title text, and burned-in subtitles for accessibility. Vertical 9:16 — the standard share format for Reels, Stories, and Shorts.

Complete JSON

{
  "comment": "Podcast audiogram with cover, waveform, title, subtitles",
  "resolution": "instagram-portrait",
  "quality": "high",
  "scenes": [
    {
      "elements": [
        {
          "type": "image",
          "src": "https://cdn.json2video.com/assets/samples/podcast-cover.jpg",
          "fit": "cover"
        },
        {
          "type": "audio",
          "src": "https://cdn.json2video.com/assets/samples/podcast-clip.mp3"
        },
        {
          "type": "text",
          "text": "Episode 42\nBuilding in public",
          "y": 150,
          "style": "002",
          "settings": {
            "font-size": "75px",
            "color": "white",
            "font-weight": "800",
            "text-align": "center",
            "line-height": "85px"
          }
        },
        {
          "type": "audiogram",
          "color": "#FF6B00",
          "amplitude": 6,
          "y": 1400,
          "height": 200,
          "width": 1000,
          "x": 40
        },
        {
          "type": "subtitles",
          "settings": {
            "style": "classic",
            "position": "bottom-center",
            "max-words-per-line": 4,
            "font-size": 50,
            "all-caps": false
          }
        }
      ]
    }
  ]
}

How it works

The scene has no explicit duration — it inherits the length of the longest element, which is the audio clip. Whatever your podcast-clip.mp3 is, the video matches it exactly.

The image element fills the canvas as the visual backdrop. fit: "cover" ensures the cover art scales to the 1080×1920 frame; if your cover is square (1080×1080), it crops the top and bottom. For square cover art, position it explicitly with y: 420 and constrain its height instead.

The audiogram element renders an animated waveform synced to whatever audio is playing in the scene — it doesn't own the audio, it visualises it. Properties:

  • color — bar colour. Match your brand accent.
  • amplitude — how reactive the bars are to audio loudness. Start at 5-6; increase for quiet recordings, decrease for loud / compressed ones.
  • height, width, x, y — position and dimensions in pixels.

The subtitles element generates timed text from the audio via JSON2Video's automatic transcription. style: "classic", max-words-per-line: 4, and position: "bottom-center" produce the standard reel-friendly caption look. The model defaults are good — whisper runs on the audio source automatically.

Cost note

Subtitles consume credits for the transcription pass. If you'll re-render the same audio multiple times during development, the cache means you only pay for transcription once — see the caching deep-dive.

See also