16. Optimization & cost

Your CRM is now triggering renders automatically (chapter 15) — possibly hundreds per day. Each render consumes credits and takes wall-clock time. This final chapter covers four levers that bring down both: cache, scene granularity for parallel rendering, quality, and asset reuse.

Prerequisites: chapter 15. Optimization assumes you understand the rest of the pipeline.

Lever 1 — Caching

Every renderable element in JSON2Video is cached by default. The cache key covers everything that affects the output: the element's JSON, its sources, and its settings. If you submit a movie with the same JSON twice, the second render is served from cache, which makes it essentially free and instant.
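
To make the idea of a content-derived cache key concrete, here is a minimal Python sketch. It illustrates the general technique only and is not JSON2Video's actual implementation: the function name cache_key and the choice of SHA-256 over canonical JSON are assumptions made for this example.

```python
import hashlib
import json


def cache_key(element: dict) -> str:
    """Return a stable fingerprint for an element definition.

    Hypothetical helper: the real key derivation is not documented here.
    It only shows why identical JSON can map to the same cache entry.
    """
    # sort_keys makes the serialization independent of key order
    canonical = json.dumps(element, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


voice_a = {"type": "voice", "text": "Welcome.", "voice": "en-US-EmmaMultilingualNeural"}
voice_b = {"voice": "en-US-EmmaMultilingualNeural", "text": "Welcome.", "type": "voice"}
voice_c = {**voice_a, "text": "Welcome back."}

same = cache_key(voice_a) == cache_key(voice_b)     # same content, different key order
changed = cache_key(voice_a) != cache_key(voice_c)  # editing the text alters the key
```

Under this model, reordering keys leaves the fingerprint unchanged, while any edit to the text, voice, or other settings produces a new one and therefore a fresh render.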

To opt out for a specific element, set cache: false. Useful when:

  • A source URL has the same path but changing contents (e.g. a price tag served by your CMS).
  • You want to force a fresh voice render after tweaking voice settings.

{
  "type": "voice",
  "text": "Welcome to {{ address }}.",
  "voice": "en-US-EmmaMultilingualNeural",
  "cache": false
}
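
If many elements point at URLs whose contents change in place, the flag can be applied programmatically before you submit the movie. A minimal sketch, assuming a hypothetical helper disable_cache_for and a caller-supplied set of volatile URLs (the example.com-style URLs are placeholders); only the cache and src fields come from this chapter.

```python
import copy


def disable_cache_for(elements: list, volatile_urls: set) -> list:
    """Return a copy of `elements` with cache disabled for volatile sources.

    Hypothetical helper, not part of any JSON2Video SDK.
    """
    result = copy.deepcopy(elements)  # leave the caller's data untouched
    for element in result:
        if element.get("src") in volatile_urls:
            element["cache"] = False
    return result


elements = [
    {"type": "image", "src": "https://cms.example/price-tag.png"},
    {"type": "image", "src": "https://cms.example/logo.png"},
]
patched = disable_cache_for(elements, {"https://cms.example/price-tag.png"})
```

Elements whose source is not in the set keep the default behaviour, so the rest of the movie still benefits from the cache.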

In production, leave caching on. Each generated voiceover in the listing costs credits the first time; every subsequent render of an unchanged listing is served from cache at essentially no cost.

Lever 2 — Split into more scenes

JSON2Video renders scenes in parallel, so total render time is roughly the render time of the slowest scene, not the sum over all scenes. Two effects:

  • Splitting one 30-second scene into six 5-second scenes is significantly faster, because the six scenes render at the same time.
  • A monolithic scene with 30 elements is the worst case: a single scene gives the renderer nothing to parallelise.
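
The difference between "sum" and "max" can be sketched with a few lines of Python. This is an idealised model, not a measurement: it assumes render cost is proportional to scene length and that there is no limit on parallel workers or fixed per-scene overhead. The function names are hypothetical.

```python
def sequential_time(scene_seconds: list) -> float:
    """Wall-clock estimate if scenes rendered one after another."""
    return float(sum(scene_seconds))


def parallel_time(scene_seconds: list) -> float:
    """Wall-clock estimate if every scene renders at the same time."""
    return float(max(scene_seconds))


# Assumed per-scene render cost, in arbitrary time units.
monolithic = [30.0]   # one 30-second scene
split = [5.0] * 6     # the same content as six 5-second scenes

monolithic_estimate = parallel_time(monolithic)  # nothing to run side by side
split_estimate = parallel_time(split)            # bounded by the slowest scene
```

In this model the split version finishes in the time of one 5-unit scene, while the monolithic version still takes the full 30 units even with parallel rendering available.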

In our listing, the chapter-13 iterate pattern already produces N scenes — one per room. That's optimal. If you have a long "stats roll" with many beats, break it into one scene per beat.
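
Breaking a long sequence into one scene per beat is easy to do when you build the JSON in code. A minimal sketch, assuming a hypothetical helper scenes_from_beats; the scene shape (duration plus elements) mirrors the examples in this tutorial, while the text-only beats and their wording are made up for illustration.

```python
def scenes_from_beats(beats: list, seconds_per_beat: float = 5) -> list:
    """Build one scene per beat so each can render independently.

    Hypothetical helper, not part of any JSON2Video SDK.
    """
    return [
        {
            "duration": seconds_per_beat,
            "elements": [{"type": "text", "text": beat}],
        }
        for beat in beats
    ]


# Illustrative beats for a "stats roll"; the values are invented.
beats = ["3 bedrooms", "2 bathrooms", "Large garden", "Two-car garage"]
scenes = scenes_from_beats(beats)
```

The resulting list can be assigned to the movie's scenes array, giving the renderer four short scenes instead of one long one.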

Lever 3 — Tune quality

The movie-level quality field controls render fidelity:

Value      Use for
"low"      Internal previews, internal QA
"medium"   Social previews, draft reviews
"high"     Final delivery (default)

low renders ~3× faster than high and costs fewer credits. A typical workflow: render low while drafting, then re-render high once approved.

{ "quality": "high" }
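
If your pipeline builds the movie JSON in code, the draft-then-deliver workflow can be a single switch. A minimal sketch: the stage names draft, review, and final and the helper with_quality are assumptions for this example; only the quality field and its three values come from the table above.

```python
QUALITY_BY_STAGE = {
    "draft": "low",      # internal previews and QA
    "review": "medium",  # social previews, draft reviews
    "final": "high",     # final delivery
}


def with_quality(movie: dict, stage: str) -> dict:
    """Return a copy of the movie JSON with quality set for the stage."""
    if stage not in QUALITY_BY_STAGE:
        raise ValueError(f"unknown stage: {stage!r}")
    return {**movie, "quality": QUALITY_BY_STAGE[stage]}


movie = {"resolution": "full-hd", "scenes": []}
draft = with_quality(movie, "draft")
final = with_quality(movie, "final")
```

Because the helper returns a copy, the same base movie can be submitted first as a draft and later as the final render without being modified in between.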

Lever 4 — Reuse heavy assets across renders

The most expensive operations are voiceover synthesis and heavy remote downloads. Two strategies:

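4a. Hoist heavy assets to preload. The preload array generates or fetches an asset once and exposes its URL as a variable named after the entry's id with a _url suffix (an entry with id "intro" becomes {{intro_url}}). Every element that references that variable reuses the same asset. Generation happens before the main render and is cached.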

{
  "preload": [
    {
      "id": "intro",
      "type": "voice",
      "text": "Welcome to today's listing.",
      "voice": "en-US-EmmaMultilingualNeural"
    }
  ],
  "scenes": [
    {
      "elements": [
        { "type": "audio", "src": "{{intro_url}}" }
      ]
    }
  ]
}
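
When the JSON is generated in code, keeping the preload entry and the element that consumes it in one place avoids mismatched ids. A minimal sketch, assuming a hypothetical helper preload_voice; the id-plus-_url naming follows the example above.

```python
def preload_voice(asset_id: str, text: str, voice: str) -> tuple:
    """Return (preload_entry, audio_element) wired together by id.

    Hypothetical helper, not part of any JSON2Video SDK.
    """
    entry = {"id": asset_id, "type": "voice", "text": text, "voice": voice}
    # The element refers to the asset through the {{<id>_url}} variable.
    element = {"type": "audio", "src": "{{" + asset_id + "_url}}"}
    return entry, element


entry, element = preload_voice(
    "intro", "Welcome to today's listing.", "en-US-EmmaMultilingualNeural"
)
movie = {"preload": [entry], "scenes": [{"elements": [element]}]}
```

The same element dictionary can be reused in as many scenes as needed; each copy points at the single preloaded asset.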

4b. Upload the asset once via /v2/media. If you want full control, upload your reusable assets to your own Media library and reference them by URL. Assets referenced this way are never regenerated.

The complete final JSON

{
  "resolution": "full-hd",
  "quality": "high",
  "cache": true,
  "client-data": {
    "listing_id": "L-4821"
  },
  "exports": [
    {
      "destinations": [
        { "type": "webhook", "endpoint": "https://your-app.example/json2video-callback" }
      ]
    }
  ],
  "preload": [
    {
      "id": "intro",
      "type": "voice",
      "text": "Welcome to {{ address }}.",
      "voice": "en-US-EmmaMultilingualNeural"
    }
  ],
  "variables": {
    "address": "123 Oak Street",
    "rooms": [
      { "name": "Exterior",       "image": "https://cdn.json2video.com/assets/images/sample-house-front.jpg" },
      { "name": "Chef's Kitchen", "image": "https://cdn.json2video.com/assets/images/sample-house-kitchen.jpg" },
      { "name": "Master Bedroom", "image": "https://cdn.json2video.com/assets/images/sample-house-bedroom.jpg" }
    ]
  },
  "elements": [
    {
      "type": "audio",
      "src": "https://cdn.json2video.com/assets/audios/uplifting-corporate.mp3",
      "volume": 0.4
    },
    {
      "type": "audio",
      "src": "{{intro_url}}",
      "start": 1.5
    },
    {
      "type": "subtitles",
      "language": "en",
      "settings": {
        "style": "boxed-word",
        "font-family": "Inter",
        "font-size": 90,
        "font-color": "#FFFFFF",
        "position": "bottom-center",
        "all-caps": true,
        "box-color": "#0E7C66"
      }
    }
  ],
  "scenes": [
    {
      "duration": 4,
      "elements": [
        {
          "type": "component",
          "component": "basic/000",
          "settings": { "headline": "FOR SALE", "subline": "{{address}}" }
        }
      ]
    },
    {
      "duration": 4,
      "transition": { "style": "fade", "duration": 0.5 },
      "iterate": "rooms",
      "iterate-as": "room",
      "elements": [
        { "type": "image", "src": "{{ room.image }}" },
        { "type": "text", "text": "{{ room.name }}", "position": "top-left", "x": 60, "y": 60 }
      ]
    },
    {
      "duration": 4,
      "transition": { "style": "fade", "duration": 0.5 },
      "elements": [
        {
          "type": "html",
          "tailwind": true,
          "html": "<div class='flex h-full w-full items-center justify-center bg-emerald-700 text-white text-7xl font-bold'>Open House Sunday</div>"
        }
      ]
    }
  ]
}
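
Before submitting a payload like this, a small local check can confirm that every preload entry is actually referenced somewhere in the movie; an unreferenced entry is generated but never used. A minimal sketch, assuming a hypothetical helper unused_preloads that searches the serialized JSON for the entry's _url variable. It is a convenience lint, not an official validator.

```python
import json
import re


def unused_preloads(movie: dict) -> list:
    """Return ids of preload entries that nothing in the movie references.

    Hypothetical check: it looks for the {{<id>_url}} variable (with or
    without spaces inside the braces) outside the preload array itself.
    """
    body = {key: value for key, value in movie.items() if key != "preload"}
    serialized = json.dumps(body)
    unused = []
    for entry in movie.get("preload", []):
        pattern = r"\{\{\s*" + re.escape(entry["id"]) + r"_url\s*\}\}"
        if not re.search(pattern, serialized):
            unused.append(entry["id"])
    return unused


wired = {
    "preload": [{"id": "intro", "type": "voice", "text": "Hi"}],
    "elements": [{"type": "audio", "src": "{{intro_url}}"}],
}
orphaned = {
    "preload": [{"id": "intro", "type": "voice", "text": "Hi"}],
    "elements": [{"type": "voice", "text": "Hi"}],
}
```

Calling unused_preloads(wired) returns an empty list, while unused_preloads(orphaned) reports the intro entry, which signals that the voiceover would be defined in preload without being reused.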

Expected output

Functionally the same listing as chapter 15 — but renders faster on first hit (parallel scenes), free on repeat hits (cache), with the voiceover hoisted into preload so it generates exactly once and is reused everywhere. Sample render: tutorial-16.mp4 (placeholder).

What you learned

  • cache: true (default) makes repeat renders of the same JSON essentially free.
  • Splitting work into more scenes lets the renderer parallelise — total time tracks the slowest scene, not the sum.
  • quality is a three-step dial; use low for drafts and high for delivery.
  • Hoist expensive generated assets into preload so they are produced once and reused across every element.

You finished the tutorial

You now have a production-grade real-estate listing pipeline: data-driven, conditional, narrated, captioned, cached, parallel, and webhook-delivered. Where to go next:

  • Reference — every field and every endpoint.
  • Guides — task-oriented walkthroughs (dashboards, no-code, advanced patterns).
  • For coding agents — set up an MCP server or hand these docs to your coding agent.
