AI-generated voiceovers are a terrific feature that lets you create engaging videos with ease.

The JSON2Video API lets you use AI models directly to generate voiceovers for your videos, which greatly simplifies the creation process. You don't need to call external APIs or services to generate voiceovers: you just provide the parameters for the AI model and JSON2Video handles the rest.

Generating voiceovers with AI models

To generate a voiceover with an AI model, use the Voice element and specify the model to use along with the parameters for that model.

The currently supported models are elevenlabs (ElevenLabs) and azure (Microsoft Azure).

Using the Microsoft Azure model

The following example shows how to generate a voiceover using the azure model.

{
    "resolution": "full-hd",
    "scenes": [
        {
            "elements": [
                {
                    "type": "voice",
                    "model": "azure",
                    "text": "That's one small step for a man, one giant leap for mankind.",
                    "voice": "en-GB-SoniaNeural"
                }
            ]
        }
    ]
}

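In this example, the properties related to AI generation are model (the AI model to use), voice (the name of the voice) and text (the text to be spoken).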

The azure model is included for free in the JSON2Video API.
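If you assemble the payload in code rather than by hand, the Azure example above can be built programmatically. The following is a minimal Python sketch under stated assumptions: the helper names voice_element and movie_with_voiceover are hypothetical and not part of any JSON2Video SDK, and the sketch only builds and serializes the JSON body; it does not call the API.

```python
import json


def voice_element(text, voice, model="azure"):
    """Build a voice element dict (hypothetical helper, not an SDK function)."""
    return {"type": "voice", "model": model, "text": text, "voice": voice}


def movie_with_voiceover(element, resolution="full-hd"):
    """Wrap one element in a single-scene movie, mirroring the JSON example."""
    return {"resolution": resolution, "scenes": [{"elements": [element]}]}


movie = movie_with_voiceover(
    voice_element(
        text="That's one small step for a man, one giant leap for mankind.",
        voice="en-GB-SoniaNeural",
    )
)

# Serialize the dict to a JSON string with the same structure as the example.
payload = json.dumps(movie, indent=4)
print(payload)
```

Keeping the element builder separate from the movie wrapper means the same wrapper can be reused when only the voice parameters change.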

Using the ElevenLabs model

To use the elevenlabs model instead of azure, change the model property and choose a voice that is available for that model.

{
    "resolution": "full-hd",
    "scenes": [
        {
            "elements": [
                {
                    "type": "voice",
                    "model": "elevenlabs",
                    "text": "That's one small step for a man, one giant leap for mankind.",
                    "voice": "Brian"
                }
            ]
        }
    ]
}

For the voice property, you can check all the available voices for the ElevenLabs model here.

Be aware that the elevenlabs model consumes credits just for generating the voiceover. Check how credits are consumed here.
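Switching models can also be done in code. The sketch below, in Python, rewrites every voice element of a movie so that it uses a different model and voice. The function switch_voice_model and the SUPPORTED_MODELS set are hypothetical names introduced for illustration; the set simply lists the two models named in this article.

```python
import copy

# The two models named in this article; an assumption, not read from the API.
SUPPORTED_MODELS = {"azure", "elevenlabs"}


def switch_voice_model(movie, model, voice):
    """Return a copy of movie whose voice elements use the given model and voice."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    updated = copy.deepcopy(movie)  # leave the caller's payload untouched
    for scene in updated.get("scenes", []):
        for element in scene.get("elements", []):
            if element.get("type") == "voice":
                element["model"] = model
                element["voice"] = voice
    return updated


azure_movie = {
    "resolution": "full-hd",
    "scenes": [
        {
            "elements": [
                {
                    "type": "voice",
                    "model": "azure",
                    "text": "That's one small step for a man, one giant leap for mankind.",
                    "voice": "en-GB-SoniaNeural",
                }
            ]
        }
    ],
}

elevenlabs_movie = switch_voice_model(azure_movie, "elevenlabs", "Brian")
```

The voice is passed together with the model because voice names differ between models: the Azure example uses en-GB-SoniaNeural, while the ElevenLabs example uses Brian.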

Voiceover caching

Voiceovers generated with AI models are cached on JSON2Video servers. If you call the JSON2Video API again with the same parameters for a voiceover, the cached audio is reused in the video instead of calling the AI model again.

As a result, re-rendering the same video multiple times does not consume credits for generating the same voiceover again.

If you need to regenerate a voiceover for any reason, set the cache property to false on the voice element.

Example:

{
    "resolution": "full-hd",
    "scenes": [
        {
            "elements": [
                {
                    "type": "voice",
                    "model": "azure",
                    "text": "That's one small step for a man, one giant leap for mankind.",
                    "voice": "en-GB-SoniaNeural",
                    "cache": false
                }
            ]
        }
    ]
}
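When a payload is built in code, the same switch can be applied to every voice element at once. The following Python sketch assumes the movie structure used in the examples above; disable_voice_cache is a hypothetical helper name, not part of the JSON2Video API.

```python
import copy
import json


def disable_voice_cache(movie):
    """Return a copy of movie with "cache": false set on every voice element."""
    updated = copy.deepcopy(movie)  # keep the original payload cache-friendly
    for scene in updated.get("scenes", []):
        for element in scene.get("elements", []):
            if element.get("type") == "voice":
                element["cache"] = False  # json.dumps writes this as false
    return updated


movie = {
    "resolution": "full-hd",
    "scenes": [
        {
            "elements": [
                {
                    "type": "voice",
                    "model": "azure",
                    "text": "That's one small step for a man, one giant leap for mankind.",
                    "voice": "en-GB-SoniaNeural",
                }
            ]
        }
    ],
}

fresh = disable_voice_cache(movie)
print(json.dumps(fresh, indent=4))
```

Because the helper returns a copy, the original payload keeps the default caching behavior for later renders, which matters with the elevenlabs model since each regeneration consumes credits.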

Published on January 13th, 2025

Author
Joaquim Cardona
Senior Internet business executive with more than 20 years of broad experience in Internet business, the media sector, digital marketing, online video and mobile technologies.