Generating voiceovers with AI models is a terrific feature that allows you to create engaging videos with ease.
The JSON2Video API allows you to use AI models directly from the API to generate voiceovers for your videos, greatly simplifying the creation process. You don't need to generate voiceovers by calling external APIs or services: you just provide the parameters for the AI model and JSON2Video handles the rest.
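For context, here is a minimal sketch of how a movie definition like the ones below could be submitted for rendering. The endpoint URL, the x-api-key header and the YOUR_API_KEY placeholder are assumptions based on the JSON2Video REST API; check the API reference for the exact details of your account.

import requests

API_KEY = "YOUR_API_KEY"  # your JSON2Video API key

# Movie definition with a single AI-generated voiceover (azure model)
movie = {
    "resolution": "full-hd",
    "scenes": [
        {
            "elements": [
                {
                    "type": "voice",
                    "model": "azure",
                    "text": "That's one small step for a man, one giant leap for mankind.",
                    "voice": "en-GB-SoniaNeural"
                }
            ]
        }
    ]
}

# Submit the movie for rendering (endpoint and header name assumed;
# see the API reference for the authoritative values)
response = requests.post(
    "https://api.json2video.com/v2/movies",
    headers={"x-api-key": API_KEY},
    json=movie,
)
print(response.json())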
Generating voiceovers with AI models
To generate voiceovers with AI models, use the Voice element, specifying the model to use and the parameters for that model.
The currently supported models are elevenlabs (ElevenLabs) and azure (Microsoft Azure).
Using the Microsoft Azure model
The following example shows how to generate a voiceover using the azure model.
{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "voice",
          "model": "azure",
          "text": "That's one small step for a man, one giant leap for mankind.",
          "voice": "en-GB-SoniaNeural"
        }
      ]
    }
  ]
}
The AI generation related properties are:
model: The AI model to use for the generation.
text: The text that will be converted into the voiceover.
voice: The voice to use for the generation. Check all the available voices for the Microsoft Azure model here.
The azure model is included for free in the JSON2Video API.
Using the ElevenLabs model
If we want to use the elevenlabs model instead of azure, we can do so by changing the model property.
{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "voice",
          "model": "elevenlabs",
          "text": "That's one small step for a man, one giant leap for mankind.",
          "voice": "Brian"
        }
      ]
    }
  ]
}
For the voice property, you can check all the available voices for the ElevenLabs model here.
Be aware that the elevenlabs model consumes credits just for generating the voiceover. Check how credits are consumed here.
Voiceover caching
Voiceovers generated with AI models are cached on JSON2Video servers to avoid calling the AI models for the same voiceover multiple times. If you call the JSON2Video API with the same parameters for the same voiceover, the cached voiceover is reused in the video instead of being generated again.
This is useful when you re-render the same video multiple times: the cached voiceover is reused, so you don't consume credits for generating the same voiceover again.
But if for any reason you need to regenerate a voiceover, you can do so by setting the cache property to false.
Example:
{
  "resolution": "full-hd",
  "scenes": [
    {
      "elements": [
        {
          "type": "voice",
          "model": "azure",
          "text": "That's one small step for a man, one giant leap for mankind.",
          "voice": "en-GB-SoniaNeural",
          "cache": false
        }
      ]
    }
  ]
}
Published on January 13th, 2025