Voice elements let you add a voice-over to your videos: you only need to specify the text to be spoken and the voice (and with it the language) to use.
JSON2Video uses Microsoft Azure's Text-To-Speech service to achieve the most natural voices and the widest variety of languages and accents.
Learn more about Microsoft voices and explore the full list of available voices and supported languages.
The following examples show how to include voice elements in your videos.
Simple voice over
In this example we will use the default voice to add a short voice-over to a still image video:
{
  "resolution": "full-hd",
  "quality": "high",
  "scenes": [
    {
      "comment": "Scene #1",
      "elements": [
        {
          "type": "image",
          "src": "https://assets.json2video.com/assets/images/space-apollo11-01.jpg",
          "scale": {
            "width": 1920,
            "height": 1280
          },
          "zoom": 5
        },
        {
          "type": "voice",
          "text": "That's one small step for a man, one giant leap for mankind. Upon taking a \"small step\" onto the surface of the moon in 1969, Neil Armstrong uttered what would become one of history's most famous one-liners.",
          "start": 1.5
        }
      ]
    }
  ]
}
The video uses one scene with 2 elements:
- The first element is an image re-scaled down to 1920x1280 to keep the original 3:2 aspect ratio
- The second element is a voice element with the voice-over text that starts 1.5 seconds from the beginning of the scene
In this example, we are not indicating the voice to use, so it uses the default value for the voice field: en-GB-LibbyNeural.
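If you generate the movie JSON from code rather than writing it by hand, the quote escaping seen in the example (`\"small step\"`) can be left to a JSON serializer. The following is a minimal Python sketch, not an official client: the helper name `voice_element` is hypothetical, and the snippet only builds the JSON text without sending it anywhere.

```python
import json

def voice_element(text, start=None, voice=None):
    """Build a voice element dict (hypothetical helper, not part of any SDK).

    Optional fields are left out when not given, so the element falls back
    to the defaults described above (for example, the default voice).
    """
    element = {"type": "voice", "text": text}
    if start is not None:
        element["start"] = start
    if voice is not None:
        element["voice"] = voice
    return element

movie = {
    "resolution": "full-hd",
    "quality": "high",
    "scenes": [
        {
            "comment": "Scene #1",
            "elements": [
                voice_element(
                    'Neil Armstrong took a "small step" onto the surface of the moon in 1969.',
                    start=1.5,
                )
            ],
        }
    ],
}

# json.dumps escapes the inner double quotes, so no manual \" is needed.
payload = json.dumps(movie, indent=2)
print(payload)
```

Because no voice is passed, the generated element has no "voice" field, which mirrors the example above.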
Using multiple voices
In this example, we will use two voices in two different languages to showcase the Voice element features.
{
  "resolution": "full-hd",
  "quality": "high",
  "scenes": [
    {
      "comment": "Scene #1",
      "elements": [
        {
          "type": "image",
          "src": "https://assets.json2video.com/assets/images/woman-01.jpg",
          "y": -100
        },
        {
          "type": "voice",
          "text": "Hello Diego! Could you please introduce yourself in Italian?",
          "voice": "en-US-AriaNeural",
          "start": 1
        }
      ]
    },
    {
      "comment": "Scene #2",
      "elements": [
        {
          "type": "image",
          "src": "https://assets.json2video.com/assets/images/man-01.jpg",
          "y": -100
        },
        {
          "type": "voice",
          "text": "Sì, certo, Aria. Mi chiamo Diego Rossi e sono di Firenze.",
          "voice": "it-IT-DiegoNeural"
        }
      ]
    }
  ]
}
The video simulates a short conversation between an English-speaking woman and an Italian-speaking man.
- In the first scene, we add an image of the woman and a Voice element with the English text
- In the second scene, we add an image of the man and a Voice element with the Italian text
Changing the pace of the voice
You can use a few tags to change the pace of the voice:
- <super-slow>: makes the voice very slow
- <slow>: makes the voice a bit slower
- <normal>: makes the voice normal speed
- <fast>: makes the voice a bit faster
- <super-fast>: makes the voice very fast
Wrap the text in the opening and closing tags to apply the pace change. Example:
{
  "resolution": "full-hd",
  "quality": "high",
  "scenes": [
    {
      "comment": "Scene #1",
      "elements": [
        {
          "type": "voice",
          "text": "That's one small step for a man, <super-slow>one giant leap for mankind</super-slow>. <fast>Upon taking a \"small step\" onto the surface of the moon in 1969</fast>, Neil Armstrong uttered what would become <slow>one of history's most famous one-liners</slow>.",
          "start": 1.5
        }
      ]
    }
  ]
}
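When the text is assembled in code, a small helper can do the wrapping and catch a mistyped tag name before the request is sent. This is only a sketch: `with_pace` is a hypothetical name, and the list of tags is copied from the pace tags listed in this section.

```python
# Pace tags as listed in this section.
PACE_TAGS = ("super-slow", "slow", "normal", "fast", "super-fast")

def with_pace(text, pace):
    """Wrap text in a pace tag, e.g. <slow>...</slow> (hypothetical helper)."""
    if pace not in PACE_TAGS:
        raise ValueError(f"unknown pace tag: {pace!r}")
    return f"<{pace}>{text}</{pace}>"

sentence = (
    "That's one small step for a man, "
    + with_pace("one giant leap for mankind", "super-slow")
    + "."
)
print(sentence)
```

The resulting string can be used as the value of the "text" field of a voice element.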
Expressing emotion
You can also add emotion to the voice-over by wrapping the text in an emotion tag.
These are the supported emotions:
<ad>
<advertisement_upbeat>
<affectionate>
<angry>
<assistant>
<calm>
<chat>
<cheerful>
<customerservice>
<depressed>
<disgruntled>
<documentary-narration>
<embarrassed>
<empathetic>
<envious>
<excited>
<fearful>
<friendly>
<gentle>
<hopeful>
<lyrical>
<narration-professional>
<narration-relaxed>
<newscast>
<newscast-casual>
<newscast-formal>
<poetry-reading>
<sad>
<serious>
<shouting>
<sports_commentary>
<sports_commentary_excited>
<whispering>
<terrified>
<unfriendly>
Example:
{
  "resolution": "full-hd",
  "quality": "high",
  "scenes": [
    {
      "comment": "Scene #1",
      "elements": [
        {
          "type": "voice",
          "voice": "en-US-AriaNeural",
          "text": "<cheerful>\"That's remarkable! You're a genius!\"</cheerful> Mom said to her son.",
          "start": 1.5
        }
      ]
    }
  ]
}
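With this many emotion names, a misspelled tag is easy to miss. The sketch below checks a text against the list above before it is submitted. It rests on assumptions: the helper `unknown_tags` is hypothetical, the tag set is copied from the list in this section, and how the service treats an unrecognized tag is not stated here. The check is only meant for text that uses these shorthand tags; raw SSML tags would be reported as unknown too.

```python
import re

# Emotion tags as listed in this section.
EMOTION_TAGS = {
    "ad", "advertisement_upbeat", "affectionate", "angry", "assistant", "calm",
    "chat", "cheerful", "customerservice", "depressed", "disgruntled",
    "documentary-narration", "embarrassed", "empathetic", "envious", "excited",
    "fearful", "friendly", "gentle", "hopeful", "lyrical",
    "narration-professional", "narration-relaxed", "newscast",
    "newscast-casual", "newscast-formal", "poetry-reading", "sad", "serious",
    "shouting", "sports_commentary", "sports_commentary_excited", "whispering",
    "terrified", "unfriendly",
}

# Matches opening and closing tags and captures the tag name.
TAG_PATTERN = re.compile(r"</?([A-Za-z][\w:-]*)[^>]*>")

def unknown_tags(text, known=EMOTION_TAGS):
    """Return the sorted tag names found in text that are not in `known`."""
    names = {match.group(1) for match in TAG_PATTERN.finditer(text)}
    return sorted(names - set(known))

ok = '<cheerful>"That\'s remarkable! You\'re a genius!"</cheerful> Mom said to her son.'
typo = "<cheerfull>Well done!</cheerfull> <sad>Goodbye.</sad>"
print(unknown_tags(ok))
print(unknown_tags(typo))
```

To allow the pace tags as well, pass a larger set through the `known` argument.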
Using SSML
Finally, you can use SSML (Speech Synthesis Markup Language) tags directly in the text to express more complex nuances, such as pauses between sentences.
Example:
{
  "resolution": "full-hd",
  "quality": "high",
  "scenes": [
    {
      "comment": "Scene #1",
      "elements": [
        {
          "type": "voice",
          "voice": "en-US-AriaNeural",
          "text": "<mstts:express-as style=\"cheerful\">\"That's remarkable! You're a genius!\"</mstts:express-as><break time=\"600ms\" />Mom said to her son.",
          "start": 1.5
        }
      ]
    }
  ]
}
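SSML adds two layers of quoting: the attribute quotes inside the markup, and the JSON escaping of those quotes. A sketch of composing the same kind of text in Python is shown below. The helpers `express_as` and `pause` are hypothetical names, and escaping `&` and `<` in the spoken text is an assumption based on SSML being XML; this section does not say whether the service requires it.

```python
import json
from xml.sax.saxutils import escape, quoteattr

def express_as(text, style):
    """Wrap spoken text in an mstts:express-as tag (hypothetical helper)."""
    # escape() protects &, < and > in the spoken text;
    # quoteattr() adds the attribute quotes and escapes the style value.
    return f"<mstts:express-as style={quoteattr(style)}>{escape(text)}</mstts:express-as>"

def pause(milliseconds):
    """Return an SSML break tag of the given length (hypothetical helper)."""
    return f'<break time="{int(milliseconds)}ms" />'

text = (
    express_as('"That\'s remarkable! You\'re a genius!"', "cheerful")
    + pause(600)
    + "Mom said to her son."
)

element = {"type": "voice", "voice": "en-US-AriaNeural", "text": text, "start": 1.5}

# json.dumps takes care of the \" escaping seen in the example above.
print(json.dumps(element, indent=2, ensure_ascii=False))
```

Building the markup and the JSON in separate steps keeps each kind of escaping in one place.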
Balancing music and voice volume
When you want to add music and narration to a video, you typically need to adjust the volume so that the voice can be heard clearly. The best option is to keep the voice at its original volume and reduce the volume of the music.
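As a rough illustration of that advice, the Python sketch below builds a scene with background music at reduced volume and a narration left at its default volume. Everything about the music element is an assumption to be checked against the API reference: this section does not show an "audio" element type or a "volume" property, the value 0.2 is arbitrary, and the music URL is a placeholder. The helper name `narrated_scene` is hypothetical as well.

```python
import json

def narrated_scene(narration, music_src, music_volume=0.2):
    """Build a scene with quiet music under a narration (hypothetical helper)."""
    return {
        "elements": [
            # Assumed schema: an audio element whose "volume" lowers the music.
            {"type": "audio", "src": music_src, "volume": music_volume},
            # The voice element sets no volume, so it keeps its original level.
            {"type": "voice", "text": narration},
        ]
    }

scene = narrated_scene(
    "That's one small step for a man, one giant leap for mankind.",
    "https://example.com/background-music.mp3",  # placeholder URL
)
print(json.dumps(scene, indent=2))
```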