Convert text to voice

Voice elements let you add a voice-over to your videos: you specify the text to be spoken and the voice (and therefore the language) to use.

JSON2Video uses Microsoft Azure's Text-To-Speech service to achieve the most natural voices and the widest variety of languages and accents.

Check the full list of available voices and languages.

The following examples show how to include voice elements in your videos.

Simple voice over

In this example we will use the default voice to add a short voice-over to a still image video:

JSON
PHP
NodeJS

{
    "resolution": "full-hd",
    "quality": "high",
    "scenes": [
        {
            "comment": "Scene #1",
            "elements": [
                {
                    "type": "image",
                    "src": "https://assets.json2video.com/assets/images/space-apollo11-01.jpg",
                    "scale": {
                        "width": 1920,
                        "height": 1280
                    },
                    "zoom": 5
                },
                {
                    "type": "voice",
                    "text": "That's one small step for a man, one giant leap for mankind. Upon taking a \"small step\" onto the surface of the moon in 1969, Neil Armstrong uttered what would become one of history's most famous one-liners.",
                    "start": 1.5
                }
            ]
        }
    ]
}

require 'vendor/autoload.php';

use JSON2Video\Movie;
use JSON2Video\Scene;

// Create and initialize the movie object
$movie = new Movie;
$movie->setAPIKey(YOUR_API_KEY);
$movie->resolution = 'full-hd';
$movie->quality = 'high';

// Create the scenes of the movie

// Create SCENE 1
$scene1 = new Scene;
$scene1->comment = 'Scene #1';
$scene1->addElement([
	'type' => 'image',
	'src' => 'https://assets.json2video.com/assets/images/space-apollo11-01.jpg',
	'scale' => [
		'width' => 1920,
		'height' => 1280
	],
	'zoom' => 5
]);
$scene1->addElement([
	'type' => 'voice',
	'text' => 'That\'s one small step for a man, one giant leap for mankind. Upon taking a "small step" onto the surface of the moon in 1969, Neil Armstrong uttered what would become one of history\'s most famous one-liners.',
	'start' => 1.5
]);
$movie->addScene($scene1);

// Finally, render the movie
$movie->render();

// Wait for the movie to be rendered
$movie->waitToFinish();

let movie = new Movie;
movie.setAPIKey(YOUR_API_KEY);
movie.set("resolution", "full-hd");
movie.set("quality", "high");

// Create the scenes of the movie

// Create SCENE 1
let scene1 = new Scene;
scene1.set("comment", "Scene #1");
scene1.addElement({
	"type": "image",
	"src": "https://assets.json2video.com/assets/images/space-apollo11-01.jpg",
	"scale": {
		"width": 1920,
		"height": 1280
	},
	"zoom": 5
});
scene1.addElement({
	"type": "voice",
	"text": "That's one small step for a man, one giant leap for mankind. Upon taking a \"small step\" onto the surface of the moon in 1969, Neil Armstrong uttered what would become one of history's most famous one-liners.",
	"start": 1.5
});
movie.addScene(scene1);

// Finally, render the movie
movie.render();

// Wait for the movie to be rendered
movie.waitToFinish();

The video uses one scene with 2 elements:

The first element is an image scaled down to 1920x1280, which keeps the original 3:2 aspect ratio
The second element is a voice element with the voice-over text that starts 1.5 seconds from the beginning of the scene

This example does not specify a voice, so the element uses the default value of the voice field: en-GB-LibbyNeural.
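
The default can also be made explicit when elements are built in code. The following is a minimal sketch and not part of the JSON2Video SDK: `voiceElement` and `DEFAULT_VOICE` are hypothetical names, and the helper only builds the plain JSON object for a voice element, filling in en-GB-LibbyNeural when no voice is given.

```javascript
// Default voice named on this page; applied when the caller does not pick one.
const DEFAULT_VOICE = "en-GB-LibbyNeural";

// Hypothetical helper (not an SDK function): builds a plain voice element object.
function voiceElement(text, options = {}) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("A voice element needs non-empty text");
  }
  const element = {
    type: "voice",
    text,
    voice: options.voice ?? DEFAULT_VOICE,
  };
  // Only add "start" when it was requested, to keep the JSON minimal.
  if (options.start !== undefined) {
    element.start = options.start;
  }
  return element;
}

const element = voiceElement("That's one small step for a man.", { start: 1.5 });
console.log(JSON.stringify(element, null, 2));
```

The resulting object can be placed in a scene's elements array like any other element.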

Using multiple voices

In this example, we will use two voices in two different languages to showcase the Voice element features.

JSON
PHP
NodeJS

{
    "resolution": "full-hd",
    "quality": "high",
    "scenes": [
        {
            "comment": "Scene #1",
            "elements": [
                {
                    "type": "image",
                    "src": "https://assets.json2video.com/assets/images/woman-01.jpg",
                    "y": -100
                },
                {
                    "type": "voice",
                    "text": "Hello Diego! Could you please introduce yourself in Italian?",
                    "voice": "en-US-AriaNeural",
                    "start": 1
                }
            ]
        },
        {
            "comment": "Scene #2",
            "elements": [
                {
                    "type": "image",
                    "src": "https://assets.json2video.com/assets/images/man-01.jpg",
                    "y": -100
                },
                {
                    "type": "voice",
                    "text": "S\u00ec, certo, Aria. Mi chiamo Diego Rossi e sono di Firenze.",
                    "voice": "it-IT-DiegoNeural"
                }
            ]
        }
    ]
}

require 'vendor/autoload.php';

use JSON2Video\Movie;
use JSON2Video\Scene;

// Create and initialize the movie object
$movie = new Movie;
$movie->setAPIKey(YOUR_API_KEY);
$movie->resolution = 'full-hd';
$movie->quality = 'high';

// Create the scenes of the movie

// Create SCENE 1
$scene1 = new Scene;
$scene1->comment = 'Scene #1';
$scene1->addElement([
	'type' => 'image',
	'src' => 'https://assets.json2video.com/assets/images/woman-01.jpg',
	'y' => -100
]);
$scene1->addElement([
	'type' => 'voice',
	'text' => 'Hello Diego! Could you please introduce yourself in Italian?',
	'voice' => 'en-US-AriaNeural',
	'start' => 1
]);
$movie->addScene($scene1);

// Create SCENE 2
$scene2 = new Scene;
$scene2->comment = 'Scene #2';
$scene2->addElement([
	'type' => 'image',
	'src' => 'https://assets.json2video.com/assets/images/man-01.jpg',
	'y' => -100
]);
$scene2->addElement([
	'type' => 'voice',
	'text' => 'Sì, certo, Aria. Mi chiamo Diego Rossi e sono di Firenze.',
	'voice' => 'it-IT-DiegoNeural'
]);
$movie->addScene($scene2);

// Finally, render the movie
$movie->render();

// Wait for the movie to be rendered
$movie->waitToFinish();

let movie = new Movie;
movie.setAPIKey(YOUR_API_KEY);
movie.set("resolution", "full-hd");
movie.set("quality", "high");

// Create the scenes of the movie

// Create SCENE 1
let scene1 = new Scene;
scene1.set("comment", "Scene #1");
scene1.addElement({
	"type": "image",
	"src": "https://assets.json2video.com/assets/images/woman-01.jpg",
	"y": -100
});
scene1.addElement({
	"type": "voice",
	"text": "Hello Diego! Could you please introduce yourself in Italian?",
	"voice": "en-US-AriaNeural",
	"start": 1
});
movie.addScene(scene1);

// Create SCENE 2
let scene2 = new Scene;
scene2.set("comment", "Scene #2");
scene2.addElement({
	"type": "image",
	"src": "https://assets.json2video.com/assets/images/man-01.jpg",
	"y": -100
});
scene2.addElement({
	"type": "voice",
	"text": "Sì, certo, Aria. Mi chiamo Diego Rossi e sono di Firenze.",
	"voice": "it-IT-DiegoNeural"
});
movie.addScene(scene2);

// Finally, render the movie
movie.render();

// Wait for the movie to be rendered
movie.waitToFinish();

The video simulates a short conversation between an English-speaking woman and an Italian-speaking man.

In the first scene, we add an image of the woman and a Voice element with the English text, spoken by en-US-AriaNeural
In the second scene, we add an image of the man and a Voice element with the Italian text, spoken by it-IT-DiegoNeural
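
For longer conversations, the scenes can be generated from a list of lines instead of being written by hand. This is a rough sketch under stated assumptions: `dialogueToMovie` and the shape of the `dialogue` array are hypothetical, and the code only produces the movie JSON shown above (without the `start` offset of the first scene), it does not call the API.

```javascript
// One entry per line of the conversation: image to show, voice to use, text to speak.
const dialogue = [
  {
    image: "https://assets.json2video.com/assets/images/woman-01.jpg",
    voice: "en-US-AriaNeural",
    text: "Hello Diego! Could you please introduce yourself in Italian?",
  },
  {
    image: "https://assets.json2video.com/assets/images/man-01.jpg",
    voice: "it-IT-DiegoNeural",
    text: "Sì, certo, Aria. Mi chiamo Diego Rossi e sono di Firenze.",
  },
];

// Hypothetical helper: creates one scene per line, each with an image and a voice element.
function dialogueToMovie(lines) {
  return {
    resolution: "full-hd",
    quality: "high",
    scenes: lines.map((line, index) => ({
      comment: `Scene #${index + 1}`,
      elements: [
        { type: "image", src: line.image, y: -100 },
        { type: "voice", text: line.text, voice: line.voice },
      ],
    })),
  };
}

const movie = dialogueToMovie(dialogue);
console.log(JSON.stringify(movie, null, 2));
```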

Changing the pace of the voice

You can use a few tags to change the pace of the voice:

<super-slow>: makes the voice very slow
<slow>: makes the voice a bit slower
<normal>: makes the voice normal speed
<fast>: makes the voice a bit faster
<super-fast>: makes the voice very fast

Just wrap the text with the tags to apply the voice change. Examples:

JSON
PHP
NodeJS

{
    "resolution": "full-hd",
    "quality": "high",
    "scenes": [
        {
            "comment": "Scene #1",
            "elements": [
                {
                    "type": "voice",
                    "text": "That's one small step for a man, <super-slow>one giant leap for mankind</super-slow>. <fast>Upon taking a \"small step\" onto the surface of the moon in 1969</fast>, Neil Armstrong uttered what would become <slow>one of history's most famous one-liners</slow>.",
                    "start": 1.5
                }
            ]
        }
    ]
}

require 'vendor/autoload.php';

use JSON2Video\Movie;
use JSON2Video\Scene;

// Create and initialize the movie object
$movie = new Movie;
$movie->setAPIKey(YOUR_API_KEY);
$movie->resolution = 'full-hd';
$movie->quality = 'high';

// Create the scenes of the movie

// Create SCENE 1
$scene1 = new Scene;
$scene1->comment = 'Scene #1';
$scene1->addElement([
	'type' => 'voice',
	'text' => 'That\'s one small step for a man, <super-slow>one giant leap for mankind</super-slow>. <fast>Upon taking a "small step" onto the surface of the moon in 1969</fast>, Neil Armstrong uttered what would become <slow>one of history\'s most famous one-liners</slow>.',
	'start' => 1.5
]);
$movie->addScene($scene1);

// Finally, render the movie
$movie->render();

// Wait for the movie to be rendered
$movie->waitToFinish();

let movie = new Movie;
movie.setAPIKey(YOUR_API_KEY);
movie.set("resolution", "full-hd");
movie.set("quality", "high");

// Create the scenes of the movie

// Create SCENE 1
let scene1 = new Scene;
scene1.set("comment", "Scene #1");
scene1.addElement({
	"type": "voice",
	"text": "That's one small step for a man, <super-slow>one giant leap for mankind</super-slow>. <fast>Upon taking a \"small step\" onto the surface of the moon in 1969</fast>, Neil Armstrong uttered what would become <slow>one of history's most famous one-liners</slow>.",
	"start": 1.5
});
movie.addScene(scene1);

// Finally, render the movie
movie.render();

// Wait for the movie to be rendered
movie.waitToFinish();
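
When the text is assembled in code, a small wrapper can guard against misspelled pace tags. The sketch below is only an illustration: `withPace` and `PACE_TAGS` are hypothetical names, and the list simply mirrors the five tags named in this section.

```javascript
// The pace tags listed in this section.
const PACE_TAGS = ["super-slow", "slow", "normal", "fast", "super-fast"];

// Hypothetical helper: wraps a piece of text in one of the pace tags.
function withPace(pace, text) {
  if (!PACE_TAGS.includes(pace)) {
    throw new Error(`Unknown pace tag: ${pace}`);
  }
  return `<${pace}>${text}</${pace}>`;
}

const text =
  "That's one small step for a man, " +
  withPace("super-slow", "one giant leap for mankind") +
  ".";
console.log(text);
```

The resulting string is used as the text field of a voice element, exactly as in the examples above.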

Expressing emotion

You can also add an emotion to the voice-over by wrapping the text in tags.

These are the supported emotions:

<ad>
<advertisement_upbeat>
<affectionate>
<angry>
<assistant>
<calm>
<chat>
<cheerful>
<customerservice>
<depressed>
<disgruntled>
<documentary-narration>
<embarrassed>
<empathetic>
<envious>
<excited>
<fearful>
<friendly>
<gentle>
<hopeful>
<lyrical>
<narration-professional>
<narration-relaxed>
<newscast>
<newscast-casual>
<newscast-formal>
<poetry-reading>
<sad>
<serious>
<shouting>
<sports_commentary>
<sports_commentary_excited>
<whispering>
<terrified>
<unfriendly>

Example:

JSON
PHP
NodeJS

{
    "resolution": "full-hd",
    "quality": "high",
    "scenes": [
        {
            "comment": "Scene #1",
            "elements": [
                {
                    "type": "voice",
                    "voice": "en-US-AriaNeural",
                    "text": "<cheerful>\"That's remarkable! You're a genius!\"</cheerful> Mom said to her son.",
                    "start": 1.5
                }
            ]
        }
    ]
}

require 'vendor/autoload.php';

use JSON2Video\Movie;
use JSON2Video\Scene;

// Create and initialize the movie object
$movie = new Movie;
$movie->setAPIKey(YOUR_API_KEY);
$movie->resolution = 'full-hd';
$movie->quality = 'high';

// Create the scenes of the movie

// Create SCENE 1
$scene1 = new Scene;
$scene1->comment = 'Scene #1';
$scene1->addElement([
	'type' => 'voice',
	'voice' => 'en-US-AriaNeural',
	'text' => '<cheerful>"That\'s remarkable! You\'re a genius!"</cheerful> Mom said to her son.',
	'start' => 1.5
]);
$movie->addScene($scene1);

// Finally, render the movie
$movie->render();

// Wait for the movie to be rendered
$movie->waitToFinish();

let movie = new Movie;
movie.setAPIKey(YOUR_API_KEY);
movie.set("resolution", "full-hd");
movie.set("quality", "high");

// Create the scenes of the movie

// Create SCENE 1
let scene1 = new Scene;
scene1.set("comment", "Scene #1");
scene1.addElement({
	"type": "voice",
	"voice": "en-US-AriaNeural",
	"text": "&lt;cheerful&gt;\"That's remarkable! You're a genius!\"&lt;/cheerful&gt; Mom said to her son.",
	"start": 1.5
});
movie.addScene(scene1);

// Finally, render the movie
movie.render();

// Wait for the movie to be rendered
movie.waitToFinish();
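
Because the list of emotions is long, a typo in a tag name is easy to miss. The following sketch checks a text against the list above before it is sent. It is not part of the SDK: `unknownEmotionTags` and `EMOTION_TAGS` are hypothetical names, and how the service itself treats an unsupported tag is not covered here.

```javascript
// The emotion tags listed in this section.
const EMOTION_TAGS = new Set([
  "ad", "advertisement_upbeat", "affectionate", "angry", "assistant", "calm",
  "chat", "cheerful", "customerservice", "depressed", "disgruntled",
  "documentary-narration", "embarrassed", "empathetic", "envious", "excited",
  "fearful", "friendly", "gentle", "hopeful", "lyrical",
  "narration-professional", "narration-relaxed", "newscast", "newscast-casual",
  "newscast-formal", "poetry-reading", "sad", "serious", "shouting",
  "sports_commentary", "sports_commentary_excited", "whispering", "terrified",
  "unfriendly",
]);

// Hypothetical helper: returns the names of opening tags that are not in the list.
function unknownEmotionTags(text) {
  const unknown = [];
  // Matches opening tags such as <cheerful>; closing tags (</cheerful>) do not match.
  for (const match of text.matchAll(/<([a-z][a-z_-]*)>/g)) {
    if (!EMOTION_TAGS.has(match[1])) {
      unknown.push(match[1]);
    }
  }
  return unknown;
}

console.log(unknownEmotionTags("<cheerful>\"That's remarkable!\"</cheerful> Mom said."));
console.log(unknownEmotionTags("<cheerfull>\"That's remarkable!\"</cheerfull> Mom said."));
```

Note that this check only knows the emotion tags, so pace tags such as <fast> would also be reported unless they are added to the set.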

Using SSML

Finally, you can use SSML tags to express more complex nuances.

Example:

JSON
PHP
NodeJS

{
    "resolution": "full-hd",
    "quality": "high",
    "scenes": [
        {
            "comment": "Scene #1",
            "elements": [
                {
                    "type": "voice",
                    "voice": "en-US-AriaNeural",
                    "text": "<mstts:express-as style=\"cheerful\">\"That's remarkable! You're a genius!\"</mstts:express-as><break time=\"600ms\" />Mom said to her son.",
                    "start": 1.5
                }
            ]
        }
    ]
}

require 'vendor/autoload.php';

use JSON2Video\Movie;
use JSON2Video\Scene;

// Create and initialize the movie object
$movie = new Movie;
$movie->setAPIKey(YOUR_API_KEY);
$movie->resolution = 'full-hd';
$movie->quality = 'high';

// Create the scenes of the movie

// Create SCENE 1
$scene1 = new Scene;
$scene1->comment = 'Scene #1';
$scene1->addElement([
	'type' => 'voice',
	'voice' => 'en-US-AriaNeural',
	'text' => '<mstts:express-as style="cheerful">"That\'s remarkable! You\'re a genius!"</mstts:express-as><break time="600ms" />Mom said to her son.',
	'start' => 1.5
]);
$movie->addScene($scene1);

// Finally, render the movie
$movie->render();

// Wait for the movie to be rendered
$movie->waitToFinish();

let movie = new Movie;
movie.setAPIKey(YOUR_API_KEY);
movie.set("resolution", "full-hd");
movie.set("quality", "high");

// Create the scenes of the movie

// Create SCENE 1
let scene1 = new Scene;
scene1.set("comment", "Scene #1");
scene1.addElement({
	"type": "voice",
	"voice": "en-US-AriaNeural",
	"text": "&lt;mstts:express-as style=\"cheerful\"&gt;\"That's remarkable! You're a genius!\"&lt;/mstts:express-as&gt;&lt;break time=\"600ms\" /&gt;Mom said to her son.",
	"start": 1.5
});
movie.addScene(scene1);

// Finally, render the movie
movie.render();

// Wait for the movie to be rendered
movie.waitToFinish();
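
SSML strings become hard to read once they contain attributes and escaped quotes, so it can help to build them from small functions. The sketch below reproduces the text of the example above. The helper names (`escapeXml`, `expressAs`, `pause`) are hypothetical, and escaping the spoken text as XML is an assumption about how the text should be prepared, not documented behaviour.

```javascript
// Escapes characters that are special in XML so spoken text cannot break the markup.
// Assumption: the text field is treated as XML when it contains SSML tags.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps text in an mstts:express-as element with the given style.
// The style is expected to be a trusted, plain identifier such as "cheerful".
function expressAs(style, text) {
  return `<mstts:express-as style="${style}">${escapeXml(text)}</mstts:express-as>`;
}

// Hypothetical helper: inserts a pause of the given length in milliseconds.
function pause(milliseconds) {
  return `<break time="${milliseconds}ms" />`;
}

const ssml =
  expressAs("cheerful", "\"That's remarkable! You're a genius!\"") +
  pause(600) +
  escapeXml("Mom said to her son.");
console.log(ssml);
```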

Balancing music and voice volume

When you want to add music and narration to a video, you typically need to adjust the volume so that the voice can be heard clearly. The best option is to keep the voice at its original volume and reduce the volume of the music.

See the audio elements documentation for details.
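
As a rough illustration of that advice, the sketch below adds a music track at reduced volume to a scene that already contains a voice element, leaving the voice untouched. Everything specific here is an assumption to be checked against the audio elements documentation: the `volume` field on the audio element, the value 0.2, the music URL, and the `withBackgroundMusic` helper are all hypothetical.

```javascript
// Hypothetical helper: returns a copy of the scene with a quieter music track added.
// Assumption: audio elements accept a "volume" field; 0.2 is an arbitrary example value.
function withBackgroundMusic(scene, musicSrc, musicVolume = 0.2) {
  return {
    ...scene,
    elements: [
      ...scene.elements,
      { type: "audio", src: musicSrc, volume: musicVolume },
    ],
  };
}

const narratedScene = {
  comment: "Scene #1",
  elements: [
    {
      type: "voice",
      text: "That's one small step for a man, one giant leap for mankind.",
      start: 1.5,
    },
  ],
};

// Placeholder URL; replace it with the address of a real music file.
const mixedScene = withBackgroundMusic(
  narratedScene,
  "https://example.com/background-music.mp3"
);
console.log(JSON.stringify(mixedScene, null, 2));
```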
