SSML stands for Speech Synthesis Markup Language. It is a way to annotate text so that a text-to-speech (TTS) system knows how to read it aloud.
SSML works much like HTML, but it controls speech rather than layout. It lets you adjust pronunciation, pauses, pitch and volume, emphasis, and speaking rate.
{
"type": "voice",
"voice": "or-IN-SukantNeural",
"text": "<speak>Hello, <break time=\"500ms\"/> how are you today? <emphasis level=\"strong\">This is important!</emphasis></speak>"
}
$scene->addElement([
'type' => 'voice',
'voice' => 'or-IN-SukantNeural',
'text' => '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
]);
scene.addElement({
"type": "voice",
"voice": "or-IN-SukantNeural",
"text": '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
});
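Writing SSML by hand inside a string makes quote escaping easy to get wrong. As a minimal sketch, the same markup could be assembled from small helper functions and serialized with JSON.stringify, which escapes the inner double quotes automatically. The helper names (escapeXml, pause, emphasize, prosody, speak) are hypothetical and not part of any SDK. The <prosody> tag is the standard SSML element for rate, pitch, and volume; whether a given voice honors every attribute is an assumption worth checking.

```javascript
// Hypothetical helpers: these names are illustrative, not part of any SDK.

// Escape characters that are special in XML so plain text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// <break time="..."/> inserts a pause of the given duration in milliseconds.
function pause(ms) {
  return `<break time="${ms}ms"/>`;
}

// <emphasis level="..."> stresses the enclosed words.
function emphasize(text, level = "strong") {
  return `<emphasis level="${level}">${escapeXml(text)}</emphasis>`;
}

// <prosody> adjusts speaking rate, pitch, and volume for the enclosed words.
function prosody(text, { rate = "medium", pitch = "medium", volume = "medium" } = {}) {
  return `<prosody rate="${rate}" pitch="${pitch}" volume="${volume}">${escapeXml(text)}</prosody>`;
}

// Wrap already-built fragments in the <speak> root element.
function speak(...parts) {
  return `<speak>${parts.join(" ")}</speak>`;
}

const ssml = speak(
  escapeXml("Hello,"),
  pause(500),
  escapeXml("how are you today?"),
  emphasize("This is important!"),
  prosody("Tom & Jerry start soon.", { rate: "slow", pitch: "high" })
);

// Same element shape as the examples above.
const element = { type: "voice", voice: "or-IN-SukantNeural", text: ssml };

// JSON.stringify escapes the inner double quotes, so the output is valid JSON.
console.log(JSON.stringify(element, null, 2));
```

The resulting object can be passed to scene.addElement(element) as in the example above, or its serialized form pasted into the JSON payload.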
Sukant is a neural voice
In Azure Cognitive Services, a neural voice is a voice generated with neural network technology. The text-to-speech system uses machine learning models to produce more natural, human-like speech than traditional synthesis methods.
Key characteristics of neural voices:
More expressive and realistic
Better at handling pitch, tone, and rhythm variations