Kai: Text to Speech

Article tags:

azure, en-US-KaiNeural, tts, text to speech, gallery

Kai voice

Kai is an English (en-US) voice available in the Azure Text-to-Speech service.

Voice Name: Kai

Voice ID: en-US-KaiNeural

Language: English

Gender: Male

Words Per Minute: unknown

How to use Kai voice in your videos

To use the Kai voice in your videos, add a voice element like the following to your JSON2Video code:

JSON

{
    "type": "voice",
    "model": "azure",
    "voice": "en-US-KaiNeural",
    "text": "In springtime, the garden comes alive with colorful flowers and singing birds. The old oak tree provides shade for visitors, while butterflies dance among the roses. A small fountain creates peaceful sounds, making this the perfect spot to relax and enjoy nature's beauty."
}

PHP

$scene->addElement([
    'type' => 'voice',
    'model' => 'azure',
    'voice' => 'en-US-KaiNeural',
    'text' => 'In springtime, the garden comes alive with colorful flowers and singing birds. The old oak tree provides shade for visitors, while butterflies dance among the roses. A small fountain creates peaceful sounds, making this the perfect spot to relax and enjoy nature\'s beauty.'
]);

NodeJS

scene.addElement({
    type: "voice",
    model: "azure",
    voice: "en-US-KaiNeural",
    text: "In springtime, the garden comes alive with colorful flowers and singing birds. The old oak tree provides shade for visitors, while butterflies dance among the roses. A small fountain creates peaceful sounds, making this the perfect spot to relax and enjoy nature's beauty."
});
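When several voiceovers share the same voice, the element above can be produced by a small helper. The following is a minimal sketch in plain JavaScript, not part of the JSON2Video SDK: `buildVoiceElement` is a hypothetical name, and the `scenes`/`elements` wrapper is an assumption about the movie layout that should be checked against the JSON2Video API documentation.

```javascript
// Hypothetical helper (not part of the JSON2Video SDK): builds a voice
// element with the same keys as the JSON example above.
function buildVoiceElement(voiceId, text, model = "azure") {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  return { type: "voice", model, voice: voiceId, text };
}

// Assumed movie layout: a single scene that holds the voice element.
const movie = {
  scenes: [
    {
      elements: [
        buildVoiceElement(
          "en-US-KaiNeural",
          "In springtime, the garden comes alive."
        ),
      ],
    },
  ],
};

// Serialize the movie to a JSON string.
const payload = JSON.stringify(movie, null, 2);
console.log(payload);
```

Validating the text before building the element catches empty voiceovers early, before any request is sent.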

Kai supports SSML

SSML stands for Speech Synthesis Markup Language. It's a way to add instructions to your text so that a Text-To-Speech (TTS) system knows how to read it aloud.

SSML works like HTML, but for speech: you wrap parts of the text in tags to control pronunciation, pauses, pitch and volume, emphasis, and speaking rate.

JSON

{
    "type": "voice",
    "voice": "en-US-KaiNeural",
    "text": "<speak>Hello, <break time=\"500ms\"/> how are you today? <emphasis level=\"strong\">This is important!</emphasis></speak>"
}

PHP

$scene->addElement([
    'type' => 'voice',
    'voice' => 'en-US-KaiNeural',
    'text' => '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
]);

NodeJS

scene.addElement({
    type: "voice",
    voice: "en-US-KaiNeural",
    text: '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
});
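SSML attributes contain double quotes, so writing them by hand inside a JSON string is error-prone. One way around this, sketched below in plain JavaScript, is to build the SSML as an ordinary string and let `JSON.stringify` handle the escaping. `escapeXml` and `buildSsml` are hypothetical helper names, not part of any SDK.

```javascript
// Escape characters with special meaning in XML so that plain text can be
// embedded safely inside SSML tags.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: text, a pause of pauseMs milliseconds, more text,
// then a strongly emphasized sentence, all wrapped in <speak>.
function buildSsml(before, pauseMs, after, emphasized) {
  return (
    "<speak>" +
    escapeXml(before) +
    ` <break time="${pauseMs}ms"/> ` +
    escapeXml(after) +
    ` <emphasis level="strong">${escapeXml(emphasized)}</emphasis>` +
    "</speak>"
  );
}

const ssml = buildSsml("Hello,", 500, "how are you today?", "This is important!");
const element = { type: "voice", voice: "en-US-KaiNeural", text: ssml };

// JSON.stringify escapes the inner double quotes as \" automatically.
console.log(JSON.stringify(element, null, 2));
```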

Kai supports different voice styles

Within SSML, you can use style tags to change the voice style.

Kai supports these styles: conversation

JSON

{
    "type": "voice",
    "voice": "en-US-KaiNeural",
    "text": "<conversation>I have a secret for you</conversation>"
}
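Since only some styles are available for each voice, it can help to check the style name before building the element. The sketch below, in plain JavaScript, assumes the style tag is named after the style itself, following the pattern of the example above; `KAI_STYLES` and `withStyle` are hypothetical names, and the list contains only the style this page reports for Kai.

```javascript
// Styles listed for Kai on this page.
const KAI_STYLES = ["conversation"];

// Hypothetical helper: wraps text in a tag named after the style and
// rejects any style that is not listed for Kai.
function withStyle(style, text) {
  if (!KAI_STYLES.includes(style)) {
    throw new Error(`Unsupported style for Kai: ${style}`);
  }
  return `<${style}>${text}</${style}>`;
}

const styledElement = {
  type: "voice",
  voice: "en-US-KaiNeural",
  text: withStyle("conversation", "I have a secret for you"),
};
console.log(JSON.stringify(styledElement, null, 2));
```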

Kai is a neural voice

In Azure Cognitive Services, a neural voice is a voice generated with neural network technology. The Text-to-Speech system uses machine learning models to produce more natural, human-like speech than traditional synthesis methods.

Key characteristics of Neural voices:

More expressive and realistic
Better at handling pitch, tone, and rhythm variations
Closer to how humans naturally speak

Use Azure voices in your videos with JSON2Video

JSON2Video lets you create videos programmatically with TTS voiceover, subtitles, images and effects. Add any Azure voice to your videos via the API, no-code tools or the CLI.
