Xiaotong: Text to Speech

Article tags:

azure, wuu-CN-XiaotongNeural, tts, text to speech, gallery

Xiaotong voice

The Xiaotong voice is available in the Azure Text-to-Speech service for Wu Chinese.

Voice Name: Xiaotong

Voice ID: wuu-CN-XiaotongNeural

Language: Wu Chinese

Gender: Female

Words Per Minute: 238
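
As a rough illustration of what the words-per-minute figure implies, the sketch below converts a word count into an approximate narration length. The function name estimateNarrationSeconds is hypothetical, and this page does not say how words are counted for Wu Chinese text, so the caller supplies the count. Treat the result as an estimate, not as the exact length of the rendered audio.

```javascript
// Speaking rate listed above for Xiaotong.
const XIAOTONG_WPM = 238;

// Hypothetical helper: approximate narration length in seconds.
// wordCount is supplied by the caller; the counting method is an assumption.
function estimateNarrationSeconds(wordCount, wordsPerMinute = XIAOTONG_WPM) {
    if (!Number.isFinite(wordCount) || wordCount < 0) {
        throw new RangeError("wordCount must be a non-negative number");
    }
    return (wordCount / wordsPerMinute) * 60;
}

// 119 words at 238 words per minute is half a minute of speech.
console.log(estimateNarrationSeconds(119));
```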

How to use Xiaotong voice in your videos

To use the Xiaotong voice in your videos, use the following JSON2Video code:

JSON:

{
    "type": "voice",
    "model": "azure",
    "voice": "wuu-CN-XiaotongNeural",
    "text": "在春天时节，花园里开满了五光十色的花朵，还有歌唱的小鸟。老橡树为游客提供了阴凉之处，蝴蝶在玫瑰花丛中翩翩起舞。一座小喷泉发出悠扬的声音，使这里成为放松和享受大自然之美的完美场所。"
}

PHP:

$scene->addElement([
    'type' => 'voice',
    'model' => 'azure',
    'voice' => 'wuu-CN-XiaotongNeural',
    'text' => '在春天时节，花园里开满了五光十色的花朵，还有歌唱的小鸟。老橡树为游客提供了阴凉之处，蝴蝶在玫瑰花丛中翩翩起舞。一座小喷泉发出悠扬的声音，使这里成为放松和享受大自然之美的完美场所。'
]);

NodeJS:

scene.addElement({
    type: "voice",
    model: "azure",
    voice: "wuu-CN-XiaotongNeural",
    text: "在春天时节，花园里开满了五光十色的花朵，还有歌唱的小鸟。老橡树为游客提供了阴凉之处，蝴蝶在玫瑰花丛中翩翩起舞。一座小喷泉发出悠扬的声音，使这里成为放松和享受大自然之美的完美场所。"
});
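
A voice element does not stand alone; it sits inside a scene of a movie. The sketch below shows one way this might look as a plain JavaScript object, assuming the movie document has a top-level scenes array whose entries each hold an elements array. That layout and the helper name buildVoiceMovie are assumptions made for illustration and should be checked against the JSON2Video API reference before use.

```javascript
// Hypothetical helper: wrap a voice element in a minimal movie document.
// Assumes a `scenes` -> `elements` layout, which this page does not confirm.
function buildVoiceMovie(text, voice = "wuu-CN-XiaotongNeural") {
    const voiceElement = {
        type: "voice",
        model: "azure",
        voice: voice,
        text: text
    };
    return { scenes: [{ elements: [voiceElement] }] };
}

const movie = buildVoiceMovie("在春天时节，花园里开满了五光十色的花朵。");

// Serialize to JSON, ready to be used as a request body.
console.log(JSON.stringify(movie, null, 4));
```

Keeping the voice ID as a default parameter makes it easy to swap in another Azure voice without touching the rest of the structure.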

Xiaotong supports SSML

SSML stands for Speech Synthesis Markup Language. It's a way to add instructions to your text so that a Text-To-Speech (TTS) system knows how to read it aloud.

SSML works like HTML, but for controlling speech. It lets you adjust pronunciation, pauses, pitch and volume, emphasis, and speaking rate.

JSON:

{
    "type": "voice",
    "voice": "wuu-CN-XiaotongNeural",
    "text": "<speak>Hello, <break time=\"500ms\"/> how are you today? <emphasis level=\"strong\">This is important!</emphasis></speak>"
}

PHP:

$scene->addElement([
    'type' => 'voice',
    'voice' => 'wuu-CN-XiaotongNeural',
    'text' => '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
]);

NodeJS:

scene.addElement({
    type: "voice",
    voice: "wuu-CN-XiaotongNeural",
    text: '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
});
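
Because SSML attributes contain quotation marks, embedding the markup in JSON or in a string literal by hand is easy to get wrong. The sketch below builds the SSML string in code and lets JSON.stringify handle the escaping. The helper names escapeXml and buildSsml are hypothetical, and only the speak, break and emphasis tags shown on this page are used.

```javascript
// Escape characters that have special meaning in XML text and attributes.
function escapeXml(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Hypothetical helper: plain text, a pause, then an emphasized phrase.
function buildSsml(intro, pauseMs, emphasized) {
    return "<speak>" + escapeXml(intro) +
        ' <break time="' + Number(pauseMs) + 'ms"/> ' +
        '<emphasis level="strong">' + escapeXml(emphasized) + "</emphasis></speak>";
}

const element = {
    type: "voice",
    voice: "wuu-CN-XiaotongNeural",
    text: buildSsml("Hello,", 500, "This is important!")
};

// JSON.stringify escapes the inner double quotes, so the output is valid JSON.
const json = JSON.stringify(element);
console.log(json);
```

Escaping the text parts also keeps characters such as & or < in the spoken text from breaking the markup.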

Xiaotong is a neural voice

In Azure Cognitive Services, a Neural voice refers to a voice generated using neural network technology. This means the Text-To-Speech system uses advanced machine learning models to create more natural, human-like speech compared to traditional methods.

Key characteristics of Neural voices:

More expressive and realistic
Better at handling pitch, tone, and rhythm variations
Closer to how humans naturally speak

Use Azure voices in your videos with JSON2Video

JSON2Video lets you create videos programmatically with TTS voiceover, subtitles, images and effects. Add any Azure voice to your videos via the API, no-code tools or the CLI.