Xiaoyi: Text to Speech

Article tags:

azure, zh-CN-XiaoyiNeural, tts, text to speech, gallery

Xiaoyi voice

The Xiaoyi voice is available in the Azure Text-to-Speech service for Chinese (zh-CN).

Voice Name: Xiaoyi

Voice ID: zh-CN-XiaoyiNeural

Language: Chinese

Gender: Female

Words Per Minute: 263
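
The speaking rate above can be turned into a rough duration estimate for a narration. The following is a minimal sketch, not part of JSON2Video: `estimateSeconds` is a hypothetical helper, and it takes the word count as an input because how words are counted in Chinese text (which is not space-delimited) is left to the caller.

```javascript
// Documented speaking rate for Xiaoyi, taken from the voice details above.
const XIAOYI_WORDS_PER_MINUTE = 263;

// Hypothetical helper: rough narration length in seconds for a given word count.
function estimateSeconds(wordCount, wordsPerMinute = XIAOYI_WORDS_PER_MINUTE) {
  if (!Number.isFinite(wordCount) || wordCount < 0) {
    throw new RangeError("wordCount must be a non-negative number");
  }
  // words / (words per minute) gives minutes; multiply by 60 for seconds.
  return (wordCount / wordsPerMinute) * 60;
}

// 263 words at 263 words per minute is one minute of speech.
console.log(estimateSeconds(263));
```

The result is only an approximation: pauses, SSML breaks, and voice styles can all change the actual length of the generated audio.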

How to use the Xiaoyi voice in your videos

To use the Xiaoyi voice in your videos, add a voice element with the following JSON2Video code:

JSON:

{
    "type": "voice",
    "model": "azure",
    "voice": "zh-CN-XiaoyiNeural",
    "text": "在春天，花园里充满了五颜六色的花朵和鸣禽。 古老的橡树为访客提供阴凉，而蝴蝶在玫瑰丛中跳舞。 一个小小的喷泉发出宁静的声音，使这里成为放松并享受大自然美丽的完美场所。"
}

PHP:

$scene->addElement([
    'type' => 'voice',
    'model' => 'azure',
    'voice' => 'zh-CN-XiaoyiNeural',
    'text' => '在春天，花园里充满了五颜六色的花朵和鸣禽。 古老的橡树为访客提供阴凉，而蝴蝶在玫瑰丛中跳舞。 一个小小的喷泉发出宁静的声音，使这里成为放松并享受大自然美丽的完美场所。'
]);

NodeJS:

scene.addElement({
    type: "voice",
    model: "azure",
    voice: "zh-CN-XiaoyiNeural",
    text: "在春天，花园里充满了五颜六色的花朵和鸣禽。 古老的橡树为访客提供阴凉，而蝴蝶在玫瑰丛中跳舞。 一个小小的喷泉发出宁静的声音，使这里成为放松并享受大自然美丽的完美场所。"
});

Xiaoyi supports SSML

SSML stands for Speech Synthesis Markup Language. It's a way to add instructions to your text so that a Text-To-Speech (TTS) system knows how to read it aloud.

SSML works like HTML, but it controls speech instead of page layout. It lets you adjust pronunciation, pauses, pitch and volume, emphasis, and speaking rate.

JSON:

{
    "type": "voice",
    "voice": "zh-CN-XiaoyiNeural",
    "text": "<speak>Hello, <break time=\"500ms\"/> how are you today? <emphasis level=\"strong\">This is important!</emphasis></speak>"
}

PHP:

$scene->addElement([
    'type' => 'voice',
    'voice' => 'zh-CN-XiaoyiNeural',
    'text' => '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
]);

NodeJS:

scene.addElement({
    type: "voice",
    voice: "zh-CN-XiaoyiNeural",
    text: '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
});
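
Because SSML attributes use double quotes, hand-written JSON is easy to break. As a minimal sketch of one way to avoid that, the NodeJS snippet below assembles the same SSML string and lets `JSON.stringify` do the escaping. The helpers `escapeXml` and `ssmlVoiceElement` are hypothetical names introduced for illustration and are not part of any JSON2Video SDK.

```javascript
// Escape characters that are special in XML so plain text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: builds a voice element whose text is an SSML string.
function ssmlVoiceElement(voice, ssml) {
  return { type: "voice", voice: voice, text: ssml };
}

// Plain text goes through escapeXml; the SSML tags are written literally.
const ssml =
  "<speak>" +
  escapeXml("Hello,") +
  ' <break time="500ms"/> ' +
  escapeXml("how are you today?") +
  ' <emphasis level="strong">' +
  escapeXml("This is important!") +
  "</emphasis></speak>";

const ssmlElement = ssmlVoiceElement("zh-CN-XiaoyiNeural", ssml);

// JSON.stringify escapes the inner double quotes as \" so the output is valid JSON.
const ssmlJson = JSON.stringify(ssmlElement, null, 4);
console.log(ssmlJson);
```

Escaping the plain-text parts matters when the narration itself contains characters such as `&` or `<`, which would otherwise produce malformed SSML.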

Xiaoyi supports different voice styles

As part of SSML, you can use style tags to change the speaking style of the voice.

Xiaoyi supports these styles: affectionate, angry, cheerful, disgruntled, embarrassed, fearful, gentle, sad, serious.

JSON:

{
    "type": "voice",
    "voice": "zh-CN-XiaoyiNeural",
    "text": "<gentle>I have a secret for you</gentle>"
}
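
Since only some styles are listed for Xiaoyi, it can help to check the style name before building the element. The NodeJS sketch below does that, assuming the tag form shown in the JSON example (the style name used as the tag). `XIAOYI_STYLES` and `withStyle` are hypothetical names introduced for illustration, not part of any JSON2Video SDK.

```javascript
// Styles listed for Xiaoyi in this article.
const XIAOYI_STYLES = [
  "affectionate", "angry", "cheerful", "disgruntled", "embarrassed",
  "fearful", "gentle", "sad", "serious",
];

// Hypothetical helper: wraps text in a style tag, rejecting unlisted styles.
function withStyle(style, text) {
  if (!XIAOYI_STYLES.includes(style)) {
    throw new Error(`Style "${style}" is not listed for Xiaoyi`);
  }
  return `<${style}>${text}</${style}>`;
}

const styledElement = {
  type: "voice",
  voice: "zh-CN-XiaoyiNeural",
  text: withStyle("cheerful", "I have a secret for you"),
};

console.log(JSON.stringify(styledElement, null, 4));
```

Failing early on an unlisted style is a design choice for this sketch; how the service itself handles an unsupported style tag is not covered here.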

Xiaoyi is a neural voice

In Azure Cognitive Services, a neural voice is a voice generated with neural network technology. The Text-to-Speech system uses machine learning models to produce more natural, human-like speech than traditional synthesis methods.

Key characteristics of Neural voices:

More expressive and realistic
Better at handling variations in pitch, tone, and rhythm
Closer to how humans naturally speak