Masaru: Text to Speech

Article tags:

azure, ja-JP-MasaruMultilingualNeural, tts, text to speech, gallery

Masaru voice

Masaru is a Japanese voice available in the Azure Text-to-Speech service.

Voice Name: Masaru

Voice ID: ja-JP-MasaruMultilingualNeural

Language: Japanese

Gender: Male

Words Per Minute: 190

How to use Masaru voice in your videos

To use Masaru voice in your videos, you can use the following JSON2Video code:

JSON:

{
    "type": "voice",
    "model": "azure",
    "voice": "ja-JP-MasaruMultilingualNeural",
    "text": "春になると、庭は色とりどりの花や鳴き鳥でいっぱいになります。古いオークの木は訪れる人々に日陰を提供し、バラの間で蝶々が踊ります。小さな噴水は静かな音を奏で、自然の美しさを楽しんでリラックスするのに最適なスポットとなります。"
}

PHP:

// Same keys as the JSON element above, passed as an associative array
$scene->addElement([
    'type' => 'voice',
    'model' => 'azure',
    'voice' => 'ja-JP-MasaruMultilingualNeural',
    'text' => '春になると、庭は色とりどりの花や鳴き鳥でいっぱいになります。古いオークの木は訪れる人々に日陰を提供し、バラの間で蝶々が踊ります。小さな噴水は静かな音を奏で、自然の美しさを楽しんでリラックスするのに最適なスポットとなります。'
]);

NodeJS:

// Same keys as the JSON element above, passed as an object
scene.addElement({
    type: "voice",
    model: "azure",
    voice: "ja-JP-MasaruMultilingualNeural",
    text: "春になると、庭は色とりどりの花や鳴き鳥でいっぱいになります。古いオークの木は訪れる人々に日陰を提供し、バラの間で蝶々が踊ります。小さな噴水は静かな音を奏で、自然の美しさを楽しんでリラックスするのに最適なスポットとなります。"
});

Masaru supports SSML

SSML stands for Speech Synthesis Markup Language. It's a way to add instructions to your text so that a Text-To-Speech (TTS) system knows how to read it aloud.

SSML works like HTML, but it controls speech instead of layout. It lets you adjust pronunciation, pauses, pitch and volume, emphasis, and speaking rate.

JSON:

{
    "type": "voice",
    "voice": "ja-JP-MasaruMultilingualNeural",
    "text": "<speak>Hello, <break time=\"500ms\"/> how are you today? <emphasis level=\"strong\">This is important!</emphasis></speak>"
}

PHP:

// Single-quoted PHP string, so the double quotes in the SSML need no escaping
$scene->addElement([
    'type' => 'voice',
    'voice' => 'ja-JP-MasaruMultilingualNeural',
    'text' => '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
]);

NodeJS:

// Single-quoted JavaScript string, so the double quotes in the SSML need no escaping
scene.addElement({
    type: "voice",
    voice: "ja-JP-MasaruMultilingualNeural",
    text: '<speak>Hello, <break time="500ms"/> how are you today? <emphasis level="strong">This is important!</emphasis></speak>'
});
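
SSML attributes use quotation marks, so writing SSML by hand inside a JSON or JavaScript string is error-prone. The following is a minimal sketch in plain Node.js of one way to avoid that. The helper names `escapeXml` and `buildSsmlVoiceElement` are hypothetical and not part of JSON2Video. They escape the spoken text for XML and leave the quote escaping to `JSON.stringify`.

```javascript
// Hypothetical helpers for illustration; not part of the JSON2Video SDK.

// Escape characters that have special meaning in XML so that
// arbitrary text cannot break the SSML markup.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a voice element whose text is an SSML document:
// a greeting, a pause of `pauseMs` milliseconds, then an emphasized phrase.
function buildSsmlVoiceElement(greeting, pauseMs, important) {
  const ssml =
    "<speak>" +
    escapeXml(greeting) +
    ` <break time="${pauseMs}ms"/> ` +
    `<emphasis level="strong">${escapeXml(important)}</emphasis>` +
    "</speak>";
  return {
    type: "voice",
    voice: "ja-JP-MasaruMultilingualNeural",
    text: ssml,
  };
}

const element = buildSsmlVoiceElement("Hello,", 500, "This is important!");

// JSON.stringify escapes the inner double quotes as \" automatically,
// so the printed JSON is valid.
console.log(JSON.stringify(element, null, 4));
```

Building the element as an object and serializing it last means the SSML never has to be escaped by hand.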

Masaru is a neural voice

In Azure Cognitive Services, a Neural voice refers to a voice generated using neural network technology. This means the Text-To-Speech system uses advanced machine learning models to create more natural, human-like speech compared to traditional methods.

Key characteristics of Neural voices:

More expressive and realistic
Better at handling pitch, tone, and rhythm variations
Sounds closer to how humans naturally speak

Masaru is a multilingual voice

A Multilingual voice in Azure Cognitive Services refers to a voice that can speak in multiple languages or accents while maintaining a consistent speaking style.

Key points about Multilingual voices:

The same voice can pronounce text in different languages accurately
Useful for applications that switch between languages or need to handle multilingual content
Maintains natural tone and consistency, even when switching languages
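
Azure SSML provides a `<lang xml:lang="...">` element for switching language within a multilingual neural voice. The sketch below shows how mixed-language text might be marked up for Masaru. It assumes that JSON2Video forwards the `<lang>` element to Azure unchanged, which this page does not confirm. The helper name `buildMultilingualSsml` is hypothetical.

```javascript
// Hypothetical helper for illustration; not part of the JSON2Video SDK.
// Assumption: the service forwards <lang> tags to Azure unchanged.

// segments: array of { lang, text } objects, where lang is a locale
// such as "ja-JP" or "en-US". Segment text is assumed to be XML-safe
// already (no raw <, > or &).
function buildMultilingualSsml(segments) {
  const body = segments
    .map(({ lang, text }) => `<lang xml:lang='${lang}'>${text}</lang>`)
    .join(" ");
  return `<speak>${body}</speak>`;
}

const multilingualElement = {
  type: "voice",
  voice: "ja-JP-MasaruMultilingualNeural",
  text: buildMultilingualSsml([
    { lang: "ja-JP", text: "こんにちは。" },
    { lang: "en-US", text: "Welcome to the garden tour." },
  ]),
};

console.log(JSON.stringify(multilingualElement, null, 4));
```

The attributes use single quotes, which XML allows, so the SSML can sit inside a double-quoted JSON string without escaping.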