Best Azure voices
The voices below are some of the best and most popular voices available in the Microsoft Text to Speech service:
English (United Kingdom)
English (United States)
Spanish (Spain)
Spanish (Mexico)
Portuguese (Brazil)
Korean (South Korea)
Explore all voices in all languages
What is Azure Text to Speech?
Microsoft's Azure Text to Speech is a service that converts text into natural-sounding speech. It's a powerful tool for creating voice-enabled applications, such as virtual assistants, chatbots, and more.
Azure Text to Speech is part of Azure Cognitive Services, a family of AI services that covers speech recognition, natural language processing, and more.
Azure Text to Speech is available in multiple languages and voices, each with its own unique characteristics and accents.
What languages are supported?
Azure Text to Speech supports 82 languages, including English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, and more.
Check all supported voices by language for a complete list of languages supported by Azure Text to Speech.
How do I use Azure Text to Speech?
The JSON2Video API integrates with Azure Text to Speech to generate voiceovers in a wide range of voices and languages.
To generate a voiceover, add a Voice element, set the model to azure, and set the voice to the ID of the voice you want to use.
For example, to use the Sara voice, you can use the following code:
{
    "type": "voice",
    "model": "azure",
    "voice": "en-US-SaraNeural",
    "text": "Hello, how are you?"
}
For a deeper dive into using Azure Text to Speech with JSON2Video, read the Voice elements tutorial.
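If you build your video JSON programmatically, the Voice element above can be generated from a small helper. The following is a minimal Python sketch, not part of any official JSON2Video SDK: the function name azure_voice_element and the sample script lines are hypothetical, and only the four fields shown in the example above (type, model, voice, text) are used.

```python
import json


def azure_voice_element(text, voice="en-US-SaraNeural"):
    """Build a Voice element dict that uses the Azure model.

    Hypothetical helper: it only mirrors the fields shown in the
    JSON example (type, model, voice, text).
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "type": "voice",
        "model": "azure",
        "voice": voice,
        "text": text,
    }


# One Voice element per line of a (made-up) script.
script = ["Hello, how are you?", "Welcome to the video."]
elements = [azure_voice_element(line) for line in script]

# Serialize the first element to JSON to inspect it.
print(json.dumps(elements[0], indent=4))
```

Where these elements go inside a full movie definition is covered in the Voice elements tutorial.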
How much does Azure Text to Speech cost?
Azure Text to Speech is a paid service from Microsoft Azure.
However, JSON2Video includes Microsoft voices for free in all plans. This means you can generate voiceovers for your videos with the voices you want, without an Azure account and at no additional cost.
Check the pricing page to learn more about the different JSON2Video plans and pricing for automated video generation.
How do Microsoft voices compare to ElevenLabs voices?
Microsoft Text-to-Speech voices and ElevenLabs voices are both great tools for generating voiceovers. However, they have some differences that you should consider when choosing the right voice for your project.
Pros
- Microsoft Text-to-Speech voices support a much wider range of languages and accents than ElevenLabs, which is quite limited in this respect. If you need a voice in a language that ElevenLabs does not support, you can use Microsoft voices.
- Microsoft voices support SSML, which lets you customize the voice output: tone, pitch, speed, pauses, emphasis, and so on. ElevenLabs voices do not support SSML.
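To illustrate the kind of control SSML gives, the Python sketch below wraps plain text in standard SSML tags that adjust speed and pitch and add a pause. The helper name build_ssml and its default values are hypothetical. This page does not say how JSON2Video accepts SSML, so treat the output as a generic SSML document and check the Voice elements tutorial before using it in a video.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-SaraNeural", rate="-10%", pitch="+2st",
               pause_ms=300):
    """Wrap plain text in SSML with prosody settings and a trailing pause.

    Hypothetical helper; the defaults are illustrative only.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # rate controls speed, pitch raises or lowers the voice
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, < and > so the XML stays valid
        "</prosody>"
        f'<break time="{int(pause_ms)}ms"/>'  # pause after the sentence
        "</voice>"
        "</speak>"
    )


ssml = build_ssml("Tom & Jerry are back.")
ET.fromstring(ssml)  # raises ParseError if the SSML is not well-formed XML
print(ssml)
```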
Cons
- Microsoft Text-to-Speech voices don't sound as natural as ElevenLabs voices. The best Microsoft voices sound like radio hosts, while the weaker ones may sound a bit robotic.
Conclusion
Microsoft Text-to-Speech voices are a great choice for generating voiceovers when language support or SSML customization is a priority.
The JSON2Video API integrates natively with Microsoft Text-to-Speech voices, so you can use them in your videos without any additional cost.