Speech-to-text (STT) solutions have improved dramatically in the last few years and now reach high levels of accuracy. However, some issues can still reduce the accuracy of transcriptions:
- The speaker does not speak clearly, or the audio quality is poor.
- The speech contains names or acronyms specific to an industry or field.
- The speech contains brand or product names that are not well known.
To improve transcription accuracy, you can use the `keywords` and `replace` properties in the subtitles element settings.
`keywords` is a list of words or phrases that you want recognized as single entities, given as one comma-separated string. For example, STT services usually transcribe a brand name like JSON2Video as "JSON to video". Adding "JSON2Video" to the `keywords` property tells the STT service to transcribe it as "JSON2Video".
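`replace` is a list of words or phrases that you want replaced with a different word or phrase, written as `original: replacement` pairs separated by commas. Unlike `keywords`, the replacement happens after the transcription is done, so it corrects the transcribed text rather than guiding the recognition.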
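To illustrate what a post-transcription replacement amounts to, here is a minimal Python sketch that parses a `replace` string in the `original: replacement, ...` format and applies it to an already-transcribed text. The service's actual matching rules (case sensitivity, word boundaries, order of application) are not described in this guide, so the sketch assumes whole-word, case-sensitive matching applied in the order the pairs are listed. `parse_replace` and `apply_replacements` are hypothetical helpers, not part of the API.

```python
import re


def parse_replace(spec: str) -> list[tuple[str, str]]:
    """Parse an "original: replacement, ..." string into (original, replacement) pairs.

    Assumption: commas separate pairs and the first colon splits each pair.
    """
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        source, _, target = item.partition(":")
        pairs.append((source.strip(), target.strip()))
    return pairs


def apply_replacements(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each pair to the transcribed text, in order.

    Assumption: whole-word, case-sensitive matching.
    """
    for source, target in pairs:
        pattern = r"\b" + re.escape(source) + r"\b"
        # A function replacement avoids backslash escapes in the target text.
        text = re.sub(pattern, lambda _match: target, text)
    return text


pairs = parse_replace("movie object: Movie Object, scene: Scene")
print(apply_replacements("add a scene to the movie object", pairs))
# → add a Scene to the Movie Object
```

Because the replacement runs on finished text, it can only fix words the STT service already produced; it cannot help the recognizer hear a term it missed, which is what `keywords` is for.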
Example:
```json
{
  "type": "subtitles",
  "settings": {
    "style": "classic",
    "keywords": "JSON2Video, voiceover",
    "replace": "movie object: Movie Object, scene: Scene"
  }
}
```
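If you generate the movie JSON from code, a small helper can keep the keyword list and the replacement pairs as native structures and serialize them into the string formats shown in the example. This is a minimal Python sketch; `build_subtitles_element` is a hypothetical helper, and it assumes, as the example suggests, that commas separate items and colons separate each replacement pair, so it rejects terms that would break that format.

```python
import json


def build_subtitles_element(
    keywords: list[str], replace: dict[str, str], style: str = "classic"
) -> dict:
    """Build a subtitles element shaped like the example (hypothetical helper)."""
    # Assumption: commas separate items, so no term may contain one.
    for term in [*keywords, *replace.keys(), *replace.values()]:
        if "," in term:
            raise ValueError(f"comma not allowed in term: {term!r}")
    # Assumption: the first colon splits a pair, so sources may not contain one.
    for source in replace:
        if ":" in source:
            raise ValueError(f"colon not allowed in replace source: {source!r}")
    return {
        "type": "subtitles",
        "settings": {
            "style": style,
            "keywords": ", ".join(keywords),
            "replace": ", ".join(f"{src}: {dst}" for src, dst in replace.items()),
        },
    }


element = build_subtitles_element(
    ["JSON2Video", "voiceover"],
    {"movie object": "Movie Object", "scene": "Scene"},
)
print(json.dumps(element, indent=2))
```

With these inputs the helper produces the same `settings` values as the example. Keeping the replacements in a dictionary also makes it harder to accidentally list the same source term twice.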
Manually correcting transcriptions
You can also correct transcriptions manually by downloading the transcription file and editing it.
Along with the final video URL, the API provides a link to the ASS (Advanced SubStation Alpha) file used for the subtitles. You can download this file and edit it with a text editor.
Once it is corrected, render the video again, passing the edited ASS file in the `captions` property.
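The editing step can also be scripted. The following Python sketch assumes the file uses the standard ASS `[Events]` layout, where each `Dialogue:` line has ten comma-separated fields and the subtitle text is the last one, so splitting on the first nine commas isolates the text while keeping any commas inside it. Corrections are plain substring replacements: if the file wraps individual words in `{...}` override tags for styling, a phrase split across tags will not match and has to be fixed by hand. `fix_ass_dialogue` and `captions_element` are hypothetical helpers; the sketch also assumes that `captions` is set directly on the subtitles element and that the edited file is hosted at a URL the API can reach (the URL used in the test data is a placeholder).

```python
def fix_ass_dialogue(ass_text: str, corrections: dict[str, str]) -> str:
    """Apply text corrections to the Text field of every Dialogue line."""
    out = []
    for line in ass_text.splitlines():
        if line.startswith("Dialogue:"):
            # Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
            fields = line.split(",", 9)
            if len(fields) == 10:
                for wrong, right in corrections.items():
                    fields[9] = fields[9].replace(wrong, right)
                line = ",".join(fields)
        out.append(line)  # headers, styles, and Format lines pass through unchanged
    return "\n".join(out) + "\n"


def captions_element(ass_url: str) -> dict:
    """Subtitles element that reuses an edited, publicly hosted ASS file (assumed shape)."""
    return {"type": "subtitles", "captions": ass_url}


sample = (
    "[Events]\n"
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\n"
    "Dialogue: 0,0:00:01.00,0:00:03.00,Default,,0,0,0,,Welcome to JSON to video, the API\n"
)
print(fix_ass_dialogue(sample, {"JSON to video": "JSON2Video"}))
```

Only the text field is touched, so timings and styles stay exactly as the API generated them.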