Speech-to-text (STT) solutions have improved dramatically in the last few years and now reach high levels of accuracy. However, some issues can still affect the accuracy of the transcriptions, for example with brand names.

To improve the accuracy of the transcriptions, you can use the keywords and replace properties in the settings of the subtitles element.

keywords is a comma-separated list of words or phrases that you want to be recognized as a single entity. For example, STT services usually transcribe a brand name like JSON2Video as "JSON to video". Adding "JSON2Video" to the keywords property tells the STT service to transcribe it as "JSON2Video".

replace is a comma-separated list of "original: replacement" pairs, each giving a word or phrase and the text that should replace it. Unlike keywords, the replacement happens after the transcription is done.

Example:

{
    "type": "subtitles",
    "settings": {
        "style": "classic",
        "keywords": "JSON2Video, voiceover",
        "replace": "movie object: Movie Object, scene: Scene"
    }
}
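
To make the post-transcription behavior of replace more concrete, here is a minimal Python sketch of how a string in the format shown above could be parsed and applied to transcribed text. The function names parse_replace and apply_replace are hypothetical, and the matching rules (whole words only, case-insensitive, longer phrases first) are assumptions made for illustration; the service's actual matching rules may differ.

```python
import re


def parse_replace(spec: str) -> dict[str, str]:
    """Parse 'original: replacement, original2: replacement2' into a dict."""
    pairs = {}
    for item in spec.split(","):
        if ":" not in item:
            continue  # skip malformed entries that have no 'original: replacement' shape
        source, target = item.split(":", 1)
        source, target = source.strip(), target.strip()
        if source:
            pairs[source] = target
    return pairs


def apply_replace(text: str, spec: str) -> str:
    """Apply every replacement pair to an already transcribed text."""
    pairs = parse_replace(spec)
    # Assumption: longer phrases are handled first so they win over shorter ones.
    for source in sorted(pairs, key=len, reverse=True):
        replacement = pairs[source]
        # Assumption: match whole words only and ignore case.
        pattern = re.compile(r"\b" + re.escape(source) + r"\b", re.IGNORECASE)
        # A function is used as the replacement so its text is inserted literally.
        text = pattern.sub(lambda _match: replacement, text)
    return text


transcript = "each scene belongs to a movie object"
print(apply_replace(transcript, "movie object: Movie Object, scene: Scene"))
```

Because the replacement runs on the finished transcript, it can only fix words that the STT service consistently gets wrong in the same way; keywords is the better fit when the service needs a hint before transcribing.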

Manually correcting transcriptions

You can also manually correct the transcriptions by downloading the transcription file and editing it.

Along with the final video URL, the API provides a link to the ASS file used for the subtitles. You can download the file and edit it with a text editor.
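
If the same correction has to be made in many places, the edit can also be scripted. The following Python sketch is one possible approach, not part of the API: the function name correct_ass_text is hypothetical, and it assumes the standard ASS event layout in which Text is the tenth and last field of each Dialogue line. If the generated file splits a phrase across several Dialogue lines or inserts override tags between words, a simple text replacement like this will not match and manual editing is still needed.

```python
def correct_ass_text(ass_content: str, corrections: dict[str, str]) -> str:
    """Apply text corrections to the Text field of every Dialogue line."""
    fixed_lines = []
    for line in ass_content.splitlines(keepends=True):
        if line.startswith("Dialogue:"):
            # Split on the first 9 commas only, so commas inside the
            # subtitle text itself are left untouched.
            fields = line.split(",", 9)
            if len(fields) == 10:
                for wrong, right in corrections.items():
                    fields[9] = fields[9].replace(wrong, right)
                line = ",".join(fields)
        # Headers, styles and Format lines are copied through unchanged.
        fixed_lines.append(line)
    return "".join(fixed_lines)


sample = (
    "[Events]\n"
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\n"
    "Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,Welcome to JSON to video, the API\n"
)
print(correct_ass_text(sample, {"JSON to video": "JSON2Video"}))
```

Restricting the replacement to the Text field keeps timings, style names and other metadata intact, so the corrected file stays aligned with the audio.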

Once the file is amended, render the video again, passing the URL of the edited ASS file in the captions property.

{
    "type": "subtitles",
    "captions": "https://example.com/subtitles.ass",
    "settings": {
        "style": "classic"
    }
}