Improving transcriptions

Speech-to-text (STT) solutions have improved dramatically in the last few years and now reach high levels of accuracy. However, some issues can still affect the accuracy of the transcriptions:

The speaker does not speak clearly, or the audio quality is poor.
The speech uses names or acronyms that are specific to an industry or field.
The speech includes brand names or product names that are not well known.

To improve the accuracy of the transcriptions, you can use the keywords and replace properties in the settings of the subtitles element.

keywords is a list of words or phrases that you want to be recognized as a single entity. For example, STT services usually transcribe a brand name like JSON2Video as "JSON to video". Adding "JSON2Video" to the keywords property tells the STT service to transcribe it as "JSON2Video".

replace is a list of words or phrases that you want to be replaced with a different word or phrase. Unlike keywords, the replacement happens after the transcription is done.

Example:


{
    "type": "subtitles",
    "settings": {
        "style": "classic",
        "keywords": "JSON2Video, voiceover",
        "replace": "movie object: Movie Object, scene: Scene"
    }
}
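To make the post-transcription step more concrete, here is a minimal Python sketch of how a replace string like the one above could be applied to already transcribed text. It assumes the format seen in the example (comma-separated "original: replacement" pairs) and assumes whole-word, case-insensitive matching; the service's actual matching rules are not described here, and the function names are hypothetical.

```python
import re


def parse_replace_rules(spec: str) -> dict[str, str]:
    """Parse 'from: to, from2: to2' into a {from: to} mapping (assumed format)."""
    rules = {}
    for pair in spec.split(","):
        if ":" not in pair:
            continue  # skip entries that do not look like 'from: to'
        source, target = pair.split(":", 1)
        source, target = source.strip(), target.strip()
        if source:
            rules[source] = target
    return rules


def apply_replacements(text: str, rules: dict[str, str]) -> str:
    """Apply the rules to transcribed text, longest phrase first.

    Whole-word, case-insensitive matching is an assumption of this sketch,
    not documented behavior of the service.
    """
    for source in sorted(rules, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(source) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in the target literal.
        text = pattern.sub(lambda match, target=rules[source]: target, text)
    return text


rules = parse_replace_rules("movie object: Movie Object, scene: Scene")
print(apply_replacements("add a scene to the movie object", rules))
```

A sketch like this can be handy for previewing what a set of rules would do to a transcript before rendering the video.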

Manually correcting transcriptions

You can also correct the transcriptions manually by downloading the transcription file and editing it.

Along with the final video URL, the API provides a link to the ASS file used for the subtitles. You can download the file and edit it with a text editor.

Once it is amended, render the video again, passing the edited ASS file in the captions property.


{
    "type": "subtitles",
    "captions": "https://example.com/subtitles.ass",
    "settings": {
        "style": "classic"
    }
}
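If the same correction is needed in many places, the manual edit can be scripted. The following Python sketch assumes the downloaded file uses the standard ASS event layout, where each Dialogue line has ten comma-separated fields and the last one is the subtitle text. The function name and the corrections mapping are hypothetical, and the file produced by the API may split a phrase across several events or include styling tags, in which case a plain text replacement will not match.

```python
def correct_ass_dialogue(ass_text: str, corrections: dict[str, str]) -> str:
    """Apply text corrections to the text field of each Dialogue line."""
    fixed_lines = []
    for line in ass_text.splitlines():
        if line.startswith("Dialogue:"):
            # Standard layout: Layer, Start, End, Style, Name, MarginL,
            # MarginR, MarginV, Effect, Text. Splitting at most 9 times
            # keeps commas inside the text field intact.
            fields = line.split(",", 9)
            if len(fields) == 10:
                for wrong, right in corrections.items():
                    fields[9] = fields[9].replace(wrong, right)
                line = ",".join(fields)
        # Headers, styles, and timings are left untouched.
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


sample = (
    "[Events]\n"
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\n"
    "Dialogue: 0,0:00:01.00,0:00:03.00,Default,,0,0,0,,Welcome to JSON to video, friends"
)
print(correct_ass_dialogue(sample, {"JSON to video": "JSON2Video"}))
```

The corrected content then needs to be saved and hosted at a URL that can be passed in the captions property, as in the example above.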
