Add Subtitles with Speech-to-Text
With Cinamaker, you can add Subtitles to your multi-camera video recordings and live streams with the help of AI and Speech-to-Text. Customize Subtitles while you are live or in the in-app editor: change position, fonts, text colors, background, speed, and more.
How it works
This document describes how to create and include Subtitles using Speech-to-Text from the Cinamaker Media panel.
- Add Subtitles as a Media object in your production
- Configure Subtitles like any Cinamaker Media object
- Begin a session with one or more audio sources selected for Speech-to-Text transcription
- Show or hide the Speech-to-Text transcription as Subtitles during your session by toggling the eyeball icon adjacent to the Subtitles media object
Add Subtitles from the Media Panel
1. Launch Cinamaker Director Studio and start a New Session
2. From the Main interface go to the Media Tab and click/tap Add (you will see a list of available Media options)
3. Select Subtitles from the list of Media options
4. The Subtitles configuration popup will appear
See below for how to configure: Sources, Language, Font, Fill, Advanced
5. When the configuration is finished, click Save (upper-right corner of the popup window)
6. When you are ready to activate Subtitles through Speech-to-Text, tap the adjacent eyeball icon to enable them (tap it again to disable). Cinamaker will begin to transcribe the audio coming from the selected audio sources, and the Subtitles will begin to appear on the Program screen.
7. The Program screen will begin to populate Subtitles as your session proceeds and the dialogue flows.
Speech-to-Text Configurations
Cinamaker Speech-to-Text can be configured in a number of ways. Read the following to determine the best way to create Subtitles for your unique needs.
NOTE: The quality of the audio input is a key factor for successful Speech-to-Text recognition. For best results, monitor the volume levels and each person's proximity to their microphone. Headphones are the best way to monitor audio.
Audio Source Configuration:
Mixed audio: All Audio Sources
By default, speech recognition and transcription are applied to each audio source, generating Subtitles in real time.
Individual audio: Selected Audio Sources
Alternatively, individual audio sources can be selected. Transcription is then applied only to those audio sources, generating Subtitles in real time.
- When Mixed audio is selected, speech recognition will be applied to all audio/video sources, including pre-recorded video objects.
- When the volume level of an audio/video source is 0 (the source is muted), the source will not be heard in the live audio mix, but it will still be transcribed.
- You cannot select Mixed audio while individual Sources are selected.
- A list of available audio/video sources will be displayed under Sources.
- When one or more audio sources are selected, speech recognition, transcription, and Subtitles will be applied only to the selected audio/video sources.
- When one or more Sources are selected, transcription will not be applied to pre-recorded video objects.
- To transcribe a pre-recorded video that is playing as a Media object, select Mixed audio.
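The source selection rules above can be summarized as a small sketch. This is only an illustration of the documented behavior, not Cinamaker's implementation: the `Source` class, its fields, and the `transcribed_sources` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for an audio/video source in a session."""
    name: str
    volume: float               # 0.0 means muted in the live audio mix
    prerecorded: bool = False   # True for a pre-recorded video Media object


def transcribed_sources(sources, selected=None):
    """Return the names of the sources that would be transcribed.

    selected=None models Mixed audio (all sources).
    Otherwise, selected is the set of individually chosen source names.
    """
    if selected is None:
        # Mixed audio: every source, including pre-recorded video objects.
        return [s.name for s in sources]
    # Individual selection: only the chosen sources, and never
    # pre-recorded video objects. Volume is deliberately not checked,
    # because a muted source is still transcribed.
    return [s.name for s in sources
            if s.name in selected and not s.prerecorded]


sources = [
    Source("Cam 1", volume=1.0),
    Source("Cam 2", volume=0.0),                       # muted
    Source("Intro clip", volume=0.8, prerecorded=True),
]
print(transcribed_sources(sources))                    # Mixed audio
print(transcribed_sources(sources, {"Cam 2", "Intro clip"}))
```

In this sketch, the muted "Cam 2" is still transcribed when selected, while "Intro clip" is transcribed only in the Mixed audio case.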
Display title
When this option is selected, the Source name is displayed before the transcribed text on the Main output screen after saving. It is not visible on the Preview screen.
Languages
Default language: English (United States).
Voice recognition supported languages: Danish, Dutch, English, English (Australia), English (United Kingdom), English (India), English (New Zealand), English (United States), Flemish, French, French (Canada), Italian, German, Norwegian, Polish, Portuguese, Portuguese (Brazil), Portuguese (Portugal), Spanish, Spanish (Latin America), Ukrainian.
There is no language auto-detection.
NOTE: It is possible to assign different languages to different subtitles.
Fonts & Fills
Font
Default font size: 10
Fill
Same as Text Media objects.
Advanced
Advanced configuration changes will be visible after saving them and navigating to the Main screen.
Subtitles direction
Default value: Up. By default, transcribed blocks of text are displayed in the Up direction.
Display time per average word
Display time for a single transcribed word can be configured.
Values: 0.01s - 0.09s
Minimum Display time
The minimum display time for a phrase can be configured between 2.0s and 10.0s.
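To get a feel for how the two timing settings might interact, here is a minimal sketch. It assumes that a phrase stays on screen for its word count multiplied by the per-word display time, but never less than the minimum display time. That formula is an assumption made for illustration and is not confirmed by this document; the function name `phrase_display_time` and its default values are hypothetical. Only the allowed ranges come from the settings described above.

```python
def phrase_display_time(phrase, seconds_per_word=0.05, minimum_seconds=2.0):
    """Estimate how long a phrase stays on screen, in seconds.

    Assumption (not confirmed by Cinamaker): display time is
    word count * per-word time, floored at the minimum display time.
    The default argument values are placeholders, not product defaults.
    """
    # Documented range for "Display time per average word".
    if not 0.01 <= seconds_per_word <= 0.09:
        raise ValueError("seconds_per_word must be between 0.01 and 0.09")
    # Documented range for "Minimum Display time".
    if not 2.0 <= minimum_seconds <= 10.0:
        raise ValueError("minimum_seconds must be between 2.0 and 10.0")

    word_count = len(phrase.split())
    return max(minimum_seconds, word_count * seconds_per_word)


print(phrase_display_time("Welcome to the show", seconds_per_word=0.09))
```

Under this assumption, short phrases are governed entirely by the minimum display time, so the per-word setting only matters for long blocks of text.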