Real-Time Container Release Notes

10.7.0 [2024-08-22]

New

GPU & CPU
- Audio Filtering: pre-process audio to remove low-volume background speech which might otherwise be detected and transcribed. Refer to documentation here to get started
- Disfluency removal: automatically remove disfluencies from your transcript. Refer to documentation here to get started

Removed

The legacy Speaker Change Detection feature is now obsolete. Any sessions using the speaker_change and channel_and_speaker_change parameters will be rejected
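
Because sessions that still request the removed feature are now rejected, it may be worth checking a configuration before connecting. The following is a minimal sketch, not part of the product: it assumes the session configuration is held as a nested Python dict, and that the removed names could appear either as keys or as string values (for example as the value of a diarization setting). The function name find_removed_settings is hypothetical.

```python
REMOVED = {"speaker_change", "channel_and_speaker_change"}


def find_removed_settings(config, path=""):
    """Return the paths in a nested config where a removed name appears."""
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            here = f"{path}.{key}" if path else str(key)
            if key in REMOVED:
                hits.append(here)  # removed name used as a key
            hits.extend(find_removed_settings(value, here))
    elif isinstance(config, list):
        for i, item in enumerate(config):
            hits.extend(find_removed_settings(item, f"{path}[{i}]"))
    elif isinstance(config, str) and config in REMOVED:
        hits.append(path)  # removed name used as a value
    return hits
```

An empty list means neither name was found; any returned path points at a setting to remove before starting the session.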

Improvements

GPU
- Initial improvements from our Ursa2 accuracy uplift
  - Improved transcription accuracy and updated vocabulary for 31 languages (Enhanced Operating Point only): Bashkir (ba), Basque (eu), Belarusian (be), Bulgarian (bg), Cantonese (yue), Catalan (ca), Danish (da), Esperanto (eo), Estonian (et), Finnish (fi), French (fr), Galician (gl), Greek (el), Hindi (hi), Indonesian (id), Interlingua (ia), Japanese (ja), Korean (ko), Latvian (lv), Malay (ms), Marathi (mr), Mongolian (mn), Norwegian (no), Romanian (ro), Slovenian (sl), Spanish (es), Swedish (sv), Turkish (tr), Ukrainian (uk), Uyghur (ug), Vietnamese (vi)
  - Updated vocabulary for English (Enhanced Operating Point only)
- Improved music detection accuracy in Audio Events
GPU & CPU
- Lower-latency Finals: the minimum allowed value of the max_delay parameter has been reduced from 2 to 0.7, enabling lower-latency final transcripts. Refer to documentation here for more details
- Improved transcription accuracy around endpoints, especially for lower values of max_delay
- When a transcription Final does not contain words which appeared in previous Partials, an AddPartialTranscript message containing the missing words is now sent immediately after the Final
- Start and end times in AddTranscript and AddPartialTranscript messages are now always rounded to 2 decimal places
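
The max_delay and Partial/Final changes above can be sketched from the client side as follows. This is an illustration under stated assumptions rather than a reference client: the StartRecognition message name, the transcription_config wrapper and the metadata.transcript field are assumed from the real-time API and are not described in these notes, audio settings are omitted, and build_start_recognition and TranscriptView are hypothetical helper names. The second helper shows one way to display text so that a Partial sent straight after a Final (carrying words the Final left out) is appended rather than lost.

```python
MIN_MAX_DELAY = 0.7  # lower bound stated in the 10.7.0 notes


def build_start_recognition(language, max_delay):
    """Build a minimal session-start message (audio settings omitted)."""
    if max_delay < MIN_MAX_DELAY:
        raise ValueError(f"max_delay must be at least {MIN_MAX_DELAY}, got {max_delay}")
    return {
        "message": "StartRecognition",  # assumed message name
        "transcription_config": {"language": language, "max_delay": max_delay},
    }


class TranscriptView:
    """Keeps confirmed Final text plus the latest Partial text."""

    def __init__(self):
        self.final_text = []
        self.partial_text = ""

    def handle(self, message):
        kind = message.get("message")
        # Assumed field: metadata.transcript holds the text of the message.
        text = message.get("metadata", {}).get("transcript", "")
        if kind == "AddTranscript":
            self.final_text.append(text)
            self.partial_text = ""  # the Final supersedes earlier Partials
        elif kind == "AddPartialTranscript":
            self.partial_text = text  # each Partial replaces the previous one

    def render(self):
        return "".join(self.final_text) + self.partial_text
```

Clearing the Partial on every Final is safe under the new behaviour, because any words dropped from the Final arrive again in the Partial that follows it.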

Fixes

Written form for negative percentages in German transcription is now output as "%" instead of "Prozent"

10.6.0 [2024-05-24]

New

GPU & CPU
- New transcription language - Hebrew (he)
- Automatic Usage Reporting is now enabled by default
GPU
- Audio Events: Detection of music, laughter and applause in media files now supported. Refer to documentation here to get started

Improvements

GPU & CPU
- Accuracy improvements for Romanian (ro)
- Improved accuracy when transcribing audio with periods of silence

Fixes

Fix for profanity tagging in bilingual Spanish & English
Fixes for specific transcription accuracy issues in English, German, Swedish and Norwegian
Fix for unknown speaker labels on pure punctuation finals with speaker diarization
Fix for issue affecting recognition of English words ending in 'erm'
Security fixes

10.5.1 [2024-03-01]

Fixes

Fix for Automatic Usage Reporting for Containers
Security fixes

10.5.0 [2023-12-15]

New

New transcription language - Persian (fa)
New bilingual Spanish and English container - this enables Spanish and English to be transcribed accurately within the same audio stream. To pull the new container see here. Only available for GPU
New GPU Ursa models - all 49 languages are now available on GPU
- Major transcription accuracy gains
- Major improvement in Speaker Diarization accuracy
- Faster transcription
- Arabic (ar), Bashkir (ba), Basque (eu), Belarusian (be), Bulgarian (bg), Cantonese (yue), Catalan (ca), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), Esperanto (eo), Estonian (et), Finnish (fi), Galician (gl), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Interlingua (ia), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Malay (ms), Mandarin (cmn), Marathi (mr), Mongolian (mn), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Tamil (ta), Thai (th), Turkish (tr), Ukrainian (uk), Uyghur (ug), Vietnamese (vi), Welsh (cy) with associated GPU Inference Container

Improvements

GPU & CPU
- Improved models for English transcription (Standard and Enhanced operating points):
  - Enhanced transcription of disfluencies in English. The model now more accurately captures common disfluencies like "um" and "uh". This change makes our ASR even more accurate for verbatim transcription, great for use cases such as audio editing, analytics on hesitations for call centers and legal transcription. For details on how to identify disfluencies in output, see the documentation here
  - More accurate transcription of short utterances of the word "I" in English
  - More accurate transcription of acronyms in English
- Appropriate punctuation is now provided for finals after a pause in speech, improving transcription for downstream workflows such as translation
CPU
- Significantly improved transcription accuracy for English
- Significantly improved transcription accuracy for Norwegian

Fixes

Fixed an error with custom dictionary when the content is only a "-"
Security fixes

Known Issues

When Speaker Diarization is enabled, occasionally punctuation can be labeled as Speaker:UU (unknown speaker)

10.4.0 [2023-08-24]

Improvements

GPU LM Inference request optimization changes for better performance

Known Issues

We are aware of two gRPC vulnerabilities in the Container. Analysis indicates that our product is not exposed, since these apply to gRPC servers and the transcriber Container only uses gRPC clients. It is currently not possible for us to upgrade the library due to a known functional issue with the new version; we will upgrade as soon as that issue is resolved. The two CVEs are:

CVE-2023-32731: Improper HTTP2 parsing when header size is exceeded
CVE-2023-1428: Denial of service from abort() triggered by specific headers

10.3.0 [2023-07-31]

New

Added support for Translation On-Prem in real-time, tightly integrated with Transcription. Translate your audio to one or more languages through a single API call

Translate speech to and from English for 34 languages
Translate from Norwegian Bokmål to Nynorsk
Requires the new Translation GPU Inference Container to be deployed, see here to understand how to deploy the Container
Documentation here shows how to get started using Translation once deployed

Improvements

Improved Speaker Diarization accuracy for noisy audio (English GPU only, Standard and Enhanced operating points)

Fixes

Security patches
Fix for transcribed words returned during non-speech audio when Custom Dictionary is used

Known Issues

CVE-2023-32731: Improper HTTP2 parsing when header size is exceeded
CVE-2023-1428: Denial of service from abort() triggered by specific headers

10.2.0 [2023-06-06]

New

GPU support for French, Spanish and German with associated GPU Inference Container
Major accuracy improvement to Speaker Diarization for GPU supported languages

Improvements

CPU - Improved transcription accuracy for Basque, Belarusian, Estonian, Mongolian, Thai, Vietnamese, and Welsh
GPU - Improvements to capitalization for English transcription

Fixes

GPU LM memory optimization changes for better performance
Security patches

10.1.0 [2023-05-10]

New

Combined Container for both CPU transcription in all languages and GPU transcription in English. Note that GPU transcription requires running both the CPU Container and the GPU Inference Container.
Automatic Usage Reporting for Containers, see our documentation for more details
The JSON-v2 output version is now 2.9

Improvements

Language vocabulary improvements for French (fr), Italian (it), Hindi (hi), and Korean (ko)
Remodelled German (de) language pack to utilize subwords, separating words into smaller segments to reduce Word Error Rate (WER)
Improved numeral formatting in English
- Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
- Alphanumerics now have upper-case letters
- Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
- A number of other improvements and fixes for better readability

Fixes

Fix for missing accented characters in Dutch transcription
Security fixes
Fix potential degradation in accuracy for multi-hour transcription sessions with low max_delay values

10.0.0 [2023-03-13]

Warning

This version should only be used alongside the GPU Inference Container. For more information and implementation detail, see here.

GPU based inference, only for English
Major accuracy gains for transcription (Standard and Enhanced operating points)
Major efficiency and speed gains, particularly for the Standard Operating Point
Improved Speaker Diarization accuracy for English (Standard and Enhanced operating points)
Improved numeral formatting in English
- Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
- Alphanumerics now have upper-case letters
- Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
- A number of other improvements and fixes for better readability
Resolved an issue where words would occasionally be fully upper-cased
Fix potential degradation in accuracy for multi-hour transcription sessions with low max_delay values

2.2.0 [2022-10-17]

New

14 new languages: Bashkir, Basque, Belarusian, Esperanto, Estonian, Galician, Interlingua, Marathi, Mongolian, Tamil, Thai, Uyghur, Vietnamese, and Welsh
The JSON-v2 output version is now 2.8, specific changes are:
- Additional language pack information has been added to the RecognitionStarted websocket message. There is now more detailed information about properties of the language being used, such as writing direction and word delimiter.
- We now also record the correct attachment direction for punctuation (e.g. before or after a space) in a new attaches_to field.
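
As an illustration of how the word delimiter and attaches_to information might be used, the sketch below rebuilds display text from a list of results. It is a sketch under assumptions, not the documented schema: the shape of each result (an alternatives list whose first entry has a content field) and the attaches_to values "previous", "next", "both" and "none" are assumed, and join_results is a hypothetical helper name.

```python
def join_results(results, word_delimiter=" "):
    """Join result tokens, honouring each token's attachment direction."""
    pieces = []
    glue_next = False  # True when the previous token attaches to what follows
    for item in results:
        content = item["alternatives"][0]["content"]  # assumed result shape
        attaches = item.get("attaches_to", "none")
        attach_prev = attaches in ("previous", "both")
        # Insert the delimiter only when neither neighbour asks to be attached.
        if pieces and not glue_next and not attach_prev:
            pieces.append(word_delimiter)
        pieces.append(content)
        glue_next = attaches in ("next", "both")
    return "".join(pieces)
```

Passing the language pack's word delimiter (for example an empty string for a language written without spaces) keeps the joining logic the same across languages.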

Improvements

Improved accuracy for 20 languages: Latvian (lv), Swedish (sv), Hungarian (hu), Portuguese (pt), Polish (pl), Mandarin Chinese (cmn), Arabic (ar), Dutch (nl), Slovak (sk), Bulgarian (bg), Romanian (ro), Slovenian (sl), Lithuanian (lt), Croatian (hr), Malay (ms), Catalan (ca), Czech (cs), Danish (da), Greek (el), Turkish (tr)
Improved formatting of numeric entities such as dates, currencies and large numbers for Swedish (sv), Norwegian (no), and Dutch (nl).

Fixes

Fix for accurately handling "p" as "pence" when transcribing currency in English (en).
Fix for handling small denominator fractions in Italian (it) and not converting to similar English homonyms e.g. "un terzo" being converted to "1/3".

Known Limitations

Issue ID	Summary	Detailed Description and Possible Workarounds
REQ-1409	Proteus HCL with `<unk>` causes out of memory error	A custom dictionary list that contains the word `<unk>` causes the worker to crash.
REQ-10160	Advanced punctuation for Spanish (es) does not contain inverted marks.	Inverted marks [ ¿ ¡ ] are not currently available for Spanish advanced punctuation.
REQ-10627	Double full stops when acronym is at the end of the sentence	If there is an acronym at the end of the sentence, then a double full stop will be output, for example: "team G.B.."
REQ-10634	Putting "-" as an item in the `additional_vocab` configuration will cause the container to fail	Do not enter a "-" on its own in Custom Dictionary, either as an additional vocab item or in the `sounds_like` property. Hyphens are still supported when entered as part of phrases or words
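
For deployments affected by REQ-1409 and REQ-10634, a Custom Dictionary list can be screened before it is sent. The following is a minimal sketch under assumptions: it treats each entry as either a plain string or a dict with a content field and an optional sounds_like list, and the helper name invalid_vocab_entries is hypothetical.

```python
BLOCKED = {"-", "<unk>"}  # standalone entries named in the limitations above


def invalid_vocab_entries(additional_vocab):
    """Return the indexes of entries that contain a blocked standalone item."""
    bad = []
    for index, entry in enumerate(additional_vocab):
        if isinstance(entry, str):
            content, sounds_like = entry, []
        else:
            content = entry.get("content", "")
            sounds_like = entry.get("sounds_like", [])
        candidates = [content, *sounds_like]
        # Hyphens inside words or phrases are fine; only a lone "-" is blocked.
        if any(text.strip() in BLOCKED for text in candidates):
            bad.append(index)
    return bad
```

Entries such as "speech-to-text" pass the check, since only a hyphen on its own is a problem.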

Supported Platforms

Docker (17.06.0+) running on Ubuntu, Debian, Fedora or CentOS
