Real-Time Container Release Notes

10.5.1

Fixed

  • Fix for Automatic Usage Reporting for Containers
  • Security fixes

10.5.0

New

  • New transcription language - Persian (fa)
  • New bilingual Spanish and English container - this allows Spanish and English to be transcribed accurately within the same audio stream. To pull the new container, see here. Available on GPU only
  • New GPU Ursa models - all 49 languages are now available on GPU
    • Major transcription accuracy gains
    • Major improvement in Speaker Diarization accuracy
    • Faster transcription
    • Languages newly available on GPU, each with an associated GPU Inference Container: Arabic (ar), Bashkir (ba), Basque (eu), Belarusian (be), Bulgarian (bg), Cantonese (yue), Catalan (ca), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), Esperanto (eo), Estonian (et), Finnish (fi), Galician (gl), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Interlingua (ia), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Malay (ms), Mandarin (cmn), Marathi (mr), Mongolian (mn), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovak (sk), Slovenian (sl), Swedish (sv), Tamil (ta), Thai (th), Turkish (tr), Ukrainian (uk), Uyghur (ug), Vietnamese (vi), Welsh (cy)

Improved

  • GPU & CPU
    • Improved models for English transcription (Standard and Enhanced operating points):
      • Enhanced transcription of disfluencies in English. The model now more accurately captures common disfluencies like "um" and "uh". This change makes our ASR even more accurate for verbatim transcription, great for use cases such as audio editing, analytics on hesitations for call centers and legal transcription. For details on how to identify disfluencies in output, see the documentation here
      • More accurate transcription of short utterances of the word "I" in English
      • More accurate transcription of acronyms in English
    • Appropriate punctuation is now provided for finals after a pause in speech, improving transcription for downstream workflows such as translation
  • CPU
    • Significantly improved transcription accuracy for English
    • Significantly improved transcription accuracy for Norwegian
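
As a rough illustration of the disfluency improvement above, the sketch below picks out words marked as disfluencies from a list of transcript results. It assumes each word result carries a `tags` list containing `"disfluency"` inside its first alternative; that shape and the sample data are assumptions made for illustration, so check the linked documentation for the actual output format.

```python
def find_disfluencies(results):
    """Return the content of every word result tagged as a disfluency.

    Assumes (hypothetically) that each result looks like
    {"type": "word", "alternatives": [{"content": "um", "tags": ["disfluency"]}]}.
    """
    found = []
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and any other result types
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        best = alternatives[0]
        if "disfluency" in best.get("tags", []):
            found.append(best["content"])
    return found


# Made-up sample data in the assumed shape.
sample_results = [
    {"type": "word", "alternatives": [{"content": "um", "tags": ["disfluency"]}]},
    {"type": "word", "alternatives": [{"content": "hello", "tags": []}]},
    {"type": "punctuation", "alternatives": [{"content": "."}]},
]

print(find_disfluencies(sample_results))
```

A filter like this could feed hesitation analytics, or be inverted to strip disfluencies when a clean (non-verbatim) transcript is preferred.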

Fixed

  • Fixed an error with custom dictionary when the content is only a "-"
  • Security fixes

Known Issues

  • When Speaker Diarization is enabled, occasionally punctuation can be labeled as Speaker:UU (unknown speaker)
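
One possible client-side mitigation for the issue above is to give punctuation marked `UU` the speaker label of the word that precedes it. The sketch below assumes each result has a `type` of `"word"` or `"punctuation"` and a `speaker` field inside its first alternative; both the shape and the sample data are assumptions for illustration, not a documented workaround.

```python
import copy

UNKNOWN_SPEAKER = "UU"


def relabel_unknown_punctuation(results):
    """Return a copy of results where UU punctuation inherits the previous word's speaker."""
    fixed = copy.deepcopy(results)  # leave the caller's data untouched
    last_speaker = None
    for item in fixed:
        alt = item["alternatives"][0]
        if item["type"] == "word":
            last_speaker = alt.get("speaker")
        elif (
            item["type"] == "punctuation"
            and alt.get("speaker") == UNKNOWN_SPEAKER
            and last_speaker is not None
        ):
            alt["speaker"] = last_speaker
    return fixed


# Made-up sample data in the assumed shape.
sample = [
    {"type": "word", "alternatives": [{"content": "Hello", "speaker": "S1"}]},
    {"type": "punctuation", "alternatives": [{"content": ".", "speaker": "UU"}]},
]

for item in relabel_unknown_punctuation(sample):
    alt = item["alternatives"][0]
    print(alt["content"], alt["speaker"])
```

Punctuation that follows a word whose own speaker is `UU` stays `UU`, since there is no better label to borrow.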

10.4.0

Improved

  • GPU LM Inference request optimization changes for better performance

Known Issues

We are aware of two gRPC vulnerabilities in the Container. Our analysis indicates that the product is not exposed, because these vulnerabilities apply to gRPC servers and the transcriber Container only uses gRPC clients. We cannot currently upgrade the library because of a known functional issue with the new version; we will upgrade as soon as that issue is resolved. The two CVEs are:

  • CVE-2023-32731: Improper HTTP2 parsing when header size is exceeded
  • CVE-2023-1428: Denial of service from abort() triggered by specific headers

10.3.0

New

  • Added support for Translation On-Prem in real-time, tightly integrated with Transcription. Translate your audio to one or more languages through a single API call
    • Translate speech to and from English for 34 languages
    • Translate from Norwegian Bokmål to Nynorsk
    • Requires the new Translation GPU Inference Container to be deployed, see here to understand how to deploy the Container
    • Documentation here shows how to get started using Translation once deployed
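
To make "a single API call" concrete, the sketch below builds one session-start message that requests transcription and translation together. The message name `StartRecognition`, the `translation_config` and `target_languages` fields, and the `audio_format` values are all assumptions made for illustration; the linked documentation is the authority on the real schema.

```python
import json


def build_start_message(language, target_languages):
    """Build a hypothetical session-start message that also requests translation."""
    if not target_languages:
        raise ValueError("at least one target language is required for translation")
    return {
        "message": "StartRecognition",  # assumed message name
        "audio_format": {  # assumed audio settings, adjust to your stream
            "type": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 16000,
        },
        "transcription_config": {"language": language},
        "translation_config": {  # assumed field names
            "target_languages": list(target_languages),
        },
    }


# Transcribe English audio and translate it into Spanish and German.
message = build_start_message("en", ["es", "de"])
print(json.dumps(message, indent=2))
```

The serialized message would be sent once at the start of the websocket session, so translation needs no second request.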

Improved

  • Improved Speaker Diarization accuracy for noisy audio (English GPU only, Standard and Enhanced operating points)

Fixed

  • Security patches
  • Fix for transcribed words returned during non-speech audio when Custom Dictionary is used

Known Issues

We are aware of two gRPC vulnerabilities in the Container. Our analysis indicates that the product is not exposed, because these vulnerabilities apply to gRPC servers and the transcriber Container only uses gRPC clients. We cannot currently upgrade the library because of a known functional issue with the new version; we will upgrade as soon as that issue is resolved. The two CVEs are:

  • CVE-2023-32731: Improper HTTP2 parsing when header size is exceeded
  • CVE-2023-1428: Denial of service from abort() triggered by specific headers

10.2.0

New

  • GPU support for French, Spanish and German with associated GPU Inference Container
  • Major accuracy improvement to Speaker Diarization for GPU supported languages

Improved

  • CPU - Improved transcription accuracy for Basque, Belarusian, Estonian, Mongolian, Thai, Vietnamese, and Welsh
  • GPU - Improvements to capitalization for English transcription

Fixed

  • GPU LM memory optimization changes for better performance
  • Security patches

10.1.0

New

  • Combined Container for both CPU transcription in all languages and GPU transcription in English. Note that GPU transcription requires running both the CPU Container and the GPU Inference Container.
  • Automatic Usage Reporting for Containers, see our documentation for more details
  • The JSON-v2 output version is now 2.9

Improved

  • Language vocabulary improvements for French (fr), Italian (it), Hindi (hi), and Korean (ko)
  • Remodelled German (de) language pack to utilize subwords, separating words into smaller segments to reduce Word Error Rate (WER)
  • Improved numeral formatting in English
    • Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
    • Alphanumerics now have upper-case letters
    • Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
    • A number of other improvements and fixes for better readability

Fixed

  • Fix for missing accented characters in Dutch transcription
  • Security fixes
  • Fix potential degradation in accuracy for multi-hour transcription sessions with low max_delay values

10.0.0

Warning

This version should only be used alongside the GPU Inference Container. For more information and implementation detail, see here.

  • GPU based inference, only for English
  • Major accuracy gains for transcription (Standard and Enhanced operating points)
  • Major efficiency and speed gains, particularly for the Standard operating point
  • Improved Speaker Diarization accuracy for English (Standard and Enhanced operating points)
  • Improved numeral formatting in English
    • Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
    • Alphanumerics now have upper-case letters
    • Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
    • A number of other improvements and fixes for better readability
  • Resolved an issue where words would occasionally be fully upper-cased
  • Fix potential degradation in accuracy for multi-hour transcription sessions with low max_delay values

2.2.0

New

  • 14 new languages: Bashkir, Basque, Belarusian, Esperanto, Estonian, Galician, Interlingua, Marathi, Mongolian, Tamil, Thai, Uyghur, Vietnamese, and Welsh
  • The JSON-v2 output version is now 2.8. The specific changes are:
    • Additional language pack information has been added to the RecognitionStarted websocket message. There is now more detailed information about properties of the language being used, such as writing direction and word delimiter.
    • We now also record the correct attachment direction for punctuation (e.g. before or after a space) in a new attaches_to field.
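
A sketch of how a client might use the two additions above to rebuild readable text: the word delimiter decides what goes between tokens, and `attaches_to` decides whether punctuation sticks to its neighbour. The sketch assumes `attaches_to` sits on each punctuation result and takes the values `"previous"`, `"next"`, `"both"` or `"none"`; the value set, the field placement and the sample data are assumptions for illustration.

```python
def join_results(results, word_delimiter=" "):
    """Join words and punctuation into text, honouring attachment direction."""
    text = ""
    glue_to_next = True  # nothing precedes the first token, so no delimiter
    for item in results:
        content = item["alternatives"][0]["content"]
        attaches_to = "none"
        if item["type"] == "punctuation":
            attaches_to = item.get("attaches_to", "none")
        attach_prev = attaches_to in ("previous", "both")
        attach_next = attaches_to in ("next", "both")
        # Insert the delimiter unless this token glues to the previous one
        # or the previous token asked to glue to this one.
        if not glue_to_next and not attach_prev:
            text += word_delimiter
        text += content
        glue_to_next = attach_next
    return text


# Made-up sample data in the assumed shape.
sample = [
    {"type": "word", "alternatives": [{"content": "Hello"}]},
    {"type": "punctuation", "attaches_to": "previous", "alternatives": [{"content": ","}]},
    {"type": "word", "alternatives": [{"content": "world"}]},
    {"type": "punctuation", "attaches_to": "previous", "alternatives": [{"content": "."}]},
]

print(join_results(sample))  # prints "Hello, world."
```

For a language whose reported word delimiter is empty, passing `word_delimiter=""` joins the tokens without spaces.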

Improved

  • Improved accuracy for 20 languages: Latvian (lv), Swedish (sv), Hungarian (hu), Portuguese (pt), Polish (pl), Mandarin Chinese (cmn), Arabic (ar), Dutch (nl), Slovak (sk), Bulgarian (bg), Romanian (ro), Slovenian (sl), Lithuanian (lt), Croatian (hr), Malay (ms), Catalan (ca), Czech (cs), Danish (da), Greek (el), Turkish (tr)
  • Improved formatting of numeric entities such as dates, currencies and large numbers for Swedish (sv), Norwegian (no), and Dutch (nl).

Fixed

  • Fix for accurately handling "p" as "pence" when transcribing currency in English (en).
  • Fix for handling small denominator fractions in Italian (it) and not converting to similar English homonyms e.g. "un terzo" being converted to "1/3".

Known Limitations

| Issue ID | Summary | Detailed Description and Possible Workarounds |
|---|---|---|
| REQ-1409 | Proteus HCL with <unk> causes out of memory error | A custom dictionary list that contains the word <unk> causes the worker to crash. |
| REQ-10160 | Advanced punctuation for Spanish (es) does not contain inverted marks. | Inverted marks [ ¿ ¡ ] are not currently available for Spanish advanced punctuation. |
| REQ-10627 | Double full stops when acronym is at the end of the sentence | If there is an acronym at the end of the sentence, then a double full stop will be output, for example: "team G.B.." |
| REQ-10634 | Putting "-" as an item in additional vocab configuration will cause the container to fail | Do not enter just a "-" on its own in Custom Dictionary either as an additional vocab item or in the sounds_like property. Hyphens are still supported when entered as part of phrases or words. |
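
For deployments still on a version affected by REQ-1409, REQ-10627 or REQ-10634, the limitations above can be worked around on the client side. The sketch below shows one way to do so, assuming Custom Dictionary entries are dictionaries with a `content` string and an optional `sounds_like` list. The helper names and sample data are hypothetical.

```python
import re


def _is_unsafe(value):
    """True for a lone "-" or anything containing the <unk> token."""
    stripped = value.strip()
    return stripped == "-" or "<unk>" in stripped


def sanitize_additional_vocab(vocab):
    """Drop Custom Dictionary entries and sounds_like values known to cause failures."""
    cleaned = []
    for entry in vocab:
        content = entry["content"]
        if _is_unsafe(content):
            continue  # skip the whole entry
        item = {"content": content}
        sounds_like = [s for s in entry.get("sounds_like", []) if not _is_unsafe(s)]
        if sounds_like:
            item["sounds_like"] = sounds_like
        cleaned.append(item)
    return cleaned


def collapse_double_full_stop(text):
    """Turn exactly two consecutive full stops into one, leaving ellipses alone."""
    return re.sub(r"(?<!\.)\.\.(?!\.)", ".", text)


# Made-up sample data in the assumed shape.
vocab = [
    {"content": "-"},
    {"content": "speech-to-text", "sounds_like": ["-", "speech to text"]},
    {"content": "<unk>"},
]

print(sanitize_additional_vocab(vocab))
print(collapse_double_full_stop("We support team G.B.."))
```

Hyphens inside words or phrases, such as "speech-to-text", pass through untouched, matching the note that only a standalone "-" is a problem.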

Supported Platforms

Docker (17.06.0+) running on Ubuntu, Debian, Fedora or CentOS