Real-time Container

High Level Summary

This release provides new, improved language packs for all of Speechmatics' commercially available languages. Two new language packs, Cantonese (yue) and Indonesian (id), are released. This release improves punctuation in several languages, as well as the custom dictionary features. In 11 languages, common concepts called entities (such as dates, currencies and large numbers) are now output in a consistent and predictable fashion. Additional data about these entities can be requested via the API. Transcript segments can flex in length when an entity is detected to ensure accurate output, but fixed behaviour can also now be requested using a new API parameter.

Important Notices

The container must now be run on processors that support Advanced Vector Extensions 2 (AVX2) in order to take advantage of the latest performance optimisations.

When using the enhanced model, it is also recommended to use hardware that supports the AVX512_VNNI flag for optimal processing performance. For more information, please see the quick start guide.
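
As a quick way to check a host against the two notices above, the following minimal sketch reads the CPU feature flags that Linux exposes in /proc/cpuinfo. It assumes an x86 Linux host, where the flags appear as `avx2` and `avx512_vnni`; the function names are illustrative and not part of any Speechmatics tooling.

```python
from pathlib import Path

REQUIRED_FLAG = "avx2"            # needed to run the container
RECOMMENDED_FLAG = "avx512_vnni"  # recommended for the enhanced model


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def check_cpu_support(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the required and recommended flags are present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {
        REQUIRED_FLAG: REQUIRED_FLAG in flags,
        RECOMMENDED_FLAG: RECOMMENDED_FLAG in flags,
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for flag, present in check_cpu_support(cpuinfo.read_text()).items():
            print(f"{flag}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux hosts")
```

A host that reports `avx2: no` does not meet the requirement stated above; a missing `avx512_vnni` only affects the recommendation for the enhanced model.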

What's New

2.0.0

  • Improved accuracy for all 31 language packs. Gains apply to both the standard and enhanced operating points
  • New Cantonese (yue) and Indonesian (id) language packs
  • Improved formatting of numeric entities such as dates, currencies and large numbers for the following 11 languages:
    • Cantonese (yue)
    • Chinese Mandarin (cmn)
    • English (en)
    • French (fr)
    • German (de)
    • Hindi (hi)
    • Italian (it)
    • Japanese (ja)
    • Portuguese (pt)
    • Russian (ru)
    • Spanish (es)
  • Additional metadata about the entities can be requested by using the new enable_entities config parameter. For more information, please see our entities documentation
  • Max delay has a new configuration option called max_delay_mode
    • max_delay_mode defaults to flexible, which changes max delay behaviour to improve the accuracy of entities. To maintain the previous behaviour, set max_delay_mode to fixed.
  • Improvements to custom dictionary functionality. Custom dictionary entries should now produce fewer false positives
  • Languages with updated punctuation marks
    • Japanese (。 、)
    • Italian (. ? , !)
    • Portuguese (. ? , !)
    • Russian (. ? , !)
    • Mandarin (。 ? ! 、)
    • Hindi (। ? , !)
    • The punctuation marks output for all other languages are unchanged
  • The JSON-v2 output format version is now 2.7
  • The transcription can now output words containing non-breaking spaces as a single result
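
The two new parameters named above can be sketched as a small configuration builder. Only the `enable_entities` and `max_delay_mode` names and the `flexible`/`fixed` values come from these notes; the `language` key, the function name and the overall dictionary layout are assumptions made for illustration, so check the API reference for the exact message structure.

```python
import json

# Values named in these release notes; "flexible" is the stated default.
VALID_MAX_DELAY_MODES = {"flexible", "fixed"}


def build_transcription_config(language, enable_entities=False,
                               max_delay_mode="flexible"):
    """Build a config dictionary (hypothetical layout) with the new options."""
    if max_delay_mode not in VALID_MAX_DELAY_MODES:
        raise ValueError(
            f"max_delay_mode must be one of {sorted(VALID_MAX_DELAY_MODES)}"
        )
    return {
        "language": language,                # assumed key name
        "enable_entities": enable_entities,  # request extra entity metadata
        "max_delay_mode": max_delay_mode,    # "fixed" keeps the previous behaviour
    }


if __name__ == "__main__":
    config = build_transcription_config(
        "en", enable_entities=True, max_delay_mode="fixed"
    )
    print(json.dumps(config, indent=2))
```

Validating `max_delay_mode` before sending the configuration keeps a typo from silently falling back to a different segmentation behaviour than the one intended.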

Known Limitations

The following are known issues in this release:

Issue ID | Summary | Detailed Description and Possible Workarounds
REQ-10634 | Putting "-" as an item in additional vocab configuration will cause the container to fail | Do not enter just a "-" on its own in Custom Dictionary either as an additional vocab item or in the sounds_like property. Hyphens are still supported when entered as part of phrases or words
REQ-13240 | Chinese (cmn) container crashes occasionally when using certain additional vocabulary | Do not use whitespace characters in additional vocabulary sounds_like
REQ-16256 | Audio Swapping between 8kHz and 16kHz causes memory leak | Repeatedly audio swapping between 8kHz and 16kHz files can cause an increase in memory over very long periods that causes the container to crash. If memory usage in this scenario becomes excessive it is recommended to restart the container
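
The workarounds for REQ-10634 and REQ-13240 can be applied before a session starts by checking the custom dictionary on the client side. The sketch below is one possible pre-flight check, not an official validator; it assumes each entry is either a plain string or a dictionary with a `content` string and an optional `sounds_like` list, and the function name is hypothetical.

```python
def find_vocab_problems(additional_vocab, language):
    """Return a list of entries that would hit the known limitations."""
    problems = []
    for index, entry in enumerate(additional_vocab):
        # Assumed entry shapes: "word" or {"content": ..., "sounds_like": [...]}
        if isinstance(entry, str):
            content, sounds_like = entry, []
        else:
            content = entry.get("content", "")
            sounds_like = entry.get("sounds_like", [])

        # REQ-10634: a hyphen on its own makes the container fail.
        if content.strip() == "-":
            problems.append(f"entry {index}: content is a lone hyphen (REQ-10634)")

        for alternative in sounds_like:
            if alternative.strip() == "-":
                problems.append(
                    f"entry {index}: sounds_like is a lone hyphen (REQ-10634)"
                )
            # REQ-13240: avoid whitespace in sounds_like for Mandarin (cmn).
            if language == "cmn" and any(ch.isspace() for ch in alternative):
                problems.append(
                    f"entry {index}: sounds_like contains whitespace (REQ-13240)"
                )
    return problems


if __name__ == "__main__":
    vocab = [{"content": "-"}, {"content": "CEO", "sounds_like": ["C E O"]}]
    for problem in find_vocab_problems(vocab, "cmn"):
        print(problem)
```

Hyphens inside words or phrases, such as "co-operate", pass this check, in line with the note that they remain supported.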

Resolved Issues

The following issues are resolved in this release:

Issue ID | Summary | Resolution Description
REQ-17771 | Wide-space Unicode characters in Custom Dictionary cause jobs to fail | This is now fixed and wide-spaced characters should be accepted

Supported Languages

These are the General Availability (GA) release notes for the Real-time ASR container images. The following languages are supported:

Language | ISO Code
Arabic | ar
Bulgarian | bg
Catalan | ca
Mandarin | cmn
Czech | cs
Danish | da
German | de
Greek | el
Global English | en
Global Spanish | es
Finnish | fi
French | fr
Hindi | hi
Croatian | hr
Hungarian | hu
Indonesian | id
Italian | it
Japanese | ja
Korean | ko
Lithuanian | lt
Latvian | lv
Malay | ms
Dutch | nl
Norwegian | no
Polish | pl
Portuguese | pt
Romanian | ro
Russian | ru
Slovakian | sk
Slovenian | sl
Swedish | sv
Turkish | tr
Cantonese | yue

Container images are labelled using the following scheme, where language codes adhere to the ISO-639 standard:

rt-asr-transcriber-<language>:<version>

For example,

rt-asr-transcriber-en:2.0.0
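
The labelling scheme can be expressed as a small helper that also guards against unsupported language codes. This is a sketch only: the function name is hypothetical, and the set of codes is copied from the Supported Languages table in these notes.

```python
# Language codes listed in the Supported Languages table of these notes.
SUPPORTED_LANGUAGES = {
    "ar", "bg", "ca", "cmn", "cs", "da", "de", "el", "en", "es", "fi",
    "fr", "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms",
    "nl", "no", "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "yue",
}


def image_name(language, version):
    """Build an image label of the form rt-asr-transcriber-<language>:<version>."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return f"rt-asr-transcriber-{language}:{version}"


if __name__ == "__main__":
    print(image_name("en", "2.0.0"))
    print(image_name("yue", "2.0.0"))
```

The registry host that prefixes the image name when pulling is supplied with your login credentials and is deliberately left out of this helper.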

Supported Platforms

Docker 17.06.0+

Installation

Pull the container image from the Speechmatics Docker registry.

Prerequisites

  • Docker (17.06.0 or above).
  • Login credentials (URL, username and password) for the Speechmatics Docker registry.