Real-Time SaaS Release Notes

This page documents updates to Real-Time SaaS including details of new and updated features, bug fixes, known issues, and deprecated functionality.

2024-04-25

Improvements
  • Accuracy improvements for Romanian
Fixes
  • Fix for issue affecting recognition of English words ending in 'erm'

2024-02-21

New
  • New transcription language: Hebrew (he)
Improvements
  • Improved accuracy when transcribing audio with periods of silence
Fixes
  • Fix for profanity tagging in bilingual Spanish & English
  • Fixes for specific transcription accuracy issues in English, German, Swedish and Norwegian
  • Fix for unknown speaker labels on pure punctuation finals with speaker diarization

2024-01-29

Improvements
  • Improved session initialization behaviour with faster connections for the Enhanced operating point:
    • Reduced time between the client StartRecognition message and the API's RecognitionStarted response for Enhanced sessions
    • Reduced variance in time between StartRecognition and RecognitionStarted for both Enhanced and Standard sessions

2023 Release notes

2023-12-20

  • Spanish & English bilingual transcription now available (language='es' with domain='bilingual-en'). Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. Requires the domain config to be set.
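
As a rough illustration of the bilingual setup described above, the sketch below builds a StartRecognition message with language set to es and domain set to bilingual-en. The JSON layout (the message, audio_format and transcription_config keys) and the audio settings are assumptions made for this example; consult the Real-Time API reference for the authoritative message format.

```python
import json


def build_bilingual_start_message() -> str:
    """Return a StartRecognition message for Spanish & English bilingual audio."""
    message = {
        "message": "StartRecognition",
        # Assumed audio settings, chosen only to make the example complete.
        "audio_format": {
            "type": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 16000,
        },
        "transcription_config": {
            "language": "es",          # Spanish language pack
            "domain": "bilingual-en",  # required for bilingual transcription
        },
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_bilingual_start_message())
```

Leaving out the domain key in this sketch would request plain Spanish transcription, which is why the release note calls the domain config a requirement.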

2023-12-13

  • Ursa models released for 46 additional languages. Ursa models are now available for all 49 supported languages, bringing improvements to both Standard and Enhanced operating points:
    • Major transcription accuracy gains
    • Major improvement in Speaker Diarization accuracy
  • New transcription language: Persian (fa)
  • Improved models for English transcription (Standard and Enhanced operating points):
    • Enhanced transcription of disfluencies in English. The model now more accurately captures common disfluencies like "um" and "uh". This makes our ASR more accurate for verbatim transcription, which suits use cases such as audio editing, analytics on hesitations for call centers, and legal transcription. For details on how to identify disfluencies in the output, see the documentation
    • More accurate transcription of short utterances of the word "I" in English
    • More accurate transcription of acronyms in English
    • Improved English transcription accuracy around capitalization
  • Transcription accuracy improvements for German (including Swiss and Austrian) and French (including French Canadian)
  • Appropriate punctuation is now provided for finals after a pause in speech, improving transcription for downstream workflows such as translation

2023-11-30

  • Ursa models for French (fr), bringing improvements to both Standard and Enhanced operating points:
    • Major transcription accuracy gains
    • Major improvement in Speaker Diarization accuracy

2023-11-01

  • Language code no longer required in the WebSocket handshake request URI
    • The language code is now only specified in the transcription config
    • Any language code provided in the handshake request URI is ignored
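
For clients that still append a language code to the connection URL, the change above could be handled along these lines. This is a minimal sketch: the legacy URL layout (a language segment directly after /v2), the example host name and the helper name split_legacy_url are all assumptions for illustration, not part of the documented API.

```python
from urllib.parse import urlsplit, urlunsplit


def split_legacy_url(legacy_url: str, default_language: str = "en"):
    """Split an assumed legacy URL (.../v2/<language>) into a URL and a config.

    Returns the URL without the language segment, plus a transcription
    config dict that carries the language instead.
    """
    parts = urlsplit(legacy_url)
    segments = [s for s in parts.path.split("/") if s]
    language = default_language
    # Assumed legacy layout: the language code is the segment after "v2".
    if len(segments) >= 2 and segments[-2] == "v2":
        language = segments.pop()
    new_path = "/" + "/".join(segments)
    url = urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))
    return url, {"language": language}


if __name__ == "__main__":
    # rt.example.invalid is a placeholder host, not a real endpoint.
    print(split_legacy_url("wss://rt.example.invalid/v2/de"))
```

Because a language code in the URI is now ignored rather than rejected, stripping it is optional; what matters is that the language ends up in the transcription config.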

2023-10-30

  • Ursa models for Spanish (es), bringing improvements to both Standard and Enhanced operating points:
    • Major transcription accuracy gains
    • Major improvement in Speaker Diarization accuracy

2023-07-20

  • Improved speaker diarization accuracy for noisy audio (English only, Standard and Enhanced operating points)
  • Fix for transcribed words returned during non-speech audio when Custom Dictionary is used

2023-06-27

  • Major improvement in Speaker Diarization accuracy for English (Standard and Enhanced operating points)
  • Improved transcription accuracy for Basque, Belarusian, Estonian, Mongolian, Thai, Vietnamese, and Welsh
  • Improvements to capitalization for English transcription
  • Fix for zero-duration word timings

2023-05-18

  • Introducing Real-Time Translation, tightly integrated with transcription in a single API. Translate your speech into one or more languages. Refer to the translation documentation to get started
  • Translation will be offered at no additional cost until 31st May 2023
  • Translate speech to and from English for 34 languages
  • The JSON-v2 output version is now 2.9

2023-04-28

The following improvements have been released to all Real-Time SaaS users:

  • Major accuracy gains for English transcription (Standard and Enhanced operating points)
  • Major improvement to Speaker Diarization accuracy for English (Standard and Enhanced operating points)
  • Improved numeral formatting in English
  • Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
  • Alphanumerics now have upper-case letters
  • Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
  • A number of other improvements and fixes for better readability
  • Resolved an issue where words would occasionally be fully upper-cased
  • Resolved an issue where decades from "twenties" to "nineties" could be incorrectly transcribed in some contexts

2023-03-02

The following changes have been released to on-demand users and are coming soon to enterprise customers:

  • Major accuracy gains for English transcription (Standard and Enhanced operating points)
  • Major improvement to Speaker Diarization accuracy for English (Standard and Enhanced operating points)
  • Improved numeral formatting in English
  • Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
  • Alphanumerics now have upper-case letters
  • Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
  • A number of other improvements and fixes for better readability
  • Resolved an issue where words would occasionally be fully upper-cased

2023-02-13

  • Improved error handling for the WebSocket handshake when starting a session:
    • Previously, errors such as failed authentication or exceeded quota were communicated to the client through an HTTP error response. Now, only malformed handshake GET requests result in an HTTP error response (400 Bad Request or 405 Method Not Allowed). In all other cases, the client receives a 101 Switching Protocols HTTP response, and any errors are communicated by an in-band WebSocket error message followed by a WebSocket close handshake message. This lets browser clients determine the cause of handshake errors, which was not previously possible from an HTTP error response. Please see the Real-Time API reference for full details of the WebSocket call flow and possible errors.
  • New error message types: quota_exceeded and timelimit_exceeded
  • Improved RT SaaS server timeout behaviour:
    • The server will keep a session alive for at least 60 seconds if no messages are received from the client. Previously, sessions were closed after 20 seconds of inactivity. We recommend that clients use a ping interval of 20 to 60 seconds.
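
The handshake behaviour above might be handled on the client side roughly as follows. This is a sketch under stated assumptions: the in-band error is taken to be a JSON object with a message value of Error and a type field such as quota_exceeded, and the helper name describe_handshake_outcome is hypothetical. The ping interval constant is simply one value inside the recommended 20 to 60 second range.

```python
import json
from typing import Optional

# One value within the recommended 20 to 60 second ping interval.
PING_INTERVAL_SECONDS = 30


def describe_handshake_outcome(http_status: int, first_message: Optional[str] = None) -> str:
    """Classify the result of a session handshake.

    http_status is the HTTP response to the handshake GET request;
    first_message is the first WebSocket text frame received, if any.
    """
    if http_status in (400, 405):
        # Only malformed handshake requests are reported at the HTTP level.
        return "malformed handshake request"
    if http_status == 101:
        if first_message is None:
            return "connected"
        payload = json.loads(first_message)
        # Assumed shape of the in-band error message.
        if payload.get("message") == "Error":
            return "in-band error: " + str(payload.get("type"))
        return "connected"
    return "unexpected HTTP status " + str(http_status)


if __name__ == "__main__":
    error = json.dumps({"message": "Error", "type": "quota_exceeded"})
    print(describe_handshake_outcome(101, error))
```

A client built this way treats a 101 response as "connection open, outcome still unknown" and only considers the session started once a non-error message arrives.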

2023-01-19

  • Fix for potential degradation in accuracy during multi-hour transcription sessions with low max_delay values

2022 Release notes

2022-11-30

  • Remodelled German (de) language pack to utilize subwords, separating words into smaller segments to reduce word error rate
  • Language vocabulary improvements for Latvian (lv), Swedish (sv), Hungarian (hu), Portuguese (pt), Polish (pl), Mandarin Chinese (cmn), Arabic (ar), Dutch (nl), Slovak (sk), Bulgarian (bg), Romanian (ro), Slovenian (sl), Lithuanian (lt), Croatian (hr), Malay (ms), Catalan (ca), Czech (cs), Danish (da), Greek (el), Turkish (tr), French (fr), Italian (it), Hindi (hi), Korean (ko)
  • Improved formatting of numeric entities such as dates, currencies and large numbers for Swedish (sv), Norwegian (no), and Dutch (nl)
  • The JSON-v2 output version is now 2.8. Specific changes are:
    • Additional language pack information has been added to the RecognitionStarted websocket message. There is now more detailed information about properties of the language being used, such as writing direction and word delimiter.
    • We now also record the correct attachment direction for punctuation (e.g. before or after a space) in a new attaches_to field.
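
To show how the attaches_to field and the word delimiter could be used together, the sketch below joins a list of result items into display text. The item shape is deliberately simplified (a flat dict with content and an optional attaches_to), and the values previous, next and both are assumptions for this example; check the API reference for the actual result structure.

```python
def join_results(results, word_delimiter=" "):
    """Join simplified result items into text, honouring attaches_to.

    Each item is a dict with "content" and an optional "attaches_to"
    (assumed values: "previous", "next", "both").
    """
    text = ""
    glue_next = False  # True when the previous item attaches to this one
    for item in results:
        attaches_to = item.get("attaches_to", "none")
        attach_previous = attaches_to in ("previous", "both")
        # Insert the delimiter unless either neighbour asks to be attached.
        if text and not attach_previous and not glue_next:
            text += word_delimiter
        text += item["content"]
        glue_next = attaches_to in ("next", "both")
    return text


if __name__ == "__main__":
    items = [
        {"content": "Hello"},
        {"content": ",", "attaches_to": "previous"},
        {"content": "world"},
        {"content": "!", "attaches_to": "previous"},
    ]
    print(join_results(items))
```

Passing the word delimiter in as a parameter keeps the same routine usable for languages that do not separate words with a space.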

2022-09-13

  • 14 new languages: Bashkir, Basque, Belarusian, Esperanto, Estonian, Galician, Interlingua, Marathi, Mongolian, Tamil, Thai, Uyghur, Vietnamese, and Welsh
  • Resolved an issue where the French word où (where) was recognised as ou (or)

2022-07-19

  • New language: Ukrainian (uk)
  • 16 languages updated with additional punctuation marks for improved readability
    • The following languages now support (. ? , !): Bulgarian, Catalan, Czech, Greek, Finnish, Croatian, Hungarian, Lithuanian, Latvian, Norwegian, Polish, Romanian, Slovak, Slovenian, Ukrainian, Korean
  • Improved accuracy for French, including more data for Canadian French (fr-ca)
  • Improved accuracy for Portuguese, including more data for Brazilian Portuguese (pt-br)
  • Improved accuracy in Standard operating point for Romanian, Hungarian, Danish, Slovak, Croatian, Bulgarian, Finnish, Slovenian, Lithuanian
  • Updated Danish, Norwegian and Swedish to remove undesired character sets
  • Improved accuracy in localised spelling for English output locale feature
  • Improved accuracy of percentage symbol recognition in French
  • Fixes for English and Italian written form numeric entities
  • Fix for issue where end times of words could be before the start time in some cases

2022-05-31

  • New Cantonese (yue) and Indonesian (id) language packs
  • Max delay has a new configuration option, max_delay_mode. It defaults to flexible, which changes max delay behaviour to improve the accuracy of entities. To keep the previous behaviour, set max_delay_mode to fixed
  • Updated punctuation marks for the following languages (supported punctuation marks for other languages are unchanged):
    • Japanese (。 、)
    • Italian (. ? , !)
    • Portuguese (. ? , !)
    • Russian (. ? , !)
    • Mandarin (。 ? ! 、)
    • Hindi (। ? , !)
  • Improved accuracy for all 31 language packs, with gains for both Standard and Enhanced operating points
  • Improved formatting of numeric entities such as dates, currencies and large numbers for the following 11 languages: Cantonese (yue), Chinese Mandarin (cmn), English (en), French (fr), German (de), Hindi (hi), Italian (it), Japanese (ja), Portuguese (pt), Russian (ru), Spanish (es). Additional metadata about the entities can be requested by using the new enable_entities config parameter
  • Improvements to custom dictionary functionality including a reduction in false positives
  • The JSON-v2 output version is now 2.7
  • Non-breaking spaces are now possible in a single word
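
The max_delay_mode and enable_entities options from this release could be set together as in the sketch below. The parameter names and the flexible and fixed values come from the notes above; placing them inside a transcription config dict, the default max_delay of 2.0 and the helper name build_transcription_config are assumptions for illustration.

```python
def build_transcription_config(
    language: str,
    max_delay: float = 2.0,
    max_delay_mode: str = "flexible",
    enable_entities: bool = False,
) -> dict:
    """Build a transcription config with the max delay and entity options."""
    if max_delay_mode not in ("flexible", "fixed"):
        raise ValueError("max_delay_mode must be 'flexible' or 'fixed'")
    return {
        "language": language,
        "max_delay": max_delay,            # assumed default, in seconds
        "max_delay_mode": max_delay_mode,  # "fixed" keeps the previous behaviour
        "enable_entities": enable_entities,
    }


if __name__ == "__main__":
    # Keep the pre-release max delay behaviour and request entity metadata.
    print(build_transcription_config("en", max_delay_mode="fixed", enable_entities=True))
```

Validating max_delay_mode locally is a design choice for the sketch: it surfaces a typo before a session is opened rather than relying on a server-side error.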