Real-Time SaaS Release Notes

This page documents updates to Real-Time SaaS including details of new and updated features, bug fixes, known issues, and deprecated functionality.

2024-08-13

New

New languages: Irish (ga) and Maltese (mt)

Improvements

Ursa2 models released, giving a broad accuracy uplift across languages:

Enhanced operating point: all languages, including a major improvement for Arabic dialects
Standard operating point: Basque (eu), Estonian (et), Polish (pl), Swedish (sv), Tamil (ta), Turkish (tr), Uyghur (ug)

2024-08-01

Fixes

Fix for end-of-sentence punctuation not being sent in a transcription final message after a period of silence

2024-07-29

Removed

The legacy Speaker Change Detection feature is now obsolete. Any jobs using the speaker_change and channel_and_speaker_change parameters will be rejected
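
Because jobs that still reference the removed feature are now rejected, it can help to check configs on the client side before sending them. The following is a minimal sketch, not an official tool: the helper name find_removed_settings is hypothetical, and since the exact position of these names in a config is not stated here, it simply searches a nested config for either name as a key or as a string value.

```python
REMOVED = {"speaker_change", "channel_and_speaker_change"}


def find_removed_settings(config, path="config"):
    """Return the paths in a nested config where a removed name appears.

    Hypothetical helper: it matches the names both as keys and as string
    values, because the exact config layout is an assumption here.
    """
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}"
            if key in REMOVED:
                hits.append(child)
            hits.extend(find_removed_settings(value, child))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            hits.extend(find_removed_settings(item, f"{path}[{index}]"))
    elif isinstance(config, str) and config in REMOVED:
        hits.append(path)
    return hits


# The "diarization" key below is an assumed example layout.
example = {"transcription_config": {"language": "en", "diarization": "speaker_change"}}
print(find_removed_settings(example))
```

An empty list means neither name was found in the config.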

Fixes

Written form for negative percentages in German transcription is now output as "%" instead of "Prozent"

2024-07-10

Improvements

Lower-latency Finals: the minimum allowed value of the max_delay parameter has been reduced from 2 to 0.7, enabling lower-latency final transcripts. Refer to documentation here for more details
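
As an illustration of the new lower bound, the sketch below builds a StartRecognition-style message and rejects max_delay values under 0.7. The function name build_start_recognition and the nesting of settings under a "transcription_config" key are assumptions made for this example; other fields a real session needs are omitted, and no upper bound is checked because none is stated here.

```python
MIN_MAX_DELAY = 0.7  # new minimum from this release (previously 2)


def build_start_recognition(language: str, max_delay: float) -> dict:
    """Return a StartRecognition-style message with a validated max_delay.

    Hypothetical helper: the message layout is an assumption, and other
    fields a real session requires are left out for brevity.
    """
    if max_delay < MIN_MAX_DELAY:
        raise ValueError(
            f"max_delay must be at least {MIN_MAX_DELAY}, got {max_delay}"
        )
    return {
        "message": "StartRecognition",
        "transcription_config": {"language": language, "max_delay": max_delay},
    }


print(build_start_recognition("en", 0.7))
```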

2024-06-25

New

Disfluency removal: automatically remove disfluencies from your transcript. Refer to documentation here to get started

Improvements

Initial improvements from our Ursa2 accuracy uplift; note that further improvements are on the way in the next few weeks
- Improved transcription accuracy and updated vocabulary for 31 languages (Enhanced Operating Point only): Bashkir (ba), Basque (eu), Belarusian (be), Bulgarian (bg), Cantonese (yue), Catalan (ca), Danish (da), Esperanto (eo), Estonian (et), Finnish (fi), French (fr), Galician (gl), Greek (el), Hindi (hi), Indonesian (id), Interlingua (ia), Japanese (ja), Korean (ko), Latvian (lv), Malay (ms), Marathi (mr), Mongolian (mn), Norwegian (no), Romanian (ro), Slovenian (sl), Spanish (es), Swedish (sv), Turkish (tr), Ukrainian (uk), Uyghur (ug), Vietnamese (vi)
- Updated vocabulary for English (Enhanced Operating Point only)
Improved transcription accuracy around endpoints, especially for lower values of max_delay
When a transcription Final does not contain words which appeared in previous Partials, an AddPartialTranscript message containing the missing words is now sent immediately after the Final
Start and end times in AddTranscript and AddPartialTranscript messages are now always rounded to 2 decimal places
Improved music detection accuracy in Audio Events
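
The change to Partials sent after a Final can be pictured with a small client-side buffer. This is only a sketch under stated assumptions: the class TranscriptBuffer is hypothetical, and it assumes each message carries its type in a "message" field and its text in metadata.transcript. A Final clears the current Partial; the follow-up AddPartialTranscript then restores any words the Final left out.

```python
class TranscriptBuffer:
    """Tracks finalised text plus the latest, still-revisable Partial."""

    def __init__(self):
        self.final_text = []
        self.partial_text = ""

    def handle(self, msg: dict) -> None:
        # Assumed message shape: {"message": ..., "metadata": {"transcript": ...}}
        text = msg.get("metadata", {}).get("transcript", "")
        if msg.get("message") == "AddTranscript":
            self.final_text.append(text)
            self.partial_text = ""  # the Final supersedes earlier Partials
        elif msg.get("message") == "AddPartialTranscript":
            self.partial_text = text  # each Partial replaces the previous one

    def display(self) -> str:
        return "".join(self.final_text) + self.partial_text


buffer = TranscriptBuffer()
buffer.handle({"message": "AddPartialTranscript", "metadata": {"transcript": "hello world"}})
buffer.handle({"message": "AddTranscript", "metadata": {"transcript": "hello "}})
# Sent immediately after the Final, carrying the words it did not contain:
buffer.handle({"message": "AddPartialTranscript", "metadata": {"transcript": "world"}})
print(buffer.display())
```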

2024-06-17

New

Audio Filtering: pre-process audio to remove low-volume background speech which might otherwise be detected and transcribed. Refer to documentation here to get started

2024-06-13

Improvements

Faster session initialization: reduced time between the client StartRecognition message and the API's RecognitionStarted response for all sessions

2024-04-29

New

Audio Events: Detection of music, laughter and applause in real-time streams now supported. Refer to documentation here to get started

2024-04-25

Improvements

Accuracy improvements for Romanian (ro)

Fixes

Fix for issue affecting recognition of English words ending in 'erm'

2024-02-21

New

New transcription language: Hebrew (he)

Improvements

Improved accuracy when transcribing audio with periods of silence

Fixes

Fix for profanity tagging in bilingual Spanish & English
Fixes for specific transcription accuracy issues in English, German, Swedish and Norwegian
Fix for unknown speaker labels on pure punctuation finals with speaker diarization

2024-01-29

Improvements

Improved session initialization behaviour with faster connections for the Enhanced Operating Point:

Reduced time between the client StartRecognition message and the API's RecognitionStarted response for Enhanced sessions
Reduced variance in time between StartRecognition and RecognitionStarted for both Enhanced and Standard sessions

2023 Release notes

2023-12-20

Spanish & English bilingual transcription now available (language='es' with domain='bilingual-en'). Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. Requires the domain config to be set.
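
A minimal sketch of a transcription config for bilingual transcription, using the two values given above. The function name and the flat dictionary layout are assumptions for illustration; only the language and domain values come from this release note.

```python
def bilingual_es_en_config() -> dict:
    """Return a transcription config for Spanish & English bilingual audio."""
    return {
        "language": "es",          # base language from the release note
        "domain": "bilingual-en",  # required: selects the bilingual model
    }


def is_bilingual(config: dict) -> bool:
    """True only when both values needed for bilingual transcription are set."""
    return config.get("language") == "es" and config.get("domain") == "bilingual-en"


print(is_bilingual(bilingual_es_en_config()))
```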

2023-12-13

Ursa models released for 46 additional languages. Ursa models are now available for all 49 supported languages, bringing improvements to both Standard and Enhanced operating points:
- Major transcription accuracy gains
- Major improvement in Speaker Diarization accuracy
New transcription language: Persian (fa)
Improved models for English transcription (Standard and Enhanced operating points):
- Enhanced transcription of disfluencies in English. The model now more accurately captures common disfluencies like "um" and "uh". This change makes our ASR even more accurate for verbatim transcription, great for use cases such as audio editing, analytics on hesitations for call centers and legal transcription. For details on how to identify disfluencies in output, see the documentation here
- More accurate transcription of short utterances of the word "I" in English
- More accurate transcription of acronyms in English
- Improved English transcription accuracy around capitalization
Transcription accuracy improvements for German (including Swiss and Austrian) and French (including French Canadian)
Appropriate punctuation is now provided for finals after a pause in speech, improving transcription for downstream workflows such as translation

2023-11-30

Ursa models for French (fr), bringing improvements to both Standard and Enhanced operating points:
- Major transcription accuracy gains
- Major improvement in Speaker Diarization accuracy

2023-11-01

Language code no longer required in the WebSocket handshake request URI
- The language code is now only specified in the transcription config
- Any language code provided in the handshake request URI is ignored

2023-10-30

Ursa models for Spanish (es), bringing improvements to both Standard and Enhanced operating points:
- Major transcription accuracy gains
- Major improvement in Speaker Diarization accuracy

2023-07-20

Improved speaker diarization accuracy for noisy audio (English only, Standard and Enhanced operating points)
Fix for transcribed words returned during non-speech audio when Custom Dictionary is used

2023-06-27

Major improvement in Speaker Diarization accuracy for English (Standard and Enhanced operating points)
Improved transcription accuracy for Basque, Belarusian, Estonian, Mongolian, Thai, Vietnamese, and Welsh
Improvements to capitalization for English transcription
Fix for zero-duration word timings

2023-05-18

Introducing the new Real-Time Translation, tightly integrated with transcription in a single API. Translate your speech to one or more languages. Refer to documentation here to get started
Translation will be offered at no additional cost until 31st May 2023
Translate speech to and from English for 34 languages
The JSON-v2 output version is now 2.9

2023-04-28

The following improvements have been released to all Real-Time SaaS users:

Major accuracy gains for English transcription (Standard and Enhanced operating points)
Major improvement to Speaker Diarization accuracy for English (Standard and Enhanced operating points)
Improved numeral formatting in English
Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
Alphanumerics now have upper-case letters
Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
A number of other improvements and fixes for better readability
Resolved an issue where words would occasionally be fully upper-cased
Resolved an issue where decades from "twenties" to "nineties" could be incorrectly transcribed in some contexts

2023-03-02

The following changes have been released to on-demand users and are coming soon to enterprise customers:

Major accuracy gains for English transcription (Standard and Enhanced operating points)
Major improvement to Speaker Diarization accuracy for English (Standard and Enhanced operating points)
Improved numeral formatting in English
Improved formatting for common telephone numbers, measurements, websites, email addresses and credit cards
Alphanumerics now have upper-case letters
Added regional handling for en-AU and en-US output locale to keep 'pounds' as words
A number of other improvements and fixes for better readability
Resolved an issue where words would occasionally be fully upper-cased

2023-02-13

Improved error handling for the WebSocket handshake when starting a session:
- Previously, errors such as failed authentication or exceeded quota were communicated to the client through an HTTP error response. Now, only malformed handshake GET requests result in an HTTP error response (400 Bad Request or 405 Method Not Allowed). Otherwise, the client receives a 101 Switching Protocols HTTP response, and any errors are communicated by an in-band WebSocket error message followed by a WebSocket close handshake message. This enables browser clients to understand the cause of handshake errors, which was previously not possible from an HTTP response. Please see the Real-Time API reference for full details of the WebSocket call flow and possible errors.
New error message types: quota_exceeded and timelimit_exceeded
Improved RT SaaS server timeout behaviour:
- The server will keep a session alive for at least 60 seconds if no messages are received from the client. Previously, sessions were closed after 20 seconds of inactivity. We recommend that clients use a ping interval of 20 to 60 seconds.
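
The handshake behaviour described in this release can be summarised as a small decision function. This is a sketch rather than client code from the product: describe_handshake is a hypothetical helper, and it assumes in-band errors arrive as a message whose "message" field is "Error" and whose "type" field carries a value such as quota_exceeded. The ping interval constant is simply one value inside the recommended 20 to 60 second range.

```python
from typing import Optional

# Only malformed handshake requests now produce an HTTP error response.
HTTP_HANDSHAKE_ERRORS = {400: "Bad Request", 405: "Method Not Allowed"}

# Any value from 20 to 60 seconds follows the recommendation; 30 is arbitrary.
PING_INTERVAL_SECONDS = 30


def describe_handshake(http_status: int, message: Optional[dict] = None) -> str:
    """Classify a handshake outcome from the HTTP status and first in-band message.

    Assumed message shape for errors: {"message": "Error", "type": "..."}.
    """
    if http_status in HTTP_HANDSHAKE_ERRORS:
        reason = HTTP_HANDSHAKE_ERRORS[http_status]
        return f"malformed handshake: {http_status} {reason}"
    if http_status == 101:
        if message and message.get("message") == "Error":
            return f"in-band error: {message.get('type', 'unknown')}"
        return "connected"
    return f"unexpected HTTP status: {http_status}"


print(describe_handshake(101, {"message": "Error", "type": "quota_exceeded"}))
```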

2023-01-19

Fix potential degradation in accuracy for multi-hour transcription sessions with low max_delay values

2022 Release notes

2022-11-30

Remodelled German (de) language pack to utilize subwords, separating words into smaller segments to reduce word error rate
Language vocabulary improvements for Latvian (lv), Swedish (sv), Hungarian (hu), Portuguese (pt), Polish (pl), Mandarin Chinese (cmn), Arabic (ar), Dutch (nl), Slovak (sk), Bulgarian (bg), Romanian (ro), Slovenian (sl), Lithuanian (lt), Croatian (hr), Malay (ms), Catalan (ca), Czech (cs), Danish (da), Greek (el), Turkish (tr), French (fr), Italian (it), Hindi (hi), Korean (ko)
Improved formatting of numeric entities such as dates, currencies and large numbers for Swedish (sv), Norwegian (no), and Dutch (nl)
The JSON-v2 output version is now 2.8. Specific changes are:
- Additional language pack information has been added to the RecognitionStarted websocket message. There is now more detailed information about properties of the language being used, such as writing direction and word delimiter.
- We now also record the correct attachment direction for punctuation (e.g. before or after a space) in a new attaches_to field.

2022-09-13

14 new languages: Bashkir, Basque, Belarusian, Esperanto, Estonian, Galician, Interlingua, Marathi, Mongolian, Tamil, Thai, Uyghur, Vietnamese, and Welsh
Resolved an issue where the French word où (where) was recognised as ou (or)

2022-07-19

New language: Ukrainian (uk)
16 languages updated with additional punctuation marks for improved readability
- The following languages now support (. ? , !): Bulgarian, Catalan, Czech, Greek, Finnish, Croatian, Hungarian, Lithuanian, Latvian, Norwegian, Polish, Romanian, Slovak, Slovenian, Ukrainian, Korean
Improved accuracy for French, including more data for Canadian French (fr-ca)
Improved accuracy for Portuguese, including more data for Brazilian Portuguese (pt-br)
Improved accuracy in Standard Operating Point for Romanian, Hungarian, Danish, Slovak, Croatian, Bulgarian, Finnish, Slovenian, Lithuanian
Updated Danish, Norwegian and Swedish to remove undesired character sets
Improved accuracy in localised spelling for English output locale feature
Improved accuracy of percentage symbol recognition in French
Fixes for English and Italian written form numeric entities
Fix for issue where end times of words could be before the start time in some cases

2022-05-31

New Cantonese (yue) and Indonesian (id) language packs
Max delay has a new configuration option called max_delay_mode: max_delay_mode defaults to flexible which introduces a change in max delay behaviour to improve accuracy of entities. To maintain previous behaviour set max_delay_mode to fixed
Updated punctuation marks for the following languages; other languages will not see a change in their punctuation marks
- Japanese (。、)
- Italian (. ? , !)
- Portuguese (. ? , !)
- Russian (. ? , !)
- Mandarin (。？！、)
- Hindi (। ? , !)
Improved accuracy for all 31 language packs, with gains for both Standard and Enhanced operating points
Improved formatting of numeric entities such as dates, currencies and large numbers for the following 11 languages: Cantonese (yue), Chinese Mandarin (cmn), English (en), French (fr), German (de), Hindi (hi), Italian (it), Japanese (ja), Portuguese (pt), Russian (ru), Spanish (es). Additional metadata about the entities can be requested by using the new enable_entities config parameter
Improvements to custom dictionary functionality including a reduction in false positives
The JSON-v2 output version is now 2.7
Non-breaking spaces are now possible in a single word
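
The two config options introduced in this release, max_delay_mode and enable_entities, could be set together as in the sketch below. The helper name build_transcription_config and the flat dictionary layout are assumptions for illustration, as is treating enable_entities as a boolean; the values flexible (default) and fixed come from the note above.

```python
VALID_MAX_DELAY_MODES = ("flexible", "fixed")


def build_transcription_config(
    language: str,
    max_delay_mode: str = "flexible",  # default; "fixed" keeps the previous behaviour
    enable_entities: bool = False,     # assumed boolean; requests entity metadata
) -> dict:
    """Return a transcription config using the options from this release."""
    if max_delay_mode not in VALID_MAX_DELAY_MODES:
        raise ValueError(
            f"max_delay_mode must be one of {VALID_MAX_DELAY_MODES}, got {max_delay_mode!r}"
        )
    return {
        "language": language,
        "max_delay_mode": max_delay_mode,
        "enable_entities": enable_entities,
    }


# Keep the pre-release max delay behaviour and request entity metadata.
print(build_transcription_config("en", max_delay_mode="fixed", enable_entities=True))
```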
