AI & transcription. The statistics speak for themselves!

This is always an interesting one… is the transcription world ready for artificial intelligence (AI) or not? Can AI really replace humans?

Recent figures confirm what we at FSTL firmly believe: AI capabilities have advanced significantly over the last few years, but humans still stand out as the best!

In some sectors AI works pretty well, but these tend to be very basic applications. For example, Netflix can make TV and film suggestions based on what you’ve watched before – this is all done using AI. Personal assistants such as Siri, Google Assistant or Alexa can decipher speech to tell you what the weather will be like or give you directions to a particular place – again thanks to AI technology. But when you’re talking about global pharmaceutical companies launching new medical products, with complicated terminology spoken by multiple speakers with different accents, it’s a whole new ball game!

It’s the facts around AI, especially in relation to transcription, that speak for themselves. We’ll digest just a few of them here for you:

Accuracy rates

While these have come on in leaps and bounds, they are nowhere near human capability, which is 99-100% accuracy. The latest figures, from 2017, showed that AI had a 5.1% error rate when recognising speech. Whilst this may be tolerable for some purposes, when it comes to transcription it’s just not good enough.
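For anyone wondering how these percentages are worked out, speech recognition accuracy is usually measured as a word error rate: add up the substituted, missing and inserted words in the transcript and divide by the total number of words actually spoken. On that basis, a 5.1% error rate on a 1,000-word recording means roughly 51 wrong, missing or extra words to track down.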

Real time

The bots behind AI seemingly have a real advantage in that they can produce transcripts in real time. However, once you look at the margin of error, that advantage dissolves: real-time bots operate at around a 12% error rate. This means a client ends up with a transcript that’s only 88% accurate – they’ll need to read the entire thing carefully and fix every error they come across.
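To put that 12% into perspective: at a typical speaking rate of around 150 words per minute, an hour-long recording runs to roughly 9,000 words, so a 12% error rate leaves over 1,000 mistakes for the client to find and correct themselves.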

Multiple speaker issues

The simple fact is that AI technology struggles to recognise multiple speakers. It just cannot decipher all the different voices or work out who is saying what and when. Coupled with the likelihood of overlapping speech and interruptions, the complications of transcribing become a minefield for the bots.

Deciphering accents

Our previous blog Deciphering an English accent covered how accents can be difficult enough for the human ear – when it comes to AI, it’s almost impossible! AI is often trained only on American English and traditional British English accents, so any other accent can be a real problem for voice recognition technology. It’s one of the reasons why Apple introduced the option to train Siri over time to recognise your own voice, in an attempt to understand you better.

Background noise

Background noise, any music or even traffic during a recording will significantly affect the accuracy of a transcript. When using AI for transcription you need a good quality, very clean recording just to push accuracy anywhere near 88%. Without one, the capabilities of AI are very limited.

The bottom line is that even with 88% accuracy, a client would need to read over a transcript and make corrections to make sense of it all before it could be used as a proper document. At FSTL this is all taken care of before the document gets anywhere near a client – clients receive a totally accurate transcript that properly reflects what was said. When a person transcribes a recording with accented speech, background noise or multiple speakers it might take a little longer. With AI you’ll most likely end up with total nonsense!
