Where humans make a difference in transcription

The bits & pieces machines just can’t do!

As technology develops, we hear more and more about AI – Artificial Intelligence.

It all sounds very futuristic… is the world portrayed in the TV series “Humans” edging ever nearer to becoming a reality?

Technology giants such as Microsoft are constantly working on developing their AI technology so that their speech recognition software can replace the need for human beings when it comes to transcription. In fact, back in October 2016 Microsoft hit the headlines with claims that their speech recognition AI was now better than the human professionals…

That’s a bold claim! A year later, has their AI taken over…?!

No!

And in our experience there’s very good reason for this – there have indeed been vast improvements, from the 1950s, when early computers could recognise as many as 10 words spoken very clearly by a single speaker, to the machines of the 1980s, which could transcribe simple speech with a vocabulary of 1,000 words. But it quite simply isn’t there yet – the software is by no means perfect, even in 2017.

Machines have now proven able to transcribe speech such as broadcast news, but that speech is clearly pronounced and highly structured – very different from how we talk to each other every day. It’s everyday conversational speech that has turned out to be the nemesis of the machines.

Humans understand the nuances of conversation, and can interpret context infinitely better than machines. They understand the noises that people make in conversational speech that aren’t actual vocabulary. A good example is when people use sounds for hesitation in their speech. An “uh” sound can mean one of two things: it may be the listener effectively telling the speaker to continue talking (“uh huh”), or it may be a hesitation – a pause for thought while the speaker works out what to say next. Microsoft admit that this is where machines fail, as they quite simply cannot distinguish between the two – it’s the most common error their speech recognition software makes, which is problematic when you need the transcript to give an accurate account of a conversation.

Another issue is the ability of machines to distinguish correctly between words that sound very similar, using the context of the conversation to choose the right word. Words like “omission/emission” or “practise/practice” come to mind – here’s where human understanding and experience are key to picking up those nuances and transcribing correctly. It’s easy for a human to know whether someone has said “I scream” or “ice cream”, but less so for a machine!

It’s also the skill of an experienced human transcriber to convey the speaker’s personality and the emotion behind what’s being said. Using commas to show pauses or italics for emphasis enables the transcript to bring these emotions out, making it a much more powerful document. Even knowing when to include the “of courses” and “ums”, and when not to, means creating a document that meets the exact needs of the client, every time.

When clients want this degree of accuracy – and most of our clients do – it becomes clear that human transcription is the only way to go. AI is edging ever nearer, and perhaps in the future it will learn to make these distinctions much more effectively, but for now – for total peace of mind when it comes to accuracy – it’s definitely not there yet.
