This week, Cambridge-based AI speech recognition provider Speechmatics launched its ‘Autonomous Speech Recognition’ software. The company’s technology was found to outperform Amazon and Google in overall accuracy for African American voices (82.8% versus Google’s 68.7% and Amazon’s 68.6%), based on datasets used in Stanford’s ‘Racial Disparities in Speech Recognition’ study. This equates to a 45% reduction in speech recognition errors, the equivalent of three words in an average sentence, and Speechmatics’ new software aims to deliver similar improvements in accuracy across accents, dialects, age, and other sociodemographic characteristics.
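For those checking the arithmetic, here is a minimal sketch of how the 45% figure follows from the accuracy numbers, assuming ‘errors’ means one minus accuracy (a word error rate proxy) and the reduction is measured relative to Google’s result:

```python
# Relative error reduction implied by the accuracy figures above.
# Assumption: "errors" = 1 - accuracy, and the 45% is measured
# relative to Google's error rate.
speechmatics_error = 1 - 0.828   # 17.2%
google_error = 1 - 0.687         # 31.3%

reduction = (google_error - speechmatics_error) / google_error
print(f"Relative error reduction: {reduction:.0%}")  # ~45%
```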
Up to now, speech recognition has commonly been held back by the limited amount of labelled data available to train on. But in this Q&A, Speechmatics CMO David Keene explained to Information Age the value that the technology can bring, and the importance of diversity and inclusion in tech.
Why is D&I important in reducing AI bias and improving accuracy?
The innovation and adoption of AI technologies are gathering speed at an unprecedented pace. From government AI strategies to the NATO announcement today, this tech is going to be front and centre on the agenda for years to come. For AI technology to be truly useful to the world at large, however, it has to be globally representative. We cannot and must not build AI systems for an elite set of users. That is not only unethical; it also doesn’t make commercial sense.
Our machine learning breakthrough has taken a big step towards understanding every voice, allowing us to ‘plug in’ to the internet and train on millions of hours of publicly available data rather than smaller, biased labelled datasets. The next step in this journey is to work out how we can understand the digitally excluded: those voices that are not commonplace on the internet, in audio books, on podcasts or on social media networks.
How can businesses, especially in tech, improve levels of diversity and create more inclusive environments?
In an ideal world, your tech team would mirror the market you are selling to, and we have to do better as a community, going beyond the cookie-cutter hiring process to find those people. That will take years and years to achieve, though, and there are things we can do in the meantime. Inclusion is a mindset and needs to be ingrained into the culture of the business and mapped to the bottom line.
Strength and innovation don’t come from homogeneity. It is fascinating to see how much tech skews to the make-up of the team developing it. Male-heavy developer teams will build tech that works better for men. Tech teams based in Michigan will better understand voices from Michigan (I am looking at you, Bing). We need to recognise that we naturally ‘build for our own’ and make a conscious decision to test innovations with a much broader group of people.
Can you go into more detail about how speech recognition tech will help?
Speech recognition technology is in the fabric of so much of what we do these days, from e-learning to voice assistants, courtroom transcriptions to driverless cars. Research varies, but we are looking at a $30+ billion market within the next few years, which is hugely exciting.
That growth is running alongside a macro move towards productivity, which requires us to take low-value tasks out of the supply chain and is driving automation and robotics. All of this only works positively for wider society if these speech recognition systems understand all voices.
Take McDonald’s as an example: if they want to put in a speech recognition system to take orders at their drive-throughs, that system HAS to understand all of its customers. For that to happen, the system needs to be trained to understand all voices, which means going way beyond the biased labelled datasets that are often limited in terms of representation.
What’s the difference between automatic and autonomous speech recognition?
‘Automatic’ is when the machine is fed specific, usually biased “human-labelled” information to learn from. Autonomous means you plug it in and it learns ‘unsupervised’ from all available data on the internet. In AI we call this ‘learning from first principles’ rather than being rules-led. This is the general move that AI innovators are now trying to make. A similar comparison is IBM’s Deep Blue versus Google’s AlphaZero. Deep Blue was trained on human data from chess games played by people: specific, ‘biased’ data that assumes humans know how to play chess. It was trained to beat a human. AlphaZero was trained from first principles, to play a superhuman game of chess. We now have the technology breakthrough to do this for something more complex than a game with rules, and that is, of course, speech recognition.
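To make that distinction concrete, here is a deliberately simplified sketch of the two learning paradigms; the toy data, masking scheme and linear models are hypothetical stand-ins for illustration, not Speechmatics’ actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for audio: 1,000 clips, each reduced to 8 features.
features = rng.normal(size=(1000, 8))

# --- 'Automatic' (supervised) ---
# The machine learns from human-labelled pairs. Here the 'labels'
# are synthetic; real labels would carry whatever biases the
# human-labelled dataset contains.
true_weights = rng.normal(size=8)
labels = features @ true_weights + 0.1 * rng.normal(size=1000)
supervised_model = np.linalg.lstsq(features, labels, rcond=None)[0]

# --- 'Autonomous' (self-supervised) ---
# No labels at all: mask part of the input and learn to predict it
# from the rest, analogous to masked prediction on raw, unlabelled
# audio gathered at internet scale.
masked_column = 0
context = np.delete(features, masked_column, axis=1)
target = features[:, masked_column]
selfsup_model = np.linalg.lstsq(context, target, rcond=None)[0]
```

The point of the contrast is that the supervised model can only ever reflect its labels, while the self-supervised one extracts structure from whatever data it is plugged into.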