Microsoft’s artificial intelligence (AI) system can now understand conversational human speech better than a trained transcriptionist, and is less prone to error.
The AI’s error rate in understanding speech moved down to about 5.9% from 6.3%, which puts it slightly below the human error rate.
Error rate refers to the proportion of words that a human, or machine, mishears.
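As a rough illustration of how such a figure can be computed, the sketch below implements word error rate (WER), the metric conventionally used in speech recognition: the minimum number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The article does not spell out Microsoft's exact scoring procedure, so the function name `word_error_rate` and the plain whitespace tokenisation are assumptions made for this example only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / reference length.

    Assumption: words are split on whitespace with no text normalisation;
    real evaluations usually normalise case, punctuation and fillers first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing = i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref)
```

For example, scoring the transcript "the cat sat on a mat" against the reference "the cat sat on the mat" gives one substituted word out of six, a rate of about 16.7%. On this measure, an error rate of 5.9% corresponds to roughly six word errors per hundred words spoken.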
“We [improved] on our recently reported conversational speech recognition system by about 0.4%, and now exceed human performance by a small margin,” the report stated.
This news comes only a month after Microsoft achieved the 6.3% error rate. The system is improving fast.
The research found, however, that the error rates of human transcribers can vary between 4.1% and 9.6%, depending on how well they concentrate on the transcription.
The near-perfect accuracy is, regardless, something of a breakthrough and should have a significant impact on Microsoft’s AI tools, including its virtual personal assistant Cortana.
It remains unclear exactly what form real-world applications will take, since background noise and multiple speakers are significant issues outside the lab.
Perhaps it will simply mean fewer speech recognition errors when interacting with smartphones or, in the future, autonomous cars.
As reported by The Verge citing a statement from the company, Microsoft’s chief speech scientist Xuedong Huang said that they had “reached human parity,” and called the improvement in speech recognition “an historic achievement.”
This human parity was achieved by optimising “convolutional and recurrent neural networks”, using 2,000 hours of recorded voice data.
Microsoft’s AI voice recognition announcement reflects the focus the company has placed on the technology.
Indeed, last month Microsoft CEO Satya Nadella laid out the organisation’s four-pillar plan for democratizing AI, and said that its cloud platform Azure is becoming the first AI supercomputer.