Microsoft's weak spot that it is unwilling to mention: Windows 10 speech recognition

Windows has a feature that Microsoft doesn't want to mention. Windows lets users write with a stylus through digital ink, log in with their face through Windows Hello (or use it to secure web sign-ins), and even tell Cortana to set reminders. Yet there is one feature Microsoft apparently does not want users to use: its speech recognition engine, which can control the system by voice or let the user dictate and edit documents.

The reason Microsoft does not vigorously promote Windows speech recognition can be traced back 10 years, to when Microsoft product manager Shanen Boettcher botched a demonstration of the Windows Vista voice input feature. Since then, voice input in Windows has kept a decidedly low profile. Today, almost no users know that Windows has a voice input feature at all.

If Windows is ever going to compete in voice input, now seems an opportune moment: advances in computing and artificial intelligence provide a much better foundation for it.

Asked about the future of voice input in Office, Harry Shum, the Microsoft executive vice president who oversees speech recognition research, Cortana, and Bing, said: "This is a major issue. It is incomprehensible that voice input does not play a more important role."

Reasons for imperfect speech recognition

Some users still think voice input is stuck at the level of the Apple Newton PDA as lampooned in the Doonesbury comic strip, where "I am writing a test sentence" was recognized as "Siam fighting atomic sentry". Users can be forgiven for thinking so: Windows Speech Recognition still uses Microsoft Speech Recognizer 8.0, which has remained largely unchanged since Vista. Shum called it "grandfather" technology.

PCWorld noted, however, that the hardware has changed a great deal: listening to and interpreting speech places far lower demands on a PC's processing power than it did 10 years ago. The quality of the integrated microphone arrays in PCs such as the Surface Book means that high accuracy can be achieved without a dedicated microphone. But is voice input technology ready for the general public?

When dictating a 1,028-word article, 95% accuracy means the user must correct more than 50 errors. In the test, Windows voice input achieved an accuracy of 93.6%. On paper that is not high, and it is lower than the dedicated dictation software tested. Windows also has a strange habit of inserting the word "comma" into a document when the user dictates a comma. The voice input community seems divided on whether such relatively small mistakes matter much.
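The relationship between accuracy and the number of corrections is simple arithmetic, and a short sketch makes it concrete. It assumes accuracy is measured per word and that every misrecognized word needs exactly one correction; the function name `expected_errors` is only illustrative and does not come from PCWorld's test.

```python
def expected_errors(word_count: int, accuracy: float) -> float:
    """Expected number of misrecognized words at a given per-word accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return word_count * (1.0 - accuracy)


# Figures quoted in the article: a 1,028-word text, a 95% benchmark,
# and the 93.6% measured for Windows voice input.
for label, accuracy in [("95% benchmark", 0.95), ("Windows, as tested", 0.936)]:
    print(f"{label}: about {expected_errors(1028, accuracy):.0f} errors")
```

Under these assumptions, the 1.4-point gap between 95% and 93.6% works out to roughly 14 additional corrections in a text of this length.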

Of course, this is not the whole story. Anyone who has used voice input software knows that the key to accuracy is training. Over time, the software learns the user's accent, such as whether the "a" in "apricot" is pronounced like the "a" in "bad" or in "ape", and learns to filter out unconscious verbal tics. Microsoft employees have claimed that with proper training, Windows speech recognition can reach 99% accuracy. Ten errors in 1,000 words is not bad at all.

Few users are willing to spend time training speech recognition software. Windows speech recognition asks users to read a few sentences aloud for about 10 minutes, which can make them feel like small children. Cortana and Siri require no training from the user because they have already been trained on millions of voice samples.

Cortana (which can be used on PCs and mobile phones) outperforms the Windows voice input system at speech recognition because it takes advantage of the computing power of Microsoft's cloud services. Microsoft analyzes the user's voice, correlates it with other data, and turns the result into the intelligence at the heart of Cortana.

Microsoft values speech recognition

Given Cortana's strong performance, users might expect voice to have been front and center at last week's Microsoft Ignite conference. However, Ignite had no sessions on voice input and only one session on speech recognition. Even so, in his keynote speech, Microsoft CEO Satya Nadella called speech recognition a key element of Microsoft's future.

Take Skype Translator as an example. According to Nadella, Skype Translator relies on three areas of research: speech recognition, speech synthesis, and machine translation. In the speech, Nadella said that Microsoft's speech recognition algorithm has a word error rate of 6.9%, which is not a great result: it corresponds to an accuracy of only 93.1%.

PCWorld argued that if Microsoft is serious about productivity software, the future of speech recognition on the PC is not just using Skype to book a hotel in Bangladesh, but also writing up the trip afterward, by voice rather than by fingers.
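The article does not say how the 6.9% figure was measured. Assuming it is a word error rate in the conventional sense (substitutions, deletions, and insertions divided by the number of words actually spoken), the sketch below shows how such a rate is computed and how it maps to an accuracy figure. The function `word_error_rate` is an illustration written for this article, not Microsoft's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# The garbled Newton sentence from the Doonesbury example:
# no word survives, so every reference word counts as an error.
wer = word_error_rate("I am writing a test sentence",
                      "Siam fighting atomic sentry")
print(f"word error rate: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

On this definition, accuracy is simply one minus the word error rate, which is how a 6.9% error rate becomes 93.1% accuracy.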
