One of the really exciting current technologies is speech recognition. Many organizations are working on this technology, and significant progress is being made. We don’t yet have anything like universal speech recognition, but the current systems do an amazingly good job when trained to an individual speaker.
For information on specific speech recognition products, please visit the Speech Recognition portion of our Resource Directory. If you’re looking for general speech recognition information, read on!
USC is conducting speech recognition research for the Navy and has recently developed a neural-net-based system that outperforms human listeners in some situations.
Automatic Speech Recognition was a topic at ALDACon99. The good folks at the Northern Virginia Resource Center for Deaf and Hard of Hearing Persons did an awesome job of recording the information, and were kind enough to allow us to publish this article.
November 2001 – This is really exciting news! The Wisconsin Relay Service is testing voice recognition for the relay. I think voice recognition has huge potential for these types of applications, so I’m anxious to hear how this works.
April 2003 – Many of our technology advances come from the military. Do they have a better voice recognition system? Read about how they’re using voice recognition in Iraq.
May 2003 – Speech Recognition technology may be improving even faster than we thought. A company called SpeechWorks has commercial programs in use today that are doing amazing things. Here’s the report!
Voice Recognition Technology in Iraq
Editor: I saw an interesting article about a handheld device called a Phraselator being used by military personnel in Iraq to convert English phrases to Arabic or Kurdish. The device uses voice recognition technology and seems to require no training of individual voices. My guess is that this works because the device only recognizes and translates pre-identified phrases. Even so, it has the capability to translate hundreds or even thousands of phrases.
Is there an application here for people with hearing loss? It would be easy to output English text instead of Arabic or Kurdish speech. The question is whether there are applications where the ability to convert hundreds of spoken phrases to text might be important. Traffic stops? Emergency rooms? Other places where limited communication would be beneficial until more appropriate arrangements could be made?
About 200 American and British military personnel in Iraq are communicating with the locals using a hand-held device into which soldiers speak English phrases to have them sounded out in either Arabic or Kurdish.
“It really helps calm a population when they can hear commands, questions, or information in their own language. … Unfortunately, interpreters are in short supply,” said Sheri Cranford, assistant to the vice president of VoxTec in Annapolis, the company that has developed the new communications system.
The device, called a Phraselator, is designed to help compensate for a shortage of linguists, and it already has proven its worth, Mrs. Cranford said.
“It’s been used to locate caches of weapons and to identify places where troops are hiding,” she said in a telephone interview. VoxTec is a division of Marine Acoustics, a Rhode Island-based military contractor. “We received funding to develop this device from the Defense Advanced Research Projects Agency right after 9/11, but it was being worked on well before then,” Mrs. Cranford said. She said the device has been used by Americans in Afghanistan for about a year to reach residents there in four different languages.
The Phraselator uses speech-recognition technology called Dynaspeak, developed by SRI International. This technology recognizes phrases phonetically and then emits the equivalent pre-recorded phrase in Arabic, Kurdish or another foreign language.
For those who fear there might not be an appropriate Arabic match for their English statement, Mrs. Cranford said, “we load 500 to 1,000 phrases [into the machine]. A 64-megabyte flash card can hold 30,000 phrases.”
Kevin S. Hendzel, spokesman for the American Translators Association, said it’s important to recognize that the Phraselator is “not a translation device … but a phrase matcher.”
“It’s not perfect. … Voice-recognition technology has its limits,” he said.
Mrs. Cranford said VoxTec will be coming out with a “commercial version” of the Phraselator later this year. Both emergency relief organizations and law-enforcement agencies are expressing interest, she said.
“Anyone who has to deal with a large number of non-English-speaking people will find it useful,” she said.
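Editor: To make the "phrase matcher" idea concrete, here is a minimal sketch in Python of how a fixed-phrase lookup might work, including the English-text output suggested above for people with hearing loss. It is only an illustration under stated assumptions, not how the Phraselator or Dynaspeak is actually built: the real device matches audio phonetically, while this sketch compares already-transcribed text using Python's standard difflib module. The phrase table, the clip file names, and the 0.8 similarity cutoff are all invented for the example.

```python
import difflib

# Hypothetical phrase table: stored English phrase -> name of a
# pre-recorded audio clip. These entries are invented for illustration.
PHRASES = {
    "do you need medical help": "medical_help.wav",
    "please step out of the car": "step_out.wav",
    "where does it hurt": "where_hurt.wav",
}


def normalize(text):
    """Lowercase the text and drop punctuation and extra spaces."""
    kept = [c for c in text.lower() if c.isalnum() or c.isspace()]
    return " ".join("".join(kept).split())


def match_phrase(heard, cutoff=0.8):
    """Return (stored phrase, clip name) for the closest match, or None.

    Text similarity stands in here for phonetic matching of audio.
    The cutoff is an assumed value; anything less similar is rejected
    rather than guessed at.
    """
    candidates = difflib.get_close_matches(
        normalize(heard), list(PHRASES), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    phrase = candidates[0]
    return phrase, PHRASES[phrase]


if __name__ == "__main__":
    for heard in ("Where does it hurt?", "What time is the football game"):
        result = match_phrase(heard)
        if result is None:
            print(f"{heard!r}: no stored phrase is close enough")
        else:
            phrase, clip = result
            # A device for hearing loss could show `phrase` as text
            # instead of playing `clip`.
            print(f"{heard!r}: matched {phrase!r}, clip {clip}")
```

The sketch also shows the limit Mr. Hendzel describes: anything outside the stored list is simply rejected, because nothing is being translated.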
Speech Recognition Status Report
Editor: I’ve been singing the praises of Speech Recognition for a couple of years, ever since I used it in place of real-time captioning to teach a computer class for people with hearing loss. That particular application was trained to my voice, and I occasionally had to repeat myself to get the software to “understand” what I was saying. But it was very workable. The newer version that I got a few months ago is noticeably better. And I expect the next version will be significantly better still.
At the other end of the Speech Recognition spectrum from the simple program I use are the ones that interact with thousands of customers without training on their specific voices. Those programs currently only work in situations where the number of possible responses is very limited, but it may not be too long before they are able to work in more general circumstances.
Michael Phillips is the Chief Technology Officer of SpeechWorks, a company that produces these commercial systems. He was recently interviewed by MIT’s “Technology Review” (TR). Here are excerpts from that interview.
Michael Phillips is Chief Technology Officer for SpeechWorks. He spoke with Technology Review Senior Editor Wade Roush about his company’s interactive voice-response technology, which automates the handling of customer calls at companies like United Airlines and Federal Express. With a father’s pride, Phillips introduced Tom, a jaunty voice with an American accent who is one of SpeechWorks’ synthesized-speech “personas.” (Tom’s colleagues Helen and Karen sound like real women from Britain and Australia, respectively, and personas speaking in many other languages and accents are available.)
Phillips, who co-founded SpeechWorks in 1994 to commercialize language-processing software he had helped to build at MIT’s Laboratory for Computer Science, talked about the company’s plans for making such speech-driven interfaces the dominant way we interact with computers.
TR: Is the technology really getting that good that fast?
PHILLIPS: The technology is improving rapidly. We’re sort of in the Moore’s Law of speech recognition. We cut error rates in the speech recognizer by 20 or 30 percent every year.
TR: What would you say the rough error rate was when you started in 1994, and what is it now?
PHILLIPS: It depends on the task. And as the speech recognizer gets better and better, we do more and larger and more complex tasks. So a better measure than just what is the accuracy on a fixed task, is what kinds of tasks you can get acceptable accuracy on. When we first started, it was basically a few hundred-word vocabulary. Things like getting a phone number from a user, or even getting a city name were possible, but stretching it. Since then we’ve deployed stock trading and stock quote systems that have 50,000- to 100,000-word vocabularies. Most of the applications we have exploited are not constrained by the quality of the speech recognition so much as by the user interface. We are doing very sophisticated things like entering any street address in the country, entering any name you have, even something like getting an e-mail address from somebody over the telephone.
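Editor: Phillips gives no absolute error rates, but his figure of a 20 to 30 percent cut every year compounds quickly. The short Python sketch below works out how much of the starting error rate would remain on a fixed task. It rests on two assumptions that are mine, not his: that the same yearly cut applied every year since the company's founding in 1994, and that the interview took place in 2003. As Phillips points out, the company actually moved on to harder tasks rather than holding the task fixed, so treat the result as rough arithmetic only.

```python
def remaining_error_fraction(annual_cut, years):
    """Fraction of the starting error rate left after compounding cuts.

    annual_cut is the yearly relative reduction, e.g. 0.20 for 20 percent.
    """
    return (1.0 - annual_cut) ** years


# Assumed span: founding year (1994) to the assumed interview year (2003).
YEARS = 2003 - 1994

if __name__ == "__main__":
    for cut in (0.20, 0.30):
        left = remaining_error_fraction(cut, YEARS)
        print(f"{cut:.0%} cut per year for {YEARS} years "
              f"leaves {left:.1%} of the original errors")
```

Under those assumptions, nine years of 20 percent cuts would leave about 13 percent of the original errors, and nine years of 30 percent cuts would leave about 4 percent.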