Technology for the deaf to go mainstream
By Peter Abrahams
Editor: Voice recognition (VR) technology has made significant progress in the past few years – so much so that it is becoming a mainstream captioning solution. IBM’s ViaVoice has long been one of the premier VR solutions, and it looks like ViaScribe may be their next commercial VR technology.
Peter Abrahams has long been involved with accessibility technology and has written over 200 articles on a variety of topics. He is currently a Practice Leader with Bloor Research. For more information on Peter please point your browser to: http://www.it-analysis.com/about/author.php?id=47
Our thanks to Peter and Bloor Research for permission to share this article.
Technology that was initially designed to help deaf university students has a potentially much wider use.
How does a university student who is profoundly deaf or just hard of hearing cope with lectures where the spoken word is the main, if not the only, communication mechanism? Until recently, only with the help of some sub-optimal solutions:
– Reading the lips of the lecturer is not really an option as the student needs to be very close and the lecturer has always got to be facing them.
– Having an interpreter attend all the lectures and either sign or lip-speak is expensive, is difficult to organise as there are a limited number of interpreters, and it makes the student very dependent on a third party. This is only practical for major lectures or conferences where many deaf people may be attending.
– Using a stenographer to type the lecture in real time has the same limitations as an interpreter, but does have the added benefit of providing a permanent written transcript that can be used by the deaf student, as well as others, later.
– Having the lecturer provide notes before the lecture is not natural for the lecturer nor will it reflect the dynamic nature of a live lecture.
None of these solutions are practical for an average hearing-impaired student. In 1999 some universities and IBM set up the Liberated Learning Consortium to see how technology could improve the situation. IBM had a speech recognition product called ViaVoice and the consortium was set up to see if it could be used in the lecture environment.
Some initial trials showed that the technology had potential but there were some major issues the first being that the lecturer was not adding in any punctuation ViaVoice was designed for dictation and the person dictating would add in commands like comma full stop new paragraph so you landed up with text that looks like this paragraph which is very difficult to read in real time even worse it would pick up commands such as save and close and close down the application in the middle of the lecture.
So keeping the same speech recognition engine…
Some modifications were made…
Firstly to stop the system recognising and acting on commands…
Secondly to recognise that lecturers speak in small chunks with pauses in between…
And laying out these chunks on separate lines with ellipses at the end…
Made the text much easier to read…
Various options were tried and tests are still going on but this format seems to work well…
As we hope you can see from this example.
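The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ViaScribe's actual implementation: the (word, start, end) timing tuples, the half-second pause threshold and the function name chunk_by_pauses are all hypothetical. Note that every recognised word is treated purely as text, so spoken words such as "save" or "close" are never acted on as commands.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: (word, start_seconds, end_seconds) per recognised word.
TimedWord = Tuple[str, float, float]


def chunk_by_pauses(words: List[TimedWord], pause: float = 0.5) -> List[str]:
    """Group words into chunks, starting a new chunk whenever the silence
    between one word and the next is at least `pause` seconds."""
    chunks: List[List[str]] = []
    previous_end: Optional[float] = None
    for word, start, end in words:
        # The first word, or a long enough gap, opens a new chunk.
        if previous_end is None or start - previous_end >= pause:
            chunks.append([])
        chunks[-1].append(word)
        previous_end = end

    # Lay out each chunk on its own line: capitalised, ending in an ellipsis.
    lines = []
    for chunk in chunks:
        text = " ".join(chunk)
        lines.append(text[0].upper() + text[1:] + "…")
    return lines


if __name__ == "__main__":
    sample = [
        ("so", 0.0, 0.2), ("keeping", 0.25, 0.6), ("the", 0.65, 0.75),
        ("same", 0.8, 1.0), ("engine", 1.05, 1.4),
        ("some", 2.1, 2.3), ("modifications", 2.35, 3.0),
        ("were", 3.05, 3.2), ("made", 3.25, 3.5),
    ]
    for line in chunk_by_pauses(sample):
        print(line)
```

The pause threshold is the main thing to tune: too short and the text fragments into single words, too long and it collapses back into one unreadable block.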
The technology proved itself in the real-time environment but it was not really practical because it was difficult for the lecturer to do the initial training and set up.
For the solution to really work the recognition rate has to be in the high 90s. This is difficult to achieve and can only happen if the speech recognition engine learns the lecturer’s voice. Lecturers are busy people; they would be willing to spend up to a couple of hours training the system, but given the specialist language of their subjects this was not normally enough. The solution was to record the audio of the lecture, let the engine do its best and then have an editor correct its mistakes. After a few live lectures the recognition rate dramatically improves into the 90s.
The set-up of the equipment at the beginning of the lecture was automated, removing an unnecessary burden from the lecturer.
This technology means that a transcript of the lecture can be provided in real time and displayed on a screen in the lecture hall. It is also possible to transmit it to individual students’ portable devices, such as PDAs. The deaf student is now able to follow the lecture, just like any other student, so with this technology the student is no longer disabled.
In a sense the student is now more able than an unimpaired student. How often, when you are listening to a lecture, have you wanted an instant replay of the last few sentences, either because you did not fully understand first time, or because you were momentarily distracted and did not hear everything? The display keeps the last few chunks on screen, so a student using the technology has that replay and is, in this respect, more able.
Even more important than the instant replay is the fact that at the end of the lecture there is a complete transcript available on-demand. This is not just a boon for the hearing impaired students but for all students. However good students are at note taking, and many are not, the ability to go back to the lecture and read the transcript of any section will improve the learning experience.
The technology has been developed much further by enabling the transcript to be synchronised with audio or video recordings and with any PowerPoint presentation material. Over the web and on-demand a student can then see the transcript alongside the appropriate audio, video or slide. This is a complete solution for the hearing impaired student and also provides completely new opportunities to other students such as:
– ‘Attending’ a lecture that they could not attend live.
– Distance learning.
– Foreign students often find it easier to read than listen; the combination of the audio with the transcript is a powerful learning tool.
– Students with learning difficulties can benefit from the ability to replay and to choose the media.
– All students benefit from the ability to have a search engine find relevant sections of the transcripts.
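To make the search idea concrete, here is a minimal sketch of how a transcript synchronised with a recording might be searched. It is an assumption-laden illustration rather than a description of how ViaScribe works: the Chunk record, its start field (seconds from the start of the recording) and the search_transcript function are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One transcript chunk plus its position in the recording (hypothetical)."""
    start: float  # seconds from the start of the audio or video
    text: str


def search_transcript(chunks: List[Chunk], term: str) -> List[Chunk]:
    """Return every chunk whose text contains `term`, ignoring case."""
    needle = term.lower()
    return [chunk for chunk in chunks if needle in chunk.text.lower()]


if __name__ == "__main__":
    lecture = [
        Chunk(0.0, "Today we look at photosynthesis…"),
        Chunk(12.5, "Chlorophyll absorbs light…"),
        Chunk(47.0, "So photosynthesis needs light…"),
    ]
    for hit in search_transcript(lecture, "photosynthesis"):
        # A player could seek the recording to hit.start to replay this passage.
        print(f"{hit.start:7.1f}s  {hit.text}")
```

Because each chunk carries its own start time, a search hit doubles as a bookmark into the audio, video or slides for that part of the lecture.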
The technology has been named IBM ViaScribe. Up to this point all the development has revolved around the university student and lecture environment. It has now been developed sufficiently to consider other environments where it could be beneficial. Some research has started into the school environment with some initial success and some new challenges. However, research into its use in public sector organisations and private enterprises suggests it will have enormous potential.
A study at RBC Financial Group identified the following opportunities:
– A client who is deaf, or hard of hearing, or for whom English is a second language, requests services from an RBC employee. The RBC employee, trained in the use of IBM ViaScribe, speaks naturally to the client. The software simultaneously transcribes the conversation, making it available as a text display, and creating a text copy for the client. The synchronous audio and text transcript can also be used as a training tool to study employee/customer interactions.
– Multi-media lecture/presentation notes for in-house training and accreditation sessions can be made available via the web. Real time captioning of the presentation materials is also possible. Using IBM ViaScribe in this way creates access for deaf/hard of hearing participants and also creates an additional learning channel for non-English speakers.
– Real time captioning of calls can eliminate comprehension problems commonly associated with poor audio quality during teleconferences. Record keeping is also enhanced as the transcript verifies the content for participants in real time, can be used to generate meeting minutes, and allows full access to the content for those who are either late or unable to attend.
– Existing webcasts could be captioned, complete with speaker identification.
ViaScribe is not yet available as a commercial product but any organisation that believes they could benefit from this technology should contact the Liberated Learning Consortium.
It is wonderful to see research into accessibility creating a tool that will be valued by all members of society. I believe that the use of this type of technology will become commonplace over the next five years.