Speech Recognition: An Accommodation Planning Perspective

Background

Speech recognition has almost come of age. Although not yet a mature technology, voice input has great potential as an alternative to using a keyboard and mouse, both for people with and without disabilities. Yet many individuals who switch to speech input do not become proficient users. Instead, they use the software in limited or inefficient ways, or "give up" on it altogether.

I believe that most failed attempts are due to (1) inadequate accommodation planning, and (2) unrealistic expectations about the capabilities of the technology. Despite its uneven record as an accommodation, speech recognition remains an attractive option for individuals with certain learning disabilities, upper-body mobility impairments, and increasingly, visual impairments. That many people are strongly motivated to learn to use speech input products makes the choice even more appealing. Given the potential of the technology and the widespread interest in it, in this paper I describe ways to enhance the effectiveness of speech recognition as a workplace and educational accommodation.

The first section focuses on the accommodation process itself. I detail an accommodation planning model that attempts to circumvent implementation problems by identifying critical stages during which things can go awry. The second section examines common misconceptions about the nature of voice input technology, how it works, and how to best use it. Addressing these assumptions can help lessen disappointment when the technology proves to be more complicated than expected, or does not perform as imagined.

Part 1: Accommodating Speech Recognition

To recognize the obstacles that can derail the accommodation process, it is first necessary to understand the process itself.

Accommodation is the process of tailoring work (or education) to meet the needs of an individual. It is an ongoing process of identifying and removing — or minimizing — the adverse effects of barriers in the environment and in the method of performing tasks. These barriers prevent an otherwise qualified person with a disability from achieving expected job (or educational) outcomes.

The oft-heard suggestion that accommodation involves simple measures — for example, installing an assistive technology, elevating a desk surface, or extending a deadline — is misleading. Accommodation is usually more complex. Anything overlooked during planning or implementation can jeopardize success. For example, failure to account for the individual's work station set-up and working postures may result in difficulties seeing the monitor, sitting comfortably, or breathing diaphragmatically — factors that may prevent an individual from using speech recognition to advantage. Failure to provide sufficient training can also thwart the accommodation process.

I have previously described ADAPTABLE, a seven-stage accommodation planning model (Cantor 1996, 1998). To highlight the concerns that should be considered when planning an accommodation that includes speech recognition, I present a modified model with eleven stages:

1. Assessment

Assess the individual's needs. The goal of the assessment is to consider a wide range of accommodation options. Few people can be fully accommodated with one or two items; some need ten, twenty or more adaptations. Generate specific accommodation ideas from these broad categories: building modifications; environmental adjustments; spatial reorganization; work station modifications; computer-based assistive technologies; software customization; electromechanical devices; low-tech devices; alternative formats to print; transportation services; personal support services; human resource strategies; employment/educational policy changes; and training and retraining.

Key assessment questions include: Is voice input really necessary? Would, for example, macro software, abbreviation expansion software, MouseKeys, or training in keyboard-only techniques be more effective than speech recognition, or a worthwhile adjunct to it? Is the technology appropriate to the individual's age and level of development, and realistic for the tasks that he or she hopes to accomplish?

Individuals whose primary means of access is the voice should be evaluated by a speech-language pathologist who is knowledgeable about speech recognition, and should receive training on caring for the voice and safeguarding against vocal injuries. [Endnote 1.]

2. Software Requirements

Decide on a voice input product. Choosing can be difficult because the current crop of speech recognition products all work reasonably well. Mainstream product reviews may not be helpful because many reviewers evaluate products on the basis of accuracy, speed, and compatibility, factors that may not be crucial to the success of an accommodation. A product feature that reviewers often overlook is the set of commands for revising documents by voice. Revising involves substituting words, rearranging sentences and paragraphs, eliminating verbiage, and clarifying meaning. For people who express themselves through written language, the ability to revise by voice is arguably more important than speed and compatibility. [Endnote 2.]

3. Hardware Requirements

Choose the right hardware. Speech recognition applications are resource-hungry. For individuals who use voice input as a primary access technique, doubling or quadrupling the minimum RAM requirement is advisable. For example, if 64 MB of RAM is recommended, install 128 to 256 MB. The quality of the soundcard and the microphone is also of paramount importance. Manufacturers of voice input products publish lists of approved soundcards and microphones. Note, however, that microphone quality may vary from unit to unit. Running the "Audio Setup Wizard" is an excellent way to gauge the overall quality of the soundcard/microphone combination.
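The RAM rule of thumb above is simple arithmetic, but it is easy to misapply when a vendor quotes several figures. The following minimal sketch takes the vendor's stated minimum (in MB) and returns the doubled-to-quadrupled range; the function name is hypothetical, and the multipliers are simply the 2x and 4x from the text.

```python
def recommended_ram_mb(minimum_mb: int) -> tuple[int, int]:
    """Return the (low, high) RAM range for a primary voice-input user.

    Hypothetical helper: applies the rule of thumb of doubling to
    quadrupling the vendor's stated minimum RAM requirement.
    """
    if minimum_mb <= 0:
        raise ValueError("minimum_mb must be positive")
    return 2 * minimum_mb, 4 * minimum_mb


if __name__ == "__main__":
    low, high = recommended_ram_mb(64)
    print(f"Install {low} to {high} MB")  # Install 128 to 256 MB
```

In practice the result would be rounded to whatever memory module sizes the PC accepts; the sketch leaves that step out.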

4. Work Station Set-up

Reorganize or replace work station components to ensure that the user can work safely and comfortably while operating a PC by voice. For example, position the monitor so that the user is not forced to crane the neck or bend forward to read the screen.

5. PC Set-up

Optimize the PC for speech recognition. Uninstall superfluous software; free system resources (e.g., in Windows 98, use MSCONFIG to disable non-essential start-up group items); and adjust operating system settings to make the system easier to use (e.g., in the "Display" applet, enlarge illegible system-wide fonts).

6. Enrollment

Create and initiate user voice files. Run the "Audio Setup Wizard," coach on proper microphone placement and dictation techniques, and conduct initial training. Once enrollment is complete, the user can begin experimenting with dictation.

7. Vocabulary builder

After enrollment, run the "Vocabulary Builder" to tweak the voice files to better match the user's vocabulary and writing style. First, create vocabulary files containing lists of the user's unique words, names and expressions. (The vocabulary files can be based on the user's word processor custom dictionary files.) Next, assemble files containing samples of the user's writing. Run the "Vocabulary Builder" on both sets of files.
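Preparing the vocabulary files is largely clerical work: gathering the user's words from one or more custom dictionaries and removing duplicates. The sketch below assumes that each custom dictionary is plain text with one word or expression per line. That is an assumption; check the format and character encoding your word processor actually uses. The function names are hypothetical, and the output is simply a text file to hand to the "Vocabulary Builder."

```python
from pathlib import Path
from typing import Iterable


def merge_custom_words(dictionaries: Iterable[Iterable[str]]) -> list[str]:
    """Combine several word lists into one de-duplicated vocabulary list.

    Each input yields one entry per line (assumed plain-text custom
    dictionary format). Matching is case-sensitive so that proper names
    such as "Cantor" are kept exactly as the user spells them. The order
    of first appearance is preserved.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for lines in dictionaries:
        for line in lines:
            entry = line.strip()
            if entry and entry not in seen:  # skip blank lines and repeats
                seen.add(entry)
                merged.append(entry)
    return merged


def write_vocabulary_file(words: list[str], path: Path) -> None:
    """Write one entry per line, ready to be fed to the vocabulary tool."""
    path.write_text("\n".join(words) + "\n", encoding="utf-8")


if __name__ == "__main__":
    work_dic = ["Cantor\n", "ADAPTABLE\n", "NaturallySpeaking\n"]
    home_dic = ["ADAPTABLE\n", "Chalfen\n", "\n"]
    print(merge_custom_words([work_dic, home_dic]))
    # ['Cantor', 'ADAPTABLE', 'NaturallySpeaking', 'Chalfen']
```

Because the merge accepts any iterable of lines, an open file object can be passed directly in place of the in-memory lists.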

8. Initial use

During the first few hours or days of using a speech-enabled PC, practise dictating text and correcting misrecognitions. Resist the temptation to produce perfectly formatted, error-free documents. Focus on learning basic dictation techniques and improving voice file accuracy. Staying relaxed, cultivating healthy work habits [see Endnote 1], speaking clearly and fixing misrecognitions are the best ways to gain precision, competence, and comfort with speech input.

At this stage, it may be prudent to teach dictation techniques independently of other computer skills such as word processing and web browsing.

9. Intermediate use

Once dictating text and correcting misrecognitions become second nature, begin dictating for "real." The voice files will already be quite accurate and will not improve significantly by correcting common words. Focus now on learning to revise documents. Revising text is usually significantly faster and easier than correcting misrecognitions.

10. Expert use

Once proficient with the program, increase speed and efficiency through experimentation and by periodically tuning the voice files: for example, by rerunning the "Audio Setup Wizard"; dictating for one or two minutes in "General Training"; removing nuisance words from the lexicon; and updating word-usage and writing-style data by running the "Vocabulary Builder" with newly created files. Backing up the voice files should become part of the user's regular routine.

11. Follow-up

After implementation, regular follow-up sessions (say, after two weeks, one month, and three months) help to ensure that the individual is using speech recognition properly. There are almost always ways to refine the accommodation to enable the individual to work faster and less stressfully. Difficulties noted during follow-up can usually be traced to skipping or not completing an earlier stage.
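The follow-up schedule and the advice to trace difficulties back to an earlier stage can be kept as a simple checklist. The sketch below is one hypothetical way to do so: it computes follow-up dates at the intervals suggested above (two weeks, one month, three months) and reports the first stage of the eleven-stage model that has not been marked complete. The helper names and the choice to clamp month-end dates are my own assumptions, not part of the model.

```python
from datetime import date, timedelta
from typing import Optional

# Stage names of the eleven-stage model, in order.
STAGES = [
    "Assessment", "Software Requirements", "Hardware Requirements",
    "Work Station Set-up", "PC Set-up", "Enrollment", "Vocabulary builder",
    "Initial use", "Intermediate use", "Expert use", "Follow-up",
]


def add_months(d: date, months: int) -> date:
    """Same day of the month `months` later, clamped to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    # First day of the following month, minus one day, gives the last day.
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    last_day = (next_first - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))


def follow_up_dates(implemented: date) -> list[date]:
    """Follow-ups two weeks, one month, and three months after implementation."""
    return [
        implemented + timedelta(weeks=2),
        add_months(implemented, 1),
        add_months(implemented, 3),
    ]


def earliest_incomplete_stage(completed: set[str]) -> Optional[str]:
    """Return the first stage, in model order, not marked complete (or None)."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None


if __name__ == "__main__":
    print(follow_up_dates(date(1999, 1, 15)))
    # [datetime.date(1999, 1, 29), datetime.date(1999, 2, 15), datetime.date(1999, 4, 15)]
    done = set(STAGES[:6]) | {"Initial use"}  # Vocabulary builder was skipped
    print(earliest_incomplete_stage(done))  # Vocabulary builder
```

Checking stages in order reflects the observation that a problem seen late in the process usually originates in an earlier stage that was skipped or left unfinished.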

Part 2: Principles of Working by Voice

Many individuals become disillusioned when faced with the reality of voice recognition. Others become competent at dictating simple texts (Stage 8 or 9), but cannot use features that would enable them to work more effectively. When accommodating an individual with a speech recognition system, it may be necessary to emphasize the following principles:

Accept the limitations of voice recognition technology. Once good accuracy is achieved, do not try to perfect the voice files. They will never be perfect. If lost, new voice files can be generated in short order using backups of the "Vocabulary Builder" files.
Recognize that written and spoken language are different. Learning to think and speak in grammatically correct sentences and paragraphs takes discipline and effort. Some people cannot do it.
Use the speech recognition program's proprietary editor if the dictation software works best in it. Copying text into one's usual word processor after a dictation session is easy.
Understand the differences between correcting misrecognitions and revising text. To use speech recognition effectively, one must be able to do both.
Do not correct misrecognized words that are usually interpreted correctly. It is faster to undo, select an entire utterance and dictate it again, or use the "Resume With..." command.
Navigate using "Insert Before..." and "Insert After..." instead of "Move" commands. Specifying two or more target words can increase precision, e.g., "Insert After achieve good accuracy," "Insert Before accept the limitations."
Revise using "Select" instead of "Backspace..." and "Delete..." commands. Select the text and dictate. It is not necessary to explicitly delete selected text; the dictated words will replace the selection. Use "Select From... Through..." to mark larger selections.
Revise phrases rather than individual words, even if some words are correct. For example, to change the first comma in this sentence to a colon, say "Select for example comma," pause for a split second, and say "for example colon."
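The "Select," "Insert Before...," and "Insert After..." principles share one idea: a command places a selection or cursor by naming visible text, and whatever is dictated next replaces that selection. The toy model below illustrates that idea only. It is hypothetical and does not reflect how NaturallySpeaking or any other product works internally; in particular, it assumes that a command matches the first case-insensitive occurrence of the phrase, and it treats spoken punctuation ("comma," "colon") as the literal characters.

```python
import re


class DictationBuffer:
    """Toy model of select-and-dictate editing (hypothetical, for illustration)."""

    def __init__(self, text: str) -> None:
        self.text = text
        # Selection bounds; start == end means a plain cursor.
        self.start = self.end = len(text)

    def select(self, phrase: str) -> None:
        """'Select <phrase>': select the first case-insensitive match."""
        m = re.search(re.escape(phrase), self.text, flags=re.IGNORECASE)
        if m is None:
            raise ValueError(f"not found: {phrase!r}")
        self.start, self.end = m.start(), m.end()

    def insert_before(self, phrase: str) -> None:
        """'Insert Before <phrase>': cursor at the start of the match."""
        self.select(phrase)
        self.end = self.start

    def insert_after(self, phrase: str) -> None:
        """'Insert After <phrase>': cursor at the end of the match."""
        self.select(phrase)
        self.start = self.end

    def dictate(self, words: str) -> None:
        """Dictated words replace the selection; no explicit delete is needed."""
        self.text = self.text[: self.start] + words + self.text[self.end :]
        self.start = self.end = self.start + len(words)


if __name__ == "__main__":
    buf = DictationBuffer("For example, to change the first comma")
    buf.select("for example,")   # "Select for example comma"
    buf.dictate("For example:")  # "for example colon"
    print(buf.text)  # For example: to change the first comma
```

Note that revising the whole phrase "for example," rather than the lone comma gives the recognizer more context, which mirrors the advice to revise phrases instead of individual words.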

Conclusion

Speech recognition has tremendous — but as yet unfulfilled — potential as a workplace and educational accommodation for people with disabilities. In this paper I have suggested two ways to help people realize the promise of the technology: be methodical and thorough in planning and implementing the accommodation so that individuals are well-prepared to operate a PC by voice; and clearly understand how current voice input technologies do (and do not) work.

References

Cantor, Alan. (1996). The future of workplace accommodations: containing costs and maximizing effectiveness. National Conference on Disability and Work Conference Proceedings. October 1996. Toronto.

Cantor, Alan. (1998). Disability in the workplace: effective and cost-effective accommodation planning. NATCON 1998 Conference Proceedings. Toronto.

Kambeyanda, Singer and Cronk. (1997). Potential problems associated with use of speech recognition products. Assistive Technology, Volume 9, Number 2, 95 - 102.

Endnotes

1. Good vocal hygiene includes performing "warm-up" and "cool-down" exercises, drinking fluids, limiting total time on a voice input system, and taking regular breaks. Kambeyanda et al. (1997) found evidence that persons with musculoskeletal problems (such as various computer-induced repetitive strain injuries and osteo- and rheumatoid arthritis) are susceptible to serious vocal damage when they switch from keyboards and mice to voice recognition systems.
2. Recently I evaluated three voice recognition products: IBM ViaVoice, Dragon Systems NaturallySpeaking, and L+H VoiceXpress. Although ViaVoice and NaturallySpeaking were about equally accurate, correcting misrecognitions and revising text were easier using NaturallySpeaking. For this reason, I use the technical terms associated with NaturallySpeaking throughout this paper.

Acknowledgements

I thank Daniel Hilton Chalfen, Ph.D., of Boulder, Colorado for his attentive reading of and many comments on an early draft of this paper, and Lois Singer, D.S.P.A., of the Voice Laboratory and Treatment Centre of Ontario, Toronto, for our discussion on injuries in speech recognition users.