Nuance released a substantially revised NaturallySpeaking in August 2010. Since then, many of my clients have asked my opinion of it. The question foremost on their minds: "Should I upgrade from Version 10.1 to Version 11?"
To help answer the question, I prepared the following summary. The first draft, which I emailed to a group of my clients in January 2011, was based on testing the product for three months; helping clients with NaturallySpeaking 11; and monitoring online NaturallySpeaking discussions. I expanded the review after receiving excellent suggestions from Jane Berliss-Vincent, Ray Grott, and contributors to the Knowbrainer Speech Recognition Forums.
This review is not comprehensive. I do not describe every new feature, command, enhancement, or bug. For a full product description, the Nuance website is a good place to start.
I will be updating this document throughout Version 11's life-cycle. Feel free to send me comments and suggestions.
Nuance offers five editions of Version 11 — "Home," "Premium," "Professional," "Legal" and "Medical" — each with a different set of features. The "Preferred" Edition is no more; it has been re-branded as the "Premium" Edition.
In January 2011, Nuance announced the release of the UK version of the "Medical" Edition. The North American version has not yet been released.
In June 2011, Nuance announced the release of Version 11.5. Changes between Version 11 and 11.5 include:
Version 11.x supports more programs than Version 10.1, including Microsoft Office 2010 applications, and OpenOffice Writer. Compatibility with OpenOffice Writer is not 100%, but includes, according to the Nuance website, "dictation, correction, selection, and playback." Note that "formatting" is not listed.
The "Home" and "Premium" Editions now fully support Microsoft Outlook. In the past, Outlook support was available only with the "Professional," "Legal," and "Medical" Editions.
Nuance claims support for Mozilla Firefox, but I notice no improvements over Version 10.1. Browsing by voice is much easier with Internet Explorer.
Version 11.x is noticeably more accurate than Version 10.1. Expect to correct misrecognition errors about half as often as before.
When creating a new user profile, there are three documented "Training" options, plus a fourth that is undocumented. The documented options are "Show text with prompting," "Show text without prompting," and "Skip training."
"Show text with prompting" is for users with standard voices. "Show text without prompting" is for those with non-standard accents, or who have reading difficulties.
"Skip training" corresponds to the "None" (no training) option in Version 10.1, which worked surprisingly well. I regularly created accurate profiles in about five minutes. In Version 11, I find this option works less well; I achieve better accuracy after a short training session. Now, I spend about ten minutes creating a profile instead of five. But given the accuracy improvements in Version 11, this is not a deal-breaker.
I have read reports of people who get good accuracy when they skip training. My suggestion: Try it. But if accuracy is off, do five minutes of "General Training" later.
There is a new, undocumented training option in Version 11. Choose "Show text without prompting," select a reading from the list — it does not matter which — and then ignore it! Start training, but read any text until the counter reaches four minutes. I have found this method yields excellent accuracy.
For people with standard voices, Version 11 appears to need four or five minutes of data to build an accurate profile, regardless of audio source. For example, Nuance claims it now takes four minutes, instead of 15 minutes, to create a profile for a digital recorder.
The user interface has been redesigned. A new contextual help system, the "Dragon Sidebar," automatically displays commands and tips as you switch between windows. Experienced users will likely choose to hide the Sidebar, but novices may find it helpful.
The new "Results Display" is a streamlined version of the old "Results Box." Instead of showing the results of NaturallySpeaking's ongoing analysis, the "Results Display" provides more subtle feedback: for example, a rotating shape indicates NaturallySpeaking is processing speech. The simpler display is meant to be less distracting, and encourage users to dictate in longer phrases, which improves accuracy. Some people prefer the "Results Display," others the traditional "Results Box." It does not matter. You can choose the one you want, or hide both.
The new user interface may create barriers for people with low vision and/or learning disabilities. For example, the "Spelling Window" (which replaces the "Spell" dialog box) is not resizable in Version 11.0 (fixed in 11.5); its fonts are hard-coded (so they cannot be changed or enlarged); and the poor contrast between the green typeface and the white background makes text harder to read. For everybody else, the "Spelling Window" takes getting used to, but presents no particular problems.
Noteworthy commands were introduced in Version 11. To display a list of all open windows, say "list all windows." To display a list of application-specific windows, say (for example) "list windows for Microsoft Word" or "list windows for Firefox." In response, NaturallySpeaking displays a numbered list of windows. You can switch to any window on the list without knowing its exact name.
Commands introduced in Version 11.5 include:
Editing by voice is more precise. When you say commands like "delete <text>," "capitalize <text>," and "bold <text>," NaturallySpeaking overlays a small number next to each instance of the word or phrase. Pick the one you want by saying its number, or choose them all by saying "choose all."
In other words, when editing by voice, you may need to say two commands instead of one. But you are more likely to get the result you want. Some users like this new behaviour; others do not. There is no option to turn it off.
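To make the two-step behaviour concrete, here is a minimal Python sketch of the idea. It is only an illustration, not how NaturallySpeaking is implemented: the function names `find_numbered_matches` and `delete_matches` are hypothetical, and I am assuming whole-word, case-insensitive matching.

```python
import re

def find_numbered_matches(text, phrase):
    """Return (number, start, end) for each whole-word match of phrase.

    Numbers start at 1, like the small overlays shown next to each instance.
    Case-insensitive matching is an assumption made for this sketch.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [(n, m.start(), m.end())
            for n, m in enumerate(pattern.finditer(text), start=1)]

def delete_matches(text, phrase, choice="all"):
    """Delete the numbered instance given by choice, or every instance for "all"."""
    spans = find_numbered_matches(text, phrase)
    if choice != "all":
        spans = [span for span in spans if span[0] == choice]
    # Work from the end of the text so earlier offsets stay valid.
    for _, start, end in reversed(spans):
        text = text[:start] + text[end:]
    return text

sentence = "the cat saw the dog and the bird"
numbered = find_numbered_matches(sentence, "the")   # step 1: "delete the"
edited = delete_matches(sentence, "the", choice=2)  # step 2: say the number
```

The first call corresponds to saying the editing command; the second corresponds to picking a number (or saying "choose all" by passing `choice="all"`).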
Certain microphones will perform better. Version 11.x samples a different frequency range than before. This change will not improve accuracy for every microphone, but will for some.
Notwithstanding this change, I continue to recommend that "serious" users get top-of-the-line microphones. A quality USB microphone makes NaturallySpeaking more responsive, more accurate, and less error-prone. Improved productivity will ensure quick recovery of any additional cost — in some cases, cost recovery will happen in one or two days.
Be skeptical of Nuance's published hardware requirements for NaturallySpeaking. To take full advantage of Version 11, you may need an up-to-date computer. For a 64-bit Windows 7 PC, an i7 CPU with 8 GB RAM might not be excessive.
NaturallySpeaking 11.x runs on older PCs. In fact, some people report excellent performance on Core 2 Duo CPUs. As with Version 10, Version 11 adjusts itself to match the system, but it may select overly optimistic program settings. To get acceptable performance, you may need to manually change the default program settings. For example, when creating a new user profile on a Core 2 Duo 2.0 GHz PC, I select the "Best Match III" option instead of the more resource-intensive "Best Match IV" option.
When adding words or phrases via the "Vocabulary Editor," the "Spoken form" can no longer contain punctuation marks or symbols. For example, if you add power/knowledge to the vocabulary, and want to pronounce the slash, the Version 10 "Spoken form" could be "power / knowledge". In Version 11.x, you must change the symbol to a word: "power slash knowledge".
When importing a file containing custom words and phrases, check the list before importing it into Version 11.x. Edit any "Spoken forms" that contain symbols or punctuation marks:
| Written form | Spoken form: Version 10.1 | Spoken form: Version 11.x |
|---|---|---|
| Ti & Lion Inc. | Tie & Lion Inc. | Tie and Lion Ink |
| Midge + Bros | Midge + Brothers | Midge plus Brothers |
| Cathy | Cathy with a C. | Cathy with a C (or "with a See") |
| colour | color with a U. | color with a U (or "with a You") |
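If your word list is long, a small script can flag the entries that need attention before you import them. The following Python sketch is one way to do it, under stated assumptions: the symbol-to-word map `SYMBOL_WORDS` and the function names are my own inventions, and I am assuming that apostrophes are still acceptable in a "Spoken form." It does not read or write any NaturallySpeaking file format; it only cleans up the text of each spoken form.

```python
import re

# Hypothetical symbol-to-word map; extend it to suit your own word list.
SYMBOL_WORDS = {"&": "and", "+": "plus", "/": "slash"}

def clean_spoken_form(spoken):
    """Replace known symbols with words and strip remaining punctuation."""
    for symbol, word in SYMBOL_WORDS.items():
        spoken = spoken.replace(symbol, " " + word + " ")
    # Turn any other punctuation into a space (apostrophes are kept,
    # which is an assumption about what Version 11.x accepts).
    spoken = re.sub(r"[^\w\s']", " ", spoken)
    # Collapse the extra whitespace introduced above.
    return " ".join(spoken.split())

def needs_editing(spoken):
    """True if the spoken form contains symbols or punctuation to fix."""
    return clean_spoken_form(spoken) != spoken

for form in ["Tie & Lion Inc.", "Midge + Brothers", "power / knowledge"]:
    if needs_editing(form):
        print(form, "->", clean_spoken_form(form))
```

Treat the output as a first pass only. The script cannot respell abbreviations or letter names for you, so an entry such as "Inc" may still need to be changed by hand to "Ink," as in the table.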
Commands scripted for Versions 9 and 10 should work in Version 11.x, with a few exceptions:
Several problems were reported in Version 11 (it is too early to say which have been fixed in Version 11.5):
If you are successfully using Version 10.1, the answer is a definite maybe. For me, the improved accuracy has made the upgrade worthwhile, despite the growing pains. I much prefer the new "Results Display" over the "Results Box" (which I have always found obtrusive). Initially, I was skeptical about the changes in the behaviour of editing commands, but I have come to appreciate them. Overall, I am happy with the upgrade. But I have an up-to-date computer running Windows 7; I am not sure I would feel the same if I were still using my four-year-old Vista machine.
On an older PC, you may experience slow or halting performance. I have seen Version 11.0 struggle on a Windows XP Pro machine with a Core 2 Duo CPU and 3 GB RAM, even when I created a profile with conservative settings: "Best Match III" instead of "Best Match IV"; Medium vocabulary instead of Large vocabulary; and the Speed vs. Accuracy slider set to 50%.
If you have low vision, you may find the "Spelling Window" hard to read without screen magnification software. But if you already use screen magnification software, this will not be an issue.
My company's NaturallySpeaking training and scripting services.
Other articles on speech recognition.
What's new in Version 11? (Nuance website).
Ray Grott and Jane Berliss-Vincent's thoughtful comments on this review helped me make it better. Many thanks also to all who contribute to the lively discussions on the Knowbrainer Speech Recognition Forums.