Speech Recognition Engines

Comparing Sphinx, IBM's ViaVoice, Philips Speech SDK, Microsoft Win XP, Dragon Naturally Speaking
User: kmaclean
Date: 12/10/2008 1:33 pm

Here is a study by the National Research Council of Canada entitled:

A Comparison of Microphone and Speech Recognition Engine Efficacy for Mobile Data Entry

Abstract. The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. [...] Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. [...]

They compared the speaker-independent efficacy of these 5 Speech Recognition Engines ("SRE"s):

(1) IBM ViaVoice;

(2) CMU's Sphinx 4 open source SRE (Java); 

(3) Philips Speech SDK;

(4) Microsoft, Windows Desktop Speech Technology;

(5) Nuance, Dragon Naturally Speaking.

Engines 3, 4, and 5 were used in a speaker-independent mode in order to assess the ‘walk-up-and-use’ capabilities of each.

Their primary measure of accuracy was the total number of entries that were correct on the first attempt divided by the total number of tests. Sphinx 4 did pretty well (based on the results in Figure 1 of the report):

  • IBM - approx 53% of attempts accurate,
  • Sphinx - approx 38% of attempts accurate,
  • Philips - approx 25% of attempts accurate,
  • Microsoft - approx 45% of attempts accurate,
  • Dragon - approx 40% of attempts accurate.
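
For anyone who wants to apply the same measure to their own tests, here is a minimal sketch in Python. It is only an illustration of the ratio described above, not code from the study: the function name and the dictionary are my own, and the percentages are just the approximate values listed above, read off Figure 1.

```python
def first_attempt_accuracy(correct_first_attempts: int, total_tests: int) -> float:
    """Return first-attempt correct entries divided by total tests.

    Hypothetical helper for illustration; the name is not from the report.
    """
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= correct_first_attempts <= total_tests:
        raise ValueError("correct_first_attempts must be between 0 and total_tests")
    return correct_first_attempts / total_tests


# Approximate percentages from the list above (estimated from Figure 1,
# so treat them as rough values, not exact results).
approx_accuracy_pct = {
    "IBM": 53,
    "Sphinx": 38,
    "Philips": 25,
    "Microsoft": 45,
    "Dragon": 40,
}

# Order the engines from most to least accurate on the first attempt.
ranking = sorted(approx_accuracy_pct, key=approx_accuracy_pct.get, reverse=True)

for place, engine in enumerate(ranking, start=1):
    print(f"{place}. {engine}: approx {approx_accuracy_pct[engine]}%")
```

On these approximate figures the order comes out as IBM, Microsoft, Dragon, Sphinx, Philips, with Sphinx about 2 percentage points behind Dragon. To score your own run, call the helper with your counts, e.g. first_attempt_accuracy(19, 50) for a made-up run of 19 first-attempt correct entries out of 50 tests (those counts are invented for the example, not taken from the study).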


--- (Edited on 12/10/2008 2:33 pm [GMT-0500] by kmaclean) ---