Speech Recognition in the News

On Speech Recognition: Web App Integration, Pointers for Newbies, & Lessons Learned from a failed startup
User: kmaclean
Date: 8/30/2009 10:36 am
Views: 3553
Rating: 9

From this article: On Speech Recognition: Web App Integration, Pointers for Newbies, & Lessons Learned from a failed startup:

For all of those thinking of integrating speech recognition into their apps I have a word of advice for you: Don’t.

[...]

[The] speech rec discussed in this article is the kind that understands short phrases and/or commands with no training required. It’s not free flowing dictation like that found in Dragon software. [...]

He reviews some of the ways to integrate speech recognition into a web application:

  1. Telephony
  2. Web Services
  3. Embedded

And then describes the main stumbling block for open source speech recognition:

[...] The only real differences between the open source and commercially available solutions lie in what’s called their Acoustic Models. AMs for speech rec are like gold. A good AM is produced from several thousand hours of good audio samples.

Re: On Speech Recognition: Web App Integration, Pointers for Newbies, & Lessons Learned from a failed startup
User: kmaclean
Date: 11/19/2009 11:14 am
Views: 87
Rating: 8

Found an interesting reply on the original article.  From the post:

[...] I use sphinx4 and pocketsphinx and am very pleased. They are state of the art decoders. The acoustic model, or lack thereof, is the reason why commercial engines are perceived as superior. [...] But after making my own model, and still in the never-ending process of making my own, tweaking it, etc., I appreciate the misery that is collecting and organizing transcribed data and appreciate the work they do even if I can't use it.

Notice something wrong with your model, need to retrain it. Takes days with a quad core. [...] And spotting errors is hard. I can't emphasize enough how boring it is to listen to hours on end of audio and see that it matches up with the text perfectly. In some cases, listening a bunch of times to make sure. Noticing issues with your model, having to go figure out why.

[...] You don't need thousands of hours unless you're doing dictation and if that extra few percent is worth it. You can get good results with low hundreds.

There are other factors, like language models, that he should have mentioned and that could be as important as the acoustic model. How it's important to have relevant, and lots of, data to train them. The acoustic model is only one of many factors (as is the decoder, for that matter). [...]
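
To illustrate the point about language models, here is a toy sketch (mine, not from the reply) of how a decoder might combine an acoustic score with a language-model score when choosing between two similar-sounding candidates. All of the probabilities, the candidate phrases, and the best_hypothesis helper are made up for illustration; this is not how sphinx4 or pocketsphinx are implemented internally.

```python
import math

# Hypothetical acoustic log-likelihoods, log P(audio | words), for two
# candidate transcriptions that sound almost identical.
acoustic_logp = {
    "recognize speech": math.log(0.40),
    "wreck a nice beach": math.log(0.45),
}

# Hypothetical language-model log-probabilities, log P(words).
lm_logp = {
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.0001),
}


def best_hypothesis(acoustic, lm, lm_weight=1.0):
    """Return the candidate maximizing acoustic score + lm_weight * LM score."""
    return max(acoustic, key=lambda words: acoustic[words] + lm_weight * lm[words])


# With the language model switched off, the slightly better acoustic
# match wins, even though it is an unlikely thing to say.
acoustic_only = best_hypothesis(acoustic_logp, lm_logp, lm_weight=0.0)

# With the language model included, the more plausible phrase wins.
combined = best_hypothesis(acoustic_logp, lm_logp, lm_weight=1.0)

print("acoustic only:", acoustic_only)
print("acoustic + LM:", combined)
```

With these made-up numbers, the acoustic score alone slightly prefers the wrong phrase, and the language model is what tips the result the other way. That is roughly why relevant language-model training data can matter as much as the acoustic model.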

 

 
