Speech Recognition Engines

How Speech Recognition Works
User: kmaclean
Date: 1/30/2007 8:30 pm
Views: 6521
Rating: 33

There is a good article on the Howstuffworks.com site that describes How Speech Recognition Works.

Ken 

--- (Edited on 1/30/2007 9:30 pm [GMT-0500] by kmaclean) ---

Re: How Speech Recognition Works
User: David Gelbart
Date: 8/14/2007 4:38 pm
Views: 262
Rating: 16

I've also got a list of links on this topic at

http://www.icsi.berkeley.edu/~gelbart/edu.html 

--- (Edited on 8/14/2007 4:38 pm [GMT-0500] by Visitor) ---

Re: How Speech Recognition Works
User: sritha
Date: 5/8/2009 4:28 am
Views: 1730
Rating: 5

I don't know how to submit a question to this site, so please reply to this one here.

I am developing a small speech recognition project using the sphinx3 software. I have installed it and tested it with the AN database and the rm1 database.

But I don't know how to build my own database.

Please tell me the process for building the database, or at least give me the address of a website that explains it.

Thank you

--- (Edited on 5/8/2009 4:28 am [GMT-0500] by Visitor) ---

Speak in whole sentences!
User: ralfherzog
Date: 8/17/2007 6:52 pm
Views: 382
Rating: 22

Hello Ken!

The article you linked to says:

"It is much easier for the program to understand words when we speak them separately, with a distinct pause between each one."

I don't agree. I am using DNS 9 (Dragon NaturallySpeaking 9). This program uses the natural speaking approach, which means that it understands your words best if you speak in a *natural* way. So do not make a "distinct pause" between the words. You should speak clearly; you get the best results if you speak like a newsreader. But don't speak the words separately; speak in whole sentences. Speak naturally, and everything should be fine.

Otherwise, why would you fragment the texts for the voxforge.org project into pieces of 5 to 10 seconds, each containing several words? You need whole sentences to get good results. Long sentences that take more than about 10 seconds to say should be split into several fragments.

So this is my opinion: voxforge.org does the right thing by splitting speech into parts of 5 to 10 seconds.
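
For illustration only, here is a minimal Python sketch of this kind of splitting. It assumes you already have a start and end time for each word; the function name and the word-timing format are hypothetical and are not taken from the VoxForge tools.

```python
from typing import List, Tuple

# Hypothetical word format: (text, start time in seconds, end time in seconds)
Word = Tuple[str, float, float]


def split_into_fragments(words: List[Word], max_len: float = 10.0) -> List[List[Word]]:
    """Greedily group consecutive words into fragments of at most max_len seconds."""
    fragments: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # Close the current fragment if adding this word would make it
        # longer than max_len, measured from the first word's start time.
        if current and word[2] - current[0][1] > max_len:
            fragments.append(current)
            current = []
        current.append(word)
    if current:
        fragments.append(current)
    # Note: a single word longer than max_len still gets its own fragment.
    return fragments
```

A real splitter would probably prefer to cut at pauses or phrase boundaries rather than at the last word that fits, so that each fragment still sounds like natural speech.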

--- (Edited on 8/17/2007 6:52 pm [GMT-0500] by ralfherzog) ---

Re: Speak in whole sentences!
User: kmaclean
Date: 8/18/2007 2:17 pm
Views: 713
Rating: 27

Hi Ralf,

Thanks for the question! 

I think in the context of the whole paragraph, the sentence you highlighted is correct:

"Speech recognition systems made more than 10 years ago also faced a choice between discrete and continuous speech. It is much easier for the program to understand words when we speak them separately, with a distinct pause between each one. However, most users prefer to speak in a normal, conversational speed. Almost all modern systems are capable of understanding continuous speech."

I think they are referring to the late-nineties era, when the first speech recognition engines to come out recognized only discrete speech. Back then it was easier to create a *program* to recognize discrete speech than it was to create one to recognize continuous speech, simply because the average computer of the time was not powerful enough to handle continuous speech recognition.

These days, as the paragraph says, "Almost all modern systems are capable of understanding continuous speech."

Ken 

--- (Edited on 8/18/2007 3:17 pm [GMT-0400] by kmaclean) ---
