Training for speech recognizer of specific person?

Acoustic Model Discussions

User: Alex
Date: 10/12/2010 6:44 am

Views: 6387
Rating: 1

Hi!

I am planning a specific project: a model that will recognize just one speaker.

Basically, the software will be trained by me and I will be the only person using it.

In such a case, what kind of training is needed?

Thanks in advance, Alex

--- (Edited on 10/12/2010 6:44 am [GMT-0500] by Visitor) ---

Re: Training for speech recognizer of specific person?

User: nsh
Date: 10/13/2010 7:32 am

Views: 71
Rating: 1

Hello Alex

Training a single-speaker model is no different from training a multi-speaker model. I recommend you train a model for CMUSphinx using SphinxTrain. See the training tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialam

The only thing you need to take care of is setting the number of tied states properly for the amount of data you have. If you have 2-3 hours of training data, 2000 tied states will work for you.
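
To get a feel for why the number of tied states has to match the amount of data, here is a rough back-of-the-envelope sketch in Python. It only does arithmetic: how many feature frames each tied state gets on average. The 100 frames per second figure (10 ms frame shift) is an assumption on my part, and the function name is made up for illustration; neither comes from SphinxTrain.

```python
# Rough arithmetic only: how much data each tied state sees on average.
# Assumes 100 feature frames per second (10 ms frame shift), a common
# front-end setting that is NOT stated in this thread.
FRAMES_PER_SECOND = 100


def frames_per_tied_state(hours_of_audio: float, tied_states: int) -> float:
    """Average number of feature frames available per tied state."""
    if tied_states <= 0:
        raise ValueError("tied_states must be positive")
    total_frames = hours_of_audio * 3600 * FRAMES_PER_SECOND
    return total_frames / tied_states


if __name__ == "__main__":
    # The 2-3 hours / 2000 tied states figures are the ones quoted above.
    for hours in (2, 3):
        print(hours, "h ->", frames_per_tied_state(hours, 2000), "frames per tied state")
```

With much less audio and the same 2000 tied states, the average drops quickly, which is the situation to avoid.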

--- (Edited on 10/13/2010 16:32 [GMT+0400] by nsh) ---

Re: Training for speech recognizer of specific person?

User: Alex
Date: 10/14/2010 10:26 am

Views: 107
Rating: 2

Hi!

Actually, what I am asking is this: if you are creating an application for a SINGLE user and a limited vocabulary (let's say 500 words), can the training work in the following way?

1) Record EVERY word you want to be recognized later, by saying it once during the training process.

2) The recognizer will be able to recognize only those words that were added to the database in step one.

Thanks in advance!

--- (Edited on 10/14/2010 10:26 am [GMT-0500] by Visitor) ---

Re: Training for speech recognizer of specific person?

User: nsh
Date: 10/14/2010 10:43 am

Views: 100
Rating: 1

> 1) Record EVERY word you want to be recognized later, by saying it once during the training process.

Heh, your poor user. Do you care about him? Do you think he needs to spend ages recording all 500 words you've invented for him? That's crazy. I certainly don't want to be your user. FYI, for reliable training, the number of samples of each context in the database needs to be around 50-100.

Common practice is to ask the user to read 2-3 paragraphs of interesting text and adapt a generic model to the user's speech. This way you can reach even better accuracy and make the user's life enjoyable. Dragon NaturallySpeaking, for example, has users read the funny "Dogbert's Top Secret Management Handbook". It has some jokes related to your question, like:
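
To make the 50-100 figure concrete, here is a minimal Python sketch that counts how often each phonetic context shows up in a set of transcripts and lists the ones that fall short. Everything in it is an assumption for illustration: the toy dictionary, the function names, and the simplification that contexts are counted inside each word only, padded with silence. It is not how SphinxTrain counts things internally.

```python
from collections import Counter

# Hypothetical toy pronunciation dictionary; a real one would be the
# dictionary used for training.
TOY_DICT = {
    "hello": ["HH", "AH", "L", "OW"],
    "yellow": ["Y", "EH", "L", "OW"],
}

MIN_SAMPLES = 50  # lower end of the 50-100 range mentioned above


def triphone_counts(transcripts, pron_dict):
    """Count (left, phone, right) contexts, word by word, padded with SIL."""
    counts = Counter()
    for line in transcripts:
        for word in line.lower().split():
            phones = ["SIL"] + pron_dict[word] + ["SIL"]
            for i in range(1, len(phones) - 1):
                counts[(phones[i - 1], phones[i], phones[i + 1])] += 1
    return counts


def undertrained(counts, minimum=MIN_SAMPLES):
    """Return the contexts seen fewer than `minimum` times."""
    return sorted(ctx for ctx, n in counts.items() if n < minimum)


if __name__ == "__main__":
    once = triphone_counts(["hello"], TOY_DICT)
    print(len(undertrained(once)), "contexts below the minimum after one recording")
```

Saying "hello" once leaves every one of its contexts with a single sample, far below the minimum.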

If you don't know what to do ask for the weekly report

> 2) The recognizer will be able to recognize only those words that were added to the database in step one.

The recognizer will recognize the words specified in the language model. That is unrelated to the words used during training/adaptation.
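
As an illustration of keeping the vocabulary separate from the training data, here is a small Python sketch that writes the word list out as a JSGF grammar. Note that this swaps in a grammar where I said "language model" above; a grammar is just one simple way to restrict what can be recognized. The function name, grammar name, and rule name are made up for the example, and each word would still need an entry in the pronunciation dictionary.

```python
def build_jsgf(grammar_name, words):
    """Return a JSGF grammar that accepts exactly one word from `words`."""
    if not words:
        raise ValueError("need at least one word")
    # Deduplicate and sort so the output is stable.
    alternatives = " | ".join(sorted(set(words)))
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )


if __name__ == "__main__":
    print(build_jsgf("commands", ["stop", "go", "hello"]))
```

Changing the list changes what the recognizer accepts, with no new recordings and no retraining of the acoustic model.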

--- (Edited on 10/14/2010 19:43 [GMT+0400] by nsh) ---

Re: Training for speech recognizer of specific person?

User: Alex
Date: 10/14/2010 11:24 am

Views: 108
Rating: 1

Hmm...

I didn't realize that the number of samples of each context in the database needs to be around 50-100...

Thank you for clarifying that...

My assumption was that, since the program should recognize just one user, it would be enough to say each word once.

I thought that, for example, the word "hello" would look similar (the voice waveform) every time the same person says it...

Am I wrong with this assumption?

Alex

--- (Edited on 10/14/2010 11:24 am [GMT-0500] by Visitor) ---

Re: Training for speech recognizer of specific person?

User: nsh
Date: 10/14/2010 11:33 am

Views: 2618
Rating: 2

> Am I wrong with this assumption?

Yes, there are a thousand ways to say hello, and they can sound very different :)

Check this:

http://www.youtube.com/watch?v=99jVPJUeqr4

It would be interesting to see how many you count.

--- (Edited on 10/14/2010 20:36 [GMT+0400] by nsh) ---
