General Discussion

How to add new words... ?
User: swbluto
Date: 9/14/2010 10:17 am
Views: 4960
Rating: 3

I probably misunderstand the forums, but it seems like this thread that I posted to didn't bump because I didn't see it at the top of the category. So I'm posting a redirect...

http://www.voxforge.org/home/forums/message-boards/general-discussion/how-can-i-make-new-words__/re-how-can-i-make-new-words__

He says to ...

"One way is to simply add your 'out-of-vocabulary' grammar words to your dictionary (I am assuming you are using the voxforge_dictionary in Step 2 of the VoxForge Tutorial). To do this, you need to look at the pronunciation of similar words in the dictionary, and then create a new pronunciation entry for your word."

...

"You then need to add this word in the Pronunciation Dictionary in Alphabetical sequence, and re-run the HDMan command in Step 2 of the VoxForge Tutorial.  You need to repeat these steps for all the "out-of-vocabulary" words in your dictionary. "

So I was looking at this post, and it says to add the word to the "pronunciation dictionary" and rerun HDMan. Is this pronunciation dictionary the entire dictionary used to create the hmmdefs? If so, would I have to rerun the entire multistep process to regenerate the hmmdefs file?

I tried just changing the voca file and recompiling using HDMan, but then Julian complained that it could not find my particular word's triphone in the dictionary. The 'dictionary' it was referring to appears to be the hmmdefs, since that was the only other input it seemed to require (other than the tiedlist), and I changed neither the hmmdefs nor the tiedlist.

Recompiling the dictionary entirely to get the hmmdefs seems pretty long (a 5-minute process, it seems, and probably substantially longer if I used the VoxForge speech corpus, which is about the only one that seems to work well for me). There should be a more streamlined way to add an arbitrary word, since adding a word ought to require only a small addition to the hmmdefs and tiedlist files rather than changing them completely.

I looked into HTK, and it can apparently build an entire HMM for a word using HInit + HRest (as detailed in the middle of page 18 of the HTK manual). I'm thinking that once you have the individual word HMM, you just add it to the hmmdefs file. Then I just need to reload the hmmdefs within Julian... but can Julian recognize an arbitrary word HMM? The "Quick Julian Start" seems to rely solely on monophone and triphone modes, and not whole-word HMMs, but according to the link at http://julius.sourceforge.jp/en_index.php?q=index-en.html#feature , it has...

"(Rev. 4) Rapid isolated word recognition"

This suggests I could use an individual word HMM, so adding an arbitrary word shouldn't necessarily require a 5+ minute recompilation (to regenerate the entire hmmdefs file).

--- (Edited on 9/14/2010 10:17 am [GMT-0500] by swbluto) ---

--- (Edited on 9/14/2010 10:57 am [GMT-0500] by swbluto) ---

Re: How to add new words... ?
User: kmaclean
Date: 9/24/2010 10:01 pm
Views: 2214
Rating: 2

>Is this pronunciation dictionary the entire dictionary used to create the

>hmmdefs (And, thus, you'd have to go through all the steps again to create

>the hmmdefs)?

Short Answer: Yes.

Longer answer: If the word you are adding already has its triphones described in the acoustic model, then you should be able to add it to your .voca file.
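A rough way to check whether a word's triphones are already covered is to expand its pronunciation into triphone names and look them up in the tiedlist. The Python sketch below is only an illustration, not part of the VoxForge tutorial: the function names are hypothetical, it assumes HTK-style `l-p+r` triphone names and a tiedlist whose first token on each line is the logical model name, and it only looks at word-internal contexts (the recognizer may also need cross-word triphones at the word boundaries).

```python
def word_triphones(phones):
    """Expand a phone sequence into word-internal HTK-style triphone names."""
    if len(phones) == 1:
        return [phones[0]]
    names = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        names.append(left + phone + right)
    return names


def load_tiedlist(lines):
    """Collect model names, assuming the first token on each line is the name."""
    return {line.split()[0] for line in lines if line.strip()}


def missing_triphones(phones, known):
    """Return the triphones of a pronunciation that are not in the known set."""
    return [name for name in word_triphones(phones) if name not in known]


if __name__ == "__main__":
    # Illustrative data only; a real tiedlist would be read from the file on disk.
    sample_tiedlist = ["k+ae", "k-ae+t ax-ae+t", ""]
    known = load_tiedlist(sample_tiedlist)
    print(missing_triphones(["k", "ae", "t"], known))
```

If the list that comes back is empty, the word-internal triphones are all present; anything that is listed would have to be tied or trained before the word can be recognized.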

If the word does not have its triphones described in the acoustic model, but you have lots of training data, you may be able to get away with just adding an entry to your pronunciation dictionary in Step 10 and running HDMan, HHEd, and HERest as indicated; HTK will then try to map your 'unseen' triphones to physical triphone HMMs.
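For the dictionary edit itself, the new entry has to end up in sorted position before HDMan is rerun. A minimal Python sketch of that step follows. It assumes each dictionary line has the layout `WORD [WORD] phone phone ...` and that plain code-point ordering on the word is what your HDMan setup expects; the function name and the sample pronunciations are made up for illustration, and the sketch collapses any column alignment to single spaces.

```python
def add_entry(dict_lines, word, phones):
    """Return the dictionary lines with a new entry added, sorted by word."""
    word = word.upper()
    new_tokens = [word, f"[{word}]"] + list(phones)
    # Split every non-empty line into tokens so whitespace differences are ignored.
    entries = [line.split() for line in dict_lines if line.strip()]
    if new_tokens not in entries:  # skip exact duplicates only
        entries.append(new_tokens)
    # The sort is stable, so alternate pronunciations of a word keep their order.
    entries.sort(key=lambda tokens: tokens[0])
    return [" ".join(tokens) for tokens in entries]


if __name__ == "__main__":
    # Illustrative entries; a real run would read and rewrite the dictionary file.
    lines = ["APPLE [APPLE] ae p ah l", "ZEBRA [ZEBRA] z iy b r ah"]
    for line in add_entry(lines, "julius", ["jh", "uw", "l", "iy", "ah", "s"]):
        print(line)
```

The phones for the new word still have to be worked out by hand from similar words already in the dictionary.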

If you do not have much training data, you may need to add the word to the pronunciation dictionary and to your prompts file in Step 2, record a few instances of the word (3-5 times), and retrain your acoustic model from scratch.

Ken

--- (Edited on 9/24/2010 11:01 pm [GMT-0400] by kmaclean) ---
