Using Keith Vertanen's AM and LM in Julius
User: noegroz1987
Date: 3/17/2011 12:31 pm
Views: 9717
Rating: 3

Hi all,

I've tried to use Keith Vertanen's AM and LM in the Julius SR decoder. However, I'm still not satisfied with the results: they were almost always wrong.

Here is what I have done so far. Please tell me what you think and help me out. I really want to make this work. :)

*) convert HTK's binary-hmmdefs --> HTK's ascii-hmmdefs:

cd wsj_all_10000_32
touch empty.hed
HHEd -H hmmdefs -w hmmdefs_ascii empty.hed tiedlist
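A quick sanity check after this conversion (not one of the original steps, just a sketch) is to confirm that every physical model named in tiedlist is actually defined in hmmdefs_ascii. The helper name check_hmmlist is hypothetical; it assumes the ASCII MMF writes each model header as ~h "name" on its own line, and that the physical model is the last field of each tiedlist line (HTK lists are either "physical" or "logical physical").

```sh
# check_hmmlist HMMDEFS HMMLIST   (hypothetical helper)
# Prints models required by the HMM list but not defined in the ASCII MMF;
# returns non-zero if any are missing.
check_hmmlist() {
  tmp=$(mktemp -d) || return 2
  # Physical models defined in the MMF appear as: ~h "name"
  awk '$1 == "~h" { gsub(/"/, "", $2); print $2 }' "$1" | LC_ALL=C sort -u > "$tmp/defined"
  # Each list line is "physical" or "logical physical": keep the last field.
  awk 'NF { print $NF }' "$2" | LC_ALL=C sort -u > "$tmp/needed"
  # Lines only in "needed" = models the list refers to but the MMF lacks.
  LC_ALL=C comm -13 "$tmp/defined" "$tmp/needed" > "$tmp/missing"
  cat "$tmp/missing"
  status=0
  [ -s "$tmp/missing" ] && status=1
  rm -rf "$tmp"
  return $status
}
```

Usage: check_hmmlist wsj_all_10000_32/hmmdefs_ascii wsj_all_10000_32/tiedlist && echo "all models defined"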

*) convert ARPA N-gram --> binary N-gram (forward)

cd lm_giga
mkbingram -nlr lm_giga_64k_nvp_3gram.arpa lm_giga_64k_nvp_3gram_fw.bin
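It can also help to look at the \data\ header of the ARPA file, which lists how many N-grams of each order it contains; the 1-gram count is roughly the vocabulary size (it also includes tokens such as <s> and </s>). A minimal sketch, assuming a standard ARPA header made of "ngram N=COUNT" lines; arpa_ngram_counts is a hypothetical helper name.

```sh
# arpa_ngram_counts ARPA_FILE   (hypothetical helper)
# Prints "order count" for each "ngram N=COUNT" line in the \data\ header.
arpa_ngram_counts() {
  awk '
    /^\\data\\/ { in_data = 1; next }   # header starts
    /^\\1-grams:/ { exit }              # header is over; stop reading the big file
    in_data && /^ngram [0-9]+=/ {
      split($2, kv, "=")               # "3=123456" -> kv[1]=3, kv[2]=123456
      print kv[1], kv[2]
    }
  ' "$1"
}
```

Usage: arpa_ngram_counts lm_giga/lm_giga_64k_nvp_3gram.arpa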

*) convert HVite's dictionary --> Julius' dictionary:

cd lm_giga
cat lm_giga_64k_nvp.hvite.dic | gawk '{$1=sprintf("%s [%s]", $1, $1); print $0}' > 64k_nvp.julius.dic

*) tweak the Julius dictionary:

  1. omit the "sp" and "sil"
  2. delete duplicate entries 
  3. correct the "<s>" and "</s>" definitions
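The three tweaks above could be scripted roughly as follows. This is only a sketch under assumptions, not the exact edit that was made: it assumes each line looks like "WORD [WORD] phone ..." as produced by the gawk one-liner, that "omit sp and sil" means dropping trailing sp/sil phones from each pronunciation, and that the AM's silence model is named "sil". The function name tweak_julius_dict is hypothetical.

```sh
# tweak_julius_dict IN OUT   (hypothetical helper)
tweak_julius_dict() {
  awk '
    NF == 0 { next }                     # skip blank lines
    # Sentence boundaries: replace whatever the conversion produced with an
    # empty output string and the (assumed) "sil" model.
    $1 == "<s>"  { entry = "<s> [] sil" }
    $1 == "</s>" { entry = "</s> [] sil" }
    $1 != "<s>" && $1 != "</s>" {
      # Drop trailing sp/sil phones, keeping WORD, [OUTPUT] and at least one phone.
      n = NF
      while (n > 3 && ($n == "sp" || $n == "sil")) n--
      entry = $1
      for (i = 2; i <= n; i++) entry = entry " " $i
    }
    # Keep only the first copy of each entry (stripping sp/sil creates duplicates).
    !seen[entry]++ { print entry }
  ' "$1" > "$2"
}
```

Usage: tweak_julius_dict 64k_nvp.julius.dic 64k_nvp.julius.tweaked.dic (the output filename is just an example).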

*) create a Julius configuration file (modeled on fast.jconf from Julius' dictation-kit-v4.1)

## Language Model
-d lm_giga/lm_giga_64k_nvp_3gram_fw.bin
-v lm_giga/64k_nvp.julius.dic

## Acoustic Model
-h wsj_all_10000_32/hmmdefs_ascii
-hlist wsj_all_10000_32/tiedlist

-n 5
-output 1

-input mic
-zmeanframe

-rejectshort 800

#-demo
#-debug

*) run Julius and test it using the mp3s of "cmu_com_kal_ldom" I got from VoxForge

Btw, I got this error message: "Error: voca_malloc: maximum dict size exceeded limit (65535)." Is it related to the very poor performance?

Thanks.

-arie

--- (Edited on 3/17/2011 12:31 pm [GMT-0500] by noegroz1987) ---

--- (Edited on 3/17/2011 12:39 pm [GMT-0500] by noegroz1987) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: kmaclean
Date: 3/17/2011 5:20 pm
Views: 115
Rating: 1

>run the Julius and test it using mp3s of

>"cmu_com_kal_ldom" I got from VoxForge

I am confused... the VoxForge version of cmu_com_kal_ldom only seems to include wav data. If you are trying to recognize mp3 audio with an acoustic model trained on wav data, there will be some degradation in recognition rates.

Another problem may be that you are trying to do dictation recognition using Keith's AMs and LMs, and these may not be designed for this...

>Error: voca_malloc: maximum dict size exceeded limit(65535)

see Julius doc:

Size Limit

The recognition dictionary is limited to 65,535 words.

However, at configuration time if the "-enable-word-int" option is used, the dictionary can be extended to 2^31 words. At present performance is not guaranteed when using this option.
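A quick way to see how close a dictionary is to that limit is to count its entries. This sketch assumes one entry per non-empty line with the word in the first field, so alternate pronunciations of the same word count as separate entries; that is why a 64k-word vocabulary can exceed 65,535 entries. check_dict_size is a hypothetical helper name.

```sh
# check_dict_size DICT [LIMIT]   (hypothetical helper)
# Reports entry and distinct-word counts; fails if entries exceed LIMIT.
check_dict_size() {
  limit=${2:-65535}
  entries=$(awk 'NF { n++ } END { print n + 0 }' "$1")
  words=$(awk 'NF { print $1 }' "$1" | sort -u | awk 'END { print NR }')
  echo "entries=$entries distinct_words=$words limit=$limit"
  [ "$entries" -le "$limit" ]
}
```

Usage: check_dict_size lm_giga/64k_nvp.julius.dic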

 

--- (Edited on 3/17/2011 6:20 pm [GMT-0400] by kmaclean) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: noegroz1987
Date: 3/17/2011 9:27 pm
Views: 163
Rating: 1

Hi Ken,

Thanks for the response.

> I am confused... the VoxForge version of cmu_com_kal_ldom only seems to include wav data. If you are trying to recognize mp3 audio with an acoustic model trained on wav data, there will be some degradation in recognition rates.

I'll also try the wav version; I'm still downloading it now. Hopefully the results will be better.

> Size Limit: The recognition dictionary is limited to 65,535 words. However, at configuration time if the "-enable-word-int" option is used, the dictionary can be extended to 2^31 words.

Thanks for the pointer. I missed it because I didn't expect to find that information in the 'libsent options' section (see the Juliusbook v4.1.5).

Also, do you think the Julius configuration I'm using is good enough? Any suggestions?

-arie

--- (Edited on 3/18/2011 9:27 am [GMT+0700] by noegroz1987) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: Visitor
Date: 3/18/2011 5:53 am
Views: 178
Rating: 2

> However, at configuration time if the "-enable-word-int" option is used, the dictionary can be extended to 2^31 words.

I tried that option when configuring Julius, but when I ran Julius with the configuration above, I got this error message:

Error: mymalloc_big: failed to allocate 1 x 4294907157 bytes

Any idea how to solve this problem?

Thanks.

-arie

--- (Edited on 3/18/2011 5:53 am [GMT-0500] by Visitor) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: noegroz1987
Date: 3/18/2011 11:12 am
Views: 100
Rating: 2

> I am confused... the VoxForge version of cmu_com_kal_ldom only seems to include wav data. If you are trying to recognize mp3 audio with an acoustic model trained on wav data, there will be some degradation in recognition rates.

I set the vocabulary size limit problem aside and tested using the wavs of "cmu_com_kal_ldom". Unfortunately, the recognition results were still very poor. For example:

  • played: "Hello"; recognized as: "then an"
  • played: "I'm sorry"; recognized as: "annex then an"
  • played: "Please speak clearly and naturally"; recognized as: "an ice fifth then that then an"
  • etc.

Sometimes I also got no result at all, along with this warning: "WARNING: 00 _default: hypothesis stack exhausted, terminate search now".

I checked the input by recording it, and I think its quality is quite good. I also checked the ARPA n-gram file and found n-grams that can construct the sentence "Please speak clearly and naturally". Any idea how to check whether the AM and LM are correct and usable?
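Checking the ARPA file by hand for particular N-grams can be automated with a small script. This is a sketch with a hypothetical helper name (arpa_has_ngram); it assumes the standard ARPA layout (a "\N-grams:" section header, then lines of "log10prob word1 ... wordN [backoff]") and that the query words are spelled exactly as in the LM vocabulary, including case.

```sh
# arpa_has_ngram ARPA_FILE "W1 W2 ..."   (hypothetical helper)
# Succeeds if that exact N-gram (N = number of words given) is listed
# in the matching \N-grams: section.
arpa_has_ngram() {
  awk -v query="$2" '
    BEGIN { n = split(query, want, " ") }
    /^\\[0-9]+-grams:/ { order = substr($0, 2) + 0; next }   # "\2-grams:" -> 2
    order == n && NF > n {
      # Field 1 is the log10 probability; fields 2..n+1 are the words.
      for (i = 1; i <= n; i++) if ($(i + 1) != want[i]) next
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

Usage: arpa_has_ngram lm_giga/lm_giga_64k_nvp_3gram.arpa "SPEAK CLEARLY AND" && echo found. Note that a missing N-gram does not mean the sentence is impossible: the LM backs off to lower orders, so this only shows which N-grams are stored explicitly.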

Thanks.

-arie

 

--- (Edited on 3/18/2011 11:12 pm [GMT+0700] by noegroz1987) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: kmaclean
Date: 3/18/2011 5:16 pm
Views: 84
Rating: 1

>Any idea how to solve this problem?

Best to ask the folks on the Julius forum.

 

--- (Edited on 3/18/2011 6:16 pm [GMT-0400] by kmaclean) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: kmaclean
Date: 3/18/2011 5:27 pm
Views: 273
Rating: 1

> Any idea how to check whether the AM and LM are correct and can be used?

For the AM, test it using a grammar. Unfortunately, I do not have much experience with LMs. I tried playing with Julius dictation a while back but was not successful; I assumed it was because the acoustic model I was using (an early VoxForge AM) was not good enough.
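One way to set up such a grammar test is to generate a tiny fixed-phrase grammar from the same dictionary, then compile it with mkdfa.pl and run Julius with -dfa/-v instead of the N-gram. The sketch below is hypothetical (make_test_grammar is not a Julius tool): it assumes dictionary lines of the form "WORD [WORD] phones...", takes the first pronunciation of each word, uses generated category names W1, W2, ..., and assumes the AM has a "sil" model for the NS_B/NS_E sentence boundaries.

```sh
# make_test_grammar DICT PREFIX < phrases.txt   (hypothetical helper)
# Writes PREFIX.grammar and PREFIX.voca for a fixed-phrase Julius grammar.
make_test_grammar() {
  awk -v prefix="$2" '
    # First input (the dictionary): keep the first pronunciation of each word.
    FNR == NR {
      if (NF >= 3 && !($1 in pron)) {
        p = $3
        for (i = 4; i <= NF; i++) p = p " " $i
        pron[$1] = p
      }
      next
    }
    # Second input (stdin): one phrase per line, spelled as in the dictionary.
    NF {
      ok = 1
      for (i = 1; i <= NF; i++)
        if (!($i in pron)) { print "no pronunciation for: " $i > "/dev/stderr"; ok = 0 }
      if (!ok) { bad = 1; next }
      rule = "S : NS_B"
      for (i = 1; i <= NF; i++) {
        if (!($i in category)) { category[$i] = "W" (++ncat); word[ncat] = $i }
        rule = rule " " category[$i]
      }
      print rule " NS_E" > (prefix ".grammar")
    }
    END {
      voca = prefix ".voca"
      # Sentence-boundary categories (assumes a "sil" model in the AM).
      print "% NS_B\n<s> sil\n\n% NS_E\n</s> sil" > voca
      for (k = 1; k <= ncat; k++)
        printf "\n%% %s\n%s %s\n", category[word[k]], word[k], pron[word[k]] > voca
      exit (bad ? 1 : 0)
    }
  ' "$1" -
}
```

Usage (words must match the dictionary's spelling and case): printf 'HELLO\nPLEASE SPEAK CLEARLY AND NATURALLY\n' | make_test_grammar lm_giga/64k_nvp.julius.dic test, then mkdfa.pl test, then run julius with -h/-hlist as before plus -dfa test.dfa -v test.dict. If the AM cannot pick the right phrase out of a handful of choices, the problem is likely on the acoustic side rather than in the LM.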

My understanding with respect to dictation is that it is very difficult to make it work for multiple speakers, but if you adapt a generic acoustic model to a particular user, then you may get better results.

 

--- (Edited on 3/18/2011 6:27 pm [GMT-0400] by kmaclean) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: kmaclean
Date: 3/20/2011 8:09 pm
Views: 76
Rating: 1

> but if you adapt a generic acoustic model to a particular user, then you may get better results

See this post on Nickolay's blog: "How to create a speech recognition application for your needs". He talks about server-based recognition, but a similar approach could be used for dictation on a single computer.

 

--- (Edited on 3/20/2011 9:09 pm [GMT-0400] by kmaclean) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: noegroz1987
Date: 3/21/2011 12:54 am
Views: 95
Rating: 0

Thanks, Ken.

--- (Edited on 3/21/2011 12:54 pm [GMT+0700] by noegroz1987) ---

Re: Using Keith Vertanen's AM and LM in Julius
User: noegroz1987
Date: 3/21/2011 12:56 am
Views: 307
Rating: 1

> best to talk to the folks on the Julius forum

I already posted the same question there, but there's no answer yet.

--- (Edited on 3/21/2011 12:56 pm [GMT+0700] by noegroz1987) ---
