Speech Recognition Engines

Re: CMUSLM *.arpa
User: ravi
Date: 4/9/2010 6:48 am
Views: 94
Rating: 2


We are building a small application that asks the user to read out a given sentence and then shows which words from the sentence were spoken correctly and which were spoken incorrectly.

We have been using the online tool at http://speech.cs.cmu.edu/tools/lmtool.html to create LM files for the sentences in the application, and it works well.

I tried creating an LM file with a recent nightly build of cmuclmtk, using the exe files from bin.

These new LM files (from cmuclmtk) do not work in our application, but the application does work with LM files generated by the online tool.

I have used 

  • text2wfreq.exe -hash 1000000 -verbosity 2 < corpus.txt > corpus.wfreq
  • wfreq2vocab.exe -top 20000 -records 1000000 -verbosity 2 < corpus.wfreq > corpus.vocab
  • text2idngram.exe -vocab corpus.vocab -files 25 -n 3 -write_ascii -fof_size 10 -verbosity 2 < corpus.txt > corpus.idngram
  • idngram2lm -idngram corpus.idngram -vocab corpus.vocab -arpa corpus.arpa -ascii_input

and got the .arpa file, which I renamed to .lm to test in our application. We have about 300 unique words in the text file.
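
For anyone who wants to repeat these four steps without retyping them, here is a rough Python sketch of the same pipeline. It only wraps the commands and flags listed above; the function names, the `exe_suffix` and `tool_dir` parameters, and the assumption that the tools are reachable from the working directory are mine and not part of cmuclmtk. It does not change what the tools produce, so it will not by itself fix the problem described here.

```python
import subprocess
from pathlib import Path


def build_steps(stem, tool_dir="", exe_suffix=".exe"):
    """Return (argv, stdin_file, stdout_file) for each step listed in the post.

    stem       -- base name of the corpus, e.g. "corpus" for corpus.txt
    tool_dir   -- hypothetical folder holding the cmuclmtk binaries ("" = on PATH)
    exe_suffix -- assumed ".exe" because the post runs the Windows builds
    """
    def tool(name):
        full = name + exe_suffix
        return str(Path(tool_dir) / full) if tool_dir else full

    txt, wfreq = stem + ".txt", stem + ".wfreq"
    vocab, idngram, arpa = stem + ".vocab", stem + ".idngram", stem + ".arpa"
    return [
        # The first three tools read stdin and write stdout (the "<" and ">").
        ([tool("text2wfreq"), "-hash", "1000000", "-verbosity", "2"], txt, wfreq),
        ([tool("wfreq2vocab"), "-top", "20000", "-records", "1000000",
          "-verbosity", "2"], wfreq, vocab),
        ([tool("text2idngram"), "-vocab", vocab, "-files", "25", "-n", "3",
          "-write_ascii", "-fof_size", "10", "-verbosity", "2"], txt, idngram),
        # The last tool takes its input and output files as arguments.
        ([tool("idngram2lm"), "-idngram", idngram, "-vocab", vocab,
          "-arpa", arpa, "-ascii_input"], None, None),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first non-zero exit code."""
    for argv, src, dst in steps:
        fin = open(src, "rb") if src else None
        fout = open(dst, "wb") if dst else None
        try:
            subprocess.run(argv, stdin=fin, stdout=fout, check=True)
        finally:
            for handle in (fin, fout):
                if handle is not None:
                    handle.close()
```

Calling `run_steps(build_steps("corpus"))` in the folder that contains corpus.txt should leave corpus.arpa next to it, provided the binaries are found.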

Can you please suggest where I am going wrong?



--- (Edited on 4/9/2010 6:48 am [GMT-0500] by Visitor) ---

Re: CMUSLM *.arpa
User: nsh
Date: 4/9/2010 8:36 pm
Views: 39
Rating: 1

> Can you please suggest where I am going wrong?

You are asking on the wrong forum (it is much better to ask on the CMUSphinx forums) and in the wrong thread (it is not polite to continue a discussion that ended a year ago).

--- (Edited on 4/10/2010 05:36 [GMT+0400] by nsh) ---

Re: CMUSLM *.arpa
User: chn
Date: 4/10/2010 3:34 am
Views: 2685
Rating: 3

I think you should delete lines 2-8 of the *.arpa file, then rename it to ".lm".
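
If you would rather not edit the file by hand, something like the Python sketch below could do it. It assumes that "2-8" means lines 2 through 8 inclusive, counted from 1, and that those lines in your file really are the ones to drop; check them in a text editor first. The function names are only for illustration.

```python
from pathlib import Path


def drop_line_range(lines, first=2, last=8):
    """Return the lines with 1-indexed lines first..last (inclusive) removed."""
    return [line for number, line in enumerate(lines, start=1)
            if not first <= number <= last]


def arpa_to_lm(arpa_path, first=2, last=8):
    """Write a copy of the .arpa file without the given lines, named *.lm.

    The original .arpa file is left untouched, so the step is easy to undo.
    """
    src = Path(arpa_path)
    text = src.read_text(encoding="utf-8", errors="replace")
    kept = drop_line_range(text.splitlines(keepends=True), first, last)
    dst = src.with_suffix(".lm")
    dst.write_text("".join(kept), encoding="utf-8")
    return dst
```

For example, `arpa_to_lm("corpus.arpa")` would write corpus.lm next to the original file.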

--- (Edited on 4/10/2010 3:34 am [GMT-0500] by chn) ---