Speech Recognition Engines

Re: CMUSLM *.arpa
User: nsh
Date: 8/24/2008 1:53 am

Are you sure you recompiled it? Everything works for me:

2 9 0 1
2 10 13 1
2 11 4 1
2 11 8 1
7 13 1 1
8 12 1 1
9 0 1 1
10 13 1 1
11 4 1 1
11 8 12 1

--- (Edited on 8/24/2008 1:53 am [GMT-0500] by nsh) ---

Re: CMUSLM *.arpa
User: chn
Date: 8/24/2008 3:28 am

When I run autoconf, I get:

[cui@localhost cmuclmtk]$ autoconf
configure.ac:6: error: possibly undefined macro: AM_INIT_AUTOMAKE
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
So I commented out the macro in configure.ac:

#AM_INIT_AUTOMAKE

Then I ran autoconf again, and it passed.

But when I run ./configure, it fails:

[cui@localhost cmuclmtk]$ ./configure
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for ranlib... ranlib
configure: error: cannot find install-sh or install.sh in "." "./.." "./../.."
 

I have not been able to resolve this.

--- (Edited on 8/24/2008 3:29 am [GMT-0500] by chn) ---

Re: CMUSLM *.arpa
User: nsh
Date: 8/24/2008 3:42 am

You are certainly not using the latest version; that is the older one. Check out the latest with Subversion:

svn checkout https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmuclmtk

--- (Edited on 8/24/2008 3:42 am [GMT-0500] by nsh) ---

Re: CMUSLM *.arpa
User: chn
Date: 8/24/2008 4:30 am

Thank you very much!

I got it!

Thanks again! 

--- (Edited on 8/24/2008 4:30 am [GMT-0500] by chn) ---

Re: CMUSLM *.arpa
User: chn
Date: 8/24/2008 8:14 pm

Excuse me!

When I generate the *.arpa file with -n 3, the 3-grams are the same as what the online tool produces, but the 2-grams are not!

What I got is:

\data\
ngram 1=13
ngram 2=12
ngram 3=13

\1-grams:
-99.0000 </s>    0.0000
-99.0000 <s>    0.0688
-1.1206 BACKWARD    0.0000
-1.1206 BROWSER    0.0000
-1.1206 EMAIL    0.0000
-1.1206 FORWARD    0.0000
-1.0414 LAST    -0.2668
-1.0414 MUSIC    -0.2668
-1.0414 NEW    -0.2668
-1.0414 NEXT    -0.2668
-0.7404 OPEN    -0.2218
-1.1206 PLAYER    0.0000
-1.1206 WINDOW    0.0000

\2-grams:
-1.1139 <s> BACKWARD -0.3010
-1.1139 <s> FORWARD -0.3010
-1.1139 <s> LAST 0.0000
-1.1139 <s> NEW 0.0000
-1.1139 <s> NEXT 0.0000
-0.8129 <s> OPEN 0.0000
-0.3010 LAST WINDOW -0.3010
-0.3010 MUSIC PLAYER -0.3010
-0.3010 NEW EMAIL -0.3010
-0.3010 NEXT WINDOW -0.3010
-0.6021 OPEN BROWSER -0.3010
-0.6021 OPEN MUSIC 0.0000

\3-grams:
-0.3010 <s> BACKWARD </s>
-0.3010 <s> FORWARD </s>
-0.3010 <s> LAST WINDOW
-0.3010 <s> NEW EMAIL
-0.3010 <s> NEXT WINDOW
-0.6021 <s> OPEN BROWSER
-0.6021 <s> OPEN MUSIC
-0.3010 LAST WINDOW </s>
-0.3010 MUSIC PLAYER </s>
-0.3010 NEW EMAIL </s>
-0.3010 NEXT WINDOW </s>
-0.3010 OPEN BROWSER </s>
-0.3010 OPEN MUSIC PLAYER

\end\

What I got from the online tool is:

\data\
ngram 1=13
ngram 2=18
ngram 3=13

\1-grams:
-0.8873 </s> -0.3010
-0.8873 <s> -0.2407
-1.7324 BACKWARD -0.2407
-1.7324 BROWSER -0.2407
-1.7324 EMAIL -0.2407
-1.7324 FORWARD -0.2407
-1.7324 LAST -0.2846
-1.7324 MUSIC -0.2929
-1.7324 NEW -0.2929
-1.7324 NEXT -0.2846
-1.4314 OPEN -0.2846
-1.7324 PLAYER -0.2407
-1.4314 WINDOW -0.2407

\2-grams:
-1.1461 <s> BACKWARD 0.0000
-1.1461 <s> FORWARD 0.0000
-1.1461 <s> LAST 0.0000
-1.1461 <s> NEW 0.0000
-1.1461 <s> NEXT 0.0000
-0.8451 <s> OPEN 0.0000
-0.3010 BACKWARD </s> -0.3010
-0.3010 BROWSER </s> -0.3010
-0.3010 EMAIL </s> -0.3010
-0.3010 FORWARD </s> -0.3010
-0.3010 LAST WINDOW 0.0000
-0.3010 MUSIC PLAYER 0.0000
-0.3010 NEW EMAIL 0.0000
-0.3010 NEXT WINDOW 0.0000
-0.6021 OPEN BROWSER 0.0000
-0.6021 OPEN MUSIC 0.0000
-0.3010 PLAYER </s> -0.3010
-0.3010 WINDOW </s> -0.3010

\3-grams:
-0.3010 <s> BACKWARD </s>
-0.3010 <s> FORWARD </s>
-0.3010 <s> LAST WINDOW
-0.3010 <s> NEW EMAIL
-0.3010 <s> NEXT WINDOW
-0.6021 <s> OPEN BROWSER
-0.6021 <s> OPEN MUSIC
-0.3010 LAST WINDOW </s>
-0.3010 MUSIC PLAYER </s>
-0.3010 NEW EMAIL </s>
-0.3010 NEXT WINDOW </s>
-0.3010 OPEN BROWSER </s>
-0.3010 OPEN MUSIC PLAYER

\end\

Thanks! 

--- (Edited on 8/24/2008 8:14 pm [GMT-0500] by chn) ---

Re: CMUSLM *.arpa
User: nsh
Date: 8/25/2008 2:41 am

Hm, indeed there is a problem. The QuickLm script generates exactly the correct output:

  http://www.speech.cs.cmu.edu/tools/download/quick_lm.pl

but it's not efficient. I'll try to find out what the problem with cmuclmtk is.
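
To see exactly which entries differ, the two files can be compared section by section. Below is a minimal sketch in Python, assuming both models are available as ARPA text. The function names and the shortened sample strings are made up for illustration and are not part of cmuclmtk or quick_lm.pl. Run on the two full listings posted above, a comparison like this should show that the cmuclmtk output lacks the six bigrams ending in </s> (BACKWARD, BROWSER, EMAIL, FORWARD, PLAYER, WINDOW).

```python
from collections import defaultdict


def read_arpa_ngrams(text):
    """Return {order: set of word tuples} parsed from ARPA-format text."""
    ngrams = defaultdict(set)
    order = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # "\2-grams:" opens a section; "\data\" and "\end\" close it.
            if line.endswith("-grams:"):
                order = int(line[1:line.index("-")])
            else:
                order = None
            continue
        if order is None:
            continue  # skips the "ngram N=count" lines under \data\
        fields = line.split()
        # fields[0] is the log10 probability, then `order` words,
        # then an optional backoff weight.
        ngrams[order].add(tuple(fields[1:1 + order]))
    return ngrams


def missing_ngrams(reference, candidate, order):
    """N-grams of the given order present in reference but not in candidate."""
    return sorted(reference[order] - candidate[order])


# Shortened excerpts for illustration only, not the full models.
ONLINE = r"""
\data\
ngram 2=3

\2-grams:
-0.3010 NEW EMAIL 0.0000
-0.3010 EMAIL </s> -0.3010
-0.3010 WINDOW </s> -0.3010

\end\
"""

LOCAL = r"""
\data\
ngram 2=1

\2-grams:
-0.3010 NEW EMAIL -0.3010

\end\
"""

for words in missing_ngrams(read_arpa_ngrams(ONLINE), read_arpa_ngrams(LOCAL), 2):
    print("missing 2-gram:", " ".join(words))
```

The comparison looks only at which n-grams are present, not at their probabilities or backoff weights, so it helps locate missing entries but does not explain differing values.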

 

--- (Edited on 8/25/2008 2:41 am [GMT-0500] by nsh) ---

Re: CMUSLM *.arpa
User: sarvesh
Date: 5/20/2009 7:35 am

Hey, after installing cmuclmtk using

make install

I am unable to run any of its programs, such as text2wfreq.

Can you give me the exact steps to follow to build a language model using cmuclmtk?

Thanks

--- (Edited on 5/20/2009 7:35 am [GMT-0500] by sarvesh) ---

Re: CMUSLM *.arpa
User: chn
Date: 5/20/2009 8:10 pm

CMU-Cambridge Statistical Language Modeling Toolkit v2
======================================================

Documentation:
--------------

For installation and usage instructions for the toolkit, see

doc/toolkit_documentation.html

(for the sake of convenience, the installation instructions are also
given below).

Installation:
-------------

For "big-endian" machines (eg those running HP-UX, IRIX, SunOS,
Solaris) the installation procedure is simply to type

  cd src
  make install

The executables will then be copied into the bin/ directory, and the
library file SLM2.a will be copied into the lib/ directory.

For "little-endian" machines (eg those running Ultrix, Linux) the
variable "BYTESWAP_FLAG" will need to be set in the Makefile. This can
be done by editing src/Makefile directly, so that the line

#BYTESWAP_FLAG  = -DSLM_SWAP_BYTES

is changed to

BYTESWAP_FLAG  = -DSLM_SWAP_BYTES

Then the program can be installed as before.

If you are unsure of the "endian-ness" of your machine, then the shell
script endian.sh should be able to provide some assistance.
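
As a rough cross-check alongside endian.sh, the byte order can also be queried from Python's standard library. This is only a sketch: it assumes a Python interpreter is available on the build machine, and the helper name needs_byteswap_flag is made up for illustration; it is not part of the toolkit.

```python
import sys


def needs_byteswap_flag(byteorder=sys.byteorder):
    """Return True when BYTESWAP_FLAG should be uncommented in src/Makefile.

    Per the installation notes above, little-endian machines need
    -DSLM_SWAP_BYTES and big-endian machines do not.
    """
    if byteorder not in ("little", "big"):
        raise ValueError("unexpected byte order: %r" % (byteorder,))
    return byteorder == "little"


if needs_byteswap_flag():
    print("little-endian: set BYTESWAP_FLAG = -DSLM_SWAP_BYTES")
else:
    print("big-endian: leave BYTESWAP_FLAG commented out")
```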

In case of problems, then more information can be found by examining
src/Makefile.

Files:
------

endian.sh  Shell script to report "endian-ness" (see installation
   instructions). Not terribly robust; needs to be able to see gcc,
   for example.

doc/toolkit_documentation.html   The standard html documentation for the
   toolkit. View using netscape or equivalent.

doc/toolkit_documentation_no_tables.html   As above, but doesn't use
   tables, so is suitable for use with browsers which don't support
   tables (eg lynx).

doc/toolkit_documentation.txt   The documentation in flat text.

doc/change_log.html   List of changes from version to version.

doc/change_log.txt   The above in flat text.

src/*.c src/*.h  The toolkit source files

src/Makefile  The standard make file.

src/install-sh  Shell script to install executables in the appropriate
   directory. An improvement on cp, as it will check to see whether it is
   about to overwrite an executable which is already in use.

include/SLM2.h   File containing all of src/*.h, allowing
   functions from the toolkit to be included in new software.

bin/   Directory where executables will be installed.

lib/   Directory where SLM2.a will be stored (useful in conjunction with
   include/SLM2.h for including functions from the toolkit in new
   software).

 

#!/usr/bin/perl
# Build an n-gram language model with the CMU-Cambridge toolkit.
use strict;
use warnings;

# The corpus is read from $INPUT_DIR/$INPUT_NAME.text; outputs go to $OUTPUT_DIR.
my $INPUT_NAME  = "ec-asr_train.transcription";
my $INPUT_DIR   = "/root/Desktop";
my $OUTPUT_NAME = "ec-asr.word.transcription";
my $OUTPUT_DIR  = "/root/Desktop";
my $BIN_DIR     = "/root/tutorial/CMU-Cam_Toolkit_v2/bin";
my $n           = 3;    # n-gram order

my $text = "$INPUT_DIR/$INPUT_NAME.text";
my $base = "$OUTPUT_DIR/$OUTPUT_NAME";

# Run a shell command and stop at the first failure.
sub run {
    my ($cmd) = @_;
    system($cmd) == 0 or die "Command failed: $cmd\n";
}

# Count word frequencies, then derive the vocabulary.
run("$BIN_DIR/text2wfreq <$text >$base.wfreq");
run("$BIN_DIR/wfreq2vocab <$base.wfreq >$base.temp.vocab");

# Generate the .idngram file directly from the text.
run("$BIN_DIR/text2idngram -n $n -vocab $base.temp.vocab <$text >$base.dir.idngram");

# Generate the .arpa and .binlm language models from the same options.
my $lm_opts = "-vocab_type 0 -idngram $base.dir.idngram -vocab $base.temp.vocab -n $n -witten_bell -context $INPUT_DIR/a.ccs";
run("$BIN_DIR/idngram2lm $lm_opts -arpa $base.dir.${n}gram.arpa");
run("$BIN_DIR/idngram2lm $lm_opts -binary $base.dir.${n}gram.binlm");

# Evaluate the binary model (disabled).
#run("$BIN_DIR/evallm -binary $base.dir.${n}gram.binlm");

# Generate the .DMP file.
run("$BIN_DIR/lm3g2dmp $base.dir.${n}gram.arpa $OUTPUT_DIR");

--- (Edited on 5/20/2009 8:10 pm [GMT-0500] by chn) ---

Re: CMUSLM *.arpa
User: sarvesh
Date: 9/3/2009 4:55 am

I built the "open vocabulary model (type 2)" using cmuclmtk, but I don't know why no values are assigned to

2-gram discounting ratios :

3-gram discounting ratios :

Only this one gets a value:

1-gram discounting ratios : 0.81

Can you figure out the cause of this?

--- (Edited on 9/3/2009 3:25 pm [GMT+0530] by sarvesh) ---

Re: CMUSLM *.arpa
User: chn
Date: 9/5/2009 10:52 pm

Read what I wrote above; I think you will find the answer there!

--- (Edited on 9/5/2009 10:52 pm [GMT-0500] by chn) ---
