I have downloaded Julius 4.1.5 and tried to use it for dictation. The sample comes with a grammar, but I want to use an n-gram language model instead. I created my LM with the mkbingram tool from a 3-gram file in ARPA format:
mkbingram -nlr mylm.arpa lmf2r3.bin
The acoustic model was downloaded from the VoxForge Julius nightly build.
When I run Julius to transcribe my recording:
julius -C julian.jconf
>input file: myvoice.raw
I get "<s></s>"; basically, it does not recognize anything.
myvoice.raw is an 8 kHz big-endian audio file.
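One way to sanity-check a headerless file like this before handing it to Julius is to decode it by hand. The sketch below is only an illustration: it assumes 16-bit signed, mono, big-endian samples at 8 kHz (as described above), and the helper name `raw_stats` is made up for this example. It reports the sample count, the duration in seconds, and the peak amplitude.

```python
import struct

def raw_stats(data: bytes, sample_rate: int = 8000):
    """Interpret `data` as 16-bit signed big-endian mono PCM (an assumption).

    Returns (number of samples, duration in seconds, peak absolute amplitude).
    """
    n = len(data) // 2                                   # two bytes per sample
    samples = struct.unpack(">%dh" % n, data[: n * 2])   # ">" = big-endian, "h" = int16
    peak = max((abs(s) for s in samples), default=0)
    return n, n / sample_rate, peak
```

For example, `raw_stats(open("myvoice.raw", "rb").read())` would give the figures for the recording. If the duration is far from the real length of the recording, or the peak sits near 32767 for ordinary speech, the byte order or sample width may not be what was assumed.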
Could anyone point me to where the problem is?
Here is my configuration file:
#### Files
## Grammar definition file (DFA and dictionary)
#### There are three ways to specify the grammar files.
#### (1) and (2) can be used multiple times.
#### (1) Specify by common prefix of .dfa and .dict files. Comma-separated
#### prefixes can be specified for multiple grammar recognition
#-gram /cdrom/testrun/sample_grammars/vfr/vfr
#### (2) Or you can give Julian a text file which contains list of grammar
#### prefixes one per line.
#-gramlist file
#### (3) Classic way to specify a grammar.
#-dfa grammar/sample.dfa
#-v grammar/sample.dict
-v /home/ubuntu/julius4/models/acoustic/dict
-d /home/ubuntu/julius4/models/language/lmf2r3.bin
#### If you want to clear previously specified grammars, use this at the
#### point.
## Acoustic HMM file
# support ascii hmmdefs or binary format (converted by "mkbinhmm")
# format (ascii/binary) will be automatically detected
-h models/acoustic/hmmdefs
## triphone model needs HMMList that maps logical triphone to physical ones.
-hlist models/acoustic/tiedlist
#### Multiple grammar recognition
#-multigramout # Output results for each grammar
#### Language Model
## word insertion penalty
-penalty1 5.0 # first pass
-penalty2 20.0 # second pass
#### Dictionary
## do not give up startup on error words
#### Acoustic Model
## Context-dependency handling will be enabled according to the model type.
## Try the options below if Julius wrongly detects the type of hmmdefs
#-no_ccd # disable context-dependency handling
#-force_ccd # enable context-dependency handling
## If Julius goes wrong when checking the parameter type, try the options below.
## (PTM/triphone) switch computation method of IWCD on 1st pass
#-iwcd1 best N # assign average of N-best likelihood of the same context
#-iwcd1 max # assign maximum likelihood of the same context
-iwcd1 avg # assign average likelihood of the same context (default)
#### Gaussian Pruning
## Number of mixtures to select in a mixture pdf.
## This default value is optimized for IPA99's PTM,
## with 64 Gaussians per codebook
#-tmix 2
## Select Gaussian pruning algorithm
## default: beam (standard setting), safe (others)
-gprune safe # safe pruning, accurate but slow
#-gprune heuristic # heuristic pruning
#-gprune beam # beam pruning, fast but sensitive
#-gprune none # no pruning
#### Gaussian Mixture Selection
#-gshmm hmmdefs # monophone HMM for GMS
# (OFF when not specified)
#-gsnum 24 # number of states to be selected on GMS
#### Search Parameters
#-b 400 # beam width on 1st pass (#nodes) for monophone
#-b 800 # beam width on 1st pass (#nodes) for triphone,PTM
-b 10000 # beam width on 1st pass (#nodes) for triphone,PTM,engine=v2.1
-b2 50 # beam width on 2nd pass (#words)
#-sb 200.0 # score beam envelope threshold
#-s 500 # hypotheses stack size on 2nd pass (#hypo)
#-m 2000 # hypotheses overflow threshold (#hypo)
#-lookuprange 5 # lookup range for word expansion (#frame)
#-n 1 # num of sentences to find (#sentence)
-n 10 # (default for 'standard' configuration)
#-output 1 # num of found sentences to output (#sentence)
#-looktrellis # search within only backtrellis words
#### Inter-word Short Pause Handling
## Specify short pause model name to be treated as special
-spmodel "sp" # HMM model name
## For insertion of context-free short-term inter-word pauses between words
## (multi-path version only)
-iwsp # append a skippable sp model at all word ends
-iwsppenalty -70.0 # transition penalty for the appended sp models
#### Speech Input Source
## select one (default: mfcfile)
#-input mfcfile # MFCC file in HTK parameter file format
-input rawfile # raw wavefile (auto-detect format)
# WAV(16bit) or
# RAW(16bit(signed short),mono,big-endian)
# AIFF,AU (with libsndfile extension)
# other than 16kHz, sampling rate should be specified
# by "-smpFreq" option
#-input mic # direct microphone input
# device name can be specified via env. val. "AUDIODEV"
#-input netaudio -NA host:0 # direct input from DatLink(NetAudio) host
#-input adinnet -adport portnum # via adinnet network client
#-input stdin # from standard tty input (pipe)
#-filelist filename # specify file list to be recognized in batch mode
#-nostrip # switch OFF dropping of invalid input segment.
# (default: strip off invalid segment (0 sequence etc.)
-zmean # enable DC offset removal (invalid for mfcfile input)
#### Recording
#-record directory # auto-save recognized speech data into the dir
#### GMM-based Input Verification and Rejection
#-gmm gmmdefs # specify GMM definition file in HTK format
#-gmmnum 10 # num of Gaussians to be computed per mixture
#-gmmreject "noise,laugh,cough" # list of GMM names to be rejected
#### Too Short Input Rejection
#-rejectshort 200 # reject input shorter than specified millisecond
#### Speech Detection
#-pausesegment # turn on speech detection by level and zero-cross
-nopausesegment # turn off speech detection by level and zero-cross
# (default: on for mic or adinnet, off for file)
-lv 1000 # threshold of input level (0-32767)
-headmargin 500 # head margin of input segment (msec)
-tailmargin 2000 # tail margin of input segment (msec)
-zc 60 # threshold of number of zero-cross in a second
#### Acoustic Analysis
-smpFreq 8000 # sampling rate (Hz)
-smpPeriod 1250 # sampling period (in units of 100 ns) (= 10000000 / smpFreq)
#-fsize 400 # window size (samples)
#-fshift 160 # frame shift (samples)
#-delwin 2 # delta window (frames)
#-hifreq 4000 # cut-off hi frequency (Hz) (-1: disable)
#-lofreq 10 # cut-off low frequency (Hz) (-1: disable)
#-cmnsave filename # save CMN param to file (update per input)
#-cmnload filename # load initial CMN param from file on startup
#### Spectral Subtraction (SS)
#-sscalc # do SS using head silence (file input only)
#-sscalclen 300 # length of head silence for SS (msec)
#-ssload filename # load constant noise spectrum from file for SS
#-ssalpha 2.0 # alpha coef. for SS
#-ssfloor 0.5 # spectral floor for SS
#### Forced alignment
#-walign # do forced alignment with result per word
#-palign # do forced alignment with result per phoneme
#-salign # do forced alignment with result per HMM state
#### Word Confidence Scoring
#-cmalpha 0.05 # smoothing coef. alpha
#### Output
#-separatescore # output language and acoustic score separately
-progout # output partial result per a time interval
-proginterval 300 # time interval for "-progout" (msec)
#-quiet # output minimal result
#-demo # = "-progout -quiet", suitable for dictation demo
#-debug # output full message for debug
#-charconv from to # output character set conversion (see manual for
# available code set name)
#### Server module mode
#-module # Run Julius on "Server module mode"
#-module 5530 # (when using another port number for connection)
#-outcode WLPSC # select output message toward module (WLPSCwlps)
#### Misc.
#-help # output help and exit
#-setting # output engine configuration and exit
#-C jconffile # expand other jconf file in its place
################################################################# end of file
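The comment on -smpPeriod in the configuration above gives the relation smpPeriod = 10000000 / smpFreq, which implies the period is counted in units of 100 ns. A small sketch of that arithmetic follows; the function name `smp_period` is hypothetical and is not part of Julius.

```python
def smp_period(smp_freq_hz: int) -> int:
    """Sampling period in 100 ns units, following the jconf comment:
    smpPeriod = 10000000 / smpFreq."""
    return 10_000_000 // smp_freq_hz

print(smp_period(8000))   # 1250, the value used in this jconf
print(smp_period(16000))  # 625
```

So the two settings in this file agree with each other for 8 kHz input; they say nothing about whether the acoustic model itself was trained at that rate.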
--- (Edited on 8/5/2010 3:23 pm [GMT-0500] by cchen1103) ---
To recognize 8 kHz audio you need a model trained on 8 kHz audio. The VoxForge nightly model is trained on 16 kHz audio and is not compatible with a telephone-bandwidth signal. You can try decoding 16 kHz audio first.
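A quick way to catch this kind of mismatch before decoding is to compare the recording's sample rate with the rate the acoustic model was trained on. The sketch below uses Python's standard `wave` module, so it assumes a WAV file with a header (a headerless .raw file carries no rate information). `MODEL_RATE_HZ` and `rate_matches` are illustrative names, and 16000 reflects the 16 kHz nightly model mentioned above.

```python
import wave

MODEL_RATE_HZ = 16000  # assumption: rate the acoustic model was trained on

def rate_matches(wav_file, model_rate_hz: int = MODEL_RATE_HZ) -> bool:
    """Return True if the WAV file's sample rate equals the model's rate.

    `wav_file` may be a path or a binary file object.
    """
    with wave.open(wav_file, "rb") as w:
        return w.getframerate() == model_rate_hz
```

A call such as `rate_matches("myvoice.wav")` (a hypothetical file name) returning False would mean the recording should be made again at the model's rate, or a model trained at the recording's rate should be used.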
--- (Edited on 8/9/2010 01:58 [GMT+0400] by nsh) ---
HTK_AcousticModel-2010-08-21_8kHz_16bit_MFCC_O_D.zip 21-Aug-2010 05:53 3.4M
--- (Edited on 8/22/2010 12:24 am [GMT-0400] by kmaclean) ---