
Running Julian Live

1. First you need to create your Julian configuration file.  Copy this sample configuration file (julian.jconf) to your 'voxforge/manual' folder.  For details on the parameters contained in the julian.jconf file, see the Julius manual.  The main parameters are shown below:

## Grammar definition file (DFA and dictionary)
-dfa sample.dfa
-v sample.dict

## Acoustic HMM file
-h hmm15/hmmdefs
## HMMList that maps logical triphone to physical ones.
-hlist tiedlist
-smpFreq 48000		# sampling rate (Hz) 


2. Make sure your Microphone volume is similar to when you created your audio files. Then run Julian with:

$ julian -input mic -C julian.jconf


Click here to see what your Julian startup output should look like.

The first 2-3 seconds of your speech will not be recognized while Julian adjusts its recognition levels (that is what the startup message about there being "no CMN parameter is available on startup" refers to).  In addition, Julian will only recognize phrases from the grammar you created in Step 1.

Click here to see what the Julian Recognition Output looks like when I say "Phone Steve" into my microphone.

You should get fair recognition results.  To improve recognition, your Acoustic Model needs more audio training data.  You need to create new prompts, and record more speech audio files based on these prompts in order to create better acoustic models.  You can speed up the training process by using the Acoustic Model creation script in the How-to (i.e. How-to Create an Acoustic Model - using a script).


Comments


How to improve recognition
By kmaclean - 4/18/2008

If you have a small grammar, the following things can help improve recognition performance:

  • using a noise model,
  • using word-based HMMs with more states (rather than phone-based HMMs),
  • not using context-independent models

 

how to discard useless result?
By manio - 5/19/2008 - 2 Replies

Julian finds the best-fitting sentence in the grammar every time I speak. Even when there is only noise, it still returns a result.

How can I tell whether a result came from purposeful speech or from noise?

Should I use score1? I found that the scores for purposeful speech and for noise are nearly the same. Where is the boundary?

Is there another way?

Thanks

Result for noise
--------------------------------------

pass1_best: <s> <n><num>1</num>ZYF</n>
pass1_best_wordseq: 0 2
pass1_best_phonemeseq: sil | jh ow r er n f aa
pass1_best_score: -10684.339844

length: 318 frames (1.06 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=318
sentence1: <s> <n><num>3</num>ZMY</n> </s>
wseq1: 0 2 1
phseq1: sil | jh aa ng m ae ng y iy | sil
cmscore1: 1.000 0.639 1.000
score1: -13599.010742
6 generated, 6 pushed, 4 nodes popped in 318

Result for purposeful voice
---------------------------------------------------------

pass1_best: <s> <n><num>1</num>ZYF</n>
pass1_best_wordseq: 0 2
pass1_best_phonemeseq: sil | jh ow r er n f aa
pass1_best_score: -12398.662109

length: 386 frames (1.28 sec.)
### Recognition: 2nd pass (RL heuristic best-first with DFA)
samplenum=386
sentence1: <s> <n><num>2</num>LDH</n> </s>
wseq1: 0 2 1
phseq1: sil | y ow b aa hh aa | sil
cmscore1: 1.000 1.000 1.000
score1: -14228.536133
8 generated, 8 pushed, 4 nodes popped in 386
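
One possible way to approach this question is to look at the per-word confidence on the cmscore1 line instead of score1. In the two outputs above, the noise result has a word confidence of 0.639 while the purposeful result has 1.000 for every word. The Python sketch below parses those lines and rejects a result when any word falls under a threshold. The function names and the 0.8 threshold are assumptions made for illustration, not Julian settings, and two samples are far too few to fix a reliable boundary.

```python
import re

def parse_result(text):
    """Extract frame count, score1 and cmscore1 from one block of Julian output."""
    frames = int(re.search(r"length:\s*(\d+)\s*frames", text).group(1))
    score = float(re.search(r"^score1:\s*(-?\d+(?:\.\d+)?)", text, re.M).group(1))
    cm_line = re.search(r"^cmscore1:\s*(.+)$", text, re.M).group(1)
    cmscores = [float(x) for x in cm_line.split()]
    return frames, score, cmscores

def score_per_frame(text):
    """Normalize score1 by utterance length so results of different lengths compare."""
    frames, score, _ = parse_result(text)
    return score / frames

def looks_like_speech(text, min_confidence=0.8):
    """Accept a result only if every word confidence reaches the threshold.

    min_confidence is a hypothetical value; tune it on your own recordings.
    """
    _, _, cmscores = parse_result(text)
    return min(cmscores) >= min_confidence

# Lines copied from the two outputs in the post above.
noise = """length: 318 frames (1.06 sec.)
cmscore1: 1.000 0.639 1.000
score1: -13599.010742
"""
voice = """length: 386 frames (1.28 sec.)
cmscore1: 1.000 1.000 1.000
score1: -14228.536133
"""

for name, block in (("noise", noise), ("voice", voice)):
    print(name, looks_like_speech(block), round(score_per_frame(block), 2))
```

The per-frame score is printed only as a second thing to inspect. Whether it separates noise from speech on a given setup would need to be checked on many more recordings.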

Congratulations
By royerfa - 2/27/2008 - 8 Replies

Thanks a lot for this tutorial.

It is a really good help for getting started with an SRE.

I did the tutorial, but I am not really satisfied with the Julian results: it recognizes fewer than one sentence in four.

Quite a bad result, no?

I recorded the samples using Audacity at a sample rate of 98000Hz. Maybe that is the cause of my problem; what do you think?

I did not forget to change the sampling rate in the jconf.

What should I do to improve the recognition?

Thanks,

Fab
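
A quick sanity check for the situation described above is to compare the rate stored in a recording's WAV header with the -smpFreq value in the jconf file. The sketch below does that with Python's standard wave module. The function names are made up for illustration, and it assumes plain PCM WAV files and a jconf that sets -smpFreq on a line of its own.

```python
import re
import wave

def jconf_sample_rate(jconf_text):
    """Return the value of -smpFreq in jconf text, or None if it is absent."""
    for line in jconf_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        m = re.match(r"\s*-smpFreq\s+(\d+)", line)
        if m:
            return int(m.group(1))
    return None

def wav_sample_rate(path):
    """Read the sampling rate from a WAV file header."""
    with wave.open(path, "rb") as w:
        return w.getframerate()

def rates_match(jconf_text, wav_path):
    """True when the jconf -smpFreq equals the WAV header rate."""
    return jconf_sample_rate(jconf_text) == wav_sample_rate(wav_path)
```

For example, rates_match(open("julian.jconf").read(), "sample1.wav") would compare the tutorial's configuration with one recording; "sample1.wav" is a placeholder file name. A match only shows that the settings are consistent. It does not show that the sound card or the acoustic model training used the same rate.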

error when running Julius
By amza - 1/15/2008 - 1 Replies

First of all, thanks a lot for your earlier help; your answers to my questions have been really useful. I have now trained HMM models on several Indonesian audio files. I then built a language model, a 2-gram model in ARPA format, using the tool provided at "http://www.speech.cs.cmu.edu/tools/lm.html".

I use all of those models, acoustic model and language model, with Julius to test some speech. The content of jconf file (julius.jconf) I use is:

-nlr model/grammar
-v dict
-h hmm15/hmmdefs
-hlist tiedlist
-gprune safe     
-input rawfile        # audio waveform data file (format detected automatically)
            # format: WAV (16bit) or
            #    RAW(16bit(signed short),mono,big-endian)
            #    for files that are not 16kHz, specify the frequency with -smpFreq
-filelist listfile.txt    # list of files to recognize
-smpFreq 16000        # sampling frequency (Hz)
-smpPeriod 625    # sampling period (ns) (= 10000000 / smpFreq)
-demo            # same as "-progout -quiet"

When I run Julius by typing: "julius -C julius.jconf", there are errors (I show you the whole results) as follows:

$ julius -C julius.jconf
STAT: include config: julius.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found:
"sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: init_phmm: defined HMMs:    23
Stat: init_phmm: logical names:   117 in HMMList
Stat: init_phmm: base phones:    23 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 92 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
Stat: init_voca: read 23 words
Stat: init_ngram: reading in ARPA forward n-gram from model/grammar
Stat: ngram_read_arpa: this is 2-gram file
Stat: ngram_read_arpa: reading 1-gram part...
Stat: ngram_read_arpa: read 19 1-gram entries
Stat: ngram_read_arpa: reading 2-gram part...
Stat: ngram_read_arpa: 2-gram read 0 (0%)
Stat: ngram_read_arpa: 2-gram read 32 end
Stat: ngram_compact_context: bigram bowt compaction: 32 -> 0
Error: mymalloc: failed to reallocate 0 bytes

Before that, I have also experienced "Error: ngram_compact_context: 2-gram has no upper 3-gram, but not 0.0 back-off weight".

Could you help me to solve those two errors? Thanks a lot.

 

regards,

Amalia Zahra

error in loading model when executing julius
By amza - 1/10/2008 - 2 Replies

I have prepared all the models needed to run Julius. The contents of the jconf file (julius.jconf) I use to run Julius are:

-nlr model/grammar        # 2-gram
-v dict
-h hmm15/hmmdefs
-hlist tiedlist
-gprune safe        # safe pruning: the top N are reliably obtained; accurate
-n 10
-output 10        # number of sentences found in the 2nd pass to output (sentence count)
-input rawfile        # audio waveform data file (format detected automatically)
            # format: WAV (16bit) or
            #    RAW(16bit(signed short),mono,big-endian)
            #    for files that are not 16kHz, specify the frequency with -smpFreq
-filelist listfile.txt    # list of files to recognize
-zmean            # remove the DC component (no effect with -input mfcfile)
-rejectshort 100    # reject input shorter than the given number of milliseconds
-lv 10000        # level threshold (0-32767)
-smpFreq 16000        # sampling frequency (Hz)
-smpPeriod 625    # sampling period (ns) (= 10000000 / smpFreq)
-demo            # same as "-progout -quiet"

When I run julius by typing:

$ julius -C julius.jconf

There were errors as follows:

STAT: include config: julius.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found:
"sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: init_phmm: defined HMMs:    17
Stat: init_phmm: logical names:    46 in HMMList
Stat: init_phmm: base phones:    17 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 28 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
Error: voca_load_htkdict: line 1: triphone "uh-ah+sp" not found
Error: voca_load_htkdict: line 1: triphone "ah-sp+*" or biphone "ah-sp" not found
Error: voca_load_htkdict: the line content was: DUA             [DUA]           d uh ah sp
Error: voca_load_htkdict: line 2: triphone "ah-t+sp" not found
Error: voca_load_htkdict: line 2: triphone "t-sp+*" or biphone "t-sp" not found
Error: voca_load_htkdict: the line content was: EMPAT           [EMPAT]         ax m p ah t sp
Error: voca_load_htkdict: line 3: triphone "ah-m+sp" not found
Error: voca_load_htkdict: line 3: triphone "m-sp+*" or biphone "m-sp" not found
Error: voca_load_htkdict: the line content was: ENAM            [ENAM]          ax n ah m sp
Error: voca_load_htkdict: line 4: triphone "oh-ng+sp" not found
Error: voca_load_htkdict: line 4: triphone "ng-sp+*" or biphone "ng-sp" not found
Error: voca_load_htkdict: the line content was: KOSONG          [KOSONG]        k oh sh oh ng sp
Error: voca_load_htkdict: line 5: triphone "m-ah+sp" not found
Error: voca_load_htkdict: line 5: triphone "ah-sp+*" or biphone "ah-sp" not found
Error: voca_load_htkdict: the line content was: LIMA            [LIMA]          l ih m ah sp
Error: voca_load_htkdict: line 6: triphone "t-uh+sp" not found
Error: voca_load_htkdict: line 6: triphone "uh-sp+*" or biphone "uh-sp" not found
Error: voca_load_htkdict: the line content was: SATU            [SATU]          sh ah t uh sp
Error: voca_load_htkdict: line 9: triphone "g-ah+sp" not found
Error: voca_load_htkdict: line 9: triphone "ah-sp+*" or biphone "ah-sp" not found
Error: voca_load_htkdict: the line content was: TIGA            [TIGA]          t ih g ah sp
Error: voca_load_htkdict: begin missing phones
Error: voca_load_htkdict: ah-m+sp
Error: voca_load_htkdict: ah-sp+* or biphone ah-sp
Error: voca_load_htkdict: ah-t+sp
Error: voca_load_htkdict: g-ah+sp
Error: voca_load_htkdict: m-ah+sp
Error: voca_load_htkdict: m-sp+* or biphone m-sp
Error: voca_load_htkdict: ng-sp+* or biphone ng-sp
Error: voca_load_htkdict: oh-ng+sp
Error: voca_load_htkdict: t-sp+* or biphone t-sp
Error: voca_load_htkdict: t-uh+sp
Error: voca_load_htkdict: uh-ah+sp
Error: voca_load_htkdict: uh-sp+* or biphone uh-sp
Error: voca_load_htkdict: end missing phones
Error: init_voca: error in reading dict: 7 words failed out of 2 words
ERROR: m_fusion: failed to read dictionary, terminated
ERROR: m_fusion: failed to initialize dictionary
ERROR: Error in loading model

I know that those missing phones do not exist in the "triphones1" file, but that file was generated by HTKToolkit. I think the problem occurs because the "dict" file, which was also generated by HTKToolkit, contains entries like "DUA             [DUA]           d uh ah sp" that already end in "sp". Can you help me solve this problem? Thank you very much.

 

regards,

Amalia Zahra
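
If the trailing "sp" in each dictionary entry is indeed the cause, as the post above suspects, one thing to try is a copy of the dictionary with that final phone removed, so that Julius no longer looks for triphones such as "uh-ah+sp". The Python sketch below shows the idea. The function name is made up for illustration, and it assumes entries of the form WORD [OUTPUT] phone phone ..., as in the error log above. Treat it as an experiment, not a confirmed fix.

```python
def strip_trailing_sp(dict_line):
    """Remove a final 'sp' phone from one HTK-style dictionary line.

    Entries whose only phone is 'sp' (or that have no trailing 'sp')
    are returned unchanged apart from trailing whitespace.
    """
    stripped = dict_line.rstrip()
    fields = stripped.split()
    # Skip the optional bracketed output symbol, e.g. [DUA], when present.
    phones_start = 2 if len(fields) > 1 and fields[1].startswith("[") else 1
    phones = fields[phones_start:]
    if len(phones) > 1 and phones[-1] == "sp":
        return stripped[: -len("sp")].rstrip()
    return stripped

print(strip_trailing_sp("DUA             [DUA]           d uh ah sp"))
```

Applying this to every line of "dict" and writing the result to a new file keeps the original intact, so the two versions can be compared when running Julius.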