
Acoustic Model Discussions

Where is the STEP 11
User: royerfa
Date: 2/29/2008 9:11 am
Views: 404
Rating: 9

Hello,

I would like to ask where Step 11 from the HTK Book is.

My recognition is very bad, so I think it is important to know whether the problem comes from the HMMs or not.

When I run the first command of Step 11, HVite tells me that I have a problem with the target. Can you explain the difference between MFCC_0 and MFCC_D_N_Z_0?

Another question: why don't the prompts in the tutorial always correspond to the grammar?

 

Thanks 

Re: Where is the STEP 11
User: kmaclean
Date: 2/29/2008 10:55 am
Views: 55
Rating: 10

Hi royerfa,

>I would like to ask where is the step 11 from the HTKbook.

Look here: Testing Your Acoustic Model with HTK & Julius

>Can you explain the difference between MFCC_0 and MFCC_D_N_Z_0 ???

See this post: MFCC_D_N_Z_0 format

See also the HTK Book: 5.10.1 HTK Format Parameter Files.
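
To make the difference concrete, here is a rough sketch (plain Python, not part of HTK) that splits a parameter kind string into its base kind and qualifiers and estimates the feature vector size. The qualifier meanings follow the HTK Book section mentioned above. The function names are made up for this example, and the size calculation assumes NUMCEPS = 12 and at most one energy-like term (_0 or _E), so treat it as an illustration only.

```python
# Qualifier meanings as described in the HTK Book (parameter kind qualifiers).
QUALIFIERS = {
    "E": "log energy appended",
    "N": "absolute energy suppressed",
    "D": "delta coefficients appended",
    "A": "acceleration coefficients appended",
    "C": "stored compressed",
    "Z": "zero-mean static coefficients",
    "K": "CRC checksum appended",
    "0": "0th cepstral coefficient appended",
}


def parse_target_kind(kind):
    """Split e.g. 'MFCC_0_D_N_Z' into ('MFCC', {'0', 'D', 'N', 'Z'})."""
    base, *quals = kind.upper().split("_")
    unknown = [q for q in quals if q not in QUALIFIERS]
    if unknown:
        raise ValueError(f"unknown qualifier(s): {unknown}")
    return base, set(quals)


def same_kind(a, b):
    """True when both strings name the same base kind and qualifier set."""
    return parse_target_kind(a) == parse_target_kind(b)


def vector_size(kind, numceps=12):
    """Estimate the vector size (assumption: at most one of _0 / _E is used)."""
    _, quals = parse_target_kind(kind)
    static = numceps + ("0" in quals) + ("E" in quals)
    size = static
    if "D" in quals:
        size += static  # one delta per static coefficient
    if "A" in quals:
        size += static  # one acceleration per static coefficient
    if "N" in quals:
        size -= 1       # absolute energy term dropped, its delta is kept
    return size


if __name__ == "__main__":
    for kind in ("MFCC_0", "MFCC_0_D_N_Z", "MFCC_D_N_Z_0"):
        print(kind, vector_size(kind))
```

Under these assumptions MFCC_0 gives 13 values per frame, while MFCC_0_D_N_Z and MFCC_D_N_Z_0 name the same kind and both give 25. Features coded as one kind cannot be used directly with models trained on the other.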

>Another question is why the prompt in the tutorial doesn't always correspond to the grammar?

Read the Background section in Step 2 - Pronunciation Dictionary of the tutorial.

Ken 

Re: Where is the STEP 11
User: royerfa
Date: 3/3/2008 9:08 am
Views: 57
Rating: 13

Hi,

Thank you for the quick answers.

This is my result

====HTK Results Analysis==========
Date: Mon Mar 3 15:53:09 2008
Ref : testref.mlf
Rec : recout.mlf
------------------------ Overall Results --------------------------
SENT: %Correct=16.00 [H=8, S=42, N=50]
WORD: %Corr=68.78, Acc=6.88 [H=130, D=4, S=55, I=117, N=189]
===========================



What do you think of that?

It seems strange that it recognises the words well but not the sentences. Moreover, Julian recognises sentences with 2 words but not longer ones, which seems to be a problem.

I am sending you my HVite_log by email so you can tell me whether it is all right, because, as you said in the tutorial, the problem may come from there.

I would like to put the Julian application on an embedded Linux target in order to recognise simple commands.

This time the sampling rate is 48 kHz, and the microphone is well configured, but there is sometimes some saturation in the training recordings; I need to fix this.

The HTK Book shows that they managed to get a result of 98%. Have you or someone else already achieved this result?

I need to understand what is going wrong with my approach to speech recognition. With my limited experience, I realise that recognition is really dependent on the training recordings.

Thanks in advance.
Royerfa
 
HVite -A -D -T 1 -l * -o SWT -b SENT-END -C config -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I words.mlf -S train.scp dict monophones1 

HTK Configuration Parameters[10]
Module/Tool Parameter Value
# NUMCEPS 12
# CEPLIFTER 22
# NUMCHANS 26
# PREEMCOEF 0.970000
# USEHAMMING TRUE
# WINDOWSIZE 250000.000000
# SAVEWITHCRC TRUE
# SAVECOMPRESSED TRUE
# TARGETRATE 100000.000000
# TARGETKIND MFCC_0_D_N_Z

Read 44 physical / 44 logical HMMs
Label file will be used to align each file
Aligning File: ../train/mfcc/sample1.mfc
Created lattice with 14 nodes / 13 arcs from label file
SENT-END DIAL ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE OH ZERO SENT-END == [1229 frames] -51.9141 [Ac=-63802.4 LM=0.0] (Act=14.4)
Aligning File: ../train/mfcc/sample2.mfc
Created lattice with 14 nodes / 13 arcs from label file
SENT-END DIAL ONE THREE FIVE SEVEN NINE ZERO TWO FOUR SIX EIGHT OH SENT-END == [1383 frames] -49.2045 [Ac=-68049.8 LM=0.0] (Act=10.6)
Aligning File: ../train/mfcc/sample3.mfc
Created lattice with 14 nodes / 13 arcs from label file
SENT-END DIAL ZERO NINE SEVEN FIVE THREE ONE OH EIGHT SIX FOUR TWO SENT-END == [1580 frames] -49.5402 [Ac=-78273.6 LM=0.0] (Act=15.6)
Aligning File: ../train/mfcc/sample4.mfc
Created lattice with 13 nodes / 12 arcs from label file
SENT-END DIAL ONE ONE TWO TWO THREE THREE FOUR FOUR FIVE FIVE SENT-END == [1370 frames] -48.6649 [Ac=-66670.9 LM=0.0] (Act=16.6)
Aligning File: ../train/mfcc/sample5.mfc
Created lattice with 15 nodes / 14 arcs from label file
SENT-END DIAL SIX SIX SEVEN SEVEN EIGHT EIGHT NINE NINE OH OH ZERO ZERO SENT-END == [1781 frames] -49.7491 [Ac=-88603.1 LM=0.0] (Act=17.3)
Aligning File: ../train/mfcc/sample6.mfc
Created lattice with 8 nodes / 7 arcs from label file
SENT-END PHONE STEVE YOUNG CALL STEVE YOUNG SENT-END == [883 frames] -48.4922 [Ac=-42818.6 LM=0.0] (Act=10.1)
Aligning File: ../train/mfcc/sample7.mfc
Created lattice with 10 nodes / 9 arcs from label file
SENT-END PHONE STEVE CALL STEVE PHONE YOUNG CALL YOUNG SENT-END == [1105 frames] -49.8568 [Ac=-55091.8 LM=0.0] (Act=12.8)
Aligning File: ../train/mfcc/sample8.mfc
Created lattice with 10 nodes / 9 arcs from label file
SENT-END PHONE PHONE STEVE STEVE CALL CALL YOUNG YOUNG SENT-END == [977 frames] -48.7927 [Ac=-47670.5 LM=0.0] (Act=9.9)
Aligning File: ../train/mfcc/sample9.mfc
Created lattice with 7 nodes / 6 arcs from label file
SENT-END MEASURE LEISURE AND LEISURE MEASURE SENT-END == [691 frames] -49.7223 [Ac=-34358.1 LM=0.0] (Act=8.5)
Aligning File: ../train/mfcc/sample10.mfc
Created lattice with 7 nodes / 6 arcs from label file
SENT-END COMPLAIN CHAMPLAIN AIRPLANE ELAINE EXPLAIN SENT-END == [716 frames] -51.6523 [Ac=-36983.1 LM=0.0] (Act=10.3)
Aligning File: ../train/mfcc/sample11.mfc
Created lattice with 7 nodes / 6 arcs from label file
SENT-END BOOKENDS KENNEL KENNETH KENYA WEEKEND SENT-END == [733 frames] -49.9992 [Ac=-36649.4 LM=0.0] (Act=12.6)
Aligning File: ../train/mfcc/sample12.mfc
Created lattice with 8 nodes / 7 arcs from label file
SENT-END BELT BELOW BEND AEROBIC DASHBOARD DATABASE SENT-END == [926 frames] -50.2476 [Ac=-46529.3 LM=0.0] (Act=10.7)
Aligning File: ../train/mfcc/sample13.mfc
Created lattice with 8 nodes / 7 arcs from label file
SENT-END GATEWAY GATORADE GAZEBO AFGHAN AGAINST AGATHA SENT-END == [917 frames] -51.8025 [Ac=-47502.9 LM=0.0] (Act=10.1)
Aligning File: ../train/mfcc/sample14.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END ABALON ABDOMINALS BODY ABOLISH SENT-END == [682 frames] -52.2922 [Ac=-35663.3 LM=0.0] (Act=9.2)
Aligning File: ../train/mfcc/sample15.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END ABOUNDING ABOUT ACCOUNT ALLENTOWN SENT-END == [695 frames] -50.1436 [Ac=-34849.8 LM=0.0] (Act=7.1)
Aligning File: ../train/mfcc/sample16.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END ACHIEVE ACTUAL ACUPUNCTURE ADVENTURE SENT-END == [750 frames] -51.5971 [Ac=-38697.8 LM=0.0] (Act=7.9)
Aligning File: ../train/mfcc/sample17.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END ALGORITHM ALTHOUGH ALTOGETHER ANOTHER SENT-END == [917 frames] -49.8120 [Ac=-45677.6 LM=0.0] (Act=6.2)
Aligning File: ../train/mfcc/sample18.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END BATTLE BEATLE LITTLE METAL SENT-END == [575 frames] -52.9104 [Ac=-30423.5 LM=0.0] (Act=9.9)
Aligning File: ../train/mfcc/sample19.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END BITTEN BLATANT BRIGHTEN BRITAIN SENT-END == [785 frames] -48.7686 [Ac=-38283.4 LM=0.0] (Act=12.4)
Aligning File: ../train/mfcc/sample20.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END BROOKHAVEN HOOD BROUHAHA BULLHEADS SENT-END == [862 frames] -48.3169 [Ac=-41649.2 LM=0.0] (Act=7.7)
Aligning File: ../train/mfcc/sample21.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END BUSBOYS CHOICE COILS COIN SENT-END == [661 frames] -52.4602 [Ac=-34676.2 LM=0.0] (Act=9.0)
Aligning File: ../train/mfcc/sample22.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END COLLECTION COLORATION COMBINATION COMMERCIAL SENT-END == [768 frames] -52.1477 [Ac=-40049.4 LM=0.0] (Act=8.7)
Aligning File: ../train/mfcc/sample23.mfc
Created lattice with 6 nodes / 5 arcs from label file
SENT-END MIDDLE NEEDLE POODLE SADDLE SENT-END == [609 frames] -50.6109 [Ac=-30822.1 LM=0.0] (Act=12.6)
Aligning File: ../train/mfcc/sample24.mfc
Created lattice with 8 nodes / 7 arcs from label file
SENT-END ALRIGHT ARTHRITIS BRIGHT COPYRIGHT CRITERIA RIGHT SENT-END == [973 frames] -53.5024 [Ac=-52057.9 LM=0.0] (Act=12.1)
Aligning File: ../train/mfcc/sample25.mfc
Created lattice with 5 nodes / 4 arcs from label file
SENT-END COUPLE CRADLE CRUMBLE SENT-END == [562 frames] -47.8244 [Ac=-26877.3 LM=0.0] (Act=7.3)
Aligning File: ../train/mfcc/sample26.mfc
Created lattice with 5 nodes / 4 arcs from label file
SENT-END CUBA CUBE CUMULATIVE SENT-END == [691 frames] -48.6433 [Ac=-33612.5 LM=0.0] (Act=8.1)
Aligning File: ../train/mfcc/sample27.mfc
Created lattice with 5 nodes / 4 arcs from label file
SENT-END CURING CURLING CYCLING SENT-END == [515 frames] -48.8570 [Ac=-25161.4 LM=0.0] (Act=8.0)
Aligning File: ../train/mfcc/sample28.mfc
Created lattice with 5 nodes / 4 arcs from label file
SENT-END CYNTHIA DANFORTH DEPTH SENT-END == [515 frames] -50.3170 [Ac=-25913.2 LM=0.0] (Act=9.2)
Aligning File: ../train/mfcc/sample29.mfc
Created lattice with 5 nodes / 4 arcs from label file
SENT-END DIGEST DIGITAL DILIGENT SENT-END == [558 frames] -54.1039 [Ac=-30190.0 LM=0.0] (Act=8.9)
Aligning File: ../train/mfcc/sample30.mfc
Created lattice with 7 nodes / 6 arcs from label file
SENT-END AMNESIA ASIA AVERSION BEIGE BEIJING SENT-END == [879 frames] -50.2145 [Ac=-44138.6 LM=0.0] (Act=9.0)
Aligning File: ../train/mfcc/sample31.mfc
Created lattice with 8 nodes / 7 arcs from label file
SENT-END HELP HELLO HELMET HELPLESS AHEAD HELP SENT-END == [1003 frames] -49.2337 [Ac=-49381.4 LM=0.0] (Act=16.2)

HTK Configuration Parameters[10]
Module/Tool Parameter Value
NUMCEPS 12
CEPLIFTER 22
NUMCHANS 26
PREEMCOEF 0.970000
USEHAMMING TRUE
WINDOWSIZE 250000.000000
SAVEWITHCRC TRUE
SAVECOMPRESSED TRUE
TARGETRATE 100000.000000
TARGETKIND MFCC_0_D_N_Z
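
One way to sanity-check a log like the one above is to parse the per-utterance summary lines. The sketch below is plain Python written for illustration, not an HTK tool; the function names and the regular expression are assumptions based only on the line layout shown in this post. It checks that the average score per frame equals Ac divided by the number of frames, and converts frame counts to seconds using TARGETRATE, which HTK expresses in units of 100 ns (so 100000 means one frame every 10 ms).

```python
import re

# Two summary lines copied from the log above, shortened to the numeric part.
SAMPLE = """\
SENT-END == [1229 frames] -51.9141 [Ac=-63802.4 LM=0.0] (Act=14.4)
SENT-END == [883 frames] -48.4922 [Ac=-42818.6 LM=0.0] (Act=10.1)
"""

NUMBER = r"-?\d+(?:\.\d+)?"
SUMMARY = re.compile(
    rf"== \[(\d+) frames\] ({NUMBER}) \[Ac=({NUMBER}) LM=({NUMBER})\]"
)


def parse_summaries(log_text):
    """Return (frames, average_per_frame, acoustic_score) for each utterance."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in SUMMARY.finditer(log_text)
    ]


def seconds(frames, target_rate=100000.0):
    """Convert a frame count to seconds; target_rate is in units of 100 ns."""
    return frames * target_rate * 1e-7


if __name__ == "__main__":
    for frames, avg, ac in parse_summaries(SAMPLE):
        print(frames, round(ac / frames, 4), avg, round(seconds(frames), 2))
```

Summing the frame counts over all utterances and passing the total to seconds() gives the amount of audio that was aligned.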
 
Re: Where is the STEP 11
User: kmaclean
Date: 3/3/2008 9:49 am
Views: 43
Rating: 12

Hi Royerfa,

>And If I don't have a result better than 90% of recognition on a laptop , I am

>not going to continue in the julius direction.

You are going to have to play with the following settings:

  • word insertion penalty
    • first pass (-penalty1)
    • second pass (-penalty2)
  • transition penalty (-iwsppenalty) (for short-term inter-word pauses between words)
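
Since this tuning is trial and error, it can help to generate the candidate command lines systematically. A minimal sketch in Python, assuming the decoder binary is called julian and takes its configuration via -C; the jconf file name and the candidate values are placeholders, and only the three penalty options listed above are varied. The script only builds the argument lists and does not run anything.

```python
from itertools import product


def penalty_grid(penalty1, penalty2, iwsppenalty,
                 jconf="julian.jconf", binary="julian"):
    """Yield one argument list per combination of the three penalties.

    jconf and binary are placeholder names; adjust them to your own setup.
    """
    for p1, p2, iw in product(penalty1, penalty2, iwsppenalty):
        yield [
            binary, "-C", jconf,
            "-penalty1", str(p1),
            "-penalty2", str(p2),
            "-iwsppenalty", str(iw),
        ]


if __name__ == "__main__":
    # Candidate values are arbitrary examples, not recommendations.
    for cmd in penalty_grid([0.0, 0.5], [50.0, 100.0], [-55.0]):
        print(" ".join(cmd))
```

Each printed line can then be run against the same test prompts and scored, so that the settings are compared on equal terms.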

Each acoustic model seems like it needs to be "tuned".  It's a bit of trial and error (for me at least ...).  There may be other parameters to tweak - you might also ask on the Julius forum.

See the results file included in one of the Nightly Builds files.  I've been getting better results with Julius than with HTK (only a "sanity test" using 50 prompts):

Julian 16kHz_16bit
------------------
  Parameters:
    word insertion penalty
      first pass (-penalty1):0.5
      second pass (-penalty2):100.0
    transition penalty (-iwsppenalty):-55.0 (for short-term inter-word pauses between words)
====================== Results Analysis =======================
  Date: Mon Mar  3 03:45:01 2008
  Ref : testref.mlf
  Rec : julianProcessed
------------------------ Overall Results --------------------------
SENT: %Correct=86.00 [H=43, S=7, N=50]
WORD: %Corr=96.83, Acc=96.30 [H=183, D=2, S=4, I=1, N=189]
=================================================================== 

You may have better luck in Sphinx (though I am not as familiar with it as I am with HTK/Julius), but I think you might need to do some acoustic model parameter tweaking there too.

This page on the Simon project website is interesting, it's really the only comparison that I have found between Julius and Sphinx:

Analysis of existing software (translated from German using Google translate - original page)

They chose Julius over Sphinx and some other commercial products (based on criteria for their specific application context). 

Ken 

 

Re: Where is the STEP 11
User: royerfa
Date: 3/4/2008 9:24 am
Views: 41
Rating: 10

Hello,

The acoustic model in the Nightly Builds files works quite well with my voice.

Did you build your acoustic model with the same pronunciation balanced dictionary as the tutorial?

If I change my grammar, do I need to change the pronunciation dictionary?

Maybe a stupid question: why is there a new acoustic model every day?

Regards,

Fabien

One other thing: what do you think about the Vista speech recognition from Microsoft?

 

Re: Where is the STEP 11
User: kmaclean
Date: 3/4/2008 12:10 pm
Views: 46
Rating: 9

Hi Fabien ,

> The acoustic model on the Nightly Builds files works quite good with my voice.

That's good to hear!  :) 

>Do you built your acoustic model with the same pronunciation balanced

>dictionary as the tutorial ?

No, the pronunciation dictionary used in the Tutorial and How-to is based on the ISIP Switchboard corpus (contains around 27,500 words), whereas the QuickStart and nightly AM builds are based on version 0.6 of the CMU Pronunciation Dictionary (contains around 130,000 words). Unfortunately, the Switchboard and CMU pronunciation dictionaries use slightly different phoneme syntax. This is enough to make them incompatible from a Grammar and Acoustic Model testing perspective (see ticket #52).

>If I change my grammar, do I need to change the pronunciation dictionary?

Yes.

Use the Voxforge Pronunciation Dictionary:

[   ] VoxForge.tgz            29-Feb-2008 18:02   2.6M  

>why is there a new daily acoustic model ?

As new speech is submitted by users, we build a brand new acoustic model (rather than just adapting).  We can do this because our speech corpus is still relatively small.  The scripts actually run as a nightly cron job, so they create a new AM daily, regardless of whether there is new speech or not.

>What do you think about the Vista speech recognition from Microsoft ?

I mostly use Linux (Fedora) for my desktop and servers. I did purchase a new PC with Vista, but have not worked with it much.

Ken 

Off-topic: Vista speech recognition
User: ralfherzog
Date: 3/4/2008 7:26 pm
Views: 58
Rating: 8

Hello Fabien,

"What do you think about the Vista speech recognition from Microsoft ?"

I bought Windows Vista Ultimate (64-bit) because I wanted to test the Windows speech recognition in French and in Spanish. But at the moment I prefer to use Dragon NaturallySpeaking 9.5 (German/English) under Windows XP (32-bit). My first impression is that the Vista speech recognition for Spanish seems to be OK, but for French it didn't work out for me. If I got better results in Spanish/French, I would use Vista more often.

I didn't want to purchase the Spanish and the French version of NaturallySpeaking, so I had decided to give Vista Ultimate a try.  As far as I know, only the "Ultimate" version of Vista supports the recognition of several languages.

Greetings, Ralf

Re: Where is the STEP 11
User: DavidGelbart
Date: 3/6/2008 4:56 pm
Views: 33
Rating: 9

"It seems strange that he recognise well the words but not the sentences."  I have never worked with sentence-level scoring, but my guess is that a sentence is only scored as correct if all the words in it are correct. 

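The word-level figures in the HResults output quoted in this thread can be reproduced from the counts in brackets. The sketch below is Python written for illustration, not part of HTK. It assumes the usual HResults definitions (H = N - D - S, %Corr = H/N, Acc = (H - I)/N) and treats a sentence as correct only when the recognised word sequence matches the reference exactly, which is in line with the guess above. The function names are made up for this example.

```python
def word_scores(n, d, s, i):
    """Return (hits, percent_correct, percent_accuracy) from HResults counts.

    n: reference words, d: deletions, s: substitutions, i: insertions.
    """
    h = n - d - s
    return h, 100.0 * h / n, 100.0 * (h - i) / n


def sentence_correct_pct(references, recognised):
    """Percent of sentences whose word lists match the reference exactly."""
    hits = sum(ref == rec for ref, rec in zip(references, recognised))
    return 100.0 * hits / len(references)


if __name__ == "__main__":
    # Counts from the first HResults output in this thread.
    print(word_scores(n=189, d=4, s=55, i=117))
    # Counts from the Julian results posted later.
    print(word_scores(n=189, d=2, s=4, i=1))
    # One inserted word is enough to make a whole sentence count as wrong.
    refs = [["DIAL", "ONE"], ["CALL", "STEVE"]]
    recs = [["DIAL", "ONE"], ["CALL", "STEVE", "YOUNG"]]
    print(sentence_correct_pct(refs, recs))
```

With 117 insertions against 189 reference words, accuracy drops to 6.88% even though 68.78% of the reference words were found. Insertions are not counted in %Corr but are subtracted in Acc, and any inserted word also makes its sentence wrong.
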
"This time the frequency rate is of 48K, and the microphone is well configured. but there is sometimes some saturation on the training records, I need to change this." So all the data (training data and test data) was recorded at 48 kHz?  If there is a mix of different sampling rates in the data, then you'd probably need to downsample some of it to the lowest sampling rate so that all the audio has the same sampling rate. 

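A quick way to check for mixed sampling rates is to read the header of each WAV file. This is a minimal sketch using Python's standard wave module; it assumes the recordings are uncompressed PCM WAV files, and the function name is hypothetical.

```python
import wave
from collections import defaultdict


def group_by_sample_rate(paths):
    """Map each sampling rate (Hz) to the list of WAV files that use it."""
    groups = defaultdict(list)
    for path in paths:
        with wave.open(str(path), "rb") as wav:
            groups[wav.getframerate()].append(str(path))
    return dict(groups)


if __name__ == "__main__":
    import sys

    # Usage: python check_rates.py train/wav/*.wav test/wav/*.wav
    for rate, files in sorted(group_by_sample_rate(sys.argv[1:]).items()):
        print(f"{rate} Hz: {len(files)} file(s)")
```

If more than one rate shows up, the files at the higher rates are the ones to downsample before coding the audio.
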
 

Re: Where is the STEP 11
User: DavidGelbart
Date: 3/6/2008 4:58 pm
Views: 106
Rating: 9

Also, how many hours of training data do you have?  And how many different words are in your grammar?  If you have little training data, or a lot of different words which could be confused with each other, that will tend to lower your recognition accuracy.