Step 9 - Making Triphones from Monophones

Background 

triphones

In the dict file you created in Step 2, the pronunciation of a word is given as a series of phonemes (also called monophones, i.e. single phones). To generate a triphone declaration (a group of 3 phones) from monophones, each phone "X" is combined with the "L" phone (the left-hand phone) that precedes it and the "R" phone (the right-hand phone) that follows it. The triphone is declared in the form "L-X+R".

Below is an example of the conversion to a triphone declaration of the word "TRANSLATE" (the first line shows the "monophone" declaration, and the second line shows the "triphone" declaration):

TRANSLATE [TRANSLATE] t r ae n z l ey t
TRANSLATE [TRANSLATE] t+r t-r+ae r-ae+n ae-n+z n-z+l z-l+ey l-ey+t ey-t

(Note that we may also get biphones (i.e. a group of 2 phones) at the beginning and end of the word.)
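To make the L-X+R rule concrete, here is a minimal Python sketch of the per-word conversion. The function name word_to_triphones is hypothetical and for illustration only; in this tutorial HLEd does the real conversion. The first and last phones of a word have only one neighbour, which is where the biphones come from:

```python
def word_to_triphones(phones):
    """Convert one word's monophones into word-internal L-X+R labels.

    The first phone has no left context and the last has no right context,
    so they become biphones (X+R and L-X); a one-phone word stays a monophone.
    """
    labels = []
    for i, x in enumerate(phones):
        label = x
        if i > 0:                      # add left context if there is one
            label = phones[i - 1] + "-" + label
        if i < len(phones) - 1:        # add right context if there is one
            label = label + "+" + phones[i + 1]
        labels.append(label)
    return labels


# The TRANSLATE example from the dict file:
print(" ".join(word_to_triphones("t r ae n z l ey t".split())))
```

A one-phone word such as "ax" comes back unchanged, since it has neither neighbour.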

Triphones therefore take us to a higher level of recognition accuracy. So far, we have created a monophone acoustic model, which can be used with Julius. But such a model does not look at the 'context' of each monophone: the speech recognition engine (SRE) tries to match the sound it has heard to a single phone, i.e. a single sound.

With a triphone acoustic model, we are essentially looking for a monophone in the "context" of other monophones, i.e. the one immediately before it and the one immediately after it (if they exist; the phone may be at the beginning or end of the word). Because the way a phone sounds is influenced by its neighbours, this improves recognition accuracy: the SRE is matching a specific sequence of 3 sounds (a triphone) rather than only one sound. This is like using a 3-word Google search rather than a single-word Google search: you get more accurate results. Triphones reduce the possibility of error caused by confusing one sound with another, because we are now looking for a distinct sequence of 3 sounds.

states

Up until now, we have glossed over what hidden Markov models (hmms) are by saying that they are essentially statistical representations of the phones that make up a word. But an hmm is made up of several 'states', and these states can be shared (in the same way that the sp and sil models now share their centre 'state' after Step 7). These clustered or 'tied' states are sometimes called senones.

It does not make sense to share states between monophones, because they are so different.  Otherwise, why define the monophone? The point is that you want different sounds to be modelled separately, so the speech recognition engine can tell them apart. 

However, when you start looking at triphones, each with its own hmm definition, you find many triphones whose states are similar enough that their training data can be shared among the group. This sharing process is called 'tying': we 'tie' the states of many triphone hmms so that they share the same set of parameters. When we reestimate these tied parameters, the data from each of the original untied parameters is pooled, so a better estimate can be obtained.
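The effect of pooling can be illustrated with a toy Python example. The three triphone names and the 1-dimensional "feature" values below are made up; real HTK states model multi-dimensional feature vectors, and HERest weights each frame by its state occupation probability rather than counting it once. The sketch only shows why a shared estimate is more reliable than one computed from a single frame:

```python
from statistics import mean, pvariance

# Hypothetical 1-D feature values observed for the same state of three
# different triphones that share the centre phone "ae".
frames = {
    "t-ae+n": [1.0],             # only one example: its variance estimate collapses to 0
    "r-ae+n": [1.4, 1.2],
    "k-ae+t": [0.8, 1.1, 0.9],
}

# Untied: each triphone state is estimated from its own (tiny) sample.
for name, xs in frames.items():
    print(f"{name}: mean={mean(xs):.3f} var={pvariance(xs):.4f} from {len(xs)} frame(s)")

# Tied: the three states share one set of parameters, so their data is pooled.
pooled = [x for xs in frames.values() for x in xs]
print(f"tied:   mean={mean(pooled):.3f} var={pvariance(pooled):.4f} from {len(pooled)} frames")
```

With a single frame, the untied variance for t-ae+n is 0, which is useless as a model; the tied estimate draws on all six frames.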

Basically, we don't have enough speech data to model all possible triphone combinations contained in the words of our training set, so we 'cheat' and share parts of the data amongst similar triphones to improve recognition.

Tutorial 

To convert the monophone transcriptions in the aligned.mlf file you created in Step 8 to an equivalent set of triphone transcriptions, you need to execute the HLEd command.  HLEd can be used to generate a list of all triphones for which there is at least one example in the training data.

First you need to create the mktri.led edit script:

WB sp
WB sil
TC
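In HTK, the WB commands declare sp and sil as word boundary symbols, and TC converts every other label into a triphone; the boundary symbols block context, so triphones never span two words. The following Python sketch imitates that label-level behaviour on one made-up utterance. It is an approximation for illustration, not HLEd itself, and the function name is hypothetical:

```python
BOUNDARIES = {"sp", "sil"}  # the labels declared with WB in mktri.led


def to_word_internal_triphones(labels, boundaries=BOUNDARIES):
    """Imitate the effect of HLEd's WB + TC commands on one utterance.

    Boundary labels are copied unchanged and block context, so left/right
    contexts never cross a word boundary (word-internal triphones).
    """
    out = []
    for i, x in enumerate(labels):
        if x in boundaries:
            out.append(x)
            continue
        has_left = i > 0 and labels[i - 1] not in boundaries
        has_right = i + 1 < len(labels) and labels[i + 1] not in boundaries
        label = x
        if has_left:
            label = f"{labels[i - 1]}-{label}"
        if has_right:
            label = f"{label}+{labels[i + 1]}"
        out.append(label)
    return out


utterance = "sil th ih s sp m ae n sp sil".split()  # hypothetical aligned labels
print(" ".join(to_word_internal_triphones(utterance)))
```

Note that s at the end of the first word becomes the biphone ih-s, because the following sp blocks its right context.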
 

Then you execute the HLEd (label file editor) command as follows:

Linux:

HLEd -A -D -T 1 -n triphones1 -l '*' -i wintri.mlf mktri.led aligned.mlf

Windows:

HLEd -A -D -T 1 -n triphones1 -l * -i wintri.mlf mktri.led aligned.mlf

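This creates 2 files: wintri.mlf, which contains the triphone transcriptions, and triphones1, a list of all the triphones (and other labels) that occur in them.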

Next, download the Julia script mktrihed.jl to your 'voxforge/bin' folder, then create the mktri.hed file by executing:

julia ../bin/mktrihed.jl monophones1 triphones1 mktri.hed

This creates the mktri.hed file. This file contains a clone command 'CL', which clones each monophone hmm into all of the triphones in triphones1 that share its centre phone, followed by a series of 'TI' commands that 'tie' the transition matrices of all triphones with the same centre phone so that they share the same set of parameters. This way, when we reestimate these new tied parameters (with HERest below), the data from each of the original untied parameters is pooled so that a better estimate can be obtained.
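For reference, here is a Python sketch of the kind of file the script produces, assuming mktri.hed follows the format of the HTK Book example: a CL line naming the triphone list, then one TI line per monophone that ties the transition matrices (transP) of all models with that centre phone. This is not mktrihed.jl; the function name and the short phone list are hypothetical, and the file you actually generate is authoritative:

```python
def make_tri_hed(monophones, triphone_list_name="triphones1"):
    """Build mktri.hed-style text: one CL (clone) command, then one TI per phone.

    Each TI ties the transition matrix (transP) of every model whose centre
    phone is p: full triphones *-p+*, left biphones p+* and right biphones *-p.
    """
    lines = [f"CL {triphone_list_name}"]
    for p in monophones:
        # Double braces produce literal { } in the f-string output.
        lines.append(f"TI T_{p} {{(*-{p}+*,{p}+*,*-{p}).transP}}")
    return "\n".join(lines) + "\n"


phones = ["ae", "ey", "l", "n", "r", "t", "z"]  # hypothetical subset of monophones1
print(make_tri_hed(phones), end="")
```

To run it on your own phone list, pass open("monophones1").read().split() instead of the example list, then compare the result with your generated mktri.hed.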

Then create 3 more folders: hmm10, hmm11 and hmm12.

Next, execute the HHEd command:

(HHEd is the HTK hmm definition editor and is mainly used for applying 'tyings' across selected HMM parameters.)

HHEd -A -D -T 1 -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1 

The files created by this command are hmm10/macros and hmm10/hmmdefs.

Next, run HERest 2 more times:

HERest  -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1

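You will also get lots of WARNING [-2331] messages, such as "UpdateModels: t+ae[1] copied: only 1 egs in HERest". They occur because many triphones have only one or two examples in our small training set, so HERest does not update those models and copies their existing parameters instead. These warnings can be safely ignored for this tutorial.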
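If you want to see in advance which models are likely to trigger these warnings, you can count how often each label occurs in wintri.mlf. Below is a minimal Python sketch; the function name and the sample MLF are hypothetical, and the threshold MIN_EGS = 3 assumes HERest's default minimum number of training examples (set with its -m option). Label counts are only an approximation of what HERest reports:

```python
from collections import Counter


def count_label_examples(mlf_text):
    """Count how often each label occurs in an HTK master label file (MLF).

    Handles plain label lines ("t-r+ae") and timed lines ("0 100000 t-r+ae ...").
    Skips the "#!MLF!#" header, quoted label-file names and "." terminators.
    """
    counts = Counter()
    for line in mlf_text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("#!MLF!#", ".") or fields[0].startswith('"'):
            continue
        # In timed label lines the first two fields are integer start/end times.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            counts[fields[2]] += 1
        else:
            counts[fields[0]] += 1
    return counts


MIN_EGS = 3  # assumption: HERest's default minimum example count (-m option)

sample = """#!MLF!#
"*/sample1.lab"
sil
t+r
t-r+ae
r-ae+n
ae-n
sp
sil
.
"""
counts = count_label_examples(sample)
rare = sorted(name for name, n in counts.items() if n < MIN_EGS)
print(rare)
```

For the real file, call count_label_examples(open("wintri.mlf").read()).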

The files created by this command are hmm11/macros and hmm11/hmmdefs.

HERest  -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -s stats -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1

 

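The files created by this command are hmm12/macros, hmm12/hmmdefs and stats (the state occupation statistics written because of the -s option).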
