
Step 9 - Making Triphones from Monophones

Background 

In the dict file you created in Step 2, the pronunciation of each word was given as a series of phonemes (also called monophones, i.e. single phones). A triphone is a group of 3 phones declared in the form "L-X+R": the "L" phone (i.e. the left-hand phone) precedes the "X" phone, and the "R" phone (i.e. the right-hand phone) follows it.

Below is an example of converting the word "TRANSLATE" to a triphone declaration (the first line shows the "monophone" declaration, and the second line shows the "triphone" declaration):

TRANSLATE [TRANSLATE] t r @ n s l e t
TRANSLATE [TRANSLATE] t+r t-r+@ r-@+n @-n+s n-s+l s-l+e l-e+t e-t
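
The conversion above can be sketched in a few lines of Python. This is only an illustration of the "L-X+R" naming rule for a single word, not how HTK does it internally; the function name to_word_internal_triphones is made up for this example.

```python
def to_word_internal_triphones(phones):
    """Expand one word's monophones into triphone labels of the form L-X+R.

    The first phone has no left context and the last phone has no right
    context, so they come out as X+R and L-X respectively.
    """
    triphones = []
    for i, phone in enumerate(phones):
        left = f"{phones[i - 1]}-" if i > 0 else ""
        right = f"+{phones[i + 1]}" if i < len(phones) - 1 else ""
        triphones.append(f"{left}{phone}{right}")
    return triphones


if __name__ == "__main__":
    monophones = "t r @ n s l e t".split()
    print(" ".join(to_word_internal_triphones(monophones)))
```

Run on the phones of "TRANSLATE", it prints the same triphone sequence as the second line of the example above.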

Moving to triphones gives us an improved level of recognition accuracy. So far, we have created a monophone Acoustic Model, which can be used with Julius. But with such a model, we are not looking at the "context" of the monophone. The SRE is trying to match the sound that it has heard to a single phone, i.e. a single sound.

With a triphone acoustic model, we are essentially looking for a monophone in the "context" of other monophones, i.e. the one immediately before and the one immediately after (if they exist; the phone may be at the beginning or end of the word). This generally improves recognition accuracy, because the SRE is looking to match a specific sequence of 3 sounds together (a triphone), rather than only one sound. This is like using a 3 word Google search rather than a single word Google search: you get more accurate results. Triphones reduce the possibility of error caused by confusing one sound with another, because we are now looking for a distinct sequence of 3 sounds.

Note that some commercial systems use quinphones (5 phone groupings) in their recognition systems, but this requires very large amounts of speech audio data.

Tutorial 

To convert the monophone transcriptions in the aligned.mlf file you created in Step 8 to an equivalent set of triphone transcriptions, you need to execute the HLEd command.

First you need to create the mktri.led edit script:

WB sp
WB sil
TC
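
To get a feel for what this edit script does: TC converts each phone label to a triphone, and the two WB lines mark sp and sil as word boundary symbols, so no context is added across them. The Python sketch below imitates that effect on a plain list of labels. It is an illustration under that reading of the commands, not HLEd itself, and the function name and the sample phone sequence are made up for this example.

```python
def expand_labels(labels, word_boundaries=("sp", "sil")):
    """Imitate the effect of 'WB sp', 'WB sil', 'TC' on a label sequence.

    Boundary symbols are copied through unchanged, and a phone next to a
    boundary symbol gets no context on that side.
    """
    expanded = []
    for i, label in enumerate(labels):
        if label in word_boundaries:
            expanded.append(label)
            continue
        prev_label = labels[i - 1] if i > 0 else None
        next_label = labels[i + 1] if i + 1 < len(labels) else None
        has_left = prev_label is not None and prev_label not in word_boundaries
        has_right = next_label is not None and next_label not in word_boundaries
        left = f"{prev_label}-" if has_left else ""
        right = f"+{next_label}" if has_right else ""
        expanded.append(f"{left}{label}{right}")
    return expanded


if __name__ == "__main__":
    utterance = "sil th ih s sp m ae n sil".split()
    print(" ".join(expand_labels(utterance)))
```

For the labels "sil th ih s sp m ae n sil" this gives "sil th+ih th-ih+s ih-s sp m+ae m-ae+n ae-n sil": each word is expanded on its own, and sp and sil stay as they are.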
 

Then you execute the HLEd command as follows:

$HLEd -A -D -T 1 -n triphones1 -l '*' -i wintri.mlf mktri.led aligned.mlf

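This creates 2 files: wintri.mlf (the triphone transcriptions, named by the -i option) and triphones1 (the list of triphones, named by the -n option).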

Next, create the mktri.hed file by executing the following script:

$perl ../HTK_scripts/maketrihed monophones1 triphones1

This creates the mktri.hed file.
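
If you want to see roughly what such an edit script looks like, the Python sketch below writes one in the form described in the HTK Book: a CL command that clones the monophones into the triphone list, followed by one TI command per phone that ties the transition matrices of all triphones of that phone. This is an assumption about the general shape of the file, not a copy of the maketrihed script; in particular, whether the real script treats any phones specially is not reproduced here, and the function name is made up for this example.

```python
def make_mktri_hed(monophones, triphone_list="triphones1"):
    """Return the text of an HHEd edit script in the HTK Book's mktri.hed form."""
    lines = [f"CL {triphone_list}"]
    for phone in monophones:
        # Tie the transition matrices of every triphone whose centre is `phone`.
        lines.append(f"TI T_{phone} {{(*-{phone}+*,{phone}+*,*-{phone}).transP}}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A short, made-up phone list; in practice the phones come from monophones1.
    print(make_mktri_hed(["ah", "t"]), end="")
```

You can compare its output with the mktri.hed that the perl script generated for your own monophones1 file.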

Then create 3 more folders: hmm10, hmm11 and hmm12.
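
You can create the folders by hand, or with a small script. The Python sketch below is one way to do it; the function name and its parameters are made up for this example, and it assumes you run it from the folder that already holds hmm9.

```python
from pathlib import Path


def make_hmm_dirs(base=".", first=10, last=12):
    """Create hmm<first> .. hmm<last> under `base` and return their paths."""
    created = []
    for n in range(first, last + 1):
        folder = Path(base) / f"hmm{n}"
        # exist_ok=True makes the script safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in make_hmm_dirs():
        print(folder)
```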

Then execute the HHEd command:

$HHEd -A -D -T 1 -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1 

The files created by this command are hmm10/macros and hmm10/hmmdefs.

 

Next run HERest 2 more times:

$HERest  -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1

The files created by this command are hmm11/macros and hmm11/hmmdefs.

 

$HERest  -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -s stats -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1

 

The files created by this command are hmm12/macros, hmm12/hmmdefs and stats (written because of the -s stats option).
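
The two HERest runs differ only in their input and output folders and in the extra -s stats option on the second run. The Python sketch below builds both argument lists to make that difference explicit. The function name is made up for this example; the options themselves are copied from the commands above. It only prints the commands, it does not run HTK.

```python
def herest_args(src_dir, dst_dir, write_stats=False):
    """Build the HERest argument list for one re-estimation pass."""
    args = ["HERest", "-A", "-D", "-T", "1", "-C", "config",
            "-I", "wintri.mlf", "-t", "250.0", "150.0", "3000.0"]
    if write_stats:
        # Only the second pass writes the state occupation statistics file.
        args += ["-s", "stats"]
    args += ["-S", "train.scp",
             "-H", f"{src_dir}/macros", "-H", f"{src_dir}/hmmdefs",
             "-M", dst_dir, "triphones1"]
    return args


if __name__ == "__main__":
    passes = [herest_args("hmm10", "hmm11"),
              herest_args("hmm11", "hmm12", write_stats=True)]
    for args in passes:
        print(" ".join(args))
```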


Comments

Problem during re-estimation
By Annie - 12/26/2007 - 1 Replies

Hi!

I have a problem during re-estimation into hmm11, after the triphone conversion succeeded.

the command :

HERest -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1

and it gives me this error :

ERROR [+7321] CreateInsts : Unknown label B

I've already followed and checked everything, but can't find the solution. Please help me!

Thank you,

Regards,

Annie