In the dict file you created in Step 2, the pronunciation of each word is given as a series of phonemes (also called monophones, i.e. single phones). A triphone is a group of 3 phones. To generate a triphone declaration from monophones, each phone "X" is written together with the "L" phone (the left-hand phone) that precedes it and the "R" phone (the right-hand phone) that follows it. The triphone is declared in the form "L-X+R".
Below is an example of the conversion of the word "TRANSLATE" to a triphone declaration (the first line shows the "monophone" declaration, and the second line shows the "triphone" declaration):
| TRANSLATE [TRANSLATE] t r @ n s l e t
TRANSLATE [TRANSLATE] t+r t-r+@ r-@+n @-n+s n-s+l s-l+e l-e+t e-t |
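The "L-X+R" rule can be sketched in a few lines of Python. This is only an illustration of the naming convention for a single word, not part of HTK; the function name `to_triphones` is made up for this example.

```python
def to_triphones(phones):
    """Expand one word's monophone list into L-X+R triphone names."""
    triphones = []
    for i, x in enumerate(phones):
        name = x
        if i > 0:                      # a left-hand phone exists
            name = f"{phones[i - 1]}-{name}"
        if i < len(phones) - 1:        # a right-hand phone exists
            name = f"{name}+{phones[i + 1]}"
        triphones.append(name)
    return triphones


# Monophone pronunciation of "TRANSLATE" from the dict file
print(" ".join(to_triphones("t r @ n s l e t".split())))
```

The first and last phones of the word have only one neighbour, so they come out as "X+R" and "L-X" rather than full triphones.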
By moving to triphones, we are moving to an improved level of recognition accuracy. So far, we have created a monophone Acoustic Model, which can be used with Julius. But such a model does not look at the 'context' of a monophone: the speech recognition engine (SRE) tries to match the sound that it has heard to a single phone, i.e. a single sound.
With a triphone acoustic model, we are essentially looking for a monophone in the "context" of other monophones, i.e. the one immediately before and the one immediately after (if they exist; the phone may be at the beginning or end of the word). This greatly improves recognition accuracy, because the SRE is looking to match a specific sequence of 3 sounds together (a triphone), rather than only one sound. This is like using a 3-word Google search rather than a single-word Google search: you get more accurate results. Triphones reduce the possibility of error caused by confusing one sound with another, because we are now looking for a distinct sequence of 3 sounds.
| Note that some commercial systems use quintphones (5-phone groupings) in their recognition systems, but this requires very large amounts of speech audio data. |
To convert the monophone transcriptions in the aligned.mlf file you created in Step 8 to an equivalent set of triphone transcriptions, you need to execute the HLEd command.
First you need to create the mktri.led edit script:
| WB sp
WB sil
TC |
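As the HTK Book describes it, the two WB commands mark sp and sil as word boundary symbols, and TC then converts every other phone to a triphone without letting context cross those boundaries. The Python sketch below imitates that behaviour on a flat list of labels so you can see what to expect in wintri.mlf. It is an illustration only, not HLEd's implementation, and the function name and the sample label sequence are made up for this example.

```python
WORD_BOUNDARIES = {"sp", "sil"}   # the symbols named in the WB commands


def word_internal_triphones(labels, boundaries=WORD_BOUNDARIES):
    """Convert phones to L-X+R names, never taking context from a boundary."""
    out = []
    for i, phone in enumerate(labels):
        if phone in boundaries:
            out.append(phone)             # boundary symbols stay unchanged
            continue
        name = phone
        # use the left neighbour only if it exists and is not a boundary
        if i > 0 and labels[i - 1] not in boundaries:
            name = f"{labels[i - 1]}-{name}"
        # use the right neighbour only if it exists and is not a boundary
        if i + 1 < len(labels) and labels[i + 1] not in boundaries:
            name = f"{name}+{labels[i + 1]}"
        out.append(name)
    return out


sample = "sil th ih s sp m ae n sp sil".split()
print(" ".join(word_internal_triphones(sample)))
```

Because context stops at sp and sil, the triphones produced this way are word-internal: the last phone of one word never appears as the left context of the first phone of the next.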
Then you execute the HLEd command as follows:
| $ HLEd -A -D -T 1 -n triphones1 -l '*' -i wintri.mlf mktri.led aligned.mlf |
This creates 2 files: wintri.mlf (the triphone transcriptions) and triphones1 (the list of triphones).
Next, create the mktri.hed file by executing the following script:
| $ perl ../HTK_scripts/maketrihed monophones1 triphones1 |
This creates the mktri.hed file.
Then create 3 more folders: hmm10, hmm11 and hmm12.
Then execute the HHEd command:
| $ HHEd -A -D -T 1 -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1 |
The files created by this command are hmm10/macros and hmm10/hmmdefs.
Next run HERest 2 more times:
| $ HERest -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1 |
The files created by this command are hmm11/macros and hmm11/hmmdefs.
| $ HERest -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -s stats -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1 |
The files created by this command are hmm12/macros, hmm12/hmmdefs and stats.
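The two HERest passes differ only in their input folder, their output folder, and the extra -s stats option on the second pass. The Python sketch below builds both argument lists from one helper to make that pattern explicit. The flags and file names are copied from the commands above; the helper name `herest_args` is made up for this example, and the script only prints the commands rather than running them.

```python
def herest_args(src_dir, dst_dir, stats_file=None):
    """Build the HERest argument list for one re-estimation pass."""
    args = ["HERest", "-A", "-D", "-T", "1", "-C", "config",
            "-I", "wintri.mlf", "-t", "250.0", "150.0", "3000.0"]
    if stats_file is not None:
        args += ["-s", stats_file]        # only the second pass uses -s
    args += ["-S", "train.scp",
             "-H", f"{src_dir}/macros", "-H", f"{src_dir}/hmmdefs",
             "-M", dst_dir, "triphones1"]
    return args


passes = [
    herest_args("hmm10", "hmm11"),
    herest_args("hmm11", "hmm12", stats_file="stats"),
]
for args in passes:
    print(" ".join(args))
```

If HTK is installed and on your path, each list could be passed to `subprocess.run(args, check=True)` instead of being printed; that assumes you run it from the folder that holds config, train.scp and the hmm folders.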