In the dict file you created in Step 2, the pronunciation of a word
was given by a series of phonemes (also called monophones - i.e.
single phones). A triphone is a group of 3 phones: a phone "X"
together with the "L" phone (i.e. the left-hand phone) that precedes
it and the "R" phone (i.e. the right-hand phone) that follows it. The
triphone is declared in the form "L-X+R".
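A phone at the beginning or end of a word has a neighbour on only one
side, so it is declared as "X+R" or "L-X". Below is an example of
converting the word "TRANSLATE" to a triphone declaration (the first
line shows the "monophone" declaration, and the second line shows the
"triphone" declaration):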
TRANSLATE [TRANSLATE] t r @ n s l e t
TRANSLATE [TRANSLATE] t+r t-r+@ r-@+n @-n+s n-s+l s-l+e l-e+t e-t
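The conversion shown above can be sketched in a few lines of Python.
This is only an illustration of the "L-X+R" naming rule within a
single word, not how HTK does it internally; the function name
to_word_internal_triphones is made up for this example, and the sketch
ignores the special handling of the sp and sil models.

```python
def to_word_internal_triphones(phones):
    """Turn a list of monophones into word-internal "L-X+R" labels.

    Hypothetical helper for illustration only; it is not part of HTK.
    """
    labels = []
    last = len(phones) - 1
    for i, phone in enumerate(phones):
        # Left context exists for every phone except the first one.
        left = phones[i - 1] + "-" if i > 0 else ""
        # Right context exists for every phone except the last one.
        right = "+" + phones[i + 1] if i < last else ""
        labels.append(left + phone + right)
    return labels


# Monophone pronunciation of TRANSLATE, taken from the example above.
monophones = "t r @ n s l e t".split()
triphones = to_word_internal_triphones(monophones)
print(" ".join(triphones))
# prints: t+r t-r+@ r-@+n @-n+s n-s+l s-l+e l-e+t e-t
```

The first and last labels have context on one side only, which is why
they come out as "t+r" and "e-t" rather than full "L-X+R" triphones.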
Moving from monophones to triphones is therefore a step up to an
improved level of recognition accuracy. So far, we have created a
monophone Acoustic Model, which can be used with Julius. But such a
model does not look at the 'context' of a monophone: the SRE is
trying to match the sound that it has heard to a single phone - a
single sound.
With a triphone acoustic model, we are essentially looking for
a monophone in the "context" of other monophones - i.e. the one
immediately before and the one immediately after (if they exist - the
phone may be at the beginning or end of the word). This greatly
improves recognition accuracy, because the SRE is looking to match a
specific sequence of 3 sounds together (a triphone), rather than only
one sound. This is like using a 3-word Google search rather than a
single-word Google search - you get more accurate results.
Triphones reduce the possibility of error caused by confusing one sound
with another, because we are now looking for a distinct sequence of 3
sounds.
Note that some commercial systems use quinphones
(5-phone groupings) in their recognition systems - but this requires
very large amounts of speech audio data.
Tutorial
To convert the monophone transcriptions in the aligned.mlf file you
created in Step 8 to an equivalent set of triphone transcriptions,
you need to execute the HLEd command.
First you need to create the mktri.led edit script: