Step 4 - Realigning the Training Data

Create Training Script 

Next, create a new file in your adapt directory called 'adapt.scp'.  This file tells HTK where all your .mfc files (i.e. the speech audio files you converted into HTK '.mfc' format) are located.  Your file should look like this:

adapt.scp
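If you have many .mfc files, you can generate the list rather than type it by hand.  The following is only a sketch: the function name make_scp and the directory name in the example call are assumptions for illustration, not part of HTK or of this tutorial.  It writes one .mfc path per line, which is the layout an HTK script file expects.

```shell
# Sketch only: the directory you pass in is an assumption about your own layout.
# make_scp DIR OUT writes the path of every .mfc file under DIR to OUT, one per line.
make_scp() {
  mfc_dir="$1"
  out="$2"
  # Sort so the list comes out in the same order on every run.
  find "$mfc_dir" -type f -name '*.mfc' | sort > "$out"
}

# Example call (hypothetical directory name), run from your adapt directory:
#   make_scp mfcc adapt.scp
```

The paths are written exactly as find reports them, so run the function from the directory where you will later run HVite, or pass an absolute path.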

Forced Alignment

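To minimize the problem of multiple pronunciations, you now use HVite (an HTK tool) to perform a 'forced alignment' of the phone-level transcription of the adaptation data.  When a word has more than one pronunciation in the dictionary, HVite outputs the one that best matches the audio.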

Execute the HVite command as follows (remember to change the path to the location of your HTK 3.2.1 install):

$ /home/yourusername/htk-3.2.1/bin.linux/HVite -A -D -T 1 -l '*' -o SWT -b SENT-END -C config -H macros -H hmmdefs -i adaptPhones.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I adaptWords.mlf -S adapt.scp dict tiedlist

This creates the adaptPhones.mlf file.

Review the log output of the HVite command very carefully.  Catching errors here will save a lot of headaches later on, because seemingly minor problems at this step sometimes show up as major errors at later steps, where they are very difficult to trace back to here.  Here is the log output from the above command: HVite_log.  It is time well spent to review the log and make sure that HVite recognized all the words for each line in your prompts file.
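A couple of quick checks can supplement (not replace) reading the log.  The sketch below assumes you saved the HVite output to a file, for example by adding "> HVite_log 2>&1" to the end of the command.  The function names scan_log and check_alignment are hypothetical helpers, not HTK tools.  The first relies on HTK tools generally flagging problems with the words ERROR or WARNING.  The second assumes that an utterance HVite could not align gets no entry in adaptPhones.mlf, so a count mismatch is only a hint that something was skipped.

```shell
# scan_log LOG: print any log lines that contain ERROR or WARNING, with line numbers.
scan_log() {
  grep -n -E 'ERROR|WARNING' "$1"
}

# check_alignment SCP MLF: compare the number of .mfc paths in the script file
# with the number of quoted label-file entries in the MLF.
# Returns success only when the two counts match.
check_alignment() {
  n_scp=$(grep -c '\.mfc$' "$1")
  n_mlf=$(grep -c '^"' "$2")
  echo "utterances in $1: $n_scp, entries in $2: $n_mlf"
  [ "$n_scp" -eq "$n_mlf" ]
}

# Example calls, run from your adapt directory:
#   scan_log HVite_log
#   check_alignment adapt.scp adaptPhones.mlf
```

If the counts differ, look in the log for the utterances that are missing from adaptPhones.mlf and check their words against your dictionary.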