VoxForge
First you need to create an HVite configuration file called "wav_config" containing the following:
| SOURCEFORMAT = WAV |
| TARGETKIND = MFCC_D_N_Z_0 |
| TARGETRATE = 100000.0 |
| SAVECOMPRESSED = T |
| SAVEWITHCRC = T |
| WINDOWSIZE = 250000.0 |
| USEHAMMING = T |
| PREEMCOEF = 0.97 |
| NUMCHANS = 26 |
| CEPLIFTER = 22 |
| NUMCEPS = 12 |
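HTK expresses time parameters in units of 100 nanoseconds, so the TARGETRATE and WINDOWSIZE values above correspond to a 10 ms frame shift and a 25 ms analysis window. A minimal Python sketch of that conversion follows; the helper name htk_units_to_ms is hypothetical and not part of HTK.

```python
# HTK time values are given in units of 100 ns, so 10,000 units = 1 ms.
HTK_UNITS_PER_MS = 10000.0

def htk_units_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds."""
    return value / HTK_UNITS_PER_MS

frame_shift_ms = htk_units_to_ms(100000.0)  # TARGETRATE from wav_config
window_ms = htk_units_to_ms(250000.0)       # WINDOWSIZE from wav_config
print(frame_shift_ms, window_ms)
```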
Next, create a script file called "train.scp" listing the audio file(s) you will use for forced alignment, one per line, something like this:
| downsampled.wav |
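If you have many audio files, the script file can be generated rather than typed by hand. The following is only a sketch: the function name write_scp and the assumption that all WAV files sit in a single directory are illustrative, not part of the VoxForge instructions.

```python
from pathlib import Path

def write_scp(audio_dir, scp_path="train.scp"):
    """Write one WAV file path per line to scp_path and return the paths."""
    # Assumption: every *.wav file in audio_dir should be aligned.
    wavs = sorted(p for p in Path(audio_dir).iterdir()
                  if p.is_file() and p.suffix.lower() == ".wav")
    Path(scp_path).write_text("".join(f"{p}\n" for p in wavs))
    return [str(p) for p in wavs]
```

Calling write_scp(".") from the directory holding the audio would list every WAV file found there in train.scp.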
Next, get the most current version of the 16kHz-16bit VoxForge Acoustic Models (you can use the current stable release, or one of the nightly builds). Copy the following files to your directory:
- macros
- hmmdefs
- tiedlist
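Before running HVite it can help to confirm that the three model files were actually copied. A small sketch, assuming the files sit in the current working directory; the function name missing_model_files is hypothetical.

```python
from pathlib import Path

# The acoustic model files named in the list above.
REQUIRED_MODEL_FILES = ("macros", "hmmdefs", "tiedlist")

def missing_model_files(directory="."):
    """Return the required model files that are not present in directory."""
    base = Path(directory)
    return [name for name in REQUIRED_MODEL_FILES
            if not (base / name).is_file()]
```

An empty list means all three files are in place.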
Run the HVite command as follows:
$ HVite -A -D -T 1 -l '*' -a -b SENT-END -m -C wav_config -H macros -H hmmdefs -t 250.0 150.0 1000.0 -I words.mlf -i aligned.out -S train.scp dict tiedlist
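To script this step, the same options can be passed to HVite from Python. This is a sketch rather than an official wrapper: the function names hvite_command and run_hvite are hypothetical, and it assumes HVite is on your PATH and that all input files are in the current directory. Passing the arguments as a list means the '*' pattern reaches HVite unchanged, with no shell expansion.

```python
import subprocess

def hvite_command(config="wav_config", mlf="words.mlf", out="aligned.out",
                  scp="train.scp", dictionary="dict", model_list="tiedlist"):
    """Build the HVite forced-alignment argument list used in this how-to."""
    return ["HVite", "-A", "-D", "-T", "1", "-l", "*",
            "-a", "-b", "SENT-END", "-m",
            "-C", config, "-H", "macros", "-H", "hmmdefs",
            "-t", "250.0", "150.0", "1000.0",
            "-I", mlf, "-i", out, "-S", scp,
            dictionary, model_list]

def run_hvite(**kwargs):
    """Run HVite and raise CalledProcessError if it exits with an error."""
    return subprocess.run(hvite_command(**kwargs), check=True)
```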
This creates a file called aligned.out containing all the words from your words.mlf file, with time alignments. The output from the HVite command is here.
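The time alignments can then be read back programmatically. The sketch below assumes that aligned.out is an HTK master label file in which each label line reads "start end model score", with the word added as a fifth field on the first model of each word, and with times in 100 ns units. That layout is an assumption about what the options above produce, so check it against your own file. The function name parse_aligned is hypothetical, and the sketch handles a single utterance only.

```python
def parse_aligned(text):
    """Return a list of (word, start_sec, end_sec) from alignment text."""
    words = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the "#!MLF!#" header, quoted file names and the "." terminator.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        start = int(fields[0]) / 1e7  # 100 ns units -> seconds
        end = int(fields[1]) / 1e7
        if len(fields) >= 5:
            # A fifth field marks the first model of a new word.
            words.append([fields[4], start, end])
        elif words:
            # Later models of the same word extend its end time.
            words[-1][2] = end
    return [tuple(w) for w in words]
```

For example, parse_aligned(open("aligned.out").read()) would return one tuple per word, including the SENT-START and SENT-END entries if they appear in the file.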
| Note: different acoustic models may produce slightly different forced alignment results (i.e. the better the acoustic model, the more accurate the forced alignments). |