HTK calls this last step in data preparation "parameterizing the raw speech waveforms into sequences of feature vectors". All this means is that HTK does not process wav files as efficiently as it processes its own internal format. Therefore, you need to convert your audio wav files to MFCC format (Mel Frequency Cepstral Coefficients, which are more generally referred to as 'feature vectors').
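For context, the "Mel" in MFCC refers to the mel scale, a perceptual frequency scale on which the filterbank used to compute the coefficients is spaced. A commonly used definition (HTK's filterbank analysis follows this form) maps a frequency f in Hz to mels as shown below. As a quick sanity check, 1000 Hz comes out at roughly 1000 mel:

$$
\mathrm{Mel}(f) = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right)
$$

$$
\mathrm{Mel}(1000\ \text{Hz}) = 2595\,\log_{10}(2.4286) \approx 2595 \times 0.3854 \approx 1000
$$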
You use the HCopy tool to convert your wav files to MFCC format. You have two options: you could run the HCopy command by hand for each audio file you created in Step 3, or you can create a file that lists each source audio file together with the name of the MFCC file it will be converted to, and pass that file as a parameter to HCopy. We will use the second approach in this example. Create the following 'codetrain.scp' script file in your 'voxforge/manual' folder:
codetrain.scp
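Each line of codetrain.scp pairs a source wav file with the MFCC file HCopy should write for it. If you would rather not type the list by hand, a shell sketch like the following can generate it. It assumes you run it from 'voxforge/manual', that your Step 3 recordings sit in ../train/wav (a hypothetical location, so adjust it to wherever your files actually are), and that the targets go to ../train/mfcc. The function name make_scp is illustrative only.

```shell
#!/bin/sh
# make_scp WAV_DIR MFCC_DIR OUT_FILE  (hypothetical helper, not part of HTK)
# Writes one "<source wav> <target mfc>" line per .wav file in WAV_DIR.
# Only lowercase .wav names are matched; HTK splits script lines on
# whitespace, so file names should not contain spaces.
make_scp() {
    wav_dir=$1 mfcc_dir=$2 out=$3
    : > "$out"                            # start with an empty script file
    for wav in "$wav_dir"/*.wav; do
        [ -e "$wav" ] || continue         # no matches: glob stays literal, skip it
        base=$(basename "$wav" .wav)      # e.g. sample1.wav -> sample1
        printf '%s %s\n' "$wav" "$mfcc_dir/$base.mfc" >> "$out"
    done
}

# ASSUMPTION: recordings in ../train/wav, features written to ../train/mfcc.
make_scp ../train/wav ../train/mfcc codetrain.scp
echo "wrote $(wc -l < codetrain.scp) entries to codetrain.scp"
```

Open the generated file afterwards and check that the paths match your folder layout before running HCopy.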
The HCopy command performs the conversion from wav format to MFCC. To do this, it needs a configuration file that specifies all the conversion parameters. Create a file called wav_config in your 'voxforge/manual' folder and add the following:
SOURCEFORMAT = WAV
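For illustration, the sketch below writes a complete wav_config using a here-document. Only the SOURCEFORMAT line comes from this tutorial; every other line is an assumed, commonly used MFCC setup drawn from standard HTK configuration parameters, not necessarily the values this tutorial intends. Whatever you choose must agree with what the later training steps expect.

```shell
#!/bin/sh
# Writes wav_config in the current directory (overwrites any existing file).
# ASSUMED values, except SOURCEFORMAT = WAV which comes from the tutorial:
#   TARGETKIND MFCC_0_D : 12 cepstra (NUMCEPS) plus C0, with deltas appended
#   TARGETRATE 100000.0 : frame shift in 100 ns units = 10 ms (100 vectors/s)
#   WINDOWSIZE 250000.0 : analysis window in 100 ns units = 25 ms
#   USEHAMMING, PREEMCOEF, NUMCHANS, CEPLIFTER : Hamming window, pre-emphasis,
#   number of mel filterbank channels, cepstral liftering
cat > wav_config <<'EOF'
SOURCEFORMAT = WAV
TARGETKIND = MFCC_0_D
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
EOF
```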
If you would like more details on the contents of the config file, please see the HTK documentation.
Create a new directory called 'mfcc' in your 'voxforge/train' folder. Then execute HCopy from your 'voxforge/manual' folder as follows:
$ HCopy -A -D -T 1 -C wav_config -S codetrain.scp
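The -C option names the configuration file and -S names the script file; -A, -D and -T 1 make HCopy echo its command line, display its configuration variables and print basic trace output, which helps when diagnosing problems. The result is a series of .mfc files in your 'voxforge/train/mfcc' folder, one for each file listed in your codetrain.scp script. Be sure to monitor the output of the HCopy command to ensure that every wav file is processed properly. Most problems come from incorrect file paths or from audio files that are not actually in wav format.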
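Since most failures come from bad paths or non-wav input, a small checker can help narrow them down. The sketch below reads codetrain.scp and reports sources that are missing or lack the RIFF/WAVE header that standard wav files start with, plus targets that HCopy did not produce. The function name check_scp is hypothetical, and the header test is a quick heuristic, not a full WAV validator.

```shell
#!/bin/sh
# check_scp SCP_FILE  (hypothetical helper, not part of HTK)
# For each "<source> <target>" line, report missing/non-RIFF-WAVE sources
# and missing or empty targets. Returns non-zero if any problem is found.
check_scp() {
    [ -r "$1" ] || { echo "cannot read $1"; return 1; }
    status=0
    while read -r src tgt; do
        [ -n "$src" ] || continue                       # skip blank lines
        if [ ! -f "$src" ]; then
            echo "missing source: $src"; status=1
        # Bytes 0-3 should be "RIFF" and bytes 8-11 "WAVE".
        elif [ "$(head -c 4 "$src")" != RIFF ] ||
             [ "$(dd if="$src" bs=1 skip=8 count=4 2>/dev/null)" != WAVE ]; then
            echo "not a RIFF/WAVE file: $src"; status=1
        fi
        if [ ! -s "$tgt" ]; then
            echo "missing or empty target: $tgt"; status=1
        fi
    done < "$1"
    return $status
}

check_scp codetrain.scp || echo "fix the problems above and re-run HCopy"
```

Running the source checks before HCopy catches path and format problems early; running the whole check afterwards confirms that every target was written.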