VoxForge
Re: Packing of the audio files
OK, I've been looking at forced alignment issues listed in the PROBLEMS file of your latest Sphinx acoustic model...
So far, my approach to processing speech for the VoxForge corpus has been to include every (English) submission unless its audio quality is really bad.
Submissions of marginal quality (some concerns, but not enough to exclude them) go into the corpus, but I remove them from the master_prompts files. I also manually note in the Readme for each such submission, under the "Quality" heading, the type of problem that concerns me (line noise, non-speech noise, line hiss, audio clipping, ...).
For example, the "mjmm-20080526-hca" submission has a Quality description of "extreme line noise", so I did not include it in the master_prompts file. See this file for a list of all the submissions with forced alignment issues.
This differs from your approach of using the prompts files in each submission to create the new Sphinx acoustic model (and explains why I have not gotten around to fixing the prompts files in the submissions until now...).
Possible fix:
Each submission has a prompts-original file, which contains the prompts in their unaltered form, and a PROMPTS file, which has had some cleanup done and has all words capitalized.
A. Ignore all submissions with anything in their 'Quality' field.
B. Add a "Clean_speech" field/tag to all ReadMe files that have clean speech.
C. If a submission has problem prompts, remove them from its PROMPTS file. That way, anyone who needs noisy speech can still get the prompts (in prompts-original), but any script that encounters a blank PROMPTS file would simply skip it and not include those prompts when creating acoustic models.
Which of these approaches (or any other) should we use, from your standpoint?
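To make options A and C concrete, here is a minimal Python sketch of how a model-building script might decide which prompts to use. It is hypothetical and rests on assumptions not stated above: that each submission unpacks to `<name>/etc/README` and `<name>/etc/PROMPTS`, and that the README carries the Quality note on a single line of the form `Quality: <text>`. The helper names (`quality_note`, `select_prompts`, `collect_prompts`) are illustrative only.

```python
from pathlib import Path
import re

# Hypothetical: assumes the README holds a one-line field such as
# "Quality: extreme line noise". Adjust if the real layout differs.
QUALITY_RE = re.compile(r"^\s*Quality\s*:(.*)$", re.IGNORECASE)


def quality_note(readme_text: str) -> str:
    """Return the README's Quality text, or "" if the field is absent or empty."""
    for line in readme_text.splitlines():
        match = QUALITY_RE.match(line)
        if match:
            return match.group(1).strip()
    return ""


def select_prompts(readme_text: str, prompts_text: str) -> list[str]:
    """Return the PROMPTS lines to train on, or [] to skip the submission.

    Option A: any note in the Quality field excludes the submission.
    Option C: a blank PROMPTS file yields no lines, so it is skipped too.
    """
    if quality_note(readme_text):
        return []
    return [line for line in prompts_text.splitlines() if line.strip()]


def collect_prompts(corpus_root: Path) -> dict[str, list[str]]:
    """Walk submission directories (assumed layout: <name>/etc/README and
    <name>/etc/PROMPTS) and keep only those that pass select_prompts()."""
    selected: dict[str, list[str]] = {}
    for submission in sorted(p for p in corpus_root.iterdir() if p.is_dir()):
        etc = submission / "etc"
        readme = etc / "README"
        prompts = etc / "PROMPTS"
        readme_text = readme.read_text(errors="replace") if readme.is_file() else ""
        prompts_text = prompts.read_text(errors="replace") if prompts.is_file() else ""
        lines = select_prompts(readme_text, prompts_text)
        if lines:
            selected[submission.name] = lines
    return selected
```

Keeping the decision in `select_prompts`, separate from the file walking, means the rule can change without touching the directory layout. Option B would invert the README test: keep a submission only if its README carries the proposed Clean_speech tag.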
thanks,
Ken
--- (Edited on 2/19/2010 2:26 pm [GMT-0500] by kmaclean) ---