Acoustic model testing
I have downloaded the corpus and trained decision-tree-clustered triphones with it. Now I would like to evaluate its performance. Has anyone done this before me? (I would like to compare my results so that I could spot possible mistakes in my training process.) Is there something like a standardized list of test files?
What I did was randomly take out approx. 1600 recordings from the corpus and use them as testing data. At the moment I do not have any language model, so I used a uniform distribution for the word transition probabilities (plus a word transition penalty).
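For anyone who wants to reproduce this setup, the uniform "no language model" score is trivial to write down; here is a minimal sketch (the function name and the example penalty value are just illustrations, not part of my actual decoder):

```python
import math

def uniform_transition_logprob(vocab_size, word_penalty=0.0):
    """Log score of any word-to-word transition under a uniform
    (zero-gram) language model: every word is equally likely, so the
    transition log probability is log(1/V), plus a fixed word
    transition penalty added in the log domain."""
    return math.log(1.0 / vocab_size) + word_penalty

# With the full 130k-word vocabulary, every transition gets the same score:
score = uniform_transition_logprob(130_000, word_penalty=-10.0)
```

The penalty just trades insertions against deletions; the uniform term itself only scales all hypotheses by a constant per word.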
With the complete vocabulary supplied with VoxForge (130k words), the results are as follows (the scoring process was similar to the one used in HResults):
%Corr=33.14 Acc=29.80 H=6640 D=4306 S=9093 I=668 N=20039
If I restrict the vocabulary to only those words that can be found in the VoxForge prompts (approx. 14k words), the results are a bit better:
%Corr=42.62 Acc=38.52 H=8540 D=4317 S=7182 I=821 N=20039
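In case the notation is unfamiliar: these figures follow directly from the hit/deletion/substitution/insertion counts, using the same formulas HResults uses (%Corr = H/N, Acc = (H-I)/N, with N = H+D+S the number of reference words). A quick sketch that reproduces the percentages above from the counts:

```python
def hresults_metrics(H, D, S, I):
    """Percent correct and accuracy as HTK's HResults defines them:
    %Corr = 100*H/N and Acc = 100*(H-I)/N, where N = H+D+S is the
    number of words in the reference transcriptions."""
    N = H + D + S
    corr = 100.0 * H / N
    acc = 100.0 * (H - I) / N
    return corr, acc

# Full 130k-word vocabulary run:
print(hresults_metrics(6640, 4306, 9093, 668))   # ≈ (33.14, 29.80)
# Restricted 14k-word vocabulary run:
print(hresults_metrics(8540, 4317, 7182, 821))   # ≈ (42.62, 38.52)
```

Note that Acc can even go negative when there are many insertions, which is why it is the stricter of the two figures.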
Since I have not worked with a corpus of this size before (most of my previous ASR experiments were done with much smaller corpora and grammar-based tasks), I cannot tell whether these results are good or bad. I am not even sure whether it is a good idea to test acoustic models separately from language models.
What I would like to do next is try to incorporate some kind of language model. I think even a simple unigram LM might help, because the results often consist of very rare (often non-English) words (e.g. HOW DOES YOUR WAGER LOOK NOW gets recognized as OO HOUT DARES WAAG GOLOB NURRE). On that note, I would like to ask from which text the VoxForge prompts were generated. I am guessing that text should be used for LM training; I do not think (but maybe I am wrong) that an LM trained on a completely different source would perform very well.
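To illustrate what I mean by a unigram LM: it only needs word counts from the prompt transcriptions. A minimal sketch, assuming the prompts are available as a list of sentence strings (the add-one smoothing and the toy data here are just for the example):

```python
import math
from collections import Counter

def train_unigram(sentences, vocab):
    """Add-one (Laplace) smoothed unigram log probabilities over a
    fixed vocabulary, estimated from a list of sentence strings."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    V = len(vocab)
    return {w: math.log((counts[w] + 1) / (total + V)) for w in vocab}

# Toy example: words never seen in the prompts (like WAAG) get a much
# lower log probability than frequent ones (like HOW), which is exactly
# the effect that should suppress the rare-word errors above.
prompts = ["HOW DOES YOUR WAGER LOOK NOW", "HOW ARE YOU"]
vocab = {"HOW", "DOES", "YOUR", "WAGER", "LOOK", "NOW", "ARE", "YOU", "WAAG"}
lm = train_unigram(prompts, vocab)
```

In decoding, this per-word log probability would simply replace the uniform log(1/V) term at each word transition.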
--- (Edited on 2/10/2009 7:50 am [GMT-0600] by tpavelka) ---