I am trying to adapt the German VoxForge model (cmusphinx-de-voxforge-5.2.tar.gz, with the matching language model and dictionary).
I followed the guide on the CMU Sphinx homepage (http://cmusphinx.sourceforge.net/wiki/tutorialadapt), using MLLR adaptation.
Then I tested the result using pocketsphinx_batch and word_align.pl.
Unfortunately, the recognition rate dropped sharply, from 59% to 19%, so I am now looking for my mistake.
I took the following steps to adapt the model:
1. I created 30 German recordings (16 kHz, mono) and the corresponding .fileids and .transcription files.
2. Creating acoustic feature files:
sphinx_fe -argfile de-de/feat.params -samprate 16000 -c adapt30.fileids -di . -do . -ei wav -eo mfc -mswav yes
(I renamed the acoustic model directory to de-de)
3. Accumulating observation counts:
bw -hmmdir de-de -moddeffn de-de/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn voxforge.dic -ctlfn adapt30.fileids -lsnfn adapt30.transcription -accumdir .
(I renamed the dictionary to voxforge.dic)
4. Creating the MLLR transform:
mllr_solve -meanfn de-de/means -varfn de-de/variances -outmllrfn mllr_matrix -accumdir .
5. Updating the means file:
mllr_transform -inmeanfn de-de/means -outmeanfn de-de/means-new -mllrmat mllr_matrix
(I then renamed means-new to means)
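The control files from step 1 can be generated from a single list rather than written by hand, which rules out mismatches between file IDs and transcripts. The sketch below assumes the layout used in the adaptation tutorial: one file ID per line in the .fileids file, and `<s> text </s> (file_id)` per line in the .transcription file. The function names are hypothetical, and UTF-8 is an assumption about the dictionary's encoding.

```python
import wave
from pathlib import Path


def write_control_files(utterances, stem, out_dir="."):
    """Write <stem>.fileids and <stem>.transcription.

    utterances: list of (file_id, text) pairs, where file_id is the
    recording's name without the .wav extension.
    """
    out = Path(out_dir)
    fileids = [file_id for file_id, _ in utterances]
    # Assumed transcription format: "<s> words </s> (file_id)", one per line.
    transcripts = [
        f"<s> {text.strip()} </s> ({file_id})" for file_id, text in utterances
    ]
    # Assumption: UTF-8 matches the encoding of the dictionary in use.
    (out / f"{stem}.fileids").write_text(
        "\n".join(fileids) + "\n", encoding="utf-8"
    )
    (out / f"{stem}.transcription").write_text(
        "\n".join(transcripts) + "\n", encoding="utf-8"
    )
    return fileids, transcripts


def is_16k_mono(wav_path):
    """Return True if the WAV header reports 16 kHz and a single channel."""
    with wave.open(str(wav_path), "rb") as w:
        return w.getframerate() == 16000 and w.getnchannels() == 1
```

Calling `write_control_files([("rec_0001", "guten morgen")], "adapt30")` would produce adapt30.fileids and adapt30.transcription in the current directory; `is_16k_mono` can be run over each recording before sphinx_fe is invoked.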
For testing, I used the command:
pocketsphinx_batch -adcin yes -cepdir wav -cepext .wav -ctl test.fileids -lm voxforge.lm.bin -dict voxforge.dic -hmm de-de -hyp test.hyp
Can you tell me if something is wrong with this approach? I'm aware that 30 recordings are not much, but as far as I understand, the recognition rate should not drop this much.
I would be grateful for any hints.
Thanks in advance
I added an attachment with my training data and all generated files, with the exception of the model.
TOTAL Words: 40 Correct: 27 Errors: 14
TOTAL Percent correct = 67.50% Error = 35.00% Accuracy = 65.00%
TOTAL Insertions: 1 Deletions: 0 Substitutions: 13
(see the attachment for the complete result)
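The three summary lines are tied together by simple arithmetic, which helps when judging how much a 40-word test set can say. A minimal sketch, assuming the usual definitions (errors = insertions + deletions + substitutions, percent correct = correct / total words, accuracy = 100% minus the error rate); the function name is made up for illustration.

```python
def summarize(total_words, insertions, deletions, substitutions):
    """Recompute the summary figures from the raw error counts."""
    # Reference words that were neither deleted nor substituted are correct.
    correct = total_words - deletions - substitutions
    errors = insertions + deletions + substitutions
    percent_correct = 100.0 * correct / total_words
    error_rate = 100.0 * errors / total_words
    return {
        "correct": correct,
        "errors": errors,
        "percent_correct": percent_correct,
        "error": error_rate,
        # Insertions count against accuracy but not against percent correct.
        "accuracy": 100.0 - error_rate,
    }


print(summarize(40, insertions=1, deletions=0, substitutions=13))
# {'correct': 27, 'errors': 14, 'percent_correct': 67.5, 'error': 35.0, 'accuracy': 65.0}
```

With only 40 reference words, every single error shifts the rate by 2.5 percentage points, so small differences between runs on a set this size are hard to interpret.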
I also created bigger test sets (around 100 words), but the accuracy does not improve. So I would like to estimate whether it is worth testing with a few thousand words.
I am grateful for any help :)
If you have more adaptation data, you should use MAP adaptation rather than MLLR adaptation.
I would also use a smaller language model that is more specific to your application. With a generic language model, recognition is not going to be very accurate.