New 31k words 310 hours German models released

German

User: guenter
Date: 10/15/2017 8:06 pm

Just uploaded the latest 20171016 release of the German VoxForge models to

http://goofy.zamia.org/voxforge/de/

Besides the usual inclusion of all new VoxForge submissions, this release focuses on noise robustness.

I have added about 100 hours of noisy recordings (auto-generated by adding random background and foreground noise to existing recordings) and introduced a new "NSPC" noise phoneme with a matching dictionary entry.
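The augmentation described above can be sketched as mixing a noise signal into existing speech at a chosen signal-to-noise ratio (SNR). This is a minimal sketch assuming 16-bit PCM samples held in Python lists; the function names, the SNR ranges and the "looped background plus one foreground event" scheme are illustrative assumptions, not the actual tooling used for this release.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of PCM sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def add_noise(signal, noise, snr_db, speech_rms, offset=0):
    """Return a copy of `signal` with `noise` added starting at sample `offset`.

    The noise is scaled so that speech_rms / rms(scaled noise) equals `snr_db`
    decibels (assumes the noise is not pure silence). Results are rounded and
    clipped to the signed 16-bit range; noise past the end of `signal` is dropped.
    """
    scale = speech_rms / (rms(noise) * 10 ** (snr_db / 20.0))
    out = list(signal)
    for i, n in enumerate(noise):
        j = offset + i
        if j >= len(out):
            break
        out[j] = max(-32768, min(32767, round(out[j] + scale * n)))
    return out


def augment(speech, background, foreground, rng=None):
    """Hypothetical augmentation: looped background noise plus one foreground event."""
    rng = rng if rng is not None else random.Random()
    level = rms(speech)  # measure the clean speech once, before any noise is added
    # Tile the background noise over the whole utterance (assumed 10-20 dB SNR).
    bg = [background[i % len(background)] for i in range(len(speech))]
    noisy = add_noise(speech, bg, rng.uniform(10.0, 20.0), level)
    # Drop a shorter foreground event at a random position (assumed 0-10 dB SNR).
    offset = rng.randrange(max(1, len(speech) - len(foreground)))
    return add_noise(noisy, foreground, rng.uniform(0.0, 10.0), level, offset)
```

Scaling against the clean speech level keeps the two SNR draws independent of each other; a real pipeline would also pick noise clips from a corpus and write the result back to WAV files.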

Also, I have further pruned the language model, which greatly reduces the nnet3 model's memory footprint.

Please note that this is most likely the last release to feature Kaldi GMM models: nnet3 yields much better results, so I don't think it makes sense to spend the compute time on producing the GMM models (drop me a line if you would like them continued).

With the addition of the noisy recordings, WERs have degraded somewhat compared to previous releases (noisy recordings are much harder to decode, after all). I hope you will still find this new model useful, especially in noisy or distant-microphone situations.

stats:

31207 lexicon entries.
total duration of all good submissions: 311:33:41
CMU Sphinx models:
cmusphinx cont model: SENTENCE ERROR: 45.2% (3969/8788)   WORD ERROR RATE: 10.4% (10338/99119)
cmusphinx ptm model: SENTENCE ERROR: 38.3% (3365/8788)   WORD ERROR RATE: 10.8% (10665/99119)
Kaldi models:
%WER 11.15 [ 10998 / 98672, 1880 ins, 2614 del, 6504 sub ] exp/tri3b/decode.si/wer_14_0.0
%WER 10.65 [ 10507 / 98672, 1026 ins, 4347 del, 5134 sub ] exp/tri3b_mmi/decode/wer_12_0.0
%WER 10.61 [ 10468 / 98672, 1575 ins, 3286 del, 5607 sub ] exp/tri2b/decode/wer_15_0.0
%WER 10.52 [ 10385 / 98672, 982 ins, 4346 del, 5057 sub ] exp/tri3b_mmi_b0.05/decode/wer_12_0.0
%WER 10.06 [ 9922 / 98672, 1402 ins, 3193 del, 5327 sub ] exp/tri3b_mpe/decode/wer_14_0.0
%WER 7.77 [ 7663 / 98672, 1087 ins, 2457 del, 4119 sub ] exp/tri2b_mpe/decode/wer_13_0.0
%WER 7.32 [ 7225 / 98672, 780 ins, 2688 del, 3757 sub ] exp/tri2b_mmi/decode/wer_12_0.0
%WER 7.32 [ 7221 / 98672, 829 ins, 2631 del, 3761 sub ] exp/tri2b_mmi_b0.05/decode/wer_11_0.0
%WER 7.13 [ 7038 / 98672, 1625 ins, 1737 del, 3676 sub ] exp/tri3b/decode/wer_15_0.0
%WER 3.72 [ 3673 / 98672, 719 ins, 1666 del, 1288 sub ] exp/nnet3/nnet_tdnn_a/decode/wer_11_0.0
sequitur g2p model:
    total: 3118 strings, 36657 symbols
    successfully translated: 3116 (99.94%) strings, 36635 (99.94%) symbols
        string errors:       1263 (40.53%)
        symbol errors:       2822 (7.70%)
            insertions:      980 (2.68%)
            deletions:       985 (2.69%)
            substitutions:   857 (2.34%)
    translation failed:      2 (0.06%) strings, 22 (0.06%) symbols
    total string errors:     1265 (40.57%)
    total symbol errors:     2844 (7.76%)
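Each Kaldi `%WER` line above packs the error breakdown into a single line: WER = (ins + del + sub) / reference words * 100, e.g. (719 + 1666 + 1288) / 98672 = 3673 / 98672, about 3.72 % for the nnet3 model. A minimal sketch, assuming exactly the line layout shown above, that parses such lines, checks that the counts add up and picks the best one (Kaldi recipes usually do that last step with `utils/best_wer.sh`):

```python
import re

# Matches Kaldi scoring lines such as
# "%WER 3.72 [ 3673 / 98672, 719 ins, 1666 del, 1288 sub ] exp/.../wer_11_0.0"
WER_LINE = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<errors>\d+)\s*/\s*(?P<words>\d+),\s*"
    r"(?P<ins>\d+)\s+ins,\s*(?P<dels>\d+)\s+del,\s*(?P<subs>\d+)\s+sub\s*\]\s*(?P<path>\S*)"
)


def parse_wer(line):
    """Parse one Kaldi %WER line and check that its numbers are self-consistent."""
    m = WER_LINE.search(line)
    if m is None:
        raise ValueError(f"not a Kaldi %WER line: {line!r}")
    result = {k: int(m.group(k)) for k in ("errors", "words", "ins", "dels", "subs")}
    result["wer"] = float(m.group("wer"))
    result["path"] = m.group("path")
    # Total errors must equal insertions + deletions + substitutions.
    if result["ins"] + result["dels"] + result["subs"] != result["errors"]:
        raise ValueError(f"error counts do not add up: {line!r}")
    # The reported WER is rounded to two decimals, so allow half a unit of slack.
    if abs(100.0 * result["errors"] / result["words"] - result["wer"]) > 0.005:
        raise ValueError(f"reported WER does not match counts: {line!r}")
    return result


def best_wer(lines):
    """Return the parsed %WER line with the lowest word error rate."""
    parsed = (parse_wer(line) for line in lines if line.startswith("%WER"))
    return min(parsed, key=lambda r: r["wer"])
```

Note that the CMU Sphinx figures use a different format and a slightly different reference word count (99119 vs. 98672), so the two toolkits' numbers are not strictly comparable.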

Re: New 31k words 310 hours German models released

User: guenter
Date: 11/9/2017 6:39 pm

I have been experimenting with Kaldi 5.2 TDNN chain models lately, and they show very promising results. This is not a complete release, but I have uploaded my latest model, called

kaldi-chain-voxforge-de-r20171109.tar.xz

to the usual model download server here:

http://goofy.zamia.org/voxforge/de/

The test decode result for this model was:

%WER 1.18 [ 1174 / 99422, 188 ins, 373 del, 613 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0

Also pretty impressive is the decoding speed: I measured a decode time of 7.67 s for a 4.7 s wave file on a Raspberry Pi 3 (!)
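The speed figure can be expressed as a real-time factor (RTF): decode time divided by audio duration, so 7.67 s / 4.7 s is about 1.63, i.e. roughly 1.6 s of compute per second of audio. A minimal sketch using Python's standard `wave` module to obtain the duration; the helper names are hypothetical:

```python
import wave


def wav_duration(path_or_file):
    """Duration in seconds of a WAV file: number of frames divided by frame rate."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(decode_seconds, audio_seconds):
    """Decode time divided by audio duration; at or below 1.0 the decoder keeps up."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


# The figures quoted above: 7.67 s to decode 4.7 s of audio.
print(round(real_time_factor(7.67, 4.7), 2))  # 1.63
```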

Please note that to use this model I recommend the latest Kaldi 5.2 plus py-kaldi-asr 0.2.0.

Re: New 31k words 310 hours German models released

User: guenter
Date: 11/13/2017 2:21 pm

Another quick update: by reducing the layer size to 250, I managed to create another Kaldi nnet3 chain model that achieves near-realtime performance on a Raspberry Pi 3:

[bofh@donald py-kaldi-asr]$ python examples/chain_incremental.py
tdnn_250 loading model...
tdnn_250 loading model... done, took 7.084181s.
tdnn_250 creating decoder...
tdnn_250 creating decoder... done, took 14.327128s.
decoding data/gsp1.wav...
 0.041s:  4000 frames ( 0.250s) decoded.
 0.319s:  8000 frames ( 0.500s) decoded.
 0.643s: 12000 frames ( 0.750s) decoded.
 0.864s: 16000 frames ( 1.000s) decoded.
 1.086s: 20000 frames ( 1.250s) decoded.
 1.312s: 24000 frames ( 1.500s) decoded.
 1.530s: 28000 frames ( 1.750s) decoded.
 1.760s: 32000 frames ( 2.000s) decoded.
 2.133s: 36000 frames ( 2.250s) decoded.
 2.387s: 40000 frames ( 2.500s) decoded.
 2.624s: 44000 frames ( 2.750s) decoded.
 2.840s: 48000 frames ( 3.000s) decoded.
 3.080s: 52000 frames ( 3.250s) decoded.
 3.449s: 56000 frames ( 3.500s) decoded.
 3.682s: 60000 frames ( 3.750s) decoded.
 3.939s: 64000 frames ( 4.000s) decoded.
 4.165s: 68000 frames ( 4.250s) decoded.
 4.375s: 72000 frames ( 4.500s) decoded.
 4.952s: 75200 frames ( 4.700s) decoded.
*****************************************************************
** data/gsp1.wav
** berlin gilt als weltstadt der kultur politik medien und wissenschaften
** tdnn_250 likelihood: 1.71563148499
*****************************************************************
tdnn_250 decoding took     4.96s
[bofh@donald py-kaldi-asr]$ uname -a
Linux donald 4.9.40-v7.1.el7 #1 SMP Tue Aug 8 14:03:02 UTC 2017 armv7l armv7l armv7l GNU/Linux
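The log above feeds the file in 0.25 s steps; its frame counts match 16 kHz mono audio (4000 frames per 0.25 s, 75200 frames for 4.7 s), and 4.96 s of decoding for 4.7 s of audio is what makes it "near realtime". A sketch of such a chunked feeding loop follows, under the assumption that `decode_chunk(chunk, finalize)` is a hypothetical stand-in callback rather than the py-kaldi-asr API, so only the chunking and timing logic is shown:

```python
import time


def decode_in_chunks(samples, sample_rate, decode_chunk, chunk_seconds=0.25):
    """Feed audio to a decoder callback in fixed-size chunks and record progress.

    `decode_chunk(chunk, finalize)` is a hypothetical callback standing in for
    a real incremental decoder; `finalize` is True only for the last chunk.
    Returns a list of (elapsed_seconds, frames_done, audio_seconds_done).
    """
    chunk_len = max(1, int(sample_rate * chunk_seconds))
    progress = []
    start = time.perf_counter()
    if not samples:
        decode_chunk(samples[:0], True)  # still give the decoder a chance to finalize
        return progress
    for offset in range(0, len(samples), chunk_len):
        end = min(offset + chunk_len, len(samples))
        decode_chunk(samples[offset:end], end == len(samples))
        progress.append((time.perf_counter() - start, end, end / sample_rate))
    return progress


def print_progress(progress):
    """Print progress lines mimicking the layout of the chain_incremental.py log."""
    for elapsed, frames, seconds in progress:
        print(f"{elapsed:6.3f}s: {frames:5d} frames ({seconds:6.3f}s) decoded.")
```

Feeding small chunks lets a decoder start work before the whole utterance has arrived, which is what matters for live microphone input.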

while the WER still looks good:

%WER 1.18 [ 1174 / 99422, 188 ins, 373 del, 613 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0
%WER 1.57 [ 1563 / 99422, 250 ins, 446 del, 867 sub ] exp/nnet3_chain/tdnn_250/decode_test/wer_8_0.0

Both models are available for download here:

http://goofy.zamia.org/voxforge/de/kaldi-chain-voxforge-de-r20171113.tar.xz

Re: New 31k words 310 hours German models released

User: guenter
Date: 12/17/2017 4:48 pm

Yet another update to this model: I have done another round of (auto-)reviews and added a few more words to the lexicon (most of them needed for the German port of the zamia-ai project), so here is another complete release of the German CMU Sphinx and Kaldi ASR models:

http://goofy.zamia.org/voxforge/de/

stats:

31650 lexicon entries.
total duration of all good submissions: 314:06:07
CMU Sphinx Models:
cont model: SENTENCE ERROR: 48.7% (4311/8852)   WORD ERROR RATE: 11.4% (11419/100347)
ptm model: SENTENCE ERROR: 39.6% (3507/8852)   WORD ERROR RATE: 11.3% (11299/100347)
Kaldi ASR models:
%WER 1.12 [ 1127 / 100295, 212 ins, 292 del, 623 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_8_0.0
%WER 1.45 [ 1452 / 100295, 229 ins, 459 del, 764 sub ] exp/nnet3_chain/tdnn_250/decode_test/wer_8_0.0
sequitur g2p model:
    total: 3165 strings, 36938 symbols
    successfully translated: 3164 (99.97%) strings, 36930 (99.98%) symbols
        string errors:       1316 (41.59%)
        symbol errors:       3094 (8.38%)
            insertions:      1076 (2.91%)
            deletions:       1000 (2.71%)
            substitutions:   1018 (2.76%)
    translation failed:      1 (0.03%) strings, 8 (0.02%) symbols
    total string errors:     1317 (41.61%)
    total symbol errors:     3102 (8.40%)

Re: New 31k words 310 hours German models released

User: mpuels
Date: 1/3/2018 11:10 am

Hi Guenter,

these are awesome results! Would you mind sharing your Kaldi recipe to train an nnet3-based model? I managed to run `egs/voxforge/s5/run.sh`, but it only trains a GMM model. When trying to train an nnet3 model, I run into issues.

To train an nnet3-based model on VoxForge, I copied `egs/wsj/s5/local/nnet3/run_tdnn.sh` (and its dependencies) to `egs/voxforge/s5/local/nnet3/run_tdnn.sh` and tried to execute it as follows:

 for stage in 0; do
    local/nnet3/run_tdnn.sh \
        --stage $stage \
        --nj $njobs \
        --train-set train \
        --test-sets test \
        --gmm tri3b
 done

I know that `stage` has to be iterated over, so the above for loop is just for debugging. The arguments `--train-set` and `--test-sets` are set according to the folder names in `egs/voxforge/s5/data` after running `egs/voxforge/s5/run.sh`.

But here is where I got stuck: in `egs/wsj/s5/local/nnet3/run_tdnn.sh`, `gmm` is set to `tri4b`, but after running `egs/voxforge/s5/run.sh` there is no `egs/voxforge/s5/exp/tri4b`, so I set it to `tri3b` (as you can see in the code snippet above), because `egs/voxforge/s5/exp/tri3b` does exist after running `egs/voxforge/s5/run.sh`. Now, after running the above code snippet, I eventually get the following error message:

local/nnet3/run_tdnn.sh: expected file exp/tri3b/graph_tgpr/HCLG.fst to exist

which corresponds to line 72 in `egs/wsj/s5/local/nnet3/run_tdnn.sh`.

So either I have to run a recipe that creates `egs/voxforge/s5/exp/tri4b` together with `graph_tgpr/HCLG.fst` or I have to modify my scripts to use `egs/voxforge/s5/exp/tri3b/graph/HCLG.fst` instead. Now, I don't know which recipe creates `egs/voxforge/s5/exp/tri4b` and I don't know if it has a negative impact on WER if I modify the scripts to use `egs/voxforge/s5/exp/tri3b/graph/HCLG.fst`.
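To see which decoding graphs a recipe actually produced before pointing the nnet3 script at one, a small helper can list every `HCLG.fst` one directory level below each model in `exp/`. This is a hypothetical diagnostic sketch, not part of Kaldi or of either recipe; it only shows what exists (e.g. `tri3b/graph/HCLG.fst` but no `graph_tgpr`) and says nothing about the WER impact of using one graph instead of the other.

```python
from pathlib import Path


def find_decoding_graphs(exp_dir="exp"):
    """List HCLG.fst files matching exp/<model>/graph*/HCLG.fst, relative to exp_dir."""
    root = Path(exp_dir)
    # "graph*" matches both "graph" (voxforge recipe) and "graph_tgpr" (wsj recipe).
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*/graph*/HCLG.fst"))
```

Run from `egs/voxforge/s5`, `print(find_decoding_graphs())` shows the candidate graph directories.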

Thanks in advance for your help!

Marc

Re: New 31k words 310 hours German models released

User: guenter
Date: 1/5/2018 7:46 am

Hi Marc,

thanks for the positive feedback :)

I am using my own set of Python scripts to build these models; everything is free and open source, hosted on my GitHub here:

https://github.com/gooofy/speech

To get started, I recommend having a look at the "py-kaldi-export.py" script.

Re: New 31k words 310 hours German models released

User: mpuels
Date: 1/6/2018 7:43 am

Hi Guenter,

thank you very much for making your scripts public. I'll have a look at them.

Cheers,

Marc
