
Acoustic Model Discussions

New English Zamia-Speech Models released
User: guenter
Date: 6/23/2018 5:38 pm

The latest 20180611 builds of the English models were trained on over 800 hours of material, including recordings with added noise and phone codec effects.
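
For readers unfamiliar with this kind of augmentation, here is a minimal sketch of the "added noise" part: mixing a noise recording into clean speech at a chosen signal-to-noise ratio. It is an illustration under assumptions, not the Zamia-Speech pipeline. The function names, the synthetic test signals and the 10 dB SNR are hypothetical, and phone codec simulation is not shown at all.

```python
import math
import random

def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def add_noise(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has the requested SNR in dB.

    Hypothetical helper: both inputs are lists of float samples, and `noise`
    is looped if it is shorter than `clean`.
    """
    # Loop/truncate the noise to the length of the clean signal.
    noise = [noise[i % len(noise)] for i in range(len(clean))]
    # SNR(dB) = 20 * log10(clean_rms / noise_rms), solved for the noise level.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    scale = target_noise_rms / rms(noise)
    return [c + scale * n for c, n in zip(clean, noise)]

if __name__ == "__main__":
    random.seed(0)
    # Illustrative signals: a 440 Hz tone at 16 kHz and white noise.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [random.gauss(0.0, 1.0) for _ in range(4000)]
    noisy = add_noise(clean, noise, snr_db=10.0)
    residual = [y - x for x, y in zip(clean, noisy)]
    print(round(20 * math.log10(rms(clean) / rms(residual)), 2))  # prints 10.0
```

Training on copies of each utterance mixed at several SNRs is a common way to make acoustic models more robust to background noise.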

You can find download links to all our models and dicts here:

https://github.com/gooofy/zamia-speech#download

WER results for these models are not comparable to those of previous releases. From now on we measure WERs on speakers who are not in the training set. We have also tried to make the language model more neutral (i.e. it no longer over-represents the prompts in the training material). The WER results should therefore give a more realistic assessment of the performance you can expect from our models without adaptation.
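
Measuring WER on "speakers not in the training set" means splitting the corpus by speaker rather than by utterance. The following is a minimal sketch of such a speaker-disjoint split, assuming each utterance is given as an `(utt_id, speaker_id)` pair. The function name, the 10% default and the seed are hypothetical and are not taken from the Zamia-Speech scripts.

```python
import random
from collections import defaultdict

def speaker_disjoint_split(utterances, test_fraction=0.1, seed=42):
    """Split (utt_id, speaker_id) pairs so that no speaker is in both sets.

    Whole speakers, not single utterances, go to the test set, so the test
    WER reflects voices the model never heard during training.
    """
    by_speaker = defaultdict(list)
    for utt_id, spk in utterances:
        by_speaker[spk].append(utt_id)
    speakers = sorted(by_speaker)              # deterministic order before shuffling
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train, test = [], []
    for spk in speakers:
        (test if spk in test_speakers else train).extend(by_speaker[spk])
    return train, test

if __name__ == "__main__":
    utts = [("u1", "alice"), ("u2", "alice"), ("u3", "bob"),
            ("u4", "carol"), ("u5", "bob")]
    train, test = speaker_disjoint_split(utts, test_fraction=0.34)
    print("train:", train, "test:", test)
```

A random utterance-level split would usually leave some utterances of every speaker in the training data, which tends to make test WERs look better than what unknown speakers will actually experience.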

WER for the Kaldi models is 7.02% for the large model and 7.84% for the embedded model.

WER for the continuous CMU Sphinx model is 25.4%.

We have also been quite busy cleaning up our scripts and documentation, so it should now be easier to understand what we are doing here. The models come complete with example scripts and pre-compiled binary packages for various platforms. More information on that can be found in our getting started guide here:

https://github.com/gooofy/zamia-speech#get-started-with-our-pre-trained-models

Please note that we have changed the tarball format of our models significantly, so you will have to use the latest (0.3.1) py-kaldi-asr wrappers with these models. The new tarball format allows for model adaptation:

https://github.com/gooofy/zamia-speech#model-adaptation

as well as automatic segmentation and transcript alignment of long audio recordings (e.g. librivox audiobooks):

https://github.com/gooofy/zamia-speech#audiobook-segmentation-and-transcription-kaldi

Comments, suggestions and contributions are very welcome. For more information about the Zamia-Speech project, please visit http://zamia-speech.org/

--- (Edited on 6/23/2018 5:38 pm [GMT-0500] by guenter) ---

Re: New English Zamia-Speech Models released
User: guenter
Date: 7/2/2018 11:33 am

I have just released an updated version of the Kaldi models, which comes with improved noise resistance and tokenizer bugfixes, resulting in slightly better WERs:

https://github.com/gooofy/zamia-speech#download

%WER 6.97 [ 53104 / 761856, 3598 ins, 14296 del, 35210 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0
%WER 7.78 [ 59271 / 761856, 4323 ins, 14974 del, 39974 sub ] exp/nnet3_chain/tdnn_250/decode_test/wer_8_0.0
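
Each %WER line above is Kaldi's scoring summary: WER = (insertions + deletions + substitutions) / number of reference words. For example, 3598 + 14296 + 35210 = 53104 errors over 761856 words gives 6.97%. The sketch below parses such a line and cross-checks those numbers. The regex and function name are assumptions written against the lines in this thread, not part of Kaldi or Zamia-Speech.

```python
import re

# Pattern for summary lines like the ones posted above, e.g.
# %WER 6.97 [ 53104 / 761856, 3598 ins, 14296 del, 35210 sub ] exp/.../wer_9_0.0
WER_RE = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<err>\d+)\s*/\s*(?P<ref>\d+),\s*"
    r"(?P<ins>\d+)\s+ins,\s*(?P<dels>\d+)\s+del,\s*(?P<subs>\d+)\s+sub\s*\]\s*(?P<path>\S*)"
)

def parse_wer_line(line):
    """Parse one %WER line and check that the error counts are consistent."""
    m = WER_RE.search(line)
    if m is None:
        raise ValueError(f"not a Kaldi %WER line: {line!r}")
    ref_words = int(m["ref"])
    ins, dels, subs = int(m["ins"]), int(m["dels"]), int(m["subs"])
    errors = ins + dels + subs
    if errors != int(m["err"]):
        raise ValueError("error count does not match ins + del + sub")
    return {
        "path": m["path"],
        "reported_wer": float(m["wer"]),
        # WER in percent = 100 * (ins + del + sub) / reference words
        "recomputed_wer": 100.0 * errors / ref_words,
        "ins": ins, "del": dels, "sub": subs, "ref_words": ref_words,
    }

if __name__ == "__main__":
    line = ("%WER 6.97 [ 53104 / 761856, 3598 ins, 14296 del, 35210 sub ] "
            "exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0")
    r = parse_wer_line(line)
    print(f"{r['recomputed_wer']:.2f}")  # prints 6.97
```

The `wer_9_0.0` suffix in the path names the scoring configuration Kaldi picked as best for that decode. The parser simply keeps it as an opaque string.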

--- (Edited on 7/2/2018 11:34 am [GMT-0500] by guenter) ---

Re: New English Zamia-Speech Models released
User: guenter
Date: 8/17/2018 7:34 am

The latest 20180815 Kaldi models are trained on 1200 hours of recordings now that we have added the Mozilla Common Voice v1 corpus material. They are available for download in the usual places:

https://github.com/gooofy/zamia-speech#download

WERs are still good: 

%WER 8.03 [ 65993 / 821583, 4460 ins, 18032 del, 43501 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0
%WER 9.03 [ 74192 / 821583, 5394 ins, 19016 del, 49782 sub ] exp/nnet3_chain/tdnn_250/decode_test/wer_8_0.0

A slight increase was to be expected, as the new training material has more diverse speakers and more noisy content, which should improve real-world performance on unknown speakers as well as noise resistance.

--- (Edited on 8/17/2018 7:34 am [GMT-0500] by guenter) ---

Re: New English Zamia-Speech Models released
User: guenter
Date: 9/1/2018 10:16 am

A new Zamia-Speech Kaldi nnet3-chain model based on factorized TDNN is available for download now here:

https://github.com/gooofy/zamia-speech#download
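
Roughly speaking, a factorized TDNN (TDNN-F) layer replaces one large weight matrix with the product of two smaller ones that pass through a low-dimensional bottleneck. In Kaldi one factor is additionally kept semi-orthogonal during training. The sketch below illustrates only the low-rank factorization and its parameter savings. The 1024/128 dimensions are hypothetical, and the semi-orthogonal constraint, time splicing and the actual Zamia-Speech network configuration are not shown.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def factorized_layer(x, a, b):
    """Apply W = A @ B to x without forming W: project down with B, then up with A."""
    return matvec(a, matvec(b, x))

def param_counts(d_in, d_out, bottleneck):
    """Weights in a full d_out x d_in matrix vs. a rank-`bottleneck` factorization."""
    full = d_in * d_out
    factored = bottleneck * d_in + d_out * bottleneck
    return full, factored

if __name__ == "__main__":
    # Hypothetical dimensions, chosen only for illustration.
    full, factored = param_counts(d_in=1024, d_out=1024, bottleneck=128)
    print(full, factored)  # prints 1048576 262144
    # Tiny numeric example: A is 3x2 and B is 2x3, so W = A @ B has rank <= 2.
    a = [[1, 0], [0, 1], [1, 1]]
    b = [[1, 2, 0], [0, 1, 1]]
    print(factorized_layer([1, 1, 1], a, b))  # prints [3, 2, 5]
```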

The new model (tdnn_f below) is trained on the same dataset as the models from the 20180815 release but offers slightly better performance:

%WER 8.03 [ 65993 / 821583, 4460 ins, 18032 del, 43501 sub ] exp/nnet3_chain/tdnn_sp/decode_test/wer_9_0.0
%WER 9.03 [ 74192 / 821583, 5394 ins, 19016 del, 49782 sub ] exp/nnet3_chain/tdnn_250/decode_test/wer_8_0.0
%WER 7.54 [ 61946 / 821583, 3834 ins, 17569 del, 40543 sub ] exp/nnet3_chain/tdnn_f/decode_test/wer_8_0.0
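
All three results above are scored on the same 821583 reference words, so they can be compared directly. The following sketch turns the posted error counts into absolute and relative WER improvements of tdnn_f. The dictionary and helper names are illustrative only.

```python
# Error counts copied from the three %WER lines above; all models are scored
# on the same reference words, so error ratios equal WER ratios.
REF_WORDS = 821583
ERRORS = {"tdnn_sp": 65993, "tdnn_250": 74192, "tdnn_f": 61946}

def wer(errors, ref_words=REF_WORDS):
    """WER in percent for a given number of errors."""
    return 100.0 * errors / ref_words

def relative_reduction(baseline, new):
    """Relative WER reduction (percent) of model `new` over model `baseline`."""
    return 100.0 * (ERRORS[baseline] - ERRORS[new]) / ERRORS[baseline]

if __name__ == "__main__":
    for base in ("tdnn_sp", "tdnn_250"):
        abs_gain = wer(ERRORS[base]) - wer(ERRORS["tdnn_f"])
        print(f"tdnn_f vs {base}: {abs_gain:.2f} points absolute, "
              f"{relative_reduction(base, 'tdnn_f'):.1f}% relative")
```

Run as a script, this reports that tdnn_f is 0.49 points (about 6.1% relative) better than tdnn_sp and 1.49 points (about 16.5% relative) better than tdnn_250.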

--- (Edited on 9/1/2018 10:16 am [GMT-0500] by guenter) ---
