

New 375k-word, 260-hour German models available
User: guenter
Date: 6/23/2018 5:24 pm

The latest 20180611 builds of the German models were trained on 260 hours of training material and, thanks to IPA extraction from the German Wiktionary, now cover a dictionary of more than 375,000 entries.

You can find download links to all our models and dicts here:

https://github.com/gooofy/zamia-speech#download

WER results for these models are not comparable to previous releases. From now on we measure WERs on speakers who are not in the training set, and we have also tried to make the language model more neutral (i.e. not over-representing prompts from the training material). The WER results should therefore give a more realistic assessment of the performance one can expect from our models without adaptation.

WER for the Kaldi models is 6.23% for the large model and 7.49% for the embedded model.

WER for the continuous CMU Sphinx model is 29%.
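For readers unfamiliar with the metric: WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. So 6.23% means roughly 6 errors per 100 spoken words. The following is a minimal sketch of that computation. It is not the zamia-speech or Kaldi scoring script, and it assumes simple whitespace tokenization with no text normalization; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate = (S + D + I) / N via word-level Levenshtein distance.

    Hypothetical helper for illustration; assumes whitespace tokenization
    and no normalization (case, punctuation), unlike real scoring tools.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deletion ("ein") and one substitution ("test" -> "tests"): 2 / 5 words
    wer = word_error_rate("das ist ein kleiner test", "das ist kleiner tests")
    print(f"{wer:.2%}")
```

Running the example prints `40.00%`. Real evaluations sum the edit distances and reference word counts over the whole test set before dividing, rather than averaging per-utterance rates.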

We have also been quite busy cleaning up our scripts and documentation, so it should now be easier to understand what we are doing here. The models come complete with example scripts and pre-compiled binary packages for various platforms; more information on that can be found in our getting started guide:

https://github.com/gooofy/zamia-speech#get-started-with-our-pre-trained-models

Please note that we have changed the tarball format of our models significantly, so you will need the latest py-kaldi-asr wrappers (0.3.1) to use these models. The new tarball format allows for model adaptation

https://github.com/gooofy/zamia-speech#model-adaptation

as well as automatic segmentation and transcript alignment of long audio recordings (e.g. librivox audiobooks):

https://github.com/gooofy/zamia-speech#audiobook-segmentation-and-transcription-kaldi

Comments, suggestions and contributions are very welcome. For more information about the zamia-speech project, please visit http://zamia-speech.org/
