VoxForge
Hi Ralf,
I've created a dev site for German at:
http://www.dev.voxforge.org/projects/de (trac site)
http://www.dev.voxforge.org/svn/de (subversion site)
This is basically a Subversion site (used for software version control) with a Trac front-end. Trac is nice because it provides a simple-to-use wiki environment. I will send you a password so you can log on and make changes (you don't actually need to log on to make changes, but then the wiki won't keep track of who made which changes; there are also some admin functions that require a log on).
With respect to creating a German version of the VoxForge site, with something like de.voxforge.org, I need to think about how to structure this so that you could make the updates. WebGUI (the content management system front-end) is a little difficult to learn at first, but very powerful.
Once you have a few hours of audio (from different users), we can look at creating something like "de.voxforge.org".
all the best,
Ken
Great, nice to see such progress :)
A few thoughts about German. The CMU people were going to share the framework and models trained on Verbmobil (a large German database):
http://www.speech.cs.cmu.edu/sphinx/twiki/bin/view/Sphinx4/GermanAcousticModel
http://sourceforge.net/forum/message.php?msg_id=4279928
I suppose it will take them years to make a decision :(
A German dictionary is available here:
http://www.ims.uni-stuttgart.de/phonetik/synthesis/
under a restrictive license. But we can probably use it for bootstrapping. Under the GPL, I suppose we only have the rules from eSpeak. It is the same situation as with Dutch.
Hello Ken McLean!
Thanks for adding a new section "German speech files." I am planning to write some text fragments myself, so that there are no copyright problems.
I found a German translation of the GPL license:
http://www.gnu.de/documents/gpl-3.0.de.html
It is version 3 of the license. Until now, I have used version 2 of the GPL. From now on, I am planning to use only version 3, so my future submissions will be under GPL version 3.
Please tell me if you want me to submit under GPL version 2. I would prefer GPL version 3, because I think it provides better protection of the open source principle.
So I will include in my German zip files a German and an English version of GPL version 3.
Maybe I will talk to the people from the Simon project, but I think at the moment I will stick to VoxForge. VoxForge is exactly the project I have been looking for and there couldn't be a better project. The programmers need free speech examples, and I can try to support them by submitting some speech in English and in German.
Thanks for creating a trac site and a subversion site for the German language. Thanks for sending me a password.
OK, you can think about something like "de.voxforge.org", but at the moment it might be a little too early for something like that. Keep in mind, though: a lot of German people may understand English, but prefer to read in their mother tongue. I could help you with the translation, but let's wait some weeks or months. I am new to the VoxForge project and don't know the details yet, so we shouldn't move too fast. I think the section "German speech files" is a pretty good start.
Hi nsh,
Thanks for the hyperlinks. I think that they should come to VoxForge, not we to them. I *know* that VoxForge has the right concept. GPL is the way to go. Only a free license like the GPL has a chance to compete against commercial products like DNS 9.
Hi Ralph,
>I would prefer GPL version 3, because I think it provides a better protection of the open source principle. So I will include in my German zip-files a German and an English version of the GPL version 3.
I agree. I just have not had a chance to implement GPL version 3 on the site.
>Maybe I will talk to the people from the Simon project, but I think at the moment I will stick to Voxforge. [...] The programmers need free speech examples, I can try to support them by submitting some speech in the English and in the German language.
That is OK. The Simon project is currently working on creating German triphone acoustic models. Their work will provide benefits to you and the German sub-project on VoxForge. And, as you say, your audio submissions will benefit them.
Ken
Hi everyone.
I am very interested in starting a project to develop Swedish open source speech recognition. We have quite a lot of tools and corpora, but unfortunately not much time to handle it at the moment. Is there a time plan for when there will be possibilities for people to donate their recordings etc. in other languages? If I (for example) arrange a phone number etc. and take care of the recordings and so on, would it then be possible to admit information and resources from these web pages for such a project? I am very interested in creating speaker recognition for Swedish as well.
Best regards
Jonas
Gothenburg University, Sweden
Hi Jonas,
>We have quite a lot of tools and corpora, but unfortunately not much time to handle it at the moment. Is there a time plan for when there will be possibilities for people to donate their recordings etc in other languages?
If you would like us to host your corpora on VoxForge, yes, this can be done. If it is a large corpus, I can set up an FTP link for you to upload your speech, transcriptions, pronunciation dictionary, and tools. If it is not that large, I can set up a forum for Swedish, and you can upload files as time permits, and I can put them into the SVN repository.
>If I (for example) arrange a phone number etc and take care of the recordings and so on, would it then be possible to admit information and resources from these web pages for such a project?
Yes. Note that we currently have an automated speech collection script that works with Asterisk and submits the audio automatically to a VoxForge Forum. See the VoxForge IVR project (many thanks to trevarthan for developing this app). You could modify the prompts to suit your needs.
thanks,
Ken
Hi Timo,
>5. Some more administrative stuff: I am unable to edit the dev wiki. Do I need an extra account for that?
Yes.
Up until about 1-2 months ago, I had mod_security working perfectly to catch spammers on the Trac dev wiki, and allow users to post without signing in. But when I upgraded the distro on the server, I could not get it to work properly... :( I need to spend some more time on this.
I will send you an email with a password to allow you to update the wiki.
>6. What about a sub forum for the German language? This would improve
>both our communication as well as the visibility for this language's sub project.
Certainly, what did you have in mind?
I could add a separate section on the Forums Page called "International" or something like that. And have specific forums for each language we support.
I'd also like to use the proper labels for each language (I've been too English-centric up until now ...) - should the German forum be called the "Deutsch Forum"?
Ken
Hi Ken,
> I will send you an email with a password to allow you to update the wiki.
Might this have slipped off your to-do list?
> > 6. What about a sub forum for the German language?
> Certainly, what did you have in mind?
Well, I think I have figured out that in order to be on par with Dutch/Italian/etc., we'd just have to post one level higher, instead of commenting in the other languages thread. But then again, it would be cooler to have separate forums for each language within the "other languages" forum, so that we can have different threads for each language (which would automatically improve our visibility, as only new threads are shown in the recent posts section). Is it possible with the forum engine to have a hierarchical forum structure? Otherwise we could move "established" languages to the top level (probably with a common prefix so that they all stand next to each other).
I personally don't care whether it's "international" or "other" languages. I kind of thought, though, that English is quite an international language by itself? As for the forum labels: I really prefer "German" over "Deutsch". Otherwise we would only discourage speakers of other languages from reading the contents of the other sub forums. I can probably learn a lot by reading through the Italian forums, but only if I can read them.
Greetings from Berlin!
Timo
Hi Timo,
>> I will send you an email with a password to allow you to update the wiki.
>may this have slipped from your todo-list?
My mistake, I sent it to myself ... not too bright of me :) ... I'll resend
>Is it possible with the forum engine to have a hierarchical forum structure?
I don't think so ... not without some programming at least (to keep the counts, etc).
>Otherwise we could move "established" languages to the top level (probably with a common prefix so that they all stand next to each other).
Take a look at the second "message board" I just created on the forum web page ... see if that looks OK.
I can move the contents of the current "other languages" forum to the new message board (note we will lose the views, rating and thread counts with such a move).
Or we can look at integrating them into the main forum (WebGUI allows arbitrary ordering of the forums on a message board, so no prefix would be required).
Ken