Re: German language
User: kmaclean
Date: 8/15/2007 3:47 pm
Views: 488
Rating: 28

Hi Ralf,

I've created a dev site for German at:

http://www.dev.voxforge.org/projects/de (trac site) 

http://www.dev.voxforge.org/svn/de (subversion site) 

This is basically a Subversion site (used for software version control) with a Trac front-end.  Trac is nice because it provides a simple-to-use wiki environment.  I will send you a password so you can log on and make changes (you don't actually need to log on to make changes, but then the wiki won't keep track of who made which changes; there are also some admin functions that require a log on).

As for creating a German version of the VoxForge site at something like de.voxforge.org, I need to think about how to structure this so that you could make the updates.  WebGUI (the content management system front-end) is a little difficult to learn at first, but very powerful.

Once you have a few hours of audio (from different users), we can look at creating something like "de.voxforge.org".

all the best, 

Ken 

 

Re: German language
User: nsh
Date: 8/15/2007 4:12 pm
Views: 3172
Rating: 728

Great, nice to see such progress :)

A few thoughts about German.  CMU people were going to share the framework and models trained on Verbmobil (a large German database):

http://www.speech.cs.cmu.edu/sphinx/twiki/bin/view/Sphinx4/GermanAcousticModel

http://sourceforge.net/forum/message.php?msg_id=4279928

I suppose it will take them years to make a decision :(

German dictionary is available here:

http://www.ims.uni-stuttgart.de/phonetik/synthesis/

under a restrictive license, but we can probably use it for bootstrapping. Under the GPL, I suppose we only have the rules from eSpeak. It is the same situation as with Dutch.

German; GPL v3; de.voxforge.org
User: ralfherzog
Date: 8/16/2007 5:25 am
Views: 380
Rating: 38

Hello Ken McLean!

Thanks for adding a new section "German speech files."  I am planning to write some text fragments myself, so that there are no copyright problems.

I found a German translation of the GPL license:

http://www.gnu.de/documents/gpl-3.0.de.html

It is version 3 of the license.  Until now, I have used version 2 of the GPL.  From now on, I am planning to use only version 3, so my future submissions will be under GPL version 3.
Please tell me if you want me to submit under GPL version 2.  I would prefer GPL version 3, because I think it provides better protection of the open source principle.
So I will include a German and an English version of GPL version 3 in my German zip files.

Maybe I will talk to the people from the Simon project, but I think at the moment I will stick to VoxForge. VoxForge is exactly the project I have been looking for and there couldn't be a better project.  The programmers need free speech examples, and I can try to support them by submitting some speech in English and in German.

Thanks for creating a trac site and a subversion site for the German language.  Thanks for sending me a password. Cool

OK, you can think about something like "de.voxforge.org" - at the moment, it might be a little bit too early for something like that.  But keep in mind: a lot of German people may understand English, but prefer to read in their mother tongue.  I could help you with the translation, but let's wait some weeks or months.  I am new to the VoxForge project and don't know the details yet, so we shouldn't move too fast.  I think the section "German speech files" is a pretty good start.

Hi nsh,

Thanks for the hyperlinks.  I think that they should come to VoxForge, not we to them.  I *know* that VoxForge has the right concept.  GPL is the way to go. Only a free license like the GPL has a chance to compete against commercial products like DNS 9.

Re: German; GPL v3; de.voxforge.org
User: kmaclean
Date: 8/16/2007 7:09 pm
Views: 454
Rating: 53

Hi Ralf,

>I would prefer GPL version 3, because I think it provides a better protection of the open source principle.  So I will include in my German zip-files a German and an English version of the GPL version 3.

I agree.  I just have not had a chance to implement GPL version 3 on the site.

>Maybe I will talk to the people from the Simon project, but I think at the moment I will stick to Voxforge. [...] The programmers need free speech examples, I can try to support them by submitting some speech in the English and in the German language.

That is OK.  The Simon project is currently working on creating German triphone acoustic models.  Their work will benefit you and the German sub-project on VoxForge.  And, as you say, your audio submissions will benefit them.

Ken 

Re: Other languages
User: bunte
Date: 9/28/2007 8:09 am
Views: 429
Rating: 34

Hi everyone.

I am very interested in starting a project to develop Swedish open source speech recognition. We have quite a lot of tools and corpora, but unfortunately not much time to handle them at the moment. Is there a time plan for when it will be possible for people to donate their recordings etc. in other languages? If I (for example) arrange a phone number etc. and take care of the recordings and so on, would it then be possible to admit information and resources from these web pages for such a project? I am very interested in creating speaker recognition for Swedish as well.

Best regards

Jonas

Gothenburg University, Sweden 

Re: Other languages
User: kmaclean
Date: 9/28/2007 9:25 am
Views: 568
Rating: 26

Hi Jonas,

>We have quite a lot of tools and corpora, but unfortunately not much time to handle it at the moment. Is there a time plan for when there will be possibilities for people to  donate their recordings etc in other languages?

If you would like us to host your corpora on VoxForge, yes, this can be done.  If it is a large corpus, I can set up an FTP link for you to upload your speech, transcriptions, pronunciation dictionary, and tools.  If it is not that large, I can set up a forum for Swedish, and you can upload files as time permits, and I can put them into the SVN repository.

>If I (for example) arrange a phone number etc and take care of the recordings and so on, would it then be possible to admit information and resources from these web pages for such a project?

Yes.  Note that we currently have an automated speech collection script that works with Asterisk and submits the audio automatically to a VoxForge forum - see the VoxForge IVR project (many thanks to trevarthan for developing this app).  You could modify the prompts to suit your needs.

thanks,

Ken 

Re: German language
User: timobaumann
Date: 11/21/2007 2:08 pm
Views: 348
Rating: 27

Hi und Hallo everybody!

As you seem to be talking about me (among others), I might just as well answer :-)

I only started exploring this project a few days ago and am quite enthusiastic about it. I'd like to join the struggle for truly free acoustic models. The absence of *any* freely available models for most languages has been annoying me for a while.

Well, let's get more technical after this initial statement:

1. I've recently built acoustic models from the Kiel Corpus of Read [sic!] Speech, which work all right for me. I'll make them available to anyone who asks. The model is trained using CMU SphinxTrain. I'm unable to support HTK for the moment and unfortunately there is no way for me to share the original KCoRS data.

2. I've put my plans for a model based on the Verbmobil (VM) corpus on hold for the moment. I just lack the time to do the necessary perl voodoo and also the processor cycles. The latter will hopefully be resolved in early 2008.

3. We definitely lack a GPLed lexicon. There is the BOMP (the link in the dev section should probably point to http://www.ikp.uni-bonn.de/dt/forsch/phonetik/bomp/BOMP.en.html ), but it's limited to non-military research, thus more restrictive than GPL. The Verbmobil license is even more restrictive.

I don't know if the grapheme-to-phoneme conversion of the German version of Festival can be of any help in creating a lexicon. (It would be cheating to use the German Festival as it is, because it contains BOMP and we would still violate its terms.) I've never used eSpeak; how does its G2P conversion compare to GFestival?

It might be an idea to just ask the author of BOMP, Stefan Breuer, if he would license BOMP under the GPL or if he could allow its use in the voxforge project.

The lexicon is the key issue in building models from the data, so we really have to find a solution, if we ever want to take off.

For the time being we should probably limit our prompt collection to a restricted vocabulary in order to limit the work needed to manually build a preliminary dictionary.

4. I've recorded a tiny digits corpus the other night (50 utterances totalling 191 digits :-). I'll upload that right after finishing this post. I'll also add the Perl script that I used to create the recording script, so go ahead and record a few digit strings yourself!

5. Some more administrative stuff: I am unable to edit the dev wiki. Do I need an extra account for that?

6. What about a sub forum for the German language? This would improve both our communication as well as the visibility for this language's sub project.
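
As a rough illustration of points 3 and 4 above, a restricted-vocabulary prompt list for a digits corpus could be generated along these lines. This is only a minimal sketch in Python, not the Perl script mentioned in the post; the function name `make_digit_prompts`, the `de-digits-NNN` utterance IDs, the length limits and the "ID followed by words" output layout are all assumptions made for the example.

```python
import random

# German digit words: the complete vocabulary of a digits-only corpus,
# so a preliminary pronunciation dictionary needs just these ten entries.
DIGIT_WORDS = ["null", "eins", "zwei", "drei", "vier",
               "fünf", "sechs", "sieben", "acht", "neun"]


def make_digit_prompts(n_utterances, min_len=1, max_len=7, seed=0):
    """Return a list of (utterance_id, digit_string, spoken_words) tuples.

    The ID scheme and the length limits are hypothetical choices.
    """
    rng = random.Random(seed)  # fixed seed keeps the prompt list reproducible
    prompts = []
    for i in range(1, n_utterances + 1):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        digits = [rng.randrange(10) for _ in range(length)]
        words = " ".join(DIGIT_WORDS[d] for d in digits)
        digit_string = "".join(str(d) for d in digits)
        prompts.append((f"de-digits-{i:03d}", digit_string, words))
    return prompts


if __name__ == "__main__":
    prompts = make_digit_prompts(50)
    for utt_id, _digits, words in prompts:
        print(f"{utt_id} {words}")
    total = sum(len(digits) for _, digits, _ in prompts)
    print(f"# {len(prompts)} utterances, {total} digits in total")
```

Because every prompt is drawn from the same ten words, the dictionary that has to be written by hand stays tiny, which is the point of limiting the vocabulary while no GPL-compatible lexicon is available.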

Thanks for reading und Grüße aus Berlin!
Timo

Re: German language
User: kmaclean
Date: 11/21/2007 9:41 pm
Views: 418
Rating: 31

Hi Timo,

>5. Some more administrative stuff: I am unable to edit the dev wiki. Do I need an extra account for that?

Yes. 

Up until about 1-2 months ago, I had mod_security working perfectly to catch spammers on the Trac dev wiki, and allow users to post without signing in.  But when I upgraded the distro on the server, I could not get it to work properly... :(  I need to spend some more time on this.

I will send you an email with a password to allow you to update the wiki. 

>6. What about a sub forum for the German language? This would improve both our communication as well as the visibility for this language's sub project.

Certainly, what did you have in mind?

I could add a separate section on the Forums Page called "International" or something like that.  And have specific forums for each language we support.

I'd also like to use the proper labels for each language (I've been too English-centric up until now ...) - should the German forum be called the "Deutsch Forum"?

Ken

Re: German language
User: timobaumann
Date: 11/25/2007 2:28 am
Views: 364
Rating: 30

Hi Ken, 

> I will send you an email with a password to allow you to update the wiki.

Might this have slipped off your to-do list?

> > 6. What about a sub forum for the German language?

> Certainly, what did you have in mind?

Well, I think I have figured out that in order to be on par with Dutch/Italian/etc., we'd just have to post one level higher, instead of commenting in the other languages thread. But then again: It would be cooler to have separate forums for each language within the other language forum, so that we can have different threads for each language (which would automatically improve our visibility, as only new threads are shown in the recent posts section). Is it possible with the forum engine to have a hierarchical forum structure? Otherwise we could move "established" languages to the top level (probably with a common prefix so that they all stand next to each other).

I personally don't care whether it's "international" or "other" languages. I kind of thought, though, that English is quite an international language by itself? As for the forum labels: I really prefer "German" over "Deutsch". Otherwise we would only discourage people from other languages from reading the contents of the other sub forums. I can probably learn a lot by reading through the Italian forums, but only if I can read them.

Greetings from Berlin!
Timo

Re: German language
User: kmaclean
Date: 11/25/2007 10:33 am
Views: 337
Rating: 43

Hi Timo,

>> I will send you an email with a password to allow you to update the wiki.

>may this have slipped from your todo-list?

My mistake, I sent it to myself ... not too bright of me :) ... I'll resend

>Is it possible with the forum engine to have a hierarchical forum structure? 

I don't think so ... not without some programming at least (to keep the counts, etc).

>Otherwise we could move "established" languages to the top level (probably with a common prefix so that they all stand next to each other).

Take a look at the second "message board" I just created on the forum web page ... see if that looks OK.

I can move the contents of the current "other languages" forum to the new message board (note we will lose the views, rating and thread counts with such a move).

Or we can look at integrating them into the main forum (WebGUI allows arbitrary ordering of the forums on a message board, so no prefix would be required).

Ken 

 
