General Discussion

Re: Other languages
User: kmaclean
Date: 9/28/2007 9:25 am
Views: 568
Rating: 26

Hi Jonas,

>We have quite a lot of tools and corpora, but unfortunately not much time to handle them at the moment. Is there a time plan for when it will be possible for people to donate their recordings etc. in other languages?

If you would like us to host your corpora on VoxForge, yes, this can be done.  If it is a large corpus, I can set up an FTP link for you to upload your speech, transcriptions, pronunciation dictionary, and tools.  If it is not that large, I can set up a forum for Swedish, and you can upload files as time permits, and I can put them into the SVN repository.

>If I (for example) arrange a phone number etc and take care of the recordings and so on, would it then be possible to admit information and resources from these web pages for such a project?

Yes.  Note that we currently have an automated speech collection script that works with Asterisk and submits the audio automatically to a VoxForge forum - see the VoxForge IVR project (many thanks to trevarthan for developing this app).  You could modify the prompts to suit your needs.

thanks,

Ken 

German language
User: ralfherzog
Date: 8/14/2007 11:23 am
Views: 397
Rating: 30
Hello everyone!

I am interested in submitting speech under the GPL in the German language.  I could try to create prompt files in the German language which contain all the phonemes of the German language. I could help you to translate some VoxForge texts from the English language to the German language.  What about a section "de.voxforge.org"?

How is it possible to make VoxForge ready for the German language?
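
One way to check whether a set of prompts really contains all the phonemes of a language is to look each word up in a pronunciation dictionary and compare the phonemes found against the full inventory. The following is only a minimal sketch of that idea. The function names, the three-word toy lexicon, and the toy inventory are all hypothetical placeholders for illustration; they are not a real German dictionary or phoneme set.

```python
from typing import Dict, Iterable, List, Set


def covered_phonemes(prompts: Iterable[str], lexicon: Dict[str, List[str]]) -> Set[str]:
    """Collect every phoneme that occurs in the prompts, according to the lexicon.

    Words missing from the lexicon contribute nothing.
    """
    seen: Set[str] = set()
    for prompt in prompts:
        for word in prompt.lower().split():
            seen.update(lexicon.get(word, []))
    return seen


def missing_phonemes(
    prompts: Iterable[str],
    lexicon: Dict[str, List[str]],
    inventory: Iterable[str],
) -> Set[str]:
    """Return the phonemes of the inventory that no prompt covers yet."""
    return set(inventory) - covered_phonemes(prompts, lexicon)


# Hypothetical toy data, for illustration only (not a real German lexicon).
TOY_LEXICON = {
    "eins": ["aI", "n", "s"],
    "zwei": ["ts", "v", "aI"],
    "drei": ["d", "R", "aI"],
}
TOY_INVENTORY = {"aI", "n", "s", "ts", "v", "d", "R", "S"}

if __name__ == "__main__":
    prompts = ["eins zwei", "zwei eins"]
    print(sorted(missing_phonemes(prompts, TOY_LEXICON, TOY_INVENTORY)))
```

Any phoneme reported as missing indicates that another prompt containing it still needs to be written.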
Re: German language
User: nsh
Date: 8/14/2007 1:11 pm
Views: 405
Rating: 36

Well, there is a phonetic dictionary, I suppose, and a language model can easily be constructed. So you can start recording transcribed speech right now. Any free text from gutenberg.org is acceptable. It should be split into sentences.

Once there is audio data, it will be possible to train a model.
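
Splitting a free text into sentence-sized prompts can be sketched roughly as follows. This is a naive illustration, assuming sentences end in ".", "!" or "?" followed by whitespace; abbreviations such as "z. B." would be split incorrectly and need manual cleanup. The function names and the "de-0001" style prompt IDs are hypothetical and not a VoxForge convention.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    # Collapse line breaks and repeated spaces first.
    normalized = " ".join(text.split())
    parts = re.split(r"(?<=[.!?])\s+", normalized)
    return [part for part in parts if part]


def make_prompts(text: str, prefix: str = "de") -> List[str]:
    """Prefix each sentence with a sequential ID (hypothetical ID format)."""
    return [
        f"{prefix}-{index:04d} {sentence}"
        for index, sentence in enumerate(split_sentences(text), start=1)
    ]


if __name__ == "__main__":
    sample = "Das ist ein Satz. Und noch einer!\nEin dritter?"
    for line in make_prompts(sample):
        print(line)
```

Each resulting line can then be read aloud and recorded as one utterance.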

Add a category "German speech files"
User: ralfherzog
Date: 8/14/2007 11:31 pm
Views: 501
Rating: 36
Hi nsh,

OK, I can start recording German speech.  But the question is: where can I upload my German audio files, the German prompts, the GPL license (in English), and the README file (in English)?

Maybe there is someone - perhaps Ken - who could add a category "German speech files" under the following hyperlink:

http://voxforge.org/home/downloads/speech

At the moment, there are categories for English, Dutch and Russian. If you could add such a category for the German language, I would submit German audio files with German prompts.

Greetings
Re: Add a category "German speech files"
User: kmaclean
Date: 8/15/2007 7:38 am
Views: 426
Rating: 26

Hi ralfherzog,

Done ... I've added a new forum for submitting German Speech Files.

You should be able to find German translations of the GPL license on the fsf.org site - please  include both German and English versions of it in your submission.

You might also want to talk to the folks at the Simon project (a dialog manager that uses Julius).  They are working on creating German acoustic models using HTK.

thanks, 

Ken 

Re: German language
User: kmaclean
Date: 8/15/2007 3:47 pm
Views: 488
Rating: 28

Hi Ralf,

I've created a dev site for German at:

http://www.dev.voxforge.org/projects/de (trac site) 

http://www.dev.voxforge.org/svn/de (subversion site) 

This is basically a Subversion site (used for software version control) with a Trac front-end.  Trac is nice because it provides a simple-to-use wiki environment.  I will send you a password so you can log on and make changes (you don't actually need to log on to make changes, but then the wiki won't keep track of who made which changes; there are also some admin functions that require a log on).

With respect to creating a German version of the VoxForge site at something like de.voxforge.org, I need to think about how to structure this so that you can make the updates.  WebGUI (the content management system front-end) is a little difficult to learn at first, but very powerful.

Once you have a few hours of audio (from different users), we can look at creating something like "de.voxforge.org".

all the best, 

Ken 

 

Re: German language
User: nsh
Date: 8/15/2007 4:12 pm
Views: 3169
Rating: 726

Great, nice to see such progress :)

A few thoughts about German.  The CMU people were going to share the framework and models trained on Verbmobil (a large German database):

http://www.speech.cs.cmu.edu/sphinx/twiki/bin/view/Sphinx4/GermanAcousticModel

http://sourceforge.net/forum/message.php?msg_id=4279928

I suppose it will take them years to make a decision :(

A German dictionary is available here:

http://www.ims.uni-stuttgart.de/phonetik/synthesis/

under a restrictive license, but we can probably use it for bootstrapping. Under the GPL, I suppose we only have the rules from eSpeak. It is the same situation as with Dutch.

Re: German language
User: timobaumann
Date: 11/21/2007 2:08 pm
Views: 348
Rating: 27
Hi und Hallo everybody!

As you seem to be talking about me (among others), I might just as well answer :-)

I only started exploring this project a few days ago and am quite enthusiastic about it. I'd like to join the struggle for truly free acoustic models. The absence of *any* freely available models for most languages has been annoying me for a while.

Well, let's get more technical after this initial statement:

1. I've recently built acoustic models from the Kiel Corpus of Read [sic!] Speech, which work all right for me. I'll make them available to anyone who asks. The model is trained using CMU SphinxTrain. I'm unable to support HTK for the moment, and unfortunately there is no way for me to share the original KCoRS data.

2. I've put my plans for a model based on the Verbmobil (VM) corpus on hold for the moment. I just lack the time to do the necessary perl voodoo and also the processor cycles. The latter will hopefully be resolved in early 2008.

3. We definitely lack a GPLed lexicon. There is the BOMP (the link in the dev section should probably point to http://www.ikp.uni-bonn.de/dt/forsch/phonetik/bomp/BOMP.en.html ), but it's limited to non-military research, thus more restrictive than GPL. The Verbmobil license is even more restrictive.

I don't know if the grapheme-to-phoneme conversion of the German version of Festival can be of any help in creating a lexicon. (It would be cheating to use the German Festival as it is, because it contains BOMP and we would still be violating its terms.) I've never used eSpeak; how does its G2P conversion compare to GFestival?

It might be an idea to just ask the author of BOMP, Stefan Breuer, if he would license BOMP under the GPL or if he could allow its use in the voxforge project.

The lexicon is the key issue in building models from the data, so we really have to find a solution, if we ever want to take off.

For the time being we should probably limit our prompt collection to a restricted vocabulary in order to limit the work needed to manually build a preliminary dictionary.

4. I've recorded a tiny digits corpus the other night (50 utterances totalling 191 digits :-). I'll upload that right after finishing this post. I'll add the Perl script that I used to create the prompt script, so go ahead and record a few digit strings yourself!

5. Some more administrative stuff: I am unable to edit the dev wiki. Do I need an extra account for that?

6. What about a sub forum for the German language? This would improve both our communication as well as the visibility for this language's sub project.
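
As an illustration of the restricted-vocabulary idea in points 3 and 4, random digit-string prompts could be generated along the following lines. This is a hedged Python sketch and not the Perl script mentioned above; the function name, the string lengths, and the fixed seed are assumptions chosen for the example.

```python
import random
from typing import List

# The ten German digit words form a closed vocabulary that a
# hand-written preliminary dictionary can cover completely.
GERMAN_DIGITS = [
    "null", "eins", "zwei", "drei", "vier",
    "fünf", "sechs", "sieben", "acht", "neun",
]


def digit_prompts(count: int, min_len: int = 1, max_len: int = 7, seed: int = 0) -> List[str]:
    """Generate `count` prompts, each a random string of spoken digits.

    The fixed seed makes the prompt list reproducible across runs.
    """
    rng = random.Random(seed)
    prompts = []
    for _ in range(count):
        length = rng.randint(min_len, max_len)
        prompts.append(" ".join(rng.choice(GERMAN_DIGITS) for _ in range(length)))
    return prompts


if __name__ == "__main__":
    for number, prompt in enumerate(digit_prompts(5), start=1):
        print(f"{number:02d} {prompt}")
```

Because every prompt draws on the same ten words, the dictionary needed for training stays at ten entries no matter how many utterances are recorded.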

Thanks for reading und Grüße aus Berlin!
Timo
Re: German language
User: kmaclean
Date: 11/21/2007 9:41 pm
Views: 418
Rating: 31

Hi Timo,

>5. Some more administrative stuff: I am unable to edit the dev wiki. Do I need an extra account for that?

Yes. 

Up until about 1-2 months ago, I had mod_security working perfectly to catch spammers on the Trac dev wiki, and allow users to post without signing in.  But when I upgraded the distro on the server, I could not get it to work properly... :(  I need to spend some more time on this.

I will send you an email with a password to allow you to update the wiki. 

>6. What about a sub forum for the German language? This would improve both our communication as well as the visibility for this language's sub project.

Certainly, what did you have in mind?

I could add a separate section on the Forums Page called "International" or something like that.  And have specific forums for each language we support.

I'd also like to use the proper labels for each language (I've been too English-centric up until now ...) - should the German forum be called the "Deutsch Forum"?

Ken

Re: German language
User: timobaumann
Date: 11/25/2007 2:28 am
Views: 364
Rating: 30

Hi Ken, 

> I will send you an email with a password to allow you to update the wiki.

Might this have slipped off your to-do list?

> > 6. What about a sub forum for the German language?

> Certainly, what did you have in mind?

Well, I think I have figured out that in order to be on par with Dutch/Italian/etc., we'd just have to post one level higher, instead of commenting in the other languages thread. But then again, it would be cooler to have separate forums for each language within the other language forum, so that we can have different threads for each language (which would automatically improve our visibility, as only new threads are shown in the recent posts section). Is it possible with the forum engine to have a hierarchical forum structure? Otherwise we could move "established" languages to the top level (probably with a common prefix so that they all stand next to each other).

I personally don't care whether it's "international" or "other" languages. I kind of thought, though, that English is quite an international language by itself? As for the forum labels: I really prefer "German" over "Deutsch". Otherwise we would only discourage speakers of other languages from reading the contents of the other sub forums. I can probably learn a lot by reading through the Italian forums, but only if I can read them.

Greetings from Berlin!
Timo
