Txt2Pho
User: kmaclean
Date: 4/16/2008 9:33 pm

This might be useful for the creation of a German pronunciation dictionary:

TXT2PHO - a TTS front end for the German inventories of the MBROLA project.

However, the software has some restrictive licensing provisions:

Permission is granted to use this software for non-commercial, non-military purposes, with and only with the lexicon and prosody files made available by the author from the HADIFIX for MBROLA project ...

Not sure if that would apply to pronunciations generated with the toolkit. 

Ken 

Re: Txt2Pho
User: timobaumann
Date: 4/18/2008 10:03 am

I don't think we can use it.
Using TXT2PHO to create a dictionary is close to reading the dictionary it uses (BOMP) directly. And both the dictionary and TXT2PHO itself clearly state that they are for non-military use only, whereas GPL-licensed material, unfortunately, cannot carry such a restriction.

Anyway, if we could use it, then we could just as well use BOMP directly.

I've had a first look at Sequitur G2P (a trainable g2p tool), and it's likely that I will be allowed to use another trainable g2p tool (unnamed, published in [1]). Thus, I will be able to compare the two and see which performs better.

So, we need some data to bootstrap these trainable systems. I just checked in some tools that extract pronunciations from the German Wiktionary.
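For anyone curious what such an extraction step might look like, here is a minimal sketch. It is not the checked-in tooling. It assumes that German Wiktionary entries mark IPA with a {{Lautschrift|...}} template; the function name and the sample entry are made up for illustration.

```python
import re

# Assumption: IPA transcriptions appear as {{Lautschrift|...}} in the wikitext.
LAUTSCHRIFT = re.compile(r"\{\{Lautschrift\|([^}|]+)\}\}")

def extract_pronunciations(wikitext):
    """Return all non-placeholder IPA strings found in one entry's wikitext."""
    found = []
    for match in LAUTSCHRIFT.finditer(wikitext):
        ipa = match.group(1).strip()
        # Skip placeholder templates that contain only an ellipsis.
        if ipa and ipa not in ("…", "..."):
            found.append(ipa)
    return found

# Hypothetical sample entry, shaped like a pronunciation section.
sample = """{{Aussprache}}
:{{IPA}} {{Lautschrift|vɛlt}}, {{Pl.}} {{Lautschrift|ˈvɛltn̩}}
:{{Hörbeispiele}} {{Audio|De-Welt.ogg}}
"""

print(extract_pronunciations(sample))
```

The output still needs manual checking, since the template content is whatever the Wiktionary contributors typed.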

The resulting data has to be post-processed before we can use it for bootstrapping. To prioritize that work, we could use the word frequency information from the Wortschatz project, for which a Perl module (EDIT: newer version with fixed frequency extraction) is available.
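The prioritization idea can be sketched roughly as follows. This assumes the frequency information has already been turned into a plain word-to-count mapping; the counts below are invented for illustration and do not come from Wortschatz, and the function name is hypothetical.

```python
def prioritize(entries, frequency):
    """Sort (word, ipa) pairs so that the most frequent words come first.

    `frequency` maps a word to a count; unknown words count as 0.
    Ties are broken alphabetically so the order is reproducible.
    """
    return sorted(entries, key=lambda e: (-frequency.get(e[0], 0), e[0]))

# Made-up counts, for illustration only.
frequency = {"ist": 9000, "sind": 4000, "bin": 1200, "Welt": 350}

entries = [
    ("Welten", "ˈvɛltn̩"),
    ("bin", "bɪn"),
    ("Welt", "vɛlt"),
    ("ist", "ɪst"),
    ("sind", "zɪnt"),
]

queue = prioritize(entries, frequency)
print([word for word, _ in queue])
```

With this ordering, the entries that matter most for recognition would be post-processed first, and rare or unknown words end up at the back of the queue.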

I hope to be able to set up a web tool that helps to post-process the Wiktionary output. Would anyone volunteer to actually use that web tool and help in creating the dictionary? Ralf, would you be willing (and able) to help?

Cheers!
Timo

 

[1]: Phonological Constraints and Morphological Preprocessing for Grapheme-to-phoneme Conversion
Vera Demberg, Helmut Schmid and Gregor Möhler, 2007
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), Prague, Czech Republic, June 2007

Re: Txt2Pho
User: kmaclean
Date: 4/18/2008 11:25 am

Hi Timo,

Good work!

thanks,

Ken 

help to create the German dictionary
User: ralfherzog
Date: 4/20/2008 12:03 am
Hello Timo, I would use this web tool and help you with the creation of the dictionary. Greetings, Ralf
Re: help to create the German dictionary
User: Visitor
Date: 5/14/2008 4:12 pm

Hi Ralf,

Sorry for not getting back to you earlier.

I've set up a dictionary tool at http://www.ling.uni-potsdam.de/~timo/projekte/voxforge.html . The main task is to paste the entries in the first row on the right (Aussprachen, i.e. pronunciations) into the corresponding field on the left.

Now, if it was just that, it would be too easy and too boring...

Often, there are far more variants of the word on the left than there are transcriptions. In these cases it would be nice if you could add the missing transcriptions (often it is just a matter of appending -ɐ or -ə or whatever).

Sometimes the list on the left contains ridiculous word forms -- just leave the corresponding field empty (or press "Wort entfernen", i.e. "remove word", but the result will be the same). It may also happen that you are asked for the same word more than once (there are different entries for "bin", "ist", "sind" in the Wiktionary, and each entry will ask about all the different forms of "sein"). If you are sure you've entered a transcription already, then just ignore it the second time.

Sometimes there are actually more transcribed word forms than words on the left. (Or they are different.) Then you can add a word form on the left with "Wort hinzufügen" ("add word"). Note: Often there are different transcriptions for the same word form (ˈvɛltn̩, ˈvɛltən). Usually you would want to pick the form that would be used most in colloquial speech (here: ˈvɛltn̩).

Also, there may just be erroneous transcriptions (quite often), where people just guessed how IPA works. It's important that we catch most of these errors. So you might actually want to start out with the Wiktionary Transcription Guideline, which shows what the transcription *should* look like.

To enter IPA symbols into the text fields directly, just type the keys listed on the right (for ŋ type N) and they will automagically be transformed to IPA. (This works in Firefox; I don't have Windows, so I can't check Internet Explorer.)

Please enter your e-mail address or another kind of ID in the first text field. This way we can later compare who's the hardest-working transcriber!
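The key substitution could look roughly like the sketch below. Only the N to ŋ mapping is taken from the description above; every other entry in the table is an assumption borrowed from X-SAMPA conventions and may differ from the key list the tool actually shows. The function name is hypothetical.

```python
# Assumed key-to-IPA table. Only "N" -> "ŋ" is confirmed by the post;
# the remaining pairs follow X-SAMPA and are illustrative guesses.
KEY_TO_IPA = {
    "N": "ŋ",
    "@": "ə",
    "6": "ɐ",
    "E": "ɛ",
    "I": "ɪ",
    "O": "ɔ",
    "U": "ʊ",
    "S": "ʃ",
    "C": "ç",
}

def keys_to_ipa(typed):
    """Replace every mapped key with its IPA symbol, leave the rest as typed."""
    return "".join(KEY_TO_IPA.get(ch, ch) for ch in typed)

print(keys_to_ipa("vEltn"))  # vɛltn
print(keys_to_ipa("zIN@n"))  # zɪŋən
```

A character-by-character table like this only covers single symbols; stress marks and diacritics such as the syllabic mark in n̩ would need extra keys.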

Cheers, Timo

Re: help to create the German dictionary
User: timobaumann
Date: 5/14/2008 4:14 pm

clickable link: http://www.ling.uni-potsdam.de/~timo/projekte/voxforge.html

UPDATE: It's important that you transcribe how something would be spoken in colloquial standard German. By the way, what region of Germany are you from? ;-)

Re: help to create the German dictionary
User: nsh
Date: 5/15/2008 1:50 am
Another good method, and a popular one nowadays, for getting a dictionary is the following: you select a phoneset, build an LTS (letter-to-sound) system that generates pronunciation variants, and then use forced alignment against the recordings to check whether the pronunciations are valid. This way you automatically ensure that your dictionary is correct.
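Only the final selection step of that procedure is sketched here. The forced aligner itself is stubbed out as a caller-supplied scoring function, since running a real one needs acoustic models and recordings. The function name, the threshold, and the scores are all hypothetical.

```python
def select_variants(candidates, align_score, threshold):
    """Keep the candidate pronunciations whose alignment score reaches threshold.

    candidates  : phone sequences (tuples of phone symbols) proposed by the LTS system
    align_score : function returning a score for one variant, higher is better;
                  in practice it would run a forced aligner against the recordings
    threshold   : minimum score for a variant to be accepted
    """
    scored = [(align_score(variant), variant) for variant in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return [variant for score, variant in scored if score >= threshold]

# Toy stand-in for an aligner: made-up scores, for illustration only.
toy_scores = {
    ("v", "E", "l", "t", "n"): -4.1,
    ("v", "E", "l", "t", "@", "n"): -5.0,
    ("v", "e", "l", "t", "e", "n"): -9.7,
}

kept = select_variants(list(toy_scores), toy_scores.get, threshold=-6.0)
print(kept)
```

How to set the threshold, and whether to keep one variant or several per word, depends on the aligner and the amount of audio available.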

You would probably also be interested in looking at the Unilex dictionary available from CSTR to see what a modern dictionary looks like.