This might be useful for creating a German pronunciation dictionary:
TXT2PHO - a TTS front end for the German inventories of the MBROLA project.
However, the software has some restrictive licensing provisions:
Permission is granted to use this software for non-commercial, non-military purposes, with and only with the lexicon and prosody files made available by the author from the HADIFIX for MBROLA project ...
Not sure if that would apply to pronunciations generated with the toolkit.
Ken
I don't think we can use it.
Using TXT2PHO to create a dictionary is close to reading the dictionary it uses (BOMP) directly. Both that dictionary and TXT2PHO itself clearly state that they may only be used for non-military purposes, a restriction that the GPL -- unfortunately -- does not have.
Anyway, if we could use it, then we could just as well use BOMP directly.
I've had a first look at Sequitur G2P (a trainable g2p tool), and it's likely that I will be allowed to use another trainable g2p tool (unnamed, published in [1]). Thus, I will be able to compare the two and see which performs better.
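One way such a comparison could be set up is to run both tools on the same held-out words and report word error rate (WER) and phoneme error rate (PER). The following is only a minimal sketch under that assumption; the function names, the toy entries and the SAMPA-like symbols are made up for illustration and do not come from either tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(reference, predicted):
    """Return (word error rate, phoneme error rate) for one system.

    Both arguments map a word to its pronunciation, given as a list of
    phoneme symbols. A word missing from `predicted` counts as fully wrong.
    """
    word_errors = 0
    phoneme_errors = 0
    phoneme_total = 0
    for word, ref in reference.items():
        hyp = predicted.get(word, [])
        dist = edit_distance(ref, hyp)
        word_errors += dist > 0
        phoneme_errors += dist
        phoneme_total += len(ref)
    return word_errors / len(reference), phoneme_errors / phoneme_total


# Made-up held-out entries and system outputs, for illustration only.
reference = {"Haus": ["h", "aU", "s"], "Tag": ["t", "a:", "k"]}
system_a = {"Haus": ["h", "aU", "s"], "Tag": ["t", "a:", "g"]}
system_b = {"Haus": ["h", "aU", "s"], "Tag": ["t", "a:", "k"]}

for name, output in (("A", system_a), ("B", system_b)):
    wer, per = evaluate(reference, output)
    print(f"system {name}: WER={wer:.2f} PER={per:.2f}")
```

Both systems would have to be trained on the same data and tested on words neither has seen, otherwise the numbers are not comparable.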
So, we need some data to bootstrap these trainable systems. I just checked in some tools that extract pronunciations from the German Wiktionary.
The resulting data has to be post-processed before we can use it for bootstrapping. To prioritize that work, we could use the word frequency information from the Wortschatz project, for which a Perl module (EDIT: newer version with fixed frequency extraction) is available.
I hope to be able to set up a webtool that helps to post-process the Wiktionary output. Would there be anyone volunteering to actually use that webtool and help in creating the dictionary? Ralf, would you be willing (and able) to help?
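The prioritization could be as simple as sorting the extracted words by frequency, so that the most common words get checked first. This is a rough sketch, not the checked-in tooling: `prioritize`, the sample entries and the frequency counts are hypothetical, and it assumes the frequency data has already been exported as a plain word-to-count mapping in which a higher count means a more frequent word.

```python
def prioritize(pronunciations, frequencies):
    """Order extracted words so the most frequent ones are reviewed first.

    Words without a known frequency are treated as count 0 and end up last;
    ties are broken alphabetically to keep the order stable.
    """
    return sorted(pronunciations, key=lambda w: (-frequencies.get(w, 0), w))


# Hypothetical extracted pronunciations and made-up frequency counts.
extracted = {"Haus": "haʊs", "Zeit": "tsaɪt", "Quark": "kvaʁk"}
freqs = {"Zeit": 5000, "Haus": 1200}

for word in prioritize(extracted, freqs):
    print(word, extracted[word], freqs.get(word, 0))
```

If the frequency source reports frequency classes instead of raw counts (where a smaller class means a more frequent word), the sign in the sort key would have to be flipped.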
Cheers!
Timo
[1]: Vera Demberg, Helmut Schmid and Gregor Möhler (2007): Phonological Constraints and Morphological Preprocessing for Grapheme-to-phoneme Conversion. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), Prague, Czech Republic, June 2007.