A friend of mine might be willing to contribute some German speech. However, she would like to read something entertaining. That means she would prefer to read a normal text rather than just the prompts in the Java submission app.
Would this be helpful? Obviously it creates some extra work, since she would upload for instance 15 minutes of speech all in one recording. Someone else then needs to chop up this recording into fragments that are short enough to compile an acoustic model with.
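For what it's worth, part of the chopping could probably be automated by cutting at long pauses. Below is a minimal sketch, assuming the recording has already been decoded into a sequence of signed 16-bit mono samples. The function names, the amplitude threshold and the minimum pause length are all my own placeholders and would need tuning on a real recording; the cuts would still have to be checked by ear.

```python
def find_split_points(samples, rate, threshold=500, min_silence=0.5):
    """Return one cut index per pause of at least min_silence seconds.

    samples: sequence of signed integer samples (mono).
    rate: sample rate in Hz.
    threshold: absolute amplitude below which a sample counts as quiet
               (assumed value, tune per recording).
    """
    min_len = int(rate * min_silence)
    points = []
    run_start = None  # index where the current quiet run began
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        else:
            # A quiet run just ended; cut in its middle if it was long enough.
            # Runs starting at index 0 are skipped so no silent-only fragment
            # is produced at the beginning.
            if run_start is not None and run_start > 0 and i - run_start >= min_len:
                points.append((run_start + i) // 2)
            run_start = None
    return points


def split_at(samples, points):
    """Cut samples at the given indices and return the list of fragments."""
    bounds = [0] + list(points) + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]
```

Cutting in the middle of a pause keeps a little silence on both sides of each fragment. Trailing silence at the end of the recording is left attached to the last fragment.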
If it is still useful, can someone give me a link to a text that is interesting enough both from her perspective and from the perspective of VoxForge? I already know Project Gutenberg, but would appreciate it if someone could point me towards a specific text from that project or from somewhere else.
How about "Alice's Abenteuer im Wunderland" (http://www.gutenberg.org/files/19778/19778-h/19778-h.htm)?
Actually there are only 614 German books on Gutenberg.org and they can be found here: http://www.gutenberg.org/browse/languages/de
The 'advanced search' feature lets you filter these books further by category, author, etc.
If your friend delivers any speech, I would gladly split and transcribe it, since I do not have the necessary hardware to submit speech of my own. Just tell me where or how to get it and I'll do my best.
Thanks for the suggestion. I like it, and I hope she will too. Otherwise I'll ask her to pick something else. It will take a while before the first recording is ready, because I'm on holiday right now and I also cannot order her to do it ASAP, but I'll let you know.
As I said, it might take a while...
I just uploaded the first batch of audio files. My friend was so kind to already record speech in little chunks. However, I don't know if they are the right size or not. I don't know what the optimum length more or less is.
I can upload two additional batches, but first I'd like to hear whether I exported them correctly. I hardly ever use Audacity, and the menu has also changed since the last time I used it. Furthermore, I have never exported to FLAC before, but I thought it would make more sense. Before she ever used Audacity I adjusted the recording settings according to the VoxForge website, so those should be okay. She gave me the project files, so I can export again with different settings if anything is wrong.
Of course I listened and it sounded okay. I did not fill in all the data in the README file, because I didn't know where to check after opening a project file.
Also, I was prompted for metadata for each and every track... is there some way to avoid that? It was rather annoying to hit enter 230 times...
Let me know if I had to do something differently or if everything is okay and then I will upload the remaining audio.
>I just uploaded the first batch of audio files.
>My friend was so kind to already record speech in little chunks. However, I
>don't know if they are the right size or not. I don't know what the optimum
>length more or less is.
Prompts are usually around 20-30 words, but I think yours should be OK.
>I did not fill in all the data in the README file, because I didn't know where
>to check after opening a project file.
If you look to the left of the box containing the waveform display, it shows the audio as "Mono, 48000Hz, 16-bit PCM"
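If you would rather check the exported files themselves than the project, the same values are stored in the STREAMINFO block at the start of every FLAC file. Here is a minimal sketch that reads them with nothing but the Python standard library. The function name is my own, and it assumes STREAMINFO is the first metadata block, as the FLAC format requires.

```python
import struct


def read_flac_streaminfo(data):
    """Parse sample rate, channel count and bit depth from the first
    42 bytes of a FLAC file (4-byte marker, 4-byte block header,
    34-byte STREAMINFO body)."""
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Low 7 bits of the block header byte give the block type; 0 = STREAMINFO.
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Body starts at byte 8. After 10 bytes of block/frame sizes come 64 bits:
    # 20 bits sample rate, 3 bits (channels - 1), 5 bits (bits per sample - 1),
    # 36 bits total number of samples.
    packed = struct.unpack(">Q", data[18:26])[0]
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x07) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }
```

Usage would be something like `read_flac_streaminfo(open("Alice3-02.flac", "rb").read(42))`; looping over all files in a batch gives the values for the README without opening Audacity.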
>Also, I was prompted for metadata for each and every track... is there
>some way to avoid that?
I have not figured that out either... very annoying.
>or if everything is okay and then I will upload the remaining audio.
Looks good to me :)
Good, I will upload the rest sometime before the end of this month.
I didn't want to break that promise, so I just finished uploading all the files. The README file is a bit more complete this time (the recording settings in the first batch were the same, so feel free to use the updated README file for the first batch as well).
Since my friend was nice enough to record Alice in Wonderland in little chunks, the only thing that remains to be done is to chop up the text of the book itself correspondingly. I hope there is a native speaker who is prepared to do this (it's a task that I should avoid for physical reasons).
Looking forward to seeing this incorporated in the German corpus!
> I just finished uploading all the files.
Here is the link to the audio:
Alice1.tar.gz 10-Apr-2010 09:17 138M
Alice2.tar.gz 30-Apr-2010 10:52 254M
Alice3.tar.gz 30-Apr-2010 14:03 42.1M
Hi! I downloaded Alice3.tar.gz, and listened to Alice3-02.flac.
In the corresponding text file "Text from Gutenberg.txt", the word "Murmelthier" appears. This is something like "Hochdeutsche Mundart, year 1798" (not Standard German). In Standard German, we write Murmeltier (without h).
I would like to add the words that appear in the text "Alice's Abenteuer im Wunderland" to Ralf's German dictionary if they are not yet included. But I want to use the current orthography (Murmeltier), not the orthography from 1798 (Murmelthier).
You can see that this word is already included in my PLS dictionary:
The "r" in Alice3-02.flac is pronounced explicitly, and the current version of Ralf's German dictionary also contains an explicit "r" (mʊʀməltiːʀ). So you should be able to train the word Murmeltier (contained in Ralf's German dictionary) with the file Alice3-02.flac, because the pronunciation /mʊʀməltiːʀ/ is OK. Only the <grapheme> element isn't a match.
When I say the word Murmeltier, I pronounce it as follows: /mʊʀməltiːɐ̯/ (the "r" at the end of the word is a different "r" because it follows the long vowel iː). You can see that there are two different kinds of "r". This means that my dictionary is not perfect, because it contains only the pronunciation /mʊʀməltiːʀ/ and not the pronunciation /mʊʀməltiːɐ̯/. But for specific speakers (like the speaker of Alice3-02.flac), it is already perfect.
So a problem at the moment is the orthography from 1798. How could we convert the orthography automatically?
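One simple idea, as a rough sketch only: for every word that is not in the modern dictionary, try replacing "th" with "t" and accept the result only if the new spelling is in the dictionary. The function names are made up, and `lexicon` stands in for the set of <grapheme> entries from Ralf's German dictionary. The "th" rule is the only one I have implemented here; other old spellings would need their own rules, and everything that stays unmatched would still need a manual check.

```python
import re


def modernize(word, lexicon):
    """Return the modern spelling of word if the th -> t rule produces
    a word known to the lexicon; otherwise return word unchanged."""
    if word in lexicon:
        # Already a valid modern spelling (e.g. "Theater"); do not touch it.
        return word
    candidate = word.replace("th", "t").replace("Th", "T")
    return candidate if candidate in lexicon else word


def modernize_text(text, lexicon):
    """Apply modernize() to every word in text, leaving punctuation
    and spacing untouched."""
    return re.sub(r"\w+", lambda m: modernize(m.group(0), lexicon), text)
```

Checking against the dictionary is what keeps the rule from damaging words like "Theater", where the "th" is still correct today.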