Age range: adult;
Pronunciation dialect: British English (mother tongue is Dutch).
Microphone: USB Desktop boom microphone Logitech;
Audio Card: USB Desktop boom microphone Logitech;
Audio Recording Software: Audacity rel 1.2.6;
O/S: Windows XP.
File type: wav;
Sampling rate: 48kHz;
Sample rate format: 16bit;
Number of channels: 1.
ar-01 Once there was a young rat named Arthur who never could make up his mind.
ar-02 Whenever his friends asked him if he would like to go out with them,
ar-03 he would only answer, "I don't know;" he wouldn't say yes or no either.
ar-04 He would always shirk making a choice. His Aunt Helen said to him,
ar-05 "Now look here! No one is going to care for you if you carry on like this.
ar-06 You have no more mind than a blade of grass."
ar-07 One rainy day the rats heard a great noise in the loft.
ar-08 The pine rafters were all rotten, so that the barn was rather unsafe.
ar-09 At last the joists gave way and fell to the ground.
ar-10 The walls shook, and all the rats' hair stood on end with fear and horror.
ar-11 "This won't do," said the captain; "I'll send out scouts to search for a new home."
ar-12 Within five hours the ten scouts came back and said,
ar-13 "We found a stone house where there is room for us all.
ar-14 There is a kindly horse named Nelly, a cow, a calf, and a garden with an elm tree."
ar-15 The rats crawled out of their little houses and stood on the floor in a long line.
ar-16 Just then the old rat saw Arthur. Stop. he ordered coarsely.
ar-17 "You are coming, of course." "I'm not certain," said Arthur, undaunted,
ar-18 "The roof may not come down yet."
ar-19 "Well," said the old rat, "we can't wait for you to join us. Right about face! March!"
ar-20 Arthur stood and watched them hurry away.
ar-21 "I think I'll go tomorrow," he said calmly to himself, "but then again I don't know;
ar-22 it's so nice and snug here,". That night there was a big crash.
ar-23 In the foggy morning some men with some boys and girls rode up and looked at the barn.
ar-24 One of them moved a board and saw a rat quite dead, half in and half out of his hole.
Copyright (C) 2007 Free Software Foundation
These files are free software; you can redistribute them and/or
modify them under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
These files are distributed in the hope that they will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
--- (Edited on 2/11/2007 2:44 pm [GMT-0600] by Robin) ---
Thanks for the submissions!
I had a small problem with last night's run, but it has been fixed. I installed some new backup hard drives, and while creating the mounts, I was reviewing my firewall rules, and noticed an entry that I thought (wrongly) should not be there ... now I know why it was there :)
Your latest submission (VF-1) will run tonight.
--- (Edited on 2/12/2007 1:49 pm [GMT-0500] by kmaclean) ---
Thanks for your work behind the scenes, Ken!
Sending in a couple of recordings is the least I can do. I did not (yet) check out any of the nightly builds, so don't worry about minor delays on my account.
ps sometimes I wonder if I pronounce words correctly and I definitely was unsure with "etc" (in vf1 I think). I opted for "etcetera". I guess that's okay? If not let me know and I can 'patch' that. I think such little things might be esp. hard on non-natives, but perhaps 'difficult' prompts can benefit from a tiny 'read this first message' (also names and surnames can be tricky, but perhaps that is less of an issue for the final result?)
--- (Edited on 2/12/2007 3:01 pm [GMT-0600] by Robin) ---
If you are wondering about pronunciations, the VoxForge Dictionary might give you some indication. For example, the word "etc" shows up as follows in the dictionary:
ETC [ETC] eh t s eh dx er ax
ETCETERA [ETCETERA] eh t s eh dx er ax
You really don't need to know how the phonemes are pronounced in this particular example, because you can see that 'ETC' and 'ETCETERA' contain the same phonemes, and therefore should be pronounced the same.
For other words you are not sure how to pronounce, you can look at their component phonemes and search for similar strings of phonemes until you find a word you know how to pronounce.
For example, for the word "windward", you would look it up in the dictionary and find:
WINDWARD [WINDWARD] w ih n d w er d
You would then search for the string "w er d" and find the word "word":
WORD [WORD] w er d
So now you know you would pronounce the word windward as "wind" + "word".
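The lookup described above can be sketched in a few lines of Python. This is only a rough illustration, assuming the dictionary is plain text with one "WORD [WORD] phonemes" entry per line, as in the examples in this post. The function names and the three-entry sample are made up for the example and are not part of any VoxForge tool.

```python
# Hypothetical sample; a real run would read the lines from the dictionary file.
SAMPLE_LINES = [
    "WINDWARD [WINDWARD] w ih n d w er d",
    "WORD [WORD] w er d",
    "AWARD [AWARD] ax w ao r d",
]

def parse_entry(line):
    """Split 'WORD [WORD] w er d' into the word and its list of phonemes."""
    word, _label, *phonemes = line.split()
    return word, phonemes

def load_entries(lines):
    """Map each word to its phoneme list, skipping blank lines."""
    return dict(parse_entry(line) for line in lines if line.strip())

def words_pronounced_as(entries, phonemes):
    """Return the words whose whole pronunciation equals the given phonemes."""
    return [word for word, p in entries.items() if p == phonemes]

entries = load_entries(SAMPLE_LINES)
tail = entries["WINDWARD"][-3:]            # the unfamiliar part: w er d
print(words_pronounced_as(entries, tail))  # ['WORD']
```

Choosing how many trailing phonemes to search for (three here) is still a manual step, since it depends on which part of the word you already know how to say.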
Note that this is not clear-cut in all instances, because some dialects pronounce the "ward" in "windward" like the "ward" in "award"; see this dictionary entry:
AWARD [AWARD] ax w ao r d
Therefore, it all depends on the target users of the speech recognition system and what their own particular dialect is. If we are targeting an Acoustic Model at this particular dialect, we might add an entry to the dictionary like this:
WINDWARD [WINDWARD] w ih n d w ao r d
But in the non-native speaker case, where you might not have any idea how to pronounce a word, the dictionary is a good start.
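To illustrate the alternate WINDWARD entry mentioned above, here is a minimal Python sketch of a lexicon that keeps more than one pronunciation per word. It assumes the same "WORD [WORD] phonemes" line format as the examples in this post and assumes that repeating a word on a second line is how a variant would be recorded. The helper names are hypothetical.

```python
from collections import defaultdict

def parse_entry(line):
    """Split 'WORD [WORD] w er d' into the word and its list of phonemes."""
    word, _label, *phonemes = line.split()
    return word, phonemes

def build_lexicon(lines):
    """Map each word to a list of pronunciations, keeping variants in order."""
    lexicon = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        word, phonemes = parse_entry(line)
        if phonemes not in lexicon[word]:  # ignore exact duplicates
            lexicon[word].append(phonemes)
    return lexicon

lexicon = build_lexicon([
    "WINDWARD [WINDWARD] w ih n d w er d",
    "WINDWARD [WINDWARD] w ih n d w ao r d",  # dialect variant from this post
])

for phonemes in lexicon["WINDWARD"]:
    print("WINDWARD [WINDWARD]", " ".join(phonemes))
```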
Another approach might be to listen to the audio from someone else's submission to see how they pronounce it.
With respect to creating some pronunciation indicators for non-native speakers, I can't promise anything anytime soon, but I will add it to the issue tracker, so that at least it is in the pipe to get looked at (see ticket # 142).
Hope that helps,
--- (Edited on 2/12/2007 9:00 pm [GMT-0500] by kmaclean) ---
Those tips are definitely useful! Perhaps instead of the 'read this first' messages it is enough to mention something like:
"Unsure about pronunciation? Check out the VoxForge
Dictionary or listen to the recordings of a native speaker."
Somewhere after the next line in the instructions perhaps?
"Leave a one second pause before and after you speak (these pauses will help in determining noise levels in your recordings). Speak normally, not too fast or too slow, and clearly. Speak as you would if you were reading the text aloud to someone else, with appropriate pauses corresponding to the punctuation in the text."
Well I'll leave that up to you. I realize there is plenty of work to do!
--- (Edited on 2/13/2007 2:35 am [GMT-0600] by Robin) ---
thanks for the feedback,
--- (Edited on 2/13/2007 11:01 am [GMT-0500] by kmaclean) ---