VoxForge
Re: Librivox contributions and dates/numbers
Tony, since I was challenged on a particular detail of one of my responses, I tailored the example to suit that situation. I maintain that a recognizer would have a problem distinguishing between SAY ONE and SAY WON, since they have identical phoneme representations. I don't have any data to back that up yet, so I will leave it as a claim for now, ready to be withdrawn in the face of evidence.
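To make the SAY ONE / SAY WON point concrete, here is a minimal sketch in Python. The toy lexicon and its ARPAbet-style phoneme strings are assumptions for illustration only, not entries copied from the actual VoxForge lexicon, and the helper names are hypothetical. It only shows that the two prompts expand to the same phoneme sequence; it says nothing about how a language model might later tell them apart.

```python
from collections import defaultdict

# Toy pronunciation lexicon (assumed entries, ARPAbet-style, no stress marks).
TOY_LEXICON = {
    "SAY": ["s", "ey"],
    "ONE": ["w", "ah", "n"],
    "WON": ["w", "ah", "n"],
    "TEN": ["t", "eh", "n"],
}


def phonemes(sentence):
    """Expand a prompt into its flat phoneme sequence using the toy lexicon."""
    sequence = []
    for word in sentence.split():
        sequence.extend(TOY_LEXICON[word])
    return tuple(sequence)


def homophone_groups(lexicon):
    """Group words that share exactly the same pronunciation."""
    groups = defaultdict(list)
    for word, pron in lexicon.items():
        groups[tuple(pron)].append(word)
    return [sorted(words) for words in groups.values() if len(words) > 1]


if __name__ == "__main__":
    print(phonemes("SAY ONE") == phonemes("SAY WON"))
    print(homophone_groups(TOY_LEXICON))
```

On the acoustic side the two prompts are the same string of phonemes, so any difference in recognition would have to come from the grammar or language model rather than from the lexicon.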
Your example is a good one to bring us back to the original topic. Of course in a Librivox context the reading will already have been decided for you by the audio: either TEN SIXTY SIX or ONE THOUSAND [AND] SIXTY SIX.
So the master lexicon will need to contain a subset of 1066, TEN, SIXTY, SIX, THOUSAND, AND, ONE, TEN_SIXTY_SIX and so on. I'm just concerned that the 1066/TEN_SIXTY_SIX approach for numbers (which of course is perfectly valid) means a perpetually growing lexicon, since every new number that turns up needs an entry of its own.
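A rough sketch of the growth concern, in Python. It compares one compound entry per number (the TEN_SIXTY_SIX style) with a lexicon holding only the component words. The year-style reading rules here (TEN HUNDRED for 1000, OH for a leading zero in the last two digits) are simplifying assumptions of mine, and the function names are hypothetical, not part of any VoxForge tool.

```python
# Words for 0..19; ZERO is included for completeness.
ONES = [
    "ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
    "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN",
    "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN",
]
TENS = {
    2: "TWENTY", 3: "THIRTY", 4: "FORTY", 5: "FIFTY",
    6: "SIXTY", 7: "SEVENTY", 8: "EIGHTY", 9: "NINETY",
}


def two_digits(n):
    """Spell a number in 0..99 as a list of words."""
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    return [TENS[tens]] + ([ONES[ones]] if ones else [])


def year_style(n):
    """Spell a four-digit number as two pairs, e.g. 1066 -> TEN SIXTY SIX.

    Assumed conventions: 1000 -> TEN HUNDRED, 1005 -> TEN OH FIVE.
    """
    high, low = divmod(n, 100)
    if low == 0:
        tail = ["HUNDRED"]
    elif low < 10:
        tail = ["OH", ONES[low]]
    else:
        tail = two_digits(low)
    return two_digits(high) + tail


def lexicon_sizes(numbers):
    """Return (compound entry count, component word count) for the numbers."""
    compound_entries = set()
    word_entries = set()
    for n in numbers:
        words = year_style(n)
        compound_entries.add("_".join(words))
        word_entries.update(words)
    return len(compound_entries), len(word_entries)


if __name__ == "__main__":
    print(" ".join(year_style(1066)))
    print(lexicon_sizes(range(1000, 2000)))
```

Under these assumptions the compound approach needs one entry for every distinct number in the range, while the component-word lexicon stays at a few dozen words however many numbers turn up. The trade-off is that the component approach pushes the number-to-words decision into the transcription step.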
--- (Edited on 5/20/2012 12:24 pm [GMT-0500] by colbec) ---