Librivox contributions and dates/numbers
While reviewing a possible audio file I came across a lot of dates in one section: 1800, 1839 and so on.
This raises the question of whether, in a prompt context, it is better to deal with these numbers at the text2prompts stage (ensuring that 1800 becomes "eighteen hundred", for example) or to include '1800' as a separate word in the lexicon.
The downside of the latter is that you could end up with a lot of numbers in your lexicon, eventually more numbers than words. Pre-treatment seems to be more efficient.
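To make the pre-treatment idea concrete, here is a rough sketch in Python of what such a step might look like. It is not the actual text2prompts code. The function names (two_digits, year_to_words, expand_years) are made up for illustration, and it assumes that every four-digit number from 1000 to 2999 is a year and should be read the way years usually are, with plain spaces between words so each one can be looked up in the lexicon.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out a number from 0 to 99 as space-separated words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def year_to_words(year):
    """Spell out a year (1000..9999) roughly as it is read aloud."""
    high, low = divmod(year, 100)
    if low == 0:
        if high % 10 == 0:
            return f"{ONES[high // 10]} thousand"      # 2000
        return f"{two_digits(high)} hundred"           # 1800
    if low < 10:
        if high % 10 == 0:
            return f"{ONES[high // 10]} thousand {ONES[low]}"  # 2005
        return f"{two_digits(high)} oh {ONES[low]}"    # 1805
    return f"{two_digits(high)} {two_digits(low)}"     # 1839


def expand_years(text):
    """Replace anything that looks like a year (assumed: 1000..2999)."""
    return re.sub(r"\b[12]\d{3}\b",
                  lambda m: year_to_words(int(m.group())), text)


print(expand_years("Between 1800 and 1839 things changed."))
```

With this approach the lexicon only needs the few dozen number words above rather than one entry per date. The catch is that the reader has to say the number the same way the script expands it, and a four-digit number that is not a year (a quantity, say) would need different handling.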
Is there an industry standard or even Voxforge preference for this?
--- (Edited on 5/18/2012 10:55 am [GMT-0500] by colbec) ---