VoxForge
> So wouldn't it be better to randomize the submitted prompts from a larger text file
> of sentences for the Java applet?
At VoxForge, the SpeechSubmission app randomizes the starting point within a larger text file of English prompt sentences, and presents the prompts from that point to the user. For example, if there are 500 prompts, the app randomly selects a number from 1 to 500 and reads 10 consecutive prompts, starting at the line in the prompts file that corresponds to that number. We do this to get good triphone coverage. Good coverage helps us create a *general purpose* acoustic model, which matters because we don't know how a potential user might use the acoustic model.
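The random-start selection described above can be sketched roughly as follows. This is not the SpeechSubmission app's actual code: the class and method names are hypothetical, and the wrap-around at the end of the file is an assumption (the post does not say what the app does when the random start is near the last line).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PromptSelector {

    /**
     * Picks a random starting line and returns `count` consecutive prompts.
     * Assumption: if the run reaches the end of the list it wraps around to
     * the beginning; the real app may handle this case differently.
     */
    public static List<String> selectPrompts(List<String> allPrompts, int count, Random rng) {
        if (allPrompts.isEmpty()) {
            throw new IllegalArgumentException("prompt list is empty");
        }
        // 0-based equivalent of "a random number from 1 to N"
        int start = rng.nextInt(allPrompts.size());
        int n = Math.min(count, allPrompts.size());
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            selected.add(allPrompts.get((start + i) % allPrompts.size()));
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> prompts = new ArrayList<>();
        for (int i = 1; i <= 500; i++) {
            prompts.add("prompt sentence " + i);
        }
        // One recording session: 10 consecutive prompts from a random start point
        for (String p : selectPrompts(prompts, 10, new Random())) {
            System.out.println(p);
        }
    }
}
```

Because each contributor starts at a different random line, the recordings collectively spread across the whole prompt file instead of concentrating on its first few lines.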
If you don't have a good idea where you want to use your acoustic model, then you should also work to create a *general purpose* acoustic model. This means that yes, you *should* randomize the prompts from a larger text file of sentences.
The SpeechSubmission app already does this. You just need to create a file containing a list of prompt sentences (10-15 words long), with a prompt ID as the first word on each line (the prompt ID is used to name the wav file the user records from that prompt).
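A minimal sketch of how a prompts file in this format could be read, assuming whitespace separates the prompt ID from the sentence and that the recording is named `<promptID>.wav`. The class, method names, and sample IDs are hypothetical, not taken from the SpeechSubmission source.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PromptFileParser {

    /** Splits each line into a prompt ID (first word) and the prompt sentence (rest of the line). */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> prompts = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+", 2);
            String sentence = parts.length > 1 ? parts[1] : "";
            prompts.put(parts[0], sentence);
        }
        return prompts;
    }

    /** Assumed naming rule: the recording of a prompt is saved as <promptID>.wav. */
    public static String wavFileName(String promptId) {
        return promptId + ".wav";
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "prompt0001 The weather is nice today, so we are going to the beach.",
            "prompt0002 Tomorrow we will leave early for the station.");
        parse(lines).forEach((id, sentence) ->
            System.out.println(wavFileName(id) + " <- " + sentence));
    }
}
```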
Hope this clarifies things,
Ken
Yes, we want to help you create a general purpose acoustic model.
Is there already a simple script that splits a text file and formats it into prompt sentences with prompt IDs?
Lua
>Is there already a simple script that splits a text file and formats it into prompt
>sentences with prompt IDs?
Sorry, I forgot that the SpeechSubmission app automatically adds prompt IDs to a text file (using a pre-defined prefix and an incrementing number). So there is no need to add a prompt ID to each line.
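The automatic numbering described above works roughly like the following sketch. The actual prefix and number format used by the app are not given in this thread, so the `"it"` prefix and the four-digit zero padding are purely illustrative assumptions, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

public class PromptIdAssigner {

    /**
     * Prepends "<prefix><number>" to each non-blank line, counting from 1.
     * The 4-digit zero padding is an illustrative assumption.
     */
    public static List<String> assignIds(List<String> sentences, String prefix) {
        List<String> result = new ArrayList<>();
        int counter = 1;
        for (String sentence : sentences) {
            String s = sentence.trim();
            if (s.isEmpty()) {
                continue; // blank lines get no ID
            }
            result.add(String.format("%s%04d %s", prefix, counter++, s));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
            "The weather is nice today.",
            "We are going to the beach.");
        assignIds(sentences, "it").forEach(System.out::println);
    }
}
```

With a plain list of sentences as input, this yields lines such as `it0001 The weather is nice today.`, which is the "prompt ID as first word" format described earlier.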
There is no script to split a text file, though this can be done fairly quickly by hand. Try to break the text where there might be a natural pause in the original sentence (e.g. at commas or periods).
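The manual rule above (break at natural pauses) can be roughly automated. The following is a hypothetical helper, not a VoxForge tool: it treats commas, periods, semicolons, colons, `?` and `!` as pause points and greedily joins the resulting pieces into prompts of at most `maxWords` words. A single piece longer than the limit is kept whole rather than cut mid-phrase, so output should still be checked by hand.

```java
import java.util.ArrayList;
import java.util.List;

public class PauseSplitter {

    private static int wordCount(String s) {
        String t = s.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    /** Splits text at pause punctuation, then packs pieces into prompts of at most maxWords words. */
    public static List<String> split(String text, int maxWords) {
        // Split on whitespace that follows a pause mark, so the punctuation stays with its piece
        String[] pieces = text.trim().split("(?<=[,.;:?!])\\s+");
        List<String> prompts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentWords = 0;
        for (String piece : pieces) {
            int w = wordCount(piece);
            if (w == 0) {
                continue;
            }
            // Start a new prompt if adding this piece would exceed the limit
            if (currentWords > 0 && currentWords + w > maxWords) {
                prompts.add(current.toString());
                current.setLength(0);
                currentWords = 0;
            }
            if (currentWords > 0) {
                current.append(' ');
            }
            current.append(piece.trim());
            currentWords += w;
        }
        if (currentWords > 0) {
            prompts.add(current.toString());
        }
        return prompts;
    }

    public static void main(String[] args) {
        String text = "One two three, four five six. Seven eight nine ten eleven, twelve.";
        split(text, 6).forEach(System.out::println);
    }
}
```

With a limit of 6 words, the example text becomes `One two three, four five six.` and `Seven eight nine ten eleven, twelve.`; for real prompts a limit around 15 matches the suggested 10-15 word length.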
Ken
Hi, we have found this Python tool, which is very helpful for automating sentence extraction from large texts and has other useful features:
http://nltk.sourceforge.net/index.php/Main_Page
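NLTK is a Python toolkit, and its sentence tokenizer is the part relevant here. For anyone who prefers to stay in Java, the JDK's built-in `java.text.BreakIterator` offers a simpler, rule-based alternative for sentence extraction. This is a different technique from NLTK's tokenizer and will make more mistakes on abbreviations; the class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceExtractor {

    /** Returns the sentences found by the JDK's locale-aware sentence BreakIterator. */
    public static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundary pairs [start, end)
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The first sentence is short. Is this the second one? Yes, it is!";
        for (String s : sentences(text, Locale.ITALIAN)) {
            System.out.println(s);
        }
    }
}
```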
Hi,
Any news about the Italian submission page?
We want to tell Italian volunteers to start submitting contributions through the Java applet.
Cheers,
Lua
Hi Lua,
I'm in the final testing phase and it should be ready shortly (maybe as early as tomorrow - if all the tests come back OK).
Ken
Hi Lua,
The Italian translations for the Speech Submission Java app are live. Please review them and let me know if anything needs to be changed.
thanks,
Ken
There is a mistake:
genderSelection[1] = "Machio";
should be:
genderSelection[1] = "Maschio";
You should also add the Neapolitan dialect (dialetto napoletano).