I wrote a blog post about your wonderful project at http://flossexperiences.wordpress.com/2010/04/30/voxforge-speech-recognition/ . I hope you can address some of the concerns I have outlined, or at least have some bugs opened. I would have opened the bugs myself if OpenID had been implemented on the site.
Good luck with the project :)
Please note that I don't normally answer questions posted on another site... If you have specific questions, please ask them here.
>a. Repository on Edge of networks :- The raw speech file has to be
>uploaded to a server somewhere in Europe. It would have been lot easier
>or better if the project ties up with network mirror providers and one could
>upload the file to the nearest mirror.
This is a small project with no funds, and the files are actually located on a hosting server somewhere in the US (1&1).
>b. Repository of language :- I don’t know if people have thought of this.
>From what little I know there are atleast 5-6 types of known English.
>Indian English being a case in point.
The speech submission applet asks users to select their dialect.
>It would have been interesting as to whether the types of English affects
>the efficiency of the acoustic models which the developers are trying to
Yes it does... the general rule is that speech recognition engines best recognize speech similar to what they were trained on... so if the audio used in training the acoustic model is all Indian English, it will recognize utterances from Indian English speakers much better than those from American English speakers.
>I do see that there is a survey of Microphones for recording but it needs
>to do little bit more of that. Share with people which of these are
>better/worse alongwith pricing.
Are you volunteering to start this up? This is open source, and all projects start with a need that someone wants filled...
>Also explain about noise Many of my friends are into designing low-noise
>desktops (which is a niche market) and the cheapest systems are at 50
>K (Indian prices) . This is again a barrier.
We try to collect as much speech as we can in as many different environments/hardware setups as possible.
>d. Processing :- If you look at the uploads page, there is something
>called Processing . It would have been good if they were to share what
>the processing is all about.
That was just a flag to let me know that the particular submission had been processed, i.e. segmented into utterances of 15-25 words with matching transcriptions so that the acoustic model training process can work.
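To give a rough idea of the transcription side of that segmentation, here is a minimal Python sketch. It is not the actual processing script: the function name `split_into_utterances` and the `max_words` parameter are made up for illustration, and it only splits the text. Cutting the audio at the matching points is a separate step that is not shown here.

```python
import math


def split_into_utterances(transcript, max_words=25):
    """Split a transcript into roughly equal chunks of at most max_words words.

    Hypothetical helper for illustration only. Chunks are balanced so that
    no short leftover chunk ends up at the end. For some totals (e.g. 26
    words) no split into 15-25 word chunks exists, so chunks can come out
    shorter than 15 words.
    """
    words = transcript.split()
    if not words:
        return []

    # Smallest number of chunks that keeps every chunk within max_words.
    count = math.ceil(len(words) / max_words)
    # Spread the words evenly; the first `extra` chunks get one more word.
    base, extra = divmod(len(words), count)

    utterances = []
    start = 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        utterances.append(" ".join(words[start:start + size]))
        start += size
    return utterances
```

With a 52-word transcript, for example, this produces three chunks of 18, 17 and 17 words rather than 25, 25 and 2.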
>e. Size :- Throughout the site, there is no average or even some idea of
>how big a single raw audio file would be.
> ftp :- FTP implementations are themselves something that needs to be
FTP is only for audiobooks...
The Read page has a Java applet that prompts users for a recording and uploads the result to the VoxForge repository server.