VoxForge
This is taken from a post I made to the comp.speech.research newsgroup:
Hi, 
 
I am the admin for the VoxForge project. We are collecting user-submitted speech for incorporation into a GPL-licensed acoustic model ('AM'). Currently we have a Julius/HTK AM that is rebuilt nightly, incorporating newly submitted audio.
 
I am unsure which approach to take in creating the VoxForge speech corpora. Up until now, we have been asking users to submit 'clean' speech, i.e. to record their submissions so that all non-speech noise (echo, hiss, ...) is kept to an absolute minimum. One contributor (very ingeniously, I thought) records his submissions in his closet or in his car!
 
But some people whose opinions I respect say that I should not be collecting clean speech, but rather speech in its 'natural environment', warts and all, with echo and hiss and the rest (while still avoiding other background noise such as people talking, radios, or TVs). On some submissions the hiss is very noticeable.
 
What confuses me is that some speech recognition microphones are sold with built-in echo and noise cancellation, and the marketing says that this improves a (commercial) speech recognizer's performance. This suggests to me that I should be collecting clean speech and then using a noise-reduction and echo-cancellation front end on the speech recognizer, because that is what commercial speech recognition engines seem to be doing.
 
And further, if clean speech is required, should I be using noise reduction software on the submitted audio (such as the submission with the very pronounced hiss)? My attempts at noise reduction have not been successful: the resulting 'musical noise' (the low-level tonal residue that replaces the removed noise) gives me very poor recognition results.
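For readers unfamiliar with the artifact: classic spectral subtraction estimates a noise spectrum and subtracts it per frequency bin, clamping negative results to zero, and those isolated zeroed bins are exactly what ring as 'musical noise'. A minimal illustrative sketch in Python (numpy/scipy); the 0.5 s noise-only lead-in and the floor constant are assumptions for illustration, not recommended settings:

    import numpy as np
    from scipy.signal import stft, istft

    def denoise(wav, sr, noise_seconds=0.5, floor=0.05):
        # STFT analysis (hop = 256 samples with these defaults)
        f, t, Z = stft(wav, fs=sr, nperseg=512)
        mag, phase = np.abs(Z), np.angle(Z)
        # Estimate the noise spectrum from the first noise_seconds,
        # assumed here to contain no speech.
        n = max(1, int(noise_seconds * sr / 256))
        noise = mag[:, :n].mean(axis=1, keepdims=True)
        # Subtract, but floor at a fraction of the noise estimate
        # rather than clamping to zero; a floor like this is the
        # usual first defence against musical noise.
        cleaned = np.maximum(mag - noise, floor * noise)
        _, out = istft(cleaned * np.exp(1j * phase), fs=sr, nperseg=512)
        return out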
 
I was wondering what your thoughts on this might be, 
 
thanks for your time, 
 
--- (Edited on 2/ 6/2007 1:49 pm [GMT-0500] by kmaclean) ---
David Gelbart's reply:
Hi Ken, 
 
Here are my opinions regarding your question. (For those who haven't 
 heard of VoxForge (voxforge.org), Ken and his contributors are striving 
 to  make open source speech recognition more practical by collecting 
 speech data under a free license.   They are currently aiming to help 
 enable desktop command & control and telephone call response (IVR) 
 speech recognition.  In the longer term they hope to help enable 
 dictation.  Ken has heard from me before, so I hope my posting here 
 won't discourage others from joining in.  I think a lively discussion 
 would be very healthy.) 
 
I feel you should aim to collect 'natural' speech, with a set of mics 
 that reflects the range of mic technology that your future users will 
 be using.    Using noise reduction in the front end of the ASR system 
 may be a good idea. If so, you probably should employ it both when 
 training models and when actually using the recognizer (unless one of 
 these situations will have much cleaner speech coming in than the 
 other). If you are having 'musical noise' problems with your noise 
 reduction, you may do better with one of the noise reduction solutions 
 at http://isca-students.org/freeware.   I have never heard any musical 
 noise problems with the Qualcomm-ICSI-OGI package, for example.  I 
 don't think the QIO package's license will suit you, but there are 
 other options there such as CtuCopy (which I haven't tried) which is 
 under GPL. 
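To make the 'use it both in training and in recognition' point concrete, here is a minimal sketch in Python. Everything named here (denoise, the feature stub, model.decode) is a hypothetical placeholder, not any particular toolkit's API:

    import numpy as np

    def denoise(wav, sr):
        return wav  # stand-in; any noise reduction would go here

    def features(wav, sr):
        # Stand-in front end: per-frame log energy instead of real MFCCs.
        frames = wav[: len(wav) // 256 * 256].reshape(-1, 256)
        return np.log(np.maximum((frames ** 2).mean(axis=1), 1e-10))

    def front_end(wav, sr):
        return features(denoise(wav, sr), sr)

    # The same front_end() feeds both paths, so the models are trained
    # on the same kind of signal the recognizer sees at run time.
    def training_features(corpus):          # corpus: iterable of (wav, sr)
        return [front_end(w, sr) for (w, sr) in corpus]

    def recognize(model, wav, sr):
        return model.decode(front_end(wav, sr))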
 
I think it's great that you are collecting information on microphone 
 type from people submitting data.   Perhaps you should have some 
 category codes in addition to the specific model name, so that you can 
 automatically identify what parts of your data are using particular 
 categories of microphones.   Why?   Let's say, for example, that you 
 are building a model for dictation.  Dictation works better with 
 headset mics than ordinary desktop mics, so presumably your users would 
 tend to use headset mics. If you choose to include a lot of desktop mic 
 data in the training set along with headset mic data, this will make 
 the models better prepared to deal with a mix of headset and desktop 
 mic users, but if there are only going to be headset mic users you may 
 lose performance from it.   This might happen, in particular, if the 
 greater variance / reduced sharpness in the models due to the inclusion 
 of desktop mic data turns out to be a bigger  performance factor than 
 the extra coverage of human voice types and triphones that you will get 
 from the extra data.   (As an aside, I think speaker adaptive training 
 (SAT) may help during training to preserve model sharpness when mixing 
 different microphone types. It is designed to preserve sharpness 
 in the face of speaker variation and I suppose this would carry over to 
 many kinds of recording environment variation. Likewise, I know from 
 experience that employing a speaker adaptation technique such as MLLR 
 during recognizer use can improve performance in the face of recording 
 environment variation.) 
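For reference, the MLLR transform mentioned above re-estimates each Gaussian mean through an affine transform, in LaTeX notation:

    \hat{\mu}_m = A\,\mu_m + b = W\,\xi_m,
    \qquad \xi_m = \begin{bmatrix} 1 \\ \mu_m \end{bmatrix},
    \quad W = [\, b \;\; A \,]

where W is estimated to maximize the likelihood of the adaptation data and is typically shared across all the Gaussians in a regression class, which is why a small amount of adaptation data can shift the whole model toward a new speaker or recording environment.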
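Returning to the category-code suggestion: even a tiny controlled vocabulary of categories makes that filtering automatic. A minimal sketch in Python; the category labels, the example mic models, and the "mic_model" field are made up for illustration:

    # Hypothetical category codes; labels and models are illustrative.
    MIC_CATEGORIES = {
        "headset": {"Plantronics Audio 90", "Logitech H390"},
        "desktop": {"Logitech Desktop Mic"},
        "builtin": {"ThinkPad internal mic"},
    }

    def category_of(model_name):
        for cat, models in MIC_CATEGORIES.items():
            if model_name in models:
                return cat
        return "unknown"

    def select(submissions, category):
        # submissions: iterable of dicts with a "mic_model" field
        return [s for s in submissions
                if category_of(s["mic_model"]) == category]

For example, a dictation training set could be restricted to select(all_submissions, "headset"), while a command-and-control model could pool several categories.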
 
--- (Edited on 2/ 6/2007 1:50 pm [GMT-0500] by kmaclean) ---
Hi Webmaster,
VoxForge rocks!!!
We have put up a Flash-based recorder on our website. To see it, please go to http://emandi.mla.iitk.ac.in:9000/kisanblog/loudblog/index.php
and enter guest/guest as login/password
You can then record files in the flash recorder.
As has been previously discussed on these forums, the VoxForge project needs something like this.
I am offering to provide you with the source code and to integrate it into the VoxForge site. Please contact me at abhishek[dot]singh[at]simmortel[dot]com
Cheers!
Abhishek. 
--- (Edited on 9/23/2007 1:04 am [GMT-0500] by bailoo) ---
Hi all,
The old link to the Flash-based recorder is dead.
The working link has moved here:
http://opaals.iitk.ac.in:9000/kisanblog/loudblog/index.php
This project is built on the open-source Loudblog project.
Thanks,
Rohit
katrohit at gmail dot com
--- (Edited on 3/7/2008 6:53 am [GMT-0600] by Visitor) ---
Links to papers from this Kaldi post:
https://www.researchgate.net/publication/221489763_Adding_noise_to_improve_noise_robustness_in_speech_recognition
https://www.microsoft.com/en-us/research/publication/an-investigation-of-deep-neural-networks-for-noise-robust-speech-recognition/
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=940823&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D940823
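The first paper's core idea, training on clean audio with noise mixed in at controlled SNRs (multi-condition training), is easy to sketch. A minimal numpy version; it assumes speech and noise are float arrays at the same sample rate, and the SNR list at the end is just an example:

    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Mix noise into speech at a target signal-to-noise ratio (dB)."""
        # Tile or trim the noise to match the speech length.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)[: len(speech)]
        # Scale the noise so 10*log10(P_speech / P_noise) equals snr_db.
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
        return speech + scale * noise

    # e.g. augment each clean recording at several SNRs:
    # augmented = [mix_at_snr(wav, babble, snr) for snr in (20, 15, 10, 5)]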
--- (Edited on 11/17/2016 11:56 am [GMT-0500] by kmaclean) ---