
Acoustic Model Discussions

Re: Merging Acoustic Models...
User: kmaclean
Date: 2/28/2008 10:19 pm

Very interesting discussion!

>I've tried some things with sox and compand that removed the "background" noise, but unfortunately in the process it also clipped part of my speech (and it didn't work consistently among speakers).

Julius has a feature called "spectral subtraction" which, as I understand it (I've never used it myself), removes noise from speech input using a noise spectrum estimated in advance and loaded from a file.  Is there something equivalent for Sphinx?

>Why doesn't/hasn't someone taken something like WSJ1 and added/adapted it using all of these other speech files (i.e. ones from VoxForge, CMU, etc) that are available.

The source audio for the WSJ acoustic models can only be purchased from the LDC; it is closed source.  However, it seems that any acoustic models derived from it are freely distributable.

You could theoretically merge the WSJ and VoxForge acoustic models to create the "Super Acoustic Model" you were referring to, but the GPL on the VoxForge corpus would prevent its *distribution*.  This is because there is no freely distributable source audio for WSJ1, and a collective work that includes a GPL work (like the VoxForge corpus) must be distributed under the GPL.  However, nothing stops you from merging them and using the result only within your organization (as long as you don't distribute the resulting AM).

Ken 

--- (Edited on 2/28/2008 11:19 pm [GMT-0500] by kmaclean) ---

--- (Edited on 2/28/2008 11:20 pm [GMT-0500] by kmaclean) ---

Re: Merging Acoustic Models...
User: nsh
Date: 2/29/2008 1:44 am

> Julius has something called "spectral subtraction" that seems (I've never used it myself) to be used to remove noise from speech input using pre-estimated noise spectrum from file.  Is there something equivalent for Sphinx?

Ken: There seems to be no equivalent for this in Sphinx. Actually, subtracting the noise estimate takes only about three lines of code. There are better free methods available; they just need to be integrated.
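To give an idea of how small the subtraction step itself is, here is a minimal sketch in Python. It is only an illustration, not the Julius implementation and not anything that exists in Sphinx. It assumes you already have per-frame magnitude spectra (for example from an FFT front end), and the function names and the `over_subtraction` and `floor` parameters are my own placeholders.

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames assumed to contain only noise.

    noise_frames is a list of frames, each a list of per-bin magnitudes.
    """
    n_frames = len(noise_frames)
    return [sum(bins) / n_frames for bins in zip(*noise_frames)]


def spectral_subtract(frame_mag, noise_mag, over_subtraction=1.0, floor=0.01):
    """Subtract a pre-estimated noise magnitude spectrum from one frame.

    over_subtraction and floor are illustrative tuning knobs, not values
    taken from any particular recognizer.
    """
    if len(frame_mag) != len(noise_mag):
        raise ValueError("frame and noise spectra must have the same number of bins")
    cleaned = []
    for s, n in zip(frame_mag, noise_mag):
        # Remove the noise estimate, but keep at least a small fraction of
        # the original magnitude so that no bin becomes negative.
        cleaned.append(max(s - over_subtraction * n, floor * s))
    return cleaned
```

The noise estimate would normally come from a stretch of audio with no speech in it, which is presumably what the noise spectrum file in Julius holds.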

> My application will be accepting incoming calls from numerous (i.e. unlimited different speakers), probably the majority will be cell phones, and the majority will probably be while driving, hence a large amount of background noise (radio, road noise, passengers talking, etc).  In order for this to be successful, I will need to find a way of maintaining a 90% or better recognition rate even under those conditions.  This makes cleaning the incoming audio stream important.

Another thing I forgot is that you probably need to start with a more representative test set, then. The one I evaluated is certainly not optimal. Only once you have a test set that is big enough can you proceed with algorithm optimization. The test set must include noisy calls; I suppose it is just a matter of enabling recording on the server.
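For scoring such a test set, a rough sketch of a word-level accuracy measure is below. It assumes you have (reference transcript, recognizer output) pairs as plain strings; the function names are hypothetical, and the "recognition rate" quoted in this thread may have been computed differently (for example per sentence rather than per word).

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard Levenshtein distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1], len(ref)


def word_accuracy(pairs):
    """Word accuracy (1 - word error rate) over (reference, hypothesis) pairs."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return 1.0 - errors / words if words else 0.0
```

Running this separately on the clean recordings and on the noisy calls would show how much of the loss comes from the noise rather than from the model.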

> Wouldn't that increase the recognition rate overall, or is adaptation limited to increasing the recognition rate for only one speaker?

Adaptation increases the rate for your environment by removing the issues related to the original model's environment, so it is certainly a good thing.

>  I also really need to figure out how to build good quality large language models, but I believe I may be able to figure that one out on my own (especially if you have any links handy to information on the subject).

OK, I'll look for links on noise reduction. Keep us updated on your progress too. There are many things that have to work properly. Another one, for example, is the confidence score, which you must use in your app to decide whether a hypothesis can be trusted.
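As a rough illustration of what I mean by using the confidence score, here is a minimal sketch. It assumes the recognizer gives you a hypothesis string together with a confidence value between 0 and 1; the function name, the scale, and the 0.5 threshold are all assumptions for the example and are not taken from the Sphinx API. The threshold would have to be tuned on your own test set.

```python
def accept_hypothesis(text, confidence, threshold=0.5):
    """Return the hypothesis if it is confident enough, otherwise None.

    confidence is assumed to lie in [0, 1]; threshold is a placeholder value.
    """
    if confidence >= threshold:
        return text
    # Too uncertain: the caller should re-prompt the speaker instead of
    # acting on a hypothesis that is probably wrong.
    return None
```

In a telephone application, a None result would typically mean asking the caller to repeat.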

 

 

--- (Edited on 2/29/2008 1:44 am [GMT-0600] by nsh) ---

Re: Merging Acoustic Models...
User: oeginc
Date: 2/29/2008 8:30 am

> Another thing I forgot is that you probably need to start with more representative test set then. The one I evaluated is for sure not optimal. Only when you'll have test set that is big enough you could proceed with algorithm optimization. Test set must include noisy calls, I suppose it's just an issue of enabling recording on server

Yes, I understand. I have actually started creating a set of audio that is more representative of my environment.  I was only getting a 70% recognition rate in a controlled environment at the beginning of this thread, though, so IMHO there was no need to proceed any further. Now that I am getting 99% on my control, I can move on to a more realistic test suite.

 

--- (Edited on 2/29/2008 8:30 am [GMT-0600] by Visitor) ---
