Audio and Prompts Discussions

Re: Missing prompts
User: tpavelka
Date: 3/9/2009 9:59 am
Views: 113
Rating: 18

Just to make sure that I am right about the forced Viterbi scores, I pulled the scores from the Voxforge logs for all the files and plotted the accuracies from the phoneme-only recognizer against the Viterbi scores to see if there is any correlation. Here's the result:

http://liks.fav.zcu.cz/tomas/score_vs_corr.png

See the upper left corner: these recordings have very low recognition accuracy but a very high acoustic score.

Here is the data:

http://liks.fav.zcu.cz/tomas/stats_wscores.txt
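The correlation check described above can be sketched as follows. This is not the original analysis script; the Pearson computation is standard, but the toy (accuracy, score) pairs are invented stand-ins, since the real values live in stats_wscores.txt.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy stand-ins for (phoneme-accuracy, forced-Viterbi-score) per recording;
# the "upper left corner" cases are low accuracy paired with a high score.
accuracies = [95.0, 88.0, 20.0, 91.0, 15.0]
scores = [-52.0, -55.0, -50.0, -53.0, -49.0]
r = pearson(accuracies, scores)
```

A weak or even negative r on real data would confirm the point made here: the acoustic score is a poor predictor of transcription quality.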

--- (Edited on 3/9/2009 9:59 am [GMT-0500] by tpavelka) ---

Re: Missing prompts
User: nsh
Date: 3/9/2009 9:03 pm
Views: 271
Rating: 8

> Can I spam here? ;-) If you would like to visit Pilsen, we are organizing a conference in September,

www.tsdconference.org


Funnily enough, we submitted a paper to exactly this conference last year, and it was rejected, mostly due to my bad writing and the limited material :) That's why I argue someone should help us promote the corpus.

> My experience is that the acoustic score coming out of the Viterbi algorithm is pretty much useless (unless you have a really big mismatch between transcription and the actual utterance). Results from phoneme only recognizer are a bit better, but (as I have shown in the experiment) not by much.

Yes, I was proven wrong here. I have often used alignment to find bad transcriptions, but I now see it's not the best way.

>  Another thought I had was inspired by the RANSAC algorithm

Thanks to rjmunro for the idea. It sounds great to me not to clean up the data using prior knowledge, but instead to use generic methods to train a good model under the prior assumption that the data contains garbage. This problem is similar to generic problems other machine learning practitioners face. For example, web-collected data contains garbage by definition; Google and others take this as a precondition and use algorithms that are robust to noise. I quickly searched for articles applying such methods to acoustic training, but only found links on cleanup strategies for Bayesian classifiers and so on. It is probably worth searching more; it should be a common problem.
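The RANSAC idea mentioned above can be illustrated generically. This is a minimal textbook RANSAC fitting a 1-D line, standing in for acoustic model training: repeatedly train on a small random subset, keep the model that the most data agrees with, then retrain on the consensus set (analogous to retraining on the cleaned corpus). None of this is an actual VoxForge tool.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b; None for degenerate samples."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None  # all x equal: no unique line
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b

def ransac(points, iters=200, sample=2, tol=0.5, seed=0):
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        model = fit_line(rng.sample(points, sample))  # "train" on a subset
        if model is None:
            continue
        a, b = model
        inliers = [(x, y) for x, y in points if abs(a * x + b - y) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # refit on all inliers, i.e. retrain on the cleaned data
    return fit_line(best_inliers), best_inliers
```

For acoustic training the analogue would be far more expensive (each "fit" is a model training run), which may be why the literature search above turned up little.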


--- (Edited on 3/9/2009 9:03 pm [GMT-0500] by nsh) ---

Re: Missing prompts
User: kmaclean
Date: 3/18/2009 12:56 pm
Views: 2402
Rating: 8

I think the search term you are looking for is "Lightly Supervised Acoustic Model Training". There is a paper by the same name by Lori Lamel et al., which describes the process as follows:

The basic idea is to use a speech recognizer to automatically transcribe unannotated data, thus generating labelled training data. By iteratively increasing the amount of training data, more accurate acoustic models are obtained, which can then be used to transcribe another set of unannotated data.
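The loop described in that passage can be sketched at a high level. The `train` and `transcribe` callables and the batch structure are placeholders, not a real ASR toolkit API; this only shows the control flow of growing the labelled corpus iteratively.

```python
def lightly_supervised_training(seed_model, unannotated_batches, train, transcribe):
    """Iteratively self-label unannotated batches and retrain.

    seed_model: an initial model (e.g. trained on a small annotated set).
    train(corpus): returns a new model from (utterance, label) pairs.
    transcribe(model, utt): returns an automatic transcription of utt.
    """
    model, corpus = seed_model, []
    for batch in unannotated_batches:
        # 1. auto-transcribe the new unannotated audio with the current model
        labelled = [(utt, transcribe(model, utt)) for utt in batch]
        # 2. grow the training set and retrain a (hopefully) better model
        corpus.extend(labelled)
        model = train(corpus)
    return model
```

In practice one would also filter the auto-transcriptions by confidence before adding them, which is where this connects back to the garbage-detection discussion above.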



--- (Edited on 3/18/2009 1:56 pm [GMT-0400] by kmaclean) ---
