Acoustic Model Discussions

Problems with the word "FILTER"
User: colbec
Date: 7/4/2008 7:50 am
Views: 5969
Rating: 22

I'm using Julius with a simple grammar that recognizes close to 100% accurately, with one annoying exception: the grammar instruction "FILTER ALPHA". I have recorded and re-recorded the prompts and checked the phoneme construction, but that combination still tops the error charts.

So I set up a test routine that plugs away at that issue and records results. Using the grammar:

FILTER ( ALPHA BRAVO CHARLIE PAPA )

FOCUS ( ALPHA BRAVO CHARLIE PAPA )

the error results after testing each combination 8 times were:

+------+--------------+----------+----------+-------+
| thid | thname       | thmax    | thmin    | therr |
+------+--------------+----------+----------+-------+
|  286 | FILTER ALPHA | -20260.4 | -20260.4 |     7 |
|  283 | FILTER PAPA  | -19547.2 | -18447.2 |     2 |
+------+--------------+----------+----------+-------+

So after 8 attempts to say FILTER ALPHA, only one was recognized correctly, and the score was extreme. FILTER PAPA was also a problem but much less so: it was recognized correctly 6 times and wrongly twice. When FILTER PAPA was recognized, the score stayed in a fairly narrow range, which indicates some consistency.

The combinations with FOCUS were all registered correctly every time, and both FOCUS and FILTER were fine with CHARLIE and BRAVO.
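A results table like the one above can be produced by tallying each trial. This is a minimal sketch, assuming a hypothetical log of (prompt, hypothesis, score) tuples; the sample values are made up, and reading the columns as "error count plus highest and lowest score among correct recognitions" is an interpretation, not colbec's actual schema.

```python
from collections import defaultdict

# Made-up trial log: (prompt read aloud, Julius hypothesis, score).
TRIALS = [
    ("FILTER ALPHA", "FOCUS ALPHA", -105.0),
    ("FILTER ALPHA", "FILTER ALPHA", -98.5),
    ("FILTER PAPA", "FILTER PAPA", -101.0),
    ("FILTER PAPA", "FILTER PAPA", -103.5),
]


def summarize(trials):
    """Per prompt: number of misrecognitions, plus the highest and lowest
    score among the trials that were recognized correctly."""
    errors = defaultdict(int)
    scores = defaultdict(list)
    for prompt, hypothesis, score in trials:
        if hypothesis == prompt:
            scores[prompt].append(score)
        else:
            errors[prompt] += 1
    summary = {}
    for prompt in set(errors) | set(scores):
        good = scores.get(prompt, [])
        summary[prompt] = {
            "errors": errors.get(prompt, 0),
            "max": max(good) if good else None,
            "min": min(good) if good else None,
        }
    return summary


for prompt, row in sorted(summarize(TRIALS).items()):
    print(prompt, row)
```

A prompt with no correct recognitions gets `None` for both scores, which keeps "never recognized" distinct from "recognized with a poor score".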

Examining the detailed output from Julius shows that FILTER ALPHA is sometimes found in pass 1 and then rejected by the engine in pass 2.

I guess my question is: is my problem with FILTER, with ALPHA, or with the combination FILTER ALPHA? Is this a known difficulty, and is there a parameter in Julius that can sensibly be tweaked in this case?

FILTER is in my lexicon twice, with "f ih l t ax" and "f ih l t ax r".

Thoughts appreciated. 

--- (Edited on 7/4/2008 7:50 am [GMT-0500] by colbec) ---

Re: Problems with the word "FILTER"
User: nsh
Date: 7/4/2008 3:27 pm
Views: 103
Rating: 9

According to cmudict, it should be:

FILTER               F IH L T ER

In cases like this, you should just give the decoder more pronunciation variants and it will find the proper one.
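Julius accepts more than one pronunciation per word: in a grammar's `.voca` file, each variant is simply another line with the same word under the same `% CATEGORY` header. A small sketch, assuming that layout and a hypothetical `LEXICON` dict; the phone strings are illustrative and must come from the same phone set as the acoustic model.

```python
# Hypothetical lexicon: each word maps to one or more pronunciation variants.
LEXICON = {
    "FILTER": ["f ih l t er", "f ih l t ax"],
    "FOCUS": ["f ow k ah s"],
}


def voca_block(category, words, lexicon):
    """Render one category in Julius .voca style: a '% CATEGORY' header,
    then one 'WORD<TAB>phones' line per pronunciation variant."""
    lines = [f"% {category}"]
    for word in words:
        for phones in lexicon[word]:
            lines.append(f"{word}\t{phones}")
    return "\n".join(lines)


print(voca_block("COMMAND", ["FILTER", "FOCUS"], LEXICON))
```

The decoder then scores every listed variant and keeps whichever fits the audio best.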

 

--- (Edited on 7/4/2008 3:27 pm [GMT-0500] by nsh) ---

Re: Problems with the word "FILTER"
User: colbec
Date: 7/4/2008 4:34 pm
Views: 291
Rating: 9

Thanks for your thoughts, nsh. I am using the same phoneme structure as the BEEP lex, so it is not directly comparable to the CMU lex. My understanding is that as long as I consistently use one phoneme set, it does not matter which lexicon is used. However, it is worth a try, and I will switch lexicons and report back.

I also considered variant phoneme combinations, but apart from the two mentioned the others seemed too bizarre to be applicable. I will keep looking.

I was wondering if maybe the 'silence' between "filter" and "alpha" would be an issue, whether it is interpreted correctly and consistently in prompt and recognition enunciations. This too I can test by providing more example prompts for the model to digest.

--- (Edited on 7/4/2008 4:34 pm [GMT-0500] by colbec) ---

Re: Problems with the word "FILTER"
User: colbec
Date: 7/9/2008 1:42 pm
Views: 102
Rating: 8
I finally have my grammar working without an error. It has been an interesting journey.

The number of errors generated does not seem to depend on the dictionary used, just on the phoneme structure of individual words. However, I have standardized on the CMU dictionary; there is too much work involved in making minor adjustments to records to keep multiple database tables going.

I did try to implement multiple phoneme variants for individual words, but this just seemed to make things worse. Errors such as a missing triphone would be thrown, and at the most annoying time: after everything had been built and we were into the testing phase. In fact, it was the terminating triphone issue that was largely responsible for the prompt being identified correctly in pass 1 and then rejected in pass 2.
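A missing-triphone error means the dictionary asks for a context-dependent model that training never produced, which is easy to trigger when a new pronunciation variant introduces a new phone pair. A sketch of a pre-flight check, assuming word-internal expansion with HTK-style names (`l-p+r`, with biphones `p+r` and `l-p` at word edges) and a hypothetical `available` set holding the model names exported from your trained model set. Julius's own cross-word handling at recognition time is more involved than this.

```python
def word_internal_triphones(phones):
    """Expand a phone list into HTK-style names: l-p+r inside the word,
    p+r for the first phone and l-p for the last."""
    names = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        names.append(f"{left}{p}{right}")
    return names


def missing_triphones(lexicon, available):
    """Return {(word, pronunciation): [needed models not in `available`]}."""
    report = {}
    for word, variants in lexicon.items():
        for pron in variants:
            needed = word_internal_triphones(pron.split())
            missing = [m for m in needed if m not in available]
            if missing:
                report[(word, pron)] = missing
    return report


# Hypothetical model list covering only the cmudict-style pronunciation.
available = {"f+ih", "f-ih+l", "ih-l+t", "l-t+er", "t-er"}
lexicon = {"FILTER": ["f ih l t er", "f ih l t ax"]}
print(missing_triphones(lexicon, available))
```

Running such a check before recording and testing surfaces the problem variant (here the one ending in `ax`) while it is still cheap to fix.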

My errors were caused by poor phoneme choices for words. An example is the words APPLE and ALPHA. The initial 'A' sounds the same in each word, but when I used identical initial phonemes for both words, the model started throwing errors. It seems that it is important not only to listen to the sound but also to consider the following consonant. I also ran into a lot of cross-identification between ALPHA and PAPA, which was resolved by adjusting phonemes. A+P calls for a different 'A' phoneme than A+L, at least that is what the evidence from my multiple tests is telling me. Using a different phoneme immediately eliminated my APPLE/ALPHA/PAPA cross-identification.
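One rough way to screen a vocabulary for words that might be cross-identified is a phone-level edit distance between pronunciations. This is a sketch using unstressed CMU-style phones chosen for illustration; it treats every phone substitution as equally costly and ignores acoustic similarity, so it can only flag candidate pairs, not explain why a particular pair is confused.

```python
from itertools import combinations

# Illustrative unstressed CMU-style pronunciations.
LEXICON = {
    "APPLE": "ae p ah l",
    "ALPHA": "ae l f ah",
    "PAPA": "p aa p ah",
}


def phone_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]


def close_pairs(lexicon, max_distance):
    """All word pairs whose pronunciations are within max_distance edits."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        d = phone_distance(p1.split(), p2.split())
        if d <= max_distance:
            pairs.append((d, w1, w2))
    return sorted(pairs)


print(close_pairs(LEXICON, max_distance=3))
```

For these three words every pair is three edits apart, a reminder that counting phones says little about which vowel in which context the acoustic model actually confuses.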

So I guess the issue comes down to my poor understanding of the role of phonemes. Short of moving to Pittsburgh and enrolling in courses at CMU can anyone recommend any good reading?

PS: for anyone importing the CMU dictionary from SourceForge into a MySQL database: the text file appears to use two spaces to separate the two fields. However, LOAD DATA INFILE throws 49 warnings. SHOW WARNINGS lists the problem lines, and it is short work to standardize them to get a clean import.
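The standardizing step can also be scripted before the import. A sketch, assuming comment lines start with `;;;` and entries are `WORD  PHONES`; it flags any entry whose separator is not exactly two spaces, normalizes all entries, and writes a tab-separated file. The file names in the usage note and the latin-1 encoding (picked so reading never fails) are assumptions.

```python
import re

# WORD, then the whitespace separator, then the phone string.
ENTRY = re.compile(r"^(\S+)(\s+)(.+?)\s*$")


def normalize_cmudict(lines):
    """Return (rows, irregular): rows are (word, phones) pairs; irregular
    lists 1-based line numbers not separated by exactly two spaces."""
    rows, irregular = [], []
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(";;;"):
            continue  # blank line or comment
        m = ENTRY.match(line)
        if m is None:
            irregular.append(n)  # e.g. leading whitespace or no phones
            continue
        word, sep, phones = m.groups()
        if sep != "  ":
            irregular.append(n)  # still kept, but worth a look
        rows.append((word, " ".join(phones.split())))
    return rows, irregular


def convert(src, dst):
    """Write a tab-separated copy of src and return the irregular line numbers."""
    with open(src, encoding="latin-1") as f:
        rows, irregular = normalize_cmudict(f)
    with open(dst, "w", encoding="latin-1") as out:
        for word, phones in rows:
            out.write(f"{word}\t{phones}\n")
    return irregular
```

For example, `convert("cmudict.0.7a", "cmudict.tsv")` (hypothetical file names) reports the lines that were fixed. MySQL's LOAD DATA INFILE expects tab-separated fields by default, so the output should load without a FIELDS clause.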

--- (Edited on 7/9/2008 1:42 pm [GMT-0500] by colbec) ---

Re: Problems with the word "FILTER"
User: kmaclean
Date: 7/9/2008 3:32 pm
Views: 93
Rating: 9

Hi colbec,

>So I guess the issue comes down to my poor understanding of the role of
>phonemes. Short of moving to Pittsburgh and enrolling in courses at CMU
>can anyone recommend any good reading?

There are some resources listed here: Speech Recognition Books/OpenCourseware

I would recommend: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Jurafsky and Martin, chapter 7, Phonetics. You should be able to find an older edition at a university library.

The MIT OpenCourseWare site has a course called 6.345 Automatic Speech Recognition (2003), which discusses:

Hope that helps,

Ken

 

--- (Edited on 7/9/2008 4:32 pm [GMT-0400] by kmaclean) ---

Re: Problems with the word "FILTER"
User: kmaclean
Date: 7/9/2008 3:51 pm
Views: 123
Rating: 6

Hi Colbec,

>Using a different phoneme immediately eliminates my
>APPLE/ALPHA/PAPA cross identification

Are you creating triphone acoustic models? Triphones provide context for a phone, i.e. they allow the speech recognition engine to distinguish between different pronunciations (if there are different pronunciations in your dialect) of the "A" phone in APPLE/ALPHA/PAPA.
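To make the idea concrete, it helps to write out the context-dependent names a triphone system would use. This sketch expands a whole prompt into HTK-style `l-p+r` names, padding with `sil` and ignoring any short pause between words (a simplifying assumption); the unstressed CMU-style pronunciations are illustrative.

```python
# Illustrative unstressed CMU-style pronunciations.
LEXICON = {
    "FILTER": "f ih l t er",
    "APPLE": "ae p ah l",
    "ALPHA": "ae l f ah",
}


def cross_word_triphones(words, lexicon):
    """Expand a word sequence into l-p+r names across word boundaries,
    using the padding 'sil' only as context."""
    phones = ["sil"] + [p for w in words for p in lexicon[w].split()] + ["sil"]
    return [f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
            for i in range(1, len(phones) - 1)]


print(cross_word_triphones(["FILTER", "ALPHA"], LEXICON))
print(cross_word_triphones(["FILTER", "APPLE"], LEXICON))
```

The same "ae" phone becomes `er-ae+l` in FILTER ALPHA but `er-ae+p` in FILTER APPLE, so given enough training data each context gets its own model.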

In addition, if you don't have enough speech audio in your acoustic model, then "state-tying" (see VoxForge Tutorial: Step 10 - Making Tied-State Triphones) will try to group similar triphones together.  In some cases, not having enough data (and the state tying that results from this) might counteract the benefit of creating a triphone acoustic model.  This might be the source of your problems.
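The effect of sparse data can be illustrated with a deliberately simplified tying rule. This is not what HTK does (HHEd clusters HMM states with phonetic decision trees); it is a toy stand-in, with made-up counts, showing how contexts that were seen too rarely end up sharing one model and lose the distinction triphones were meant to provide.

```python
def centre_phone(name):
    """Centre phone of an HTK-style name such as 'er-ae+l', 'ae+l' or 't-er'."""
    return name.split("-")[-1].split("+")[0]


def tie_sparse_triphones(counts, min_count):
    """Toy tying rule (NOT HTK's algorithm): a triphone seen at least
    min_count times keeps its own model; rarer ones share a single model
    per centre phone, so their contexts can no longer be told apart."""
    return {
        name: name if n >= min_count else f"{centre_phone(name)}_tied"
        for name, n in counts.items()
    }


# Made-up training counts for three contexts of the same 'ae' phone.
counts = {"er-ae+l": 2, "er-ae+p": 2, "sil-ae+l": 40}
print(tie_sparse_triphones(counts, min_count=10))
```

Here `er-ae+l` and `er-ae+p` collapse into one shared `ae_tied` model, which is the kind of merging that could reintroduce APPLE/ALPHA confusion when each combination appears only a few times.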

Ken

--- (Edited on 7/9/2008 4:51 pm [GMT-0400] by kmaclean) ---

Re: Problems with the word "FILTER"
User: colbec
Date: 7/9/2008 5:16 pm
Views: 2212
Rating: 7

Thanks Ken, this is helpful, particularly the reference to step 10, which I have to re-read thoroughly.

Regarding quantity/quality of speech data: from my grammar I prepare a randomly sorted list of prompts which ensures that each valid combination of words (such as "FILTER APPLE" or "FILTER ALPHA") is repeated twice, with a sprinkling of other out-of-vocabulary combinations. This is re-recorded as necessary to ensure the data set is consistent.

The resulting list from my more complex grammar is about 140 prompts. When I run into a problem with the big grammar and I need to test quickly I have a smaller grammar which would contain just a subset of the big grammar, but would still contain at least 40 prompts.
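A randomized prompt list like the one described can be generated directly from the grammar's word lists. A minimal sketch, assuming a two-slot COMMAND TARGET grammar and a hypothetical `extras` argument for the out-of-vocabulary prompts; the `*/sampleN` line layout imitates the prompts file in the VoxForge tutorial and should be adjusted to your own setup.

```python
import itertools
import random


def make_prompts(commands, targets, repeats=2, extras=(), seed=None):
    """Every COMMAND TARGET combination `repeats` times, plus any extra
    prompts, in a shuffled order (reproducible when seed is given)."""
    prompts = [f"{c} {t}" for c, t in itertools.product(commands, targets)] * repeats
    prompts.extend(extras)
    random.Random(seed).shuffle(prompts)
    return prompts


prompts = make_prompts(["FILTER", "FOCUS"], ["ALPHA", "BRAVO", "CHARLIE", "PAPA"], seed=1)
for n, text in enumerate(prompts, start=1):
    print(f"*/sample{n} {text}")
```

With the small grammar from the first post this yields 16 prompts (8 combinations, twice each), and a fixed seed keeps the order stable between recording sessions.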

Quantitatively I don't know if this would be sufficient to provide a reliable testing basis or not. But given that I have an error free run right now it must be close. But then maybe I just stumbled on a weird combination that works as long as I don't touch it!

--- (Edited on 7/9/2008 5:16 pm [GMT-0500] by colbec) ---
