Acoustic Model Discussions

Extending an Acoustic Model's phone set
User: Cedric
Date: 5/21/2013 11:26 pm
Views: 4696
Rating: 13

Hello all,

I am working on a speech assessment system which targets people with speech disorders, currently using pocketsphinx, and I need to recognize specific mispronunciation sounds (such as glottal stops, pharyngeal fricatives and hypernasal consonants) in addition to regular English sounds.

To do this, I would like to train new phones and add them to the default English acoustic model. I want to build on the existing model rather than start the training procedure from scratch. For the training, I would use recordings that contain both regular English phones and mispronunciation phones, but I want to learn only the new phones from them. The existing acoustic model could help generate a better segmentation of the recordings, so that the new phones are trained on the appropriate speech segments.
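To make the last point concrete, here is a minimal Python sketch of the selection step only: given a phone-level segmentation of a recording (however it was obtained), keep just the segments labelled with a new phone. The segment format, the phone labels (GS, PF) and the function name are hypothetical placeholders for illustration; none of them come from sphinxtrain or pocketsphinx.

```python
from typing import List, NamedTuple, Set


class Segment(NamedTuple):
    phone: str    # phone label assigned by the alignment
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical labels for the new phones (an assumption, not a real phone set).
NEW_PHONES: Set[str] = {"GS", "PF"}


def select_new_phone_segments(segments: List[Segment],
                              new_phones: Set[str]) -> List[Segment]:
    """Return only the segments whose label is one of the new phones."""
    return [seg for seg in segments if seg.phone in new_phones]


# Made-up alignment of one utterance, for illustration only.
alignment = [
    Segment("SIL", 0.00, 0.12),
    Segment("B", 0.12, 0.20),
    Segment("GS", 0.20, 0.27),
    Segment("AH", 0.27, 0.41),
    Segment("SIL", 0.41, 0.60),
]

selected = select_new_phone_segments(alignment, NEW_PHONES)
```

The open question is how to obtain such a segmentation when the existing model has no units for the new phones yet, and how to feed only the selected segments into training.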

How can I do this using Sphinx? I guess I have to make some tweaks to sphinxtrain, but I don't understand the training procedure well enough to get started. Any clue, thought or opinion on the matter is welcome!

Thank you in advance for the help,
Cedric

--- (Edited on 5/21/2013 11:26 pm [GMT-0500] by ) ---

Re: Extending an Acoustic Model's phone set
User: nsh
Date: 5/23/2013 7:21 pm
Views: 69
Rating: 12

Please find the answer on the CMUSphinx forum:

https://sourceforge.net/p/cmusphinx/discussion/help/thread/d63376e7/

--- (Edited on 5/24/2013 04:21 [GMT+0400] by nsh) ---

Re: Extending an Acoustic Model's phone set
User: Cedric
Date: 5/23/2013 9:24 pm
Views: 2040
Rating: 10

Below is a copy of my answer on the CMUSphinx forum:

Hello Nickolay, thank you for your answer.

"An acoustic model contains context-dependent detectors for phones, not just phones."

Sure, I understand that. Still, all the context-dependent phones whose surrounding context does not include any of the new phones have already been learned and do not have to be re-estimated, right? Also, I would expect the available models to help segment the data and improve the training of the new models.
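To illustrate what I mean, here is a small Python sketch that splits a list of triphones into those that involve a new phone (as the base phone or as either context) and those that do not. The triphone representation, the function name and the label GS for a glottal stop are assumptions made up for this example. It shows only the bookkeeping and says nothing about how parameters are shared between triphones in a real model.

```python
from typing import Iterable, List, Set, Tuple

# A triphone is written here as (base phone, left context, right context).
Triphone = Tuple[str, str, str]


def split_triphones(triphones: Iterable[Triphone],
                    new_phones: Set[str]) -> Tuple[List[Triphone], List[Triphone]]:
    """Split triphones into (unaffected, affected) with respect to the new phones."""
    unaffected: List[Triphone] = []
    affected: List[Triphone] = []
    for tri in triphones:
        # Affected if a new phone is the base phone or either context.
        if new_phones.intersection(tri):
            affected.append(tri)
        else:
            unaffected.append(tri)
    return unaffected, affected


# Hypothetical example: "GS" stands in for a new glottal-stop phone.
triphones = [
    ("AH", "B", "T"),    # existing phone, existing contexts
    ("AH", "GS", "T"),   # existing phone, new left context
    ("GS", "AH", "IY"),  # new phone as the base
]
unaffected, affected = split_triphones(triphones, {"GS"})
```

In this toy case only the first triphone is untouched by the new phone; the other two would need training data.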

In any case, my problem is that I have very little data available, not enough to train the entire set of English phones in addition to the new phones. I could add speech files from VoxForge to my dataset, but if possible I would rather make use of the acoustic model provided with pocketsphinx, as it has so far given me better accuracy than the VoxForge model.

--- (Edited on 5/23/2013 9:24 pm [GMT-0500] by Cedric) ---
