How to segment audio data manually? Any guidelines?
User: Binh
Date: 9/30/2013 4:46 am
hi there,

I need some pointers on how to segment my audio data to minimize the following error:

ERROR: "main_align.c", line 765: Final state not reached; no alignment for wagner_mann03_dw_wagner_de_008

I am asking because I tried to segment the following German text.

In this picture you can see how I cut it. As you can see, the speaker pauses between every sentence.

I cut right in the middle of each pause. This led to a lot of alignment errors when training with Sphinxtrain: almost 80% of my new data got rejected.
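For illustration, the cutting rule described above (cut at the midpoint of each pause) can be sketched as a simple energy-based silence detector. This is a minimal sketch, assuming mono audio already loaded as a list of integer samples. The frame size, the RMS silence threshold and the minimum pause length are hypothetical values that would need tuning for real recordings, and nothing here comes from Sphinxtrain itself.

```python
import math

def find_cut_points(samples, sample_rate, frame_ms=10,
                    silence_threshold=500.0, min_pause_ms=300):
    """Return sample indices at the midpoint of each long-enough pause.

    frame_ms, silence_threshold and min_pause_ms are hypothetical
    defaults, not values recommended by Sphinxtrain or VoxForge.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len

    # Mark each non-overlapping frame as silent if its RMS energy is low.
    silent = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        silent.append(rms < silence_threshold)

    min_frames = max(1, min_pause_ms // frame_ms)
    cuts = []
    start = None
    # The trailing False acts as a sentinel that closes a final silent run.
    for i, is_silent in enumerate(silent + [False]):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            # Only cut inside pauses between speech: skip leading and
            # trailing silence, and skip pauses shorter than min_pause_ms.
            if i - start >= min_frames and start > 0 and i < n_frames:
                mid_frame = (start + i) / 2
                cuts.append(int(mid_frame * frame_len))
            start = None
    return cuts
```

With a 1000 Hz toy signal of 500 loud samples, 400 silent samples and 500 loud samples, the single cut lands at sample 700, the middle of the pause. Cutting exactly at the midpoint keeps some silence on both sides of each segment, which is the intent of the manual approach described above.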

Segmented Data for Speaker 1:

Forced alignment helps a lot at this point, but since I am cutting the audio manually, I wonder if I can minimize these problems by following some kind of "cutting guidelines".

So are there any general points I should consider when segmenting audio for training? For example, "length should be between 5-10 seconds" (from your wiki).
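To check cut points against the 5-10 second rule of thumb quoted from the wiki, a small helper could compute each segment's duration and flag the ones outside that range. The function names are hypothetical, and the bounds are simply the wiki guideline as quoted above, not a hard requirement of Sphinxtrain.

```python
def segment_durations(cut_points, total_samples, sample_rate):
    """Durations in seconds of the segments between consecutive cut points."""
    bounds = [0] + sorted(cut_points) + [total_samples]
    return [(end - start) / sample_rate for start, end in zip(bounds, bounds[1:])]

def flag_segments(durations, min_s=5.0, max_s=10.0):
    """Indices of segments outside the (assumed) 5-10 second guideline."""
    return [i for i, d in enumerate(durations) if not (min_s <= d <= max_s)]
```

For a 20-second recording at 16 kHz cut at 5 s and 15 s, the durations are 5.0, 10.0 and 5.0 seconds, and no segment is flagged.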

Please note that I am intentionally NOT sharing the training folder right now because it is really big (the whole German VoxForge corpus).

Also, I am asking for general pointers or "best practices" for segmenting audio for speech recognition training.


P.S. I posted the same request in Sphinx Help.