General Discussion

Re: Acoustic model 0.1.2
User: nsh
Date: 9/6/2008 6:06 am
Views: 214
Rating: 8

Well, it's important to have clean data and quantitative tests; without them it's impossible to move forward.

Prompted by this discussion, I started training a Sphinx model. I suppose it will take a week on my machine, but we'll probably move training to the cluster.

I have already found the following problems in the prompts:

corno1979-10102006, kylegoetz-10122006, corno1979-10102006-NR - bad PROMPTS

mfread* - no PROMPTS, just prompts.txt

douglaid-20080205 - vf-01 instead of vf-1

many PROMPTS files have ../../../Audio/MFCC/XXkHz_YYbit/MFCC_0_D/ inside

douglaid-20080203 - incorrect prompt line

mojomove411-20071102-poe/wav/iaf0007 KILTARTAN\342\200\231S - bad word (the escaped bytes are the UTF-8 encoding of a typographic apostrophe, i.e. KILTARTAN’S)
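Several of these problems (path residue such as ../../../Audio/MFCC/... in utterance IDs, non-ASCII characters such as the curly apostrophe in KILTARTAN’S, prompts not written in capitals, bare IDs without the session prefix) could be caught mechanically before training. The following is a minimal Python sketch, not an existing VoxForge or SphinxTrain tool. It assumes each PROMPTS line has the form `<session>/mfc/<utt-id> WORD WORD ...`, as in the douglaid-20080219/mfc/vf11-16 prompt quoted in this thread; the function names are hypothetical.

```python
import re

# Hypothetical PROMPTS checker. Assumed line format:
#   "<session>/mfc/<utt-id> WORD WORD ..."
PATH_RESIDUE = re.compile(r"\.\./|Audio/MFCC/")


def check_prompt_line(line):
    """Return a list of problem descriptions for one PROMPTS line."""
    parts = line.strip().split(maxsplit=1)
    if len(parts) < 2:
        return ["missing prompt text"]
    utt_id, text = parts
    problems = []
    # Leftover relative paths from the MFCC directory layout.
    if PATH_RESIDUE.search(utt_id):
        problems.append("path residue in utterance id")
    # Bare ids such as "cc-27" lack the assumed session prefix.
    if "/mfc/" not in utt_id:
        problems.append("utterance id lacks '<session>/mfc/' prefix")
    # Curly quotes and other non-ASCII characters break dictionary lookup.
    if any(ord(ch) > 127 for ch in text):
        problems.append("non-ASCII character in prompt text")
    # Prompts are expected to be in capitals.
    if text != text.upper():
        problems.append("prompt text is not upper case")
    return problems


def check_prompts(lines):
    """Map 1-based line numbers to their problems; clean and blank lines are skipped."""
    report = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        problems = check_prompt_line(line)
        if problems:
            report[number] = problems
    return report
```

Running `check_prompts` over every submission's PROMPTS file before alignment would give a list like the one above automatically; checks such as the vf-01 vs vf-1 mismatch would additionally need the list of audio file names.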

I also compiled a list of problematic utterances for which alignment failed; it would be nice to review them:

douglaid-20080219/wav/vf11-07,
douglaid-20080219/wav/vf11-08,
douglaid-20080219/wav/vf11-11,
knotyouraveragejo-20080428-adv/wav/adv0231,
G-20080425-itf/wav/b0002,
xaviergonz-20080419-uje/wav/a0398,
xaviergonz-20080419-uje/wav/a0404,
ductapeguy-20070308b/wav/bab.0023,
peterwhy-20080503-win/wav/win0151,
chocoholic-20070524/wav/eti0091,
chocoholic-20070524/wav/eti0237,
anonymous-20080204-hnl/wav/ar-24,
anonymous-20080716-sfu/wav/a0340,
knotyouraveragejo-20080502-adv/wav/adv0280,
anonymous-20080630-lhi/wav/a0285,
gilrim-20080120-vgs/wav/b0415,
rjmunro-20080517-win/wav/a0236,
Toyo-20080229-ogz.zip/wav/a0104,
Toyo-20080229-ogz.zip/wav/a0105,
Toyo-20080229-ogz.zip/wav/a0106,
Toyo-20080229-ogz.zip/wav/a0108,
Toyo-20080229-ogz.zip/wav/a0111,
Toyo-20080229-ogz.zip/wav/a0112,
mjmm-20080526-hca/wav/b0074,
mjmm-20080526-hca/wav/b0075,
mjmm-20080526-hca/wav/b0076,
mjmm-20080526-hca/wav/b0077,
mjmm-20080526-hca/wav/b0078,
mjmm-20080526-hca/wav/b0079,
mjmm-20080526-hca/wav/b0080,
mjmm-20080526-hca/wav/b0081,
mjmm-20080526-hca/wav/b0082,
knotyouraveragejo-20070621-sci/wav/sci0150,
nestea247-20080301-sbn/wav/a0310,
corno1979-10102006-NR/wav/cc011,
corno1979-10102006-NR/wav/cc012,
corno1979-10102006-NR/wav/cc016,
corno1979-10102006-NR/wav/cc018,
corno1979-10102006-NR/wav/cc026,
corno1979-10102006-NR/wav/cc033,
corno1979-10102006-NR/wav/cc036,
corno1979-10102006-NR/wav/cc039,
Mark_Reynolds-20070531-cc/wav/cc-27,
cebidae-20080522-nsi/wav/b0385,
gilrim-20080120-ohc/wav/a0495,
gilrim-20080120-ohc/wav/a0500,
xenobyte72-20080530-pgo/wav/b0131,
kayray-20070611-ele/wav/ele0116,
chocoholic-20070612-eti33/wav/eti0278,
bloomtom-20080612-pfg/wav/a0401,
KnitGirl-20071113-dil/wav/b0274,
gilrim-20080120-uxi/wav/a0093,
gilrim-20080120-uxi/wav/a0094,
gilrim-20080120-uxi/wav/a0095,
gilrim-20080120-uxi/wav/a0096,
gilrim-20080120-uxi/wav/a0097,
robertburrelldonkin-20070918-vf16/wav/vf16-22,
cebidae-20080522-npq/wav/a0264,
cebidae-20080522-npq/wav/a0265,
cebidae-20080522-npq/wav/a0267,
Thomas-20080507-iya/wav/a0187,
vince-20071118-tez/wav/b0297,
gilrim-20080120-rzu/wav/rp-10,
vikramjb-20080416-cls/wav/a0398,
vikramjb-20080416-cls/wav/a0403,
vikramjb-20080416-cls/wav/a0404,
vikramjb-20080416-cls/wav/a0405,
vikramjb-20080416-cls/wav/a0406,
guilherme-20080123-pfh/wav/b0150,
knotyouraveragejo-20070620-sci/wav/sci0135,
anonymous-20080425-ojw/wav/b0363,
russellfeeed-20080211-upk/wav/b0025,
russellfeeed-20080211-upk/wav/b0026,
russellfeeed-20080211-upk/wav/b0027,
russellfeeed-20080211-upk/wav/b0028,
russellfeeed-20080211-upk/wav/b0031,
russellfeeed-20080211-upk/wav/b0033,
russellfeeed-20080211-upk/wav/b0034,
kayray-20070527-per07/wav/per0007,
kayray-20070527-per07/wav/per0014,
kayray-20070527-per07/wav/per0057,
kayray-20070527-per07/wav/per0071,
kayray-20070527-per07/wav/per0120,
kayray-20070527-per07/wav/per0141,
kayray-20070527-per07/wav/per0179,
kayray-20070527-per07/wav/per0231,
kayray-20070527-per07/wav/per0319,
kayray-20070527-per07/wav/per0335,
CptOatmeal-20080721-vnh/wav/a0426,
Joel-20080716-qoz/wav/b0074,
Joel-20080716-qoz/wav/b0075,
Joel-20080716-qoz/wav/b0076,
Joel-20080716-qoz/wav/b0077,
Joel-20080716-qoz/wav/b0078,
Joel-20080716-qoz/wav/b0080,
Joel-20080716-qoz/wav/b0081,
Joel-20080716-qoz/wav/b0082,
Joel-20080716-qoz/wav/b0083,
kayray-20070425-per04/wav/per0041,
kayray-20070425-per04/wav/per0073,
kayray-20070425-per04/wav/per0100,
kayray-20070425-per04/wav/per0105,
bloomtom-20080612-vya/wav/rb-31,
GrahamPhillips-20071111-oxp/wav/a0115,
GrahamPhillips-20071111-oxp/wav/a0117,
anonymous-20071127-rln/wav/a0575,
anonymous-20080318-eaq/wav/b0073,
jaiger-20061231-vf7/wav/vf7-25,
starlite-20070614-fur2/wav/fur0136
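When reviewing a list like this, it can help to count failures per submission: some sessions (for example mjmm-20080526-hca, Joel-20080716-qoz and kayray-20070527-per07) contribute long runs of failed utterances, which hints at a problem with the whole submission, such as shifted prompts, rather than with single recordings. A minimal, hypothetical Python sketch that parses entries in the `session/wav/utt-id,` form used above:

```python
from collections import Counter


def failures_per_submission(entries):
    """Count failed utterances per submission directory.

    Each entry is assumed to look like 'session/wav/utt-id', optionally
    followed by a comma, as in the list above. Blank entries are ignored.
    """
    counts = Counter()
    for entry in entries:
        entry = entry.strip().rstrip(",")
        if not entry:
            continue
        # Everything before '/wav/' is the submission directory.
        session = entry.split("/wav/", 1)[0]
        counts[session] += 1
    return counts
```

`failures_per_submission(lines).most_common(10)` would then list the submissions that most need a closer look.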

--- (Edited on 9/6/2008 6:07 am [GMT-0500] by nsh) ---

Re: Acoustic model 0.1.2
User: dano
Date: 9/6/2008 6:30 am
Views: 72
Rating: 9

Hi Ken,

All the scripts, example configurations, docs, etc. are in the file because I just took the Ubuntu packages (which are very usable, in my opinion). Maybe I should move the files into the home directory of the package?

--- (Edited on 06-09-2008 1:30 pm [GMT+0200] by dano) ---

Re: Acoustic model 0.1.2
User: dano
Date: 9/6/2008 6:36 am
Views: 66
Rating: 7

Maybe this link is helpful?

http://www.gnu.org/software/autoconf/manual/gettext/Java.html

Unfortunately I don't have much Java experience, but this doesn't seem very difficult to me.
 
Daniël

--- (Edited on 06-09-2008 1:36 pm [GMT+0200] by dano) ---

Re: Acoustic model 0.1.2
User: Visitor
Date: 9/6/2008 7:28 am
Views: 76
Rating: 9

I don't have a very fast Internet connection, so downloading takes a long time :( so I only checked some of the recordings.

douglaid-20080219: incorrect prompt lines (prompt 5 is skipped):

5 = 6

6 = 7

and so on until douglaid-20080219/mfc/vf11-16 THE ADDED WEIGHT HAD A VELOCITY OF FIFTEEN MILES PER HOUR (15 and 16 are equal)

G-20080425-itf/wav/b0002: a little tap at the beginning

xaviergonz-20080419-uje: a0398 seems good; the recording of a0404 begins too late (the P of PERRAULT is not recorded).

ductapeguy-20070308b/wav/bab.0023 seems good.

peterwhy-20080503-win/mfc/win0151 seems good, but I think it is two phrases, so he pauses for a while after LUNCH:

(peterwhy-20080503-win/mfc/win0150 NOR YOU EITHER IF YOU'VE GOT ANY SENSE AT ALL DON'T EVER REFER TO IT AGAIN PLEASE
peterwhy-20080503-win/mfc/win0151 NOW THEN HERE'S OUR BACKWATER AT LAST WHERE WE'RE GOING TO LUNCH LEAVING THE MAIN STREAM
peterwhy-20080503-win/mfc/win0152 THEY NOW PASSED INTO WHAT SEEMED AT FIRST SIGHT LIKE A LITTLE LAND LOCKED LAKE)

anonymous-20080204-hnl (sounds like breathing in during the first part)

anonymous-20080716 (a little tap in the sound)

anonymous-20080630-lhi (blows into the microphone)

--- (Edited on 9/6/2008 7:28 am [GMT-0500] by Visitor) ---

Re: Acoustic model 0.1.2
User: dano
Date: 9/6/2008 9:08 am
Views: 835
Rating: 8

It was me :)

douglaid-20080219 is very serious, as prompts 5 through 15 are all wrong.

--- (Edited on 06-09-2008 4:08 pm [GMT+0200] by dano) ---

Some additional files:

anonymous-20080630-lhi/wav/a0285: blows into the microphone

gilrim-20080120-vgs (all): very noisy, but comprehensible

rjmunro-20080517-win/wav/a0236: big tap

Toyo-20080229-ogz.zip: very bad; noisy, and the speaker cannot speak English

mjmm-20080526-hca: VERY noisy

nestea247-20080301-sbn: begins with a tap

corno1979-10102006-NR: seems good, but aren't the prompts required to be in capitals instead of normal sentences? (I don't know, but the other prompts were.)

Mark_Reynolds-20070531-cc: shouldn't it be

Mark_Reynolds-20070531-cc/mfc/cc-27 AND LAID HER ON HER RIGHT SIDE THEN SARAH CONFIRMED THE VET'S DIAGNOSIS instead of

cc-27 AND LAID HER ON HER RIGHT SIDE THEN SARAH CONFIRMED THE VET'S DIAGNOSIS? (This applies to all prompts in this file.)

cebidae-20080522-ns: the same problem as the previous one; also, the speaker says 'that' instead of 'last', and the last words are not spoken clearly.
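A hypothetical fix-up for the two PROMPTS issues above (bare IDs such as cc-27, and prompts written as normal sentences) might look like the following Python sketch. It assumes the `<session>/mfc/<utt-id>` ID form seen elsewhere in this thread, and it only strips a small set of sentence punctuation; whether that matches what the pronunciation dictionary expects is an assumption to verify.

```python
import re

# Hypothetical normalizer; the '<session>/mfc/<utt-id>' form is an assumption.
SENTENCE_PUNCTUATION = re.compile(r'[.,;:!?"]')


def normalize_prompt_line(line, session):
    """Prefix bare utterance ids with '<session>/mfc/' and upper-case the text.

    Blank lines are returned unchanged.
    """
    parts = line.strip().split(maxsplit=1)
    if not parts:
        return line
    utt_id = parts[0]
    text = parts[1] if len(parts) > 1 else ""
    # Upper-case, drop sentence punctuation, keep apostrophes (VET'S).
    text = SENTENCE_PUNCTUATION.sub(" ", text.upper())
    text = " ".join(text.split())
    # A bare id such as "cc-27" has no directory component.
    if "/" not in utt_id:
        utt_id = f"{session}/mfc/{utt_id}"
    return f"{utt_id} {text}".rstrip()
```

Lines that are already correct pass through unchanged, so the function could be applied to a whole PROMPTS file without first sorting good lines from bad ones.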

--- (Edited on 06-09-2008 10:43 pm [GMT+0200] by dano) ---

--- (Edited on 06-09-2008 11:10 pm [GMT+0200] by dano) ---

Re: Acoustic model 0.1.2
User: nsh
Date: 9/6/2008 2:53 pm
Views: 3049
Rating: 8

Thanks Dano, there is indeed a high probability that the listed files are broken. The question is what should we do with them - remove, add as fillers, something else.

Training went faster than I expected; I've already got a model. You can download the Sphinx VoxForge model, with setup scripts and logs, here:

http://www.mediafire.com/?jxy1bkznozb

At least now we have an estimate of the model accuracy. On the 1/10 test set, with a custom trigram LM trained on the test prompts, it has the following quality:

 TOTAL Words: 28112 Correct: 25767 Errors: 3158
TOTAL Percent correct = 91.66% Error = 11.23% Accuracy = 88.77%
TOTAL Insertions: 813 Deletions: 415 Substitutions: 1930

Not bad, but I suppose we can raise the accuracy to 97% if we try to optimize training.
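The three summary lines are related by simple arithmetic, with N = 28112 reference words: Correct = N - S - D = 28112 - 1930 - 415 = 25767, Errors = S + D + I = 1930 + 415 + 813 = 3158, Percent correct = Correct / N = 91.66%, Error = Errors / N = 11.23%, and Accuracy = 100% - Error = 88.77%. So Accuracy, unlike Percent correct, is also penalized by insertions. The Python sketch below reproduces these figures from a standard minimum-edit-distance word alignment. The post does not name the scoring script, and a real scorer may break alignment ties differently, so treat this as an illustration rather than the tool that produced the numbers.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) from a minimum-edit
    alignment of two word lists."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (edits, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            e, s, d, ins = cost[i - 1][j - 1]
            diagonal = (e + mismatch, s + mismatch, d, ins)
            e, s, d, ins = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)
            e, s, d, ins = cost[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)
            # Fewest edits wins; ties go to the lexicographically smaller tuple.
            cost[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def summarize(words, subs, dels, ins):
    """Compute the TOTAL figures from reference word count and error counts."""
    correct = words - subs - dels
    errors = subs + dels + ins
    error_rate = 100.0 * errors / words
    return {
        "correct": correct,
        "errors": errors,
        "percent_correct": round(100.0 * correct / words, 2),
        "error": round(error_rate, 2),
        "accuracy": round(100.0 - error_rate, 2),
    }
```

For example, `summarize(28112, 1930, 415, 813)` gives correct 25767, errors 3158, percent_correct 91.66, error 11.23 and accuracy 88.77, matching the totals above.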

Here is another list of suspicious prompts:

douglaid-20080219/wav/vf11-07,
douglaid-20080219/wav/vf11-08,
knotyouraveragejo-20080426-adv/wav/adv0190,
knotyouraveragejo-20080426-adv/wav/adv0308,
kayray-20070611-leo/wav/leo0210,
knotyouraveragejo-20080502-adv/wav/adv0280,
Toyo-20080229-ogz.zip/wav/a0111,
mjmm-20080526-hca/wav/b0074,
mjmm-20080526-hca/wav/b0075,
mjmm-20080526-hca/wav/b0076,
mjmm-20080526-hca/wav/b0078,
mjmm-20080526-hca/wav/b0079,
mjmm-20080526-hca/wav/b0080,
mjmm-20080526-hca/wav/b0081,
mjmm-20080526-hca/wav/b0082,
leonMire-20080526-lev/wav/lev0063,
corno1979-10102006-NR/wav/cc020,
corno1979-10102006-NR/wav/cc029,
Mark_Reynolds-20070531-cc/wav/cc-27,
kayray-20070608-rhi/wav/rhi0094,
safi-20071118-swr/wav/b0216,
starlite-20070605-che/wav/che0142,
kayray-20070611-ele/wav/ele0262,
robertburrelldonkin-200709011-vf11/wav/vf11-26,
KnitGirl-20071113-dil/wav/b0274,
gilrim-20080120-uxi/wav/a0093,
gilrim-20080120-uxi/wav/a0096,
gilrim-20080120-uxi/wav/a0101,
ttm-20071024-poe/wav/js0002,
topherfangio-20080604-jvb/wav/a0105,
ductapeguy-20080423-ang/wav/sto0020,
tis-20080416-tou/wav/voy0155,
knotyouraveragejo-20080525-mt2/wav/mtn0261,
vikramjb-20080416-cls/wav/a0398,
vikramjb-20080416-cls/wav/a0399,
vikramjb-20080416-cls/wav/a0400,
vikramjb-20080416-cls/wav/a0402,
vikramjb-20080416-cls/wav/a0403,
vikramjb-20080416-cls/wav/a0404,
vikramjb-20080416-cls/wav/a0405,
vikramjb-20080416-cls/wav/a0406,
CptOatmeal-20080721-vnh/wav/a0426,
Joel-20080716-qoz/wav/b0074,
Joel-20080716-qoz/wav/b0075,
Joel-20080716-qoz/wav/b0076,
Joel-20080716-qoz/wav/b0077,
Joel-20080716-qoz/wav/b0078,
Joel-20080716-qoz/wav/b0080,
Joel-20080716-qoz/wav/b0081,
Joel-20080716-qoz/wav/b0082,
Joel-20080716-qoz/wav/b0083,
anonymous-20071127-rln/wav/a0575,
anonymous-20080318-eaq/wav/b0073,
anonymous-20080318-eaq/wav/b0078,
anonymous-20080318-eaq/wav/b0079,
jaiger-20061231-vf7/wav/vf7-25,

--- (Edited on 9/6/2008 2:53 pm [GMT-0500] by nsh) ---

Re: Acoustic model 0.1.2
User: kmaclean
Date: 9/9/2008 11:00 am
Views: 83
Rating: 7

Hi nsh & Dano,

Good work guys! 

>The question is what should we do with them - remove, add as fillers, something else.

I will look at these (and any others you may have...) and either correct them (if it is just a section of audio that is causing problems) or move them to a "problem" directory in Subversion (and update the master prompts files) so we always have a list of the ones we removed.
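As a sketch of the prompts-file side of this plan (the Subversion moves themselves are not shown), a hypothetical Python helper could split the master prompts into kept and removed lines, so the removed ones can be saved as the list of problem utterances. It assumes the failure lists use `session/wav/utt-id` paths while the prompts use `session/mfc/utt-id` IDs; both forms appear in this thread for the same utterance (peterwhy-20080503-win/wav/win0151 and peterwhy-20080503-win/mfc/win0151).

```python
def split_master_prompts(lines, problem_ids):
    """Split master PROMPTS lines into (kept, removed).

    problem_ids holds entries such as 'mjmm-20080526-hca/wav/b0074,';
    trailing commas are dropped and '/wav/' is mapped to '/mfc/' to match
    the assumed PROMPTS id format. Blank lines are always kept.
    """
    wanted = {
        pid.strip().rstrip(",").replace("/wav/", "/mfc/")
        for pid in problem_ids
        if pid.strip()
    }
    kept, removed = [], []
    for line in lines:
        utt_id = line.split(maxsplit=1)[0] if line.strip() else ""
        if utt_id in wanted:
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed
```

Writing `removed` to a file next to the "problem" directory would preserve the record of what was taken out of training.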

thanks,

Ken

--- (Edited on 9/9/2008 12:00 pm [GMT-0400] by kmaclean) ---

Re: Acoustic model 0.1.2
User: kmaclean
Date: 9/9/2008 11:07 am
Views: 115
Rating: 7

Hi nsh,

>Training went faster than I expected, I've got a model already, you can download sphinx voxforge model with setup scripts and logs here:

>http://www.mediafire.com/?jxy1bkznozb

Awesome!

I will add this to the downloads page.

thanks,

Ken

--- (Edited on 9/9/2008 12:07 pm [GMT-0400] by kmaclean) ---

Re: Acoustic model 0.1.2
User: kmaclean
Date: 9/9/2008 11:13 am
Views: 99
Rating: 7

Hi nsh,

>At least now we have estimation of the model accuracy, on the 1/10 test set with a custom trigram lm trained on the test prompts it has the following quality:

> TOTAL Words: 28112 Correct: 25767 Errors: 3158
>TOTAL Percent correct = 91.66% Error = 11.23% Accuracy = 88.77%
>TOTAL Insertions: 813 Deletions: 415 Substitutions: 1930

>Not bad, but I suppose we can raise the accuracy to 97% if we'll try to optimize training.

Do these numbers include the problem prompts too, or did you omit them? That is, is all we have to do to get to 97% remove or correct the offending submission prompts?

thanks,

Ken

--- (Edited on 9/9/2008 12:13 pm [GMT-0400] by kmaclean) ---

Re: Acoustic model 0.1.2
User: kmaclean
Date: 9/9/2008 11:35 am
Views: 67
Rating: 6

Hi Daniël,

>Maybe this link is helpful?

Yes, thanks,

I am glad to see that GNU gettext po files are implemented using Sun's own Java Internationalization mechanism.

>but this seems not very difficult to me

Well... as they say: "the devil is in the details."

Ken

--- (Edited on 9/9/2008 12:35 pm [GMT-0400] by kmaclean) ---
