Acoustic Model Discussions

Re: Filler words in transcripts
User: ercani
Date: 9/4/2009 4:55 pm
Views: 127
Rating: 2

Hi, thanks for your reply.

I am using Linux Mint, but at the moment I am booted into Windows. I wrote those transcript files in Windows Notepad and saved them as Unicode, then opened them in vi under Linux. There I saw a ^M character at the end of every line. I deleted those characters in vi and saved the files as Unicode again. The transcripts look normal now, but I still got those errors.

I will save them as UTF-8 and try again, then let you know how it goes.
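For readers hitting the same ^M problem, here is a minimal sketch of the conversion described above. It assumes that Notepad's "Unicode" option wrote UTF-16 with a byte order mark, which is its usual behaviour but is not confirmed in this thread. The function names are hypothetical and are not part of any Sphinx tool.

```python
import codecs


def decode_editor_bytes(raw: bytes) -> str:
    """Decode file contents, guessing the encoding from the byte order mark."""
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # Assumption: a UTF-16 BOM means the file came from Notepad's "Unicode" option.
        return raw.decode("utf-16")  # the utf-16 codec consumes the BOM
    # Otherwise treat the data as UTF-8, with or without a BOM.
    return raw.decode("utf-8-sig")


def normalize_newlines(text: str) -> str:
    """Turn Windows (CRLF) and bare CR line endings into Unix LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_utf8_unix(raw: bytes) -> bytes:
    """Return the same text as BOM-less UTF-8 with Unix line endings."""
    return normalize_newlines(decode_editor_bytes(raw)).encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path in binary mode and write the cleaned copy to dst_path."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf8_unix(raw))
```

Calling `convert_file` on each transcript and on the dictionary would remove the ^M characters and the BOM in one pass, instead of deleting them by hand in vi.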

Thanks for your help

--- (Edited on 9/4/2009 4:55 pm [GMT-0500] by ercani) ---

Turkish dictionary myam.dic
User: ralfherzog
Date: 9/5/2009 12:10 pm
Views: 158
Rating: 1

Hi ercani! Yes, try to save the dictionary as UTF-8. By the way, I have just imported the Turkish dictionary myam.dic into simon.

--- (Edited on 2009-09-05 12:10 pm [GMT-0500] by ralfherzog) ---

Re: Turkish dictionary myam.dic
User: ercani
Date: 9/7/2009 11:32 am
Views: 88
Rating: 2

Hi Ralfherzog.

Thanks for your email.

I apologize for having deleted those files from rapid link; I did so because they are copyrighted and commercial. Still, people can get the main idea of how to build Turkish dictionaries by reading this discussion.

I hope you understand my situation.

Best regards,


--- (Edited on 9/7/2009 11:32 am [GMT-0500] by ercani) ---

Re: Turkish dictionary myam.dic
User: ralfherzog
Date: 9/7/2009 12:27 pm
Views: 358
Rating: 1

Hi ercani,

Thanks for the info. It is good that you have deleted those files.



--- (Edited on 2009-09-07 12:27 pm [GMT-0500] by ralfherzog) ---

Re: Turkish dictionary myam.dic
User: ercani
Date: 10/6/2009 1:06 pm
Views: 123
Rating: 2


I got some error messages after I fixed the UTF-8 encoding and ran:

I built the acoustic files by running scripts/ -ctl

Then scripts/

At the end, we got this error:


Training for 3 Gaussian(s) completed after 5 iterations
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: 99 Convert to Sphinx2 format models
Can not create models used by Sphinx-II.
If you intend to create models to use with Sphinx-II models, please
rerun with:
$ST::CFG_HMM_TYPE = '.semi.' or
$ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd'

Then I decided to try running the decode anyway, in spite of the "Can not create
models used by Sphinx-II" message above.

I ran ../pocketsphinx/ -task myam -langmod ../language_model

Then we ran ./scripts_pl/ -ctl etc/myam_test.fileids

This just extracted the features once again, as the first command above did.
I used the same file as etc/myam_train.fileids, only renamed to
etc/myam_test.fileids.

Then I ran the preliminary decode:

And I got this:

MODULE: DECODE Decoding using models previously trained
Decoding 512 segments starting at 0 (part 1 of 1)
0% FATAL_ERROR: "batch.c", line 461: PocketSphinx decoder init

This step had 3 ERROR messages and 1 WARNING messages. Please
check the log file for details.
Failed to start /home/kapil/Work/ercan/myam/bin/pocketsphinx_batch
Aligning results to find error rate
Can't open
Can't open
/home/kapil/Work/ercan/myam/etc/myam_test.transcription for


I uploaded the task folder here:

Please let me know how I can fix it.




--- (Edited on 10/6/2009 1:06 pm [GMT-0500] by ercani) ---

Re: Turkish dictionary myam.dic
User: nsh
Date: 10/6/2009 5:33 pm
Views: 90
Rating: 2

The following error in the decode log:

ERROR: "dict.c", line 556: '=PLCHLDR0=': Unknown phone 'SIL'
ERROR: "dict.c", line 243: Failed to add DUMMY(SIL) entry to dictionary

means that your phoneset is in lowercase. To solve this issue, add the following line to etc/sphinx_decode.cfg:

$DEC_CFG_EXTRA_ARGS = "-dictcase yes";
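To see whether a case mismatch like this is present before decoding, the dictionary can be compared against the phone list. This is a rough sketch, not a Sphinx tool. It assumes that each dictionary line is a word followed by its phones, separated by whitespace, and that the phone list has one phone per line. The function names are hypothetical.

```python
def dictionary_phones(dict_lines):
    """Collect every phone used in 'WORD PH1 PH2 ...' dictionary lines."""
    phones = set()
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries without a pronunciation
        phones.update(fields[1:])
    return phones


def case_mismatches(dict_lines, phone_lines):
    """Return phones that differ from the phone list only by letter case."""
    known = {p.strip() for p in phone_lines if p.strip()}
    known_folded = {p.lower() for p in known}
    return sorted(
        phone
        for phone in dictionary_phones(dict_lines)
        if phone not in known and phone.lower() in known_folded
    )


# Illustrative data only: a lowercase phone list and an uppercase SIL entry.
example_dict = ["merhaba m e r h a b a", "<sil> SIL"]
example_phones = ["a", "b", "e", "h", "m", "r", "sil"]
print(case_mismatches(example_dict, example_phones))
```

With the illustrative data above, the uppercase `SIL` is reported because the phone list only contains lowercase `sil`, which is the same kind of mismatch the decoder complains about.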

--- (Edited on 10/7/2009 02:33 [GMT+0400] by nsh) ---

Re: Turkish dictionary myam.dic
User: ercani
Date: 10/8/2009 9:36 am
Views: 1656
Rating: 1


I still have a problem with decoding: the WER is 101.5%. I think the problem is related to using PocketSphinx on a 64-bit PC with a 32-bit OS. I will try it on a P4 machine with a 32-bit OS.
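A note on the number itself: assuming the usual definition, WER = (substitutions + deletions + insertions) / number of reference words, the rate can exceed 100% whenever the decoder outputs many wrong or extra words. The sketch below only illustrates that arithmetic with made-up sentences. It is not the scoring script used by the training setup and says nothing about the cause of the problem here. The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# Two reference words, three wrong hypothesis words:
# 2 substitutions + 1 insertion = 3 errors over 2 words.
print(word_error_rate("iyi günler", "bir iki üç"))  # 1.5, i.e. 150%
```

So a WER slightly above 100% means that nearly every reference word is wrong and there are some extra words as well.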

thanks and regards

--- (Edited on 10/8/2009 9:36 am [GMT-0500] by ercani) ---