Italian

didn't understand plain ascii problem
User: occimanete
Date: 11/17/2009 2:02 am
Views: 13599
Rating: 12

After months I am still investigating.

What should I do to use èéìíùúàáòó in the pronunciation dictionary, or wherever else I need them?

To explain better: I have a lexicon from which I get words and pronunciations in UTF-8, because I use the accented vowels mentioned above. Any file I derive from it becomes UTF-8. If I treat it as non-UTF-8, I lose the vowels. I read somewhere on this site that I need to encode them as an ASCII sequence of letters, and that is what I don't understand.

Let's take as an example an extract from my lexicon:

CAFè       [CAFè]       k a f E1
CAFFè       [CAFFè]       k a f f E1

Should I somehow encode the vowel in the word CAFè as something like E1, to get CAFE1 and CAFFE1?

And should I then use these words spelled with this coding everywhere?

I am really lost.


Thanks, folks.

Michele

PS. This lexicon is based on the phonetic list I explained how to use in a reply to this post.

Re: didn't understand plain ascii problem
User: nsh
Date: 11/17/2009 2:13 am
Views: 207
Rating: 14

There is no such problem. Phones should be plain ASCII. Words can be UTF-8, for example.

> Should I somehow encode the vowel in the word CAFè as something like E1, to get CAFE1 and CAFFE1?


No
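
To make the rule concrete, here is a minimal sketch in Python that scans lexicon lines and reports any phone that is not plain ASCII, while leaving the (possibly UTF-8) words alone. The function name `non_ascii_phones` is hypothetical, and the sketch assumes the `WORD [OUTPUT] phone phone ...` layout from the extract above, with no spaces inside the bracketed field.

```python
def non_ascii_phones(lexicon_lines):
    """Return (line_number, phone) pairs for phones that are not plain ASCII."""
    problems = []
    for number, line in enumerate(lexicon_lines, start=1):
        fields = line.split()
        if not fields:
            continue
        # fields[0] is the word; an optional [OUTPUT] symbol may follow it.
        # Assumption: the bracketed field contains no spaces.
        start = 2 if len(fields) > 1 and fields[1].startswith("[") else 1
        for phone in fields[start:]:
            if not phone.isascii():
                problems.append((number, phone))
    return problems


sample = [
    "CAFè       [CAFè]       k a f E1",
    "CAFFè       [CAFFè]       k a f f E1",
]
print(non_ascii_phones(sample))
```

For the two entries above the list of problems should be empty: the accented words are fine, and the phones `k a f E1` are already ASCII.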

 

Re: didn't understand plain ascii problem
User: Visitor
Date: 11/17/2009 12:17 pm
Views: 177
Rating: 13

So then, what do you think the problem could be in the following situation:

HDMan -A -D -T 1 -m -w wlist_prz -n monophones1_prz -i -l dlog_prz dict_prz lexicon/onemarket_lexicon 

writes the following to the 'dlog_prz' log file:

No HTK Configuration Parameters Set

Output dictionary dict_prz opened
Source dictionary lexicon/onemarket_lexicon opened 
Dictionary dict_prz created - 198 words processed, 198 missing

The wlist_prz contains the list of words from the prompts in uppercase (no accented words are present in there), as in the tutorial. The onemarket_lexicon contains the 500.000K festival dictionary normalized into the lexicon form for HTK...


I don't really know what's the matter, then. I think the files are sorted, since I ran the Unix sort utility to be sure. Still the same problem. The files are in UTF-8, but you just said that does not matter for HTK.

Any idea?

 

Re: didn't understand plain ascii problem
User: nsh
Date: 11/17/2009 12:45 pm
Views: 195
Rating: 14

Well, you need to learn to provide the data required to reproduce the problem you are asking about. We can't guess what mistake you made on your local machine.

You can always share the files on some sharing service and give us a link.

 

Re: didn't understand plain ascii problem
User: occimanete
Date: 11/17/2009 1:59 pm
Views: 223
Rating: 13

Hi nsh,

This is the wlist_prz:

http://www.wikifortio.com/785813/wlist_prz

and here is the onemarket_lexicon:
http://www.wikifortio.com/796263/onemarket_lexicon

(it's 23MB+).

If it's useful, these are the original prompts:
http://www.wikifortio.com/832033/prompts_prz

The command is the same as before, and so is the result.

 

thanks. Oc

Re: didn't understand plain ascii problem
User: nsh
Date: 11/17/2009 5:36 pm
Views: 152
Rating: 12

Your onemarket_lexicon is incorrectly sorted. Use

LANG= LC_ALL= sort onemarket_lexicon > onemarket_lexicon.sorted


to sort it properly.
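
For anyone wondering why the locale matters: with `LANG` and `LC_ALL` emptied, `sort` normally falls back to the C locale and compares lines byte by byte, whereas a UTF-8 locale collates accented and mixed-case words differently. A minimal Python sketch of the byte-order comparison, assuming the lexicon is UTF-8 (the helper names `c_locale_sort` and `is_c_sorted` are made up for illustration):

```python
def c_locale_sort(lines):
    """Sort lines by their raw UTF-8 bytes, roughly what sort does in the C locale."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


def is_c_sorted(lines):
    """Check whether lines are already in byte order."""
    keys = [s.encode("utf-8") for s in lines]
    return all(a <= b for a, b in zip(keys, keys[1:]))


words = ["CAFè", "CAFFè", "CAFE"]
print(c_locale_sort(words))   # accented è sorts after every ASCII letter
print(is_c_sorted(words))
```

In byte order every uppercase ASCII letter comes before every lowercase one, and any accented character comes after all ASCII letters, which is usually not what a locale-aware sort produces.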

Re: didn't understand plain ascii problem
User: occimanete
Date: 11/18/2009 12:36 am
Views: 147
Rating: 15

Now it passes that step. Great shell insight.

Thank you.

Michele

Re: didn't understand plain ascii problem
User: occimanete
Date: 11/19/2009 4:03 am
Views: 164
Rating: 12

Here I go again, one step forward and two back.

Some of my phones, which are partly made of numerical values, are truncated to just the alphabetical part, so for example a and a1 both become a. The dlog for HDMan tells me that the resulting dictionary contains no such phones at all.

 dlog_prz
Following '4.6 Strings and Names' in the HTK Book, I tried quoting every phone in the source dictionary (i.e. the lexicon) and passing the -C HDMan.config option to HDMan, where HDMan.config is:

QUOTEHCAR=\"

 

I tried quoting onemarket.phones (quoted). Here are the wlist_prz and the onemarket.phone.quoted.

The final result in dict_prz has no such phones or, better said, they are truncated.

Occimanete

Re: didn't understand plain ascii problem
User: nsh
Date: 11/19/2009 5:16 am
Views: 166
Rating: 13

You just need to remove 'rs cmu' from global.ded. Read the HTK Book about that.
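
In the HTK Book, RS is the HDMan edit command that removes stress marking, and cmu is the marking system it supports, which is why numbered phones such as a1 lose their digit. A minimal Python sketch of dropping that command from an edit script follows; the function name `drop_stress_removal` is hypothetical, and the sample global.ded contents are an assumption modelled on the usual tutorial script, not taken from the poster's files.

```python
def drop_stress_removal(ded_text):
    """Return the edit script without any RS (remove stress) command lines."""
    kept = []
    for line in ded_text.splitlines():
        tokens = line.split()
        # Skip lines whose command is RS, e.g. "RS cmu".
        if tokens and tokens[0] == "RS":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Assumed example contents of global.ded.
example = "AS sp\nRS cmu\nMP sil sil sp\n"
print(drop_stress_removal(example), end="")
```

With the RS line gone, HDMan should leave phones such as a1 and E1 as they are written in the lexicon.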

Re: didn't understand plain ascii problem
User: occimanete
Date: 11/19/2009 10:32 am
Views: 359
Rating: 13

Thanks again,

I had read it, but did not understand what the book meant by stress marking; there was no example of it.

Now it works, in fact. Resourceful as usual.

Occimanete
