Italian Language - voxforge.org

Italian Language

User: topocheparla
Date: 6/4/2007 7:51 am


Hi, I'm interested in helping VoxForge, but my English is poor...

Is there an Italian section with instructions?

Thanks!

I hope VoxForge will be a great project.

--- (Edited on 6/4/2007 7:51 am [GMT-0500] by topocheparla) ---

Re: Italian Language

User: Robin
Date: 6/5/2007 4:56 am


Hi,

To answer your question first: no, there's no Italian section yet. VoxForge was started in the English language and is still quite young.

However, this seems to be a recurring issue (and for good reasons) so I think it wouldn't hurt to talk about how to eventually add other languages. I think it would be great to have a recipe that explains all the requirements for adding other languages to the project.

People whose interest lies more in, for instance, Italian (you're not the first one to mention Italian!) could then already do some preparatory work. By the time Italian truly gets added, some of the work will already have been done!

Things that should be in this recipe for sure would be:

- getting a list of words that includes phonetic representations of the words, like the VoxForge dictionary;
- getting texts that are free from copyright that could serve as reading material;
- translations of (parts of) the website.

It all depends on your personal skills to figure out where to start. Some work might already have been done, for instance at a university where they do research on phonetics. So it's wise not to start on a word list immediately, but to search for an existing one first!

There is also a lot of info on the VoxForge website (especially in the dev section).

Obviously, officially adding another language is in the end a decision for Ken (the project founder), since it requires a lot of work in the background!

Robin

--- (Edited on 6/5/2007 4:56 am [GMT-0500] by Robin) ---

Re: Italian Language

User: Visitor
Date: 6/17/2007 2:00 pm


tks!

Grazie! ;)

Re: Italian Language

User: Manuel
Date: 7/22/2007 7:41 am


Hi, I'm Italian too. Reading the tutorial on making my own acoustic model, I don't understand how I can create a statistical representation of phonemes.

It's clear how to make the grammar file and do the other tutorial steps, but not how to create the acoustic model.

I would like to create a simple acoustic model for Italian words. Is that possible?

I'm a programmer studying at the University of Bologna. I'm preparing my degree thesis about speech recognition, and I have to make something that works on Italian words.

Tks

Manuel

Re: Italian Language

User: nsh
Date: 7/22/2007 7:58 am


Well, start with the simple things. First of all you need a large collection of Italian texts (100 MB, for example). Do you have such a big collection? If yes, you can proceed further; otherwise you can just concentrate on that.

Re: Italian Language

User: Manuel
Date: 7/22/2007 11:49 am


Hi, I'm Italian too. I'm studying at the University of Bologna, and I'm a programmer. I'm interested in making an acoustic model with Italian phonemes, but after reading the tutorial I don't understand how to make it.

The problem is how to create a statistical representation of phonemes.

Tks

Re: Italian Language

User: kmaclean
Date: 7/22/2007 3:12 pm


Hi Manuel,

>I don't understand how can I create statistical representation of phonemes.

The HTK toolkit lets you train your hmm-based phonemes automatically - but you need transcribed speech for this to work.

Steps:

You need to create a phone list for Italian,
then generate a pronunciation dictionary entry for every word in your transcribed speech files, and
then train your (monophone) hmm-based acoustic models.

In English, the steps look like this:

Create a phone list.

(in the VoxForge tutorial, we actually skipped this step because all the required phones are already included in the pronunciation dictionary)

The VoxForge phone set (which actually originated from the CMU phone set) is as follows:

        Phoneme Example Translation
        ------- ------- -----------
        AA      odd     AA D
        AE      at      AE T
        AH      hut     HH AH T
        AO      ought   AO T
        AW      cow     K AW
        AY      hide    HH AY D
        B       be      B IY
        CH      cheese  CH IY Z
        D       dee     D IY
        DH      thee    DH IY
        EH      Ed      EH D
        ER      hurt    HH ER T
        EY      ate     EY T
        F       fee     F IY
        G       green   G R IY N
        HH      he      HH IY
        IH      it      IH T
        IY      eat     IY T
        JH      gee     JH IY
        K       key     K IY
        L       lee     L IY
        M       me      M IY
        N       knee    N IY
        NG      ping    P IH NG
        OW      oat     OW T
        OY      toy     T OY
        P       pee     P IY
        R       read    R IY D
        S       sea     S IY
        SH      she     SH IY
        T       tea     T IY
        TH      theta   TH EY T AH
        UH      hood    HH UH D
        UW      two     T UW
        V       vee     V IY
        W       we      W IY
        Y       yield   Y IY L D
        Z       zee     Z IY
        ZH      seizure S IY ZH ER

So you need to create a similar phone list for Italian (the IPA web site can help in this regard, or maybe another speech recognition project in Italian).

Create a pronunciation dictionary

For each word in your training set (i.e. the sentences you used to prompt your users who submitted speech for your speech corpus) you need its pronunciation using phonemes. Here is a portion of the VoxForge pronunciation dictionary:

AARP            [AARP]          ey ey aa r p iy
ABA             [ABA]           ey b iy ey
ABACK           [ABACK]         ax b ae k
ABACUS          [ABACUS]        ae b ax k ax s
ABALON          [ABALON]        ae b ax l aa n
ABALONE         [ABALONE]       ae b ax l ow n iy
ABANDON         [ABANDON]       ax b ae n d ih n
ABANDONED       [ABANDONED]     ax b ae n d ih n d
ABANDONING      [ABANDONING]    ax b ae n d ih n ih ng
ABBREVIATED     [ABBREVIATED]   ax b r iy v iy ey t ih d
ABBREVIATION    [ABBREVIATION]  ax b r iy v iy ey sh ih n
ABBY            [ABBY]          ae b iy
ABC             [ABC]           ey b iy s iy
ABC'S           [ABC'S]         ey b iy s iy z
ABCS            [ABCS]          ey b iy s iy z
ABDOMINALS      [ABDOMINALS]    ae b d aa m ih n ax l z
ABDUCTING       [ABDUCTING]     ae b d ah k t ih ng
ABDUCTION       [ABDUCTION]     ae b d ah k sh ih n

Note that the words are in upper case, the returned word (the output symbol in brackets) is also in upper case, and the phones are in lower case.

You need to do the same in Italian, for each word in your training set.
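As a rough illustration of the first two steps, here is a minimal Python sketch (not part of the VoxForge tutorial) that reads dictionary lines in the `WORD [WORD] phones` layout shown above, derives the phone list from them, and reports prompt words that still lack a pronunciation. The function names, the tokenising rule, and the three Italian entries are assumptions made up for this example; the phone symbols in particular are only illustrative and not a vetted Italian phone set.

```python
import re


def parse_dictionary(lines):
    """Parse 'WORD [OUTPUT] p1 p2 ...' lines into {word: [phones]}."""
    entries = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        word, phones = fields[0], fields[2:]  # fields[1] is the bracketed output symbol
        entries.setdefault(word, phones)  # keep only the first pronunciation
    return entries


def phone_set(entries):
    """Return the sorted list of phones used anywhere in the dictionary."""
    return sorted({p for phones in entries.values() for p in phones})


def missing_words(prompts, entries):
    """Return upper-cased prompt words that have no dictionary entry."""
    words = set()
    for prompt in prompts:
        # Assumed tokenising rule: every run of letters is one word.
        words.update(re.findall(r"[^\W\d_]+", prompt.upper()))
    return sorted(words - set(entries))


# Hypothetical Italian entries; the phone symbols are illustrative only.
SAMPLE_DICT = [
    "CASA            [CASA]          k a s a",
    "CIAO            [CIAO]          ch a o",
    "GATTO           [GATTO]         g a t t o",
]

entries = parse_dictionary(SAMPLE_DICT)
print("phones:", phone_set(entries))
print("missing:", missing_words(["Ciao, la casa!"], entries))
```

Running a check like this over all your prompts before training tells you which words still need a dictionary entry.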

Train your Acoustic Model.

In this context, this means that you use the HTK toolkit to generate statistical representations for each phone, based on the words in your training set. In English, your hmms would look something like this:

~h "b"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 25
 -9.124349e-01 6.825594e+00 4.190366e+00 6.915018e+00 6.278219e+00 6.211351e+00 6.080202e+00 8.280239e-01 7.751886e-01 1.188034e-01 -2.286278e+00 -2.037417e+00 -5.154014e-02 -1.411842e-01 1.359426e-01 7.536004e-02 1.828612e-02 1.083132e-01 8.064213e-02 6.554011e-02 5.534951e-03 -3.300069e-02 -1.040055e-02 1.726186e-01 1.074358e-01
<VARIANCE> 25
 6.946013e+00 9.476726e+00 6.426389e+00 8.900808e+00 8.562872e+00 5.247358e+00 8.789542e+00 9.086433e+00 9.272338e+00 1.021655e+01 8.668521e+00 1.017453e+01 9.018427e-01 1.225605e+00 1.132353e+00 1.225746e+00 1.055387e+00 9.162133e-01 9.871734e-01 1.061771e+00 1.182593e+00 1.325286e+00 1.340984e+00 9.980333e-01 5.850468e-01
<GCONST> 7.204273e+01
<STATE> 3
<MEAN> 25
 1.670979e+00 2.505412e+00 3.361752e+00 2.959995e+00 2.192761e+00 2.234684e+00 4.598285e-01 6.712853e-02 -7.422704e-01 -1.477473e+00 -1.300686e+00 -8.829353e-01 2.932750e+00 -1.085336e+00 1.465379e-01 -1.024826e+00 -9.668781e-01 -2.956798e+00 -3.674928e+00 -6.180806e-01 -1.165014e+00 -1.551422e+00 1.459589e-01 -1.145165e-02 3.425349e+00
<VARIANCE> 25
 2.775954e+01 2.442891e+01 9.882823e+00 2.289949e+01 2.621673e+01 3.309447e+01 4.353169e+01 1.994825e+01 2.369977e+01 2.078222e+01 1.078901e+01 1.184826e+01 1.814732e+00 4.001577e+00 2.052232e+00 3.576971e+00 5.154440e+00 6.247412e+00 4.224275e+00 3.561308e+00 4.634731e+00 1.263823e+00 2.618247e+00 2.138073e+00 1.512457e+00
<GCONST> 9.653378e+01
<STATE> 4
<MEAN> 25
 1.058882e+01 1.385496e+00 8.322063e-01 1.207590e+00 1.215214e+00 -7.297173e+00 -8.178091e+00 2.753822e-01 -3.762378e+00 -6.590958e+00 -1.468036e+00 -2.938320e+00 2.796497e-01 -2.095785e-01 -1.001576e-01 1.865974e-02 -5.384719e-02 -6.179357e-01 -4.035245e-01 4.215330e-02 -2.601456e-01 -1.829550e-01 -2.622822e-02 -2.242988e-01 2.178501e-01
<VARIANCE> 25
 1.652969e+01 4.435868e+01 1.719629e+01 6.380357e+01 7.536614e+01 6.076683e+01 5.961767e+01 3.608961e+01 4.442945e+01 1.993280e+01 4.157676e+01 2.804121e+01 2.284771e+00 2.194077e+00 1.651372e+00 2.075975e+00 2.312554e+00 5.300534e+00 3.836717e+00 2.152288e+00 2.561902e+00 1.781796e+00 2.014969e+00 1.707738e+00 2.076164e+00
<GCONST> 1.004418e+02
<TRANSP> 5
 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
 0.000000e+00 8.082747e-01 1.917253e-01 0.000000e+00 0.000000e+00
 0.000000e+00 0.000000e+00 6.367275e-01 3.632726e-01 0.000000e+00
 0.000000e+00 0.000000e+00 0.000000e+00 7.520868e-01 2.479133e-01
 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
<ENDHMM>
~h "d"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 25
 -1.067137e+00 4.886644e+00 2.682094e+00 6.750027e+00 6.457639e+00 6.229094e+00 5.297256e+00 -2.129066e-01 3.815716e-01 -7.126016e-01 -2.884563e+00 -1.832386e+00 7.712548e-03 -7.304223e-01 2.831668e-01 -3.501370e-01 -7.342540e-01 -2.799944e-01 4.564904e-02 2.276214e-01 1.384630e-01 4.671212e-02 -1.844966e-01 -2.142331e-01 7.479197e-01
<VARIANCE> 25
 8.934610e+00 1.320769e+01 1.053300e+01 1.804137e+01 1.219705e+01 1.129104e+01 1.721161e+01 1.467160e+01 1.430175e+01 1.481090e+01 1.149455e+01 9.083491e+00 1.831863e+00 3.232245e+00 1.539536e+00 1.744226e+00 2.540962e+00 2.710148e+00 2.181852e+00 2.404683e+00 2.769586e+00 1.280586e+00 1.451528e+00 1.790569e+00 3.939657e+00
<GCONST> 8.637133e+01
<STATE> 3
<MEAN> 25
 2.718689e+00 -2.744554e+00 7.256757e-02 1.812361e+00 1.016949e-01 -2.560019e-01 -1.885446e+00 -4.865013e+00 -4.525404e+00 -2.596621e+00 -1.807474e+00 -1.480970e+00 1.222863e+00 -5.446100e-01 5.466800e-01 -1.001800e+00 -7.867664e-01 -1.223161e+00 -2.112964e+00 -1.139215e+00 -1.483523e+00 -8.174815e-01 -1.465670e-01 -4.309444e-01 2.095388e+00
<VARIANCE> 25
 2.393951e+01 3.441933e+01 2.272727e+01 3.303073e+01 2.547261e+01 2.950558e+01 4.406739e+01 4.921661e+01 6.163840e+01 3.156588e+01 1.728885e+01 2.407177e+01 3.362633e+00 5.228514e+00 3.342825e+00 3.542599e+00 4.699482e+00 3.152497e+00 5.631856e+00 5.698840e+00 4.839462e+00 2.097089e+00 1.823990e+00 1.847656e+00 7.886878e+00
<GCONST> 1.042911e+02
<STATE> 4
<MEAN> 25
 3.030438e+00 -2.106693e+00 2.608706e+00 8.074592e-02 8.320825e-01 -8.720042e-01 -4.455779e+00 -3.824380e+00 -3.882696e+00 -1.690570e+00 -1.894887e+00 -2.615440e+00 2.946242e-01 3.876723e-02 4.528299e-01 -6.694716e-01 5.406591e-01 -8.197967e-01 -1.044559e+00 9.537272e-01 -1.756284e-01 -9.122517e-02 9.268219e-01 3.083803e-01 1.007540e+00
<VARIANCE> 25
 4.675860e+01 3.011730e+01 3.514589e+01 5.922066e+01 4.235344e+01 2.218645e+01 5.816761e+01 3.788612e+01 2.974471e+01 1.639678e+01 1.083809e+01 2.301572e+01 3.725070e+00 4.032299e+00 4.137799e+00 4.301898e+00 5.162062e+00 4.180051e+00 9.591230e+00 6.350677e+00 9.550439e+00 4.142642e+00 2.124282e+00 3.202255e+00 4.503259e+00
<GCONST> 1.069599e+02
<TRANSP> 5
 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
 0.000000e+00 6.490452e-01 3.509548e-01 0.000000e+00 0.000000e+00
 0.000000e+00 0.000000e+00 5.191061e-01 4.808940e-01 0.000000e+00
 0.000000e+00 0.000000e+00 0.000000e+00 2.762414e-01 7.237586e-01
 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
<ENDHMM>
[...]

Note that each line starting with "~h" represents the start of a statistical description of a hmm for a particular phone.

You can do this for Italian, and for most other languages.
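If you want to sanity-check a trained model file, a small script can list the phones it contains and verify the transition matrices. The following is only a sketch in Python, assuming the ASCII layout shown in the listing above (a `~h "name"` line followed later by `<TRANSP> N` and N rows of numbers); the function names are made up for this example, and the sample text keeps only the transition matrix of "b" from the listing, with the state definitions left out for brevity.

```python
import re


def read_transitions(mmf_text):
    """Map each ~h model name to its <TRANSP> matrix (a list of float rows)."""
    matrices = {}
    name, rows_left = None, 0
    for raw in mmf_text.splitlines():
        line = raw.strip()
        header = re.match(r'~h\s+"([^"]+)"', line)
        if header:
            name = header.group(1)
        elif line.upper().startswith("<TRANSP>") and name is not None:
            rows_left = int(line.split()[1])  # number of matrix rows to read
            matrices[name] = []
        elif rows_left > 0:
            matrices[name].append([float(x) for x in line.split()])
            rows_left -= 1
    return matrices


def bad_rows(matrix, tol=1e-4):
    """Indices of rows that do not sum to 1, ignoring the all-zero final row."""
    return [i for i, row in enumerate(matrix[:-1]) if abs(sum(row) - 1.0) > tol]


# Transition matrix of "b" copied from the listing above; states omitted.
SAMPLE_MMF = '''~h "b"
<BEGINHMM>
<NUMSTATES> 5
<TRANSP> 5
 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
 0.000000e+00 8.082747e-01 1.917253e-01 0.000000e+00 0.000000e+00
 0.000000e+00 0.000000e+00 6.367275e-01 3.632726e-01 0.000000e+00
 0.000000e+00 0.000000e+00 0.000000e+00 7.520868e-01 2.479133e-01
 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
<ENDHMM>
'''

matrices = read_transitions(SAMPLE_MMF)
for phone, matrix in matrices.items():
    print(phone, "rows not summing to 1:", bad_rows(matrix))
```

Each row holds the probabilities of moving from one state to the others, which is why every row except the last should sum to 1.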

Summary

to create an Italian acoustic model:

- create an Italian phone set,

- create an Italian pronunciation dictionary for the words in your training set,

- generate acoustic models using the process described in the VoxForge Tutorial.

This will allow you to create monophone acoustic models (up to step 8).

To create tied-state triphone acoustic models, you will need to create 'questions' (see the tree.hed script in step 10). I just used the one included with the HTK toolkit, and am not familiar with creating one for another language.

Hope this helps,

Ken

Re: Italian Language

User: nsh
Date: 7/22/2007 3:26 pm


And, btw, an Italian phone set and dictionary are both available from the Italian Festival project:

http://www.pd.istc.cnr.it/TTS/It-FESTIVAL.htm

Of course they are synthesis-oriented, but for a beginning it's not a big deal.

Re: Italian Language

User: Manuel
Date: 7/30/2007 8:24 am


>First of all you need a large collection of Italian texts (100 Mb for example). Do you have such a big collection?

What exactly do you mean by Italian texts?

Can I find them on the Festival project's web site?

Thanks

Manuel

Re: Italian Language

User: nsh
Date: 7/30/2007 9:47 am


Texts are just texts: books, newspapers and so on. In theory they should be free, but copyrighted texts are also acceptable. They are required to build the language model, which is needed only for decoding, not for training.

Once you have the texts, put them somewhere so I can download them.

To be honest, it seems easier to me to train a Sphinx model than an HTK one; Ken will probably correct me. So if you install sphinx3 and SphinxTrain, I can help you with the Italian setup. We have a dictionary and a phone set. You just need to record a small text (say, 200 utterances in wav files). Then we'll build the acoustic model.
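To give an idea of why a large text collection matters for the language model, here is a minimal Python sketch of unsmoothed bigram estimation. It is not what Sphinx or HTK actually runs (real language-model tools add smoothing and back-off); the function names and the two sample sentences are made up for this example.

```python
import re
from collections import Counter


def bigram_counts(sentences):
    """Count history words and word pairs, with <s> and </s> sentence markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + re.findall(r"[^\W\d_]+", sentence.lower()) + ["</s>"]
        histories.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams


def bigram_prob(prev, word, histories, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]


# Two made-up sentences standing in for a real text collection.
histories, bigrams = bigram_counts(["il gatto dorme", "il cane dorme"])
print(bigram_prob("il", "gatto", histories, bigrams))
print(bigram_prob("gatto", "dorme", histories, bigrams))
```

With only a handful of sentences, most word pairs are never seen and get probability zero, which is the practical reason for collecting a large body of text.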
