Italian Language
User: topocheparla
Date: 6/4/2007 7:51 am
Views: 40113
Rating: 50

Hi, I'm interested in helping VoxForge, but my English is poor...

Is there an Italian section with instructions?


I hope VoxForge will be a great project.

--- (Edited on 6/4/2007 7:51 am [GMT-0500] by topocheparla) ---

Re: Italian Language
User: Robin
Date: 6/5/2007 4:56 am
Views: 590
Rating: 29



To answer your question first: no, there's no Italian section yet. VoxForge was started in the English language and is still quite young.


However, this seems to be a recurring issue (and for good reasons), so I think it wouldn't hurt to talk about how to eventually add other languages. I think it would be great to have a recipe that explains all the requirements for adding other languages to the project.


People whose interest lies more in, for instance, Italian (you're not the first one to mention Italian!) could then already do some preparatory work. By the time Italian truly gets added, some of the work will already have been done!


Things that should be in this recipe for sure would be:

  • getting a list of words that includes phonetic representations of the words like the VoxForge dictionary.
  • getting texts that are free from copyright that could serve as reading material.
  • translations of (parts of) the website.

It all depends on your personal skills to figure out where to start. Some work might already have been done at, for instance, a university where they do research on phonetics. So it's wise not to start on a word list immediately, but to search for an existing one first!


There is also a lot of info on the VoxForge website (esp. in the dev section).


Obviously, officially adding another language is in the end a decision for Ken (the project founder), since it requires a lot of work in the background!



--- (Edited on 6/5/2007 4:56 am [GMT-0500] by Robin) ---

Re: Italian Language
User: Visitor
Date: 6/17/2007 2:00 pm
Views: 457
Rating: 32


Thanks! ;)

Re: Italian Language
User: Manuel
Date: 7/22/2007 7:41 am
Views: 407
Rating: 40

Hi, I'm Italian too. Reading the tutorial on making my own acoustic model, I don't understand how I can create a statistical representation of phonemes.

It's clear how to make the grammar file and do the other tutorial steps, but not how to create the acoustic model.

I would like to create a simple acoustic model for Italian words; is that possible?

I'm a programmer studying at the University of Bologna, and I'm preparing my degree thesis on speech recognition, so I have to make something that works on Italian words.



Re: Italian Language
User: nsh
Date: 7/22/2007 7:58 am
Views: 490
Rating: 38

Well, start with the simple things. First of all, you need a large collection of Italian texts (100 MB, for example). Do you have such a big collection? If yes, you can proceed further; otherwise, you can just concentrate on that.

Re: Italian Language
User: Manuel
Date: 7/22/2007 11:49 am
Views: 402
Rating: 48

Hi, I'm Italian too. I'm studying at the University of Bologna, and I'm a programmer. I'm interested in making an acoustic model with Italian phonemes, but from reading the tutorial I don't understand how to make it.

The problem is how to create a statistical representation of phonemes.




Re: Italian Language
User: kmaclean
Date: 7/22/2007 3:12 pm
Views: 414
Rating: 43

Hi Manuel,

>I don't understand how I can create a statistical representation of phonemes.

The HTK toolkit lets you train your hmm-based phoneme models automatically, but you need transcribed speech for this to work.


  • You need to create a phone list for Italian,
  • then generate a pronunciation dictionary entry for every word in your transcribed speech files, and
  • then train your (monophone) hmm-based acoustic models.

In English, the steps look like this:

Create a phone list. 

(In the VoxForge tutorial, we actually skipped this step because all the required phones are already included in the pronunciation dictionary.)
The VoxForge phone set (which actually originated from the CMU phone set) is as follows:
Phoneme Example Translation
------- ------- -----------
AA odd AA D
AE at AE T
AH hut HH AH T
AO ought AO T
AW cow K AW
AY hide HH AY D
B be B IY
CH cheese CH IY Z
D dee D IY
DH thee DH IY
EH Ed EH D
ER hurt HH ER T
EY ate EY T
F fee F IY
G green G R IY N
HH he HH IY
IH it IH T
IY eat IY T
JH gee JH IY
K key K IY
L lee L IY
M me M IY
N knee N IY
NG ping P IH NG
OW oat OW T
OY toy T OY
P pee P IY
R read R IY D
S sea S IY
SH she SH IY
T tea T IY
TH theta TH EY T AH
UH hood HH UH D
UW two T UW
V vee V IY
W we W IY
Y yield Y IY L D
Z zee Z IY
ZH seizure S IY ZH ER

So you need to create a similar phone list for Italian (the IPA web site can help in this regard, or maybe another speech recognition project in Italian).

Create a pronunciation dictionary

For each word in your training set (i.e. the sentences you used to prompt your users who submitted speech for your speech corpus) you need its pronunciation using phonemes.  Here is a portion of the VoxForge pronunciation dictionary:

AARP [AARP] ey ey aa r p iy
ABA [ABA] ey b iy ey
ABACK [ABACK] ax b ae k
ABACUS [ABACUS] ae b ax k ax s
ABALON [ABALON] ae b ax l aa n
ABALONE [ABALONE] ae b ax l ow n iy
ABANDON [ABANDON] ax b ae n d ih n
ABANDONED [ABANDONED] ax b ae n d ih n d
ABANDONING [ABANDONING] ax b ae n d ih n ih ng
ABBREVIATED [ABBREVIATED] ax b r iy v iy ey t ih d
ABBREVIATION [ABBREVIATION] ax b r iy v iy ey sh ih n
ABBY [ABBY] ae b iy
ABC [ABC] ey b iy s iy
ABC'S [ABC'S] ey b iy s iy z
ABCS [ABCS] iy b iy s iy z
ABDOMINALS [ABDOMINALS] ae b d aa m ih n ax l z
ABDUCTING [ABDUCTING] ae b d ah k t ih ng
ABDUCTION [ABDUCTION] ae b d ah k sh ih n

Note that each word is in upper case, the return word (the text the recognizer outputs for that entry) is also in upper case and in brackets, and the phones are in lower case.

You need to do the same in Italian, for each word in your training set.
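
As a rough illustration of this step, here is a minimal Python sketch that reads entries in the "WORD [WORD] phones" layout shown above, lists the prompt words that still lack a pronunciation, and collects the phones that are actually used. The function names and the sample prompts are made up for this example; they are not part of HTK or the VoxForge scripts.

```python
def parse_dictionary(lines):
    """Map each WORD to its phone list, for lines like 'ABACK [ABACK] ax b ae k'."""
    entries = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        word, phones = fields[0], fields[2:]  # fields[1] is the bracketed return word
        entries[word] = phones
    return entries


def missing_words(prompts, entries):
    """Return prompt words (upper-cased) that have no dictionary entry yet."""
    words = {w.upper() for prompt in prompts for w in prompt.split()}
    return sorted(words - set(entries))


def phone_list(entries):
    """Return the sorted set of phones used anywhere in the dictionary."""
    return sorted({p for phones in entries.values() for p in phones})


# Entries copied from the excerpt above; the prompts are hypothetical.
dict_lines = [
    "ABACK [ABACK] ax b ae k",
    "ABACUS [ABACUS] ae b ax k ax s",
    "ABANDON [ABANDON] ax b ae n d ih n",
]
prompts = ["abandon abacus", "aback abby"]

entries = parse_dictionary(dict_lines)
print(missing_words(prompts, entries))  # words that still need a pronunciation
print(phone_list(entries))              # phones your phone list must contain
```

The same check should work unchanged for an Italian word list, as long as the dictionary keeps the same three-column layout.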

Train your Acoustic Model. 

In this context, this means that you use the HTK toolkit to generate a statistical representation of each phone, based on the transcribed speech in your training set. In English, your hmms would look something like this:
~h "b"
<MEAN> 25
-9.124349e-01 6.825594e+00 4.190366e+00 6.915018e+00 6.278219e+00 6.211351e+00 6.080202e+00 8.280239e-01 7.751886e-01 1.188034e-01 -2.286278e+00 -2.037417e+00 -5.154014e-02 -1.411842e-01 1.359426e-01 7.536004e-02 1.828612e-02 1.083132e-01 8.064213e-02 6.554011e-02 5.534951e-03 -3.300069e-02 -1.040055e-02 1.726186e-01 1.074358e-01
6.946013e+00 9.476726e+00 6.426389e+00 8.900808e+00 8.562872e+00 5.247358e+00 8.789542e+00 9.086433e+00 9.272338e+00 1.021655e+01 8.668521e+00 1.017453e+01 9.018427e-01 1.225605e+00 1.132353e+00 1.225746e+00 1.055387e+00 9.162133e-01 9.871734e-01 1.061771e+00 1.182593e+00 1.325286e+00 1.340984e+00 9.980333e-01 5.850468e-01
<GCONST> 7.204273e+01
<MEAN> 25
1.670979e+00 2.505412e+00 3.361752e+00 2.959995e+00 2.192761e+00 2.234684e+00 4.598285e-01 6.712853e-02 -7.422704e-01 -1.477473e+00 -1.300686e+00 -8.829353e-01 2.932750e+00 -1.085336e+00 1.465379e-01 -1.024826e+00 -9.668781e-01 -2.956798e+00 -3.674928e+00 -6.180806e-01 -1.165014e+00 -1.551422e+00 1.459589e-01 -1.145165e-02 3.425349e+00
2.775954e+01 2.442891e+01 9.882823e+00 2.289949e+01 2.621673e+01 3.309447e+01 4.353169e+01 1.994825e+01 2.369977e+01 2.078222e+01 1.078901e+01 1.184826e+01 1.814732e+00 4.001577e+00 2.052232e+00 3.576971e+00 5.154440e+00 6.247412e+00 4.224275e+00 3.561308e+00 4.634731e+00 1.263823e+00 2.618247e+00 2.138073e+00 1.512457e+00
<GCONST> 9.653378e+01
<MEAN> 25
1.058882e+01 1.385496e+00 8.322063e-01 1.207590e+00 1.215214e+00 -7.297173e+00 -8.178091e+00 2.753822e-01 -3.762378e+00 -6.590958e+00 -1.468036e+00 -2.938320e+00 2.796497e-01 -2.095785e-01 -1.001576e-01 1.865974e-02 -5.384719e-02 -6.179357e-01 -4.035245e-01 4.215330e-02 -2.601456e-01 -1.829550e-01 -2.622822e-02 -2.242988e-01 2.178501e-01
1.652969e+01 4.435868e+01 1.719629e+01 6.380357e+01 7.536614e+01 6.076683e+01 5.961767e+01 3.608961e+01 4.442945e+01 1.993280e+01 4.157676e+01 2.804121e+01 2.284771e+00 2.194077e+00 1.651372e+00 2.075975e+00 2.312554e+00 5.300534e+00 3.836717e+00 2.152288e+00 2.561902e+00 1.781796e+00 2.014969e+00 1.707738e+00 2.076164e+00
<GCONST> 1.004418e+02
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 8.082747e-01 1.917253e-01 0.000000e+00 0.000000e+00
0.000000e+00 0.000000e+00 6.367275e-01 3.632726e-01 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00 7.520868e-01 2.479133e-01
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
~h "d"
<MEAN> 25
-1.067137e+00 4.886644e+00 2.682094e+00 6.750027e+00 6.457639e+00 6.229094e+00 5.297256e+00 -2.129066e-01 3.815716e-01 -7.126016e-01 -2.884563e+00 -1.832386e+00 7.712548e-03 -7.304223e-01 2.831668e-01 -3.501370e-01 -7.342540e-01 -2.799944e-01 4.564904e-02 2.276214e-01 1.384630e-01 4.671212e-02 -1.844966e-01 -2.142331e-01 7.479197e-01
8.934610e+00 1.320769e+01 1.053300e+01 1.804137e+01 1.219705e+01 1.129104e+01 1.721161e+01 1.467160e+01 1.430175e+01 1.481090e+01 1.149455e+01 9.083491e+00 1.831863e+00 3.232245e+00 1.539536e+00 1.744226e+00 2.540962e+00 2.710148e+00 2.181852e+00 2.404683e+00 2.769586e+00 1.280586e+00 1.451528e+00 1.790569e+00 3.939657e+00
<GCONST> 8.637133e+01
<MEAN> 25
2.718689e+00 -2.744554e+00 7.256757e-02 1.812361e+00 1.016949e-01 -2.560019e-01 -1.885446e+00 -4.865013e+00 -4.525404e+00 -2.596621e+00 -1.807474e+00 -1.480970e+00 1.222863e+00 -5.446100e-01 5.466800e-01 -1.001800e+00 -7.867664e-01 -1.223161e+00 -2.112964e+00 -1.139215e+00 -1.483523e+00 -8.174815e-01 -1.465670e-01 -4.309444e-01 2.095388e+00
2.393951e+01 3.441933e+01 2.272727e+01 3.303073e+01 2.547261e+01 2.950558e+01 4.406739e+01 4.921661e+01 6.163840e+01 3.156588e+01 1.728885e+01 2.407177e+01 3.362633e+00 5.228514e+00 3.342825e+00 3.542599e+00 4.699482e+00 3.152497e+00 5.631856e+00 5.698840e+00 4.839462e+00 2.097089e+00 1.823990e+00 1.847656e+00 7.886878e+00
<GCONST> 1.042911e+02
<MEAN> 25
3.030438e+00 -2.106693e+00 2.608706e+00 8.074592e-02 8.320825e-01 -8.720042e-01 -4.455779e+00 -3.824380e+00 -3.882696e+00 -1.690570e+00 -1.894887e+00 -2.615440e+00 2.946242e-01 3.876723e-02 4.528299e-01 -6.694716e-01 5.406591e-01 -8.197967e-01 -1.044559e+00 9.537272e-01 -1.756284e-01 -9.122517e-02 9.268219e-01 3.083803e-01 1.007540e+00
4.675860e+01 3.011730e+01 3.514589e+01 5.922066e+01 4.235344e+01 2.218645e+01 5.816761e+01 3.788612e+01 2.974471e+01 1.639678e+01 1.083809e+01 2.301572e+01 3.725070e+00 4.032299e+00 4.137799e+00 4.301898e+00 5.162062e+00 4.180051e+00 9.591230e+00 6.350677e+00 9.550439e+00 4.142642e+00 2.124282e+00 3.202255e+00 4.503259e+00
<GCONST> 1.069599e+02
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 6.490452e-01 3.509548e-01 0.000000e+00 0.000000e+00
0.000000e+00 0.000000e+00 5.191061e-01 4.808940e-01 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00 2.762414e-01 7.237586e-01
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00

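Note that each line starting with "~h" marks the start of the statistical description of the hmm for a particular phone. For each of its three emitting states there is a 25-value mean vector, followed by a 25-value variance vector and a <GCONST> value; the final 5x5 block is the state transition matrix.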
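
To make these numbers a little less opaque, here is a small Python sketch of two sanity checks on the "b" model above. It assumes diagonal-covariance Gaussians, in which case <GCONST> is n*ln(2*pi) plus the sum of the log variances, and it uses the standard Markov-chain result that a state with self-loop probability a is occupied for 1/(1-a) frames on average. The function names are made up for this example; the values are copied from the first state of "b" in the listing.

```python
import math


def gconst(variances):
    """n*ln(2*pi) + sum(ln(variance)) for a diagonal-covariance Gaussian."""
    n = len(variances)
    return n * math.log(2 * math.pi) + sum(math.log(v) for v in variances)


def expected_frames(self_loop_probability):
    """Average number of frames spent in a state before leaving it."""
    return 1.0 / (1.0 - self_loop_probability)


# Variance vector of the first emitting state of "b" (second line after <MEAN> 25).
b_variances = [
    6.946013e+00, 9.476726e+00, 6.426389e+00, 8.900808e+00, 8.562872e+00,
    5.247358e+00, 8.789542e+00, 9.086433e+00, 9.272338e+00, 1.021655e+01,
    8.668521e+00, 1.017453e+01, 9.018427e-01, 1.225605e+00, 1.132353e+00,
    1.225746e+00, 1.055387e+00, 9.162133e-01, 9.871734e-01, 1.061771e+00,
    1.182593e+00, 1.325286e+00, 1.340984e+00, 9.980333e-01, 5.850468e-01,
]

# Should come out close to the <GCONST> 7.204273e+01 listed for that state.
print(gconst(b_variances))

# Self-loop probability of that state, from the second row of the transition matrix.
print(expected_frames(8.082747e-01))
```

If the recomputed value is far from the stored <GCONST>, the variance line was probably copied or parsed incorrectly.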

You can do this for Italian, and for most other languages.



To summarize, to create an Italian acoustic model:

- create an Italian phone set,
- create an Italian pronunciation dictionary for the words in your training set,
- generate acoustic models using the process described in the VoxForge Tutorial.

This will allow you to create monophone acoustic models (up to step 8). 

To create tied-state triphone acoustic models, you will need to create 'questions' (see the tree.hed script in step 10).  I just used the one included with the HTK toolkit, and am not familiar with creating one for another language.

Hope this helps,



Re: Italian Language
User: nsh
Date: 7/22/2007 3:26 pm
Views: 383
Rating: 38

And, btw, an Italian phoneset and dictionary are both available from the Italian Festival project:

Of course they are synthesis-oriented, but for a start it's not a big deal.


Re: Italian Language
User: Manuel
Date: 7/30/2007 8:24 am
Views: 430
Rating: 47

>First of all, you need a large collection of Italian texts (100 MB, for example). Do you have such a big collection?

What exactly do you mean by Italian texts?

Can I find them on the Festival project's web site?



Re: Italian Language
User: nsh
Date: 7/30/2007 9:47 am
Views: 424
Rating: 38

Texts are just texts: books, newspapers and so on. In theory they should be free, but copyrighted texts are also acceptable. They are required to build a language model, but that is only needed for decoding, not for training.

Once you have the texts, put them somewhere so I can download them.

To be honest, to me it seems easier to train a Sphinx model than an HTK one; Ken will probably correct me. So if you install sphinx3 and SphinxTrain, I can help you with the Italian setup. We have a dictionary and a phoneset. You just need to record a small text (say, 200 utterances in wav files). Then we'll build the acoustic model.
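
To give a rough idea of why the texts matter for the language model, here is a toy Python sketch that estimates bigram probabilities by simple counting. It is only an illustration: real language-model tools add smoothing and work on far more text, and the function name and the three sample sentences are made up for this example.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(next word | word) by relative frequency, with no smoothing."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(words[:-1])              # every word that has a successor
        bigram_counts.update(zip(words[:-1], words[1:]))
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Hypothetical miniature corpus; a usable model needs far more text than this.
corpus = ["il gatto dorme", "il cane dorme", "il gatto mangia"]
probs = bigram_probabilities(corpus)
print(probs[("il", "gatto")])     # "gatto" follows "il" in 2 of 3 sentences
print(probs[("gatto", "dorme")])  # "dorme" follows "gatto" in 1 of 2 cases
```

With only three sentences most word pairs are never seen at all, which is why a large text collection is needed before decoding works well.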