Re: Italian Language
User: nsh
Date: 8/1/2007 11:11 am
Views: 373
Rating: 30

Hm, what version are you using exactly? I prefer the latest one from the nightly build or from an svn checkout.

Where did you get that? It's not part of the archive. Actually, testing the model is not so easy right now, partly because of insufficient data and partly because there is no language model yet, since we have no text collection.

It's actually possible to test the acoustic model with a finite-state grammar, but it's much better to build a language model first.


Re: Italian Language
User: nsh
Date: 8/1/2007 11:58 am
Views: 413
Rating: 29

Btw, I've found a site where you can download texts:

Re: Italian Language
User: nsh
Date: 8/2/2007 4:09 pm
Views: 373
Rating: 31

Well, I've updated the scripts with a language model and updated the prompts to cover the full phoneset. Now you can test recognition; it even has a non-zero WER, but of course quality is _very_ poor.

New files are available at:

It probably makes sense to commit them to VoxForge and start updating the audio. By the way, we can create a similar bootstrap structure for Spanish, French, Welsh, Dutch, Polish and Czech.
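For readers following along: word error rate (WER) is the word-level edit distance between the reference transcription and the recognizer output (substitutions + deletions + insertions), divided by the number of words in the reference. The following is a minimal Python sketch of that definition, not the scoring tool shipped with Sphinx; the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription is empty")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One deleted word out of five reference words gives a WER of 0.2.
print(word_error_rate("il gatto dorme sul divano", "il gatto dorme divano"))
```

A WER of 0 means a perfect transcription; values can exceed 1.0 when the recognizer inserts many extra words.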

Re: Italian Language
User: kmaclean
Date: 8/3/2007 4:22 pm
Views: 431
Rating: 30

hi nsh,

thanks, this is great!

I will look at how we should set up the infrastructure for Italian and the other languages when I get back from vacation (around Aug 11).


Re: Italian Language
User: kmaclean
Date: 8/15/2007 8:10 pm
Views: 576
Rating: 28

Hi nsh,

I've created a dev site for Italian at:

I've also uploaded the Sphinx Acoustic Model creation scripts to the svn trunk for this repository.

If you create the same for the other languages you mentioned, I'll create repositories for them: Spanish, French, Welsh, Polish and Czech. Dutch and English already have repositories, but whenever you have a chance, Sphinx AM creation scripts for those would be greatly appreciated too.


Ken

not found
User: SunFish7
Date: 9/9/2007 11:47 pm
Views: 376
Rating: 26

Did you sort this out eventually?

I also don't have it in my install, and I can't find any online download for SphinxTrain other than the nightly build at SourceForge.

I'm stuck.

Please can someone throw me a line? I really want to get this up and running.




Re: not found
User: nsh
Date: 9/10/2007 12:03 am
Views: 401
Rating: 29

Once again, you need the SphinxTrain nightly build, the Sphinx3 nightly build, and an svn checkout of the Italian tree from VoxForge. I suppose you're missing the Sphinx3 one.

Re: Italian Language
User: kmaclean
Date: 7/22/2007 3:12 pm
Views: 414
Rating: 43

Hi Manuel,

>I don't understand how can I create statistical representation of phonemes.

The HTK toolkit lets you train HMM-based phone models automatically, but you need transcribed speech for this to work.


  • You need to create a phone list for Italian,
  • then generate a pronunciation dictionary entry for every word in your transcribed speech files, and
  • then train your (monophone) HMM-based acoustic models.

In English, the steps look like this:

Create a phone list. 

(in the VoxForge tutorial, we actually skipped this step because all the required phones are already included in the pronunciation dictionary) 
The VoxForge phone set (which originated from the CMU phone set) is as follows:
Phoneme  Example  Translation
-------  -------  -----------
AA       odd      AA D
AE       at       AE T
AH       hut      HH AH T
AO       ought    AO T
AW       cow      K AW
AY       hide     HH AY D
B        be       B IY
CH       cheese   CH IY Z
D        dee      D IY
DH       thee     DH IY
ER       hurt     HH ER T
EY       ate      EY T
F        fee      F IY
G        green    G R IY N
IH       it       IH T
IY       eat      IY T
JH       gee      JH IY
K        key      K IY
L        lee      L IY
M        me       M IY
N        knee     N IY
NG       ping     P IH NG
OW       oat      OW T
OY       toy      T OY
P        pee      P IY
R        read     R IY D
S        sea      S IY
SH       she      SH IY
T        tea      T IY
TH       theta    TH EY T AH
UH       hood     HH UH D
UW       two      T UW
V        vee      V IY
W        we       W IY
Y        yield    Y IY L D
Z        zee      Z IY
ZH       seizure  S IY ZH ER

So you need to create a similar phone list for Italian (the IPA website can help here, or perhaps another Italian speech recognition project).
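Since the VoxForge tutorial could skip the phone-list step because the phones are already implied by the pronunciation dictionary, the same trick works if you start from an existing Italian dictionary. Below is a minimal sketch that collects every phone used in a dictionary. It assumes HTK-style lines of the form `WORD [OUTSYM] ph1 ph2 ...` with no pronunciation probabilities; the function name is just for illustration, and the sample entries come from the English dictionary excerpt in this post.

```python
def phone_list(dict_lines):
    """Return the sorted set of phones used in HTK-style dictionary lines.

    Assumed line format: WORD [OUTSYM] ph1 ph2 ...
    The bracketed output symbol is treated as optional.
    """
    phones = set()
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rest = fields[1:]
        # Drop the optional [OUTSYM] field if present.
        if rest and rest[0].startswith("[") and rest[0].endswith("]"):
            rest = rest[1:]
        phones.update(rest)
    return sorted(phones)


sample = ["ABACK [ABACK] ax b ae k", "ABBY [ABBY] ae b iy"]
print(phone_list(sample))
```

Reading the lines from a file (`open(path).read().splitlines()`) gives you a first-cut phone list you can then review by hand.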

Create a pronunciation dictionary

For each word in your training set (i.e. every word in the sentences you used to prompt the users who submitted speech to your corpus), you need its pronunciation written as a sequence of phonemes. Here is a portion of the VoxForge pronunciation dictionary:

AARP          [AARP]          ey ey aa r p iy
ABA           [ABA]           ey b iy ey
ABACK         [ABACK]         ax b ae k
ABACUS        [ABACUS]        ae b ax k ax s
ABALON        [ABALON]        ae b ax l aa n
ABALONE       [ABALONE]       ae b ax l ow n iy
ABANDON       [ABANDON]       ax b ae n d ih n
ABANDONED     [ABANDONED]     ax b ae n d ih n d
ABANDONING    [ABANDONING]    ax b ae n d ih n ih ng
ABBREVIATED   [ABBREVIATED]   ax b r iy v iy ey t ih d
ABBREVIATION  [ABBREVIATION]  ax b r iy v iy ey sh ih n
ABBY          [ABBY]          ae b iy
ABC           [ABC]           ey b iy s iy
ABC'S         [ABC'S]         ey b iy s iy z
ABCS          [ABCS]          iy b iy s iy z
ABDOMINALS    [ABDOMINALS]    ae b d aa m ih n ax l z
ABDUCTING     [ABDUCTING]     ae b d ah k t ih ng
ABDUCTION     [ABDUCTION]     ae b d ah k sh ih n

Note that the words are in upper case, the output word (what the recognizer returns when it recognizes that entry) is also in upper case and in brackets, and the phones are in lower case.
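If you build the Italian dictionary by hand or convert it from another source, a small checker for these conventions can catch mistakes early. This is a minimal sketch; the regular expression and the wording of the problem messages are my own assumptions, not part of HTK or VoxForge.

```python
import re

# Assumed entry shape: WORD [OUTSYM] ph1 ph2 ...
ENTRY = re.compile(r"^(\S+)\s+\[(\S+)\]\s+(.+)$")


def check_entry(line):
    """Return a list of convention problems in one dictionary line (empty if OK)."""
    m = ENTRY.match(line.strip())
    if not m:
        return ["not of the form WORD [WORD] phones"]
    word, outsym, phones = m.group(1), m.group(2), m.group(3).split()
    problems = []
    if word != word.upper():
        problems.append("word is not upper case")
    if outsym != outsym.upper():
        problems.append("output symbol is not upper case")
    for p in phones:
        if p != p.lower():
            problems.append(f"phone {p!r} is not lower case")
    return problems


print(check_entry("ABACK [ABACK] ax b ae k"))  # no problems
print(check_entry("aback [ABACK] ax b ae k"))
```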

You need to do the same in Italian, for each word in your training set.
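A minimal sketch of that step: collect every distinct word from the prompts, write an entry in the format above for each word you already have a pronunciation for, and list the words that still need one. It assumes the prompts are already tokenized (no punctuation). The Italian phone symbols in the example are made-up placeholders, not a proposed Italian phone set.

```python
def build_dictionary(prompts, pronunciations):
    """Return (entries, missing) for every distinct word in the prompts.

    prompts: iterable of prompt sentences (assumed free of punctuation).
    pronunciations: dict mapping an upper-case word to a list of phones.
    entries: lines "WORD [WORD] ph1 ph2 ...", sorted by word.
    missing: sorted words that still need a pronunciation.
    """
    words = {w.upper() for sentence in prompts for w in sentence.split()}
    entries, missing = [], []
    for word in sorted(words):
        phones = pronunciations.get(word)
        if phones is None:
            missing.append(word)
        else:
            entries.append(f"{word} [{word}] {' '.join(p.lower() for p in phones)}")
    return entries, missing


# Hypothetical placeholder pronunciations, for illustration only.
prons = {"IL": ["i", "l"], "GATTO": ["g", "a", "t", "t", "o"], "DORME": ["d", "o", "r", "m", "e"]}
entries, missing = build_dictionary(["il gatto dorme", "il cane dorme"], prons)
print("\n".join(entries))
print("missing:", missing)
```

The `missing` list is the to-do list: every word there needs a pronunciation before training can use the prompts that contain it.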

Train your Acoustic Model. 

In this context, this means that you use the HTK toolkit to generate a statistical representation of each phone from the recordings and transcriptions in your training set. In English, your HMMs would look something like this:
~h "b"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 25
-9.124349e-01 6.825594e+00 4.190366e+00 6.915018e+00 6.278219e+00 6.211351e+00 6.080202e+00 8.280239e-01 7.751886e-01 1.188034e-01 -2.286278e+00 -2.037417e+00 -5.154014e-02 -1.411842e-01 1.359426e-01 7.536004e-02 1.828612e-02 1.083132e-01 8.064213e-02 6.554011e-02 5.534951e-03 -3.300069e-02 -1.040055e-02 1.726186e-01 1.074358e-01
<VARIANCE> 25
6.946013e+00 9.476726e+00 6.426389e+00 8.900808e+00 8.562872e+00 5.247358e+00 8.789542e+00 9.086433e+00 9.272338e+00 1.021655e+01 8.668521e+00 1.017453e+01 9.018427e-01 1.225605e+00 1.132353e+00 1.225746e+00 1.055387e+00 9.162133e-01 9.871734e-01 1.061771e+00 1.182593e+00 1.325286e+00 1.340984e+00 9.980333e-01 5.850468e-01
<GCONST> 7.204273e+01
<STATE> 3
<MEAN> 25
1.670979e+00 2.505412e+00 3.361752e+00 2.959995e+00 2.192761e+00 2.234684e+00 4.598285e-01 6.712853e-02 -7.422704e-01 -1.477473e+00 -1.300686e+00 -8.829353e-01 2.932750e+00 -1.085336e+00 1.465379e-01 -1.024826e+00 -9.668781e-01 -2.956798e+00 -3.674928e+00 -6.180806e-01 -1.165014e+00 -1.551422e+00 1.459589e-01 -1.145165e-02 3.425349e+00
<VARIANCE> 25
2.775954e+01 2.442891e+01 9.882823e+00 2.289949e+01 2.621673e+01 3.309447e+01 4.353169e+01 1.994825e+01 2.369977e+01 2.078222e+01 1.078901e+01 1.184826e+01 1.814732e+00 4.001577e+00 2.052232e+00 3.576971e+00 5.154440e+00 6.247412e+00 4.224275e+00 3.561308e+00 4.634731e+00 1.263823e+00 2.618247e+00 2.138073e+00 1.512457e+00
<GCONST> 9.653378e+01
<STATE> 4
<MEAN> 25
1.058882e+01 1.385496e+00 8.322063e-01 1.207590e+00 1.215214e+00 -7.297173e+00 -8.178091e+00 2.753822e-01 -3.762378e+00 -6.590958e+00 -1.468036e+00 -2.938320e+00 2.796497e-01 -2.095785e-01 -1.001576e-01 1.865974e-02 -5.384719e-02 -6.179357e-01 -4.035245e-01 4.215330e-02 -2.601456e-01 -1.829550e-01 -2.622822e-02 -2.242988e-01 2.178501e-01
<VARIANCE> 25
1.652969e+01 4.435868e+01 1.719629e+01 6.380357e+01 7.536614e+01 6.076683e+01 5.961767e+01 3.608961e+01 4.442945e+01 1.993280e+01 4.157676e+01 2.804121e+01 2.284771e+00 2.194077e+00 1.651372e+00 2.075975e+00 2.312554e+00 5.300534e+00 3.836717e+00 2.152288e+00 2.561902e+00 1.781796e+00 2.014969e+00 1.707738e+00 2.076164e+00
<GCONST> 1.004418e+02
<TRANSP> 5
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 8.082747e-01 1.917253e-01 0.000000e+00 0.000000e+00
0.000000e+00 0.000000e+00 6.367275e-01 3.632726e-01 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00 7.520868e-01 2.479133e-01
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
<ENDHMM>
~h "d"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 25
-1.067137e+00 4.886644e+00 2.682094e+00 6.750027e+00 6.457639e+00 6.229094e+00 5.297256e+00 -2.129066e-01 3.815716e-01 -7.126016e-01 -2.884563e+00 -1.832386e+00 7.712548e-03 -7.304223e-01 2.831668e-01 -3.501370e-01 -7.342540e-01 -2.799944e-01 4.564904e-02 2.276214e-01 1.384630e-01 4.671212e-02 -1.844966e-01 -2.142331e-01 7.479197e-01
<VARIANCE> 25
8.934610e+00 1.320769e+01 1.053300e+01 1.804137e+01 1.219705e+01 1.129104e+01 1.721161e+01 1.467160e+01 1.430175e+01 1.481090e+01 1.149455e+01 9.083491e+00 1.831863e+00 3.232245e+00 1.539536e+00 1.744226e+00 2.540962e+00 2.710148e+00 2.181852e+00 2.404683e+00 2.769586e+00 1.280586e+00 1.451528e+00 1.790569e+00 3.939657e+00
<GCONST> 8.637133e+01
<STATE> 3
<MEAN> 25
2.718689e+00 -2.744554e+00 7.256757e-02 1.812361e+00 1.016949e-01 -2.560019e-01 -1.885446e+00 -4.865013e+00 -4.525404e+00 -2.596621e+00 -1.807474e+00 -1.480970e+00 1.222863e+00 -5.446100e-01 5.466800e-01 -1.001800e+00 -7.867664e-01 -1.223161e+00 -2.112964e+00 -1.139215e+00 -1.483523e+00 -8.174815e-01 -1.465670e-01 -4.309444e-01 2.095388e+00
<VARIANCE> 25
2.393951e+01 3.441933e+01 2.272727e+01 3.303073e+01 2.547261e+01 2.950558e+01 4.406739e+01 4.921661e+01 6.163840e+01 3.156588e+01 1.728885e+01 2.407177e+01 3.362633e+00 5.228514e+00 3.342825e+00 3.542599e+00 4.699482e+00 3.152497e+00 5.631856e+00 5.698840e+00 4.839462e+00 2.097089e+00 1.823990e+00 1.847656e+00 7.886878e+00
<GCONST> 1.042911e+02
<STATE> 4
<MEAN> 25
3.030438e+00 -2.106693e+00 2.608706e+00 8.074592e-02 8.320825e-01 -8.720042e-01 -4.455779e+00 -3.824380e+00 -3.882696e+00 -1.690570e+00 -1.894887e+00 -2.615440e+00 2.946242e-01 3.876723e-02 4.528299e-01 -6.694716e-01 5.406591e-01 -8.197967e-01 -1.044559e+00 9.537272e-01 -1.756284e-01 -9.122517e-02 9.268219e-01 3.083803e-01 1.007540e+00
<VARIANCE> 25
4.675860e+01 3.011730e+01 3.514589e+01 5.922066e+01 4.235344e+01 2.218645e+01 5.816761e+01 3.788612e+01 2.974471e+01 1.639678e+01 1.083809e+01 2.301572e+01 3.725070e+00 4.032299e+00 4.137799e+00 4.301898e+00 5.162062e+00 4.180051e+00 9.591230e+00 6.350677e+00 9.550439e+00 4.142642e+00 2.124282e+00 3.202255e+00 4.503259e+00
<GCONST> 1.069599e+02
<TRANSP> 5
0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
0.000000e+00 6.490452e-01 3.509548e-01 0.000000e+00 0.000000e+00
0.000000e+00 0.000000e+00 5.191061e-01 4.808940e-01 0.000000e+00
0.000000e+00 0.000000e+00 0.000000e+00 2.762414e-01 7.237586e-01
0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
<ENDHMM>

Note that each line starting with "~h" marks the start of the statistical description of the HMM for a particular phone.
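Two parts of these definitions can be checked by hand. Each <GCONST> is a precomputed constant for the diagonal-covariance Gaussian of one state, n·ln(2π) + Σ ln σ²ᵢ, where the σ²ᵢ are the 25 values on the line after that state's mean vector (the variances). The 5×5 matrix at the end of each definition is the transition matrix: the first and last states are non-emitting entry and exit states, every row except the exit row sums to 1, and a self-loop probability a_ii implies an expected stay of 1 / (1 - a_ii) frames in that state. A minimal sketch using the numbers of the "b" model above (the constant and function names are only illustrative):

```python
import math

# Variances of state 2 of the "b" model (the line after its mean vector).
VAR_B_STATE2 = [
    6.946013e+00, 9.476726e+00, 6.426389e+00, 8.900808e+00, 8.562872e+00,
    5.247358e+00, 8.789542e+00, 9.086433e+00, 9.272338e+00, 1.021655e+01,
    8.668521e+00, 1.017453e+01, 9.018427e-01, 1.225605e+00, 1.132353e+00,
    1.225746e+00, 1.055387e+00, 9.162133e-01, 9.871734e-01, 1.061771e+00,
    1.182593e+00, 1.325286e+00, 1.340984e+00, 9.980333e-01, 5.850468e-01,
]

# Transition matrix of the "b" model: 5 states, first and last non-emitting.
TRANSP_B = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 8.082747e-01, 1.917253e-01, 0.0, 0.0],
    [0.0, 0.0, 6.367275e-01, 3.632726e-01, 0.0],
    [0.0, 0.0, 0.0, 7.520868e-01, 2.479133e-01],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]


def gconst(variances):
    """n * ln(2*pi) + sum(ln(var_i)) for a diagonal-covariance Gaussian."""
    return len(variances) * math.log(2 * math.pi) + sum(math.log(v) for v in variances)


def expected_durations(transp):
    """Expected frames spent in each emitting state: 1 / (1 - a_ii)."""
    return [1.0 / (1.0 - transp[i][i]) for i in range(1, len(transp) - 1)]


print(f"{gconst(VAR_B_STATE2):.3f}")  # should be close to <GCONST> 7.204273e+01
for row in TRANSP_B[:-1]:
    assert abs(sum(row) - 1.0) < 1e-5  # all rows except the exit state sum to 1
print([round(d, 2) for d in expected_durations(TRANSP_B)])
```

The durations are in frames (one feature vector per frame), which is a quick way to see how long the trained model expects each part of the phone to last.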

You can do this for Italian, and for most other languages.



In summary, to create an Italian acoustic model:

- create an Italian phone set,
- create an Italian pronunciation dictionary for the words in your training set,
- generate acoustic models using the process described in the VoxForge Tutorial.

This will allow you to create monophone acoustic models (up to step 8). 

To create tied-state triphone acoustic models, you will need to create 'questions' (see the tree.hed script in step 10).  I just used the one included with the HTK toolkit, and am not familiar with creating one for another language.
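For reference, the questions in tree.hed are HHEd `QS` commands, each naming a set of left-context (`ph-*`) or right-context (`*+ph`) patterns. One way to write them for a new language is to group its phones into phonetic classes and generate one left and one right question per class. A minimal sketch; the class names and Italian phone symbols below are hypothetical placeholders, not a vetted Italian question set.

```python
# Hypothetical phonetic classes for an Italian phone set; replace them with
# classes that match the phone list you actually adopt.
PHONE_CLASSES = {
    "Nasal": ["m", "n", "gn"],
    "Stop": ["p", "b", "t", "d", "k", "g"],
    "Vowel": ["a", "e", "i", "o", "u"],
}


def make_questions(classes):
    """Return HHEd QS lines: a left-context and a right-context question per class."""
    lines = []
    for name, phones in sorted(classes.items()):
        left = ",".join(f"{p}-*" for p in phones)   # phone is the left neighbour
        right = ",".join(f"*+{p}" for p in phones)  # phone is the right neighbour
        lines.append(f'QS "L_{name}" {{ {left} }}')
        lines.append(f'QS "R_{name}" {{ {right} }}')
    return lines


print("\n".join(make_questions(PHONE_CLASSES)))
```

The quality of the tied-state models depends mostly on how linguistically sensible the classes are, so this is where an Italian phonetics reference is worth consulting.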

Hope this helps,



Re: Italian Language
User: nsh
Date: 7/22/2007 3:26 pm
Views: 383
Rating: 38

And, by the way, an Italian phoneset and dictionary are both available from the Italian Festival project:

Of course, they are synthesis-oriented, but for a start that's not a big deal.


Re: Italian Language
User: Manuel
Date: 9/9/2007 8:39 am
Views: 386
Rating: 26

I haven't found any phoneme list.

I downloaded Festival, but where can I find the list of phonemes for the Italian language?