Hi! "In the 16 kHz file the program has changed about 8500 characters. In the 8 kHz file the program has changed only 6 (?)." - The reason is that I had edited master_prompts_8kHz-16bit, but not master_prompts_16kHz-16bit. Now both files are in the correct encoding. Greetings, Ralf
I'm happy to be useful :-P
Have you downloaded and tested the files, so that they can be deleted from the Spanish svn? Can I delete them?
Hi Ivan! Yes, you can delete the files. I am new to svn, and I don't know how to delete files from the Voxforge subversion system. Regards, Ralf
In the train/wav subdirectories I have uploaded the sounds again, after converting the stereo wav files to mono (using sox -c 1 fichero.wav ficherosal.wav) and changing the prompts files to UTF-8 characters. I have uploaded the ubanov*, buhochileno4 and txita1 directories.
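For anyone who wants to do the same prompts conversion, here is a minimal Python sketch of the re-encoding step. It assumes the original prompts files were in ISO-8859-1 (Latin-1), which this thread does not confirm; the function name convert_to_utf8 and the default source_encoding are hypothetical and should be adjusted to your own files.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Re-encode a prompts file as UTF-8.

    source_encoding is an assumption: the thread does not say which
    encoding the original prompts files used.
    """
    # Work on raw bytes so that line endings are preserved exactly.
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Calling convert_to_utf8("prompts", "prompts.utf8") writes a UTF-8 copy and leaves the original file untouched, so the result can be checked before replacing anything in svn.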
Ken, maybe you could upload the files to the Spanish voice repository, so that they can be downloaded from the Listen option of VoxForge.
Another thing: I'm going to include a note about the encoding on the Spanish Read or Listen page, asking people to use the UTF-8 charset.
>Ken, maybe you could upload the files to the Spanish voice repository, so
>that they can be downloaded from the Listen option of VoxForge.
>Another thing: I'm going to include a note about the encoding on the
>Spanish Read or Listen page, asking people to use the UTF-8 charset.
>How is it possible to train a speech model when the character encoding is
The use of UTF-8 is really more to get rid of headaches that occur when trying to display international character sets on a web site.
It does not really have much to do with acoustic model training, since Sphinx, Julius/HTK, ... use ASCII internally (which I assume is the reason why the SAMPA computer readable phonetic alphabet was created).
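To illustrate the point, here is a small Python sketch showing that a Spanish word with an accented character produces different bytes in ISO-8859-1 and UTF-8, while an ASCII-only phonetic transcription is byte-identical in both. The word "niño" and its SAMPA-style transcription "n i J o" are only illustrative choices, and the helper name same_bytes_in_both is hypothetical; none of this is taken from the actual VoxForge prompts or dictionaries.

```python
word = "niño"

# The same word yields different byte sequences in the two encodings,
# which is what causes display problems on a web site.
latin1_bytes = word.encode("iso-8859-1")
utf8_bytes = word.encode("utf-8")

# An ASCII-only phonetic transcription (SAMPA-style, illustrative).
sampa = "n i J o"


def same_bytes_in_both(text):
    """Return True if text encodes identically in Latin-1 and UTF-8."""
    return text.encode("iso-8859-1") == text.encode("utf-8")


print(same_bytes_in_both(word))
print(same_bytes_in_both(sampa))
```

The first check is False and the second is True, so an ASCII transcription is unaffected by the choice between these two encodings.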