needed: conversion script PLS/IPA to HTK/ASCII
User: ralfherzog
Date: 12/1/2008 12:04 pm
Views: 404
Rating: 16

Hello ubanov!

Thanks for your offer to help. I think that I have found a solution to the encoding problem (thanks to nsh) with the following commands in the Ubuntu terminal:

ubuntu@ubuntu-desktop:~$ svn checkout http://[email protected]/svn/de/Trunk/Prompts

cd Prompts

gedit master_prompts_8kHz-16bit

svn commit

I am just doing a simple search & replace with gedit. There is no need to write a script. After searching for the corrupt characters and replacing them with the valid special characters (ä, ö, ü, ß), it is only necessary to save the file with the UTF-8 character encoding. The results are shown in the German timeline.
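For a larger file, the same search & replace could also be scripted. The following is only a minimal Python sketch, not the procedure actually used here: it assumes the corrupt characters are UTF-8 bytes that were misread as Windows-1252 (the thread does not say which corrupt sequences appeared), and the function names are made up for illustration. It also reports how many sequences were replaced.

```python
# Sketch: repair umlauts that were corrupted by decoding UTF-8 bytes
# with the wrong codec. ASSUMPTION: the wrong codec was Windows-1252;
# the actual corrupt sequences in the prompts file are not given.
GERMAN_SPECIALS = "äöüßÄÖÜ"


def build_replacements(chars=GERMAN_SPECIALS, wrong_codec="cp1252"):
    """Map each corrupt sequence to the valid character it stands for."""
    return {ch.encode("utf-8").decode(wrong_codec): ch for ch in chars}


def repair_text(text, replacements):
    """Return the repaired text and the number of sequences replaced."""
    changed = 0
    for bad, good in replacements.items():
        changed += text.count(bad)
        text = text.replace(bad, good)
    return text, changed


def repair_file(path, replacements):
    """Repair a file in place and save it as UTF-8 (path is hypothetical)."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    fixed, changed = repair_text(text, replacements)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(fixed)
    return changed
```

Calling `repair_file("master_prompts_8kHz-16bit", build_replacements())` would then rewrite the file in place, so it is worth keeping a copy (or an unmodified svn working copy) until the result has been reviewed.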

Well, I have a similar problem. How can the German PLS/IPA pronunciation dictionary be converted into HTK/ASCII format? If you want, you could help me with the conversion. My thoughts are: using XSLT/XPath, or someone could write a C++ script, or a Perl script, or search & replace with gedit. At the moment, I don't have the necessary programming skills, but I am trying to find a solution. I would appreciate any help.

Obviously, you know how to write a C++ script. If you want to help, you are welcome.

It is very convenient to create the German pronunciation dictionary using the IPA. But in the end, we need plain ASCII, and a script would be a good fit for the conversion.
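To make the idea concrete, here is a rough Python sketch of what such a conversion script might look like. It reads a PLS lexicon (`<lexeme>`, `<grapheme>`, `<phoneme>` elements in the W3C PLS namespace), splits each IPA string by longest match, and writes HTK-style lines of the form `WORD [WORD] phone phone ...`. The IPA-to-ASCII table, the ASCII phone names, and the sample lexicon are invented for illustration and are far from complete; a real table would have to match the phone set used by the acoustic model.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# HYPOTHETICAL, incomplete mapping: the ASCII phone names are made up
# for this example and do not come from any existing phone set.
IPA_TO_ASCII = {
    "aʊ": "au", "uː": "uu",
    "h": "h", "s": "s", "ʃ": "sh", "l": "l", "ə": "ax",
}
IGNORED = set("ˈˌ. ")  # stress marks, syllable dots, spaces


def ipa_to_phones(ipa, table=IPA_TO_ASCII):
    """Split an IPA string into ASCII phones, longest symbol first."""
    symbols = sorted(table, key=len, reverse=True)
    phones, i = [], 0
    while i < len(ipa):
        if ipa[i] in IGNORED:
            i += 1
            continue
        for symbol in symbols:
            if ipa.startswith(symbol, i):
                phones.append(table[symbol])
                i += len(symbol)
                break
        else:
            raise ValueError(f"unmapped IPA symbol {ipa[i]!r} in {ipa!r}")
    return phones


def pls_to_htk(pls_xml):
    """Return sorted HTK dictionary lines for every lexeme in the PLS text."""
    root = ET.fromstring(pls_xml)
    lines = []
    for lexeme in root.iter(PLS_NS + "lexeme"):
        word = lexeme.findtext(PLS_NS + "grapheme").strip()
        for phoneme in lexeme.findall(PLS_NS + "phoneme"):
            phones = ipa_to_phones(phoneme.text.strip())
            lines.append(f"{word} [{word}] {' '.join(phones)}")
    return sorted(lines)


# Invented two-word sample lexicon, only to show the input shape.
SAMPLE = """<lexicon version="1.0" alphabet="ipa" xml:lang="de"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme><grapheme>Schule</grapheme><phoneme>ˈʃuːlə</phoneme></lexeme>
  <lexeme><grapheme>Haus</grapheme><phoneme>haʊs</phoneme></lexeme>
</lexicon>"""

for line in pls_to_htk(SAMPLE):
    print(line)
```

The sketch raises an error on any IPA symbol missing from the table instead of skipping it, so gaps in the mapping show up immediately. The graphemes are copied unchanged, so words containing ä, ö, ü or ß would still need their own ASCII transliteration.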

Such a script could be useful for other languages, too, of course. Is there a Spanish IPA dictionary available that is licensed under the GPL? If so, you could use that script for your own language (I assume that your native language is Spanish).

Regards, Ralf

Re: needed: conversion script PLS/IPA to HTK/ASCII
User: ubanov
Date: 12/1/2008 5:35 pm
Views: 333
Rating: 16

Hi,

In the end I finished the filter program (after searching Google for information about the characters), and I have run it against the German master prompts files. As I can't upload anything to the German svn directory, the resulting files are in the following directory: "svn checkout http://www.dev.voxforge.org/svn/es/Trunk/german".

In the 16 kHz file the program changed about 8500 characters. In the 8 kHz file it changed only 6 (?).

Download the files and review them. Once you have downloaded them, tell me so that I can delete them from here. If you can use them, good; if not, just delete them. :-)

Regards.

 

German master_prompts_16kHz-16bit
User: ralfherzog
Date: 12/4/2008 8:00 am
Views: 110
Rating: 13

Hi! "In the 16 kHz file the program changed about 8500 characters. In the 8 kHz file it changed only 6 (?)." The reason is that I had edited master_prompts_8kHz-16bit, but not master_prompts_16kHz-16bit. Now both files are in the correct encoding. Greetings, Ralf

Re: German master_prompts_16kHz-16bit
User: ubanov
Date: 12/4/2008 1:28 pm
Views: 106
Rating: 14

I'm happy to be useful :-P

Have you downloaded and tested the files? I would like to remove them from the Spanish svn. Can I delete them?

Regards,

    Ivan

Re: German master_prompts_16kHz-16bit
User: ralfherzog
Date: 12/4/2008 5:41 pm
Views: 330
Rating: 14

Hi Ivan! Yes, you can delete the files. I am new to svn, and I don't know how to delete files from the Voxforge subversion system. Regards, Ralf

Re: Audio in mono format and notes about encoding
User: kmaclean
Date: 12/8/2008 12:12 pm
Views: 2080
Rating: 12

Hi Ivan,

>Ken, maybe you could upload the files to the Spanish voice repository (so that
>they can be downloaded from the Listen option of VoxForge).

done

>Another thing: I'm going to include a note about the encoding on the
>Spanish Read or Listen page (asking people to use the UTF-8 charset).

thanks!

Ken
