Re: Your goals: HTK, Sphinx, Julius. My goals: PLS, SSML
>I will take a look into the Introduction and Overview of W3C Speech Interface Framework
Good link, I didn't see that particular one...
>I am looking for "a format for describing transcribed audio."
I am not sure there is any such format on the W3C site. The LDC might have something. With XML, one could be created fairly easily. But in VoxForge's case, quite a few scripting changes would be required on the acoustic model creation backend to implement such a thing.
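To give an idea of how small such an XML format could be, here is a minimal Python sketch. The element and attribute names (`transcript`, `segment`, `audio`, `speaker`, `start`, `end`), the file name, and the sample text are all made up for illustration; this is not a W3C, LDC, or VoxForge format.

```python
import xml.etree.ElementTree as ET


def build_transcript(audio_file, speaker, segments):
    """Build a hypothetical <transcript> element.

    segments is a list of (start_seconds, end_seconds, text) tuples.
    """
    root = ET.Element("transcript", {"audio": audio_file, "speaker": speaker})
    for start, end, text in segments:
        # One <segment> per transcribed stretch of audio, with its time span.
        seg = ET.SubElement(
            root, "segment", {"start": f"{start:.2f}", "end": f"{end:.2f}"}
        )
        seg.text = text
    return root


def read_segments(root):
    """Read the (start, end, text) tuples back out of a <transcript> element."""
    return [
        (float(seg.get("start")), float(seg.get("end")), seg.text)
        for seg in root.findall("segment")
    ]


# Hypothetical sample data, for illustration only.
doc = build_transcript(
    "sample-001.wav",
    "anonymous",
    [(0.0, 1.8, "INTERNET TECHNICAL SUPPORT"), (1.8, 3.1, "HELP DESK")],
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Writing the file is the easy part. The real work would be teaching the existing backend scripts to read something like `read_segments` output instead of the prompt files they expect today.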
> Perhaps it is VoiceXML, I am not sure at the moment, I will read about it.
VoiceXML is a language to describe spoken dialogs... think of spoken interactive voice response (IVR) systems in a telephone environment (which is what VoiceXML was originally designed for). For example, when I call my ISP, I used to use keypad sequences to get routed to the help desk. Now I call their number, and just say "Internet technical support" on my phone, and get routed to the help desk queue.
A VoiceXML browser "abstracts" away the differences between the different implementations of the underlying components, such as the speech recognition, text to speech, and telephony pieces.
There have been a few open source implementations of VoiceXML that implemented the text to speech and the telephony components. But most attempts to implement the speech recognition portion failed, because it is very difficult to do. JVoiceXML is amazing since they got the speech rec component working (though I have not tried it out myself). I think using JSAPI was an excellent way to avoid having to work out the details of a particular speech rec or TTS engine, but I am not sure where Sun's JSAPI licensing currently stands.
>I solved one problem, and then the next problem occurred. I stopped
>trying. Maybe I will try it again.
Don't give up yet, if that is what you are interested in. It takes some effort. A bit of understanding of a scripting language is also very helpful.