Re: Your goals: HTK, Sphinx, Julius. My goals: PLS, SSML
>And I am convinced that in the long-term XML related standards are the way
>to go. Maybe not today, but in a few years.
>I would like to submit much more speech samples (prompts) in the English
>and in the German language employing the SSML, even if there isn't any
>demand at the moment.
Please note that SSML is only a markup language for directing what a text-to-speech engine says. I don't think it is used as a format for describing transcribed audio submitted for the creation of acoustic models.
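To make the point concrete, here is a minimal sketch of what an SSML document looks like when generated from Python's standard library. The helper name build_ssml and the sample sentence are only illustrations of mine, not part of any spec or toolkit. Every element in it (emphasis, break) tells a synthesizer how to speak; nothing in it describes a recording.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def build_ssml(text_before, emphasized, text_after, pause_ms=300):
    """Build a tiny SSML document (hypothetical helper, for illustration only)."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    speak.text = text_before
    # <emphasis> asks the TTS engine to stress a word.
    emph = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emph.text = emphasized
    # <break> asks the TTS engine to pause before continuing.
    pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    pause.tail = text_after
    return ET.tostring(speak, encoding="unicode")

if __name__ == "__main__":
    print(build_ssml("Please say ", "yes", " or no."))
```

The markup here is all output-side instruction (stress this word, pause here), which is why it does not look like a natural fit for labelling submitted audio.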
>I started to read the HTK book. It is really not easy.
The HTK book is a difficult read... I have only read the first few chapters, and now only use it as a reference - I don't have the math skills to understand all the formulas and how they interact. But if you treat HTK as a "black box", and focus only on the minimum command set required to compile an acoustic model, then you can do quite a bit with trial and error - which essentially was my approach when I started out... :)
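As a rough idea of what that minimum command set looks like, here is a sketch that only builds (and prints) the command lines for the first monophone training steps, following the tutorial chapter of the HTK book. The file names (config, codetr.scp, train.scp, proto, phones0.mlf, monophones0) are the tutorial's conventions and are assumptions here, as is the helper name; your own setup will likely differ, and the script does not run HTK itself.

```python
import shlex

def htk_training_commands(config="config", rounds=3):
    """Return HTK command lines as argument lists (hypothetical helper).

    Assumes the file layout used in the HTK book tutorial. In that
    tutorial hmm0/macros and hmm0/hmmdefs are assembled by hand from
    the HCompV output before the first HERest pass.
    """
    cmds = [
        # Convert the audio files listed in codetr.scp to feature files.
        ["HCopy", "-T", "1", "-C", config, "-S", "codetr.scp"],
        # Flat-start: compute global mean/variance for the prototype HMM.
        ["HCompV", "-C", config, "-f", "0.01", "-m",
         "-S", "train.scp", "-M", "hmm0", "proto"],
    ]
    # Each HERest pass re-estimates the models in hmmN into hmmN+1.
    for i in range(rounds):
        cmds.append([
            "HERest", "-C", config, "-I", "phones0.mlf",
            "-t", "250.0", "150.0", "1000.0",
            "-S", "train.scp",
            "-H", f"hmm{i}/macros", "-H", f"hmm{i}/hmmdefs",
            "-M", f"hmm{i + 1}", "monophones0",
        ])
    return cmds

if __name__ == "__main__":
    for cmd in htk_training_commands():
        print(shlex.join(cmd))
```

Printing the commands first, then running them one at a time and reading the error messages, is pretty much the trial-and-error loop described above.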
You might be interested in the W3C VoiceXML standard, which essentially merges subsets of the SSML, CCXML and SRGS (the speech recognition grammar) specifications. The W3C document "Voice Browsers, Introduction" provides a good overview of how they are all meant to work together.
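Here is a small sketch of how the pieces sit together in one VoiceXML dialog: the prompt carries SSML-style markup (the break element) and the field points at an external SRGS grammar. The document itself, the grammar file name yesno.grxml and the summarize helper are made up for illustration; the Python part only parses the XML to show which part does what.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Illustrative VoiceXML dialog; yesno.grxml is a hypothetical grammar file.
VXML_DOC = """
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="confirm">
    <field name="answer">
      <prompt>Do you want to continue? <break time="300ms"/> Please say yes or no.</prompt>
      <grammar src="yesno.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>You said <value expr="answer"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""

def summarize(doc):
    """Count the output-side (prompt) and input-side (grammar) parts."""
    root = ET.fromstring(doc.strip())
    ns = {"v": VXML_NS}
    return {
        # Prompts are what the TTS engine speaks (SSML-style markup inside).
        "prompt_count": len(root.findall(".//v:prompt", ns)),
        "break_count": len(root.findall(".//v:break", ns)),
        # Grammars are what the recognizer listens for (SRGS).
        "grammar_sources": [g.get("src") for g in root.findall(".//v:grammar", ns)],
    }

if __name__ == "__main__":
    print(summarize(VXML_DOC))
```

The split is the useful thing to notice: prompts go to the synthesizer, grammars go to the recognizer, and the form/field elements are the dialog logic in between. Call control (CCXML) is not shown here.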
The jvoicexml project has implemented a working VoiceXML browser, which essentially provides a VoiceXML dialog manager front-end to Sphinx and Festival. They might provide bindings to Asterisk (IP PBX). Note that jvoicexml uses the JSAPI and JTAPI "standards" to accomplish this.