Google Summer of Code Ideas Page

Language Models for Dictation Applications
User: kmaclean
Date: 3/5/2007 4:42 pm
Compile a written corpus that can be used to build language models for dictation applications.  Text from the ebooks submitted to Project Gutenberg could be used, as could any other text released under an FSF-approved license.
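
As a rough illustration of the corpus preparation step, here is a minimal Python sketch that turns raw ebook text into one normalized sentence per line, which is the kind of input language modeling toolkits generally expect. The function name `normalize_text` and the cleanup rules (lowercasing, dropping digits and punctuation, naive sentence splitting) are assumptions for illustration only; a real dictation corpus would probably need to spell out numbers and handle abbreviations.

```python
import re


def normalize_text(raw: str) -> list[str]:
    """Split raw text into sentences and return them as lowercase word strings.

    Hypothetical helper: the rules below are deliberately naive.
    """
    # Collapse all whitespace, including line breaks inside paragraphs.
    text = re.sub(r"\s+", " ", raw).strip()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    cleaned = []
    for sentence in sentences:
        # Keep lowercase letters and apostrophes; everything else
        # (punctuation, digits) is replaced by a space and so dropped.
        words = re.sub(r"[^a-z' ]+", " ", sentence.lower()).split()
        if words:
            cleaned.append(" ".join(words))
    return cleaned
```

For example, `normalize_text("Hello, world!  It's a\ntest.")` returns `["hello world", "it's a test"]`.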

Sphinx, Julius and HTK all use ARPA-format language models.  It might be possible to build them with SRILM (the SRI Language Modeling Toolkit), the CMU-Cambridge Statistical Language Modeling Toolkit, or the language modeling tools included in HTK.
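
To give a feel for the ARPA file layout, the sketch below writes a unigram-only model with unsmoothed maximum-likelihood estimates in Python. It is not what any of the toolkits above do internally: they produce higher-order n-grams with smoothing and backoff weights. The function name `unigram_arpa` is hypothetical, and listing `<s>` with a log probability of -99 follows a common convention that I am assuming here rather than taking from the toolkit documentation.

```python
import math
from collections import Counter


def unigram_arpa(sentences: list[str]) -> str:
    """Return an ARPA-format unigram model for whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
        counts["</s>"] += 1  # one end-of-sentence token per sentence
    total = sum(counts.values())

    # Header: number of unigram entries, including the <s> line added below.
    lines = ["\\data\\", f"ngram 1={len(counts) + 1}", "", "\\1-grams:"]
    # Assumed convention: <s> is never predicted, so it gets a tiny log prob.
    lines.append("-99\t<s>")
    for word in sorted(counts):
        # ARPA files store base-10 log probabilities.
        lines.append(f"{math.log10(counts[word] / total):.6f}\t{word}")
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"
```

Calling `unigram_arpa(["a b", "a"])` yields a file with four unigram entries (`<s>`, `</s>`, `a`, `b`). For a real dictation model, the corpus would instead be passed to one of the toolkits named above to get a smoothed trigram model.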