Using films with subtitles, like someone suggested, is obviously destined to fail. There's too much noise, and the subtitles do not match the actual speech word for word.
However, audio books could work. The noise level is low, and everything in the book is read verbatim. Not to mention that people who read audio books have very pleasing voices. Imagine a Stephen Fry TTS...
>audio books could work.
Copyright is the main issue.
We have used public domain audio books (like LibriVox). There are compression issues, but the main problem is that the speech has to be segmented before it can be used for acoustic model training, and that process is not fully automated yet.
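
To give a rough idea of what the segmentation step involves, here is a minimal sketch that cuts a recording at long pauses using frame energy. It is not the pipeline we actually use. The function names, the frame length, the energy threshold and the minimum pause length are all assumptions picked for illustration, and the audio is assumed to be already decoded into a list of float samples in the range -1.0 to 1.0. It only finds pause boundaries; matching each chunk to the right piece of book text (forced alignment) is the hard part and is not shown.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def split_on_silence(samples, frame_len=400, threshold=1e-4,
                     min_silence_frames=10):
    """Return (start, end) sample indices of chunks separated by long pauses.

    All default values are illustrative assumptions, not tuned settings.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None    # frame index where the current chunk began
    last_voiced = None  # most recent frame at or above the threshold
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            last_voiced = i
        elif seg_start is not None and i - last_voiced >= min_silence_frames:
            # The pause is long enough: close the chunk at the last voiced frame.
            segments.append((seg_start * frame_len,
                             (last_voiced + 1) * frame_len))
            seg_start = None
    if seg_start is not None:
        # The recording ended while a chunk was still open.
        segments.append((seg_start * frame_len, (last_voiced + 1) * frame_len))
    return segments


# Synthetic "recording": silence, speech, long pause, speech, short silence.
audio = [0.0] * 4000 + [0.5] * 8000 + [0.0] * 8000 + [0.5] * 4000 + [0.0] * 2000
print(split_on_silence(audio))  # [(4000, 12000), (20000, 24000)]
```

On real audio book recordings a fixed threshold like this breaks down quickly (breaths, page turns, compression artifacts), which is part of why the process still needs manual checking.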