Scripts to train acoustic models using audiobooks from LibriVox
to Reuse Speech from Other Open Source/Social Projects
There is more than enough speech on the Internet to create a commercial-quality FOSS speech corpus and acoustic models. The problem is that converting such speech into a format usable for creating acoustic models is very time-consuming. Automating our current manual process for segmenting an audiobook (from LibriVox, for example) and applying the same algorithms to other potential sources of speech (audio or video blogs, etc.) would go a long way toward improving FOSS speech recognition.
This project is to create a series of scripts to train acoustic models using audiobooks from LibriVox.
The high level steps are as follows:
1) Get a list of speakers and number of hours spoken by each speaker.
2) Write scripts to download all the audio and text.
3) Write scripts to clean up the text so that it matches the audio. Initially this means removing the Project Gutenberg preamble, adding the spoken LibriVox preamble, and looking at what can be done about chapter headings, etc.
4) Build acoustic and language models using one of the following speech recognition engines:
5) Use an "automated transcription script" to highlight any problems with the transcriptions; if any are found, go back to step 3 and fix them.
6) Decide on a sensible split of data between train, eval and test.
7) Make three releases: first, the audio and text (in their original forms); second, the scripts that perform steps 3-5 above (so that others may improve them); and third, the acoustic models.
8) Complete the acoustic model creation scripts for the other speech recognition engines not selected in step 4.
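
As a rough illustration of steps 3 and 6, here is a minimal Python sketch; it is not the project's actual scripts. It assumes the e-text carries the common "*** START OF ... PROJECT GUTENBERG EBOOK ... ***" and "*** END OF ..." marker lines (older files use different wording and would need extra patterns). The function names, the preamble text passed in by the caller, and the 10%/10% eval/test proportions are all hypothetical choices made for the example.

```python
import hashlib
import re

# Assumption: the e-text uses the common START/END marker lines.
# Older Project Gutenberg files use other wording and are not handled here.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(text)
    begin = start.end() if start else 0
    end = END_RE.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


def add_spoken_preamble(body, preamble):
    """Prepend the words the reader speaks before the chapter text.

    The preamble wording varies between recordings, so the caller
    supplies it (hypothetical helper, for illustration only).
    """
    return preamble.strip() + "\n\n" + body


def assign_split(speaker_id, eval_pct=10, test_pct=10):
    """Deterministically map a speaker to 'train', 'eval' or 'test'.

    Hashing the speaker ID keeps all of one speaker's audio in a single
    set, and the assignment does not change when new speakers are added.
    The percentages are target proportions of speakers, not of hours.
    """
    digest = hashlib.sha256(speaker_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + eval_pct:
        return "eval"
    return "train"
```

Splitting by speaker rather than by utterance is one possible design choice: it avoids the same voice appearing in both training and test data. Because speakers contribute very different numbers of hours (step 1), the resulting split by hours would still need to be checked and possibly rebalanced.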