Acoustic Model Discussions

Librivox data
User: Visitor
Date: 10/12/2006 1:28 pm

Someone on Slashdot pointed out that LibriVox data might be useful for you.  The full text of public domain books is often available online, and that text could serve as a transcript.  Segmentation of the audio into smaller chunks might be needed for training. (I'm not sure it would be; I'm used to training with speech files separated into sentences, but I don't know if that's necessary.) If so, automated forced alignment against the text could perhaps be used to do the segmentation.
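Before any alignment can happen, the public domain e-text has to be turned into prompt lines. Here is a minimal sketch of that text-side step. It assumes a naive sentence splitter and uses a hypothetical `ID WORD WORD ...` prompt layout chosen for illustration, which is not necessarily VoxForge's actual prompts format:

```python
import re

def text_to_prompts(text, prefix="seg"):
    """Split book text into sentence-level prompt lines.

    The 'prefix-NNNN WORD WORD ...' layout is a hypothetical prompt format
    used for illustration only, not a documented VoxForge format.
    """
    # Collapse line wraps and repeated whitespace into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    prompts = []
    for sentence in sentences:
        # Keep letters, digits and apostrophes; drop other punctuation.
        words = re.findall(r"[A-Za-z0-9']+", sentence)
        if words:
            prompt_id = f"{prefix}-{len(prompts) + 1:04d}"
            prompts.append(prompt_id + " " + " ".join(w.upper() for w in words))
    return prompts
```

Abbreviations such as "Mr." and numerals would need extra handling (splitting exceptions, number expansion) before prompts like these are usable for alignment.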

--- (Edited on 10/12/2006 1:28 pm [GMT-0500] by Visitor) ---

Re: Librivox data
User: kmaclean
Date: 10/12/2006 9:03 pm

Thanks for the reference!

Librivox is definitely another source of audio for us.  We've been looking at other ways to get speech audio into the project and have created an ever-expanding list of links in the VoxForge Dev Wiki.  I've added Librivox.

The work in this case, as you pointed out, would be in segmenting the audio data.  I've tried processing a large audio file with time-stamped prompts in HTK, and the processing time seemed much too long.  HTK was much more efficient with smaller speech files whose prompts had no time stamps.  I have not tried either approach with Sphinx.
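Working with many small files means a long recording has to be cut into per-segment WAV files, each of which can then be paired with an untimed prompt line. The following is a minimal sketch using Python's standard `wave` module. It assumes the segment boundaries (in seconds) have already been found by some earlier step, and the output file naming is hypothetical:

```python
import wave

def split_wav(src_path, segments, out_pattern="segment_{:04d}.wav"):
    """Write each (start_sec, end_sec) span of src_path to its own WAV file.

    Returns the list of output paths. The segment times are assumed to come
    from an earlier step (manual marking, pause detection, forced alignment).
    """
    outputs = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start, end) in enumerate(segments, 1):
            first = max(0, int(round(start * rate)))
            last = min(total, int(round(end * rate)))
            if last <= first:
                continue  # skip empty or inverted spans
            src.setpos(first)
            frames = src.readframes(last - first)
            path = out_pattern.format(i)
            with wave.open(path, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, rate
                dst.writeframes(frames)  # header frame count is patched on write/close
            outputs.append(path)
    return outputs
```

Because the audio is copied frame for frame, the output stays uncompressed PCM with the same sample rate as the source.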

There are two possible approaches.  The first would be to create how-tos showing people who don't want to submit their own voice how to segment audio books.  The second, which you already mentioned, would be an automated segmentation script that looks for pauses in the speech file and creates a prompt line corresponding to the contents of each segmented speech audio file.
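The pause-detection idea can be sketched with a simple short-time energy check. This minimal sketch assumes 16-bit mono PCM WAV input. The frame size, energy threshold and minimum pause length are hypothetical defaults that would need tuning for each recording level. Pauses alone don't say which words fall in each chunk, so the matching prompt lines would still have to come from the text:

```python
import math
import sys
import wave
from array import array

def read_mono16(path):
    """Load a 16-bit mono PCM WAV file as (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        samples = array("h", w.readframes(w.getnframes()))
        rate = w.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate

def find_pause_cuts(samples, rate, frame_ms=20, threshold=500.0, min_pause_ms=300):
    """Return cut times (seconds) at the middle of each long low-energy run.

    Leading and trailing silence is ignored; only pauses between speech count.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    # Classify each non-overlapping frame as silent by its RMS energy.
    silent = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        silent.append(rms < threshold)
    min_frames = max(1, min_pause_ms // frame_ms)
    cuts, run_start = [], None
    for i, is_silent in enumerate(silent + [False]):  # sentinel closes a final run
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            long_enough = i - run_start >= min_frames
            interior = run_start > 0 and i < len(silent)
            if long_enough and interior:
                cuts.append((run_start + i) / 2 * frame_len / rate)
            run_start = None
    return cuts
```

Cutting in the middle of a pause, rather than at its edges, leaves a little silence on both sides of each segment.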

Another concern is the quality of the recordings: we need uncompressed audio, which is why WAV was chosen as the format.  We may need to talk to Librivox, and others, about getting their users to submit uncompressed audio to VoxForge and compressed audio to their own sites.
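A submission could be screened for an uncompressed container before processing. In the minimal sketch below, the helper name is hypothetical. It relies on the fact that Python's standard `wave` reader only accepts PCM data. The check can't tell whether the audio was lossy-compressed at some earlier point, because an MP3 decoded back to WAV passes:

```python
import wave

def is_pcm_wav(path):
    """Return True if `path` opens as a non-empty, uncompressed (PCM) WAV file."""
    try:
        with wave.open(path, "rb") as w:
            # The standard-library reader rejects non-PCM formats and
            # reports the compression type of what it accepts as 'NONE'.
            return w.getcomptype() == "NONE" and w.getnframes() > 0
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, an unsupported encoding, or a truncated file.
        return False
```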

lots to think about, thanks again,

Ken 

--- (Edited on 10/12/2006 10:03 pm [GMT-0400] by kmaclean) ---

Re: Librivox data
User: Robin
Date: 5/2/2007 6:36 am

Hi Ken,

I don't know how far along you are in automating the segmentation of LibriVox recordings, but I think it would be useful to get back to the people at LibriVox anyway. As I see it, it's a waste if someone records a chapter of a book, uploads an MP3 of it to LibriVox, and then deletes the original recordings without even being aware that this data could also be donated to VoxForge.

The people at LibriVox seemed very friendly and cooperative, so that shouldn't be a problem. However, the thread in their forum that deals with this matter probably doesn't attract many readers anymore.

Since we (eventually) need hundreds of hours of audio for really good acoustic models, I think we could benefit greatly from a more structured approach.

In my humble opinion, that would only require a short message on, for example, this page:

http://librivox.org/wiki/moin.cgi/HowToSendYourRecording

Something like:

"If you need to delete your recordings after uploading them to Librivox, consider also donating them in uncompressed form to the VoxForge speech recognition project. See their site for more info."

With a link to a page on voxforge.org specially set up for Librivox users. 

What do you think?

Cheers,

Robin 

--- (Edited on 5/2/2007 6:36 am [GMT-0500] by Robin) ---

Re: Librivox data
User: kmaclean
Date: 5/2/2007 8:34 am

Hi Robin,

Excellent idea! 

The Librivox community is amazingly supportive.  I agree that the newsgroup post no longer has the required exposure. 

I'll email Hugh McGuire and see if they are OK with such an addition to their "HowToSendYourRecording" page,

thanks,

Ken 

--- (Edited on 5/2/2007 9:34 am [GMT-0400] by kmaclean) ---

Re: Librivox data
User: kmaclean
Date: 5/2/2007 1:58 pm

Hugh said I should put it to the LibriVox community ... here is the link:

    Adding VoxForge link to "HowToSendYourRecording"

Ken 

--- (Edited on 5/2/2007 2:58 pm [GMT-0400] by kmaclean) ---

Re: Librivox data
User: kmaclean
Date: 5/3/2007 10:23 am

We now have a link on the LibriVox site! ... here it is:

http://librivox.org/wiki/moin.cgi/HowToSendYourRecording

Thanks to Robin for this idea. 

If anyone else has any suggestions or ideas for promoting the VoxForge site, please let me know.

Thanks,

Ken

--- (Edited on 5/3/2007 11:23 am [GMT-0400] by kmaclean) ---

Re: Librivox data
User: kmaclean
Date: 5/3/2007 2:58 pm

Hi Robin 

>I don't know how far you are in automating the segmentation of librivox recordings,

Actually, we are doing pretty well on this front: most of it is automated (see this web page for details: Automated Audio Segmentation Using Forced Alignment).
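The following sketch shows how forced-alignment output can be turned into segments. It is not the method described on the page linked above. It assumes an HTK-style master label file (MLF) holding a single utterance, with label lines of the form `start end word [score]` and times in HTK's 100 ns units. Silence labels are assumed to be `sil` and `sp`, and the 0.3 s gap threshold is a hypothetical default:

```python
HTK_UNITS_PER_SEC = 10_000_000  # HTK label times are in 100 ns units

def parse_mlf_words(mlf_text):
    """Extract (start_sec, end_sec, word) tuples from an HTK-style MLF.

    Assumes a single utterance; the '#!MLF!#' header, the quoted label-file
    name and the terminating '.' line are skipped because they don't start
    with two integer time fields.
    """
    words = []
    for line in mlf_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].isdigit() and parts[1].isdigit():
            start = int(parts[0]) / HTK_UNITS_PER_SEC
            end = int(parts[1]) / HTK_UNITS_PER_SEC
            words.append((start, end, parts[2]))
    return words

def group_into_segments(words, min_gap=0.3, silence_labels=("sil", "sp")):
    """Split aligned words into (start, end, [words]) chunks at long pauses."""
    segments, current, prev_end = [], [], None
    for start, end, word in words:
        if word.lower() in silence_labels:
            continue  # silence labels only mark gaps; they aren't prompt words
        if current and start - prev_end >= min_gap:
            segments.append((current[0][0], prev_end, [w for _, _, w in current]))
            current = []
        current.append((start, end, word))
        prev_end = end
    if current:
        segments.append((current[0][0], prev_end, [w for _, _, w in current]))
    return segments
```

Each resulting `(start, end, words)` tuple provides both the time span to cut from the original WAV and the words for that segment's prompt line.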

The last piece to automate is creating pronunciations for words that are not in the VoxForge pronunciation dictionary (i.e., out-of-vocabulary words).   Tony Robinson helped out in this regard (see this post), and I was able to find a script to do it: t2p: Text-to-Phoneme Converter Builder
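Before any text-to-phoneme tool can be run, the out-of-vocabulary words have to be identified. Here is a minimal sketch of that step. It assumes each dictionary line begins with the head word followed by its phones, which may not match the exact VoxForge dictionary layout. The resulting list is what would be handed to a converter such as t2p. How t2p is invoked is not shown here:

```python
import re

def load_dictionary_words(dict_lines):
    """Collect head words from pronunciation-dictionary lines.

    Assumes each entry starts with the word (e.g. 'CAT  k ae t'); the exact
    VoxForge dictionary layout may differ.
    """
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            words.add(parts[0].upper())
    return words

def find_oov_words(transcript, dictionary_words):
    """Return transcript words missing from the dictionary, in first-seen order."""
    seen, oov = set(), []
    for token in re.findall(r"[A-Za-z']+", transcript):
        word = token.upper()
        if word not in dictionary_words and word not in seen:
            seen.add(word)
            oov.append(word)
    return oov
```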

Once I test this out, and create a script for the whole process, we will be in pretty good shape for mass conversions of LibriVox audio books (uncompressed or compressed).

Ken 

--- (Edited on 5/3/2007 3:58 pm [GMT-0400] by kmaclean) ---
