Audio and Prompts Discussions

Re: DVD closed captioning as a source of speech
User: whoneedselta
Date: 2/11/2008 5:55 pm
Views: 271
Rating: 36

Thanks a lot, I was somewhat short on bibliography.

-wnlt 

--- (Edited on 2/11/2008 5:55 pm [GMT-0600] by whoneedselta) ---

--- (Edited on 2/11/2008 5:59 pm [GMT-0600] by whoneedselta) ---

Re: DVD closed captioning as a source of speech
User: kmaclean
Date: 2/14/2008 9:30 pm
Views: 242
Rating: 28

Hi Nick,

Sorry for the delay in getting back to you ...

>a) a place where we can submit free titles (coupled with the url that they are hosted - Possible Audio Sources is just that - yes)

OK, I will set you and bilal up on Trac - we can create a separate Wiki page for this too.

(if anyone else needs access to Trac (VoxForgeDev) please let me know - I had mod_security working perfectly at one time to allow anonymous access, but an upgrade in O/S caused problems with this and comment spammers started taking over the site ...)

>b) a place where dvd2data_set, avi_srt2data_set scripts are hosted

I can give you access to SVN to upload these scripts.

>d) a place where already ripped AND authored data sets are uploaded/hosted 

I think we can do this too, as long as there are no Copyright issues. 

Just to make sure I am clear on this ... when you say data set, are you referring to speech corpora (i.e. speech audio files and their associated text transcriptions)?  How big are these data sets?  

>c) a place where we can submit ripped data-sets for community authoring, that is to say

The VoxForge forum system might be useful for this.

>Point c) is covered by the offline tools, but an online community-authoring tool would definitely rock!

That is a long-term goal - i.e. to permit users to edit/fix transcriptions on-line. 

>2) Fix Timing Bugs

One question, why do you need timing data?  With HTK acoustic model training (I have not worked much with Sphinx, but I assume it works the same in this respect), you only need to segment the audio into 10-15 word sentences, and it figures out the phoneme locations automatically.

Ken 

--- (Edited on 2/14/2008 10:30 pm [GMT-0500] by kmaclean) ---

Re: DVD closed captioning as a source of speech
User: whoneedselta
Date: 2/15/2008 1:31 pm
Views: 261
Rating: 20

1)...when you say data set, are you referring to speech corpora (i.e. speech audio files and their associated text transcriptions)?  How big are these data sets? 

*Yes, I am referring to speech corpora (audio + transcriptions, and maybe + .voc + .language_model)

*Size example (Zeitgeist - http://zeitgeistmovie.com/ - officially available on the net):

-> 1.9 GB (uncompressed 16-bit PCM audio, stereo, 48000 Hz)

.wav storage at this sample rate and channel count is not necessary - we should have a policy on this - most speech databases are mono 16 kHz, if I'm not mistaken

-> 1287 utterances

some of them may be combined (because they occur in the same sentence, i.e. the end time of the first matches the start time of the second)

while others may be discarded due to poor sound quality (loud music, FX, or too much noise)
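
The combining step above could look roughly like the following. This is only a minimal sketch: the `Cue` type, the `merge_adjacent` name, and the `tolerance_ms` parameter are made up for illustration and are not part of any existing dvd2data_set or avi_srt2data_set script.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    """One subtitle entry (hypothetical structure, for illustration only)."""
    start_ms: int  # cue start time in milliseconds
    end_ms: int    # cue end time in milliseconds
    text: str      # subtitle text shown during the cue


def merge_adjacent(cues: List[Cue], tolerance_ms: int = 0) -> List[Cue]:
    """Join consecutive cues when one ends where the next one begins."""
    merged: List[Cue] = []
    for cue in sorted(cues, key=lambda c: c.start_ms):
        # Merge only when the gap (or overlap) is within the tolerance.
        if merged and abs(cue.start_ms - merged[-1].end_ms) <= tolerance_ms:
            prev = merged[-1]
            merged[-1] = Cue(prev.start_ms, cue.end_ms, prev.text + " " + cue.text)
        else:
            merged.append(cue)
    return merged


example = [
    Cue(0, 1000, "This sentence is split"),
    Cue(1000, 2000, "across two subtitles."),
    Cue(2500, 3000, "This one stands alone."),
]
result = merge_adjacent(example)
```

With `tolerance_ms=0` only exact matches are joined; a small positive tolerance would also join cues separated by a few milliseconds, which may or may not be what you want for a given title.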

2)...One question, why do you need timing data?  With HTK acoustic model training (I have not worked much with Sphinx, but I assume it works the same in this respect), you only need to segment the audio into 10-15 word sentences, and it figures out the phoneme locations automatically.

When I was talking about timing bugs, I was referring to the subtitles' timing, which in some cases, although VISUALLY correct, can benefit from a few milliseconds added or subtracted to achieve a high-quality utterance.
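
To make the idea concrete, here is a rough sketch of that kind of adjustment. The function name `pad_cues` and the default `lead_ms`/`tail_ms` values are assumptions chosen for illustration, not measured values; it also assumes the cues are already sorted by start time.

```python
from typing import List, Tuple


def pad_cues(
    cues: List[Tuple[int, int]], lead_ms: int = 100, tail_ms: int = 150
) -> List[Tuple[int, int]]:
    """Widen each (start_ms, end_ms) cue by a few milliseconds.

    The padding never reaches into the original span of a neighbouring
    cue, and a cue is never made shorter than it originally was.
    """
    padded = []
    for i, (start, end) in enumerate(cues):
        prev_end = cues[i - 1][1] if i > 0 else 0
        # Move the start earlier, but not before the previous cue's end (or 0).
        new_start = min(start, max(start - lead_ms, prev_end))
        new_end = end + tail_ms
        if i + 1 < len(cues):
            # Move the end later, but not past the next cue's start.
            new_end = max(end, min(new_end, cues[i + 1][0]))
        padded.append((new_start, new_end))
    return padded


adjusted = pad_cues([(1000, 2000), (2050, 3000), (5000, 6000)])
```

Negative padding (trimming) would need a separate check so that a cue cannot collapse to zero length; that case is left out here.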

3)...OK, I will set you and bilal up on Trac - we can create a separate Wiki page for this too.

For account details (username, password) you can contact me at [email protected].

Thank you very much,

Nick 

--- (Edited on 2/15/2008 1:31 pm [GMT-0600] by whoneedselta) ---

--- (Edited on 2/15/2008 1:33 pm [GMT-0600] by whoneedselta) ---

Re: DVD closed captioning as a source of speech
User: kmaclean
Date: 2/20/2008 10:51 pm
Views: 2738
Rating: 25

Hi Nick,

>*Yes, I am referring to speech corpora (audio + transcriptions, and maybe + .voc + .language_model)

>*Size example (Zeitgeist - http://zeitgeistmovie.com/ - officially available on the net):

>-> 1.9 GB (uncompressed 16-bit PCM audio, stereo, 48000 Hz)

Unfortunately, I am not sure if their license is compatible with GPL ...  from the zeitgeist website:

TERMS: THIS MOVIE IS 'COPYRIGHT- GMP LLC 2008'
HOWEVER, WE ALLOW AND ENCOURAGE IT TO BE DUPLICATED AND GIVEN AWAY FREE.
IT IS NOT FOR RESALE WITHOUT APPROVAL FROM THE CREATOR.
THERE ARE MANY OUT THERE ABUSING THE ALTRUISTIC NATURE OF THIS WORK.
PROFITING FROM THE RESALE OF "ZEITGEIST THE MOVIE" WILL NOT BE TOLERATED.

My understanding is that GPL is not compatible with a license that prevents the sale of the work.  You will likely need to get approval from the author to use the audio in this particular movie in the VoxForge Speech Corpus.  You might tell them that you will be segmenting the audio, and can jumble the segments, making it difficult (though not impossible) to recreate the original.  This is what I did for a recording done by Robert Scott for Mojomove411.

A file this size will likely need to be broken up for uploading via ftp.
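
For a sense of scale, the size of uncompressed PCM audio follows directly from sample rate, channel count, and bit depth. The sketch below is only arithmetic on the figures quoted in this thread (1.9 GB at 48000 Hz, stereo, 16-bit) and ignores the small WAV header; the function name is made up for illustration.

```python
def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bits_per_sample: int = 16) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample // 8


original_rate = pcm_bytes_per_second(48000, 2)  # 48 kHz, stereo, 16-bit
target_rate = pcm_bytes_per_second(16000, 1)    # 16 kHz, mono, 16-bit

# How many times smaller the same audio becomes at the target format.
shrink_factor = original_rate / target_rate

# Rough size of the 1.9 GB example once stored as mono 16 kHz.
estimated_gb = 1.9 / shrink_factor
```

So the same recording at mono 16 kHz would take about one sixth of the space, roughly 0.3 GB instead of 1.9 GB.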

>.wav storage (in this sample rate and number of channels) is not necessary - we should have a policy on this - most speech databases are Mono-16KHz if I'm not mistaken

We've got a Subversion directory that stores the audio in its "original" form (up to 48kHz, 16 bits per sample; though we convert from stereo to mono to save space), and we then downsample to mono 8kHz and 16kHz (both at 16 bits per sample) for creating acoustic models.  We hope to switch to FLAC format some time in the future.
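
As an illustration of the stereo-to-mono and downsampling steps, here is a minimal pure-Python sketch. It is not the script VoxForge actually uses, and the function names are made up for this example. It assumes interleaved signed 16-bit samples and an integer rate ratio (48 kHz to 16 kHz is a factor of 3, 48 kHz to 8 kHz a factor of 6). Averaging blocks of samples is only a crude low-pass filter, so a dedicated resampling tool would give better quality.

```python
from typing import List, Sequence


def downmix_to_mono(interleaved: Sequence[int], channels: int = 2) -> List[int]:
    """Average each frame of interleaved samples into one mono sample."""
    usable = len(interleaved) - len(interleaved) % channels  # drop a partial frame
    return [
        sum(interleaved[i:i + channels]) // channels  # floor division
        for i in range(0, usable, channels)
    ]


def decimate(samples: Sequence[int], factor: int) -> List[int]:
    """Lower the sample rate by an integer factor.

    Each block of `factor` samples is replaced by its average, which acts
    as a very rough anti-aliasing (box) filter before dropping samples.
    """
    usable = len(samples) - len(samples) % factor  # drop a partial block
    return [
        sum(samples[i:i + factor]) // factor
        for i in range(0, usable, factor)
    ]


# Six stereo frames at a nominal 48 kHz (left, right, left, right, ...).
stereo_48k = [100, 200, 300, 400, -100, -300, 0, 0, 10, 20, 30, 50]
mono_48k = downmix_to_mono(stereo_48k)
mono_16k = decimate(mono_48k, 3)  # 48000 / 3 = 16000
```

Reading and writing the actual .wav files could be done with Python's standard `wave` module; that part is omitted to keep the sketch short.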

Ken 

--- (Edited on 2/20/2008 11:51 pm [GMT-0500] by kmaclean) ---
