Corpora downloading
User: giuliopaci
Date: 10/29/2010 9:03 am
Views: 6908
Rating: 15

Hi to all!

I'm interested in getting the audio files (and transcripts) for several languages. What's the preferred method to obtain them?

I'd like to have subversion read-only access if possible, so that I can update a working directory. Is that possible? Is there a similar alternative?

Cheers, Giulio.

Re: Corpora downloading
User: kmaclean
Date: 10/29/2010 8:09 pm
Views: 220
Rating: 14

>What's the preferred method to obtain them?

http://www.repository.voxforge1.org/downloads/

>I'd like to have subversion read-only access if possible,

The VoxForge repository is the best place to get the audio, since it has no bandwidth limits.

Re: Corpora downloading
User: giuliopaci
Date: 11/2/2010 6:07 am
Views: 180
Rating: 14

>>What's the preferred method to obtain them?

>http://www.repository.voxforge1.org/downloads/

>>I'd like to have subversion read-only access if possible,

>The VoxForge repository is the best place to get the audio, since it has no bandwidth limits.

So, is it fine to run something like the following command line periodically?

wget -I downloads -Nkr -l inf http://www.repository.voxforge1.org/downloads/

Do you know a better way to mimic "svn update"? This command line always downloads 7 HTML files for each directory, even if nothing has changed in the repository.

Re: Corpora downloading
User: kmaclean
Date: 11/2/2010 7:41 pm
Views: 160
Rating: 16

>wget -I downloads -Nkr -l inf http://www.repository.voxforge1.org/downloads/

The -N parameter is the most important one: after the initial download, it makes wget fetch only files that have been updated.
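
For a periodic run, the command above could be wrapped in a small script, roughly as sketched here. The function name `mirror_voxforge`, the `DRY_RUN` variable and the `--reject 'index.html*'` pattern are assumptions for illustration and do not come from this thread. The reject pattern is one common way to avoid keeping auto-generated directory listing pages (wget may still fetch them to follow links), and `-k` is left out because no HTML is kept locally.

```shell
#!/bin/sh
# Sketch only: periodic mirror of the downloads tree.
# mirror_voxforge, DRY_RUN and the --reject pattern are illustrative
# assumptions, not something confirmed in the thread.

BASE_URL="http://www.repository.voxforge1.org/downloads/"

mirror_voxforge() {
    # -I downloads   restrict recursion to the downloads directory (as in the original command)
    # -N             timestamping: skip files that are not newer than the local copy
    # -r -l inf      recurse with no depth limit
    # --reject       do not keep directory listing pages such as index.html*
    set -- wget -I downloads -N -r -l inf --reject 'index.html*' "$BASE_URL"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"         # run wget with the arguments built above
    fi
}
```

Setting `DRY_RUN=1` before calling `mirror_voxforge` prints the command without touching the network, which is a cheap way to check the flags before putting the call in a cron job.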

Why do you want to download all languages?

Re: Corpora downloading
User: giuliopaci
Date: 11/3/2010 4:03 am
Views: 2916
Rating: 16

I don't want to download all languages, at least not now. I think I will use similar command lines to download a few languages.

I'm mostly interested in male/female classification and speech/non-speech classification, so I'm thinking about downloading a few hours of audio samples for each language.

I would also like to investigate automatic language recognition, but I don't think I will find time for this.
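
One way to fetch a limited amount of data for only a few languages might look like the sketch below. The subdirectory names in `LANG_DIRS` and the `QUOTA` value are hypothetical placeholders (the real directory names have to be read from the repository listing), and the function names are made up for this example. wget's `-Q` quota is measured in bytes, not hours of audio, so it is only a rough cap.

```shell
#!/bin/sh
# Sketch only: bounded download for a few language subdirectories.
# LANG_DIRS, QUOTA and the function names are hypothetical placeholders.

BASE_URL="http://www.repository.voxforge1.org/downloads"
LANG_DIRS="it de fr"   # hypothetical subdirectory names, check the real listing
QUOTA="500m"           # hypothetical per-language cap in megabytes

fetch_language() {
    # $1 is one language subdirectory under BASE_URL.
    # -N   skip files that are not newer than the local copy
    # -np  never ascend to the parent directory
    # -Q   stop the recursive retrieval once the quota is exceeded
    set -- wget -N -r -l inf -np -Q "$QUOTA" "$BASE_URL/$1/"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"
    fi
}

fetch_languages() {
    # Run one wget invocation per language so the quota applies to each.
    for dir in $LANG_DIRS; do
        fetch_language "$dir"
    done
}
```

With `DRY_RUN=1` set, calling `fetch_languages` only prints one wget command per language, so the URLs can be checked against the repository before anything is downloaded.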
