download french corpus
User: Marion
Date: 5/6/2009 9:45 am
Views: 19710
Rating: 22


I was looking at the submitted French speech data, and I saw that only part of it is available for download in the VoxForge repository. The rest seems to be in the upload directory, which is access-restricted, so the only way to recover the whole corpus is to fetch it manually from the download page.

Is there a specific reason for that, or is there a way to get the corpus easily? I saw this post where Ken says:

Unfortunately I have not moved any German audio to subversion.

However, here is quick and dirty way to get the audio:

1.  $ wget -r -l2 -A "ralfherzog*"

this will create a directory called

2. Search the directory for *.zip files using GNOME's search tool, and drag the results to the directory you want.

I'm not a wget expert, but I don't think this will fetch files that are outside the specified directory. Any help?

Thanks a lot!


Re: download french corpus
User: nsh
Date: 5/6/2009 5:09 pm
Views: 196
Rating: 25

Hello Marion

> Is there a specific reason for that

Ken is a bit busy nowadays :) Let's not distract him.

I think

wget -r -l2

will just work for you.

Re: download french corpus
User: Marion
Date: 5/7/2009 5:31 am
Views: 386
Rating: 22

That's what I did, but as I said before, most of the corpus is in the upload directory, and to access it you need the complete address of each zip file, so you can't just do

$ wget -r -l3

I found a solution using WinHTTrack, but wget would have worked too; you just have to download a lot of extra files and then delete everything except the zip files you want.
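
For the cleanup step (keeping only the zip files from a recursive download), a minimal shell sketch might look like the following. The function name collect_zips and the directory names in the usage line are hypothetical, chosen only for illustration. It assumes a POSIX shell with find and cp, and that the download has already been mirrored into a local directory.

```shell
#!/bin/sh
# collect_zips SRC DEST
# Copy every .zip file found anywhere under SRC into DEST (flat layout).
# Hypothetical helper for illustration; not part of wget or VoxForge.
collect_zips() {
    src=$1
    dest=$2
    # Create the destination directory if it does not exist yet.
    mkdir -p "$dest"
    # -type f skips directories; cp runs once per matching file.
    find "$src" -type f -name '*.zip' -exec cp {} "$dest"/ \;
}
```

For example, `collect_zips ./mirror ./french-zips` would copy every zip file under ./mirror into ./french-zips, after which the mirror directory can be deleted. DEST should be outside SRC, and files that share a name in different subdirectories would overwrite each other, so this only suits archives with unique names.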

I just wanted to point out that not all French data is in the repository, but I perfectly understand that you don't have time to process it all!

Thanks anyway for the answer and for this project. It's great!


Re: download french corpus
User: kmaclean
Date: 5/25/2009 9:24 pm
Views: 451
Rating: 24

Hi Marion,

>I just wanted to point out that not all French data is in the repository,

All the French submissions are now in the repository:


Re: download french corpus
User: samuel buffet
Date: 7/2/2009 11:32 am
Views: 215
Rating: 21

Hi Ken,

Oops, I wanted to download the French corpus today, but I made a mistake and downloaded much more than expected.

I hope I didn't cause any trouble for your server.


Sorry about that.


Re: download french corpus
User: kmaclean
Date: 7/2/2009 1:06 pm
Views: 1164
Rating: 26

Hi Samuel,

No worries, that is why the VoxForge repository is on a separate server from the website front-end.


there is someone? french contributions
User: zeus
Date: 3/21/2012 5:40 pm
Views: 239
Rating: 23

I am a French amateur roboticist and I would like to include voice recognition in a robot.
I wonder if the project was still open? (before contributions)
I can contribute to several tens of hours and use my network, however, before you can use my network I will need to show concretely that his works a minimum.
If I make a contribution of 10 hours, will it be possible to make a compelling enough demo?
PS: Sorry, I do not speak English very well.

Re: there is someone? french contributions
User: kmaclean
Date: 3/22/2012 12:03 am
Views: 221
Rating: 26

>I wonder if the project was still open? (before contributions)

I don't understand what you are asking...

> I will need to show concretely that his works a minimum.

Please clarify...

If it is easier for you, post in French; Google Translate can help me with my rudimentary knowledge of French...

Re: there is someone? french contributions
User: zeus
Date: 3/22/2012 10:22 am
Views: 279
Rating: 22

There are not many messages, and no new files have been added since last year.

I wondered if there were still people working here, because recording audio takes time and I do not want that time to be wasted.

I would like to work in 2 steps.

Step 1

Speak sentences and display them in a text file (even if there are many errors, the goal is to have a small result to motivate friends to help me).

Step 2

I do not know yet how to do it, but I would like to retrieve the spoken sentences with word or syllable probabilities, so as to be able to form coherent sentences.

I am thinking of using pocketsphinx installed on a (dedicated) Raspberry Pi. For the moment I do not know at all how to use pocketsphinx or how acoustic models are used. I hope to be able to make enough recordings to have a French acoustic model to validate step 1 next month.


Re: there is someone? french contributions
User: kmaclean
Date: 3/22/2012 10:50 pm
Views: 251
Rating: 21

>I wondered if there were still people working here,

Yes, there is a backlog in speech processing (basically downsampling the speech to 16 kHz, 16-bit), but speech is still being collected.

>I think using pocketsphinx installed on Raspberry Pi

Not sure that a Raspberry Pi can power speech recognition... best to ask on the PocketSphinx forum.