
General Discussion

svn checkout
User: dandutk
Date: 11/6/2006 12:43 am
Views: 8233
Rating: 12
Where can I svn co the repo I see at

--- (Edited on 11/ 6/2006 12:43 am [GMT-0600] by dandutk) ---

Re: svn checkout
User: kmaclean
Date: 11/6/2006 5:06 pm
Views: 310
Rating: 21

Sorry, I don't have that set up yet.   What are you looking to do?


--- (Edited on 11/ 6/2006 6:06 pm [GMT-0500] by kmaclean) ---

Re: svn checkout
User: dandutk
Date: 11/8/2006 1:25 am
Views: 427
Rating: 31

All right. In the meantime, could you make a tarball of the scripts and lexicon soon and put it on the download page?


--- (Edited on 11/ 8/2006 1:25 am [GMT-0600] by dandutk) ---

Re: svn checkout
User: kmaclean
Date: 11/8/2006 11:52 am
Views: 347
Rating: 31

You can get the lexicon on the Acoustic Model builds page:

[   ] VoxForge_Dictionnary_build726.tgz               

These build scripts aren't really for "general consumption" yet - regardless, I created a new scripts folder on the VoxForge Repository, and created a snapshot of the scripts folder from the dev site:

[   ] VoxForge_Scripts_Snapshot.tgz


P.S. I added ticket #115 so that the request for SVN access doesn't get lost.



--- (Edited on 11/ 8/2006 1:24 pm [GMT-0500] by kmaclean) ---

--- (Edited on 12/26/2006 2:59 pm [GMT-0500] by kmaclean) ---

Re: svn checkout
User: kmaclean
Date: 12/26/2006 1:54 pm
Views: 270
Rating: 17
Added a link to a new VoxForge Repository location that is updated nightly with some of the contents of the Subversion Trunk directory (includes the Scripts directory) on the VoxForge Dev page.

--- (Edited on 12/26/2006 2:55 pm [GMT-0500] by kmaclean) ---

Re: svn checkout
User: trevarthan
Date: 4/5/2007 10:45 pm
Views: 339
Rating: 12

Hey Ken,

Have you thought of any possible solutions for the bandwidth issue? I feel handicapped because I don't have easy access to the "source code" (audio in this case + scripts) that you use to generate the AM.

How large is your working copy currently?

Hmm... my only useful suggestions are SourceForge and/or BitTorrent.

--- (Edited on 2007-04-05 23:45:56 [GMT-0400] by trevarthan) ---

Re: svn checkout
User: kmaclean
Date: 4/6/2007 9:14 am
Views: 360
Rating: 18

Hi Jesse,

I've got a SourceForge account already set up, but it is currently not up to date.  It only includes the scripts and the mfc files (which are much smaller than wav files) - but this is all you need to 'compile' the VoxForge Acoustic Models.  SourceForge is not clear on the maximum storage they allow, so I was thinking that 100Gig in wav audio files (our first release target) would be too big.  I plan to add a nightly update of the scripts and mfc to SourceForge.
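For anyone wondering what the .mfc files actually contain: HTK stores computed features in a binary parameter file that starts with a 12-byte header (frame count, frame period in 100 ns units, bytes per frame, and a parameter-kind code), followed by the frames themselves. The sketch below decodes that header, assuming HTK's default big-endian byte order and uncompressed files; the function name `read_htk_header` is hypothetical, and the qualifier suffixes are listed in bit order, which may not match the order HTK itself prints them in.

```python
import struct

# Base parameter kinds live in the low 6 bits of parmKind (a few common ones).
BASE_KINDS = {0: "WAVEFORM", 1: "LPC", 6: "MFCC", 7: "FBANK", 9: "USER", 11: "PLP"}
# Qualifier flags in the upper bits (octal values as defined by HTK).
QUALIFIERS = [(0o100, "_E"), (0o200, "_N"), (0o400, "_D"), (0o1000, "_A"),
              (0o2000, "_C"), (0o4000, "_Z"), (0o10000, "_K"), (0o20000, "_0")]


def read_htk_header(data: bytes) -> dict:
    """Decode the 12-byte header of an HTK parameter (.mfc) file."""
    n_frames, samp_period, samp_size, parm_kind = struct.unpack(">iihh", data[:12])
    base_code = parm_kind & 0o77
    base = BASE_KINDS.get(base_code, f"KIND{base_code}")
    kind = base + "".join(suffix for bit, suffix in QUALIFIERS if parm_kind & bit)
    return {
        "frames": n_frames,
        "frame_period_s": samp_period * 100e-9,  # sampPeriod is in 100 ns units
        "bytes_per_frame": samp_size,
        "kind": kind,
    }
```

Since each frame is only `bytes_per_frame` bytes every `frame_period_s` seconds, the file size is roughly `12 + frames * bytes_per_frame`, which is why the MFCC files come out much smaller than the original wav audio.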

Regardless, all the scripts and Audio are now updated nightly (and have been for a while now...) on the VoxForge Repository server (a 1&1 account) - it mimics the directory structure in SVN, but uses tarballs for the audio (the tarballs do not include svn data).   I have loads of bandwidth (2TB/month), so that is the preferred download method.

The Downloads directory ( looks like this:

[DIR] Nightly_Builds/     05-Apr-2007 04:19  -
[DIR] Tags/               29-Dec-2006 16:35  -
[DIR] Trunk/              26-Dec-2006 13:02  -
[DIR] builds/             16-Oct-2006 11:19  -
[DIR] large_audio_files/  13-Mar-2007 23:34  -
[DIR] mp3_test/           09-Mar-2007 00:32  -
[DIR] software/           29-Mar-2007 13:53  -
[DIR] speech_corpus/      17-Oct-2006 11:06  -

For Audio, you would be interested in the Trunk/Audio directory (updated nightly from the VoxForge Website SVN server):

[DIR] MFCC/      13-Dec-2006 22:49  -
[DIR] Main/      13-Dec-2006 22:48  -
[DIR] Original/  13-Dec-2006 22:48  -

For the Scripts, you would be interested in the Trunk/Scripts directory (updated nightly from the VoxForge Website SVN server):

[   ] AudioBook_scripts.tgz    05-Apr-2007 04:19   110M
[   ] Audio_scripts.tgz        05-Apr-2007 04:18    34k
[   ] HTK.tgz                  05-Apr-2007 04:18  20.2M
[   ] Metrics_scripts.tgz      01-Feb-2007 05:49    11k
[   ] Mirroring_scripts.tgz    05-Apr-2007 04:18    20k
[   ] Testing_scripts.tgz      09-Mar-2007 03:57   485k
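If you want to script against these directory listings (for example, to see which tarballs changed overnight), rows in the format shown above can be parsed into (name, timestamp, size) tuples. This is a minimal sketch assuming the listing rows look exactly like the ones above; `parse_listing` is a hypothetical helper, and the k/M/G suffixes are treated as binary multiples, so sizes are only approximate (the listing rounds them anyway).

```python
import re
from datetime import datetime

# "[   ] HTK.tgz  05-Apr-2007 04:18  20.2M" or "[DIR] Tags/  29-Dec-2006 16:35  -"
ROW = re.compile(
    r"^(?:\[[^\]]*\]\s+)?(\S+)\s+(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})\s+(\S+)\s*$"
)
UNITS = {"": 1, "k": 1024, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert '110M' / '34k' / '-' into an approximate byte count (None for dirs)."""
    if text == "-":
        return None
    m = re.fullmatch(r"([\d.]+)([kKMG]?)", text)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(m.group(1)) * UNITS[m.group(2)])


def parse_listing(lines):
    """Return (name, modified, size_bytes_or_None) for each recognisable row."""
    entries = []
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            name, stamp, size = m.groups()
            entries.append((name, datetime.strptime(stamp, "%d-%b-%Y %H:%M"), parse_size(size)))
    return entries
```

Lines that do not match the row pattern (headings, blank lines) are simply skipped.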

I could also set you up with SVN access to the VoxForge SVN server if you would prefer (it would take some work to get WebDAV going), but even so, I would prefer that you only check out the scripts and maybe the MFC files... downloading a big chunk of wav files would noticeably slow the server down for other users (I have not had a chance to look at bandwidth limitation by user or port yet...), and I only have 100Gig of monthly bandwidth on this server.
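One common way to do the per-user bandwidth limitation mentioned above is a token bucket: each user gets a bucket that refills at a fixed rate, and a chunk of data is only sent when enough tokens are available. This is a minimal sketch of that general technique, not something the VoxForge server is known to run; the class names and the rate/capacity values are hypothetical, and a real deployment would more likely use a rate-limiting module in the web server itself.

```python
import time


class TokenBucket:
    """Allow roughly `rate` bytes/second, with bursts of up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_send(self, nbytes):
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should wait and retry later


class PerUserLimiter:
    """One independent token bucket per user (or per client IP, port, ...)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, user, nbytes):
        bucket = self.buckets.get(user)
        if bucket is None:
            bucket = self.buckets[user] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.try_send(nbytes)
```

Because each user has a separate bucket, one person pulling a large batch of wav files drains only their own allowance and does not starve everyone else.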


--- (Edited on 4/ 6/2007 10:14 am [GMT-0400] by kmaclean) ---

Re: svn checkout
User: trevarthan
Date: 4/6/2007 10:30 am
Views: 279
Rating: 20

OK, so since the high bandwidth server isn't under our control we probably can't run SVN or rsync on it, right?

What if we broke the project into two separate svn projects. Source code, scripts, etc go into one project, and audio into the other.

Then we can open up the source code project to developer access without worrying about bandwidth and we can make the audio project downloadable from the high bandwidth server via something like `wget --mirror`? This way we get the following advantages:

1.) We can still get incremental updates of the audio in its native form without downloading the whole thing all over again (wget runs on win32 too, so it's multiplatform, if a bit out of the mainstream)

2.) We won't bog down voxforge's bandwidth with audio downloads

3.) We can still develop our scripts in a sane SVN environment.
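The incremental updates in point 1 rest on comparing timestamps, which is roughly what wget's timestamping (`-N`, implied by `--mirror`) does: a file is fetched only if it is missing locally, the server copy is newer, or the sizes differ. The sketch below shows that decision logic only (no networking); `needs_download` and `plan_updates` are hypothetical names, the remote entries are assumed to be (name, modified, size) tuples taken from the server's directory listing, and both timestamps are assumed to be in the same time zone.

```python
import os
from datetime import datetime


def needs_download(local_path, remote_mtime, remote_size=None):
    """True if the remote file is missing locally, newer, or a different size."""
    if not os.path.exists(local_path):
        return True  # never fetched
    local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path))
    if remote_mtime > local_mtime:
        return True  # server copy is newer
    # Only compare sizes when the exact remote size is known.
    return remote_size is not None and remote_size != os.path.getsize(local_path)


def plan_updates(remote_files, local_dir):
    """Names of files (not directories) that should be (re)downloaded."""
    return [
        name
        for name, mtime, size in remote_files
        if not name.endswith("/")  # skip directory entries
        and needs_download(os.path.join(local_dir, name), mtime, size)
    ]
```

For this to work across runs, the downloader has to set each local file's modification time to the server's timestamp after fetching it (wget does this), otherwise the local time would simply be the download time.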

Sounds like a win-win-win to me. We could either provide the audio in a non-gzipped format on the high bandwidth server, or we could continue providing it the way it is now and write a script to automatically unpack it into a usable dir structure.
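A script to unpack the gzipped tarballs into a usable directory structure could look something like the sketch below. It assumes the downloaded tarballs already sit in a directory tree that mirrors the server (for example `Trunk/Audio/Main/...`) and that each tarball contains its own top-level directory; `unpack_all` is a hypothetical name. It also refuses archive members that are links or that would escape the target directory via `../`.

```python
import tarfile
from pathlib import Path


def _is_safe(member, target):
    """Accept only plain files/dirs whose paths stay inside target."""
    if not (member.isfile() or member.isdir()):
        return False  # skip symlinks, hard links, device files
    root = target.resolve()
    path = (root / member.name).resolve()
    return path == root or root in path.parents


def unpack_all(tarball_dir, dest_dir):
    """Extract every *.tgz / *.tar.gz under tarball_dir into dest_dir.

    Each archive is extracted into the directory matching its own relative
    location, e.g. Trunk/Audio/Main/x.tgz -> dest_dir/Trunk/Audio/Main/.
    Returns the number of archive members extracted.
    """
    src, dest = Path(tarball_dir), Path(dest_dir)
    archives = sorted(list(src.rglob("*.tgz")) + list(src.rglob("*.tar.gz")))
    extracted = 0
    for archive in archives:
        target = dest / archive.parent.relative_to(src)
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            members = [m for m in tar.getmembers() if _is_safe(m, target)]
            # Newer Pythons also support the stricter built-in "data" filter.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(target, members=members, **extra)
            extracted += len(members)
    return extracted
```

Usage would be along the lines of `unpack_all("downloads/Trunk", "corpus/Trunk")` after each nightly download; rerunning it simply overwrites the previously extracted files.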

What do you think?

--- (Edited on 2007-04-06 11:30:05 [GMT-0400] by trevarthan) ---

Re: svn checkout
User: kmaclean
Date: 4/8/2007 9:27 pm
Views: 304
Rating: 14

The public VoxForge Subversion repository is now located here: 

The corresponding Trac site is now located here:

You can checkout the source code for the Scripts used to generate the VoxForge Acoustic Models using the following command:

    $ svn checkout


  • The VoxForge Speech Corpus must be downloaded from the VoxForge Speech Corpus directory (which is located on the VoxForge Repository server and which is updated nightly as a series of gzipped tarfiles).
  • The VoxForge Lexicon must be downloaded from the Lexicon Directory on the VoxForge Repository server.
  • The Master Prompts file used to generate the VoxForge Acoustic Models is located here.
  • The 'VoxForge/Trunk/AcousticModels' directory is used for development purposes only, and is not accurate or up to date.  For the most current Acoustic Models, see the Nightly Builds Directory.


--- (Edited on 4/ 8/2007 10:27 pm [GMT-0400] by kmaclean) ---

--- (Edited on 4/12/2007 11:31 am [GMT-0400] by kmaclean) ---

--- (Edited on 5/3/2007 2:18 pm [GMT-0400] by kmaclean) ---

Re: svn checkout
User: gongdusheng
Date: 4/10/2007 5:06 am
Views: 315
Rating: 2
Sorry I hammered your server downloading audio files. I'm on Taiwan time, though, so hopefully not too many people noticed. I'm trying to use your data with Sphinx4, but I think the amount of data I downloaded two weeks ago wasn't sufficient. I'll try the 1GB of data I just downloaded before trying to get the whole shebang. If I can cook up a script to generate the acoustic and language models, I'll be sure to contribute it. Would you be willing to build Sphinx4 models in addition to Julius models?

--- (Edited on 4/10/2007 5:06 am [GMT-0500] by gongdusheng) ---