Recently I tried to start retraining the Sphinx models to pick up the recent improvements. The hardest step in training is actually data preparation: putting the files into the right folders and organizing them in the proper format.
The first issue I've run into is the following: some of the archives available for download have the submission name as the top folder:
Others have etc and wav directly as top folders, like
This causes trouble for the scripts, and it would be better to avoid it. What's the best way to fix that, should we just modify the script and repackage everything?
--- (Edited on 1/20/2010 02:20 [GMT+0300] by nsh) ---
>What's the best way to fix that, should we just modify the script and
The problem originates with the move from a set of scripts containing a hideous combination of Perl and make commands (to execute Linux Gzip/Tar commands), to a Perl script that only uses the Perl Tar/GZip/Zip packages for creating tar files (revision 2691) on April 19, 2009.
Therefore anything before April 19, 2009 has the submission name as a root directory (and etc & wav as subdirectories), whereas anything on or after that date has etc and wav as root directories.
I am assuming that this makes things a bit more complicated if you want to extract a bunch of files all at once in the same directory, so the preferred approach would be to have the submission name as the root directory for all submissions...
Should not be a big change, but the uploading could take a long time (a few days to a week at a throttled bandwidth so as not to kill response time on the VoxForge webserver, and I'll have to watch my upload bandwidth limits... might have to split it across Jan/Feb).
Please let me know if this makes sense,
--- (Edited on 1/20/2010 10:02 pm [GMT-0500] by kmaclean) ---
--- (Edited on 1/20/2010 10:51 pm [GMT-0500] by kmaclean) ---
As a quick work-around to this issue: use Nautilus to extract the tarfiles that don't have a root directory... Nautilus will create one for you. You can do a multiple select and extract (right-click) for multiple tarfiles.
I can't figure out a way to do this from the command line using the tar command (i.e. something like "tar -zcf"), so I will fix the ones on the repository server using a script (so they will be consistent), and rsync them with the acoustic model creation server some other time.
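For anyone who wants to script the same work-around, here is a minimal Python sketch using the standard tarfile module. It assumes the submission name is the archive file name minus its extension, and that the archives are trusted. The function names are only illustrative and are not part of any VoxForge script.

```python
import tarfile
from pathlib import Path

# Assumption: archives use one of these extensions.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar")


def submission_name(archive_path):
    """Derive the submission name from the archive file name (an assumption)."""
    name = Path(archive_path).name
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def top_level_names(archive_path):
    """Return the set of first path components of all archive members."""
    with tarfile.open(archive_path, "r:*") as tar:
        return {
            Path(member.name).parts[0]
            for member in tar.getmembers()
            if Path(member.name).parts  # skip a bare "." entry
        }


def extract_with_root(archive_path, dest_dir):
    """Extract so the files always end up under dest_dir/<submission name>/."""
    name = submission_name(archive_path)
    dest_dir = Path(dest_dir)
    if top_level_names(archive_path) == {name}:
        # Old-style archive: the root directory is already inside.
        target = dest_dir
    else:
        # New-style archive: etc and wav are at the top, so add the root.
        target = dest_dir / name
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as tar:
        # extractall trusts member paths; only use it on archives you trust.
        tar.extractall(target)
    return dest_dir / name
```

Calling extract_with_root on every downloaded archive with the same destination directory should give one folder per submission, whichever packaging script produced the archive.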
--- (Edited on 1/27/2010 11:53 pm [GMT-0500] by kmaclean) ---
Please let me know when it's done; I need to proceed with training.
--- (Edited on 1/31/2010 05:16 [GMT+0300] by nsh) ---
Ok, here is the next problem. The following files:
Have ../../../Audio... in their PROMPTS file. It would be nice to repack them.
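To check an extracted copy of the corpus for the same problem, something like the following sketch could be used. It assumes each PROMPTS line starts with a file identifier followed by the transcription, and it only reports lines whose identifier begins with "../"; it does not try to guess what the correct path should be. The function name is made up for illustration.

```python
from pathlib import Path


def find_relative_prompt_ids(corpus_root):
    """Map each PROMPTS file to the line numbers whose first field starts with '../'."""
    hits = {}
    for prompts in sorted(Path(corpus_root).rglob("PROMPTS")):
        if not prompts.is_file():
            continue
        bad_lines = []
        with prompts.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                fields = line.split()
                # Assumption: the first field is the audio file identifier.
                if fields and fields[0].startswith("../"):
                    bad_lines.append(lineno)
        if bad_lines:
            hits[prompts] = bad_lines
    return hits
```

The result is a dictionary keyed by PROMPTS path, so printing its keys gives the list of submissions that would need repacking.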
--- (Edited on 2/7/2010 04:01 [GMT+0300] by nsh) ---
>Ok, here is the next problem. The following files:
>Have ../../../Audio... in their PROMPTS file. It would be nice to repack
See ticket 21 for details.
--- (Edited on 2/10/2010 3:37 pm [GMT-0500] by kmaclean) ---
Oh great! Thank you so much.
Here is the next problem. The following files have prompts instead of PROMPTS
For script simplicity and consistency it would be nice to convert them to upper case. I wanted to do it myself, but gave up on checking out a few gigabytes from svn.
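On an already extracted local copy the renaming itself is simple. A minimal sketch, assuming the only problem is the case of the file name and that the tree is not an svn working copy (inside one, svn mv would be the way to keep the history):

```python
from pathlib import Path


def uppercase_prompts(corpus_root, dry_run=True):
    """Rename files such as 'prompts' or 'Prompts' to 'PROMPTS'.

    Returns the list of (old_path, new_path) pairs; with dry_run=True
    nothing is changed on disk.
    """
    renamed = []
    # sorted() materializes the listing before anything is renamed.
    for path in sorted(Path(corpus_root).rglob("*")):
        if not path.is_file():
            continue
        if path.name == "PROMPTS" or path.name.upper() != "PROMPTS":
            continue
        target = path.with_name("PROMPTS")
        # Never overwrite a different, already existing PROMPTS file.
        if target.exists() and not target.samefile(path):
            continue
        if not dry_run:
            path.rename(target)
        renamed.append((path, target))
    return renamed
```

Running it first with the default dry_run=True shows what would be renamed without touching anything.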
--- (Edited on 2/11/2010 01:44 [GMT+0300] by nsh) ---
Ken, please at least check the svn access setup, because I can't commit the new model to svn. I just get a 403 error.
--- (Edited on 2/15/2010 02:15 [GMT+0300] by nsh) ---
>Ken, please at least check the svn access setup, because I can't commit
>the new model to svn. I just get a 403 error.
I'm having some problems committing the last changes you requested... it should be fixed soon.
--- (Edited on 2/14/2010 10:05 pm [GMT-0500] by kmaclean) ---