Acoustic Model Discussions

Packing of the audio files
User: nsh
Date: 1/19/2010 5:20 pm
Views: 10321
Rating: 4

Hi Ken

Recently I tried to start retraining the Sphinx models to pick up the recent improvements. The hardest step in training is actually preparing the data: putting it into the right folders and organizing it in the proper format.

The first issue I've run into is the following: some archives available for download have the submission name as the top-level folder:

Aaron-20080318-liy

Aaron-20080318-liy/etc

Aaron-20080318-liy/wav


Others have etc and wav directly as top-level folders, for example:

AdrianMcNear-20091016-psv

This creates trouble for scripts, which it would be better to avoid. What's the best way to fix that, should we just modify the script and repackage everything?

 

 

--- (Edited on 1/20/2010 02:20 [GMT+0300] by nsh) ---

Re: Packing of the audio files
User: kmaclean
Date: 1/20/2010 9:02 pm
Views: 127
Rating: 4

Hi nsh,

>What's the best way to fix that, should we just modify the script and repackage everything?

The problem originates with a change made on April 19, 2009 (revision 2691): the move from a set of scripts containing a hideous combination of Perl and make commands (to execute Linux Gzip/Tar commands) to a Perl script that only uses the Perl Tar/GZip/Zip packages for creating tar files.

Therefore anything before April 19, 2009 has the submission name as a root directory (and etc & wav as subdirectories), whereas anything on or after that date has etc and wav as root directories. 

I am assuming that this makes things a bit more complicated if you want to extract a bunch of files all at once in the same directory, so the preferred approach would be to have the submission name as the root directory for all submissions... 

Should not be a big change, but the uploading could take a long time (a few days to a week at a throttled bandwidth so as not to kill response time on the VoxForge webserver, and I'll have to watch my upload bandwidth limits... might have to split it across Jan/Feb).

Please let me know if this makes sense,

thanks,

Ken

--- (Edited on 1/20/2010 10:02 pm [GMT-0500] by kmaclean) ---

--- (Edited on 1/20/2010 10:51 pm [GMT-0500] by kmaclean) ---

Re: Packing of the audio files
User: kmaclean
Date: 1/27/2010 10:53 pm
Views: 121
Rating: 5

As a quick work-around to this issue: use Nautilus to extract the tarfiles that don't have a root directory... Nautilus will create one for you.  You can do a multiple select and extract (right-click) for multiple tarfiles.

I can't figure out a way to do this from the command line using the tar command (i.e. something like "tar -zxf"), so I will fix the ones on the repository server using a script (so they will be consistent), and rsync them with the acoustic model creation server some other time.
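For anyone who needs to script the extraction in the meantime, here is a minimal sketch of the idea in Python. It assumes the submission name is the archive file name up to the first dot (a hypothetical convention, not something confirmed above), and the function names are made up for illustration. Archives that already contain a root directory are extracted as they are; the others are extracted into a newly created directory named after the submission. With plain tar, creating the directory first and extracting with `-C <directory>` should give the same result.

```python
import tarfile
from pathlib import Path, PurePosixPath


def top_level_names(tar):
    """Return the set of first path components of all archive members."""
    names = set()
    for member in tar.getmembers():
        # PurePosixPath drops "." components; skip a leading "/" as well.
        parts = [p for p in PurePosixPath(member.name).parts if p != "/"]
        if parts:
            names.add(parts[0])
    return names


def extract_with_root(archive, dest):
    """Extract `archive` under `dest`, adding a root directory if it lacks one."""
    archive = Path(archive)
    # Assumption: submission name = file name up to the first dot.
    submission = archive.name.split(".")[0]
    with tarfile.open(archive, "r:*") as tar:
        has_root = top_level_names(tar) == {submission}
        # Archives without a root directory get one named after the submission.
        target = Path(dest) if has_root else Path(dest) / submission
        target.mkdir(parents=True, exist_ok=True)
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(target, **kwargs)
    return Path(dest) / submission
```

Calling `extract_with_root` for every downloaded archive with the same `dest` should leave one directory per submission, each with its own etc and wav subdirectories, regardless of how the archive was packed.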

Ken

--- (Edited on 1/27/2010 11:53 pm [GMT-0500] by kmaclean) ---

Re: Packing of the audio files
User: nsh
Date: 1/30/2010 8:16 pm
Views: 988
Rating: 5

Thanks Ken

Please let me know when it is done; I need to proceed with training.

 

--- (Edited on 1/31/2010 05:16 [GMT+0300] by nsh) ---

Re: Packing of the audio files
User: kmaclean
Date: 2/2/2010 8:18 pm
Views: 119
Rating: 5

>Please update me when it will be done, I need to proceed with training.

Completed for all languages (see Ticket #473).

Ken

--- (Edited on 2/2/2010 9:18 pm [GMT-0500] by kmaclean) ---

Re: Packing of the audio files
User: nsh
Date: 2/6/2010 7:01 pm
Views: 1224
Rating: 5

Ok, here is the next problem. The following files:

 

atterer-01202007-a
atterer-01202007-b
atterer-02052007-vf5
atterer-21012007-vf22
granthulbert-ar-01032007
granthulbert-cc-01032007
granthulbert-rp-01032007
ilopezc-20060321-rainbow
jaiger-20061231-vf7
jaiger-20061231-vf8
jaiger-20070103-vf10
jaiger-20070103-vf9
jaiger-20070209-vf11
jaiger-20070209-vf12
jaiger-20070209-vf13
jaiger-20070209-vf14
jaiger-20070209-vf15
jaiger-vf16-20070214
jaiger-vf17-20070214
jaiger-vf18-20070214
jaiger-vf19-20070220
jaiger-vf20-20070220
jimmowatt-20070308-hoe
kmaclean-12062006
kmaclean-12062006-a
robin-20030302-vf10
robin-20070201
robin-20070211
robin-20070212
robin-20070212-vf1
robin-20070212-vf2
robin-20070217-vf3
robin-20070224-vf4
robin-20070224-vf5
robin-20070224-vf6
robin-20070226-vf7
robin-20070301-vf8
robin-20070301-vf9
robin-20070302-vf11
robin-20070310-vf12
robin-20070310-vf13
robin-20070326-vf14
robin-20070326-vf15
robin-20070330-vf16
robin-20070330-vf17
robin-20070401-vf18
robin-20070401-vf19
robin-20070402-vf20
robin-20070405-vf21
robin-20070409-vf22
robin-20070411-vf23
trevarthan-070403
trevarthan-070403-vf3


have ../../../Audio... paths in their PROMPTS file. It would be nice to repack them.
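A check like this can be scripted. The sketch below is only an illustration: it assumes every submission has been extracted to `<corpus_root>/<submission>/etc/PROMPTS`, and the function name and the `marker` parameter are hypothetical. It reports each submission whose PROMPTS file contains a `../` relative path anywhere in a line.

```python
from pathlib import Path


def submissions_with_relative_prompts(corpus_root, marker="../"):
    """Return names of submissions whose PROMPTS file contains `marker`."""
    bad = []
    # Assumed layout: <corpus_root>/<submission>/etc/PROMPTS
    for prompts in sorted(Path(corpus_root).glob("*/etc/PROMPTS")):
        text = prompts.read_text(encoding="utf-8", errors="replace")
        if any(marker in line for line in text.splitlines()):
            # etc/PROMPTS -> etc -> submission directory
            bad.append(prompts.parent.parent.name)
    return bad
```

Running it again after a repack and getting an empty list would be a quick way to confirm that no affected submissions remain.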

 

--- (Edited on 2/7/2010 04:01 [GMT+0300] by nsh) ---

Re: Packing of the audio files
User: kmaclean
Date: 2/10/2010 2:37 pm
Views: 126
Rating: 7

>Ok, here is the next problem. The following files:

>Have ../../../Audio... in their PROMPTS file. It would be nice to repack them.

Fixed.

See ticket 21 for details.

Ken

--- (Edited on 2/10/2010 3:37 pm [GMT-0500] by kmaclean) ---

Re: Packing of the audio files
User: nsh
Date: 2/10/2010 4:44 pm
Views: 1078
Rating: 4

Oh great! Thank you so much


Here is the next problem. The following files are named prompts instead of PROMPTS:

./csawtell-10112006/etc/prompts
./jaiger-10212006-NR/etc/prompts
./jaiger-11052006/etc/prompts
./jaiger-11282006/etc/prompts
./Adminvox-05232006/etc/prompts
./an4/etc/prompts
./crxssi-10112006/etc/prompts
./jaiger-12032006-5/etc/prompts
./kmaclean-06122006/etc/prompts
./kmaclean-06152006/etc/prompts
./Adminvox-05262006/etc/prompts
./jaiger-10212006/etc/prompts
./jaiger-12032006-4/etc/prompts
./cmu_us_bdl_arctic/etc/prompts
./kmaclean-06092006/etc/prompts
./jaiger-12032006-3/etc/prompts
./jaiger-12032006-6/etc/prompts
./cmu_us_jmk_arctic/etc/prompts
./jaiger-10272006/etc/prompts
./cmu_com_kal_ldom/etc/prompts
./cmu_us_rms_arctic/etc/prompts
./cmu_us_ksp_arctic/etc/prompts
./cmu_us_slt_arctic/etc/prompts
./cmu_us_awb_arctic/etc/prompts
./cmu_us_clb_arctic/etc/prompts


For script simplicity and consistency it would be nice to rename them to upper case. I wanted to do it myself, but gave up on checking out a few gigabytes from svn.
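On a local extracted copy, the rename could be sketched roughly as follows. This is only an illustration, assuming the `<corpus_root>/<submission>/etc/` layout; the function name and the `dry_run` flag are hypothetical. In a Subversion working copy the rename would need `svn mv` rather than a plain file rename.

```python
from pathlib import Path


def uppercase_prompts(corpus_root, dry_run=True):
    """Rename <submission>/etc/prompts to <submission>/etc/PROMPTS.

    Returns the names of the submissions that were (or would be) renamed.
    """
    renamed = []
    for etc in sorted(Path(corpus_root).glob("*/etc")):
        names = {p.name for p in etc.iterdir()}
        # Only touch directories that have the lowercase file and no PROMPTS yet.
        if "prompts" in names and "PROMPTS" not in names:
            if not dry_run:
                (etc / "prompts").rename(etc / "PROMPTS")
            renamed.append(etc.parent.name)
    return renamed
```

With the default `dry_run=True` the function only reports what it would rename, so the resulting list can be compared with the one above before any file is changed.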

--- (Edited on 2/11/2010 01:44 [GMT+0300] by nsh) ---

Re: Packing of the audio files
User: nsh
Date: 2/14/2010 5:15 pm
Views: 93
Rating: 5

Ken, please at least check the svn access setup, because it's hard to commit a new model into svn. I just get a 403 error.

 

--- (Edited on 2/15/2010 02:15 [GMT+0300] by nsh) ---

Re: Packing of the audio files
User: kmaclean
Date: 2/14/2010 9:05 pm
Views: 98
Rating: 4

>At least Ken please check svn access setup, because it's hard to commit new model into the svn. I just get 403 Error.

I'm having some problems committing the last changes you requested... it should be fixed soon.

Ken

--- (Edited on 2/14/2010 10:05 pm [GMT-0500] by kmaclean) ---
