Acoustic Model Discussions

Convert Audio to MP3 and Compare Results with Original Wav
User: tpavelka
Date: 4/7/2009 4:51 am
Views: 7118
Rating: 4

While browsing the VoxForge site I have come across this experiment:

http://www.voxforge.org/home/dev/mp3-compare

The results were somewhat surprising. I could not hear any difference between the files, so I figured the same should hold for MFCC parametrisation, which can be viewed as an extremely lossy compression and should therefore throw away any differences between the WAV and MP3 versions.

To get some insight, I generated spectrograms using HCopy's filterbank analysis (which can be viewed as part of the MFCC process) to see whether there is any difference. Although there are some visible differences, the most important finding is that the MP3 compression (or perhaps the decoding back into WAV; I do not know which) throws away parts of the recordings, namely the end. The missing part can be up to half a second long and may affect not just the trailing silence but, in some cases (com_4311.wav), even the speech. Could this be the reason for the difference in the test?
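One quick way to check the truncation described above, without generating spectrograms, is to compare the lengths of the original and the decoded WAV files. The sketch below uses only Python's standard wave module and assumes both files are uncompressed PCM WAV; the function names are hypothetical. A length difference alone does not show where the samples were lost, so it complements the spectrogram comparison rather than replacing it.

```python
import wave


def wav_duration(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def missing_seconds(original: str, decoded: str) -> float:
    """How much shorter `decoded` is than `original`, in seconds.

    A positive value means audio was lost somewhere in the MP3
    round trip; it does not say where in the file it was lost.
    """
    return wav_duration(original) - wav_duration(decoded)
```

For example, missing_seconds("com_4311.wav", "com_4311_from_mp3.wav"), where the second file name is only a placeholder for whatever the decoded copy is called.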

The spectrograms can be downloaded here; the included Perl scripts may not work under Unix (they use some Windows-only commands).

--- (Edited on 4/7/2009 4:51 am [GMT-0500] by tpavelka) ---

Re: Convert Audio to MP3 and Compare Results with Original Wav
User: kmaclean
Date: 4/9/2009 8:53 am
Views: 89
Rating: 3

Hi tpavelka,

Thanks for this analysis!

It would be really great if we could use speech recorded with lossy audio codecs (like MP3, OGG...) to create acoustic models. Librivox has so much speech like this that it could keep us busy for years...

>Although there are some visible differences, the most important finding is that the mp3 compression (or maybe the encoding back into wavs, I do not know which one) throws away parts of the recordings, namely the end parts. This part may be up to half a second long and may affect not just the ending silence, but in some cases (com_4311.wav) even the speech. Could this be the reason for the difference in the test?

Could this be a result of the LAME encoder cutting silence off the end of the recording to save space in the resulting MP3?

Ken

--- (Edited on 4/9/2009 9:53 am [GMT-0400] by kmaclean) ---

Re: Convert Audio to MP3 and Compare Results with Original Wav
User: Visitor
Date: 4/9/2009 9:07 am
Views: 2005
Rating: 3

Hi, it's either LAME or SoX, or their respective settings. Unfortunately, I do not have either of them installed, so I cannot easily check; for the spectrograms I just downloaded your converted files. It might be the cutting off of silence, but apparently that does not work very well; see the file com_4311.wav.
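To tell whether the lost end portion was silence or speech (as in com_4311.wav), one could measure the energy of the original recording over the stretch that is missing from the decoded copy. The sketch below assumes 16-bit mono PCM input and uses a hypothetical silence threshold of -50 dBFS, which would need tuning on real recordings; it is not something LAME or SoX report themselves.

```python
import array
import math
import sys
import wave

# Hypothetical threshold: tails quieter than this are treated as silence.
SILENCE_DBFS = -50.0


def tail_rms_dbfs(path: str, tail_seconds: float) -> float:
    """RMS level in dBFS of the last `tail_seconds` of a WAV file.

    Assumes 16-bit signed mono PCM; returns -inf for an empty or all-zero tail.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        n_total = w.getnframes()
        n_tail = min(n_total, int(round(tail_seconds * w.getframerate())))
        w.setpos(n_total - n_tail)  # jump to the start of the tail
        raw = w.readframes(n_tail)
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 32768 is full scale for 16-bit samples
    return 10.0 * math.log10(mean_square / 32768.0 ** 2)


def tail_contains_speech(path: str, tail_seconds: float,
                         threshold_dbfs: float = SILENCE_DBFS) -> bool:
    """True if the tail is louder than the (hypothetical) silence threshold."""
    return tail_rms_dbfs(path, tail_seconds) > threshold_dbfs
```

Here tail_seconds would be set to the length difference between the original and the decoded file, and path would point at the original WAV.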

Tomas

--- (Edited on 4/9/2009 9:07 am [GMT-0500] by Visitor) ---

Re: Convert Audio to MP3 and Compare Results with Original Wav
User: Visitor
Date: 4/16/2009 2:19 pm
Views: 62
Rating: 3

Could this process be automated somehow?

I know Librivox breaks up files by chapter, so if we had a chapter of text and the audio file for that chapter, is there a way to align everything automatically and break it up into sentences?

I would be interested in spending a few days investigating this, but I am unsure of the best way to start.
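Once a forced aligner (for example HTK's HVite run in alignment mode) has produced start and end times for each sentence of a chapter, the remaining step is cutting the chapter audio into one file per sentence. The sketch below covers only that last step with Python's standard wave module; the segment list and the output name pattern are hypothetical inputs, and producing the alignment itself, which is the hard part, is not shown.

```python
import wave
from typing import Iterable, List, Tuple


def split_wav(path: str, segments: Iterable[Tuple[float, float]],
              out_pattern: str) -> List[str]:
    """Write each (start_s, end_s) span of `path` to its own WAV file.

    `segments` is assumed to come from a forced aligner (hypothetical here);
    `out_pattern` is a format string such as "chapter01_{:03d}.wav".
    Returns the paths of the files actually written.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        n_total = src.getnframes()
        for i, (start, end) in enumerate(segments):
            first = max(0, int(round(start * rate)))
            last = min(n_total, int(round(end * rate)))
            if last <= first:
                continue  # skip empty or inverted spans
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = out_pattern.format(i)
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same rate/width/channels as the chapter
                dst.writeframes(frames)  # frame count in the header is fixed up on write
            written.append(out_path)
    return written
```

Keeping the segment list separate from the cutting step means the same function works whichever aligner produces the timestamps.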

--- (Edited on 4/16/2009 2:19 pm [GMT-0500] by Visitor) ---

Re: Convert Audio to MP3 and Compare Results with Original Wav
User: tpavelka
Date: 4/16/2009 2:33 pm
Views: 2155
Rating: 3

Is this what you are looking for?

Automated Audio Segmentation Using Forced Alignment

--- (Edited on 4/16/2009 2:33 pm [GMT-0500] by tpavelka) ---
