User: royerfa
Date: 2/27/2008 4:22 am
Views: 4193
Rating: 25

Thanks a lot for this tutorial,

It is a really good help to start using SRE.

I did the tutorial, but I am not really satisfied with the Julian results. It recognizes fewer than one sentence in four.

Quite a bad result, no?

I recorded the samples using Audacity at a sample rate of 98000Hz. Maybe that is the cause of my problem; what do you think?

I did not forget to change the sampling rate in the Jconf file, though.

What should I do to improve the recognition?



Re: Congratulation
User: kmaclean
Date: 2/27/2008 1:58 pm
Views: 268
Rating: 26

Hi Fab,

>I recorded the samples using Audacity at a sample rate of 98000Hz.

I am assuming you meant that you recorded your audio at 96000Hz. 

I think that Julian/Julius only supports audio up to 48kHz, but I can't figure out where I read this... At one time, Julius only supported audio up to 16kHz, 16-bit, but I think with release 3.5 or 4.0 this was increased to 48kHz.

Manuel also had the same problem here, but I don't think we ever really resolved it.

You can downsample your audio - see this page for a script to help you do this, and re-train your acoustic models.
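For illustration only (this is not the script from the page mentioned above), here is a minimal pure-Python sketch of integer-factor downsampling: a low-pass filter removes everything above the new Nyquist frequency, then every Nth sample is kept. The function names, the Hamming-windowed sinc filter, and the tap count are all assumptions chosen for the example; a dedicated audio tool will generally do a better job.

```python
import math

def lowpass_kernel(factor, taps_per_side=32):
    """Hamming-windowed sinc low-pass filter (hypothetical helper).

    The cutoff sits at the Nyquist frequency of the *new* sample rate,
    expressed as a fraction of the original sample rate.
    """
    cutoff = 0.5 / factor
    n_taps = 2 * taps_per_side + 1
    kernel = []
    for i in range(n_taps):
        k = i - taps_per_side
        x = 2 * cutoff * k
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        kernel.append(2 * cutoff * sinc * window)
    total = sum(kernel)
    return [c / total for c in kernel]  # normalise to unity gain at 0 Hz

def downsample(samples, factor, taps_per_side=32):
    """Low-pass filter `samples`, then keep one sample out of every `factor`."""
    kernel = lowpass_kernel(factor, taps_per_side)
    half = len(kernel) // 2
    out = []
    for centre in range(0, len(samples), factor):
        acc = 0.0
        for j, coeff in enumerate(kernel):
            idx = centre + j - half
            if 0 <= idx < len(samples):  # treat samples outside the signal as silence
                acc += coeff * samples[idx]
        out.append(acc)
    return out
```

Going from 96000Hz to 16000Hz corresponds to `factor=6`, and from 48000Hz to 16000Hz to `factor=3`. Skipping the low-pass step and simply dropping samples would fold the higher frequencies back into the speech band as aliasing.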

I don't think the draft chapters are online anymore, but the book is well worth the price if you are interested in speech recognition. With respect to using higher sampling rates for speech, the following excerpt from Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin (second edition draft chapters), is very helpful:

Recall that the first step in processing speech is to convert the analog representations (first air pressure, and then analog electric signals in a microphone), into a digital signal. This process of analog-to-digital conversion has two steps: sampling and quantization. A signal is sampled by measuring its amplitude at a particular time; the sampling rate is the number of samples taken per second. In order to accurately measure a wave, it is necessary to have at least two samples in each cycle: one measuring the positive part of the wave and one measuring the negative part.

More than two samples per cycle increases the amplitude accuracy, but less than two samples will cause the frequency of the wave to be completely missed. Thus the maximum frequency wave that can be measured is one whose frequency is half the sample rate (since every cycle needs two samples). This maximum frequency for a given sampling rate is called the Nyquist frequency.

Most information in human speech is in frequencies below 10,000 Hz; thus a 20,000 Hz sampling rate would be necessary for complete accuracy. But telephone speech is filtered by the switching network, and only frequencies less than 4,000 Hz are transmitted by telephones. Thus an 8,000 Hz sampling rate is sufficient for telephone-bandwidth speech like the Switchboard corpus. A 16,000 Hz sampling rate (sometimes called wideband) is often used for microphone speech.

Even an 8,000 Hz sampling rate requires 8000 amplitude measurements for each second of speech, and so it is important to store the amplitude measurement efficiently. They are usually stored as integers, either 8-bit (values from -128–127) or 16-bit (values from -32768–32767). This process of representing real-valued numbers as integers is called quantization because there is a minimum granularity (the quantum size) and all values which are closer together than this quantum size are represented identically.
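To make the excerpt concrete, here is a small sketch of the two ideas it describes: the Nyquist frequency and 16-bit quantization. The function names are made up for this example, and the quantizer scales by 32767 so that positive and negative amplitudes are treated symmetrically, which is one common choice rather than the only one.

```python
import math

def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented at the given sampling rate."""
    return sample_rate_hz / 2

def quantize_16bit(value):
    """Map a real amplitude in [-1.0, 1.0] to a signed 16-bit integer.

    Values outside the range are clipped first. Scaling by 32767 keeps
    the mapping symmetric around zero (an assumption of this sketch).
    """
    clipped = max(-1.0, min(1.0, value))
    return round(clipped * 32767)

def sample_tone(freq_hz, sample_rate_hz, n_samples):
    """Sample and quantize a pure sine tone of the given frequency."""
    return [
        quantize_16bit(math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(n_samples)
    ]
```

With an 8,000 Hz sampling rate the Nyquist frequency is 4,000 Hz, so a 5,000 Hz tone cannot be captured: `sample_tone(5000, 8000, n)` produces the same values as an inverted 3,000 Hz tone. This is the aliasing the excerpt alludes to when it says the frequency is "completely missed".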

Re: Congratulation
User: kmaclean
Date: 2/27/2008 3:06 pm
Views: 199
Rating: 23

Looking at the man page for Julius v4.0, you can dynamically downsample your audio with the "-48" parameter in your Jconf file:

-48    Record  input  with  48kHz sampling, and down-sample it to 16kHz on-the-fly. This option is  valid  for  16kHz  model  only.  The  down-sampling routine was ported from sptk.  (Rev. 4.0)

This seems to imply that Julius only accepts 16kHz acoustic models (AMs), but I am still not sure about this. More research is required.
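While that question is open, one cheap sanity check is to confirm that every recording really has the rate the acoustic model was trained for. The sketch below reads the WAV header with Python's standard `wave` module. The helper names, the 16kHz/16-bit defaults, and the mono requirement are assumptions for illustration, not something taken from the Julius documentation.

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()

def matches_model(path, model_rate_hz=16000, model_bits=16):
    """True if the file is mono and matches the model's rate and bit depth.

    The defaults and the mono requirement are assumptions of this sketch.
    """
    rate, bits, channels = describe_wav(path)
    return rate == model_rate_hz and bits == model_bits and channels == 1
```

A file recorded at 96000Hz would fail this check against a 16kHz model, which is a hint that it needs downsampling before training or recognition.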


Re: Congratulation
User: royerfa
Date: 2/29/2008 4:22 am
Views: 224
Rating: 26

OK, thanks for the advice.

I am going to change the sample rate, just in case.

While reading the HTK Book, I found something that may not match the VoxForge tutorial.

See the config file on p. 30 of the HTK Book. They explain that ENORMALISE is True by default, and they say this variable should be set to False when working with live audio.


What do you think about this parameter?


Yes, it was 96kHz and not 98kHz.


Re: Congratulation
User: royerfa
Date: 2/29/2008 7:35 am
Views: 211
Rating: 24

In fact, it seems that the sampling frequency doesn't change anything.

I still have very poor recognition.

What recognition rate should I expect from this tutorial?

I am going to try to follow the next steps in the HTKbook to test my recogniser with the HTK tools.



Re: Congratulation
User: kmaclean
Date: 2/29/2008 11:14 am
Views: 182
Rating: 21

Hi royerfa,

>What recognition rate should I expect from this tutorial?

See Step 3 - Recording the data - Configuring Audacity Preferences

>I am going to try to follow the next steps in the HTKbook to test my recogniser with the HTK tools.

See Testing Your Acoustic Model with HTK & Julius