Speech Recognition Engines

Comparing open source speech recognition engines
User: kmaclean
Date: 8/28/2007 9:44 am
Views: 10820
Rating: 23
Here is an email thread I had with Anna:
 
Hello Ken,

My name is Anna and I'm a Spanish computer science student.
I'm working on a final project to complete my degree. It involves testing several open source image and voice recognition programs.

I'm trying to test Julius and Julian.
While searching for acoustic models I found the VoxForge project, and I would like to know which grammar file or language model you use to test the acoustic models.

Thank you very much in advance,

--
Anna

--- (Edited on 8/28/2007 10:44 am [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: kmaclean
Date: 8/28/2007 9:44 am
Views: 285
Rating: 22
Hi Anna,

The VoxForge QuickStart contains sample grammar files to get Julius working with the VoxForge Acoustic Models.

Will you be publishing your results?

thanks,

Ken

--- (Edited on 8/28/2007 10:44 am [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: kmaclean
Date: 8/28/2007 9:45 am
Views: 339
Rating: 18
Hi Ken!

Thank you very much for your response!
I've been trying the sample grammar you mentioned, but I still have problems.
Please forgive my ignorance!
I tested your latest acoustic model submission (orca-20070706) using the hmmdefs, dictionary and grammar included in the QuickStart, but I get an error; you can see the output in the attached file. I don't know what I'm doing wrong.

As for publishing the results, I hope to! I will publish all the documentation once it is finished. The documentation will initially be in Catalan, but I can translate the Julius/Julian part into English. I will let you know.

One final question: do you know whether a more extensive sample exists?

Thank you a lot!!

--- (Edited on 8/28/2007 10:45 am [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: kmaclean
Date: 8/28/2007 9:45 am
Views: 321
Rating: 21
Hi Anna,

I am not sure I understand why you are trying to run orca-20070706's submission using the QuickStart ... orca-20070706 is not an acoustic model. It is just a set of speech audio files and transcriptions! Users submit audio to the VoxForge site that we then incorporate into the VoxForge acoustic model.

Are you using it as a source of English speech to be recognized by Julius using the VoxForge acoustic model?

  • Julius is for dictation applications.  If you want to use Julius, you need a language model (which VoxForge has not created yet). 
  • If you want Julian (grammar-based recognition) to recognize the orca-20070706 submission, you will need to create a grammar using the words in that submission.
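The grammar described in the second point can be generated mechanically. The Python sketch below is a hypothetical helper (not part of VoxForge or Julius): it assumes the submission's transcriptions are available as lines of the form `a0001 WORD WORD ...` and that you can supply a phone string for every word, for example from the pronunciation dictionary that accompanies the acoustic model. It writes one `SENT` rule per distinct prompt, using the same `.grammar`/`.voca` layout as the QuickStart sample, so Julian can only ever output sentences that were actually recorded.

```python
def build_julian_grammar(prompt_lines, pron):
    """Return (grammar_text, voca_text) that accept exactly the given prompts.

    prompt_lines: lines assumed to look like "a0001 WORD WORD ...".
    pron: dict mapping each upper-case WORD to a phone string; a missing
          entry raises KeyError so vocabulary gaps are not silently dropped.
    """
    categories = {}  # word -> category name (W0, W1, ...), in first-seen order
    sentences = []
    for line in prompt_lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank lines and ids without a transcription
        words = [w.upper() for w in tokens[1:]]
        for w in words:
            categories.setdefault(w, f"W{len(categories)}")
        sentences.append(" ".join(categories[w] for w in words))

    # One rule per distinct prompt; dict.fromkeys drops duplicates but keeps order.
    grammar = ["S : NS_B SENT NS_E"]
    grammar += [f"SENT: {cats}" for cats in dict.fromkeys(sentences)]

    # Sentence-boundary categories as in the QuickStart sample, then one category per word.
    voca = ["% NS_B", "<s>\tsil", "", "% NS_E", "</s>\tsil"]
    for word, cat in categories.items():
        voca += ["", f"% {cat}", f"{word}\t{pron[word]}"]
    return "\n".join(grammar) + "\n", "\n".join(voca) + "\n"
```

The two strings would then be saved as, say, `orca.grammar` and `orca.voca` (hypothetical names) and compiled with the `mkdfa.pl` script that ships with Julius (`mkdfa.pl orca`) to produce the `.dfa` and `.dict` files Julian loads. Because the grammar contains nothing but the test sentences, accuracy measured this way will be optimistic.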
I guess I am confused as to what you are trying to do.

--- (Edited on 8/28/2007 10:45 am [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: kmaclean
Date: 8/28/2007 9:45 am
Views: 302
Rating: 28

Hi Ken,

Sorry for the misunderstanding.
A premise of my study is that I should only use tools that I did not create myself, so I'm looking for an existing language model, but I see that this will not be possible.

I found a language model for CMU Sphinx in ARPA format on this website, but I don't know whether or how it could be used with Julius or Julian.
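Whether such an ARPA model can be reused depends on the Julius version and on which n-gram orders it contains (the Julius manual lists the options for loading N-gram files), and on how well its vocabulary matches the pronunciation dictionary paired with the acoustic model. A quick inspection helps with both questions. The Python sketch below is a hypothetical helper, not a Julius or Sphinx tool: it reads the `\data\` header counts and the unigram words from ARPA-format text.

```python
def arpa_summary(lines):
    """Return ({order: n-gram count}, set of unigram words) from ARPA LM text."""
    counts = {}
    unigrams = set()
    section = None  # "data", an n-gram order (int), or None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            section = "data"
        elif line == "\\end\\":
            break
        elif line.startswith("\\") and line.endswith("-grams:"):
            section = int(line[1:line.index("-")])  # "\2-grams:" -> 2
        elif section == "data" and line.startswith("ngram "):
            order, count = line[len("ngram "):].split("=")
            counts[int(order)] = int(count)
        elif section == 1:
            # Unigram entry: log10 probability, word, optional back-off weight.
            unigrams.add(line.split()[1])
    return counts, unigrams
```

With a hypothetical file `model.arpa` and a set `dict_words` read from the dictionary, `lm_words - dict_words` lists the LM words that have no pronunciation and therefore cannot be recognized with that dictionary.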

I want to use Julius or Julian to recognise some speech audio files (not dictation) and profile the engine while it runs.
I could use the QuickStart, but I would like to use an audio file as input instead of the microphone. Is that possible?
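For the kind of profiling mentioned here, one simple engine-neutral figure is the real-time factor (RTF): processing time divided by the duration of the audio, so an RTF below 1 means the engine keeps up with real time. The Python sketch below assumes a hypothetical `recognize(path)` callable that wraps whichever engine is under test (for example by launching it as a subprocess) and PCM WAV input readable by the standard `wave` module. It complements, rather than replaces, a real profiler for CPU or memory usage.

```python
import time
import wave

def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def real_time_factor(recognize, path):
    """Run recognize(path) once; return (its result, wall-clock time / audio duration).

    recognize is a hypothetical callable wrapping the engine under test.
    """
    start = time.perf_counter()
    result = recognize(path)
    elapsed = time.perf_counter() - start
    return result, elapsed / wav_duration_seconds(path)
```

Running the same files through each engine's wrapper gives directly comparable RTF values, as long as every engine runs on the same machine.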

Thank you again,

Anna 

--- (Edited on 8/28/2007 10:45 am [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: kmaclean
Date: 8/28/2007 9:46 am
Views: 361
Rating: 28
Hi Anna,

>I'm searching for a created language model,
The VoxForgeDevWiki has information on where to access other acoustic models and language models for HTK/Julius and Sphinx.  Keith Vertanen's site contains lots of AMs and LMs and testing information that might be useful to you.

>I could use the QuickStart but I would like to use an audio file as a input, not by microphone, it is possible?
Yes; read the Julian manual. But as I said, you need to create a grammar file with all the words from the speech recording. This would not be a good test if you want to compare with Sphinx or other speech recognizers (SRs).
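As a sketch of the file-input route: Julius and Julian accept `-input rawfile` to read audio files instead of the microphone, and `-filelist` to recognize every file named in a list. The Python helper below is hypothetical: it writes such a list and builds the command line, assuming a QuickStart-style `julian.jconf` and assuming that options given after `-C` take precedence over those in the jconf file. The binary is called `julian` to match the Julius 3.x releases of this period; Julius 4 merged Julian into the `julius` executable. The audio files also need to match the sample rate and format the acoustic model expects.

```python
from pathlib import Path

def julian_file_command(wav_paths, filelist_path, jconf="julian.jconf"):
    """Write one audio path per line to filelist_path and return a julian argv list.

    Assumes a QuickStart-style jconf and that the -input/-filelist options given
    after -C take precedence over any "-input mic" line inside that file.
    """
    Path(filelist_path).write_text("".join(f"{p}\n" for p in wav_paths))
    return ["julian", "-C", jconf,
            "-input", "rawfile",              # read audio files instead of the microphone
            "-filelist", str(filelist_path)]  # recognize every file named in the list
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; the recognition results are printed to Julian's standard output rather than returned.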

I would like to post this thread on the VoxForge website to help others with similar questions.  Please let me know if this would be OK.

thanks,

Ken

--- (Edited on 8/28/2007 10:46 am [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: kmaclean
Date: 8/28/2007 9:46 am
Views: 297
Rating: 24

Thank you for all the information, Ken.

I will try with these links and I will let you know.
About publishing the thread I think it's a great idea! Please feel free to do it!

Best regards and thanks for your help!

Anna 

--- (Edited on 8/28/2007 10:46 am [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: kmaclean
Date: 9/19/2007 7:15 pm
Views: 376
Rating: 18

This paper might be of interest to those trying to compare Sphinx and HTK:

A Comparison of Public-Domain Software Tools for Speech Recognition (2003)

K. Samudravijaya and Maria Barot

School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, India

HTK and Sphinx are two freely downloadable software packages with the capability of implementing a large vocabulary, speaker independent, continuous speech recognition system in any language. While HTK has been in use by various groups for about a decade, and has gone through the refinement cycles necessary for a commercial software, Sphinx was released about a year ago and is still undergoing development in a university environment. However, due to certain advanced features and the license for unrestricted use, Sphinx appears to be more attractive. These two software packages have been compared by implementing a Hindi speech recognition system. Although recognition accuracies of the two systems are comparable, we observe that the acoustic modeling of Sphinx is superior.

(my emphasis added).
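The abstract compares "recognition accuracies". A common way to make such comparisons engine-neutral is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. HTK's `HResults` tool reports the complementary accuracy figure, (N - D - S - I) / N, which equals 1 - WER. The Python sketch below computes WER with the standard Levenshtein dynamic program over words, so the same scoring can be applied to the output of any engine.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the bat sat")` counts one substitution out of three reference words.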

Ken 

--- (Edited on 9/19/2007 8:15 pm [GMT-0400] by kmaclean) ---

Re: Comparing open source speech recognition engines
User: nsh
Date: 9/19/2007 11:06 pm
Views: 836
Rating: 25

That probably had a wow effect back in 2003 :) It's very hard to build a good acoustic model in HTK because of the complicated process and bad defaults, but features like discriminative training (MMI/MPE), distributed training and complex topologies, along with many more, make HTK far superior to Sphinx. Sphinx is easy to get started with and drive, but HTK only reaches its full potential when used properly.
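For readers following up on the discriminative training point, these are the objective functions for MMI and MPE as they are usually written in the literature. Notation (assumed here): O_r is the observation sequence of training utterance r, s_r its correct transcription, M_s the composite HMM for word sequence s, P(s) its language-model probability, kappa an acoustic scale factor, and A(s, s_r) the raw phone accuracy of s measured against s_r. Maximum likelihood training maximises only the numerator likelihood, the sum over r of log p(O_r | M_{s_r}); MMI also pushes down the competing hypotheses in the denominator, while MPE maximises the expected phone accuracy over all hypotheses.

```latex
\begin{align}
\mathcal{F}_{\mathrm{MMI}}(\lambda) &= \sum_{r=1}^{R} \log
  \frac{p_\lambda(\mathcal{O}_r \mid M_{s_r})^{\kappa}\, P(s_r)}
       {\sum_{s} p_\lambda(\mathcal{O}_r \mid M_{s})^{\kappa}\, P(s)} \\
\mathcal{F}_{\mathrm{MPE}}(\lambda) &= \sum_{r=1}^{R}
  \frac{\sum_{s} p_\lambda(\mathcal{O}_r \mid M_{s})^{\kappa}\, P(s)\, A(s, s_r)}
       {\sum_{s'} p_\lambda(\mathcal{O}_r \mid M_{s'})^{\kappa}\, P(s')}
\end{align}
```

In practice the sums over s are restricted to lattices of competing hypotheses. HTK 3.4 implements both criteria in its HMMIRest tool, together with refinements such as I-smoothing that these formulas omit.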

--- (Edited on 9/19/2007 11:06 pm [GMT-0500] by nsh) ---

Re: Comparing open source speech recognition engines
User: carlosdfresh
Date: 5/19/2010 2:44 pm
Views: 105
Rating: 4

Hi nsh!

I'm looking for a brief tutorial or step-by-step procedure for discriminative training using MMI, MPE (MWE), MCE, etc., within the HTK tutorial/demo, in order to explain and understand these techniques in more detail.

 

Thanks a lot!

--- (Edited on 5/19/2010 2:44 pm [GMT-0500] by carlosdfresh) ---
