Speech Recognition Engines

Comparison of the different recognition devices
User: Jean Delfort
Date: 4/6/2007 5:43 am
Views: 7119
Rating: 18

Hi guys,

I would like to create a speech recognition device.

Of course, I'm only looking at open source projects, but I don't know exactly which one to use yet.

I already know that the four main ones are Sphinx, ISIP, HTK, and Julius.

But which one should I choose, given that I need to create a continuous speech recognizer for English?

I would really like to have your opinion.

Best regards

--- (Edited on 4/ 6/2007 5:43 am [GMT-0500] by Visitor) ---

Re: Comparison of the different recognition devices
User: kmaclean
Date: 4/6/2007 9:35 am
Views: 392
Rating: 37


You might take a look at Arthur Chan's article "Why there is no Open Source Dictation".

Note that HTK has licensing restrictions: you can't distribute the source or binaries of the toolkit. However, you can distribute acoustic models or language models generated with the toolkit. Being a former commercial product, HTK has the best documentation.

Julius has no distribution restrictions, but uses acoustic models and language models generated using the HTK toolkit.

ISIP has no distribution restrictions, but is not as popular as Sphinx.  They have excellent tutorials on Speech Recognition.

That leaves Sphinx if you want a truly open source solution for both the speech recognition engine and acoustic model creation. However, based on Arthur Chan's article (and he used to be a Sphinx maintainer), Sphinx was not really designed with dictation in mind.

Only Julius was designed with dictation in mind, but it is only distributed with Japanese acoustic and language models. This brings us to one of the reasons for the creation of the VoxForge web site (see the VoxForge About page for more ...).

Please consider donating some of your speech.



--- (Edited on 4/ 6/2007 10:35 am [GMT-0400] by kmaclean) ---

--- (Edited on 4/ 6/2007 10:49 am [GMT-0400] by kmaclean) ---

Re: Comparison of the different recognition devices
User: Jean Delfort
Date: 5/22/2007 11:48 am
Views: 317
Rating: 24

Hi Ken,

Thanks for your clear response.

I've been using Sphinx4 for the last few weeks. It's working pretty well.

I still have one question though.

Do you know where I could find a document explaining the differences between the speech recognition systems we were talking about? I mean differences in implementation, algorithms, and performance.

It would be great if you have this kind of information.

 Thank you very much for your time.

Best regards.


--- (Edited on 5/22/2007 11:48 am [GMT-0500] by Visitor) ---

Re: Comparison of the different recognition devices
User: kmaclean
Date: 5/30/2007 10:31 am
Views: 2824
Rating: 28

Hi Jean, 

For a recent comparison of HTK (note that Julius uses HTK acoustic models) with Sphinx, see Keith Vertanen's site. He created acoustic models for Sphinx and HTK using the Wall Street Journal WSJ0 corpus, and reports the results there.

You might try running Julius with the HTK models to get an idea as to how Julius might compare with HTK & Sphinx.
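
If you do run the engines side by side, one common way to compare them is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. Here is a minimal sketch in Python. The function name `word_error_rate` and the sample sentences are my own illustration, not part of Sphinx, HTK, or Julius, and it assumes you already have each engine's output as plain text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "the cat sat on the mat"
    engine_output = "the cat sit on mat"
    print(f"WER: {word_error_rate(reference, engine_output):.3f}")
```

Run each engine on the same recordings, score every output against the same reference transcripts, and compare the averages. Keep in mind that the acoustic and language models used will affect the numbers as much as the engine itself.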

For an older comparison of Sphinx and HTK, see this document:



--- (Edited on 5/30/2007 11:31 am [GMT-0400] by kmaclean) ---