General Discussion

Real Time Speech Recognition
User: davidare1d
Date: 12/20/2016 1:46 pm
Views: 4784
Rating: 0

Hello,

I have been working on creating a Voice Recognition system, using the on-line instructions.  After working up to step 2, my boss and I were discussing the VoxForge Software and came up with a very important question for our project.

Do VoxForge and Julius actually do real-time voice dictation?  Or do they create a text or document file from the dictated speech and then put that file somewhere on the system, so that it can be uploaded or transferred somewhere else?

For our project we need real time dictation, such as Dragon Speak would do.  However, we do not want to look at paying for software unless we absolutely have to.

Any feedback would be appreciated.

--- (Edited on 12/20/2016 1:46 pm [GMT-0600] by davidare1d) ---

Re: Real Time Speech Recognition
User: TonyR
Date: 12/20/2016 2:59 pm
Views: 22
Rating: 0

There are two issues to tease apart here.

* VoxForge is primarily data.  As a secondary role it explains how to use the data, and the HTK/Julius recipe is one way you can do this.

* Julius is not VoxForge; it is a long-running software project that went dormant for a while but is now active again.   Julius can do real time (phrase batched) speech recognition fairly efficiently.

So, in terms of your project you need to:

* understand VoxForge legally (especially the stated intention behind the licence terms - if you don't agree with Ken's views then please don't use VoxForge - he's the one who put all the effort in)

* understand the VoxForge data from a scientific viewpoint - it may be just what you want or it may not be

* understand HTK - it's stable, as in nothing more is really ever going to change (which is good and bad; if you want the latest and greatest ASR, don't use HTK)

* understand Julius - I did when I was considering it commercially (about 4 years ago), but I didn't manage to explain to the author that there are design faults and bugs to be fixed which do limit the performance

In summary, if you have:

hours:  go commercial

days:  get pocketSphinx to run

weeks:  get a first version from VoxForge/Julius and then consider what comes next

months:  get to know Kaldi and expect it to take years

 

Sorry I don't contribute as much as I used to - times move on - expect an interesting announcement in the new year.

For those of us taking a holiday - have a good one.

 

Tony

-- 

Dr Tony Robinson
CTO Speechmatics

--- (Edited on 20-December-2016 8:59 pm [GMT+0000] by TonyR) ---

Re: Real Time Speech Recognition
User: colbec
Date: 12/21/2016 7:47 am
Views: 2224
Rating: 0

Yes and no. Julius does fast speech recognition and so you can just keep talking into the mike and Julius will keep returning the best result. But there won't be any bells and whistles.

First you need to take on board the difference between a grammar approach and an n-gram language model approach. The grammar will be better if you have strict expectations of valid sentences, and the language model will be better at picking out unexpected combinations of words.
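To make the grammar versus n-gram distinction concrete, here is a rough toy sketch in Python. It is not how Julius implements either approach: the sentences, the function names and the add-one smoothing are all assumptions picked for illustration. The grammar gives a hard yes/no on a word sequence, while the bigram model gives every sequence a score.

```python
import math
from collections import Counter

# Hypothetical toy data, made up for this illustration (not VoxForge data).
GRAMMAR = {
    ("call", "steve"),
    ("call", "john"),
    ("dial", "five"),
}

TRAINING = [
    "call steve now",
    "please call john",
    "dial five now",
]


def grammar_accepts(words):
    """Grammar approach: a sentence is either valid or it is rejected."""
    return tuple(words) in GRAMMAR


def train_bigrams(sentences):
    """Count unigrams and bigrams, with sentence start/end markers."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for first, second in zip(tokens, tokens[1:]):
            unigrams[first] += 1
            bigrams[(first, second)] += 1
    return unigrams, bigrams, len(vocab)


def bigram_logprob(words, unigrams, bigrams, vocab_size):
    """N-gram approach: every sequence gets a score, even unseen ones.

    Add-one smoothing is an assumption made here to keep the sketch short.
    """
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for first, second in zip(tokens, tokens[1:]):
        count = bigrams[(first, second)] + 1
        total += math.log(count / (unigrams[first] + vocab_size))
    return total


if __name__ == "__main__":
    model = train_bigrams(TRAINING)
    for text in ("call steve", "please call steve", "steve call please"):
        words = text.split()
        print(text, grammar_accepts(words), round(bigram_logprob(words, *model), 2))
```

The grammar rejects "please call steve" outright because it is not in the list, whereas the bigram model still scores it, and scores it higher than the scrambled "steve call please". That is the trade-off in miniature: strictness against flexibility.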

Second, when you buy into a dictation package you get a clever responsive system which responds correctly to "period", "new paragraph" or "find word smith". Julius won't do that. You will have to build your own capabilities into a dialog manager.
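As a rough idea of the smallest piece of such a dialog manager, the sketch below takes the words a recognizer returned and turns spoken commands into punctuation. Everything in it is hypothetical: the command table, the function name and the capitalization rule are assumptions, and nothing here is part of Julius. Editing commands like "find word smith" would need a real text buffer and are left out.

```python
# Hypothetical command table: spoken phrase -> text to emit.
COMMANDS = {
    ("new", "paragraph"): "\n\n",
    ("period",): ".",
    ("comma",): ",",
}
MAX_COMMAND_LEN = max(len(phrase) for phrase in COMMANDS)


def apply_dictation_commands(words):
    """Turn a list of recognized words into text, honoring spoken commands."""
    text = ""
    capitalize_next = True
    i = 0
    while i < len(words):
        # Try the longest command first so "new paragraph" wins over "new".
        longest = min(MAX_COMMAND_LEN, len(words) - i)
        for n in range(longest, 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in COMMANDS:
                symbol = COMMANDS[key]
                text = text.rstrip(" ") + symbol
                if symbol in (".", "\n\n"):
                    capitalize_next = True
                i += n
                break
        else:
            # Not a command: treat it as an ordinary dictated word.
            word = words[i]
            if capitalize_next:
                word = word.capitalize()
                capitalize_next = False
            if text and not text.endswith("\n"):
                text += " "
            text += word
            i += 1
    return text


if __name__ == "__main__":
    spoken = "hello world period new paragraph this is julius period"
    print(apply_dictation_commands(spoken.split()))
```

A commercial package does far more than this (it has to decide when "period" is a command and when it is a dictated word, for a start), so treat it as a starting point for experiments only.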

As Dr. Tony says, time is money. If you want to spend the time, stick around here and report your experiments and we will try to help.

--- (Edited on 2016-12-21 8:47 am [GMT-0500] by colbec) ---
