
General Discussion

Speech Recognition without Pronunciation Dictionary
User: kmaclean
Date: 5/26/2015 11:24 am
Views: 3091
Rating: 1

In the paper "Lexicon-Free Conversational Speech Recognition with Neural Networks" by Maas, Xie, Jurafsky, and Ng, the authors describe a novel approach to creating acoustic models using the Kaldi speech toolkit without the use of a pronunciation dictionary:

We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding tasks. The system naturally handles out of vocabulary words and spoken word fragments. We demonstrate our approach using the challenging Switchboard telephone conversation transcription task, achieving a word error rate competitive with existing baseline systems.

They also state:

Our method yields a complete first-pass LVCSR system with about 1,000 lines of code — roughly an order of magnitude less than high performance HMM-GMM systems. Operating entirely at the character level yields a system which does not require assumptions about a lexicon or pronunciation dictionary, instead learning orthography and phonics directly from data.
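To make "map acoustic input to characters" a little more concrete, here is a minimal sketch of best-path (greedy) decoding for a CTC-style character model. This is not the authors' code: the paper describes a beam search combined with a character-level language model, which this sketch leaves out. The blank symbol `_`, the function names, and the toy probabilities are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol standing in for the CTC blank label


def ctc_collapse(frame_labels):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


def greedy_decode(frame_probs, alphabet):
    """Take the most probable label in each frame, then collapse the sequence.

    frame_probs: one list of probabilities per frame, aligned with `alphabet`.
    """
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return ctc_collapse(best)


# Toy example: four frames over a three-symbol alphabet (made-up numbers).
alphabet = [BLANK, "a", "b"]
frame_probs = [
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.8, 0.1],    # "a" again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # "b"
]
print(greedy_decode(frame_probs, alphabet))
```

Because the output is a character string rather than a sequence of dictionary words, nothing in this step needs a pronunciation lexicon, which is the point the quoted passage is making.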


--- (Edited on 5/26/2015 12:24 pm [GMT-0400] by kmaclean) ---

Re: Speech Recognition without Pronunciation Dictionary
User: TonyR
Date: 5/26/2015 2:02 pm
Views: 148
Rating: 2

There's a flood of these end-to-end papers at the moment, all claiming "a word error rate competitive with existing baseline systems", yet when you read the paper you find that they've made up their own baseline or, in the case of this paper, are comparing with a GMM-HMM baseline that's 15 years old.


But writing a paper that says "lexicon free speech recognition gives 45% increase in word error rate - a pronunciation dictionary really helps!" doesn't sell.
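For readers who want to check this kind of claim themselves, here is a rough sketch of how word error rate and a relative increase between two systems are usually computed. The function names and the example numbers (0.20 and 0.29) are made up for illustration and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_increase(baseline_wer, new_wer):
    """Relative change in WER with respect to the baseline."""
    return (new_wer - baseline_wer) / baseline_wer


# One substitution and one deletion against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))

# Illustrative numbers only: 20% -> 29% WER is a 45% relative increase.
print(round(relative_increase(0.20, 0.29), 2))
```

The distinction matters when reading results tables: a gap of a few absolute points can be a large relative increase, and the size of that gap depends on which baseline is chosen.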


Okay, I admit I'm a grumpy old reviewer - I'd have rejected this paper and the many like it (only because they say "comparable" when it's not - CTC is interesting) and as the founder of the use of RNNs in ASR I believe I have the right to be a grumpy old reviewer.



Speechmatics is hiring

--- (Edited on 26-May-2015 8:02 pm [GMT+0100] by TonyR) ---

Re: Speech Recognition without Pronunciation Dictionary
User: kmaclean
Date: 5/29/2015 12:05 pm
Views: 224
Rating: 0

>they say "comparable" when it's not

Thanks for setting me straight on this!


--- (Edited on 5/29/2015 1:05 pm [GMT-0400] by kmaclean) ---

Re: Speech Recognition without Pronunciation Dictionary
User: colbec
Date: 6/12/2015 3:35 am
Views: 878
Rating: 0

If a key element of science is reproducibility of results, Kaldi sets a high barrier. I have tried to get Kaldi running on a single machine about three different times in three years, and have yet to get it to operate satisfactorily. The hardware requirements are certainly formidable, and it seems little attention is given to installation on single, non-CUDA-capable machines for the purposes of familiarity and training.

In the meantime, HTK and Julius are doing a remarkably solid and consistent job as we small horsepower participants endeavour to learn and contribute.

--- (Edited on 2015-06-12 4:35 am [GMT-0400] by colbec) ---