Speech Recognition in the News


SpinVox Names Dr. Tony Robinson as Director
By kmaclean - 3/22/2008

You may have seen some posts on this site by Dr. Tony Robinson. I'd like to congratulate Tony on his recent appointment as Director of SpinVox's Advanced Speech Group (ASG). From this press release on the SpinVox site:

Robinson's remit will be to further build a team from the ASR expertise that is concentrated in the Cambridge area. Under his leadership, the SpinVox ASG will further develop the Voice Message Conversion System that is at the heart of SpinVox services.

We'd like to wish him luck, and thank him for all his help to the VoxForge community in the past year and a half.



Zero Crossings as an Effective Feature In Speech Recognition for Embedded Applications
By kmaclean - 3/21/2008

This is an interesting article on building the feature vector from zero crossings instead of from the MFCCs (which we use with HTK/Julius) traditionally used in speech recognition. Shubhendu Trivedi wanted to create a speaker-dependent, isolated-word speech recognizer for an 8051 microcontroller, but traditional HMM approaches using MFCC-based feature vectors were too computationally intensive to run on that controller.

He found a paper that provided the solution. In it, the authors describe a way of determining the feature vector using only the zero crossings of the speech signal. Shubhendu says in his article:

This feature vector is basically the histogram of the time interval between successive zero-crossings of the utterance in a short time window. These feature vectors for each window are then combined together to form a feature matrix. Since we are dealing with only small time series (isolated words), we can employ Dynamic Time Warping to compare the input matrix with the reference matrices stored.
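The quoted description can be sketched in Python: histogram the gaps between zero crossings in each window, stack the histograms into a matrix, and compare matrices with dynamic time warping. The window length (160 samples, i.e. 20 ms at an assumed 8 kHz sample rate), the interval bin edges, and all function names are illustrative assumptions, not values from the article or the paper; a real 8051 build would use integer arithmetic in C or assembly.

```python
import math

def zero_crossing_intervals(window):
    """Gaps (in samples) between successive sign changes in one window."""
    crossings = [i for i in range(1, len(window))
                 if (window[i - 1] < 0) != (window[i] < 0)]
    return [b - a for a, b in zip(crossings, crossings[1:])]

def zc_histogram(window, bin_edges):
    """Bin k counts gaps <= bin_edges[k] (and above the previous edge);
    a final overflow bin counts gaps larger than the last edge."""
    counts = [0] * (len(bin_edges) + 1)
    for gap in zero_crossing_intervals(window):
        for k, edge in enumerate(bin_edges):
            if gap <= edge:
                counts[k] += 1
                break
        else:
            counts[-1] += 1
    return counts

def feature_matrix(signal, win_len, bin_edges):
    """One histogram row per non-overlapping window of win_len samples."""
    return [zc_histogram(signal[s:s + win_len], bin_edges)
            for s in range(0, len(signal) - win_len + 1, win_len)]

def dtw_distance(a, b):
    """Classic dynamic time warping over the rows of two feature matrices,
    using the Euclidean distance between rows as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance query only
                                     cost[i][j - 1],      # advance template only
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(signal, templates, win_len=160, bin_edges=(2, 4, 8, 16)):
    """Return the word whose stored reference matrix is closest under DTW."""
    query = feature_matrix(signal, win_len, bin_edges)
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))
```

To enroll a word in this sketch, store `feature_matrix(sample, 160, (2, 4, 8, 16))` under its label in `templates`; recognition then returns the label with the lowest DTW distance. Because the features are only interval counts, the per-frame work is comparisons and increments, which is the point of the approach on a small controller.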


Using Speech Recognition to Display Contextual Ads While Watching a Video
By kmaclean - 2/11/2008

Microsoft is working on a system that serves contextual ads alongside video. The system uses speech recognition to identify the topic of a video, which would let advertisers display ads for sports gear alongside a video about soccer, or ads for furniture alongside a home-improvement video.
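The article does not say how Microsoft maps recognized speech to a topic. As a purely hypothetical illustration, here is a keyword-counting sketch in Python that picks an ad category from a recognizer's transcript; the `AD_KEYWORDS` table and the `pick_ad_category` function are invented for this example.

```python
from collections import Counter

# Hypothetical ad inventory: ad category -> keywords that suggest that topic.
AD_KEYWORDS = {
    "sports gear": {"soccer", "goal", "ball", "team", "match"},
    "furniture": {"kitchen", "cabinet", "renovation", "sofa", "paint"},
}

def pick_ad_category(transcript, ad_keywords=AD_KEYWORDS):
    """Score each category by how often its keywords occur in the transcript;
    return the best-scoring category, or None if no keyword occurs at all."""
    words = Counter(w.strip(".,!?").lower() for w in transcript.split())
    scores = {cat: sum(words[k] for k in kws) for cat, kws in ad_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Recognition errors on individual words matter less here than in dictation, since a topic decision pools evidence over many words.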

See the video at this link: One To Watch: Microsoft's Video Advertising Systems


Simon on Slashdot
By kmaclean - 1/19/2008

From the Slashdot article Open Source Speech Recognition:

"The first version of the open source speech recognition suite simon was released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from Wiktionary (a subproject of Wikipedia) or from files formatted in the HADIFIX- or HTK format and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words."

Cell Phone Voice Transcription Services
By kmaclean - 12/13/2007

Here is an interesting list of services that let you convert your speech to an email or text message over your telephone.

Has anyone tried these services?  What were your experiences?

MIT's speech recognition interface to Google Maps
By Visitor - 11/27/2007

MIT's AddressBrowser provides a speech recognition interface to Google Maps. From the site:

The AddressBrowser is a prototype speech-based interface that allows users to speak any city or address in the United States. You can say any valid address that follows a simple pattern, e.g., 32 Vassar Street in Cambridge, Massachusetts, the intersection of Main Street and Vassar Street in Cambridge, Massachusetts, or just Cambridge, Massachusetts.
In order to be heard you will need a microphone connected to your computer.

  • Press and hold the big green button and say your address. The button will turn red when it is recording your voice. Just speak your address naturally. Release the button after you have finished talking.
  • ...

It seems to use a Java applet on the client as a front end, with a speech recognition server on the back end.
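The site only says addresses must follow "a simple pattern", and MIT's actual grammar is not shown. A hypothetical Python sketch of the three quoted patterns (street address, intersection, city alone) as regular expressions applied to an already-recognized string follows; in a real recognizer such a grammar would constrain the search itself rather than post-process text, and the names `PATTERNS` and `parse_location` are invented here.

```python
import re

# Hypothetical "City, State" tail shared by all three patterns.
_CITY_STATE = r"(?P<city>[A-Za-z .]+), (?P<state>[A-Za-z ]+)"

# Checked in order, so the more specific patterns win over "city only".
PATTERNS = [
    ("address", re.compile(
        r"^(?P<number>\d+) (?P<street>.+?) in " + _CITY_STATE + r"$", re.IGNORECASE)),
    ("intersection", re.compile(
        r"^the intersection of (?P<street1>.+?) and (?P<street2>.+?) in "
        + _CITY_STATE + r"$", re.IGNORECASE)),
    ("city", re.compile(r"^" + _CITY_STATE + r"$", re.IGNORECASE)),
]

def parse_location(utterance):
    """Return (pattern_kind, fields) for the first matching pattern, else None."""
    text = utterance.strip().rstrip(".")
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.groupdict()
    return None
```

For example, `parse_location("32 Vassar Street in Cambridge, Massachusetts")` yields the `"address"` kind with `number`, `street`, `city`, and `state` fields.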

Early look at Android
By kmaclean - 11/13/2007

Android is a new operating system for cell phones, designed by Google engineers. Unlike most existing cell phone operating systems, it'll be friendly to applications created by outside software developers.

Basically, the phone (scheduled for 2008) will run on a Linux kernel, with Java-based apps.

Setup of the "early look" SDK is quite easy (if you know Eclipse).  I was able to create the HelloAndroid app without much problem.  When you press Run, the Android Emulator starts up, and you can see the results on the screen.

Here is what I have gleaned from comments from Dan Morrill on the Android Developer Google Group list:


Nabaztag robotic rabbit
By kmaclean - 11/9/2007

The Nabaztag robotic rabbit is a wireless Internet contraption that can speak, move its ears and flash its lights in response to user inputs, and it includes speech recognition.

From the site's "Voice Recognition FAQ" (sic): 

Services available with voice recognition:
  • Weather
  • Air Quality
  • Paris Traffic
  • Stock ticker (free and full)
  • Radio

The commands it recognizes are pretty simplistic:

  • Weather
  • Air or Smog
  • Traffic
  • Market
  • Radio

but it seems to be an interesting harbinger of what consumer speech recognition appliances might look like in the not too distant future.
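The two FAQ lists suggest a tiny fixed vocabulary in which each recognized word selects one service. A hypothetical Python sketch of that dispatch step follows; the word-to-service pairing (for example, "Market" to the stock ticker) is inferred from the two lists, not taken from Nabaztag documentation, and `COMMANDS` and `dispatch` are invented names.

```python
# Hypothetical mapping from recognized command word to service,
# inferred by pairing the two FAQ lists above.
COMMANDS = {
    "weather": "Weather",
    "air": "Air Quality",
    "smog": "Air Quality",
    "traffic": "Paris Traffic",
    "market": "Stock ticker",
    "radio": "Radio",
}

def dispatch(recognized_word):
    """Return the service for a recognized command word, or None if unknown."""
    return COMMANDS.get(recognized_word.strip().lower())
```

With a vocabulary this small, the recognizer only has to choose among six words, which is what makes such a device practical.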


GPhone to include Open Source Speech Recognition?
By kmaclean - 11/6/2007

Android is a new operating system for cell phones, designed by Google engineers. Unlike most existing cell phone operating systems, it'll be friendly to applications created by outside software developers.

In a PC World interview with Android co-founder Rich Miner, Miner says:

When we looked at the other [mobile] Linux activities out there, oftentimes they're initiatives that are based on Linux but their resulting platforms aren't completely open. Or they're completely open and they're Linux, but they're missing most of the things that [Android has]. They probably don't have video codecs, Midi sequencer, speech recognition. So they're not a complete phone stack. The goal with Android was to build into it everything you needed to release a phone: an entire stack to build a competitive smartphone or high-end feature phone.

Although Android is to be released under the Apache License, the speech recognition component likely will *not* be, since Nuance, a commercial speech recognition vendor, is also an Open Handset Alliance partner. The Android™ SDK is set to be released on November 12, 2007.

Voice recognition technology nabs Colombian drug kingpin
By kmaclean - 8/10/2007

From a Globe and Mail Article:

A reputed leader of Colombia's biggest drug cartel radically altered his facial appearance with repeated plastic surgeries. But his own words gave him away, thanks to advanced voice recognition technology that has become a key tool in the war against drugs and terrorism.

U.S. agents confirmed the identity of Juan Carlos Ramirez Abadia using the equivalent of a vocal fingerprint, his attorney said Friday.

Background on voice recognition (from Wikipedia): 

Speaker recognition, or voice recognition is the task of recognizing people from their voices. Such systems extract features from speech, model them and use them to recognize the person from his/her voice.

Note that, strictly speaking, there is a difference between speaker recognition (recognizing who is speaking) and speech recognition (recognizing what is being said). The two terms are frequently confused, and voice recognition is often used as a synonym for speech recognition instead.
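The "extract features, model them, recognize" steps in the Wikipedia definition can be shown with a toy nearest-centroid speaker identifier in Python. It assumes feature vectors (for example, per-frame MFCCs) have already been extracted; the mean-vector "model" is a deliberate simplification, since practical systems typically use richer statistical models such as Gaussian mixture models, and the function names are invented for this example.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def train_speaker_models(enrollment):
    """enrollment maps speaker -> list of feature vectors; each model is the mean."""
    return {speaker: mean_vector(vectors) for speaker, vectors in enrollment.items()}

def identify_speaker(models, test_vectors):
    """Average the test vectors and return the speaker with the closest model."""
    probe = mean_vector(test_vectors)
    return min(models, key=lambda speaker: math.dist(models[speaker], probe))
```

Note that the system answers "who is speaking", not "what is being said": the same features could come from any words the person utters.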


--- (Edited on 8/10/2007 10:47 pm [GMT-0400] by kmaclean) ---