
Google Summer of Code Ideas Page

Using Games as a Way to Get Users to Submit Speech
User: kmaclean
Date: 3/3/2008 11:41 pm
Views: 8611
Rating: 43

From Brough Turner's blog:

There are three ways to create a collective work: 1. Pay people. 2. Get volunteers. 3. Architect your product in such a way that people create collective value by pursuing their individual self-interest. By way of example, Yahoo! built their directory using method 1. Many open source projects as well as shared content projects like Wikipedia use method 2. But many of the great successes of the Internet age have discovered method 3.

Brough proposes method 3 as the way to accumulate large speech corpora.  That is, to provide ways for people to contribute while pursuing their individual self-interest.  One way to achieve this might be through the creation of a game that, as a side effect, would collect speech as the user plays the game. 

The game approach has been used successfully, as shown in this presentation by Luis von Ahn (from CMU), which describes some initiatives that use two-player games to provide descriptive tags for web images:

http://video.google.com/videoplay?docid=-8246463980976635143

Key success factors:

  • For a game using speech recognition, it is important to have a known word or phrase and to use it in your grammar (so that open source speech recognition will work, since we are essentially limited to grammar-based speech rec). If the system doesn't recognize the utterance, then maybe get the user to type the answer and say it (not an optimal approach).

  • The user's goals in the game should reasonably align with the site's goals (i.e. VoxForge's).

  • The best games are those that don't attract too much attention to the player. For example, most solitaire is played while goofing off at work.
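To make the first point concrete, here is a minimal sketch of how a game round might build a small grammar from the known answer and then classify the recognizer's output. It assumes the recognizer accepts a JSGF grammar (as Sphinx-style engines do) and returns a plain-text hypothesis, or nothing when it fails; the function names and the 'fallback_to_typing' outcome are illustrative only, not part of any existing VoxForge code.

```python
import re
from typing import List, Optional


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'A Piano!' compares equal to 'a piano'."""
    return " ".join(re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split())


def build_jsgf(name: str, phrases: List[str]) -> str:
    """Return a JSGF grammar whose single public rule lists the allowed phrases."""
    alternatives = " | ".join(normalize(p) for p in phrases)
    return f"#JSGF V1.0;\ngrammar {name};\npublic <answer> = {alternatives};\n"


def check_answer(hypothesis: Optional[str], expected: str) -> str:
    """Classify a result as 'correct', 'wrong', or 'fallback_to_typing'."""
    if not hypothesis:
        # Hypothetical fallback: nothing was recognized, so ask the user to type.
        return "fallback_to_typing"
    return "correct" if normalize(hypothesis) == normalize(expected) else "wrong"


if __name__ == "__main__":
    print(build_jsgf("riddle", ["a piano", "piano"]))
    print(check_answer("A Piano!", "a piano"))
    print(check_answer(None, "a piano"))
```

Keeping the grammar this small is what makes the approach workable with a grammar-based recognizer: the engine only has to choose between a handful of known phrases.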

Here is a list of possible games that might be created to help with the collection of speech for the VoxForge Speech Corpus:

Single Player vs computer, using speech recognition:

  • Riddles - with one answer or phrase. User guesses the answer to the riddle using speech. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Word jumble guessing – user is presented with a word/sentence with its letters randomly mixed up, must guess correct word/sentence. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Anagram guesser – user needs to guess all possible combinations of words using only the letters in a particular word. There are on-line websites/algorithms for determining all possible anagrams for a given word. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Identify music passages – play a section of music, user guesses the name of the song. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus. Copyright issues need to be resolved if using current music. Might use MIDI versions of out-of-copyright music.

  • Simon says – computer plays an ever-increasing sequence of words that the user must repeat correctly (could be random words, or themed random sentences ... from Shakespeare for example). Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Repeat a rhyme or limerick as quickly as possible – basically trying to collect speech that more closely corresponds to spontaneous speech (as opposed to read speech). Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Bumper Stumper game – user presented with a word in abbreviated license plate format, must guess the correct word. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Text-messaging stumper game – players compete to see who can guess an abbreviated text message the quickest. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Wheel of Fortune game – user given a sentence with blanks representing the letters. User would key in letters as consonants or vowels are guessed, but must speak the answer to win. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • 20 questions – user is presented with a list of questions they can ask (basically this is the grammar for the SRE) to identify a person/place/thing. They have twenty tries to get an answer. Speech rec is used to recognize the question (each question is removed from the list once it has been asked). User then must guess the word.

  • Voice tetris/scrabble – words fall down one side of the screen, and the player selects a word by saying it and then uses the mouse to move it into place on a scrabble-like board. Speech Rec is used to select the word. Collect the speech for incorporation into VF speech corpus.

  • Falling Word de-scramble – a scrambled word or anagram falls down the screen, and if the player solves the word puzzle (by uttering the correct word) before it hits the bottom of the screen they get points. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.
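Most of the games above share the same loop: present a puzzle, check the recognized answer, and keep the audio with its now-known transcript. A rough sketch of the word jumble and anagram pieces follows. It only handles exact anagrams (not every sub-word that can be made from the letters), and the `Utterance` record, `collect_if_correct` helper, and file path are hypothetical names made up for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


def scramble(word: str, rng: random.Random) -> str:
    """Shuffle the letters of a word, retrying until the result differs."""
    letters = list(word)
    if len(set(letters)) < 2:
        return word  # e.g. "aaa" has no other arrangement
    while True:
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != word:
            return candidate


def anagram_index(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Group words by their sorted letters, so anagrams share one key."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for w in words:
        index["".join(sorted(w.lower()))].add(w.lower())
    return index


def anagrams_of(word: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return the other words in the index built from the same letters."""
    return index.get("".join(sorted(word.lower())), set()) - {word.lower()}


@dataclass
class Utterance:
    """Hypothetical record of one collected recording and its transcript."""
    audio_path: str
    transcript: str
    game: str


def collect_if_correct(hypothesis: str, answer: str,
                       audio_path: str, game: str) -> Optional[Utterance]:
    """Keep the recording only when the recognized text matches the answer."""
    if hypothesis.strip().lower() == answer.lower():
        return Utterance(audio_path, answer.lower(), game)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print(scramble("listen", rng))
    idx = anagram_index(["listen", "silent", "enlist", "google"])
    print(sorted(anagrams_of("listen", idx)))
    print(collect_if_correct("Silent", "silent", "rec_0001.wav", "jumble"))
```

Because the game already knows the answer, a correct guess gives a transcript for free, which is the whole point of collecting speech this way.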

Single Player – no speech rec:

  • Virtual "Geeks With Talent"/"American Idol" – use Shakespeare (or another out-of-copyright author) and get people to recite passages in different styles (e.g. rapper, valley girl, etc.). Get others to vote. Collect the speech for incorporation into VF speech corpus.

Two Player Asymmetric games (i.e. one player has information that the other player does not) - speech rec:

  • "Pictionary"-like game - board game where one player tries to get other player to guess a word, but can only use drawings. One player would draw a picture on-line (using mouse; or searches for pictures in Google, but other user cannot see any text – only the resulting pictures from the search), other player would verbally guess, and speech rec would tell them when they got the right word.

  • Word guessing game – One player is given a word, and gives the other player hints as to what it is (speech or text) without actually saying the word (might have a "not usable" word list). Speech Rec is used to check for correct answer from second player. Collect the speech for incorporation into VF speech corpus.

  • Image Content Guessing Game Using Words. A picture is shown to one user. S/he describes the picture using text, the other player guesses the answer. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Quote jumble game – Two users are presented with a speech recording of a well-known quote, but jumbled up. First person to guess the sentence wins. Speech Rec is used to check for correct answer. Collect the speech for incorporation into VF speech corpus.

  • Virtual Scavenger Hunt – one player controls a browser that is viewable to both players. The other player gives directions, telling the first where to go and what to collect, using a limited grammar of words (e.g. go to the Sun website, and collect the logo for open source Java). The spoken directions are converted to text (based on the grammar) and sent to the other user. Speech Rec is used to validate the words used. Collect the speech for incorporation into VF speech corpus.

Two Player Asymmetric Game – no speech rec (speech collection for later transcription)

  • Assemble 3D part – one player has pictorial instructions to assemble a virtual machine. Instructs other user how to put the machine together. Collect speech for later transcription.

  • Map Directions – both players have access to Google Maps. One player gives verbal directions to the other. Collect speech for later transcription.

Two Player Symmetric Game (i.e. both players have access to the same information) - speech rec

  • Image ESP game knockoff – System presents a known image (that has a predefined grammar of words that describe the picture) to both users. Each user utters words that describe the contents of the image. Players get points for using the same words to describe the image. Speech Rec is used to check for matching answers. Collect the speech for incorporation into VF speech corpus.

  • Sound ESP – same thing as the Image ESP, but using sounds.
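The matching step for the ESP-style games could be as simple as a set intersection. This is only a sketch: it assumes each player's recognized words arrive as a list of strings and that every image (or sound) comes with a predefined word grammar; the one-point-per-match scoring rule is made up for illustration.

```python
from typing import Iterable, List, Tuple


def score_round(grammar: Iterable[str],
                words_a: Iterable[str],
                words_b: Iterable[str]) -> Tuple[int, List[str]]:
    """Score one point for each in-grammar word that both players said."""
    allowed = {w.lower() for w in grammar}
    said_a = {w.lower() for w in words_a} & allowed
    said_b = {w.lower() for w in words_b} & allowed
    matches = said_a & said_b
    return len(matches), sorted(matches)


if __name__ == "__main__":
    grammar = {"dog", "ball", "grass"}
    print(score_round(grammar, ["Dog", "ball", "tree"], ["ball", "dog", "sky"]))
```

Words outside the grammar are ignored rather than penalized, since a grammar-based recognizer could not have recognized them reliably anyway.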

Two Player Symmetric game – no speech rec

  • Transcribing game – two users are presented with an audio sentence that needs transcribing. Jumble it up, and first person who puts it right, wins. The sentence gets un-jumbled bit by bit as time passes (with the point value for a correct answer going correspondingly down). Speech rec cannot be used in this case (at least not open source ... yet), but if enough players give the same answer for the same jumble of words, then we can say at some point that we have a good transcription.
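The "enough players guessing the same answer" idea amounts to a simple vote. Here is a minimal sketch, assuming the guesses for one audio sentence are collected as plain strings; the thresholds (at least 3 matching votes making up at least 60% of all guesses) are arbitrary placeholders that would need tuning against real data.

```python
from collections import Counter
from typing import List, Optional


def agreed_transcription(guesses: List[str],
                         min_votes: int = 3,
                         min_share: float = 0.6) -> Optional[str]:
    """Return the most common guess once enough players agree, else None."""
    # Normalize case and whitespace so trivially different guesses still match.
    normalized = [" ".join(g.lower().split()) for g in guesses if g.strip()]
    if not normalized:
        return None
    text, votes = Counter(normalized).most_common(1)[0]
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return text
    return None


if __name__ == "__main__":
    guesses = ["the cat sat", "The  cat sat", "the cat sat", "the bat sat"]
    print(agreed_transcription(guesses))
    print(agreed_transcription(["the cat sat", "the bat sat"]))
```

Requiring both a minimum count and a minimum share guards against accepting a transcription from a tiny sample or from a narrow plurality.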

 

Re: Using Games as a Way to Get Users to Submit Speech
User: kmaclean
Date: 5/6/2008 11:48 am
Views: 360
Rating: 27

Interesting article from CMU:

How to Prototype a Game in Under 7 Days

  • Setup: Rapid is a State of Mind
    • Embrace the Possibility of Failure - it Encourages Creative Risk Taking
    • Enforce Short Development Cycles (More Time != More Quality)
    • Constrain Creativity to Make You Want it Even More
    • Gather a Kickass Team and an Objective Advisor – Mindset is as Important as Talent
    • Develop in Parallel for Maximum Splatter
  • Design: Creativity and the Myth of Brainstorming
    • Formal Brainstorming Has a 0% Success Rate
    • Gather Concept Art and Music to Create an Emotional Target
    • Simulate in Your Head – Pre-Prototype the Prototype
      • For each of our most successful games, it was never a surprise when they ended up being fun to play – in the best cases, we knew before touching a line of code that the idea was solid, because we had run a simulation of the game as a little thought experiment beforehand. The reverse is also true. There was no game that accidentally or unexpectedly became successful. We always knew ahead of time.
  • Development: Nobody Knows How You Made it, and Nobody Cares
    • Build the Toy First
    • If You Can Get Away With it, Fake it
    • Cut Your Losses and "Learn When to Shoot Your Baby in the Crib"
    • Heavy Theming Will Not Salvage Bad Design (or "You Can't Polish a Turd")
    • But Overall Aesthetic Matters! Apply a Healthy Spread of Art, Sound, and Music
    • Nobody Cares About Your Great Engineering
  • General Gameplay: Sensual Lessons in Juicy Fun
    • Complexity is Not Necessary for Fun
    • Create a Sense of Ownership to Keep 'em Crawling Back for More
    • "Experimental" Does Not Mean "Complex"
    • Build Toward a Well Defined Goal
      • Without a gameplay goal, a prototype is just a toy – not a game. [...]The best goals, we found, were an innate part of the gameplay
    • Make it Juicy!

Re: Using Games as a Way to Get Users to Submit Speech
User: kmaclean
Date: 7/1/2008 11:40 am
Views: 1259
Rating: 15

The Free Rice web site has a word definition game that might also be modified to use speech recognition to collect speech for individual words or phrases.

Re: Using Games as a Way to Get Users to Submit Speech
User: nsh
Date: 9/10/2009 9:48 am
Views: 138
Rating: 8

Someone should have done this a long time ago:

A Self-Labeling Speech Corpus: Collecting Spoken Words with an Online Educational Game

Ian McGraw, Alexander Gruenstein, Andrew Sutherland

We explore a new approach to collecting and transcribing speech data by using online educational games. One such game, Voice Race, elicited over 55,000 utterances over a 22 day period, representing 18.7 hours of speech. Voice Race was designed such that the transcripts for a significant subset of utterances can be automatically inferred using the contextual constraints of the game. Game context can also be used to simplify transcription to a multiple choice task, which can be performed by non-experts. We found that one third of the speech collected with Voice Race could be automatically transcribed with over 98% accuracy; and that an additional 49% could be labeled cheaply by Amazon Mechanical Turk workers. We demonstrate the utility of the self-labeled speech in an acoustic model adaptation task, which resulted in a reduction in the Voice Race utterance error rate. The collected utterances cover a wide variety of vocabulary, and should be useful across a range of research.


Presented at Interspeech 2009 in Brighton. We definitely need this.

 

Re: Using Games as a Way to Get Users to Submit Speech
User: kmaclean
Date: 9/14/2009 8:17 pm
Views: 55
Rating: 8

Hi nsh,

>One such game, Voice Race, elicited over 55,000 utterances over a

>22 day period, representing 18.7 hours of speech.

Wow... this puts VoxForge's approach to shame!

The "Voice Race Game" described in the article uses the WAMI  toolkit (Web-Accessible Multimodal Applications). 

As far as I can tell, WAMI is an open source Java front-end that sends speech to a closed-source MIT backend for speech recognition. MIT may have taken a page out of the Google playbook (i.e. harvesting voice data). They say on their front page that "all audio sent to MIT's servers will be logged for research purposes. [...]". It is not clear what they might do, if anything, with the submitted speech.

How easily could WAMI be used as a front-end to a Sphinx server?  Or would it make more sense to use MIT's backend server for recognition (8kHz) and modify the app to send audio data asynchronously (16kHz) to a VoxForge server?

Very interesting... I've got lots of reading to do.

Ken

Re: Using Games as a Way to Get Users to Submit Speech
User: kmaclean
Date: 9/16/2009 10:31 pm
Views: 957
Rating: 8

Here is a proof of concept app (http://www.voxforge.org/testApp.html) using the WAMI toolkit that 'speechifies' the Free Rice word definition game.  This app uses MIT's Speech Rec and TTS engines, but could be set up to use another speech recognition engine...

There is only one definition and no scores are kept, but it certainly demonstrates the power of the WAMI toolkit to help developers create novel speech enabled web apps.

I think for us, the true value would be in letting us experiment by creating different games to collect speech.  Once one becomes successful, we can then look at modifying the app and migrating it to the VoxForge servers so we can collect the speech. 

We would be using the 'Fail Fast' approach described in an earlier post in this thread.

Ken

Re: Using Games as a Way to Get Users to Submit Speech
User: WAMI Team
Date: 10/5/2009 10:06 am
Views: 282
Rating: 8

Just so everyone knows, this technique has also been extended to continuous speech using the Voice Scatter game on Quizlet.com.


Publication: http://wami.csail.mit.edu/papers/QuizletSlate2009.pdf

Re: Using Games as a Way to Get Users to Submit Speech
User: kmaclean
Date: 8/4/2010 1:39 pm
Views: 2087
Rating: 8

This might be a good resource:

AudioGames.net is a community portal for audio games: games based on sound. An audio game is a game that consists (only) of sound. Its gamemechanics are usually based on the possibilites of sound as well. Usually (but not always) audio games have only auditive (so no visual!) output. We think audio games have the potentional to be a genre on its own due to the immense undiscovered possibilities of sound. Audiogames.net aims to promote audio games and support and inform the audio game community. By providing a clear view on the audio game genre we also contribute to the evergrowing game industry that is only just now beginning to recognize some of the potentials of sound.

This post has more info: Audio-based Game(s)
