General Discussion

Just sayin Hello
User: ecret
Date: 2/18/2009 4:23 am
Views: 2776
Rating: 3

I am new here and have read the archive of forum posts.  I am somewhat new to speech recognition and have been bugging nsh quite a bit on freenode #cmusphinx. 


I spent some time figuring out ways to contribute and collect speech data, and was going to do as one of the posts suggested and throw a few hundred into Amazon Mechanical Turk.  Instead I thought it would be cheaper and maybe better to write an online transcription tool that lets someone transcribe with features similar to what a downloadable application would have.  NOTE: manual transcriptions only.  No ASR.

So it would have playback speed, pause and resume timers, volume control, a scroll bar, etc.

I started about a week ago and had to learn Flex/ActionScript/JSP.  I got a demo up at www.metrospeech.com

 

Any suggestions on features?  Sometimes it doesn't work in Linux: the upload progress bar doesn't seem to work all the time there.  So if you use Linux and are uploading, just wait a minute; when you're able to hit play, the upload is done.  It is solid in Firefox.  It does work in IE6 (I had to insert a table in the div for it to show).  I am unsure about IE7 and Mac, and I would appreciate it if someone could try those.

It needs Flash 10 because of the playback rate feature.  Flash 10 has some neat stuff that gives you dynamic access to modify the sound being played.
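
To make the playback rate idea concrete, here is a minimal sketch of one naive way to change playback speed once you have access to the raw samples. It is written in Python rather than ActionScript and is not the code behind the demo; the function name `change_playback_rate` and the list-of-floats sample format are assumptions for illustration. Note that plain resampling like this also shifts the pitch.

```python
def change_playback_rate(samples, rate):
    """Resample a mono signal by linear interpolation.

    samples: list of float sample values (hypothetical input format).
    rate:    playback rate; 2.0 plays twice as fast, 0.5 half as fast.
    Returns a new list of samples. Pitch changes along with speed.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    out = []
    last = len(samples) - 1
    pos = 0.0  # fractional read position in the input
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        # Blend the two neighbouring input samples.
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += rate
    return out
```

A rate below 1.0 produces more output samples than input samples, which is the slow-down a transcriber usually wants.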

I was also going to work on getting a Red5 server going for audio input, similar to the Java applet VoxForge already has, but nsh showed me nat's, which is very impressive; it is unfortunate that it's not on the main site as well.

I hope my tool is of some use and is able to get some quality data. 

Thanks 

--- (Edited on 2/18/2009 4:23 am [GMT-0600] by ecret) ---

Re: Just sayin Hello
User: kmaclean
Date: 2/18/2009 10:12 am
Views: 55
Rating: 2

Hi ecret,

Very cool app! 

Are you going to make the source code available? 

Where are collected audio and transcriptions going to be stored and how will they be accessible? 

What are the "seconds to play" and "seconds to delay" features for?

>Instead I thought it would be cheaper and maybe better to write a

>transcription type tool online that lets someone do transcription with

>similar features to what a downloadable application would have.

What is the speech collection process you are trying to create?  Are you still going to use the Mechanical Turk approach, and use your app as the point of input?  Or are you looking for people to upload their own mp3 speech data from somewhere and transcribe it?

>I was going to also work on getting a red5 server

I believe the Open Source Red5 server (or Adobe's Flash Media Server) uses streaming audio, which might create some bandwidth issues (you may get 'skips' or 'distortion' in the transmitted speech if your server is not on a solid Internet connection).

The Disseminator's Audio Recorder (last post in this thread) uses a Flash client to record audio, but then saves it to the user's local hard drive for later upload by a script, which I think (if I remember correctly) was the approach suggested by nat.  Nat's code was unlicensed, so you should talk to him before using his, or look at the Disseminator code.

> but nsh showed me nat's which is very impressive and unfortunate

>that its not on the main site as well.

Why?  I'm not necessarily partial to any particular type of solution (proprietary, like Flash, or open source...), but at the time, if an open source approach would work, then I thought the best approach would be a Java applet, since Java was Open Source (or very close to being open source) and since we are trying to collect Open Source transcribed speech - sort of trying to keep everything "in the FOSS family".

One thing to think about: Sphinx and HTK require *segmented* speech data as input to their acoustic model creation process.  You might want to create some way for people to do this in your app, or look at ways of using speech recognition to automatically segment the submitted speech (see here: Automated Audio Segmentation Using Forced Alignment; I also worked on a Perl script to do this, located here: AudioSegmentation).
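
As a rough illustration of the automatic segmentation idea: if a forced aligner has already produced a start and end time for each word, the audio can be cut into utterance-sized segments at the longer pauses. The sketch below is in Python, is not the Perl script mentioned above, and calls no Sphinx or HTK API; the `(word, start, end)` input format, the function names, and the 0.5 second pause threshold are all assumptions.

```python
def segment_by_pauses(words, min_pause=0.5):
    """Group time-aligned words into segments, splitting at long pauses.

    words:     list of (word, start_sec, end_sec) tuples sorted by start
               time (hypothetical aligner output format).
    min_pause: silence length in seconds that starts a new segment.
    Returns a list of (segment_start, segment_end, transcript) tuples.
    """
    segments = []
    current = []
    for word, start, end in words:
        # A gap of at least min_pause since the previous word ends the segment.
        if current and start - current[-1][2] >= min_pause:
            segments.append(_close_segment(current))
            current = []
        current.append((word, start, end))
    if current:
        segments.append(_close_segment(current))
    return segments


def _close_segment(words):
    """Collapse a run of aligned words into one (start, end, text) tuple."""
    text = " ".join(w for w, _, _ in words)
    return (words[0][1], words[-1][2], text)
```

Each returned tuple gives the cut points for one audio segment plus the transcript text that belongs with it.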

Ken

--- (Edited on 2/18/2009 11:12 am [GMT-0500] by kmaclean) ---

Re: Just sayin Hello
User: ecret
Date: 2/18/2009 3:56 pm
Views: 197
Rating: 3

>Very cool app! 

Why thank you.

>Are you going to make the source code available? 

Eventually.  The code is ugly as hell, with so many hacks to get all those features working together.  I initially hoped it would take two days; it took seven.  I also need to clean up the foul language that resulted from dealing with the bugs.

>Where are collected audio and transcriptions going to be stored and how will they be accessible? 

Well, right now it uploads all the audio into the audio folder and puts all the text in a broken XML file by just appending to it using java.io.

I suppose every week I will manually upload it to VoxForge or make a directory on the site viewable.
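
One way to keep append-only storage readable is to write each transcription as a complete, escaped XML fragment on its own line and add the root element only when reading the file back. The following is a minimal sketch of that idea in Python (the site itself uses JSP and java.io); the `clip` and `clips` element names, the `audio` attribute, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def append_transcription(path, audio_file, text):
    """Append one self-contained <clip> fragment to the file at path."""
    clip = ET.Element("clip", {"audio": audio_file})
    clip.text = text  # ElementTree escapes &, < and > when serializing
    with open(path, "a", encoding="utf-8") as f:
        f.write(ET.tostring(clip, encoding="unicode") + "\n")


def load_transcriptions(path):
    """Read all fragments back by wrapping them in a temporary root element."""
    with open(path, encoding="utf-8") as f:
        fragments = f.read()
    root = ET.fromstring("<clips>" + fragments + "</clips>")
    return [(c.get("audio"), c.text or "") for c in root.findall("clip")]
```

Because every appended line is well-formed on its own, the file never needs a closing root tag to be rewritten on each upload.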

>What are the "seconds to play" and "seconds to delay" features for?

Transcription programs have this feature.  For example, it plays the audio for 6 seconds, then immediately pauses it for 6 seconds so the typist can catch up.
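
The play/pause cycle described above can be sketched as a simple schedule. This is an illustrative Python version, not the Flex code from the demo; the function name `play_pause_schedule` and its parameters are assumptions.

```python
def play_pause_schedule(audio_seconds, play_seconds=6.0, delay_seconds=6.0):
    """Return the play windows for a play-then-pause transcription cycle.

    Each entry is (wall_clock_start, audio_start, audio_end): at
    wall_clock_start seconds after pressing play, the audio between
    audio_start and audio_end is played, followed by a pause of
    delay_seconds so the typist can catch up.
    """
    if play_seconds <= 0 or delay_seconds < 0:
        raise ValueError("play_seconds must be > 0 and delay_seconds >= 0")
    audio_seconds = float(audio_seconds)
    windows = []
    wall = 0.0  # elapsed real time
    pos = 0.0   # position in the audio
    while pos < audio_seconds:
        end = min(pos + play_seconds, audio_seconds)
        windows.append((wall, pos, end))
        wall += (end - pos) + delay_seconds
        pos = end
    return windows
```

With 6 seconds of play and 6 seconds of delay, a 15 second clip is played in three windows and takes 27 seconds of real time to get through, not counting the pause after the final window.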

>I believe the Open Source Red5 server (or Adobe's Flash Media Server) uses streaming audio, which might create some bandwidth issues (you may get 'skips' or 'distortion' in the transmitted speech if your server is not on a solid Internet connection).

Yes, it's streaming.  I do not know about the skips or distortions.  With a stable connection it should be OK.  If it's not, more data is still better than less data.

>The Disseminator's Audio Recorder (last post in this thread) uses a Flash client to record audio, but then to save to the user's local hard drive, for later upload by a script, which I think (if I remember correctly) was the approach suggested by nat.  Nat's code was unlicensed, so you should talk to him before using his, or look at the Disseminator code.

Very impressive. 

>Why?  I'm not necessarily partial to any particular type solution (proprietary, like Flash, or open source...), but at the time, if an open source approach would work, then I thought the best approach would be a Java applet, since Java was Open Source (or very close to being open source) and since we are trying to collect Open Source transcribed speech - sorta of trying to keep everything "in the FOSS family".

Incest isn't always the answer.  Joking.  The problem is that 50% of people won't have the applet installed and will ignore it.  Flash has 97%.  I had to use my Windows machine, as it wouldn't work on Linux by default or by adding a simple plugin.

>One thing to think about: Sphinx and HTK require *segmented* speech data as input to their acoustic model creation process.  You might want to create some way for people to do this in your app, or look at ways for using speech recognition to automatically segment the submitted speech (see here: Automated Audio Segmentation Using Forced Alignment; I also worked on a Perl script to do this located here: AudioSegmentation ).

nsh suggested that a good project to work on would be to automatically segment speech from audiobooks.  I am likely going to try this as a fun way to contribute to Sphinx.

--- (Edited on 2/18/2009 3:56 pm [GMT-0600] by ecret) ---

Re: Just sayin Hello
User: kmaclean
Date: 3/30/2009 1:46 pm
Views: 965
Rating: 2

Hi ecret,

>The problem is that 50% of people wont have the applet installed and will

>ignore it. Flash has 97%.  I had to use my windows machine as it wouldnt

>work on linux by default or adding a simple plugin.

According to Google Analytics, for the past month (Feb 27-March 29), 81% of all users had Java support.  Flash coverage is better overall, but only about 62% were using Flash 10, with 25% using Flash 9 and the remainder using 6, 7, or 8.

58% of our users are Windows users, so Java is not an issue for them: installing Sun's Java on Windows is no more difficult than installing Flash for the first time, though I agree that many more will have already installed Flash to watch videos.

31% use Linux - most Linux distros include OpenJDK.  However, there are problems with OpenJDK's browser plugin, gcjwebplugin, which does not yet support signed applets (which the VoxForge applet is). This will happen, not sure when... Because of this, Linux users need to install Sun's version of Java - this is trickier than it needs to be.  Things were actually cleaner before we had OpenJDK, when users just installed Sun Java themselves.  However, installing Flash on Linux (Fedora), in my experience, has not been any easier than installing Java.

10% use Mac, not sure how this affects them.  I assume that they use Sun Java and not OpenJDK, but don't really know.

The bigger issue with the applet is that it currently does not support USB microphones (because it does not let you select audio channels - it just uses the default channel, which is usually the motherboard's on-board audio).  If anyone is interested in fixing this, please let me know.

Ken

--- (Edited on 3/30/2009 2:46 pm [GMT-0400] by kmaclean) ---
