How to get many more contributions

General Discussion

User: OrangeJon
Date: 1/18/2007 6:48 pm

Views: 18121
Rating: 17

If you'd like to encourage many more contributions, how about developing a Macromedia Flash voice recorder and embedding it in your website?

This could make it really quick and easy for people to contribute, and persuade many "casual visitors" to record a few of the scripts.

Cheers,

Jon (www.orangejon.com)

--- (Edited on 1/18/2007 6:48 pm [GMT-0600] by Visitor) ---

Re: How to get many more contributions

User: kmaclean
Date: 1/19/2007 11:09 am

Views: 321
Rating: 40

Hi Jon

I agree. Audio Submission must be as painless as possible if we are to encourage others to contribute.

After seeing your post, I researched Flash a bit. My understanding is that a Flash-based audio recorder is a streaming recorder (i.e. it does not store the audio on the PC). All the posts I've seen mention that you need the Flash Media Server or the Open Source Red5 server for audio to work.

In addition, it seems that the Flash client compresses the audio it streams back to the server. Speech Recognition Engines work best when recognizing speech with the same characteristics as the audio their Acoustic Models were trained with. In other words, if you train an Acoustic Model with speech audio collected from regular Telephone lines (8kHz-8bit audio, and 100+ hours of such audio), then the Speech Recognition engine will be good at recognizing that type of speech. It will not be so good at recognizing VoIP speech transmitted using a lossy codec at different sampling rates or bits per sample.

We would need the Flash client to capture and stream uncompressed audio or use a lossless codec. However, because VoxForge is looking for higher sampling rates and bits per sample (48kHz-16bit), we simply would not have the network bandwidth to accommodate such streams.
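To put rough numbers on the bandwidth concern: the raw bitrate of uncompressed PCM audio is sample rate x bits per sample x channels. The sketch below is only illustrative arithmetic; it assumes mono audio (the channel count is not stated above), and the function name is made up for the example.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Return the raw bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Telephone-quality audio, as in the 8kHz-8bit example above.
telephone = pcm_bitrate_kbps(8000, 8)
# The higher-quality audio VoxForge is looking for (mono is an assumption).
voxforge = pcm_bitrate_kbps(48000, 16)

print(f"8kHz-8bit:   {telephone:.0f} kbit/s")
print(f"48kHz-16bit: {voxforge:.0f} kbit/s")
```

Under these assumptions a single 48kHz-16bit stream needs twelve times the bandwidth of a telephone-quality stream, before any protocol overhead.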

Please let me know if I am wrong in this interpretation - it would be great if Flash permitted a simple way to record audio onto a user's hard drive, which they could then upload to VoxForge.

I am currently looking at a Java based solution to address this need. The Java Sound Demo looks like a good starting point for such an app.

all the best,

Ken

--- (Edited on 1/19/2007 12:09 pm [GMT-0500] by kmaclean) ---

Re: How to get many more contributions

User: kmaclean
Date: 1/20/2007 4:13 pm

Views: 307
Rating: 22

Hi Jon,

After more research, it seems that a Flash-based audio recorder might fit the bill - if it buffers its stream before sending it to the server. We basically cannot support a real-time stream from a client to the VoxForge server, but if the Flash client can let the user record, and then stream the audio but not in real time, this might be workable.
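The "record first, send later" idea can be sketched independently of Flash. The code below is a minimal illustration, not how any Flash client actually works: it treats a finished recording as a byte buffer, splits it into chunks for a non-real-time upload, and estimates the transfer time. The chunk size, the 384 kbit/s link speed and the function names are all assumptions made for the example.

```python
from typing import Iterator


def iter_chunks(recording: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the buffered recording in fixed-size pieces (the last may be shorter)."""
    for start in range(0, len(recording), chunk_size):
        yield recording[start:start + chunk_size]


def upload_seconds(num_bytes: int, link_kbps: float) -> float:
    """Estimate how long num_bytes take to send over a link of link_kbps kbit/s."""
    return num_bytes * 8 / (link_kbps * 1000)


# Ten seconds of silent 48kHz-16bit mono audio (2 bytes per sample).
recording = bytes(48000 * 2 * 10)
chunks = list(iter_chunks(recording))

print(len(recording), "bytes in", len(chunks), "chunks")
# Hypothetical 384 kbit/s link: slower than real time, but still workable.
print(upload_seconds(len(recording), 384), "seconds to upload")
```

Because the audio is already buffered, the upload can take longer than the recording itself without losing any samples, which is the property a real-time stream cannot offer.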

Anyway, there are some examples on the Red5 server site (an Open Source Flash server) that I need to take a look at in more detail before deciding on a Flash or Java WebStart (or applet) solution.

thanks for pointing out Flash as a possible approach,

Ken

--- (Edited on 1/20/2007 5:13 pm [GMT-0500] by kmaclean) ---

Re: How to get many more contributions

User: kmaclean
Date: 2/13/2007 2:43 pm

Views: 297
Rating: 30

More info:

Flash uses its own proprietary audio codec, the Nellymoser Asao Codec, although there was a bounty for an Open Source implementation of a compatible audio codec. I assume it uses lossy compression.

--- (Edited on 2/13/2007 3:43 pm [GMT-0500] by kmaclean) ---

Re: How to get many more contributions

User: jaiger
Date: 2/22/2007 8:37 am

Views: 347
Rating: 32

I personally think Java will provide the most portable solution (assuming Flash is out of the question due to streaming and/or codec restrictions.)

Let's take a step back though and look at the problem one step at a time...

Ken, why don't you (or we?) define one or more API function calls for:

1- selecting/getting a set of prompts

2- submitting a single recording file (wav+prompt-text+userid etc.)

I suggest the API should be some sort of HTTP-based XML interface, like SOAP or something similar. The key requirements are that it be simple and programming-language neutral. The service would of course be hosted on VoxForge servers.
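To make the two calls concrete, here is a rough sketch of what the XML request bodies might look like when built by a client. Nothing here is an agreed format: the element names (getPrompts, submitRecording, userId, prompt, audio), the base64 encoding of the wav data and the function names are all hypothetical, chosen only to illustrate the idea.

```python
import base64
import xml.etree.ElementTree as ET


def build_prompt_request(user_id: str, count: int) -> bytes:
    """Build a hypothetical 'get a set of prompts' request body."""
    root = ET.Element("getPrompts")
    ET.SubElement(root, "userId").text = user_id
    ET.SubElement(root, "count").text = str(count)
    return ET.tostring(root, encoding="utf-8")


def build_submission(user_id: str, prompt_id: str, prompt_text: str, wav_bytes: bytes) -> bytes:
    """Build a hypothetical 'submit one recording' request body."""
    root = ET.Element("submitRecording")
    ET.SubElement(root, "userId").text = user_id
    prompt = ET.SubElement(root, "prompt", id=prompt_id)
    prompt.text = prompt_text
    # Binary audio cannot go into XML directly, so it is base64-encoded here.
    audio = ET.SubElement(root, "audio", format="wav", encoding="base64")
    audio.text = base64.b64encode(wav_bytes).decode("ascii")
    return ET.tostring(root, encoding="utf-8")


print(build_prompt_request("jaiger", 3).decode("utf-8"))
```

A client in any language could produce the same documents and POST them over HTTP, which is the language-neutral property being asked for. Note that base64 adds roughly a third to the size of the audio, so a real design might prefer a multipart upload for the wav file.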

The next step would be to allow the community to develop clients for this API. A client can record a file or set of files locally and upload via the API as needed. Some people like Java, some like Flash, and some like AJAX or .NET. Maybe VoxForge hosts/provides one or two "official" clients (e.g. Java/Flash) and the community can integrate others as desired.

-joe

--- (Edited on 2/22/2007 09:37:39 [GMT-0500] by jaiger) ---

Re: How to get many more contributions

User: kmaclean
Date: 2/22/2007 11:27 am

Views: 281
Rating: 32

Hi Joe,

thanks for the feedback...

I was actually trying to figure out how to get help from the Google Summer of Code project for the audio submission portion of the VoxForge project. I did not include it in the current list because I thought it would have been too 'WebGUI cms' oriented and not 'speech recognition' oriented enough. I never thought of documenting an API for prompt selection and speech submission. Excellent idea. Any help you can provide in creating such an API would be greatly appreciated.

What

Basically we need a way to allow users to submit transcribed speech to the VoxForge Website, in a way that permits as much automation of the submission and back-end Acoustic Model creation processes as possible.

Technical Limitations

Although the API should be technology agnostic, it needs to work with our current setup, which is as follows:
- VoxForge uses the WebGUI Content Management System. The Server side uses Perl and MySQL, and the client side uses css, html and Javascript.

The VoxForge Front-end web server (www.voxforge.org) has bandwidth limitations - from a user perspective it has 5 mbit upload and 800 kbit download (according to my ISP at least .... ).

How

I don't have much experience with SOAP or XML based APIs, so I'll need your help on this one.

As a start to creating an API, here are some comments:

1- selecting/getting a set of prompts

I currently have a script (a 'macro' in WebGUI speak) that will randomly select a prompt file for a user. It also keeps track of which prompts the user has already submitted in the user's profile. The prompt file stays selected until the user submits audio corresponding to it - I have not completed this part yet. The script will then randomly select another prompt file. The prompts it selects are basically the prompt file 'children' of this URL (i.e. http://www.voxforge.org/home/submitspeech/linux/step-1/phoneme). So new prompt files can easily be added.
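The selection logic described above can be sketched as follows. This is not the actual WebGUI macro (which is Perl and stores its state in the user's profile); it is a small in-memory Python illustration, and the class and method names are invented for the example.

```python
import random
from typing import Dict, Iterable, Optional, Set


class PromptSelector:
    """Pick a random unsubmitted prompt file per user and keep it until submitted."""

    def __init__(self, prompt_files: Iterable[str], rng: Optional[random.Random] = None):
        self.prompt_files = list(prompt_files)
        self.rng = rng if rng is not None else random.Random()
        self.submitted: Dict[str, Set[str]] = {}  # user -> prompt files already done
        self.current: Dict[str, str] = {}         # user -> prompt file now selected

    def current_prompt(self, user: str) -> Optional[str]:
        """Return the user's selected prompt file, choosing a new one if needed."""
        if user in self.current:
            return self.current[user]
        done = self.submitted.get(user, set())
        remaining = [p for p in self.prompt_files if p not in done]
        if not remaining:
            return None  # the user has submitted every prompt file
        self.current[user] = self.rng.choice(remaining)
        return self.current[user]

    def mark_submitted(self, user: str) -> None:
        """Record that audio for the selected prompt file has been received."""
        prompt = self.current.pop(user, None)
        if prompt is not None:
            self.submitted.setdefault(user, set()).add(prompt)
```

Calling current_prompt repeatedly returns the same file until mark_submitted is called, after which a different, not yet submitted file is chosen.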

The approach I took is that the user needs to be signed in to the VoxForge website. They then click a link ('Submit Speech') and get a submission page with the randomly selected prompt file displayed on it, which they then read to create the audio files on their PC (based on feedback from atterer). The user then creates a zip or tarball of the recorded files, adds it as an attachment, and saves (i.e. uploads) their submission. I was hoping to figure out a way to get the 'client' to do this transparently, so that the user would not have to understand zip files or tarballs, just click "upload".
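The packaging step is the easy part to hide from the user. As a minimal sketch (assuming the client has saved its recordings as .wav files in one directory; the function name and file layout are hypothetical), a client could build the archive itself before uploading:

```python
import zipfile
from pathlib import Path
from typing import List, Union


def bundle_recordings(recording_dir: Union[str, Path], archive_path: Union[str, Path]) -> List[str]:
    """Zip every .wav file in recording_dir and return the names that were added."""
    recording_dir = Path(recording_dir)
    added = []
    # Deflate is lossless, so the audio samples are not altered by zipping.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for wav in sorted(recording_dir.glob("*.wav")):
            # Store only the file name, not the user's full local path.
            archive.write(wav, arcname=wav.name)
            added.append(wav.name)
    return added
```

The resulting file is what the user attaches by hand today, so the server side would not need to change for this step.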

I am not really sure of the best approach from an API perspective, since the server, in this case, keeps track of the audio submitted by a user. Should the client simply keep logon credentials, and request the prompt file from the server? What if the user wants to submit their own prompts (which at some point we might need to look into to get better triphone coverage)?

Should a different approach be used to make the API simpler? Should we use my approach for now, with a view to evolving it to an API-based solution?

2- submitting a single recording file (wav+prompt-text+userid etc.)

Are you thinking that the user should be able to submit one wav file and one prompt file at a time? I was thinking of something along these lines at one point, because users might be more apt to submit 1-3 audio files at a time, rather than having to submit 40 wav files at a time.

But this then requires that the back-end processes/scripts be re-thought ... so that rather than storing the URL of submitted prompt files in the client profile, a file or database table would need to be created on the server that would keep track of each prompt line the user submitted. This would make things easier if the user decided to submit their own prompts.
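A per-prompt-line tracking table could look something like the sketch below. It uses SQLite only so that the example runs stand-alone (the site itself uses MySQL), and the table name, column names and helper functions are all assumptions for illustration.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS submitted_prompts (
    username    TEXT NOT NULL,
    prompt_id   TEXT NOT NULL,
    prompt_text TEXT NOT NULL,
    wav_name    TEXT NOT NULL,
    PRIMARY KEY (username, prompt_id)
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tracking table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_submission(conn: sqlite3.Connection, username: str, prompt_id: str,
                      prompt_text: str, wav_name: str) -> None:
    """Store one submitted prompt line; resubmitting a line replaces the old row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO submitted_prompts VALUES (?, ?, ?, ?)",
            (username, prompt_id, prompt_text, wav_name),
        )


def submitted_ids(conn: sqlite3.Connection, username: str) -> List[str]:
    """Return the ids of the prompt lines this user has already submitted."""
    rows = conn.execute(
        "SELECT prompt_id FROM submitted_prompts WHERE username = ? ORDER BY prompt_id",
        (username,),
    )
    return [row[0] for row in rows]
```

Storing the prompt text alongside the id means user-supplied prompts fit into the same table as the ones VoxForge provides.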

