
General Discussion

VoxForge -- Ubuntu collaboration
User: Henrik
Date: 4/26/2007 6:33 pm
Views: 4947
Rating: 26

Hello,

I'm the accessibility coordinator for the Ubuntu project. I'd like to see some progress made on the Linux speech recognition front and I think the VoxForge initiative is a great way to start. Pooling resources is the only way to go!

I see a few ways we may collaborate. For a start we could work together on GSoC projects. We have a fairly good track record with Google, having mentored about 20 projects each year since 2005. I did two myself in the accessibility field last year and will do three this year. Accessibility is such a narrow field that I feel several projects like Ubuntu, VoxForge, Orca and eSpeak should consider making a common project application to Google for 10-15 places. (btw, I realise that VoxForge is only partly about accessibility)

Several of the projects listed in your GSoC forum section would be suitable for collaboration between our projects, and certainly the voice recording client. If we distribute that with Ubuntu (in universe at least) we might see decent participation numbers. It would be great if the same application also facilitated auditing of text-to-speech output. Ubuntu 7.04 just shipped with the eSpeak TTS engine with a handful of languages, but most of them could use some work.

I would think parts of that could be recycled in the Dialog Manager GUI as well. I'm currently working on a specification for the speech recognition front-end and will post a link here once I've completed the first draft of it.

Henrik 

--- (Edited on 4/26/2007 6:33 pm [GMT-0500] by Henrik) ---

Re: VoxForge -- Ubuntu collaboration
User: kmaclean
Date: 4/27/2007 8:59 am
Views: 260
Rating: 18

Hi Henrik,

Thanks for the post! My replies follow:

> I'd like to see some progress made on the Linux speech recognition front and I think the VoxForge initiative is a great way to start. Pooling resources is the only way to go!

I agree ... It's an enormous task, and working together can only help speed things up.

>I see a few ways we may collaborate. For a start we could work together on GSoC projects. We have a fairly good track record with Google, having mentored about 20 projects each year from 2005. 

That would be great!  As you probably know, we applied unsuccessfully to become a mentor organization for the Google Summer of Code.  The VoxForge project, and speech recognition on Linux in general, could benefit greatly from your experience.

>Accessibility is such a narrow field that I feel several projects like Ubuntu, VoxForge, Orca and eSpeak should consider making a common project application to Google for 10-15 places. (btw, I realise that VoxForge is only partly about accessibility)

As stated on the Ubuntu Accessibility wiki: "Accessibility is cutting edge in software design".  Quality speech recognition on Linux would benefit many people.  Many of the usability issues that need to be addressed in an 'accessibility' context would also help speech recognition users generally.  So I am very keen on looking for ways we can work together.  

>Several of the projects listed in your GSoC forum section would be suitable for collaboration between our projects, and certainly the voice recording client. If we distribute that with Ubuntu (in universe at least) we might see decent participation numbers. 

That would be amazing!  VoxForge is still a young community, and any collaboration with a large and well-recognized project such as Ubuntu would go a long way toward addressing any hesitancy people might have in contributing their speech.

>It would be great if the same application also facilitated auditing of text-to-speech output. 

I'm not sure what you mean by this ... if you mean using speech recognition to recognize Text-to-Speech output, please note that the general rule is that speech recognition engines work best to recognize the same type of speech their acoustic models ('AM') were trained with.  Thus, if the AM was trained with 'real people' speech audio, it would not work so well with the output from a text-to-speech engine.  This post provides some further information.

>I'm currently working on a specification for the speech recognition front-end and will post a link here once I've completed the first draft of it.

If you need some help drafting, please let us know.  Otherwise I am sure the VoxForge community would be more than happy to provide feedback on your proposal.

All the best,

Ken 


--- (Edited on 4/27/2007 9:59 am [GMT-0400] by kmaclean) ---

Re: VoxForge -- Ubuntu collaboration
User: Henrik
Date: 4/27/2007 10:53 am
Views: 1240
Rating: 23

> That would be great!  As you probably know, we tried unsuccessfully to apply as a mentor organization for the Google Summer of Code.  The VoxForge project, and speech recognition on Linux in general, could benefit greatly from your experiences.

It would have been nice to do something this year, but I think we can benefit from some solid planning anyway. I find that SoC works better when the parent project has prepared a detailed spec ahead of time. It saves you spending three weeks at the start figuring out what to do, and you tend to get more student applications for the projects, which means your chances of getting a talented and hard-working student are better.

>I'm not sure what you mean by this ...

Sorry, this was a bit unclear. I mean that it would be great if the application, let's call it speechMaker, were dual-purpose. VoxForge and eSpeak (to take one TTS as an example) both have a similar need to collect data relating to text from the general public. In the case of VoxForge you want to display a few lines of text on the screen and the user should dictate the text into the microphone. With eSpeak, the same utility would display the same few lines of text but instead of the user dictating, the computer would read out the text via eSpeak. The listener would listen for errors in the output and would be able to comment on them easily. On submit, this feedback would be sent to the eSpeak developers who would adjust the speech models to improve the synthesis. So, these are different issues, but with a fair bit of overlap in the needed infrastructure.
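To make the overlap concrete, here is a minimal sketch of how the shared infrastructure Henrik describes might look. This is purely hypothetical illustration, not code from either project: the `SpeechMaker` name comes from the post, but the class, the `capture` callback, and the stubbed payloads are all invented. The point is that the prompt-display and submission plumbing is identical for both workflows; only the middle step (the user recording the prompt vs. listening to eSpeak read it and noting errors) differs.

```python
# Hypothetical sketch of the dual-purpose "speechMaker" idea.
# Both workflows share the same prompt/submit infrastructure; only the
# capture step (record vs. synthesize-and-listen) is swapped out.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Submission:
    mode: str     # "dictate" (VoxForge) or "audit" (eSpeak)
    prompt: str   # the text displayed to the user
    payload: str  # path to a recording, or the listener's comments


@dataclass
class SpeechMaker:
    # The mode-specific step is injected as a callback, so the same
    # session logic serves both data-collection uses.
    capture: Callable[[str], str]
    mode: str
    submissions: List[Submission] = field(default_factory=list)

    def run_prompt(self, text: str) -> Submission:
        # Shared path: show the prompt, run the mode-specific step,
        # and queue the result for submission to the relevant project.
        sub = Submission(self.mode, text, self.capture(text))
        self.submissions.append(sub)
        return sub


# Dictation mode: the user reads the prompt aloud (recording stubbed here).
voxforge = SpeechMaker(capture=lambda t: f"recording-of:{t}", mode="dictate")
# Audit mode: eSpeak reads the prompt; the listener reports synthesis errors.
espeak = SpeechMaker(capture=lambda t: "synthesis unclear on 'fox'", mode="audit")

voxforge.run_prompt("The quick brown fox")
espeak.run_prompt("The quick brown fox")
```

In a real client the dictation callback would drive the microphone and the audit callback would invoke the TTS engine and collect free-text comments, but the session, prompt corpus, and upload code could be written once and shared.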

> If you need some help drafting, please let us know.  Otherwise I am sure the VoxForge community would be more than happy to provide feedback on your proposal.

Thanks, feel free to edit or comment. Here is my first draft: https://wiki.ubuntu.com/SpeechRecognition I'll follow up now with a more detailed spec for the front end part. 

--- (Edited on 4/27/2007 10:53 am [GMT-0500] by Henrik) ---
