
Google Summer of Code Ideas Page

Desktop applications for voice submission
User: RainCT
Date: 12/9/2009 12:57 pm

Several months ago, at the Ubuntu Developer Summit in Barcelona, there was a discussion on voice recognition. One idea raised was to have a desktop application that lets users submit voice samples to VoxForge, and to publicize it (for example by proposing it as a project for Local Communities, Ubuntu Women or any other group).

Independently of this, a few weeks ago I met some people from Sugar Labs and discussed the same topic with one of the Sugar developers. After I told him about VoxForge, he also agreed that it'd be nice to ship some utility to enable users to contribute.

So basically, I'm writing to let you know that a move in this direction is possible, and also because I'd like to get some information on what would be necessary for this to work (eg. just having the apps uploading the raw voice and the corresponding, free, text via FTP, something else?).

Cheers,

Re: Desktop applications for voice submission
User: kmaclean
Date: 12/10/2009 8:39 pm

Hi RainCT,

>I'd like to get some information on what would be necessary for this to
>work (eg. just having the apps uploading the raw voice and the
>corresponding, free, text via FTP, something else?).

The easiest way might be to run the applet as a Java application - i.e. call the CapturePlayback class directly from a java command line. The original Sun code I used for the applet could run either as an applet or as a program. The 'main' class method was removed a while back, but could be re-integrated.

Or are you thinking of something else?

Ken

Re: Desktop applications for voice submission
User: RainCT
Date: 12/11/2009 9:59 am

I was thinking more about a nice GTK+ application :).


The question was not so much about *how* to do it, but *what* the application would need to do.

Re: Desktop applications for voice submission
User: kmaclean
Date: 12/11/2009 11:27 am

>but *what* the application would need to do.

OK, seeing that Christmas is around the corner, here is my wishlist.

Basically, collect the same information listed in the READMEs for each submission (with the changes noted below):

A. User Specific Information

(This information should be set once on a preferences page - i.e. it does not have to be repeated with each submission)

1. Gender (important for creating gender specific acoustic models)

2. Age (19 and below, 20-29, 30-39, 40-49, 50-59, 60-69, 70 and above) - to ensure we have a representative distribution of speech from all age groups; the current groupings are too broad.

3. Pronunciation Dialect (if not listed, then allow user to add their own dialect)

4. a) Native Speaker (yes/no)
b) if 'no', then the user can pick their native language from a list; if it is not on the list, the user can add their own (to create speech corpora or acoustic models with non-native speech)

5. Microphone type (e.g. headset, desktop boom mic, laptop built-in, webcam, studio, other) - where the user can add their own type.

6. Microphone connection type (e.g. analog wire, USB, Bluetooth, wireless, not applicable, etc.) - where the user can add their own type.

7. O/S - collect from system

8. Audio card - collect info from system (where applicable)
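
As a rough illustration of section A, the per-user information could be held in one small record that is filled in once from the preferences page. This is only a sketch: the names `SpeakerProfile`, `age_bracket` and the default values are made up for the example and are not part of any existing VoxForge code. The age brackets are the ones from item 2.

```python
import platform
from dataclasses import dataclass, field
from typing import List, Optional


def age_bracket(age: int) -> str:
    """Map an age in years to one of the brackets listed in item A.2."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 19:
        return "19 and below"
    if age >= 70:
        return "70 and above"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


@dataclass
class SpeakerProfile:
    """User-specific information, entered once (hypothetical structure)."""
    gender: str
    age_group: str
    dialect: str
    native_speaker: bool
    native_language: Optional[str] = None           # only needed if not native
    microphone_type: str = "other"
    microphone_connection: str = "not applicable"
    # Item A.7: collected from the system rather than asked of the user.
    operating_system: str = field(default_factory=platform.platform)

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means the profile is usable."""
        found = []
        if not self.native_speaker and not self.native_language:
            found.append("native language is required for non-native speakers")
        return found
```

For example, `SpeakerProfile("female", age_bracket(34), "British English", True).problems()` returns an empty list, while a non-native profile without a native language is flagged.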

B. Prompt recordings

1. Predefined prompt lines

2. Prompt lines can be changed by the user.  Two possible ways this could be done:
a) user overwrites the individual prompt lines, or
b) supplies a paragraph, and the program breaks it up into 15-30 prompt lines for reading by the user.

3. Waveform display for *each* prompt line (something like Audacity's GUI). The waveform must be generated simultaneously with the recording so users can see if they started too soon or are speaking too loudly.

4. Prompts should be updateable (push or pull) from a central, multilingual, prompt repository.

5. User must record at least 10 prompts for a valid submission, but can record an unlimited number more.
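
Items 2b and 5 of this section could be sketched roughly as below. The splitting rule is an assumption made for the example (split on sentence-ending punctuation, then wrap long sentences at `max_words` words); the wishlist does not say how a paragraph should be broken up. The function names are hypothetical.

```python
import re
from typing import List

MIN_PROMPTS = 10  # item B.5: minimum number of recorded prompts


def split_into_prompts(paragraph: str, max_words: int = 12) -> List[str]:
    """Break a paragraph into short prompt lines.

    Assumed rule: split after '.', '!' or '?', then cut any sentence
    longer than max_words into chunks of at most max_words words.
    """
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    prompts = []
    for sentence in sentences:
        words = sentence.split()
        for start in range(0, len(words), max_words):
            prompts.append(" ".join(words[start:start + max_words]))
    return prompts


def has_enough_prompts(recorded_prompts: List[str]) -> bool:
    """A submission is valid only with at least MIN_PROMPTS recordings."""
    return len(recorded_prompts) >= MIN_PROMPTS
```

A real application would probably also want to strip characters that the training scripts cannot handle, but that depends on the target language and is left out here.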

C. Recording preferences

1. Need to be able to select the correct microphone/input line - this needs to be configurable by the user (the speech submission app only records from the default input device... it needs to be user selectable)

2. Sampling rate (48kHz, no less than 16kHz - we can downsample to 8kHz)

3. Bits per sample (16bits-32bits)

4. Format (WAV, RAW, or lossless compressed like FLAC)

5. Some way to determine the frequency range of the microphone - different microphones have different frequency ranges that can affect the quality of the recordings.
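
As a sketch of how items 2 and 3 could be checked on a finished WAV recording, the example below uses Python's standard `wave` module. The limits come from the list above; the function names and the helper that builds an in-memory test file are made up for the example. It only handles uncompressed WAV, not RAW or FLAC.

```python
import io
import wave
from typing import List

MIN_SAMPLE_RATE_HZ = 16000  # item C.2: no less than 16kHz
MIN_BITS = 16               # item C.3
MAX_BITS = 32               # item C.3


def check_wav_settings(wav_file) -> List[str]:
    """Return a list of problems with a WAV file (path or file object)."""
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if not MIN_BITS <= bits <= MAX_BITS:
        problems.append(f"{bits} bits per sample is outside {MIN_BITS}-{MAX_BITS}")
    return problems


def make_test_wav(rate: int, sample_width_bytes: int) -> io.BytesIO:
    """Build a tiny silent mono WAV in memory (helper for trying this out)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * sample_width_bytes * 10)
    buffer.seek(0)
    return buffer
```

For instance, `check_wav_settings(make_test_wav(48000, 2))` reports no problems, while an 8kHz or 8-bit file is flagged.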

D. Recording validation

1. Some way to notify the user (or reject the recording) if they don't leave half a second of silence at the beginning and the end of their submission.  The training scripts require silence surrounding the prompt recording, but some users start speaking before the app starts up, and I have to manually edit the recording (usually removing the partial utterance and manually adding silence to the beginning of the recording).

2. Some way to notify the user (or reject the recording) if their recording is too loud or too soft. The current speech submission app does the first (but the notification is not very prominent), but not the second.
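
The two checks in this section could be approximated as follows for 16-bit PCM samples. This is a simplified sketch, not the method used by the current submission app: it looks only at peak amplitude (an RMS or energy measure would be more robust), and all thresholds except the half second of silence are assumed values chosen for illustration.

```python
from typing import List, Sequence

SILENCE_SECONDS = 0.5   # item D.1: half a second at each end
FULL_SCALE = 32768      # 16-bit signed PCM
SILENCE_PEAK = 0.02     # assumed: edges quieter than 2% of full scale count as silence
TOO_SOFT_PEAK = 0.10    # assumed: loudest sample below 10% of full scale is too soft
CLIPPING_PEAK = 0.99    # assumed: loudest sample at 99% of full scale is too loud


def peak(samples: Sequence[int]) -> float:
    """Largest absolute sample value as a fraction of full scale."""
    return max((abs(s) for s in samples), default=0) / FULL_SCALE


def validate_recording(samples: Sequence[int], sample_rate: int) -> List[str]:
    """Return a list of problems; an empty list means the recording passes."""
    edge = max(1, int(SILENCE_SECONDS * sample_rate))
    if len(samples) < 2 * edge:
        return ["recording is too short to contain leading and trailing silence"]
    problems = []
    if peak(samples[:edge]) > SILENCE_PEAK:
        problems.append("no silence at the start")
    if peak(samples[-edge:]) > SILENCE_PEAK:
        problems.append("no silence at the end")
    overall = peak(samples)
    if overall >= CLIPPING_PEAK:
        problems.append("recording is too loud")
    elif overall < TOO_SOFT_PEAK:
        problems.append("recording is too soft")
    return problems
```

Whether a failed check should only warn the user or reject the recording outright is a design choice the list above leaves open.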

E. Licensing

1. GPLv3 or GPLv3 compatible license (e.g. BSD, public domain, CC sharealike, ...) of user's choice.

2. Clicking upload signifies that the user agrees to assign their copyright to the Free Software Foundation (though I am not sure about the legal enforceability of such an assignment).

3. If users create their own prompts, we need a check box for the user to certify that they created these prompts themselves or that they are taken from GPLv3 compatible licensed texts (e.g. BSD, public domain, CC sharealike, ...).

F. Uploading

1. User selectable upload (ftp or http) to speech repository of their choice (preferably with VoxForge as the default repository...)

2. Preferably should be http so we can use PHP to perform some rudimentary validation checks on the submissions before acceptance.

3. Streaming vs batch upload is a 'how' decision, but the voxforge repository server is limited to a standard batch upload.

G. Localization/Internationalization

1. Everything on the interface should be translatable (.po files)

2. Prompts should be multilingual (prompt server, or as a separate set of files that can be easily downloaded without having to update the program)

Thanks for your help on this!

Ken

Dec 14, 2009 Edit: added recording validation section

Re: Desktop applications for voice submission
User: RainCT
Date: 12/14/2009 10:08 am

Now that's a nice list!

I've started looking into gstreamer and it looks pretty nice, so I may have a go at it, but hey I don't promise anything soon (especially as I have exams next month :/).

Re: Desktop applications for voice submission
User: kmaclean
Date: 12/14/2009 11:24 am

>so I may have a go at it, but hey I don't promise anything soon
>(especially as I have exams next month :/).

Anything you can do to help collect more speech is greatly appreciated.

Ken

Re: Desktop applications for voice submission
User: Robin
Date: 12/17/2009 4:25 pm

I think it would be better if users licensed their uploaded speech under the GPLv2 or any later version of the GPL. That gives us more flexibility. For instance, the Dutch corpus at the University of Amsterdam is probably not licensed under the GPLv3 but under an earlier version (I think version 2), because I think the corpus is older than the GPLv3. There might be more small but useful corpora licensed under an earlier version.

If the speech will be licensed under GPLv2 or later, we can make an acoustic model and license it under the GPLv2. When there is no necessity any more to include speech that is licensed under GPLv2-only, we can make an acoustic model licensed under GPLv3.

It is not possible to simply take GPLv2-only speech and license it under the GPLv3, since the two versions are not compatible (but it is possible to license under GPLv2 and later versions, which is common practice with some projects, I believe).

Also, offering different licensing options will have a detrimental effect on the number of effective submissions. I think offering only one option will give the best results (GPLv2 or any later version is only one licensing option; please don't think I'm suggesting giving users an option to choose their preferred version). Read "The Paradox of Choice", or watch a presentation by the author that you can find online, to understand why I think that.

By the way, such a licensing option would also make it possible to upgrade to version 4 or later (in case there will ever be one, and who knows, it might again not be compatible with version 3).

Re: Desktop applications for voice submission
User: kmaclean
Date: 12/21/2009 1:13 pm

>For instance, the Dutch corpus at the University of Amsterdam is
>probably not licensed under the GPLv3, but an earlier version (I think
>version 2) because I think the corpus is older than the GPLv3.

I believe you are referring to The IFA Spoken Language Corpus v1.0:

The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers in a variety of speaking styles. For a total of 50,000 words (41 minutes/speaker) [...]

whose licence text, inter alia, states:

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. [...]

(emphasis mine)

>there might be more small, but useful corpora licensed under an
>earlier version.

I've tabulated quite a few links to various corpora (English, Other Languages) over the past few years, and if they select GPL (and these are few to begin with... ), they tend to stay with the standard wording and include the passage "...any later version".

Although the final decision rests with RainCT, if a decision needs to be made on a single license, then I would recommend "GPLv3 or any later version" and let the laggards (if any) catch up...    :)

Just my two cents..

Ken
 

Re: Desktop applications for voice submission
User: Robin
Date: 12/21/2009 3:00 pm

Ken, perhaps I spoke too soon. If you've never come across corpora that are GPLv2-only then perhaps there are none. I was afraid that we might exclude some relatively significant corpora (some hours and probably more important, some processing work already done).

Of course, if there are any, there is no easy way for them to catch up... unless they ask permission from all the people who submitted some speech. Well, that's possible, but usually it's not practical. Anyway, it seems as though this is a nonissue, so sorry about raising it.

Re: Desktop applications for voice submission
User: kmaclean
Date: 12/21/2009 3:25 pm

>Anyway, it seems as though this is a nonissue, so sorry about raising it.

No need to apologize. If this project is to grow (which I hope it will), varying points of view are necessary - I cannot have a monopoly on where VoxForge is going...

>there is no easy way for them to catch up... unless they ask permission
>from all the people who submitted some speech.

Wikipedia offers an alternative... they just voted to switch licenses (from the GNU doc license to CC share alike...).  Seemed to work for them   :)

Ken
