About VoxForge

VoxForge was set up to collect transcribed speech for use in Open Source Speech Recognition Engines ("SRE"s) such as ISIP, HTK, Julius and Sphinx.  We will categorize and make available all submitted audio files (also called a 'Speech Corpus') and Acoustic Models under the GPL.

Why do we need GPL Transcribed Speech?

In order to recognize speech, Speech Recognition Engines require two types of files: the first, called an Acoustic Model, is created by taking a very large number of transcribed speech recordings (called a Speech Corpus) and 'compiling' them into statistical representations of the sounds that make up each word. The second is a Grammar or Language Model.  A Grammar is a relatively small file containing sets of predefined combinations of words. A Language Model is a much larger file containing the probabilities of certain sequences of words.
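To make the idea of a Language Model more concrete, here is a minimal sketch of how word-sequence probabilities could be estimated from transcriptions. It assumes a simple bigram model (the probability of a word given only the previous word) with plain relative-frequency estimates; the function name, the toy sentences and the "<s>" and "</s>" boundary markers are illustrative assumptions, not the format used by any particular Speech Recognition Engine.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | previous_word) by relative frequency.

    Hypothetical helper for illustration only; real language-model
    toolkits also apply smoothing and use their own file formats.
    """
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        # Mark sentence boundaries so the model also learns which words
        # tend to start and end an utterance.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for previous, following in zip(words, words[1:]):
            pair_counts[previous][following] += 1

    model = {}
    for previous, followers in pair_counts.items():
        total = sum(followers.values())
        model[previous] = {word: count / total for word, count in followers.items()}
    return model


# Toy transcriptions (made up for this example).
toy_corpus = ["open the file", "open the window", "close the file"]
bigram_model = train_bigram_model(toy_corpus)

# Probabilities of each word that followed "the" in the toy corpus.
print(bigram_model["the"])
```

A Grammar, by contrast, would simply list the allowed word combinations (for example "open the file" and "close the file") without attaching probabilities to them.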

Problems with Current Approaches:

Acoustic Models are Closed-Source 

Most Acoustic Models used by 'Open Source' Speech Recognition engines are 'closed source'.  They do not give you access to the speech audio (the 'source') used to create the Acoustic Model.  If they do give you access, there are usually licensing restrictions on the distribution of the 'source' (i.e. you can only use it for personal or research purposes). 

The reason is that there is no free Speech Corpus in a form that can readily be used, or that is large enough, to create good quality Acoustic Models for Speech Recognition Engines.   Although there are a few instances of small FOSS speech corpora that could be used to create acoustic models, the vast majority of corpora (especially large corpora best suited to building good acoustic models) must be purchased under restrictive licenses.

As a result, Open Source projects that want to distribute their code freely must purchase restrictively licensed Speech Corpora that limit distribution of the 'source' speech audio, but allow them to distribute any Acoustic Models they create.

VoxForge will address this problem by providing all Acoustic Models and their 'source' (i.e. transcribed speech audio) under the GPL, which requires that the distribution of derivative works include access to the source used to create that work.

Restrictive Licensing Creates an Access Barrier to Potential Contributors

Every project that wants to build an acoustic model using a corpus with restrictive licensing must purchase its own copy.  This is difficult for FOSS projects, which usually have no revenue.  If a project does purchase such resources, the license restrictions will require it to keep the resources behind some kind of access barrier restricted to official project members.   This takes away freedom and flexibility from end users and shrinks the pool of potential contributors to the project.

Acoustic Models are not Interchangeable 

Most Open Source Speech Recognition Engines ("SRE"s) come with an Acoustic Model.  However, these Acoustic Models are not interchangeable with other open source Speech Recognition engines.  The way to address this problem is to provide the 'source code' for the Acoustic Models (i.e. the Speech Corpora used to create the Acoustic Models), and permit users to 'compile' it into Acoustic Models that can be used with the Open Source SRE of their choice.  

VoxForge hopes to address this problem by creating a repository of 'source' speech audio and transcriptions, and by creating Acoustic Models for each of the main Open Source Speech Recognition Engines (such as Sphinx, Julius, HTK and ISIP).

Open Source Acoustic Models Need to be Improved 

Current Acoustic Models used by Open Source Speech Recognition Engines are not at the level of quality of Commercial Speech Recognition Engines. 

VoxForge provides a central location that can collect GPL speech audio and transcriptions.  As more speech audio data is collected, better Acoustic Models can be created, to the point that someday they will be comparable to Commercial Speech Recognition.

No Open Source Dictation Software

Most Open Source SREs are designed for command and control and IVR telephony type applications (e.g. Sphinx, HTK and ISIP).  The Julius Speech Recognition Engine was designed for dictation applications; however, the Julius distribution only includes Japanese Acoustic Models.  But since Julius uses Acoustic Models trained using the HTK toolkit, it can also use Acoustic Models trained in other languages, such as English.  We just need hundreds of hours of transcribed speech audio to create English dictation Acoustic Models.  This same audio data might also be used to permit the other open source Speech Recognition Engines to work in dictation applications.

Although the current focus of VoxForge is on Speech Recognition for IVR telephony applications or Command and Control applications on the desktop, when the amount of audio data collected reaches a certain threshold, this data can then be used in the creation of Acoustic Models for Open Source Dictation Applications.

Why GPL?

Unrestricted Licenses for Speech Corpora will not be Effective

We believe that making Speech Corpora available using an unrestrictive, BSD style license will not help the Open Source Community in this particular case.  A BSD style license permits users to distribute derivative works without having to contribute the source of those modifications back to the community.  In our opinion, the Open Source Speech Recognition community does not have the required threshold of users to create a self-sustaining community using a BSD style license.  If there were a larger community, then there would be a greater likelihood that a self-sustaining group would give back to the community, even if not required to do so using a BSD style license.

GPL licensing ensures that any contributions made by the Open Source Community to VoxForge will benefit the community.  This is because the distribution of any derivative works based on the VoxForge Speech Corpora must make the source (i.e. the transcribed speech audio) available to the community.


Comments


Simon and HTK Licensing
By kmaclean - 11/12/2009

In response to this post on LWN.net: Simon - speech activated user interface for KDE (KDE.News):

KDE.News has a look at simon, which is a speech-activated interface for KDE. It looks like an interesting project, but, unfortunately, may suffer from some licensing snags: "HTK, the toolkit responsible for the HMM [Hidden Markov Model] evaluation is distributed under GPL-incompatible, restrictive license that prevents redistribution. In order to install simon, one must separately download HTK from their website which requires registration. The source is available, [...]

bedhar (Simon developer) replies with:

Speech models are not code. Think of them as documents (in this metaphor simon is a document editor).

Of course there are existing speech models.

You could even use speech models created by SPHINX-Train by using a speech model converter to convert the model to HTK format (there is such a converter available on sourceforge).

BUT: Speech models created by the HTK can be used _freely_ anyways. You can create models using HTK and then basically use them for whatever you want. This is also the reason why the voxforge initiative can build their speech model using the HTK and still licence the model itself under the GPL license.

The HTK plain text hmm format is well documented.

You can check out an example here: http://www.repository.voxforge1.org/downloads/Nightly_Bui...
(The file hmmdefs is the HMM model created by the HTK).

[...]

For the record: There is an open source initiative called ghmm which tries to create a GPL licenced library for working with HMM models but I contacted them and they said they were not ready for this kind of usage and generally want to be more general-purpose than the HTK so I am not sure if they will be soon/ever.

Also, the HTK is very high quality software and a good recognition rate is obviously the main goal for any speech recognition software - GPL or not.

GPL & application code?
By softtalk - 10/8/2009 - 2 Replies

How does the GPL license for voice data affect the owner's obligations with respect to application source code that uses the voices? In general, GPL licensed code "infects" any code that links to it. If my code requires voxforge voices or voices that are derived from voxforge voices to run, do I have to open source my application code as well as voices?

Melarkia
By Judy - 9/21/2009 - 1 Replies

What is GPL and does the word "melarkia" have anything to do with it?  I did a google search for "melarkia" and it sent me to VoxForge.  Anyone know the answer?  Thanks.

Why GPL?
By kmaclean - 8/2/2009 - 1 Replies

This article provides support for VoxForge's use of the GPL: The different reasons for company code contributions.   In it, the author states:

[...] licensing issues are the main reason for publishing back, but separated by very few percentage points other reasons appear: the signaling advantage (being good players), the R&D sharing, and many others. In this sense, my view is that the GPL creates an initial context (by forcing the publication of source code) that creates a secondary effect - reuse and quality improvement - that appears after some time. In fact, our research shows that companies need quite some time to grasp the advantages of reuse and participation; the GPL enforces participation for enough time that companies discovers the added benefits, and start moving their motivations to economic reasons, as compared to legal enforcing or legal risks.

final state not reached
By vj61614 - 1/12/2009

pls help me in eliminating this error

Why not also LGPL?
By Anthony Martinez - 5/31/2008 - 1 Replies

Hello,

Congratulations for your project :)

My question is simple, why not use also LGPL?

If we could put this module in a commercial software, using LGPL would be mandatory to commit any modifications to the source code (that would bring more coders and investment into this module).

This would be fair for the community, it would bring more people and more investment, and wouldn't be anti-enterprise. Would be the best of both worlds.

Not all enterprises are evil. There are good enterprises that bring innovation, create jobs, pay taxes and make a good contribution to the world. The problem is when those companies become too greedy to share any innovation. That's why LGPL is so nice.

Think about it :)

Cya

GPL will cause years of pain
By Robert (Jamie) Munro - 1/27/2008 - 2 Replies

Be very careful in the way you license this data. If you release a great collection of GPL data, and someone else releases a great collection of data under, for example, a CC-by license, it will probably be illegal to combine the two corpuses and make a working speech recognition product.

I have been involved in 2 projects that went through immense pain because of this issue. The solution is as follows:

  1. Set up a proper legal entity to hold the data (an organisation).
  2. Make sure the organisation has good governance that end users will approve of.
  3. Make end users assign a non-exclusive irrevocable perpetual license to the organisation for the organisation to do whatever it democratically decides in future.

If you don't make this clear right from the start, any attempt to change the license, for whatever reason, will be impossible.

For example, if a court rules that using a subset of the data derived from VoxForge on a hand-held device doesn't comply with the source code rule, and you have to bundle gigabytes of data in order to use VoxForge derived engines on mobile devices, you will be back to square one, and will have to start collecting data all over again. Who knows why a court may do that, but unless you can be sure, you can't risk not assigning data.

The organisation could have in its constitution (terms of incorporation or whatever it's called in the relevant jurisdiction) that it will always make the data available under the GPL, but may additionally make it available under other licenses.

I don't get the GPL issue...
By Visitor - 10/10/2006 - 5 Replies

I think there is a misunderstanding of the difference between audio data and source code. When I create an executable from source code, I might modify the source code. But when I create acoustic models, I don't modify the data that is used to train the acoustic models. So, if I have to distribute the data along with my models, I'd be distributing an identical copy of the data. (Not to mention the difficulties of distributing gigabytes of data...)

 

 
