Re: License terms vs. existing databases
The reason I chose GPL is to encourage the open source community to contribute transcribed speech - if you submit something, you know it will always benefit the community. In creating VoxForge, I did not set out to shut out third-party suppliers of speech corpora from the creation of VoxForge Acoustic Models - but with GPL licensing, that will likely be the end result.
LDC and ELRA have been around a long time and have contributed greatly to basic research and to getting Open Source Speech Recognition Engines to where they are today. However, they charge for their speech corpora, and with good reason: transcribing audio is a tedious and error-prone exercise, and you have to pay people to do it full time. I want to leverage the open source community by asking many people to contribute a little, rather than paying fewer people to contribute a lot. GPL helps to encourage this process because people know their contribution will always be available.
My belief is that if we are to truly grow Open Source Speech Recognition, we need free Open Source Speech Corpora. Apache/BSD-style licenses have their place, but in the context of speech recognition they have not created a self-sustaining open source community - there is not a big enough user base (yet...). My hope is that GPL licensing will improve the situation. It's not perfect, but given our goals, it is the best available choice.
all the best,
--- (Edited on 10/13/2006 9:24 am [GMT-0400] by kmaclean) ---