Massive contribution to Spanish

User: alberto.alvarellos
Date: 2/6/2014 3:48 am

Views: 4602
Rating: 4

Hi!

I'm part of a team working on a Spanish project (for a Spanish health care system) in which we need to use voice recognition for data input.

Given that there is no open source Spanish acoustic model, we would like to contribute to VoxForge in order to create one. We would like to contribute as productively as possible. What is the best way to do it? Specifically:

How many different voices (different speakers) would be ideal for the model to generalize well?
Would a model built from 140 hours of audio be accurate? How many hours would we need in order to build an accurate model?
Once we have the recordings, how long would it take to build the model? Is building the model a process we can contribute to (to speed it up)?

Sorry for my English.

Cheers!

(Greetings from Spain)

Re: Massive contribution to Spanish

User: kmaclean
Date: 2/6/2014 8:26 am

Views: 146
Rating: 5

Hi Alberto,

>Given that there is no open source Spanish acoustic model,

While we would appreciate any speech contributions you could make, there is already a CMU Sphinx Spanish acoustic model built from VoxForge data.

>How many hours would we need in order to build an accurate model?

See Training Acoustic Model For CMUSphinx.

Ken

Re: Massive contribution to Spanish

User: nsh
Date: 2/6/2014 2:36 pm

Views: 318
Rating: 5

Ideally you want considerably more than 140 hours of data; 200-300 hours should be a good amount. The easiest way to train the model would be to collect transcribed audio in any form. It can be a collection of transcribed podcasts, subtitled videos, transcribed recordings and so on.
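To check a collection against that 200-300 hour range, something like the following could total the audio you already have. This is only a minimal sketch: it assumes the recordings are uncompressed PCM WAV files under one directory, and the names `wav_duration_seconds`, `corpus_hours`, `hours_missing` and `TARGET_HOURS` are made up for illustration (they are not part of VoxForge or CMUSphinx).

```python
import wave
from pathlib import Path

# Assumption: lower end of the 200-300 hour range mentioned in this thread.
TARGET_HOURS = 200.0


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def corpus_hours(root):
    """Sum the duration of every .wav file under root, in hours."""
    total_seconds = sum(
        wav_duration_seconds(p) for p in Path(root).rglob("*.wav")
    )
    return total_seconds / 3600.0


def hours_missing(root, target_hours=TARGET_HOURS):
    """Hours still needed to reach the target (0.0 once it is reached)."""
    return max(0.0, target_hours - corpus_hours(root))
```

Calling `hours_missing("recordings/")` on a hypothetical `recordings/` directory would report how far the collection is from the target. Other formats (MP3, FLAC) would need converting first or a different reader.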

Once you have that data, the model can be trained relatively easily; CMUSphinx has all the tools for training.

For health care it's better to collect transcribed in-domain data, since the lexicon is quite specific. A fair amount of work will be required on the lexicon itself rather than on the acoustic model.
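One way to get a feel for how much lexicon work is involved is to list the words in your transcripts that the pronunciation dictionary does not cover. The sketch below assumes a CMUSphinx-style dictionary (one entry per line, the word first, then its phones, with alternate pronunciations written as `word(2)`) and plain-text transcripts; the function names are hypothetical.

```python
import re
from collections import Counter


def load_dictionary_words(lines):
    """Collect the lowercase words listed in a pronunciation dictionary."""
    words = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        # Drop the alternate-pronunciation suffix, e.g. "casa(2)" -> "casa".
        words.add(re.sub(r"\(\d+\)$", "", parts[0]).lower())
    return words


def out_of_vocabulary(transcript_lines, dictionary_words):
    """Return (word, count) pairs for words missing from the dictionary,
    most frequent first."""
    counts = Counter()
    for line in transcript_lines:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for token in re.findall(r"[^\W\d_]+", line.lower()):
            if token not in dictionary_words:
                counts[token] += 1
    return counts.most_common()
```

Words that show up often in this list (drug names, diagnoses, abbreviations) are the ones that most need pronunciations added to the dictionary.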

Re: Massive contribution to Spanish

User: Dade
Date: 3/19/2014 6:06 pm

Views: 1672
Rating: 5

Well, maybe once the Spanish side is more developed we could distinguish between acoustic models for the Spanish of Spain and of Latin America, as there are some differences in the spoken language that could be worth marking up.

Anyway, I was just passing by to cheer on my Spanish compatriots, wish them the best with their project, and offer my help if there is anything I can do.

See you all,
Davide
