Fala Brasil - Speech Recognition for the Brazilian Portuguese
User: kmaclean
Date: 2/17/2010 9:00 pm
Views: 27147
Rating: 31

An excellent site for HTK-based speech recognition for the Portuguese language is Fala Brasil. It includes:


Acoustic and Language Models:

  • LAPSA v1.3 - Acoustic Model created with HTK.
  • LaPSLM v1.0 - N-gram language model built with the SRILM toolkit.

Phonetic Dictionary:

  • UFPAdic.2.0 - Phonetic dictionary with 32 phones, based on the SAMPA alphabet.
  • UFPAdic.3.0 - New phonetic dictionary with 38 phones, also based on the SAMPA alphabet.

Speech Corpus:

  • LapsBenchMark1.4 - Speech corpus used to evaluate the performance of LVCSR systems.

Text Corpora:

  • TextCorpora1.5 - Set of sentences used for training language models.
  • LapsNews1.0 (formerly LapsFolha) - First version of the new text corpus, built by automatically extracting text from the web. This version has 120 thousand sentences.

Scripts for training of Acoustic and Language Models:

  • Acoustic Model - A set of scripts for training and testing acoustic models with HTK.
  • Language Model - A set of scripts used to train language models.
Re: Fala Brasil - Speech Recognition for the Brazilian Portuguese
User: m_ice
Date: 6/20/2011 6:40 am
Views: 5723
Rating: 25


I checked the scripts. Surprisingly, the bigram language model that I built with the HTK LM tools is more accurate than the bigram I built with the SRILM toolkit. It is at least 10 percent better!


Here is my command for building the bigram in SRILM:

ngram-count -text sentences.txt -order 2 -wbdiscount1 -wbdiscount2 -lm bigram.txt

       sentences.txt has 405 sentences.
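
For readers unfamiliar with the Witten-Bell discounting requested in the ngram-count command above, here is a minimal Python sketch of the idea for a bigram model. It is an illustration only, not SRILM's actual code: it uses the interpolated form of Witten-Bell with a plain maximum-likelihood unigram as the lower-order model, and the function names (train_bigram_wb, bigram_prob) are made up for this example. SRILM's own estimates may differ in detail.

```python
from collections import Counter, defaultdict

def train_bigram_wb(sentences):
    """Collect the counts needed for interpolated Witten-Bell bigram estimates."""
    unigrams = Counter()            # counts of predicted words (no <s>)
    bigrams = Counter()             # counts of (previous word, word) pairs
    history = Counter()             # how often each word occurs as a history
    followers = defaultdict(set)    # distinct word types seen after each history
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            history[prev] += 1
            followers[prev].add(word)
    return unigrams, bigrams, history, followers

def bigram_prob(word, prev, model):
    """P(word | prev) = (c(prev, word) + T(prev) * P_uni(word)) / (c(prev) + T(prev)),
    where T(prev) is the number of distinct word types seen after prev."""
    unigrams, bigrams, history, followers = model
    p_uni = unigrams[word] / sum(unigrams.values())
    types_after = len(followers.get(prev, ()))
    denom = history[prev] + types_after
    if denom == 0:
        # History never seen in training: fall back to the unigram estimate.
        return p_uni
    return (bigrams[(prev, word)] + types_after * p_uni) / denom

if __name__ == "__main__":
    # Toy data standing in for sentences.txt (one sentence per line).
    model = train_bigram_wb(["a b", "a c"])
    for w in ["a", "b", "c", "</s>"]:
        print(w, bigram_prob(w, "a", model))
```

The point of the sketch is that a history followed by many different word types gives more probability mass to unseen bigrams. With only 405 training sentences, most histories are rare, so the choice of discounting method can change the resulting probabilities noticeably.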

Here is my command for building the bigram in HTK:

HLstatsCommand = ['HLStats -b Dictionary\bgwTel.txt -o dictionary\wordslist.txt labels\wordsMlf.mlf']



I built my acoustic model with HTK, using left-to-right HMMs with 16 Gaussian mixtures for the triphones.