Frequently Asked Questions
- 32-bit Fedora 9 Sun Java FireFox Plugin Installation
- 64-bit Fedora 9 Sun Java FireFox Plugin Installation
- Acoustic Model training
- Audacity seems to let me record at higher rates than my sound card supports
- Cloud based Speech Recognition
- Editing Content with WebGUI
- FTP Clients that are compatible with VoxForge Submission System
- How can I execute HTK or Julius commands from my Windows console?
- How do I Adjust my Microphone for Recording Speech?
- How do I install HTK on a 64 bit OS
- How do I Pronounce a Word I Don't Know?
- How do I reactivate my VoxForge Account?
- How many times do I need to record a phoneme so that HTK can compile it to an Acoustic Model?
- How much speech do we need?
- How to compile Julius from source
- How to Connect to VoxForge FTP Site using FireFTP
- How to convert HTK acoustic model to Sphinx
- How to Create a Quiet Environment for Recording Prompts?
- How to create tree.hed "questions"
- How to implement a continuous speech recognition using HTK's HDecode
- How to improve speech recognition accuracy
- How to Provide Feedback on a Submitted Audio File
- How to Receive Email Notification of a New Post on a Forum
- how to remove Java applet from cache
- How to Search HTK's email archives
- How-to modify the sound.properties file of your Java JRE - Linux
- How-to Rate an Audio Submission to VoxForge
- HTK ERROR MESSAGES
- HTK training recipes for Timit corpus
- HVite error: HMM Def Error: Regression Tree definition expected
See this post (many thanks to the author: scott_glaser):
- Sun Java Installation - i386 (FC8-9)
(alternate title #1: How-to install Sun Java on 64-bit Fedora 9 so that signed applets will work)
(alternate title #2: How-to install 32-bit FireFox on 64-bit Fedora 9 so that Sun Java applet will run properly in FireFox)
The information presented here was taken from these posts (many thanks to the authors: scott_glaser & natousayni):
- 64-bit Fedora 9 contains OpenJDK, which cannot run signed applets, and the VoxForge Speech Submission applet is a signed Java applet.
OpenJDK is the free implementation of Sun's Java run-time environment. The browser plugin used in Fedora 9, gcjwebplugin, does not yet support signed applets. From the Fedora Project Wiki:
Handling Java Applets
Upstream OpenJDK does not provide a plugin. The Fedora OpenJDK packages include an adaptation of gcjwebplugin, that runs untrusted applets safely in a Web browser. The plugin is packaged as java-1.6.0-openjdk-plugin.
- The gcjwebplugin adaptation does not support signed applets. Signed applets will run in untrusted mode. Experimental support for signed applets is present in the IcedTea repository, but it is not ready for deployment in Fedora.
- The gcjwebplugin security policy may be too restrictive. To enable restricted applets, run the firefox -g command in a terminal window to see what is being restricted, and then grant the restricted permission in the /usr/lib/jvm/java-1.6.0-openjdk-126.96.36.199/jre/lib/security/java.policy file.
- Sun recommends 32-bit Java to run applets;
- 32-bit Java needs a 32-bit plugin to work in a browser;
- The 64-bit Fedora 9 implementation of FireFox is 64-bit and will not work with a 32-bit Java plugin.
- Install 32-bit FireFox
* Add i386 Yum repository
* create a new Yum configuration file:
# gedit /etc/yum.repos.d/fedora-i386.repo
* copy these settings into the new configuration file:
name=Fedora $releasever - i386
name=Fedora $releasever - i386 - Updates
* Remove the default Firefox (64-bit) installation
# yum -y erase firefox.x86_64
* Install 32-bit Firefox
# yum -y install firefox.i386
- Add libXtst.i386 library
# yum -y install libXtst.i386
- Don't touch your default java installation
Other how-tos state that you need to remove OpenJDK. However, other programs on Fedora 9, like Eclipse, need the default Java. You should not have to remove your default Java installation (java-1.6.0-openjdk java-1.6.0-openjdk-plugin), because you can use the alternatives command to select the version of Java you need. Just make sure you don't use the rpm version of Sun's Java install, because it changes the /usr/bin/java executable so that it no longer points to the "alternatives" command symbolic links.
- Create new Java directory
# mkdir /usr/java
# cd /usr/java
- Download Sun's Java to your new Java directory
(the .bin file NOT the rpm.bin - because the rpm changes the /usr/bin/java executable to not point to the "alternatives" command symbolic links).
- Execute the bin
# chmod +x jre*
- Link FireFox plugins to the new Sun Java
* for a particular user:
# cd /home/yourusername/.mozilla/plugins
# ln -s /usr/java/jre-6u7-linux-i586/plugins/i386/ns7/libjavaplugin_oji.so
* for all users:
# cd /usr/lib/mozilla/plugins
# ln -s /usr/java/jre-6u7-linux-i586/plugins/i386/ns7/libjavaplugin_oji.so
- OR -
- Use alternatives command (for all users)
(you can also use the alternatives command to set the plugin link, since FireFox is the only app that needs the Sun Java and it uses libjavaplugin.so rather than the libjavaplugin.so.x86_64 used by other programs (like Eclipse) on 64-bit Fedora 9).
# /usr/sbin/alternatives --install /usr/lib/mozilla/plugins/libjavaplugin.so libjavaplugin.so /usr/java/jdk1.6.0_07/jre/plugin/i386/ns7/usr/java/libjavaplugin.so 2
# /usr/sbin/alternatives --config libjavaplugin.so
There is 1 program which provide 'libjavaplugin.so'.
*+ 1 /usr/java/jre1.6.0_07/plugin/i386/ns7/libjavaplugin_oji.so
Enter to keep the current selection[+], or type selection number: 1
- VoxForge tutorial for training acoustic models for Julius (need to use HTK)
- Forums for asking questions
Audacity will let you change your Sample Rate and Bits per Sample to rates higher than what your Sound Card can support. It will record at the highest rate your sound card supports, and then dynamically upsample the audio to the higher rate you selected, without providing any warning that it did this. This is NOT the approach you should take for any audio submitted to VoxForge. The upsampling and later downsampling for use in Acoustic Models can introduce noise.
Please check your audio card manual to determine the highest Sampling Rate it supports. If you don't have your manual (or lost it, or never received one ...) these other FAQ entries can help you determine your max sampling rate:
- WAMI Toolkit (uses a Java client)
wami project (open source version, roll-your-own)
SpeechAPI (uses a Flash client, and Red5 back end)
- NetworkSpeech.com (commercial)
SpeechCloud (open source client)
The main thing to remember when editing content is
that WebGUI creates a "version tag" that keeps track of all your edits
(its default name is your username and current date). To publish your
changes so that everyone can see them, you need to "commit" your version tag.
To edit content you need to be in "admin" mode (click the "admin" link on top right hand corner of page - your account needs permission for this to show). The Admin Console appears on the left hand side of your web page.
To commit content click the "Version Tags" category in the Admin Console, select "commit my version". When you click this, the content you edited will become visible to everyone. No content changes are visible until you commit the version tag!
If you log out and log back in without committing the changes you've made, your changes will not be displayed, and the things you edited will be locked (the padlock icon appears). You need to select the version tag that you created in your previous session to make it active. To do this, click the "Version Tags" category in the Admin Console, and click your version tag.
If you don't select your previous version tag and start editing something, then WebGUI will
automatically create a new version tag. This can get confusing,
because your previous edits will still be "locked" under your old version tag.
The commit process is useful because you can edit multiple web pages (over a period of days) under a single version tag, and publish them all at once with a single commit.
Note: when editing the Read page for your language, it is best to disable the VoxForge Speech Submission Java Applet. Otherwise, your edits slow to a crawl as your browser attempts to refresh the entire page (including the applet) every time you make a change to a web object.
A "Page Layout" is a container object. Think of it as a directory or folder that displays its contents as a web page. If you want to localize the VoxForge menu, change the menu property to your language. You can add text to a Page Layout, but it is best (easier to format) to only have text in Articles.
"Articles" are objects that contain your text. You
can have more than one article (or any other object) on the same Page
Layout. Just click edit on the article menu, and replace the given text
with your translations. Articles also have a menu property, but this is only used for setting secondary menus (which VoxForge does not use), not the main menu (set the menu property in the Page Layout container).
"Forum" is an object... which, if you are reading this you would obviously know, lets users submit posts and post replies.
"Shortcuts" are pointers to other objects that are displayed on the Page Layout they are located in. Usually these point to another article or forum from the English side of the VoxForge website. You cannot change the object that the Shortcut points to, but you can "overwrite" some of its properties. Using this feature you can overwrite the title (i.e. translate it to another language) of the object being pointed to.
Putting it all together:
Therefore, if an Article and a Shortcut to a Forum (or any other web object) are contained on a Page Layout, when you click the URL of the Page Layout, the content of the Article and the Shortcut will be displayed, in the same way images are displayed on a web page, even though the Article and the Shortcut are both uniquely addressable through their own URL.
Note: Cascading Style Sheets (CSS)
Web objects like Articles and Shortcuts can have their own css properties. However, when these web objects are displayed on a Page Layout container object, the css of the Page Layout overrides the css of the contained object. This is the opposite of how cascading style sheets normally work (where the most specific css entry overrides the general css entry). The reason for this is that each object on a Page Layout has its own URL and can be displayed independently of the Page Layout. Therefore each such web object requires its own style sheet so that it can be displayed by itself.
Cross-platform FTP client:
- FireFTP - requires Firefox 1.5 or greater
- Nautilus (Gnome)
You may want to have Julius and HTK included in your path environment variable. This way you can execute Julius and/or HTK commands from your Windows console (note that you will not be able to execute bash or Perl scripting commands).
Step 1 - Update your path environment variable to include HTK and Julius
- Login to Windows with your administrator account;
- Go to Windows Start menu;
- Right-click My Computer;
- Click properties from the right-click menu;
- this opens up your System Properties window;
- Click the Advanced tab;
- Click Environment Variables button;
- Go to the System variables window and roll down until you find the Path variable and click on it;
- Click the System variables Edit button;
- In the Edit System Variable window, click the Variable Value entry field (this field contains a long string of directory entries for your system)
hit your 'end' key to go to the end of the Variable Value string and then add the following entry (all one line, and include the semi-colon ";" at the beginning of the string) to the end of the Variable Value entry field:
Warning: be very careful here, you can cause serious
problems with your system if you make a mistake. Don't overwrite or
delete anything already there. You are just adding directory entries
for HTK and Julius to the end of your current path.
If you think you may have made an error, click the cancel button and restart.
Click OK in the Edit System Variable window
Click OK in the Environment Variables window
Click OK in the System Properties window
Step 2 - Testing Your HTK/Julius Install
- Open a Cygwin Console window:
- Click Start>All Programs>Cygwin>Cygwin Bash Shell;
- Type in "HVite" in the Cygwin Console;
if your system lists all the options available to the hvite command, then HTK is installed properly.
- Type in "julius" in the Cygwin Console;
if your system displays version information for Julius, then Julius is installed properly;
- If you don't see the expected results, review your installation steps for Julius or HTK to determine where you might have made an error.
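The two checks in Step 2 can also be scripted. The following is only a minimal sketch, assuming a POSIX shell such as the Cygwin Bash Shell; the function name check_tool is made up for this illustration. It only confirms that a command can be found through your path, not that the program itself runs correctly.

```shell
#!/bin/sh
# check_tool NAME: report whether NAME can be found through the PATH variable.
# (check_tool is a hypothetical helper name used for this illustration.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found on your path"
        return 1
    fi
}

# HVite (HTK) and julius are the two commands tested in Step 2.
for tool in HVite julius; do
    check_tool "$tool" || true
done
```

If either line reports "NOT found", the directory entry you added to the Path variable is the first thing to re-check.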
If you have a headset microphone, this should be easy to do. Your microphone should be a bit to the side and below your mouth (so the microphone won't pick up your breathing), and no more than a half inch (1-2 cm) away.
A standalone microphone makes recording a little more difficult. It is very important to keep your mouth at the same distance from the microphone for the entire duration of the recording of one file. The same applies to a handheld microphone: try to be consistent in the way you hold it when recording the prompts for one file. The distance does not have to be the same from one file to another, but it must stay the same for the duration of one file.
To compile HTK on a 64 bit Fedora Core 4, I used the following commands:
- ./configure --prefix=/home/kmaclean/bin/htk-3.3 CC=gcc32
- make all
- make install
There are a couple of approaches:
1. Use the VoxForge Dictionary:
If you are wondering about pronunciations, the VoxForge Dictionary might provide you with some indication as to the pronunciation. For example, the word "etc" shows up as follows in the dictionary:
ETC [ETC] eh t s eh dx er ax
ETCETERA [ETCETERA] eh t s eh dx er ax
You really don't need to know how the phonemes are pronounced in this particular example, because you can see that 'ETC' and 'ETCETERA' contain the same phonemes, and therefore should be pronounced the same.
For other words you are not sure how to pronounce, you can look at their component phonemes and search for similar strings of phonemes until you find a word you know how to pronounce.
For example, for the word "windward", you would look it up in the dictionary and find:
WINDWARD [WINDWARD] w ih n d w er d
You would then search for the string "w er d" and find the word "word"
WORD [WORD] w er d
So now you know you would pronounce the word windward as "wind" + "word".
Note that this is not clearcut in all instances, because some dialects pronounce the "ward" in the word "windward" like the "ward" in the word "award", see this dictionary entry:
AWARD [AWARD] ax w ao r d
Therefore, it all depends on the target users of the speech recognition system and what their own particular dialect is. And if we are targeting an Acoustic Model to this particular dialect, we might add an entry to the dictionary like this:
WINDWARD [WINDWARD] w ih n d w ao r d
But in the non-native speaker case, where you might not have any idea how to pronounce a word, the dictionary is a good start.
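The phoneme search described above can be done from a command line. This is a rough sketch only, assuming the dictionary is a plain text file with one entry per line in the same "WORD [WORD] phonemes" form as the examples above; the function name find_exact_pron and the file name dict.txt are made up for this illustration.

```shell
#!/bin/sh
# find_exact_pron PHONEMES DICT_FILE
# Print every headword in DICT_FILE whose full pronunciation equals PHONEMES.
# Assumes one entry per line in the form: WORD [WORD] ph1 ph2 ...
# (find_exact_pron and dict.txt are hypothetical names.)
find_exact_pron() {
    awk -v target="$1" '
        {
            pron = ""
            # Fields 3..NF are the phonemes; rejoin them with single spaces.
            for (i = 3; i <= NF; i++)
                pron = (i > 3 ? pron " " $i : $i)
            if (pron == target)
                print $1
        }
    ' "$2"
}

# Example: which words are pronounced exactly "w er d"?
# find_exact_pron "w er d" dict.txt
```

Matching the whole pronunciation, rather than searching for the string anywhere in the line, keeps longer words that merely end in the same phonemes (such as WINDWARD itself) out of the results.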
2. Listen to Someone Else's Audio
Another approach might be to listen to the audio from someone else's submission to see how they pronounce it.
3. Other Resources
- LibriVox discussion re: Pronunciation Resources mentions the following resources:
You should be able to reactivate your account by performing the following steps:
1. click Login from the top menu
2. click "click here to register"
3. click "i forgot my password"
4. enter your email address, click save
5. an email will be sent to you with a link (and a new password) to activate your account.
You need to record a particular word from your Grammar more than once in order to create HMM statistical models that are robust enough to recognize your voice. More is better; 3 to 5 recordings is the minimum required so that HTK will compile.
From Nsh's post:
Database size [...] is only one of the factors that affect accuracy, there are many others. And there is no direct dependency between size and accuracy. It's possible to have good accuracy with 70 hours, it's possible that with 10 thousands you'll have bigger error rate.
Here on page 13 you can find comparision of accuracy and database size
Basically the difference in accuracy between 400 hours and 2200 hours is 2%.
Step 1 - Download Source Code
Create a new directory in your home directory called 'bin', it
should have the following path (replace yourusername with the username
you are using on your system):
and save it to your new bin directory.
Extract the file using:
- Nautilus (right click the tar/gzipped file and click extract here)
- use tar from the command line:
- tar -xvzf julius-4.3.1.tar.gz
this should create a julius-4.3.1 directory in your bin folder.
Step 2 - Compile & Install Julius
After unpacking the sources, open a command line terminal and go to the /home/yourusername/bin/julius-4.3.1 directory where you extracted your files.
The default location for binaries is "/usr/local" which will put the tools in "/usr/local/bin". You need to change this default location using the "./configure" script to specify where you want the binaries installed:
To compile Julius:
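$ ./configure --with-mictype=alsa --enable-setup=standard --prefix=$HOME/bin/julius-4.3.1
(Note: $HOME/bin/julius-4.3.1 expands to /home/yourusername/bin/julius-4.3.1. Use $HOME rather than ~ here, because the shell does not expand ~ in the middle of an option such as --prefix=~/bin.)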
This directs the make command to put all your binaries in the following folder:
To compile 32-bit Julius on a 64-bit computer:
you will need the .i686 version of alsa-lib-devel, installed as root:
# yum install alsa-lib-devel.i686
then add a flag before you run the Julius configure to tell your gcc compiler to compile 32-bit binaries:
$ CFLAGS=-m32 ./configure --with-mictype=alsa --enable-setup=standard --prefix=$HOME/bin/julius-4.3.1 --host=i686-generic-linux-gnu
To build the libraries and binaries, execute the following:
Running the following command will install them:
Step 3 - Update your User Path
To update your user path, you need to add the '$HOME/bin/julius-4.3.1/bin' path to your path variable. To do this, edit your '.bash_profile' file in your home directory (in Fedora you need to show 'hidden files' in Nautilus - so you can display file names with a period in front of them) and add a colon (":") and this path to the end of the PATH variable (leaving the rest of it unchanged):
# User specific environment and startup programs
Log out and log back in to make your path change effective.
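As an illustration of the path change in Step 3, here is a minimal sketch of lines that could go in a '.bash_profile'. It is an assumption about one way to do it, not the exact content of the original file: the helper name append_to_path is made up, and only the '$HOME/bin/julius-4.3.1/bin' directory comes from the steps above.

```shell
# Sketch of lines for ~/.bash_profile (append_to_path is a hypothetical helper).
# append_to_path DIR: add DIR to the end of PATH unless it is already listed.
append_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on the path, nothing to do
        *) PATH="$PATH:$1" ;;     # keep the existing entries, add DIR last
    esac
}

append_to_path "$HOME/bin/julius-4.3.1/bin"
export PATH
```

Checking for an existing entry first means the path does not grow a duplicate each time the file is read.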
Using Right-click Menu
- Right-click the VoxForge FTP link;
- click "Open Link in FireFTP" in the menu window;
- Enter the password when prompted (the Login section should fill in automatically);
- Click "OK".
Connect to VoxForge FTP site from FireFTP
- click tools, and
- then select FireFTP.
FireFTP should appear in a new tab in your FireFox browser.
Inside the FireFTP tab:
- Click "Manage Accounts",
- then select "QuickConnect".
- In the Account Manager pop-up window, enter the following information where indicated:
- Click "OK"
Click "Connect" (located next to drop down list of accounts) to make the actual connection to the VoxForge FTP site.
Before you begin, you need to make sure that the room you are recording in is as quiet as possible. You do not need an acoustically sound-proof room, but you need to make sure that, while you are recording, there are no external noises that your microphone might pick up and which may render your speech audio files unusable for Acoustic Model creation purposes. Use common sense: you should not have any music in the background, no fans, air conditioners, microwaves, television, etc. In addition, make sure you turn off your speakers while recording, to avoid acoustic feedback in your audio files.
This Discussion Thread provides more ideas.
Step 10 of the VoxForge Tutorial and How-to uses a tree.hed script whose "questions" are specific to the English language, and therefore will not work with other languages.
- nsh's overview of how to create clustered triphone "questions" for Sphinx and HTK for new languages
- Ticket #153 - htk error on step 10, and a related thread in the forums
- my post on this Thread on "Error when compiling model" where I discuss Creating clustered triphone "questions"
See also the HTK manual.
Theoretically, you should be able to automatically create questions using the CMU Robust Group Sphinx Tutorial. These would be in Sphinx format, but could be used as a starting point for the creation of HTK questions for a tree.hed script.
From the HTK mailing list archives:
> Hi everyone,
> I am trying to implement a continuous speech recognition in Spanish
> Language. I followed the indications to make the tri-gram language
> model presented in the HTK book. I am using the HDecode tool. My
> firsts results are very poor (WER >= 35%). So I tuned various
> parameters in HDecode, but know I need to make tuning of the language
> model parameter. Any user can help me in that? for example, what
> parameter in the to language model generation are recommended to
> continuous speech?
Did you try to follow htk wsj1 reciept? It has almost everything
required I think:
with all beams used and lm factors. Though for really large vocabulary lm factor should be smaller (around 6-8).
Others may then download your submission and provide feedback on the recording. This will help in classifying your speech audio submission for later merging into the VoxForge Acoustic Models.
There are two ways to provide feedback for a submission:
- rate the submission - click the 'thumbs up' icon or the 'thumbs down' icon at the top right corner of the submission; or
- written feedback - if you want to provide written feedback to a submission, you can click 'reply' at the bottom of the submission to add your comments.
To receive an email notification of any new user submissions to a forum:
click on the title of a forum, a list of threads appears; next
click the subscribe link at the very top of the thread list page.
You need to be logged in for the subscribe link to appear
You need to subscribe to each forum if you want all of them - there is no 'master subscribe'.
When you post to a forum without logging in (as a 'visitor'), there will be a lag of a few minutes before your post will display (you need to refresh your browser for the change to display). There is no such delay if you register and log in to the system.
Debugging Java applets in a Browser on Linux can be tricky because of the way Java caches them. Therefore, if you change the applet, create a jar file, sign and then deploy it, if the jar file has the same name as the old version of your applet, the new version will not get picked up in your browser.
To fix this, you need to remove the old Java applet from your Java deployment cache (browsers do not control where Java applets are stored...).
If you are running Oracle JDK, you first need to open a Java Control Panel window using the jcontrol command:
Start>All Programs>Java>Configure Java
From the Java Control Panel:
Under Temporary Internet Files, click View...
Linux: From the Java Cache Viewer, right-click the name of the applet to remove, and click delete.
Windows: click Show: Resources; then right-click the name of the applet to remove, and click delete.
Close the window and you are done.
If you want to know where these Temporary Internet Files are stored, click the Settings... tab under Temporary Internet Files. The default location for my version of linux (Fedora) is:
HTK does not have a forum, but uses email lists:
You can get some very useful information by searching these archives here:
(select: Search through: All mailing list archives)
If you have a microphone jack on your motherboard (or are using an audio card), and you want to use a USB microphone, the Speech Submission applet will connect to your on-board microphone, and ignore your USB microphone (or only record at very low volumes).
This How-to shows you how to modify the sound.properties file of your Java JRE (Java-run-time Environment) so that you can change the default audio mixer from an on-board microphone to a different microphone (like a USB mic).
Use the arecord command to list the devices attached to your PC:
$ arecord --list-devices
**** List of CAPTURE Hardware Devices ****
card 0: NVidia [HDA NVidia], device 0: ALC883 Analog [ALC883 Analog]
Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 2: ALC883 Analog [ALC883 Analog]
Subdevice #0: subdevice #0
One of the listed devices will be your USB device
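If the listing is long, the card and device numbers can be pulled out of it automatically. The following is a small sketch under the assumption that the output looks like the sample above; the function name list_hw_ids is made up for this illustration. It prints each capture device in the "hw:CARD,DEVICE" form that ALSA tools accept for their -D option.

```shell
#!/bin/sh
# list_hw_ids: read `arecord --list-devices` output on standard input and
# print one "hw:CARD,DEVICE" identifier per capture device found.
# (list_hw_ids is a hypothetical helper name.)
list_hw_ids() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Typical use: arecord --list-devices | list_hw_ids
```

For the sample listing above this gives hw:0,0 and hw:0,2. Note that this only identifies the devices for ALSA; it does not produce the mixer name that goes in the sound.properties file.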
Modify your sound.properties file using the output from arecord (quote taken from this post):
Note: With JDK 1.5, it is possible to use a sound.properties file in the JRE\lib directory of your JDK installation. This sound.properties file can be used to specify the default Mixer that will be used when AudioSystem.getLine() is called for the various type of Line. In my case, the sound.properties file contains:
javax.sound.sampled.Port=#Port SB Audigy Audio [C400]
javax.sound.sampled.SourceDataLine=#SB Audigy Audio [C400]
javax.sound.sampled.TargetDataLine=#SB Audigy Audio [C400]
The mixer name is placed after the # sign. These are actually the defaults for my machine, so this file is not strictly necessary.
VoxForge is not looking for TV or radio announcer quality voices (just listen to my voice recordings ...) or perfect audio quality.
For Free and Open Source Speech Recognition to work, we need a large variety of speech (from different people, with different dialects/accents, and using different prompts files with various phonemes and triphones) recorded in a variety of environments (rooms with echo, such as hardwood floors or tiles, and rooms with no echo, such as carpet, etc.) and on a variety of recording equipment (headset mics, desktop mics, built in mics, and USB mics, integrated audio, audio cards ...).
That is not to say that you should not try to minimize non-speech noise in your audio submissions; it is just that the submissions we are looking for should reflect the environments where the acoustic model might be used for speech recognition.
Therefore, most audio submitted to the VoxForge site should receive a thumbs up. This is because it takes some effort to create a recording (when you are first starting out), and new submitters should be encouraged, not discouraged. A "thumbs up" rating would go a long way to encouraging submissions.
What should result in a thumbs down is when a transcription doesn't match its corresponding audio or when there is excessive background noise (i.e. non-speech noise or talking in the background). What is "excessive noise" is subjective, since the Acoustic Model creation process can tolerate some low level hiss and/or hum (usually heard in quiet periods of some recordings). But if enough people submit their rating of a submission, on average we should get a good view of the quality of a recording for use in the creation of acoustic models.
See this post: UNDERSTANDING HTK ERROR MESSAGES
Full error message:
HMM Def Error: Regression Tree definition expected at line 34/col 10/char 1507 in hmmAdapt/hmmdefs
ERROR [+7050] HMError:
ERROR [+7035] LoadAllMacros: Get macro data failed in MMF hmmAdapt/hmmdefs
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+3228] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HVite
Run the HTK 3.2.1 version of HVite.
This error occurs when you have an Acoustic Model that you adapted with HTK version 3.2.1, but then try to run it against HVite version 3.3.
See Ticket #55