VoxForge
See this post (many thanks to the author: scott_glaser):
The information presented here was taken from these posts (many thanks to the authors: scott_glaser & natousayni):
OpenJDK is the free implementation of Sun's Java run-time environment. The browser plugin used in Fedora 9, gcjwebplugin, does not yet support signed applets. From the Fedora Project Wiki:
Handling Java Applets
Upstream OpenJDK does not provide a plugin. The Fedora OpenJDK packages include an adaptation of gcjwebplugin, that runs untrusted applets safely in a Web browser. The plugin is packaged as java-1.6.0-openjdk-plugin.
- ...
- The gcjwebplugin adaptation does not support signed applets. Signed applets will run in untrusted mode. Experimental support for signed applets is present in the IcedTea repository, but it is not ready for deployment in Fedora.
- The gcjwebplugin security policy may be too restrictive. To enable restricted applets, run the firefox -g command in a terminal window to see what is being restricted, and then grant the restricted permission in the /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/lib/security/java.policy file.
* Add i386 Yum repository
* create a new Yum configuration file:
# gedit /etc/yum.repos.d/fedora-i386.repo
* copy these settings into the new configuration file:
[fedora-i386]
name=Fedora $releasever - i386
failovermethod=priority
baseurl=http://download.fedora.redhat.com/pub/fedora/linux/releases/$releasever/Everything/i386/os/
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-$releasever&arch=i386
enabled=1
gpgcheck=1
includepkgs=firefox
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora
#
[updates-i386]
name=Fedora $releasever - i386 - Updates
failovermethod=priority
baseurl=http://download.fedora.redhat.com/pub/fedora/linux/updates/$releasever/i386/
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=updates-released-f$releasever&arch=i386
enabled=1
gpgcheck=1
includepkgs=firefox
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora
* Remove the default Firefox (64-bit) installation
# yum -y erase firefox.x86_64
* Install 32-bit Firefox
# yum -y install firefox.i386
# yum -y install libXtst.i386
Other how-tos state that you need to remove OpenJDK. However, other programs on Fedora 9, like Eclipse, need the default Java. You should not have to remove your default Java installation (java-1.6.0-openjdk java-1.6.0-openjdk-plugin), because you can use the alternatives command to select the version of Java you need. Make sure you do not use the rpm version of Sun's Java installer, because it changes the /usr/bin/java executable so that it no longer points to the "alternatives" command symbolic links.
# mkdir /usr/java
# cd /usr/java
Download the self-extracting installer from www.java.com/en/download into this directory (the .bin file, NOT the rpm.bin, because the rpm changes the /usr/bin/java executable so that it no longer points to the "alternatives" command symbolic links).
# chmod +x jre*
# ./jre-6u7-linux-i586.bin
* for a particular user:
# cd /home/yourusername/.mozilla/plugins
# ln -s /usr/java/jre-6u7-linux-i586/plugins/i386/ns7/libjavaplugin_oji.so
* for all users:
# cd /usr/lib/mozilla/plugins
# ln -s /usr/java/jre-6u7-linux-i586/plugins/i386/ns7/libjavaplugin_oji.so
- OR -
- Use alternatives command (for all users)
(you can also use the alternatives command to set the plugin link, since FireFox is the only app that needs the Sun Java and it uses libjavaplugin.so rather than the libjavaplugin.so.x86_64 used by other programs (like Eclipse) on 64-bit Fedora 9).
# /usr/sbin/alternatives --install /usr/lib/mozilla/plugins/libjavaplugin.so libjavaplugin.so /usr/java/jre1.6.0_07/plugin/i386/ns7/libjavaplugin_oji.so 2
# /usr/sbin/alternatives --config libjavaplugin.so
There is 1 program which provide 'libjavaplugin.so'.
Selection Command
-----------------------------------------------
*+ 1 /usr/java/jre1.6.0_07/plugin/i386/ns7/libjavaplugin_oji.so
Enter to keep the current selection[+], or type selection number: 1
Open-Source Large Vocabulary CSR Engine Julius:
Audacity will let you change your Sample Rate and Bits per Sample to rates higher than what your Sound Card can support. It will record at the highest rate your sound card supports, and then dynamically upsample the audio to the higher rate you selected, without providing any warning that it did this. This is NOT the approach you should take for any audio submitted to VoxForge. The upsampling and later downsampling for use in Acoustic Models can introduce noise.
Please check your audio card manual to determine the highest Sampling Rate it supports. If you don't have your manual (or lost it, or never received one ...) these other FAQ entries can help you determine your max sampling rate:
Windows: How to determine your audio card's, or USB mic's, maximum sampling rate
Linux: How to determine your audio card's, or USB mic's, maximum sampling rate
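Once a recording has been exported, the rate and sample size stored in its header can be read back as a quick sanity check. The sketch below uses Python's standard wave module; the function name wav_format is only illustrative. Note that it reports what the file declares, so it cannot reveal whether Audacity upsampled the audio while recording; you still need to know your card's real maximum rate.

```python
import io
import wave


def wav_format(source):
    """Return (sample_rate_hz, bits_per_sample, channels) from a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


if __name__ == "__main__":
    # Build a tiny silent mono WAV in memory so the example needs no files.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)        # 2 bytes = 16 bits per sample
        out.setframerate(48000)
        out.writeframes(b"\x00\x00" * 480)
    buffer.seek(0)
    print(wav_format(buffer))
```

To check one of your own recordings, pass its path instead of the in-memory buffer.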
wami project (open source version, roll-your-own)
SpeechAPI (uses a Flash client, and Red5 back end)
SpeechCloud (open source client)
The main thing to remember when editing content is
that WebGUI creates a "version tag" that keeps track of all your edits
(its default name is your username and current date). To publish your
changes so that everyone can see them, you need to "commit" your version tag.
To edit content
you need to be in "admin" mode (click the "admin" link on top right
hand corner of page - your account needs permission for this to show). The Admin Console appears on the left
hand side of your web page.
To commit content click the "Version Tags" category in the Admin Console, select "commit my version". When you click this, the content you edited will become visible to everyone. No content changes are visible until you commit the version tag!
If you log
out and log back in without committing the changes you've made, your changes will not be displayed, and the things you edited will be
locked (the padlock icon appears). You need to select the version tag
that you created in your previous session to make it active. To do
this, click the "Version Tags" category in the Admin Console, and click your version tag.
If you don't select your previous version tag, and start editing something, then WebGUI will
automatically create a new version tag... this can get confusing,
because your previous edits will still be "locked" under your old
version tag.
The commit process is useful because you can edit
multiple web pages (over a period of days) under a single version tag,
and publish them all at once with a single commit.
Note: when editing the Read page for your language, it is best to disable the VoxForge Speech Submission Java Applet. Otherwise, your edits slow to a crawl as your browser attempts to refresh the entire page (including the applet) every time you make a change to a web object.
WebGUI has a tutorial (WebGUI-Primer.pdf) and also a sandbox (http://demo.plainblack.com). Look up Page Layouts, Articles, Forums and Shortcuts.
Some Concepts/Definitions:
A "Page Layout"
is a container object. Think of it as a directory or folder that
displays its contents as a web page. If you want to localize the VoxForge menu, change the menu property to your
language. You can add text to a Page Layout, but it is best (easier to format) to only have text in Articles.
"Articles" are objects that contain your text. You
can have more than one article (or any other object) on the same Page
Layout. Just click edit on the article menu, and replace the given text
with your translations. Articles also have a menu property, but this is only used for setting secondary menus (which VoxForge does not use), not the main menu (set the menu property in the Page Layout container).
"Forum" is an object... which, if you are reading this you would obviously know, lets users submit posts and post replies.
"Shortcuts"
are pointers to other objects that are displayed on the Page Layout
they are located in. Usually these point to another article or forum
from the English side of the VoxForge website. You cannot change the
object that the Shortcut points to, but you can "overwrite" some of its
properties. Using this feature you can overwrite the title (i.e.
translate it to another language) of the object being pointed to.
Putting it all together:
Therefore, if an Article and a Shortcut to a Forum (or any other web object) are contained on a Page Layout, when you click the URL of the Page Layout, the content of the Article and the Shortcut will be displayed, in the same way images are displayed on a web page, even though the Article and the Shortcut are both uniquely addressable through their own URL.
Note: Cascading Style Sheets (CSS)
Web objects like Articles and Shortcuts can have their own CSS properties. However, when these web objects are displayed on a Page Layout container object, the CSS of the Page Layout overrides the CSS of the contained object. This is the opposite of how cascading style sheets normally work (where the most specific CSS entry overrides the more general one). The reason is that each object on a Page Layout has its own URL and can be displayed independently of the Page Layout, so each such web object needs its own style sheet for when it is displayed by itself.
Cross-platform FTP client:
Linux:
Windows:
Mac
You may want to have Julius and HTK included in your path environment variable. This way you can execute Julius and/or HTK commands from your Windows console (note that you will not be able to execute bash or Perl scripting commands).
Hit your 'end' key to go to the end of the Variable Value string, and then add the following entry (all on one line, including the semi-colon ";" at the beginning of the string) to the end of the Variable Value entry field:
;c:\cygwin\HTK\htk-3.3-windows-binary\htk;C:\Cygwin\Julius\julius-3.5-win32bin\bin
Warning: be very careful here, you can cause serious
problems with your system if you make a mistake. Don't overwrite or
delete anything already there. You are just adding directory entries
for HTK and Julius to the end of your current path. If you think you may have made an error, click the cancel button and restart.
Click OK in the Edit System Variable window
Click OK in the Environment Variables window
Click OK in the System Properties window
If your system lists all the options available to the hvite command, then HTK is installed properly.
If your system displays version information for Julius, then Julius is installed properly.
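The path edit described above amounts to appending two directories to the existing value without touching what is already there. The following is a minimal sketch of that rule in Python; append_to_path is a hypothetical helper and the starting value is a placeholder, not your real system path.

```python
def append_to_path(current_value, new_dirs, separator=";"):
    """Return current_value with new_dirs appended, leaving existing entries untouched."""
    existing = [entry for entry in current_value.split(separator) if entry]
    # Windows paths are case-insensitive, so compare in lower case.
    seen = {entry.lower().rstrip("\\") for entry in existing}
    result = current_value.rstrip(separator)
    for directory in new_dirs:
        key = directory.lower().rstrip("\\")
        if key not in seen:
            result = result + separator + directory if result else directory
            seen.add(key)
    return result


if __name__ == "__main__":
    old = r"C:\Windows\system32;C:\Windows"  # placeholder for an existing value
    print(append_to_path(old, [
        r"c:\cygwin\HTK\htk-3.3-windows-binary\htk",
        r"C:\Cygwin\Julius\julius-3.5-win32bin\bin",
    ]))
```

The result always starts with the old value, and a directory that is already listed is not added a second time.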
If you have a headset microphone, this should be easy to do. Your microphone should be a bit to the side of and below your mouth (so the microphone won't pick up your breathing), and no more than half an inch (1-2 cm) away.
A standalone microphone makes recording a little more difficult. It is very important to keep your mouth at the same distance from the microphone for the entire duration of the recording of one file. The same applies to a handheld microphone: try to be consistent in the way you hold it when recording the prompts for one file. The distance does not have to be the same from one file to another, but it must be the same for the duration of one file.
To compile HTK on a 64 bit Fedora Core 4, I used the following commands:
There are a couple of approaches:
1. Use the VoxForge Dictionary:
If you are wondering about pronunciations, the VoxForge Dictionary might provide you with some indication as to the pronunciation. For example, the word "etc" shows up as follows in the dictionary:
ETC [ETC] eh t s eh dx er ax
ETCETERA [ETCETERA] eh t s eh dx er ax
You really don't need to know how the phonemes are pronounced in this particular example, because you can see that 'ETC' and 'ETCETERA' contain the same phonemes, and therefore should be pronounced the same.
For other words you are not sure how to pronounce, you can look at their component phonemes and search for similar strings of phonemes until you find a word you know how to pronounce.
For example, for the word "windward", you would look it up in the dictionary and find:
WINDWARD [WINDWARD] w ih n d w er d
You would then search for the string "w er d" and find the word "word"
WORD [WORD] w er d
So now you know you would pronounce the word windward as "wind" + "word".
Note that this is not clear-cut in all instances, because some dialects pronounce the "ward" in the word "windward" like the "ward" in the word "award"; see this dictionary entry:
AWARD [AWARD] ax w ao r d
Therefore, it all depends on the target users of the speech recognition system and what their own particular dialect is. And if we are targeting an Acoustic Model to this particular dialect, we might add an entry to the dictionary like this:
WINDWARD [WINDWARD] w ih n d w ao r d
But in the non-native speaker case, where you might not have any idea how to pronounce a word, the dictionary is a good start.
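The lookup described above (find a word's phonemes, then search for other words that end in the same phoneme string) can be sketched in a few lines of Python. The entries are the ones quoted in this FAQ; parse_entry and words_ending_with are illustrative names, and the sketch assumes every dictionary line has the "WORD [WORD] phonemes" layout shown above.

```python
def parse_entry(line):
    """Split a dictionary line 'WORD [WORD] ph1 ph2 ...' into (word, phonemes)."""
    fields = line.split()
    return fields[0], fields[2:]


def words_ending_with(entries, phonemes):
    """Return the words whose pronunciation ends with the given phoneme sequence."""
    target = list(phonemes)
    if not target:
        return []
    return [word for word, phones in entries if phones[-len(target):] == target]


LINES = [
    "WINDWARD [WINDWARD] w ih n d w er d",
    "WORD [WORD] w er d",
    "AWARD [AWARD] ax w ao r d",
    "ETC [ETC] eh t s eh dx er ax",
    "ETCETERA [ETCETERA] eh t s eh dx er ax",
]

if __name__ == "__main__":
    entries = [parse_entry(line) for line in LINES]
    # Words that share the "w er d" ending with WINDWARD.
    print(words_ending_with(entries, ["w", "er", "d"]))
    # Identical phoneme lists mean identical pronunciation.
    lookup = dict(entries)
    print(lookup["ETC"] == lookup["ETCETERA"])
```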
2. Listen to Someone Else's Audio
Another approach might be to listen to the audio from someone else's submission to see how they pronounce it.
3. Other Resources
You should be able to reactivate your account by performing the following steps:
1. click Login from the top menu
2. click "click here to register"
3. click "i forgot my password"
4. enter your email address, click save
5. an email will be sent to you with a link (and a new password) to activate your account.
You need to record a particular word from your Grammar more than once in order to create HMM statistical models that are robust enough to recognize your voice - more is better, 3 to 5 is the minimum required so that HTK will compile.
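One way to check this before training is to count how often each word occurs across your prompts. The sketch below assumes each prompt line is a recording id followed by the words spoken (an assumed layout; adjust the parsing to your own prompts file), and uses 3 as the threshold, following the lower bound mentioned above.

```python
from collections import Counter


def count_words(prompt_lines):
    """Count how often each word appears across all prompt lines.

    Assumes each line is '<recording-id> WORD WORD ...' (an assumed layout).
    """
    counts = Counter()
    for line in prompt_lines:
        words = line.split()[1:]  # skip the recording id
        counts.update(word.upper() for word in words)
    return counts


def under_recorded(prompt_lines, minimum=3):
    """Return the words recorded fewer than `minimum` times, sorted alphabetically."""
    counts = count_words(prompt_lines)
    return sorted(word for word, n in counts.items() if n < minimum)


if __name__ == "__main__":
    # Hypothetical prompts, for illustration only.
    prompts = ["s1 DIAL ONE", "s2 DIAL TWO", "s3 DIAL ONE", "s4 CALL ONE"]
    print(under_recorded(prompts))
```

Any word this reports needs more recordings before it meets the minimum.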
From Nsh's post:
Database size [...] is only one of the factors that affect accuracy, there are many others. And there is no direct dependency between size and accuracy. It's possible to have good accuracy with 70 hours, it's possible that with 10 thousands you'll have bigger error rate.
Here on page 13 you can find comparision of accuracy and database size
http://mi.eng.cam.ac.uk/research/projects/EARS/pubs/evermann_sttmay04.pdf
Basically the difference in accuracy between 400 hours and 2200 hours is 2%.
Create a new directory in your home directory called 'bin', it
should have the following path (replace yourusername with the username
you are using on your system):
Get the tarball of the most current version of the Julius source files.
and save it to your new bin directory.
Extract the file using:
this should create a julius-4.3.1 directory in your bin folder.
After unpacking the sources, open a command line terminal and go to the /home/yourusername/bin/julius-4.3.1 directory where the files were extracted.
The default location for binaries is "/usr/local" which will put the tools in "/usr/local/bin". You need to change this default location using the "./configure" script to specify where you want the binaries installed:
$ ./configure --with-mictype=alsa --enable-setup=standard --prefix=~/bin/julius-4.3.1
(Note: ~/bin/julius-4.3.1 points to /home/yourusername/bin/julius-4.3.1)
This directs the make command to put all your binaries in the following folder:
To compile 32-bit binaries, you will need the .i686 version of alsa-lib-devel, installed as root:
# yum install alsa-lib-devel.i686
Then add a flag before you run the Julius configure script to tell your gcc compiler to compile 32-bit binaries:
$ CFLAGS=-m32 ./configure --with-mictype=alsa --enable-setup=standard --prefix=~/bin/julius-4.3.1 --host=i686-generic-linux-gnu
To build the libraries and binaries, execute the following:
$ make all
Running the following command will install them:
$ make install
To update your user path, you need to add the '$HOME/bin/julius-4.3.1/bin' path to your path variable. To do this, edit your '.bash_profile' file in your home directory (in Fedora you need to show 'hidden files' in Nautilus - so you can display file names with a period in front of them) and add a colon (":") and this path to the end of the PATH variable (leaving the rest of it unchanged):
# User specific environment and startup programs
PATH=$PATH:$HOME/bin/julius-4.3.1/bin
Log out and log back in to make your path change effective.
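After logging back in, you can confirm that the new path entry took effect by checking whether the Julius binary can be found on the search path. This is a small sketch using Python's shutil.which; missing_tools is a hypothetical helper name, and "julius" is assumed to be the name of the installed binary.

```python
import shutil


def missing_tools(tool_names, path=None):
    """Return the tools from tool_names that cannot be found on the search path.

    With path=None, shutil.which uses the current PATH environment variable.
    """
    return [name for name in tool_names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Extend the list with any other tools you rely on.
    print(missing_tools(["julius"]))
```

An empty list means every named tool was found; anything listed is not yet reachable through your PATH.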
This how-to assumes you already have FireFTP installed in your FireFox browser. You can download FireFTP from here (note: it requires Firefox 1.5 or greater).
From FireFox:
FireFTP should appear in a new tab in your FireFox browser.
Inside the FireFTP tab:
Click "Connect" (located next to drop down list of accounts) to make the actual connection to the VoxForge FTP site.
Before you begin, you need to make sure that the room you are recording in is as quiet as possible. You do not need an acoustically sound-proof room, but you need to make sure that, while you are recording, there are no external noises that your microphone might pick up and which may render your speech audio files unusable for Acoustic Model creation purposes. Use common sense: you should not have any music in the background, no fans, air conditioners, microwaves, television, etc. In addition, make sure you turn off your speakers while recording, to avoid acoustic feedback in your audio files.
This Discussion Thread provides more ideas.
Step 10 of the Voxforge Tutorial and Howto use a tree.hed script containing "questions" that are specific to the English language and therefore will not work with other languages.
For more information on how to create a tree.hed file for a new language, see the following links:
See also the HTK manual.
Theoretically, you should be able to automatically create questions using the CMU Robust Group Sphinx Tutorial. These would be in Sphinx format, but could be used as a starting point for the creation of HTK questions for a tree.hed script.
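For orientation, each question in a tree.hed script is a QS line that names a phonetic context and lists the triphone patterns it matches ("phone-*" for a left context, "*+phone" for a right context). The sketch below generates such lines from a phoneme class; make_questions is an illustrative helper, and the class name and phoneme list are placeholders that you would replace with groupings appropriate to your language.

```python
def make_questions(class_name, phones):
    """Build left- and right-context QS lines for one phoneme class."""
    left = ",".join(f"{p}-*" for p in phones)
    right = ",".join(f"*+{p}" for p in phones)
    return [
        f'QS "L_{class_name}" {{ {left} }}',
        f'QS "R_{class_name}" {{ {right} }}',
    ]


if __name__ == "__main__":
    # Placeholder class; choose classes from the phonetics of your language.
    for line in make_questions("Nasal", ["m", "n"]):
        print(line)
```

This only automates the formatting; deciding which phonemes belong together still requires knowledge of the target language.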
From the HTK mailing list archives:
> Hi everyone,
>
> I am trying to implement a continuous speech recognition in Spanish
> Language. I followed the indications to make the tri-gram language
> model presented in the HTK book. I am using the HDecode tool. My
> firsts results are very poor (WER >= 35%). So I tuned various
> parameters in HDecode, but know I need to make tuning of the language
> model parameter. Any user can help me in that? for example, what
> parameter in the to language model generation are recommended to
> continuous speech?
Did you try to follow htk wsj1 reciept? It has almost everything
required I think:
http://www.inference.phy.cam.ac.uk/kv227/htk/
http://www.inference.phy.cam.ac.uk/kv227/lm_giga/
with all beams used and lm factors. Though for really large vocabulary lm factor should be smaller (around 6-8).
Nickolay (nsh) has written up an excellent overview on how to improve speech recognition using Sphinx: Speech Recognition With CMU Sphinx.
Once you have posted some audio it will be posted on the Voxforge Listen page.
Others may then download your submission and provide feedback on the recording. This will help in classifying your speech audio submission for later merging into the VoxForge Acoustic Models.
There are two ways to provide feedback for a submission:
To receive an email notification of any new user submissions to a forum:
click on the title of a forum, a list of threads appears; next
click the subscribe link at the very top of the thread list page.
Notes:
You need to be logged in for the subscribe link to appear
You need to subscribe to each forum if you want all of them - there is no 'master subscribe'.
When you post to a forum without logging in (as a 'visitor'), there will be a lag of a few minutes before your post will display (you need to refresh your browser for the change to display). There is no such delay if you register and log in to the system.
Debugging Java applets in a browser on Linux can be tricky because of the way Java caches them. If you change the applet, create a jar file, sign it and then deploy it, the new version will not get picked up by your browser if the jar file has the same name as the old version of your applet.
To fix this, you need to remove the old Java applet from your Java deployment cache (browsers do not control where Java applets are stored...).
If you are running Oracle JDK, you first need to open a Java Control Panel window.
Linux: use the jcontrol command:
$ /usr/java/latest/jre/bin/jcontrol
Windows: Start > All Programs > Java > Configure Java
Under Temporary Internet Files, click View...
Linux: From the Java Cache Viewer, right-click the name of the applet to remove, and click delete.
Windows: click Show: Resources; then right-click the name of the applet to remove, and click delete.
Close the window and you are done.
If you want to know where these Temporary Internet Files are stored, click the Settings... tab under Temporary Internet Files. The default location for my version of Linux (Fedora) is:
/home/username/.java/deployment/cache
HTK does not have a forum, but uses email lists:
You can get some very useful information by searching these archives here:
http://htk.eng.cam.ac.uk/cgi-bin/search.cgi
(select: Search through: All mailing list archives)
If you have a microphone jack on your motherboard (or are using an audio card), and you want to use a USB microphone, the Speech Submission applet will connect to your on-board microphone and ignore your USB microphone (or only record at very low volumes).
This how-to shows you how to modify the sound.properties file of your Java JRE (Java Runtime Environment) so that you can change the default audio mixer from an on-board microphone to a different microphone (like a USB mic).
Use the arecord command to list the devices attached to your PC:
$ arecord --list-devices
**** List of CAPTURE Hardware Devices ****
card 0: NVidia [HDA NVidia], device 0: ALC883 Analog [ALC883 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: NVidia [HDA NVidia], device 2: ALC883 Analog [ALC883 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
One of the listed devices will be your USB device
Modify your sound.properties file using the output from arecord (quote taken from this post):
Note: With JDK 1.5, it is possible to use a sound.properties file in the JRE\lib directory of your JDK installation. This sound.properties file can be used to specify the default Mixer that will be used when AudioSystem.getLine() is called for the various type of Line. In my case, the sound.properties file contains:
javax.sound.sampled.Port=#Port SB Audigy Audio [C400]
javax.sound.sampled.SourceDataLine=#SB Audigy Audio [C400]
javax.sound.sampled.TargetDataLine=#SB Audigy Audio [C400]
The mixer name is placed after the # sign. These are actually the defaults for my machine, so this file is not strictly necessary.
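As a rough illustration, the three property lines can be generated for any mixer name. The sketch below assumes you already know the mixer name exactly as Java reports it (this may differ from the name arecord prints) and that the port mixer carries a "Port " prefix, as in the quoted example; sound_properties is a hypothetical helper.

```python
def sound_properties(mixer_name, port_mixer_name=None):
    """Return sound.properties lines that make mixer_name the default mixer."""
    if port_mixer_name is None:
        # The quoted example prefixes the port mixer with "Port "; assumed to generalize.
        port_mixer_name = "Port " + mixer_name
    return [
        f"javax.sound.sampled.Port=#{port_mixer_name}",
        f"javax.sound.sampled.SourceDataLine=#{mixer_name}",
        f"javax.sound.sampled.TargetDataLine=#{mixer_name}",
    ]


if __name__ == "__main__":
    print("\n".join(sound_properties("SB Audigy Audio [C400]")))
```

For recording, the TargetDataLine entry is the one that selects the capture device; the other two lines follow the quoted file.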
VoxForge is not looking for TV or radio announcer quality voices (just listen to my voice recordings ...) or perfect audio quality.
For Free and Open Source Speech Recognition to work, we need a large variety of speech (from different people, with different dialects/accents, and using different prompts files with various phonemes and triphones) recorded in a variety of environments (rooms with echo, such as hardwood floors or tiles, and rooms with no echo, such as carpet, etc.) and on a variety of recording equipment (headset mics, desktop mics, built in mics, and USB mics, integrated audio, audio cards ...).
That is not to say that you should not try to minimize non-speech noise in your audio submissions; it is just that the submissions we are looking for should reflect the environments where the acoustic model might be used for speech recognition.
Therefore, most audio submitted to the VoxForge site should receive a thumbs up. This is because it takes some effort to create a recording (when you are first starting out), and new submitters should be encouraged, not discouraged. A "thumbs up" rating would go a long way to encouraging submissions.
What should result in a thumbs down is when a transcription doesn't match its corresponding audio or when there is excessive background noise (i.e. non-speech noise or talking in the background). What is "excessive noise" is subjective, since the Acoustic Model creation process can tolerate some low level hiss and/or hum (usually heard in quiet periods of some recordings). But if enough people submit their rating of a submission, on average we should get a good view of the quality of a recording for use in the creation of acoustic models.
See this post: UNDERSTANDING HTK ERROR MESSAGES
Full error message:
HMM Def Error: Regression Tree definition expected at line 34/col 10/char 1507 in hmmAdapt/hmmdefs
ERROR [+7050] HMError:
ERROR [+7035] LoadAllMacros: Get macro data failed in MMF hmmAdapt/hmmdefs
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+3228] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HVite
Fix:
Run the HTK 3.2.1 version of HVite.
This error occurs when you have an Acoustic Model that you adapted with HTK version 3.2.1, but then try to run it against HVite version 3.3.
See Ticket #55