Simon Dialog Manager and Julian Speech Recognition

General Discussion

User: kmaclean
Date: 4/19/2007 10:24 am

Hi bedahr,

I managed to translate enough of the GUI (using Google Translate and recompiling the source) to get a basic understanding of what Simon does/will do.

Some questions/clarifications (note: these comments are based on my rough Google translations from German to English, so I may be misinterpreting things):

juliusd

The Julius Daemon (juliusd) seems to start *Julian* in server mode, and then open a console that is essentially a replacement for jcontrol. Does juliusd essentially act as an API to Julian for Simon? i.e. Simon has no direct contact with Julian, and only gets Julian recognition results from juliusd?
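To make the question concrete, here is a rough Python sketch of how I imagine a client could pull the recognized words out of what the recognizer sends. It assumes juliusd relays the same XML-like messages that Julius sends to jcontrol in module mode, each one terminated by a line containing only a period. I have not confirmed that juliusd does this, and the sample text and function names are made up for illustration.

```python
import re

# Illustrative text only, not captured from a real juliusd session.
SAMPLE_STREAM = (
    '<INPUT STATUS="LISTEN"/>\n'
    '.\n'
    '<RECOGOUT>\n'
    '  <SHYPO RANK="1" SCORE="-3000.0" GRAM="0">\n'
    '    <WHYPO WORD="CALL" CLASSID="2" PHONE="k ao l" CM="0.900"/>\n'
    '    <WHYPO WORD="STEVE" CLASSID="3" PHONE="s t iy v" CM="0.800"/>\n'
    '  </SHYPO>\n'
    '</RECOGOUT>\n'
    '.\n'
    '<INPUT STATUS='  # a message that has not fully arrived yet
)

# Matches the WORD attribute of each word hypothesis element.
WORD_RE = re.compile(r'<WHYPO\b[^>]*\bWORD="([^"]*)"')


def split_messages(buffer):
    """Split buffered text on the lone '.' terminator line.

    Returns (complete_messages, remainder); the remainder is whatever
    follows the last terminator and should be kept for the next read.
    """
    *complete, rest = buffer.split("\n.\n")
    return complete, rest


def recognized_words(message):
    """Return the words of a recognition result, or [] for other messages."""
    if "<RECOGOUT>" not in message:
        return []
    return WORD_RE.findall(message)


if __name__ == "__main__":
    messages, leftover = split_messages(SAMPLE_STREAM)
    for message in messages:
        words = recognized_words(message)
        if words:
            print(" ".join(words))
```

If Simon only ever sees something like the output of recognized_words(), then juliusd really would be acting as the API layer between Simon and Julian.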

juliusd has a configuration setting that points to a julian.jconf. I guess that this is where Julian gets pointed to its Acoustic Model. I had to modify the juliusd settings as follows to get things to work:
Command: julian
Arguments: -input mic -C julian.jconf

I put the julian.jconf configuration file in the juliusd/bin directory, with the following configurations:
-h acoustic_model_files/hmmdefs
-hlist acoustic_model_files/tiedlist

I then copied the most current VoxForge Acoustic Models to the juliusd/bin/acoustic_model_files directory, and started juliusd in its own console as follows:
$ cd juliusd/bin/
$ juliusd

The output in the juliusd console looked like some of the output usually seen when Julian starts up, so I think I got things set up properly.

How does the “Send one Test Word” work? I'm not sure I understand what it is supposed to do...

Simon

System tab
In the Simon System tab, you set your Julian grammar files (the .voca and .grammar files), and the corresponding system commands. So it seems like you could speak a command, and Simon would send the request to the operating system or application. Will Simon be sending the commands to X Windows (like XVoice), or will it use some other method? Do you have any sample command files? I'd like to get an idea of what format they are supposed to be in.
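For reference, this is roughly what I understand a Julian grammar pair to look like, with a small Python sketch that reads the .voca side. The .grammar file holds rules over word categories, and the .voca file lists the words (with their phonemes) under each "% CATEGORY" heading. The sample words, category names, and function name are my own illustration, not Simon's files, and the parser only handles this simple layout.

```python
# Illustrative .grammar content: rules written over word categories.
SAMPLE_GRAMMAR = """\
S : NS_B SENT NS_E
SENT : CALL_V NAME_N
"""

# Illustrative .voca content: "% CATEGORY" followed by "WORD phonemes" lines.
SAMPLE_VOCA = """\
% NS_B
<s>     sil

% NS_E
</s>    sil

% CALL_V
PHONE   f ow n
CALL    k ao l

% NAME_N
STEVE   s t iy v
"""


def parse_voca(text):
    """Return {category: [(word, [phonemes...]), ...]} from .voca text."""
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate categories
        if line.startswith("%"):
            current = line[1:].strip()
            categories[current] = []
        elif current is not None:
            word, *phones = line.split()
            categories[current].append((word, phones))
    return categories


if __name__ == "__main__":
    for category, entries in parse_voca(SAMPLE_VOCA).items():
        print(category, [word for word, _ in entries])
```

What I can't tell is how the system commands get attached to entries like these, which is why a sample command file would help.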

You can also point to a pronunciation dictionary and a prompts file. It seems like Simon is a GUI front end that lets someone add new words to a Julian grammar file, using pronunciation information from the pronunciation dictionary (so the user does not have to enter phonemes by hand).
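As I picture it, adding a word boils down to a dictionary lookup followed by writing a new .voca line. Here is a minimal Python sketch of that idea, assuming an HTK-style dictionary with an optional bracketed output symbol (the layout the VoxForge lexicon uses). The sample entries and function names are hypothetical, and I don't know whether Simon actually does it this way.

```python
# Illustrative dictionary lines in "WORD [OUTPUT] phonemes" layout.
SAMPLE_LEXICON = [
    "CALL            [CALL]          k ao l",
    "PHONE           [PHONE]         f ow n",
    "STEVE           [STEVE]         s t iy v",
]


def load_lexicon(lines):
    """Return {WORD: [phonemes...]}, keeping the first pronunciation seen."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, rest = parts[0], parts[1:]
        # Drop the optional bracketed output symbol, e.g. "[STEVE]".
        if rest[0].startswith("[") and rest[0].endswith("]"):
            rest = rest[1:]
        if rest:
            lexicon.setdefault(word.upper(), rest)
    return lexicon


def voca_entry(word, lexicon):
    """Build one .voca line ("WORD<TAB>phonemes") for a known word."""
    phones = lexicon.get(word.upper())
    if phones is None:
        raise KeyError(f"{word!r} is not in the pronunciation dictionary")
    return f"{word.upper()}\t{' '.join(phones)}"


if __name__ == "__main__":
    lexicon = load_lexicon(SAMPLE_LEXICON)
    print(voca_entry("steve", lexicon))
```

A word that is missing from the dictionary raises an error here; that is the case where the user would still have to supply phonemes by hand.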

I seem to be able to connect to juliusd, but am not quite sure how everything is supposed to work. When I click the Connect link in Simon, I get a message in juliusd that the server was connected, but I am not sure how recognition and the corresponding command execution are supposed to take place.

Word add
In the main window for Simon, there is a Word icon that seems to let you add a new word, and record speech audio corresponding to that word. I assume that this is for the purpose of gathering acoustic data so that the Julian acoustic model can be adapted using audio for the new word.

I'm wondering how any new words might be trained into the Acoustic Model. If you need to use HTK, it would add a level of complexity for new users, because they would have to download the HTK source themselves and compile it (since HTK has distribution restrictions on the source and binaries).

Word List
This seems like the repository for all the words added in Word add, i.e. a place where you can manage all the words that have been added to the system.

It also seems like it is set up to point to an Acoustic Model trainer (like HTK ...?) - is that what it is for?

Train
Seems like a way to prompt users to record sentences, using either their own imported text or some predefined text. But I am not clear how this is to be used to update the acoustic models, since there is no sentence repository GUI front end as there is for Word add.

Implement
This seems to be where you select the programs that Simon will be sending recognized commands to.

Thanks,

Ken

--- (Edited on 4/19/2007 11:24 am [GMT-0400] by kmaclean) ---