VoxForge
From this thread: For Noisy Input
For a recognition result like this:
### read waveform input
Stat: adin_file: input speechfile: seven.wav
STAT: 12447 samples (1.56 sec.)
STAT: ### speech analysis (waveform -> MFCC)
### Recognition: 1st pass (LR beam)
............................................................................pass1_best: <s> 5
pass1_best_wordseq: 0 2
pass1_best_phonemeseq: sil | f ay v
pass1_best_score: -1867.966309
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 120 generated, 120 pushed, 14 nodes popped in 76
sentence1: <s> 5 </s>
wseq1: 0 2 1
phseq1: sil | f ay v | sil
cmscore1: 1.000 0.316 1.000
score1: -1944.799561
(tpavelka's post): Julius outputs two types of scores:
The Viterbi score, e.g.:
score1: -1944.799561
This is the cumulative score of the most likely HMM path. The Viterbi algorithm (decoder) is just a graph search that compares the scores of all possible paths through the HMM and outputs the best one. The problem is that the score of a path (sentence) depends not only on the length of the sound file but also on the sound file itself (see this thread for more discussion). This means that Viterbi scores for different files are not comparable. I understand that you want some kind of measure that tells you whether the result found by Julius is believable or not. In that case, have a look at
The confidence score, in your example:
cmscore1: 1.000 0.316 1.000
Julius outputs a separate score for each word, so in your example the starting silence has a confidence score of 1.0 (i.e. 100%), the word "five" has a score of 0.316 (i.e. not that reliable), and the ending silence again has 1.0.
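To make the per-word pairing concrete, here is a minimal sketch that reads the `sentence1:` and `cmscore1:` lines from output like the example above and matches each word with its confidence score. The function name `parse_julius_result` is made up for illustration, and the sketch assumes the output contains exactly one `sentence1:` line and one `cmscore1:` line with the same number of fields.

```python
def parse_julius_result(text):
    """Pair each word on the sentence1 line with its cmscore1 value.

    Hypothetical helper: assumes one 'sentence1:' line and one
    'cmscore1:' line, with one score per word.
    """
    words, scores = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("sentence1:"):
            words = line.split(":", 1)[1].split()
        elif line.startswith("cmscore1:"):
            scores = [float(x) for x in line.split(":", 1)[1].split()]
    if len(words) != len(scores):
        raise ValueError("word count and confidence score count differ")
    return list(zip(words, scores))


# The second-pass result lines from the example above.
sample = """sentence1: <s> 5 </s>
wseq1: 0 2 1
phseq1: sil | f ay v | sil
cmscore1: 1.000 0.316 1.000
score1: -1944.799561"""

pairs = parse_julius_result(sample)
for word, score in pairs:
    print(word, score)
```

The silence markers `<s>` and `</s>` get their own scores, so they show up in the list alongside the recognized word.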
Hi Ken
What is the confidence score that is trustable. I need to detect when out-of-vocabulary words are spoken. Assuming Julius gives low confidence when it identifies out-of-vocabulary words as vocabulary words, what threshold can be used to identify such words? Thanks
>What is the confidence score that is trustable.
This depends on many things (the quality of your acoustic model for one...).
You will need to experiment a bit - Simon uses a 0.7 threshold as its default (which is user-modifiable).
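As a rough illustration of what such a threshold does, the sketch below splits recognized words into accepted and rejected lists. The function name `split_by_confidence` is hypothetical, the 0.7 default is only the starting point mentioned above and should be tuned on your own data, and skipping the `<s>` and `</s>` silence markers is an assumption made for this example.

```python
SILENCE_MARKERS = {"<s>", "</s>"}  # assumption: silence is not judged


def split_by_confidence(pairs, threshold=0.7):
    """Split (word, confidence) pairs into accepted and rejected lists.

    Hypothetical helper: words scoring below the threshold are treated
    as unreliable, e.g. possible out-of-vocabulary input.
    """
    accepted, rejected = [], []
    for word, score in pairs:
        if word in SILENCE_MARKERS:
            continue
        if score >= threshold:
            accepted.append(word)
        else:
            rejected.append(word)
    return accepted, rejected


# Word/confidence pairs taken from the example result in this thread.
example = [("<s>", 1.0), ("5", 0.316), ("</s>", 1.0)]
accepted, rejected = split_by_confidence(example)
print("accepted:", accepted)
print("rejected:", rejected)
```

With the 0.7 default, the word "5" from the example (confidence 0.316) ends up in the rejected list. Lowering the threshold accepts more words at the cost of letting more misrecognitions through.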
I am using HTK and I would like to be able to compute word confidence scores and utterance confidence scores without using Julius.
I have read about some methods for doing this (FLDA, log likelihood) and am trying to find the best one to fit with HTK.
Thanks