Hi Ralf,
Thanks.
Robin noticed this a few weeks ago ... I just have not had a chance to look into it.
It seems to occur only on Windows. Everything displays OK on Linux (FC6) ... so much for Java being "write once, run anywhere" :)
See ticket 321 - Windows: SpeechSubmission app for German - umlauts not displaying properly.
Ken
Hello Ken,
I can see that you are trying to find a solution for ticket # 321 (Windows: SpeechSubmission app for German - umlauts not displaying properly).
Perhaps this might help. My various "prompts.txt" files (de1, de2, ... de150) are most likely encoded in "ANSI" (saved with Notepad++ under Windows XP). According to Wikipedia:
"the phrase "ANSI" refers to the Windows ANSI code pages [...]."
Notepad++ can convert a prompts.txt file (presumably in some Windows ANSI code page, perhaps Windows-1252?) into UTF-8. This option is available via the Notepad++ menu Format > "Convert to UTF-8".
So perhaps my prompts should be converted from ANSI into UTF-8 using Notepad++?
Then you wouldn't have to find a solution in Java. A simple text editor could do the conversion.
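For reference, the same conversion could also be scripted. Below is a minimal Java sketch, assuming the files really are Windows-1252 (that is only a guess in this thread); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AnsiToUtf8 {
    // Assumption: "ANSI" here means Windows-1252; another code page needs another name.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Decode the ANSI bytes into a Java String, then re-encode that String as UTF-8.
    static byte[] toUtf8(byte[] ansiBytes) {
        String text = new String(ansiBytes, ANSI);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ansi = {(byte) 0xFC};        // u-umlaut as a single Windows-1252 byte
        byte[] utf8 = toUtf8(ansi);         // the same character takes two bytes in UTF-8
        System.out.println(utf8.length);    // prints 2
    }
}
```

For a whole file, the bytes could be read with `Files.readAllBytes` and the result written back with `Files.write`.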
Thanks and greetings, Ralf
Hi Ralf,
Thanks for the advice, though I think it might be something more than just the character encoding of the text files (ANSI, UTF-8, ...). The reason is that the prompts display fine on my install of Linux (FC6). The problem might be related to the default character set the user has selected on their Windows or Linux machine.
I need to look into this further,
Ken
Hi Ralf,
I've updated the speech submission app (now on release 0.1.4). The encoding problem should now be fixed.
Basically, Java uses the default encoding of whatever computer it is running on. So even though the prompts might look OK on my computer (which uses UTF-8), they might look different on someone else's computer (usually Windows).
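To illustrate the effect (this is not the code from release 0.1.4, just a small sketch with made-up class and method names): the same bytes decode to different text depending on the charset, so naming the charset explicitly avoids relying on the platform default. Windows-1252 stands in here for a typical Windows default, which is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Decode bytes with an explicitly named charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // a-umlaut, written as a Unicode escape so the source file encoding does not matter
        byte[] utf8 = "\u00e4".getBytes(StandardCharsets.UTF_8);

        // What Java would fall back to on this machine if no charset were given
        System.out.println("platform default: " + Charset.defaultCharset());

        // Right charset: the two UTF-8 bytes become one character
        String ok = decode(utf8, StandardCharsets.UTF_8);
        // Wrong charset: each of the two bytes becomes its own character
        String garbled = decode(utf8, Charset.forName("windows-1252"));

        System.out.println(ok.length() + " vs " + garbled.length());
    }
}
```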
Please let me know if you are still having character display problems.
thanks,
Ken
Hi Ralf,
>Would it be possible to implement more of my prompts into the speech submission application?
Done. You now have roughly 1200 German prompts ... enjoy! :)
As an aside, it would be better if your prompts did not contain numerals (e.g. 1010, 555-2234, 1918) - it is better if they are written out (e.g. ten ten, five five five twenty two thirty four, nineteen eighteen). Because for some
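A quick way to find prompts that still contain numerals might look like the following sketch. The class and method names are hypothetical, and it only flags a prompt; writing the number out in words is left to the author.

```java
import java.util.regex.Pattern;

class PromptCheck {
    // Matches any ASCII digit 0-9 anywhere in the text.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // True if the prompt contains at least one digit and so should be rewritten in words.
    static boolean containsNumeral(String prompt) {
        return DIGIT.matcher(prompt).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNumeral("call 555-2234"));      // prints true
        System.out.println(containsNumeral("nineteen eighteen"));  // prints false
    }
}
```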
thanks,
Ken
Hi Ralf,
>The German speech submission application now works fine under all systems.
thanks!
Ken
Hi Ralf,
One other thing ... are the prompts I used OK? Would there be another set that would be better (assuming the characters are fixed)?
thanks,
Ken
Hi Ralf,
thanks for the feedback.
>language model or acoustic model - I don't know the difference
From the VoxForge Tutorial:
All Speech Recognition Engines ("SRE"s) are made up of the following components:
- Language Model or Grammar - Language Models contain a very large list of words and their probability of occurrence in a given sequence. They are used in dictation applications. Grammars are a much smaller file containing sets of predefined combinations of words. Grammars are used in IVR or desktop Command and Control applications. Each word in a Language Model or Grammar has an associated list of phonemes (which correspond to the distinct sounds that make up a word).
- Acoustic Model - Contains a statistical representation of the distinct sounds that make up each word in the Language Model or Grammar. Each distinct sound corresponds to a phoneme.
- Decoder - Software program (like Sphinx, Julius, HTK's HVite) that takes the sounds spoken by a user and searches the Acoustic Model for the equivalent sounds. When a match is made, the Decoder determines the phoneme corresponding to the sound. It keeps track of the matching phonemes until it reaches a pause in the user's speech. It then searches the Language Model or Grammar file for the equivalent series of phonemes. If a match is made, it returns the text of the corresponding word or phrase to the calling program.
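As a very rough illustration of that last step only (matching a series of phonemes against the words in a Grammar), here is a toy sketch. It is not how Sphinx, Julius or HVite work internally; the two-word lexicon and the phoneme symbols are invented for the example, and a real decoder works with probabilities rather than exact matches.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ToyGrammarLookup {
    // Hypothetical pronunciation entries: word -> its list of phonemes.
    static final Map<String, List<String>> LEXICON = new LinkedHashMap<>();
    static {
        LEXICON.put("yes", Arrays.asList("y", "eh", "s"));
        LEXICON.put("no", Arrays.asList("n", "ow"));
    }

    // Return the word whose phoneme list equals the recognized sequence, if there is one.
    static Optional<String> wordFor(List<String> phonemes) {
        for (Map.Entry<String, List<String>> entry : LEXICON.entrySet()) {
            if (entry.getValue().equals(phonemes)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Phonemes collected up to a pause in the speech, then looked up as one unit
        System.out.println(wordFor(Arrays.asList("n", "ow")));
    }
}
```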
>So why not integrate all of them into the VoxForge speech submission application?
Unfortunately, we are getting to the point where I need to create separate builds of the SpeechSubmission app for each language; otherwise the downloadable application will get too big. I will add this as an RFE in Trac.
Ken