
German

Re: German "Sonderzeichen" (ä,ö,ü,ß)
User: kmaclean
Date: 2/8/2008 5:41 pm

Hi ralf,

Thanks.

Robin noticed this a few weeks ago ... I just have not had a chance to look into it.

It seems to occur only on Windows.  Everything displays OK on Linux (FC6) ... so much for Java being "write once, run anywhere" :)

See ticket 321: "Windows: SpeechSubmission app for German - umlauts not displaying properly".

Ken 

 

"Sonderzeichen" (ä,ö,ü,ß): ANSI-to-UTF-8-conversion
User: ralfherzog
Date: 3/24/2008 5:52 am

Hello Ken,

I can see that you are trying to find a solution for ticket # 321 (Windows: SpeechSubmission app for German - umlauts not displaying properly).

Perhaps this might help.  My various "prompts.txt" files (de1, de2, ... de150) should be encoded in "ANSI" (Notepad++ under Windows XP).  Wikipedia says:

"the phrase "ANSI" refers to the Windows ANSI code pages [...]."

Notepad++ can convert a prompts.txt file (obviously in some kind of Windows ANSI code page, perhaps Windows-1252?) into UTF-8.  The option is in the Notepad++ menu under Format > "Convert to UTF-8".

So perhaps my prompts should be converted from ANSI into UTF-8 using Notepad++?

So you wouldn't have to find a solution via Java.  You may just use a simple text editor to do the conversion.
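
For anyone who would rather script the conversion than click through 150 files, here is a minimal sketch in Java. It assumes the files really are Windows-1252 (that is only a guess, as noted above), and the class name `PromptsToUtf8` is made up for illustration; it is not part of the SpeechSubmission app.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the SpeechSubmission app.
class PromptsToUtf8 {
    // Assumption: "ANSI" here means the Windows-1252 code page.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Decode the bytes as Windows-1252, then re-encode the text as UTF-8.
    static byte[] toUtf8(byte[] ansiBytes) {
        String text = new String(ansiBytes, ANSI);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PromptsToUtf8 <ansi-input> <utf8-output>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Read the whole file, convert it, and write the result to a new file.
        Files.write(out, toUtf8(Files.readAllBytes(in)));
    }
}
```

Writing to a separate output file keeps the original untouched, which matters because running the conversion twice on the same data would garble the umlauts.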

Thanks and greetings, Ralf

Re: "Sonderzeichen" (ä,ö,ü,ß): ANSI-to-UTF-8-conversion
User: kmaclean
Date: 3/24/2008 12:10 pm

Hi Ralf,

Thanks for the advice, though I think it might be more than just the character encoding of the text files (ANSI, UTF-8, ...).  The reason is that the prompts display fine on my install of Linux (FC6).  The problem might be related to the default character set configured on the user's Windows or Linux machine.

I need to look into this further,

Ken 

 

Re: "Sonderzeichen" (ä,ö,ü,ß): ANSI-to-UTF-8-conversion
User: kmaclean
Date: 4/3/2008 10:03 pm

Hi Ralf,

I've updated the speech submission app (now on release 0.1.4).  The encoding problem should now be fixed. 

Basically, Java uses the default encoding of whatever computer it is running on, unless the program names an encoding explicitly.  So even though the prompts might look OK on my computer (which uses UTF-8), they might look different on someone else's computer (usually Windows).
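
To illustrate the effect (this is a sketch, not the actual change made in release 0.1.4), the snippet below decodes the same UTF-8 bytes twice: once with the charset named explicitly and once as Windows-1252, which stands in for a typical Windows default of that time. The class name `ExplicitCharsetDemo` is made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the SpeechSubmission source.
class ExplicitCharsetDemo {
    // Decode raw bytes with a charset chosen by the caller instead of the platform default.
    // The reader equivalent is new InputStreamReader(stream, StandardCharsets.UTF_8).
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String original = "Gr\u00fc\u00dfe";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);
        // Assumption: Windows-1252 stands in for the default on a Western European Windows machine.
        String wrong = decode(utf8, Charset.forName("windows-1252"));

        System.out.println("platform default charset: " + Charset.defaultCharset());
        System.out.println("decoded as UTF-8 matches original: " + right.equals(original));
        System.out.println("decoded as windows-1252 matches original: " + wrong.equals(original));
        System.out.println("lengths: " + right.length() + " vs " + wrong.length());
    }
}
```

Each umlaut takes two bytes in UTF-8, so decoding with a single-byte code page turns every umlaut into two wrong characters. Naming the charset at every read removes the dependency on the machine's settings.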

Please let me know if you are still having character display problems.

thanks,

Ken 

"Sonderzeichen" (ä,ö,ü,ß) are being displayed correctly.
User: ralfherzog
Date: 4/4/2008 4:30 pm
Hi Ken,

The German speech submission application now works fine under all systems.  I just tested it under Windows XP (32-bit), Windows Vista (64-bit), and Ubuntu Linux (32-bit).  The German characters "ä, ö, ü, ß" are displayed correctly.

Would it be possible to implement more of my prompts into the speech submission application?

Keep up the good work.

Greetings, Ralf
Re: "Sonderzeichen" (ä,ö,ü,ß) are being displayed correctly.
User: kmaclean
Date: 4/6/2008 7:52 pm

Hi Ralf,

>Would it be possible to implement more of my prompts into the speech submission application?

Done.  You now have roughly 1200 German prompts ... enjoy!  :)

As an aside, it would be better if your prompts did not contain numerals (e.g. 1010, 555-2234, 1918).  It is better to write them out (e.g. ten ten, five five five twenty two thirty four, nineteen eighteen), because different people read the following differently (in English at least; I assume the same applies to most other languages):

  • dates (e.g. nineteen-eighteen vs. nineteen-hundred-and-eighteen, ...),
  • telephone numbers (e.g. five-five-five-twenty-two-thirty-four vs. five-five-five-two-two-three-four, ...), and
  • long numbers (e.g. ten-ten vs. one-thousand-and-ten vs. one-oh-one-oh, ...).

If they are written out, you can be sure that they will be read consistently.
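
A quick way to find the affected prompts is to flag every sentence that contains a digit. The sketch below assumes each string holds only the sentence text, with any prompt ID already stripped off (the IDs themselves contain digits). `NumeralCheck` is a made-up name, not an existing VoxForge tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical checker for prompt sentences; not an existing VoxForge tool.
class NumeralCheck {
    // True if the sentence contains at least one Unicode decimal digit.
    static boolean hasDigit(String sentence) {
        return sentence.chars().anyMatch(Character::isDigit);
    }

    // Return the sentences that still need their numbers written out.
    static List<String> withNumerals(List<String> sentences) {
        List<String> flagged = new ArrayList<>();
        for (String sentence : sentences) {
            if (hasDigit(sentence)) {
                flagged.add(sentence);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up example sentences, not taken from the real prompt files.
        List<String> sentences = Arrays.asList(
                "Der Krieg endete 1918.",
                "Der Krieg endete neunzehnhundertachtzehn.");
        for (String flagged : withNumerals(sentences)) {
            System.out.println("contains numerals: " + flagged);
        }
    }
}
```

The check only finds the sentences; deciding how each number should be spoken is still a manual step.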

thanks, 

Ken 

Re: "Sonderzeichen" (ä,ö,ü,ß) are being displayed correctly.
User: kmaclean
Date: 4/6/2008 7:57 pm

Hi Ralf,

>The German speech submission application now works fine under all systems.

thanks!

Ken 

Re: German "Sonderzeichen" (ä,ö,ü,ß)
User: kmaclean
Date: 2/8/2008 5:44 pm

Hi Ralf,

One other thing ... are the prompts I used OK?  Would there be another set that would be better (assuming the characters are fixed)?

thanks,

Ken 

 

integration of prompts (de1, de2, de3, ..., de100)
User: ralfherzog
Date: 2/9/2008 6:59 pm
Hi Ken,

OK, so you knew already about the problem with the special characters of the German language.

In my opinion, all of my prompts should be OK.  So if you want, you can include all of my prompts (de1, de2, de3, ..., de100).  At the moment, I am preparing to submit more prompts.  It should be possible to build a reasonable first statistical model (language model or acoustic model - I don't know the difference) of the German language, at least I hope so.

I try to submit normal sentences of the German language.  Most of those sentences should be of a medium level - not too easy and not too complicated.  That means I'm trying to cover a lot of situations and a lot of words.  And those words should have a distribution that is typical for the German language.  To achieve this goal, it is necessary to submit many more prompts than I have already submitted.  I will continue the work.  And I hope that other speakers will follow.  Dictating the prompts with Dragon NaturallySpeaking and editing them was a lot of work.

There shouldn't be major mistakes in my prompts.  It would be good if other people used my prompts, so they don't have to create their own.  I have done the first steps, so this should be a good basis.

My prompts should build a whole unit.  So why not integrate all of them into the VoxForge speech submission application?

Greetings, Ralf
Re: integration of prompts (de1, de2, de3, ..., de100)
User: speechsubmission
Date: 2/11/2008 12:28 pm

Hi Ralf,

Thanks for the feedback.

>language model or acoustic model - I don't know the difference

From the VoxForge Tutorial:

All Speech Recognition Engines ("SRE"s) are made up of the following components:

  • Language Model or Grammar - Language Models contain a very large list of words and their probability of occurrence in a given sequence.  They are used in dictation applications.  Grammars are a much smaller file containing sets of predefined combinations of words.  Grammars are used in IVR or desktop Command and Control applications.   Each word in a Language Model or Grammar has an associated list of phonemes (which correspond to the distinct sounds that make up a word).
  • Acoustic Model - Contains a statistical representation of the distinct sounds that make up each word in the Language Model or Grammar.  Each distinct sound corresponds to a phoneme.
  • Decoder - Software program (like Sphinx, Julius, HTK's HVite) that takes the sounds spoken by a user and searches the Acoustic Model for the equivalent sounds.  When a match is made, the Decoder determines the phoneme corresponding to the sound.  It keeps track of the matching phonemes until it reaches a pause in the user's speech.  It then searches the Language Model or Grammar file for the equivalent series of phonemes.  If a match is made, it returns the text of the corresponding word or phrase to the calling program.
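
To make "probability of occurrence in a given sequence" a bit more concrete, here is a toy sketch of a bigram language model: it counts how often one word follows another in some example sentences and turns the counts into a probability. It only illustrates the idea and is not how Sphinx, Julius or HTK store their models; the class `ToyBigramModel` and the sample sentences are made up.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy illustration only; real language models add smoothing and much more data.
class ToyBigramModel {
    // How often each word occurs with another word after it.
    private final Map<String, Integer> contextCounts = new HashMap<>();
    // How often each ordered word pair occurs.
    private final Map<String, Integer> pairCounts = new HashMap<>();

    // Count every adjacent word pair in a whitespace-separated sentence.
    void addSentence(String sentence) {
        String[] words = sentence.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < words.length - 1; i++) {
            contextCounts.merge(words[i], 1, Integer::sum);
            pairCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    // Estimate P(next | previous) as count(previous next) / count(previous followed by anything).
    double probability(String previous, String next) {
        int context = contextCounts.getOrDefault(previous, 0);
        if (context == 0) {
            return 0.0; // the word was never seen with a follower
        }
        return pairCounts.getOrDefault(previous + " " + next, 0) / (double) context;
    }

    public static void main(String[] args) {
        ToyBigramModel model = new ToyBigramModel();
        model.addSentence("ich habe hunger");
        model.addSentence("ich habe durst");
        model.addSentence("ich bin hier");
        System.out.println("P(habe | ich) = " + model.probability("ich", "habe"));
        System.out.println("P(bin | ich)  = " + model.probability("ich", "bin"));
    }
}
```

This is also why the number and variety of prompts matter: the more sentences are counted, the closer these estimates get to the word distribution of the real language.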

>So why not integrate all of them into the VoxForge speech submission application?

Unfortunately, we are getting to the point where I need to create separate builds of the SpeechSubmission app for each language; otherwise the downloadable application will get too big.  I will add this as an RFE in Trac.

Ken 
