German

Building a Dictionary with the Help of eSpeak
User: timobaumann
Date: 12/12/2007 5:59 am
Views: 18556
Rating: 53

Hi,

I have uploaded a little script that converts eSpeak's (v. 1.29) output into something close to our phoneme set. More information is available, and the script itself is here.

Please try to use it and inform me of any problems you may have.

We should now start building a dictionary for the words contained in our contributions, and then continue with other important words.

For the moment, we will have to review the output of our scripts. Maybe we can find someone interested in grapheme-to-phoneme (G2P) conversion who could improve eSpeak's output or build our own G2P?
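
To make the conversion step concrete, here is a minimal Python sketch of one way to map eSpeak-style phoneme mnemonics onto another phoneme set by greedy longest match. It is not the uploaded script: the table entries, the ignored marks, and the input string in the usage note are made-up placeholders, and the real pairs would have to come from eSpeak's phoneme documentation and our own phoneme set.

```python
# Hypothetical mapping from eSpeak-style mnemonics to a target phoneme set.
# These pairs are placeholders for illustration only.
ESPEAK_TO_TARGET = {
    "aI": "ai",
    "a:": "a:",
    "S": "sh",
    "t": "t",
    "n": "n",
    "@": "ax",
}

# Stress and boundary marks that are simply dropped (also an assumption).
IGNORED = {"'", ",", " "}


def convert(espeak_phonemes, table=ESPEAK_TO_TARGET):
    """Split a mnemonic string by greedy longest match and map each unit.

    Returns (phones, unknown); unknown lists characters missing from table.
    """
    longest = max(len(key) for key in table)
    phones, unknown = [], []
    i = 0
    while i < len(espeak_phonemes):
        if espeak_phonemes[i] in IGNORED:
            i += 1
            continue
        # Try the longest possible mnemonic first, then shorter ones.
        for size in range(longest, 0, -1):
            chunk = espeak_phonemes[i:i + size]
            if chunk in table:
                phones.append(table[chunk])
                i += len(chunk)
                break
        else:
            # No mnemonic matched here: record the character for review.
            unknown.append(espeak_phonemes[i])
            i += 1
    return phones, unknown
```

Calling `convert("St'aIn")` with the placeholder table returns `(['sh', 't', 'ai', 'n'], [])`. Anything that ends up in the second list is a symbol the table does not cover yet, which is where the manual review mentioned above would start.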

 

Could someone create a simon-compatible (HTK) dictionary?
User: ralfherzog
Date: 11/22/2008 5:36 am
Views: 105
Rating: 10

Hello! Could someone create a German pronunciation lexicon (GPL) that is simon-compatible (HTK)? Maybe someone could write a Perl script that converts the PLS dictionary into an ASCII dictionary? As far as I know, HTK accepts only ASCII, not IPA. Thanks, Ralf

Re: Could someone create a simon-compatible (HTK) dictionary?
User: kmaclean
Date: 11/25/2008 12:21 pm
Views: 410
Rating: 10

Hi Ralf,

>Could someone create a german pronounciation lexicon (GPL) that is
>simon-compatible (HTK)?

You can use an XSL transform and your browser to do this (see this link for more information: XSLT - Transformation).

First, download the current voxDE-lexicon.xml file. 

Next, change the XML lexicon header from this:

<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="de">

to this, so that the elements are no longer in the PLS namespace and the unprefixed paths in the stylesheet (such as "/lexicon/lexeme") can match them:

<lexicon>

Next, insert the following as the *second* line of the XML file:

<?xml-stylesheet type="text/xsl" href="voxDE-lexicon.xsl"?>

This creates a link to an XSL file (to be called "voxDE-lexicon.xsl") that you will create in the same directory as your XML pronunciation dictionary.

Next, create a new file called "voxDE-lexicon.xsl" (the XSL transform script) containing the following:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/lexicon">
  <html>
  <body>
    <h2>German Pronunciation dictionary</h2>
    <table>
    <xsl:for-each select="/lexicon/lexeme">
    <tr>
      <td><xsl:value-of select="grapheme"/></td>   
      <td><xsl:value-of select="phoneme"/></td>
    </tr>
    </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

Open "voxDE-lexicon.xml" in your browser (I used Firefox) and then copy the result to a text file and voila...

Hope that helps,

Ken
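
For anyone who would rather not edit the lexicon header by hand, the same word and pronunciation pairs can be pulled out with a short script that addresses the PLS namespace directly. This is a rough Python sketch using only the standard library. The namespace URI is the one from the header quoted above; the two-entry sample lexicon and the function name `read_lexicon` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the lexicon header quoted above.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}


def read_lexicon(xml_text):
    """Return (grapheme, phoneme) pairs from a PLS document given as text."""
    root = ET.fromstring(xml_text)
    pairs = []
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        grapheme = lexeme.findtext("pls:grapheme", default="", namespaces=PLS_NS)
        phoneme = lexeme.findtext("pls:phoneme", default="", namespaces=PLS_NS)
        pairs.append((grapheme.strip(), phoneme.strip()))
    return pairs


# Tiny made-up lexicon, used only to show the call.
SAMPLE = """<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="de">
  <lexeme><grapheme>Tee</grapheme><phoneme>teː</phoneme></lexeme>
  <lexeme><grapheme>bitte</grapheme><phoneme>bɪtə</phoneme></lexeme>
</lexicon>"""

for word, phones in read_lexicon(SAMPLE):
    print(f"{word}\t{phones}")
```

Because the namespace is passed to `findall` and `findtext`, the `<lexicon>` header can stay exactly as downloaded. The sketch only reads the first grapheme and phoneme of each lexeme, so entries with several variants would need extra handling.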

Re: Could someone create a simon-compatible (HTK) dictionary?
User: kmaclean
Date: 11/27/2008 11:05 am
Views: 111
Rating: 9

Hi Ralf,

After thinking about this a little more, I realized that you needed a way to convert the phonemes to something more agreeable to HTK (i.e. ASCII letters or numbers).

Thinking that this could be done in XSL with a browser, I found the "character-map" xsl command and created the following XSL script:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output use-character-maps="cmap"/>

<xsl:character-map  name="cmap">
     <xsl:output-character  character="ː" string=" 1 "/>
     <xsl:output-character  character="ə" string=" 2 "/>
</xsl:character-map>

<xsl:template match="/lexicon">

  <html>
  <body>
    <h2>German Pronunciation dictionary</h2>
    <table>
    <xsl:for-each select="/lexicon/lexeme">
    <tr>
      <td><xsl:value-of select="grapheme"/></td>
      <td><xsl:value-of select="phoneme" /></td>
    </tr>
    </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

The only problem is that it does *not* work (on Firefox 3 at least...), and I don't know why. All the examples that I Googled seem to use this approach, but I cannot make it work...

If anyone can see where my mistake might be, please let me know,

thanks,

Ken
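
A likely explanation, offered as an assumption rather than something confirmed in this thread: "xsl:character-map" is an XSLT 2.0 feature that takes effect when the result is serialized, while the XSLT processors built into browsers implement XSLT 1.0. Outside the browser, the same character-for-string replacement is easy to sketch. The Python example below uses the two replacements from the character map above; the function name and the whitespace clean-up are additions for illustration.

```python
# Same two replacements as the character map in the stylesheet above.
CHARACTER_MAP = str.maketrans({
    "ː": " 1 ",
    "ə": " 2 ",
})


def apply_character_map(phoneme):
    """Replace mapped characters, then collapse runs of whitespace.

    The whitespace clean-up goes beyond what a character map does; it just
    keeps the output tidy when several replacements sit next to each other.
    """
    return " ".join(phoneme.translate(CHARACTER_MAP).split())


print(apply_character_map("teː"))
print(apply_character_map("bɪtə"))
```

Characters without an entry in the map (such as "ɪ" in the second call) pass through unchanged, so the map has to cover every non-ASCII symbol in the lexicon before the output is usable in HTK.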

Re: Could someone create a simon-compatible (HTK) dictionary?
User: kmaclean
Date: 11/27/2008 11:14 am
Views: 3203
Rating: 9

Hi Ralf,

Since I could not figure it out in XSL, here is a simple Perl script that (I think...) does what you want (see below). 

You need to change the $xmlfile variable to point to your XML file, and you need to add your own phoneme conversions in the %characterTranscriptions hash (a 'hash' is just a Perl data structure that has a key - in this case the original German language phoneme, and a value - in this case what you want to convert it to).

You will also need some additional Perl packages: "XML::LibXML", which is required to process the XML files and can be installed using CPAN, and "Encode", which is required to deal with UTF-8 conversions (it ships with Perl 5.8 and later, so install it from CPAN only if it is missing). The other packages, "Carp" and "diagnostics", are used for debugging; they are part of the standard Perl distribution and don't need to be installed separately.

Ken

 

#! /usr/bin/perl
####################################################################
###
### script name: xmlPronDict.pl
### created by: Ken MacLean
### email: [email protected]
### Date: 2008.11.27
### Command: ./xmlPronDict.pl
### Version: 0.1
###       
### Copyright (C) 2008 Ken MacLean
###
### This program is free software; you can redistribute it and/or
### modify it under the terms of the GNU General Public License
### as published by the Free Software Foundation; either version 3
### of the License, or (at your option) any later version.
###
### This program is distributed in the hope that it will be useful,
### but WITHOUT ANY WARRANTY; without even the implied warranty of
### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
### GNU General Public License for more details.
###
####################################################################
use strict;
use diagnostics;
use Carp;
use XML::LibXML;
use Encode;

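# Path to the pronunciation lexicon. The XPath '/lexicon/lexeme' below only
# matches if the default namespace has been removed from the <lexicon> header.
my $xmlfile = "voxDE-lexicon.xml";

# Phoneme conversions: key = original phoneme, value = HTK-friendly replacement.
# Without "use utf8", these keys are UTF-8 byte strings (assuming this script
# is saved as UTF-8).
my %characterTranscriptions = (
  't' => ' 1 ',
  'ə' => ' 2 ',
  'ː' => ' 3 ',
);

my $parser = XML::LibXML->new();
my $doc = $parser->parse_file( $xmlfile );
my @lexemeList = $doc->findnodes('/lexicon/lexeme');
foreach my $lexeme (@lexemeList) {
    my @lexemeNodes = $lexeme->childNodes;
    foreach my $node (@lexemeNodes) {
        my $childnode = $node->textContent;
        $childnode =~ s/\s//g;    # strip all whitespace
        if ($node->nodeName =~ /grapheme/) {
            # print the word followed by its output symbol: WORD [WORD]
            $childnode = encode("utf8",$childnode);
            print "$childnode \t[$childnode]\t\t";
        } elsif ($node->nodeName =~ /phoneme/) {
            processChildNode($childnode);
        }
    }
}

# Convert one phoneme string character by character and print the result.
sub processChildNode {
  my $childnode = shift;
  my @characters = split(//,$childnode);
  my $word = '';    # start empty so an empty <phoneme> prints without a warning
  foreach my $character (@characters) {
     # encode to UTF-8 bytes so the character matches the byte-string hash keys
     $character = encode("utf8",$character);
     if (exists $characterTranscriptions{$character}) {
         $word .= $characterTranscriptions{$character};
     } else {
         $word .= $character;
     }
  }
  print $word . "\n";
}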
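
Since the conversion table has to be filled in by hand, it helps to know which symbols are still missing. The following Python sketch counts the non-ASCII characters that have no entry in the table yet. The table repeats the three entries from %characterTranscriptions; the phoneme strings and the function name `unmapped_symbols` are made up for illustration.

```python
from collections import Counter


def unmapped_symbols(phoneme_strings, table):
    """Count non-ASCII characters that have no entry in table."""
    counts = Counter()
    for phonemes in phoneme_strings:
        for char in phonemes:
            if ord(char) > 127 and char not in table:
                counts[char] += 1
    return counts


# Same entries as %characterTranscriptions in the script above.
TABLE = {"t": " 1 ", "ə": " 2 ", "ː": " 3 "}

# Made-up phoneme strings, for illustration only.
remaining = unmapped_symbols(["teː", "bɪtə", "ʃtaɪn"], TABLE)
for symbol, count in remaining.most_common():
    print(f"U+{ord(symbol):04X} {symbol} {count}")
```

Running this over the phoneme column of the real lexicon would give a to-do list for the table, most frequent symbols first. Once the list is empty, the converted dictionary contains only ASCII in the pronunciation column.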

 
