
Audio and Prompts Discussions

Re: Automatic Segmentation of LibriVox Audio
User: kmaclean
Date: 6/26/2007 11:18 am
Views: 316
Rating: 20

Hi nsh,

The pronunciations generated by my Festival implementation (Fedora FC4) do not always match those in the VoxForge Dict (about 90% do match ...).  They should be the same but they are not.  I think Festival uses an older version of the CMU dictionary (VoxForge uses release 0.6), or I've somehow managed to diverge from the CMU pronunciations with my manual additions to the dictionary.
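A rough way to measure that kind of mismatch is to load both dictionaries and compare the entries they share. The sketch below is only an illustration: it assumes a simplified "WORD PH1 PH2 ..." line format (the real VoxForge and Festival lexicon files are laid out differently and would need their own parsers), and the names parse_dict and match_rate are hypothetical.

```python
def parse_dict(lines):
    """Parse simplified 'WORD PH1 PH2 ...' lines into {word: tuple_of_phones}.

    Assumption: one pronunciation per word, whitespace-separated fields.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, *phones = line.split()
        # Normalise case so that 'Hello' and 'HELLO' compare equal.
        entries[word.upper()] = tuple(p.lower() for p in phones)
    return entries


def match_rate(reference, candidate):
    """Return (fraction of shared words with identical phones, mismatched words)."""
    shared = reference.keys() & candidate.keys()
    if not shared:
        return 0.0, []
    mismatches = sorted(w for w in shared if reference[w] != candidate[w])
    return 1 - len(mismatches) / len(shared), mismatches


if __name__ == "__main__":
    voxforge = parse_dict(["HELLO hh ax l ow", "WORLD w er l d"])
    festival = parse_dict(["HELLO hh ax l ow", "WORLD w er l"])
    rate, bad = match_rate(voxforge, festival)
    print(f"match rate: {rate:.0%}; mismatched words: {bad}")
```

The list of mismatched words is the useful part: it shows whether the differences come from the dictionary version or from manual additions.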

I currently use Festival to provide draft pronunciations for out-of-vocabulary words.  However, Festival's letter-to-sound rules are sometimes incomplete, and the pronunciations it generates omit phones altogether.

Since the phones don't match exactly, and some word pronunciations generated by Festival are incomplete, I figured that creating a new rule set using the VoxForge Dictionary was the easiest approach...


--- (Edited on 6/26/2007 12:18 pm [GMT-0400] by kmaclean) ---

Re: Automatic Segmentation of LibriVox Audio
User: nsh
Date: 6/26/2007 11:44 am
Views: 319
Rating: 26

Ah, I see. Festival actually uses CMUdict 0.4 because it's targeted at speech synthesis, not speech recognition. Alan commented on it earlier:

Well, really one should convert CMUdict-0.6 and train new rules with festvox. festvox/src/lts includes a training script.
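The conversion step might look something like the sketch below. It is only a guess at the shape of the job: it assumes CMUdict lines of the form "WORD  PH1 PH2 ..." with "(2)"-style markers on alternate pronunciations and comment lines starting with ";;;" or "##", and it emits flat Scheme-style entries such as ("word" nil (ph1 ph2)). Both the output layout and the name cmudict_to_sexp are assumptions; check what the scripts in festvox/src/lts actually expect before using anything like this.

```python
import re

# Assumption: alternate pronunciations are marked as WORD(2), WORD(3), ...
_VARIANT = re.compile(r"\(\d+\)$")


def cmudict_to_sexp(line):
    """Convert one CMUdict-style line to a flat s-expression string.

    Returns None for blank lines, comment lines, and lines without phones.
    """
    line = line.strip()
    if not line or line.startswith((";;;", "##")):
        return None
    word, *phones = line.split()
    if not phones:
        return None
    # Drop the variant marker, lowercase, and escape embedded double quotes.
    word = _VARIANT.sub("", word).lower().replace('"', '\\"')
    phone_str = " ".join(p.lower() for p in phones)
    return f'("{word}" nil ({phone_str}))'


if __name__ == "__main__":
    sample = [";;; comment", "HELLO  HH AH0 L OW1", "READ(2)  R EH1 D"]
    for entry in filter(None, map(cmudict_to_sexp, sample)):
        print(entry)
```

Stress digits are kept on the vowels here; whether to keep or strip them depends on the phone set the new rules are trained for.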

--- (Edited on 6/26/2007 11:44 am [GMT-0500] by nsh) ---

Re: Automatic Segmentation of LibriVox Audio
User: nsh
Date: 8/9/2007 12:14 am
Views: 273
Rating: 22

Hm, I just discovered this:

The bad thing is that we didn't know about it. The good thing is that we'll be able to segment LibriVox audio faster.

--- (Edited on 8/9/2007 12:14 am [GMT-0500] by nsh) ---

Re: Automatic Segmentation of LibriVox Audio
User: kmaclean
Date: 8/9/2007 10:17 pm
Views: 1857
Rating: 17

Hi nsh,

Very interesting!

BTW the link you posted is dead ... here is the updated link:

From Kishore Prahallad's post:

We call this project as Interslice  - to be released under Festvox (Alan 
would have more comments).
The basic idea of interslice is to automatically build synthetic voices
from large speech databases typically available from public domain such
as and
Interslice comes with a segmentation tool capable to handling infinitely
large corpora and chunking them into utterances and *.lab files.

This is great news! Especially if it can easily supply pronunciations for words not already included in the VoxForge dictionary.

It also sounds similar to Cepstral's commercial product offering called "VoiceForge". From the press release:

Cepstral LLC announced the release of VoiceForge(tm), a web 2.0 product that can turn a set of recorded audio prompts into a Text-to-Speech (TTS) voice capable of saying anything. With VoiceForge(tm), companies or actors can capture or "bank" their voices on their own. Once a voice is synthetically forged, it can be used to speak dynamic information for Entertainment, Telephony, Navigation, Education, or Reminder applications.


--- (Edited on 8/9/2007 11:17 pm [GMT-0400] by kmaclean) ---