General Discussion

transcribing seminar videos
User: Visitor
Date: 2/25/2010 6:21 am
Views: 4674
Rating: 2

I have a group of seminar videos, and I am looking at setting up a website like Metavid ( http://metavid.org/ ), which is based on the Wikipedia software.


Is there a way to improve the speech recognition by using the improved transcripts?

--- (Edited on 2/25/2010 6:21 am [GMT-0600] by Visitor) ---

Re: transcribing seminar videos
User: nsh
Date: 2/25/2010 1:02 pm
Views: 72
Rating: 2

There are ways, but they depend on the decoder you are using.

--- (Edited on 2/25/2010 22:02 [GMT+0300] by nsh) ---

Re: transcribing seminar videos
User: tom_a_sparks
Date: 2/25/2010 11:54 pm
Views: 60
Rating: 1

I have read http://www.voxforge.org/home/dev/autoaudioseg

What decoder do you recommend?

--- (Edited on 2/25/2010 11:54 pm [GMT-0600] by tom_a_sparks) ---

Re: transcribing seminar videos
User: kmaclean
Date: 2/26/2010 12:20 pm
Views: 80
Rating: 2

> What decoder do you recommend?

Regardless of which decoder you decide to use, you will still need a specialized language model. From this post (PyCon transcription):

To build a specialized model you can take transcription of the previous conferences, mailing list archives, related documentation, technical papers [...] and so on. This language model will be more suitable for decoding reports.

--- (Edited on 2/26/2010 1:20 pm [GMT-0500] by kmaclean) ---

Re: transcribing seminar videos
User: tom_a_sparks
Date: 2/28/2010 5:21 am
Views: 95
Rating: 1

That sounds like a catch-22.

So can I use a speaker independent acoustic model and adapt it to include what I need?

 

--- (Edited on 2/28/2010 5:21 am [GMT-0600] by tom_a_sparks) ---

Re: transcribing seminar videos
User: kmaclean
Date: 3/8/2010 11:30 pm
Views: 50
Rating: 2

> So can I use a speaker independent acoustic model and adapt it to include what I need?

Note: you need both a language model and an acoustic model.

You can adapt a speaker independent acoustic model using some of the (manually transcribed) speech from the video you want to automatically transcribe. You may get better recognition rates by converting the audio used to create your speaker independent acoustic model into the same compressed format used in the video, and then training a new acoustic model using this 'compressed' audio (see David Gelbart's post in this thread).

Then use all the video transcriptions you currently have (and as nsh stated: transcription of the previous conferences, mailing list archives, related documentation, technical papers [...] and so on) to create a language model that is specific to the type of speech being uttered in the video.
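To make the language model step a little more concrete, here is a minimal Python sketch of the idea: normalize the domain text, then count word pairs to estimate how likely one word is to follow another. The function names (normalize, build_counts, bigram_prob) and the two sample sentences are made up for illustration. A real language modeling toolkit also applies smoothing and writes the model in the format your decoder expects, which this sketch does not attempt.

```python
import re
from collections import Counter


def normalize(line):
    """Lowercase a line and keep only word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def build_counts(lines):
    """Count unigrams and bigrams; <s> and </s> mark sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for line in lines:
        words = normalize(line)
        if not words:
            continue  # skip empty or punctuation-only lines
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0  # context never seen in the corpus
    return bigrams[(prev, word)] / unigrams[prev]


# Illustrative stand-in for your transcripts, mailing list archives, papers, etc.
corpus = [
    "The acoustic model is trained on audio.",
    "The language model is trained on text.",
]
unigrams, bigrams = build_counts(corpus)
print(bigram_prob(unigrams, bigrams, "the", "language"))
```

The more in-domain text you feed in, the better these estimates match the vocabulary and phrasing used in the seminars.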

--- (Edited on 3/9/2010 12:30 am [GMT-0500] by kmaclean) ---

Re: transcribing seminar videos
User: tom_a_sparks
Date: 3/9/2010 12:24 am
Views: 1888
Rating: 1

I was looking at something like this[1], but using the speech recognizer to produce the transcript, then adding the misrecognized words to the speech recognition database and repeating until all the words are recognized.

[1]

you might try to transcribe one hour's worth of audio, create an acoustic model from this.  Then create a language model, and then try recognizing another hour of audio.  Next use the transcriptions generated from the recognition results to re-train an acoustic model with the additional transcribed audio (i.e. now you are training an acoustic model with 2 hours of audio).  Your acoustic models will get better as they are trained with more audio.  Keep iterating this process until all the videos are completed.
- http://www.voxforge.org/home/forums/message-boards/speech-recognition-engines/looking-for-an-engine-to-extract-voice-from-video
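
As I understand it, the loop described in [1] could be sketched in Python roughly like this. The train, recognize and correct arguments are hypothetical placeholders for whatever the chosen decoder and training tools actually provide; they are not real APIs, and the stub functions at the bottom only show the control flow.

```python
def bootstrap(batches, first_transcript, train, recognize,
              correct=lambda audio, hypothesis: hypothesis):
    """Grow the training set one batch of audio at a time.

    batches          -- list of audio batches (e.g. one hour each)
    first_transcript -- manual transcript for batches[0]
    train            -- hypothetical: train(pairs) -> model
    recognize        -- hypothetical: recognize(model, audio) -> transcript
    correct          -- optional manual fix-up step; identity by default
    """
    training = [(batches[0], first_transcript)]
    model = train(list(training))
    for audio in batches[1:]:
        hypothesis = recognize(model, audio)      # automatic transcript
        transcript = correct(audio, hypothesis)   # fix misrecognized words
        training.append((audio, transcript))
        model = train(list(training))             # retrain on all audio so far
    return model, training


# Stub functions standing in for the real tools (assumptions, not real APIs).
def stub_train(pairs):
    return len(pairs)  # the "model" is just the number of batches seen


def stub_recognize(model, audio):
    return f"{audio}-decoded-by-model-{model}"


model, training = bootstrap(["hour1", "hour2", "hour3"], "manual transcript",
                            stub_train, stub_recognize)
print(model)
print(training)
```

The correct step is where the misrecognized words would be fixed by hand before the next round of training.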

--- (Edited on 3/9/2010 12:24 am [GMT-0600] by tom_a_sparks) ---

Re: transcribing seminar videos
User: nsh
Date: 2/26/2010 12:54 pm
Views: 150
Rating: 2

http://cmusphinx.sourceforge.net/sphinx4/

--- (Edited on 2/26/2010 21:54 [GMT+0300] by nsh) ---
