Speech Recognition Engines

Sphinx 4 in action - transcribing Bruce Sterling's reboot 11 closing talk
Date: 7/29/2009 9:51 am
Laurian Gridinoc's blog has an interesting entry documenting how the author tried to use the Replay tool to transcribe a presentation by Bruce Sterling. Replay uses the Sphinx 4 open source speech recognition engine for speech-to-text.

The quality of the transcription output is not very good. However, it is a good example of how well current open source speech recognition can transcribe presentations, podcasts, and similar recordings.

Replay is also interesting in its own right. It uses a number of open source technologies to help users store and index presentations: OCR for the slides and speech recognition for the talk itself. From their website:

REPLAY is an open source solution developed in Java to manage the workflow of audiovisual lecture recordings from production in the classroom to distribution on various channels in an automated manner. In this, it also provides comprehensive functionalities for existing audiovisual archives, repositories or collections.

REPLAY is a solution not only for academia, but also for institutions and companies producing, hosting, managing and allocating audiovisual content.

Key features:

  • Support for automated capture of audio, video, and content

  • Isochronic Indexation, based on OCR (slides) or audio (speech to text)

  • Support for long-term archival

  • Providing various user interfaces (administrative, operational etc.)

  • Devoted to standards and accepted formats

  • Open Source (GNU LGPL license version 2)
