What is a speech corpus or speech corpora?

User: kmaclean
Date: 1/1/2010 11:52 am

A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions of those files, in a format that can be used to create acoustic models (which can then be used with a speech recognition engine). ISIP's Switchboard database is a good example of this.

A corpus is one such database. Corpora is the plural of corpus (i.e. several such databases).

There are two types of speech corpora:

(1) Read Speech - which includes:

Book excerpts;
Broadcast news;
Lists of words;
Sequences of numbers.

(2) Spontaneous Speech - which includes:

Dialogs - between two or more people (includes meetings);
Narratives - a person telling a story;
Map-tasks - one person explains a route on a map to another;
Appointment-tasks - two people try to find a common meeting time based on individual schedules.
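
To make the definition above more concrete, here is a minimal Python sketch of how one entry in a speech corpus might be represented: an audio file, its transcription, and which of the two types it belongs to. The names (Utterance, SpeechType, by_type) and the file paths are made up for illustration; real corpora such as Switchboard use their own file layouts and formats.

```python
from dataclasses import dataclass
from enum import Enum


class SpeechType(Enum):
    READ = "read"                # book excerpts, broadcast news, word lists, numbers
    SPONTANEOUS = "spontaneous"  # dialogs, narratives, map-tasks, appointment-tasks


@dataclass
class Utterance:
    audio_path: str         # hypothetical path to a recording
    transcription: str      # text of what is said in that recording
    speech_type: SpeechType


def by_type(corpus, speech_type):
    """Return the utterances in a corpus that have the given speech type."""
    return [u for u in corpus if u.speech_type == speech_type]


# A corpus is one such collection; corpora would be several of them.
corpus = [
    Utterance("audio/read_0001.wav", "one two three four", SpeechType.READ),
    Utterance("audio/dialog_0001.wav", "turn left at the bridge", SpeechType.SPONTANEOUS),
]
```

Calling by_type(corpus, SpeechType.READ) returns only the read-speech entries. Acoustic model training tools typically need this audio/transcription pairing in some form, though each tool defines its own input format.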

Re: What is a speech corpus or speech corpora?

User: atriokke
Date: 9/28/2012 8:03 pm

The hyperlink for Switchboard is returning a 404.
