Welcome to KingLine Data Center!   Contact Phone: 0086-10-62660053   Email: marketing@speechocean.com

23 Results

King-AVT-002

This database contains 100 hours of speech, transcribed and annotated from home videos recorded in the real lives of native Chinese speakers.

King-NLP-034

This corpus contains 220,000 US English SMS sentences. The text was filtered to remove sensitive words, noisy words, and repeated chat sentences. All sentences were classified into categories and proofread manually.

King-NLP-037

The corpus contains 74,300 sentences (2,835,290 Chinese characters). It was collected from the daily life and business of individuals in Taiwan and Hong Kong, with each person's authorization. All text was proofread manually.
*Only for the domestic market.

King-ASR-044

The Taiwan Mandarin Mobile Speech Recognition Corpus was collected in Taiwan. It contains the voices of 5232 different speakers (2365 males, 2867 females), balanced across age (mainly 16–30, 31–45, 46–60), gender, and regional accent (see the technical document for details). The script contains approximately 1,643,521 utterances in total (see the specification for details of the script structure design) and is specially designed to provide material for both training and testing many classes of speech recognizers. Each speaker was recorded in 2 environments: quiet environments (office, home) and noisy environments (cafe, restaurant, street). A mobile platform (Android) was used for speech collection. Each utterance is stored as a separate, uncompressed waveform file. A pronunciation lexicon with phonemic transcriptions in Pinyin is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.

King-ASR-080

The Canada French Mobile Speech Recognition Corpus was collected in Canada. It contains the voices of 50 different speakers (25 males, 25 females), balanced across age (mainly 16–30, 31–45, 46–60), gender, and regional accent (see the technical document for details). The script contains approximately 25,049 utterances in total, covering 16 categories and 44 sub-categories (see the specification for details of the script structure design), and is specially designed to provide material for both training and testing many classes of speech recognizers. Each speaker was recorded in a quiet office environment. Mobile platforms (iOS, Android, and Windows) were used for speech collection. Each utterance is stored as a separate, uncompressed waveform file. A pronunciation lexicon with phonemic transcriptions in X-SAMPA is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.

King-ASR-125

The Japanese Speech Recognition Corpus was collected in Japan.

The script contains approximately 98,825 utterances in total and is specially designed to provide material for both training and testing speech recognizers. Each utterance is stored as a separate, uncompressed waveform file.

The corpus contains the voices of 308 different speakers, balanced across age, gender, and regional accent. Each speaker was recorded in 1 or 2 of 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, HIGH_SPEED_GOOD_ROAD, etc.).

Four high-quality audio channels were used for speech collection.

A pronunciation lexicon is available. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general conventions of SpeechDat-Car.

For more details, please check the technical document or ask our sales team.

Contact Information:
Phone: +86-10-62660053
Email: contact@speechocean.com

King-ASR-140

The Canada English Desktop Speech Recognition Corpus was collected in Canada. It contains the voices of 202 different speakers (92 males, 110 females), balanced across age (mainly 18–30, 31–45, 46–60), gender, and regional accent (see the technical document for details). The script contains approximately 322,936 utterances in total (see the specification for details of the script structure design) and is specially designed to provide material for both training and testing many classes of speech recognizers. Each speaker was recorded in a quiet office environment. A desktop platform (Windows XP SP2) was used for speech collection. Each utterance is stored as a separate, uncompressed waveform file. A pronunciation lexicon with phonemic transcriptions in SAMPA is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.

King-ASR-143

The Mexico Spanish Mobile Speech Recognition Corpus was collected in Mexico. It contains the voices of 303 different speakers (151 males, 152 females), balanced across age (mainly 16–35, 31–45, >46), gender, and regional accent (see the technical document for details). The script contains approximately 270,681 utterances in total (see the specification for details of the script structure design) and is specially designed to provide material for both training and testing many classes of speech recognizers. Each speaker was recorded in quiet environments (office and home). Mobile platforms (iOS, Android, and Windows) were used for speech collection. Each utterance is stored as a separate, uncompressed waveform file. A pronunciation lexicon with phonemic transcriptions in SAMPA is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.

King-ASR-145

The American Spanish Speech Recognition Corpus was collected in the USA.

The script contains approximately 350,408 utterances in total and is specially designed to provide material for both training and testing speech recognizers. Each utterance is stored as a separate, uncompressed waveform file.

The corpus contains the voices of 295 different speakers, balanced across age, gender, and regional accent. Each speaker was recorded in 1 or 2 of 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, HIGH_SPEED_GOOD_ROAD, etc.).

Four high-quality audio channels were used for speech collection.

A pronunciation lexicon is available. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general conventions of SpeechDat-Car.

For more details, please check the technical document or ask our sales team.

Contact Information:
Phone: +86-10-62660053
Email: contact@speechocean.com

King-ASR-148

The Italian Mobile Speech Recognition Corpus was collected in Italy. It contains the voices of 300 different speakers (151 males, 149 females), balanced across age (mainly 18–30, 31–45, >46), gender, and regional accent (see the technical document for details). The script contains approximately 377,535 utterances in total (see the specification for details of the script structure design) and is specially designed to provide material for both training and testing many classes of speech recognizers. Each speaker was recorded in 2 environments: a quiet environment (office) and noisy environments (restaurant, street). Mobile platforms (iOS, Android, and Windows) were used for speech collection. Each utterance is stored as a separate, uncompressed waveform file. A pronunciation lexicon with phonemic transcriptions in SAMPA is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.
