Welcome to KingLine Data Center!   Contact Phone: 0086-10-62660053   Email: marketing@speechocean.com

King-ASR-034

The Chinese Mandarin Speech Recognition Corpus was collected in China.

The script contains approximately 19,198 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 20 different speakers (10 males, 10 females), balanced across age, gender and regional accent. All data were recorded in 4 different environments.

Two high-quality audio channels were used for speech collection.

A pronunciation lexicon is available with a phonemic transcription in Pinyin. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

Contact Information:
Phone: +86-10-62660053
Email: contact@speechocean.com

King-ASR-108

The British English Speech Recognition Corpus was collected in the UK.

The corpus contains approximately 42,818 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 131 different speakers (84 males, 47 females), balanced across age (16–30, 31–45, 46–65), gender and regional accent. Each speaker was recorded in 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, etc.).

iOS and Android devices were used for speech collection. A pronunciation lexicon with phonemic transcriptions in the OALD phone set is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-120

The Chinese Mandarin Speech Recognition Corpus was collected in China.

The script contains approximately 139,922 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 300 different speakers (151 males, 149 females), spanning different ages, genders and regional accents.

A pronunciation lexicon is available with a phonemic transcription in the Pinyin phone set. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-122

The Japanese Speech Recognition Corpus was collected in Japan.

The corpus contains approximately 200,696 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 100 different speakers (50 males, 50 females), balanced across age (18–30, 31–45, 46–60), gender and regional accent. Each speaker was recorded in 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, etc.).

Two types of vehicle (MAZDA WAGON / MAZDA 6) and two types of microphone (Shure SM10A / AKG C400 BL) were used for recording.

A pronunciation lexicon is available with a phonemic transcription in. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-125

The Japanese Speech Recognition Corpus was collected in Japan.

The script contains approximately 98,632 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 308 different speakers, balanced across age, gender and regional accent. Each speaker was recorded in 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, HIGH_SPEED_GOOD_ROAD, etc.).

A pronunciation lexicon with a phonemic transcription in Hepburn is available. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-129

The Canadian French Speech Recognition Corpus was collected in Montreal, Canada.

The script contains approximately 90,142 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 304 different speakers (152 males, 152 females), balanced across age (mainly 16–30, 31–45, 46+), gender and regional accent. Each speaker was recorded in 1 or 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, HIGH_SPEED_GOOD_ROAD, etc.).

Three types of vehicle (HONDA ACCORD, CIVIC, TOYOTA) and three types of microphone (Shure SM10A / Sennheiser ME104 / AKG Q400) were used for recording. Four high-quality audio channels were used for speech collection.

A pronunciation lexicon is available with a phonemic transcription in the SAMPA phone set. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-131

The American English Speech Recognition Corpus was collected in the USA.

The script contains approximately 383,788 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 304 different speakers (152 males, 152 females), balanced across age (mainly 18–30, 31–45, 46–65), gender and regional accent. Each speaker was recorded in 1 or 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, HIGH_SPEED_GOOD_ROAD, etc.).

Three types of vehicle (MAZDA 6, HONDA ACCORD and HONDA CRV) and three types of microphone (Shure SM10A / Sennheiser ME104 / AKG C400BL) were used for recording. Four high-quality audio channels were used for speech collection.

A pronunciation lexicon is available with a phonemic transcription in the CMU phone set. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-132

The France French Speech Recognition Corpus was collected in France.

The script contains approximately 103,480 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 304 different speakers (142 males, 162 females), balanced across age (16–30, 31–45, 46–65), gender and regional accent. Each speaker was recorded in 1 or 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, etc.).

Five types of vehicle (PEUGEOT / TOYOTA / ...) and eight types of microphone (SHURE SM10A / SENNHEISER ME104 / ...) were used for recording. Four high-quality audio channels were used for speech collection.

A pronunciation lexicon is available with a phonemic transcription in the SAMPA phone set. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-134

The Turkish Speech Recognition Corpus was collected in Turkey.

The corpus contains approximately 398,692 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 316 different speakers, balanced across age (18–30, 31–45, 46–60), gender and regional accent. Each speaker was recorded in 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, HIGH_SPEED_GOOD_ROAD, etc.).

Three types of vehicle (FORD-FOCUS / CITROEN-XZARA / AUDI-A4) and three types of microphone (Shure SM10A / Sennheiser ME104 / AKG C400BL) were used for recording.

A pronunciation lexicon is available with a phonemic transcription in SAMPA. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.

King-ASR-135

The American English Speech Recognition Corpus was collected in the USA.

The script contains approximately 395,712 utterances in total and is designed to provide material for both training and testing speech recognizers. Each utterance is stored as an uncompressed waveform in its own file.

This corpus contains the voices of 300 different speakers (161 males, 139 females), balanced across age (mainly 16–30, 31–45, 46–65), gender and regional accent. Each speaker was recorded in 1 or 2 environments chosen from 7 possible environments (STOP_MOTOR_RUNNING, LOW_SPEED_ROUGH_ROAD, HIGH_SPEED_GOOD_ROAD, etc.).

Three types of vehicle (FORD FOCUS, VOLKSWAGEN GOLF, VOLKSWAGEN JETTA) and three types of microphone (Shure SM10A / Sennheiser ME104 / AKG C400BL) were used for recording. Four high-quality audio channels were used for speech collection.

A pronunciation lexicon is available with a phonemic transcription in the OALD phone set. All audio files were manually transcribed and annotated by native transcribers. The corpus follows the general convention of SpeechDat-Car.

For more details, please check the technical document or ask our sales people.
