Welcome to KingLine Data Center! Contact phone: 0086-10-62660053. Email: marketing@speechocean.com



Commercial Resources

The American English Speech Recognition Corpus was collected in the USA.

The script contains approximately 829,631 utterances in total, specially designed to provide material for both training and testing of speech recognizers. Each utterance is stored as a separate, uncompressed wave file.

This corpus contains the voices of 2,602 different speakers (1,232 males, 1,370 females), balanced by age (16 – 30, 31 – 45, >45), gender and regional accent. Each speaker was recorded in quiet office and home environments.

An Android mobile platform was used for speech collection. A pronunciation lexicon with phonemic transcriptions in CMU notation is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers.

For more details, please check the technical document or contact our sales team.

Contact Information:
Phone: +86-10-62660053
Email: contact@speechocean.com
The Chinese Mandarin Mobile Speech Recognition Corpus was collected in China. It contains the voices of 4,062 different speakers (1,937 males, 2,125 females), balanced by age (mainly 16 – 30, 31 – 45 and 46 – 60), gender and regional accent (see the technical document for details). The script contains approximately 2,125,560 utterances in total (see the specification for details of the script structure design), specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment. Mobile platforms (iOS, Android and Windows) were used for speech collection. Each utterance is stored as a separate, uncompressed wave file. A pronunciation lexicon with phonemic transcriptions in Pinyin is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.

The Chinese Mandarin Mobile Speech Recognition Corpus was collected in China. It contains the voices of 1,200 different speakers (602 males, 598 females), balanced by age (mainly 18 – 35, 36 – 45 and 46 – 60), gender and regional accent (see the technical document for details). The script contains approximately 359,451 utterances in total, covering 13 categories and 42 sub-categories (see the specification for details of the script structure design), specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in two types of environment: quiet (office/home) and noisy (street, restaurant, car). Mobile platforms (iOS, Android, Windows Mobile and Symbian) were used for speech collection. Each utterance is stored as a separate, uncompressed wave file. A pronunciation lexicon with phonemic transcriptions in Pinyin is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.

The Chinese Mandarin Mobile Speech Recognition Corpus was collected in China. It contains the voices of 5,048 different speakers (2,584 males, 2,464 females), balanced by age (mainly 16 – 35, 31 – 45, >46), gender and regional accent (see the technical document for details). The script contains approximately 1,512,937 utterances in total, covering three categories (see the specification for details of the script structure design), specially designed to provide material for both training and testing of many classes of speech recognizers. Each speaker was recorded in a quiet office environment. Mobile platforms (iOS, Android and Windows) were used for speech collection. Each utterance is stored as a separate, uncompressed wave file. A pronunciation lexicon with phonemic transcriptions in Pinyin is available; all entries were manually checked. All audio files were manually transcribed and annotated by native transcribers. Details are available in the specification.

Academic Resources

This Chinese Mandarin Speech Recognition Corpus, collected in China, contains the voices of 285 different native speakers (144 males, 141 females), balanced by age (mainly 16-28, 29-45), gender and regional accent (26 provinces and regions are covered). The database contains about 12.2 hours of recordings. A set of 6,140 digit strings was specially designed for both training and testing of speech recognizers: 199 speakers each uttered 30 digit strings, and 86 speakers each uttered 25 digit strings. All speech data was transcribed and labeled.

Credits: 533.00

This Chinese Mandarin Speech Recognition Corpus was collected in China and contains the voices of 20 different native speakers. Each speaker read person names, place names, digit strings and stock names in a moving car. It includes 19,198 audio files totaling about 20.9 hours.

Credits: 667.00

Entries: 12900
Phoneme inventory: X-SAMPA
Format: ASCII text with the UTF-8 character set
Stress: included
Syllable boundaries: included

Credits: 1200.00

This single-channel Hokkien Speech Recognition Corpus was collected in Fujian and is owned by the Acoustic Signal and Speech Processing Lab, Xiamen University. It contains recordings of 40 native speakers in 10,134 audio files. All speech data was transcribed and labeled.

Relevant paper: The Hokkien Isolated Word Recognition System Based on FPGA (or copy this link into your browser: http://pan.baidu.com/s/1nuHoPuh)
Reference: Lin Li, Wenhao Xu, Jiawen Wu, Shan He, Xiaochao Li
Department of Electronic Engineering, Xiamen University, Xiamen, China
Xiamen Key Lab of Micro-Nano-Electron Devices & Integrated System, Xiamen, China
C Design & IT Research Center of Fujian Province, Xiamen University, Xiamen, China
E-mail: heshan@xmu.edu.cn, lilin@xmu.edu.cn

Credits: 500.00 or Price: 770 USD


Monthly Promotion

This single-channel Hokkien Speech Recognition Corpus is part of King-ASR-M-001, which was collected in Fujian and is owned by the Acoustic Signal and Speech Processing Lab, Xiamen University. It contains recordings of 10 native speakers in 3,500 audio files. All speech data was transcribed and labeled.

Relevant paper: The Hokkien Isolated Word Recognition System Based on FPGA (or copy this link into your browser: http://pan.baidu.com/s/1nuHoPuh)
Reference: Lin Li, Wenhao Xu, Jiawen Wu, Shan He, Xiaochao Li
Department of Electronic Engineering, Xiamen University, Xiamen, China
Xiamen Key Lab of Micro-Nano-Electron Devices & Integrated System, Xiamen, China
C Design & IT Research Center of Fujian Province, Xiamen University, Xiamen, China
E-mail: heshan@xmu.edu.cn, lilin@xmu.edu.cn

Credits: 0.00