Taiwan Mandarin Speech Recognition Corpus-Sentence (Mobile)-5232 Speakers
16 kHz, 16-bit, 1 channel
The Taiwan Mandarin Mobile Speech Recognition Corpus was collected in Taiwan. It contains the voices of 5,232 speakers (2,365 male, 2,867 female), balanced by age (mainly 16–30, 31–45 and 46–60), gender and regional accent (for details, please see the technical document). The script contains approximately 1,643,521 utterances in total (for details of the script structure design, please check the specification) and was specially designed to provide material for both training and testing many classes of speech recognizers.

Each speaker was recorded in two kinds of environment: quiet (office, home) and noisy (cafe, restaurant, street). Speech was collected on the Android mobile platform. Each utterance is stored uncompressed in a separate file. A pronunciation lexicon with phonemic transcriptions in Pinyin is provided. All audio files were manually transcribed and annotated by native transcribers, and all data were manually checked. Details are available in the specification.
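Because every utterance is stored as a separate uncompressed file at 16 kHz, 16-bit, 1 channel, a quick sanity check when loading the corpus is to confirm each file matches that format and to read off its duration. The sketch below assumes the files are standard RIFF/WAV files readable by Python's built-in `wave` module; that container format is an assumption, so check the specification. `make_silent_wav` is a hypothetical helper used only to produce a demo file.

```python
import os
import tempfile
import wave

# Format stated in the corpus description: 16 kHz, 16-bit, 1 channel.
EXPECTED_RATE = 16_000      # samples per second
EXPECTED_SAMPWIDTH = 2      # bytes per sample (16 bit)
EXPECTED_CHANNELS = 1       # mono


def check_utterance(path):
    """Return (ok, duration_seconds) for one utterance file.

    Assumption: the file is an uncompressed RIFF/WAV file that the
    standard-library wave module can open.
    """
    with wave.open(path, "rb") as wf:
        ok = (
            wf.getframerate() == EXPECTED_RATE
            and wf.getsampwidth() == EXPECTED_SAMPWIDTH
            and wf.getnchannels() == EXPECTED_CHANNELS
            and wf.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
        )
        duration = wf.getnframes() / wf.getframerate()
    return ok, duration


def make_silent_wav(path, seconds, rate=16_000, sampwidth=2, channels=1):
    """Hypothetical demo helper: write `seconds` of silence as a WAV file."""
    nframes = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * sampwidth * channels * nframes)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.wav")
    make_silent_wav(demo, 1.0)
    print(check_utterance(demo))  # (True, 1.0)
```

At this format one second of audio holds 16,000 × 2 × 1 = 32,000 bytes of sample data (plus a small header), which is also a handy figure for estimating storage per utterance.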