Welcome to KingLine Data Center!   Contact Phone: 0086-10-62660053   Email: marketing@speechocean.com








This corpus contains 220,000 US English SMS sentences. The text was cleaned by filtering out sensitive words, noisy words, and repeated chat sentences. All sentences were classified into categories and proofread manually.


The corpus contains 74,300 sentences, totaling 2,835,290 Chinese characters. It was collected, with each person's authorization, from the daily life and business of individuals in Taiwan and Hong Kong. All text was proofread manually.
*Only for domestic market.