Welcome to KingLine Data Center!   Contact Phone: 0086-10-62660053   Email: marketing@speechocean.com



King-NLP-001

This corpus contains 3,050,000 Chinese characters collected from real emails written by native Chinese speakers. All text was proofread manually, and sensitive words and repeated sentences were filtered out in the pure word layer.

King-NLP-002

This dataset contains 100,000 SMS sentences collected from real-life messages of native Chinese speakers. All sentences were proofread manually, and repeated sentences were filtered out, among other cleanup steps. The data consists of four layers: pure word, Pinyin with tone, word segmentation, and named entity.

King-NLP-003

This dataset contains 410,000 SMS sentences collected from real-life messages of native Chinese speakers. All sentences were proofread manually, repeated sentences were filtered out in the pure word layer, and every sentence was annotated with Pinyin and tone information.

King-NLP-004

This dataset contains 1,260,000 SMS sentences collected from real-life messages of native Chinese speakers. All sentences were proofread manually, repeated sentences were filtered out in the pure word layer, and every sentence was annotated with word segmentation information.

King-NLP-005

This dataset contains 110,000 SMS sentences collected from real-life messages of native Chinese speakers. All sentences were proofread manually, repeated sentences were filtered out in the pure word layer, and every sentence was annotated with named entity information.

King-NLP-006

This dataset contains 1,960,000 SMS sentences collected from real-life messages of native Chinese speakers. All sentences were proofread manually, and repeated sentences were filtered out, among other cleanup steps. The data consists of four layers.

King-NLP-007

This corpus contains 1,200,000 sentences collected from real instant messages of native Chinese speakers. All text was proofread manually, and sensitive words and repeated sentences were filtered out in the pure word layer.

King-NLP-008

This corpus contains 1,450,000 sentences collected from real instant messages of native Chinese speakers. All text was proofread manually, and sensitive words and repeated sentences were filtered out in the pure word layer.

King-NLP-009

This corpus contains 350,000 Chinese personal names collected from the real names of native Chinese speakers. All names were proofread manually, and repeated names were filtered out.

King-NLP-010

This corpus contains 6,000,000 Chinese place names collected according to the administrative districts of China. All names were proofread manually.
