Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation

Project Website: None

Github Link:

Description

“Jieba” (Chinese for “to stutter”) is a Chinese text segmentation library, built to be the best Python Chinese word segmentation module.

Features
Supports three segmentation modes:

Accurate Mode attempts to cut the sentence into the most accurate segmentation, which makes it suitable for text analysis.
Full Mode returns every possible word found in the sentence. It is fast but not accurate.
Search Engine Mode builds on Accurate Mode and cuts long words into several shorter words, which raises recall. It is suitable for search engines.

Supports Traditional Chinese
Supports customized dictionaries
MIT License

