A Beginner’s Guide to spaCy

About spaCy Open Source Text Processing Project: spaCy Install spaCy and related data model Install spaCy by pip: sudo pip install -U spacy Collecting spacy Downloading spacy-1.8.2.tar.gz (3.3MB) Downloading numpy-1.13.0-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB) Collecting murmurhash=0.26 (from spacy) Downloading murmurhash-0.26.4-cp27-cp27mu-manylinux1_x86_64.whl Collecting cymem=1.30 (from … Continue reading

Getting started with topia.termextract

About topia.termextract Open Source Text Processing Project: topia.termextract Install topia.termextract Although topia.termextract has a PyPI page, it cannot be installed with “pip install”; you should download the source code first: https://pypi.python.org/packages/d1/b9/452257976ebee91d07c74bc4b34cfce416f45b94af1d62902ae39bf902cf/topia.termextract-1.1.0.tar.gz Then run “tar -zxvf topia.termextract-1.1.0.tar.gz” and “cd topia.termextract-1.1.0” and … Continue reading

A Beginner’s Guide to TextBlob

About TextBlob Open Source Text Processing Project: TextBlob Install TextBlob Install the latest TextBlob on Ubuntu 16.04.1 LTS: textprocessing@ubuntu:~$ sudo pip install -U textblob Collecting textblob Downloading textblob-0.12.0-py2.py3-none-any.whl (631kB) Requirement already up-to-date: nltk>=3.1 in /usr/local/lib/python2.7/dist-packages (from textblob) Requirement already up-to-date: … Continue reading

Getting started with NLTK

About NLTK Open Source Text Processing Project: NLTK Install NLTK 1. Install the latest NLTK package on Ubuntu 16.04.1 LTS: textprocessing@ubuntu:~$ sudo pip install -U nltk Collecting nltk Downloading nltk-3.2.2.tar.gz (1.2MB) … Continue reading

Open Source Text Processing Project: topia.termextract

topia.termextract: Content Term Extraction using POS Tagging Project Website: https://pypi.python.org/pypi/topia.termextract/ Github Link: None Description This package determines important terms within a given piece of content. It uses linguistic tools such as Parts-Of-Speech (POS) and some simple statistical analysis to determine … Continue reading
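
The description above combines POS tags with simple statistics. As a rough illustration of that idea only, the sketch below collects runs of noun-tagged words and keeps those that are multi-word or repeated. It does not use the topia.termextract API: the function `extract_terms`, the `min_single_count` threshold, and the hand-tagged `TAGGED` sample are all hypothetical, and the tags are assumed to be Penn Treebank style.

```python
from collections import Counter


def extract_terms(tagged_tokens, min_single_count=2):
    """Return {term: count} for noun runs in a list of (word, tag) pairs.

    Hypothetical helper, not part of topia.termextract.
    """
    runs = []
    current = []
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            # Extend the current run of consecutive nouns.
            current.append(word.lower())
        elif current:
            # A non-noun ends the run; store it as one candidate term.
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))

    counts = Counter(runs)
    # Simple statistical filter (an assumption): keep multi-word terms,
    # and single words only if they occur often enough.
    return {
        term: n
        for term, n in counts.items()
        if " " in term or n >= min_single_count
    }


# Hand-tagged sample input, made up for this illustration.
TAGGED = [
    ("The", "DT"), ("term", "NN"), ("extraction", "NN"), ("package", "NN"),
    ("reads", "VBZ"), ("text", "NN"), ("and", "CC"), ("ranks", "VBZ"),
    ("text", "NN"), ("by", "IN"), ("frequency", "NN"), (".", "."),
]

terms = extract_terms(TAGGED)
print(terms)
```

In real use the (word, tag) pairs would come from a POS tagger rather than a hand-written list.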

Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation Project Website: None Github Link: https://github.com/fxsjy/jieba Description “Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module. Features Supports three segmentation modes: Accurate Mode attempts to … Continue reading

Open Source Text Processing Project: Stanford Log-linear Part-Of-Speech Tagger

Stanford Log-linear Part-Of-Speech Tagger Project Website: http://nlp.stanford.edu/software/tagger.shtml Github Link: None Description A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word (and other tokens), such as … Continue reading
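
To make the input and output of a POS tagger concrete, here is a minimal sketch of a dictionary-lookup tagger. This is a deliberately naive baseline and not the log-linear model the Stanford tagger implements; the `LEXICON` table, the `tag` function, and the `default` fallback are hypothetical, and the tag names are assumed to follow the Penn Treebank convention.

```python
# Tiny hand-written lexicon (an assumption for this illustration).
LEXICON = {
    "the": "DT",     # determiner
    "dog": "NN",     # singular noun
    "barks": "VBZ",  # verb, 3rd person singular present
    "loudly": "RB",  # adverb
}


def tag(sentence, default="NN"):
    """Assign a tag to each whitespace-separated token by table lookup.

    Unknown words fall back to `default`; a real tagger would instead
    use context and word features to choose a tag.
    """
    return [(word, LEXICON.get(word.lower(), default))
            for word in sentence.split()]


tagged = tag("The dog barks loudly")
print(tagged)
```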