A Beginner’s Guide to spaCy

About spaCy Open Source Text Processing Project: spaCy Install spaCy and related data model Install spaCy by pip: sudo pip install -U spacy Collecting spacy Downloading spacy-1.8.2.tar.gz (3.3MB) Downloading numpy-1.13.0-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB) Collecting murmurhash=0.26 (from spacy) Downloading murmurhash-0.26.4-cp27-cp27mu-manylinux1_x86_64.whl Collecting cymem=1.30 (from … Continue reading

Getting started with Python Word Segmentation

About Python Word Segmentation Python Word Segmentation WordSegment is an Apache2 licensed module for English word segmentation, written in pure-Python, and based on a trillion-word corpus. Based on code from the chapter “Natural Language Corpus Data” by Peter Norvig from … Continue reading

Getting started with topia.termextract

About topia.termextract Open Source Text Processing Project: topia.termextract Install topia.termextract Also topia.termextract has a pip site, but cannot install it by “pip install” method, you should download the source code first: https://pypi.python.org/packages/d1/b9/452257976ebee91d07c74bc4b34cfce416f45b94af1d62902ae39bf902cf/topia.termextract-1.1.0.tar.gz Then “tar -zxvf topia.termextract-1.1.0.tar.gz” and “cd topia.termextract-1.1.0” and … Continue reading

Getting started with WordNet

About WordNet WordNet is a lexical database for English: WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means … Continue reading

A Beginner’s Guide to TextBlob

About TextBlob Open Source Text Processing Project: TextBlob Install TextBlob Install the latest TextBlob on Ubuntu 16.04.1 LTS: textprocessing@ubuntu:~$ sudo pip install -U textblob Collecting textblob Downloading textblob-0.12.0-py2.py3-none-any.whl (631kB) Requirement already up-to-date: nltk>=3.1 in /usr/local/lib/python2.7/dist-packages (from textblob) Requirement already up-to-date: … Continue reading

Getting started with Word2Vec

1. Source by Google Project with Code: Word2Vec Blog: Learning the meaning behind words Paper: [1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013. … Continue reading

Getting started with NLTK

About NLTK Open Source Text Processing Project: NLTK Install NLTK 1. Install the latest NLTK pakage on Ubuntu 16.04.1 LTS: textprocessing@ubuntu:~$ sudo pip install -U nltk Collecting nltk Downloading nltk-3.2.2.tar.gz (1.2MB) 35% |███████████▍ | 409kB 20.8MB/s eta 0:00:0 …… 100% … Continue reading