A Beginner’s Guide to spaCy

About spaCy
Open Source Text Processing Project: spaCy
Install spaCy and related data model
Install spaCy by pip:
sudo pip install -U spacy
Collecting spacy
  Downloading spacy-1.8.2.tar.gz (3.3MB)
  Downloading numpy-1.13.0-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB)
Collecting murmurhash=0.26 (from spacy)
  Downloading murmurhash-0.26.4-cp27-cp27mu-manylinux1_x86_64.whl
Collecting cymem=1.30 (from …
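The install log above comes from a Python 2.7 environment (note the cp27 wheels). On Python 3.8 or newer, one way to confirm which versions pip actually installed is the standard library's importlib.metadata module. This is an illustrative sketch, not part of the original post; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Check spaCy and the dependencies that appear in the pip log.
for name in ("spacy", "numpy", "murmurhash", "cymem"):
    print(name, installed_version(name) or "not installed")
```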

Getting started with Python Word Segmentation

About Python Word Segmentation
Python Word Segmentation
WordSegment is an Apache2-licensed module for English word segmentation, written in pure Python and based on a trillion-word corpus. It is based on code from the chapter “Natural Language Corpus Data” by Peter Norvig from …
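The core idea from Norvig's chapter can be sketched briefly: try every split point, score each candidate word by its unigram probability, and keep the most probable segmentation, penalizing unknown words more heavily the longer they are. This is a simplified, unigram-only sketch with a tiny hypothetical count table (UNIGRAM_COUNTS); WordSegment itself uses corpus-derived unigram and bigram counts, and best_split is not its API.

```python
import math
from functools import lru_cache

# Hypothetical toy counts; a real segmenter uses counts from a large corpus.
UNIGRAM_COUNTS = {
    "this": 500, "is": 800, "a": 900, "test": 120,
    "choose": 40, "chooses": 5, "spain": 30, "pain": 25,
}
TOTAL = sum(UNIGRAM_COUNTS.values())


def word_log_prob(word: str) -> float:
    """Log10 probability of a word under the toy unigram model."""
    if word in UNIGRAM_COUNTS:
        return math.log10(UNIGRAM_COUNTS[word] / TOTAL)
    # Unknown word: probability shrinks by a factor of 10 per character,
    # following the penalty used in Norvig's chapter.
    return math.log10(10.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def best_split(text: str) -> tuple:
    """Return (log_prob, words) for the most probable segmentation of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, len(text) + 1):
        head, tail = text[:i], text[i:]
        tail_score, tail_words = best_split(tail)
        candidates.append((word_log_prob(head) + tail_score, (head,) + tail_words))
    return max(candidates)


print(best_split("choosespain")[1])  # ('choose', 'spain')
```

Memoizing best_split keeps the search polynomial: each suffix of the input is segmented only once.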

Getting started with topia.termextract

About topia.termextract
Open Source Text Processing Project: topia.termextract
Install topia.termextract
Although topia.termextract has a PyPI page, it cannot be installed with “pip install”; download the source code first:
https://pypi.python.org/packages/d1/b9/452257976ebee91d07c74bc4b34cfce416f45b94af1d62902ae39bf902cf/topia.termextract-1.1.0.tar.gz
Then run “tar -zxvf topia.termextract-1.1.0.tar.gz” and “cd topia.termextract-1.1.0” and …

A Beginner’s Guide to TextBlob

About TextBlob
Open Source Text Processing Project: TextBlob
Install TextBlob
Install the latest TextBlob on Ubuntu 16.04.1 LTS:
textprocessing@ubuntu:~$ sudo pip install -U textblob
Collecting textblob
  Downloading textblob-0.12.0-py2.py3-none-any.whl (631kB)
Requirement already up-to-date: nltk>=3.1 in /usr/local/lib/python2.7/dist-packages (from textblob)
Requirement already up-to-date: …

Getting started with NLTK

About NLTK
Open Source Text Processing Project: NLTK
Install NLTK
1. Install the latest NLTK package on Ubuntu 16.04.1 LTS:
textprocessing@ubuntu:~$ sudo pip install -U nltk
Collecting nltk
  Downloading nltk-3.2.2.tar.gz (1.2MB)
    35% |███████████▍ | 409kB 20.8MB/s eta 0:00:0 …… 100% …

Open Source Text Processing Project: segtok

segtok: sentence segmentation and word tokenization tools
Project Website: http://fnl.es/segtok-a-segmentation-and-tokenization-library.html
Github Link: https://github.com/fnl/segtok
Description
A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features. The segtok package provides two modules, segtok.segmenter and segtok.tokenizer. The segmenter provides functionality for …
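To illustrate what rule-based splitting on orthographic features means, here is a toy splitter: it proposes a sentence boundary after ".", "!" or "?" when the next token starts with an uppercase letter, and vetoes the boundary when the period ends a known abbreviation. This is a hypothetical sketch, not segtok's code or API; split_sentences and ABBREVIATIONS are illustrative names, and segtok's real rules handle far more cases.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (lowercase, without
# the final period); segtok's own rules are far more extensive.
ABBREVIATIONS = {"dr", "mr", "mrs", "e.g", "i.e", "etc", "vs"}

# Candidate boundary: terminal punctuation, whitespace, then an uppercase letter.
BOUNDARY = re.compile(r"([.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list:
    """Split text into sentences using simple orthographic rules."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end(1)  # keep the punctuation with its sentence
        candidate = text[start:end]
        last_token = candidate.split()[-1].rstrip(".!?").lower()
        if match.group(1) == "." and last_token in ABBREVIATIONS:
            continue  # "Dr. Smith": the period ends an abbreviation, not a sentence
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith arrived. He was late! Was it rain? Yes."))
# ['Dr. Smith arrived.', 'He was late!', 'Was it rain?', 'Yes.']
```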

Open Source Text Processing Project: textacy

textacy: higher-level NLP built on spaCy
Project Website: https://textacy.readthedocs.io
Github Link: https://github.com/chartbeat-labs/textacy
Description
textacy is a Python library for performing higher-level natural language processing (NLP) tasks, built on the high-performance spaCy library. With the basics: tokenization, part-of-speech tagging, dependency …

Open Source Text Processing Project: vivekn sentiment

Sentiment analysis using machine learning techniques
Project Website: http://sentiment.vivekn.com/
Github Link: https://github.com/vivekn/sentiment
Description
Sentiment analysis using machine learning techniques. Check info.py for the training and testing code. A demo of the tool is available on the project website. Refer to this paper for more …
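The excerpt does not say which model the project uses. As one common machine-learning technique for sentiment classification, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, trained on a tiny made-up dataset. It is an illustrative sketch, not the project's info.py code; train and classify are hypothetical names.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (text, label) pairs. Returns (word_counts, label_counts, vocab)."""
    word_counts = {}          # label -> Counter of token frequencies
    label_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest posterior log probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one (Laplace) smoothing
        score = math.log(n_docs / total_docs)      # log prior
        for token in text.lower().split():
            if token in vocab:  # skip words never seen in training
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, for illustration only.
model = train([
    ("a great and wonderful movie", "pos"),
    ("I loved the acting", "pos"),
    ("a terrible boring plot", "neg"),
    ("I hated the ending", "neg"),
])
print(classify(model, "wonderful acting"))  # pos
print(classify(model, "boring ending"))     # neg
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied together.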

Open Source Deep Learning Project: ELEKTRONN

ELEKTRONN: A highly configurable toolkit for training 3d/2d CNNs and general Neural Networks
Project Website: http://elektronn.org/
Github Link: https://github.com/ELEKTRONN/ELEKTRONN
Description
ELEKTRONN is a deep learning toolkit that makes powerful neural networks accessible to scientists outside of the machine learning community. …

Open Source Deep Learning Project: Gnumpy

Gnumpy
Project Website: http://www.cs.toronto.edu/~tijmen/gnumpy.html
Github Link: None
Description
Do you want both the compute power of GPUs and the programming convenience of Python's numpy? Gnumpy + Cudamat will bring you that. Gnumpy is a simple Python module that …