Open Source Text Processing Project: semanticizest

semanticizest: Standalone Semanticizer

Project Website:

Github Link:


Semanticizest is a package for entity linking, also known as semantic linking or semanticizing: you feed it text, and it outputs links to pertinent Wikipedia concepts. You can use these links as a “semantic representation” of the text for NLP or machine learning, or just to provide some links to background info on Wikipedia.
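To make the idea of entity linking concrete, here is a toy sketch in plain Python. It does not use semanticizest or its model: the ANCHORS dictionary, the link function and the greedy longest-match strategy are all made up for illustration, and how semanticizest itself finds and ranks candidates is not described here.

```python
# Toy illustration only: a hand-made anchor dictionary, NOT semanticizest's
# model or API. Keys are lowercased token n-grams, values are Wikipedia titles.
ANCHORS = {
    ("edinburgh",): "Edinburgh",
    ("robert", "burns"): "Robert_Burns",
    ("burns",): "Burn",
}
MAX_N = 2  # longest n-gram present in the toy dictionary


def link(text):
    """Return (matched phrase, Wikipedia title) pairs, longest match first."""
    tokens = text.lower().split()  # naive whitespace tokenizer, no punctuation handling
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i, then shorter ones.
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            ngram = tuple(tokens[i:i + n])
            if ngram in ANCHORS:
                links.append((" ".join(ngram), ANCHORS[ngram]))
                i += n
                break
        else:
            i += 1  # no anchor starts here, move on
    return links


print(link("Robert Burns lived near Edinburgh"))
# [('robert burns', 'Robert_Burns'), ('edinburgh', 'Edinburgh')]
```

The longest-match rule is what keeps "Robert Burns" from being linked to the unrelated "Burn" entry; a real linker would draw its anchors from Wikipedia rather than from a hand-written table.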


pip install -r requirements.txt
pip install .

To train a semanticizer, download a Wikipedia database dump from Then issue:

python -m semanticizest.parse_wikidump
The result will be a semanticizer model (in SQLite 3 format, if you must know).
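Because the model is an SQLite 3 file, you can peek inside it with Python's standard sqlite3 module. The sketch below only lists the table names; it makes no assumption about semanticizest's schema, and the helper name list_tables and the file name sco.model are just examples.

```python
import sqlite3
from pathlib import Path


def list_tables(model_path):
    """Return the table names stored in an SQLite file, opened read-only."""
    # mode=ro makes SQLite fail instead of creating an empty file by accident.
    uri = Path(model_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    model = Path("sco.model")  # example file name; use your own model file
    if model.exists():
        print(list_tables(model))
```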

Alternatively, you can use the --download flag to instruct semanticizest to download the latest Wikipedia dump. For example, to download and process the Scots Wikipedia (which is small and useful for testing):

python -m semanticizest.parse_wikidump --download scowiki sco.model
This will download the dump to scowiki.xml.bz2 and construct the model from it.
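If you want to check that the downloaded dump looks sane, you can read its first few lines without unpacking it. This is a minimal sketch using Python's standard bz2 module; it assumes the file is bz2-compressed UTF-8 text, as the .xml.bz2 name suggests, and the helper name dump_head is made up for this example.

```python
import bz2
from itertools import islice
from pathlib import Path


def dump_head(path, n=5):
    """Return the first n lines of a bz2-compressed text file."""
    # "rt" decompresses on the fly and decodes to str, so only the
    # beginning of the file is read.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


if __name__ == "__main__":
    dump = Path("scowiki.xml.bz2")  # file name from the example above
    if dump.exists():
        for line in dump_head(dump):
            print(line)
```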


Full documentation can be found at
