semanticizest: Standalone Semanticizer
Project Website:
Github Link:
Description
Semanticizest is a package for entity linking, also known as semantic linking or semanticizing: you feed it text, and it outputs links to pertinent Wikipedia concepts. You can use these links as a “semantic representation” of the text for NLP or machine learning, or just to provide links to background information on Wikipedia.
Installation
pip install -r requirements.txt
pip install .
Usage
To train a semanticizer, download a Wikipedia database dump from https://dumps.wikimedia.org/. Then issue:
python -m semanticizest.parse_wikidump <dump> <model-filename>
The result will be a semanticizer model (in SQLite 3 format, if you must know).
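Because the model is an ordinary SQLite 3 file, it can be opened with Python's standard sqlite3 module. The following is a minimal sketch that only lists the table names in a model file. It assumes nothing about the schema semanticizest uses, and the helper name list_tables and the path sco.model are illustrative, not part of the package.

```python
import sqlite3


def list_tables(model_path):
    """Return the sorted names of all tables in the SQLite file at model_path.

    Note: sqlite3.connect creates an empty file if model_path does not
    exist, so pass the path of a model that has already been built.
    """
    conn = sqlite3.connect(model_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]


# Hypothetical usage, assuming a model was written to sco.model:
#     for table in list_tables("sco.model"):
#         print(table)
```

This is handy for a quick sanity check that a training run produced a non-empty model before using it.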
Alternatively, you can use the --download flag to instruct semanticizest to download the latest Wikipedia dump. For example, to download and process the Scots Wikipedia (which is small and useful for testing):
python -m semanticizest.parse_wikidump --download scowiki sco.model
will download https://dumps.wikimedia.org/scowiki/latest/scowiki-latest-pages-articles.xml.bz2 to scowiki.xml.bz2 and construct the model from it.
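The URL and local filename in the example above follow a simple pattern based on the wiki name. The sketch below reproduces that pattern for other wikis. It is inferred from the scowiki example only and is not taken from semanticizest's source; the function names latest_dump_url and local_dump_name are hypothetical.

```python
def latest_dump_url(wiki):
    """Build the URL of the latest pages-articles dump for a wiki name.

    Assumption: other wikis follow the same layout as the scowiki example.
    """
    return (
        "https://dumps.wikimedia.org/{0}/latest/"
        "{0}-latest-pages-articles.xml.bz2".format(wiki)
    )


def local_dump_name(wiki):
    """Build the local filename the dump is saved under, as in the example."""
    return wiki + ".xml.bz2"


print(latest_dump_url("scowiki"))
# → https://dumps.wikimedia.org/scowiki/latest/scowiki-latest-pages-articles.xml.bz2
print(local_dump_name("scowiki"))
# → scowiki.xml.bz2
```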
Documentation
Full documentation can be found at https://semanticize.github.io/semanticizest/