Open Source Text Processing Project: semanticizest

semanticizest: Standalone Semanticizer

Project Website:

Github Link:

Description

Semanticizest is a package for entity linking, also known as semantic linking or semanticizing: you feed it text, and it outputs links to pertinent Wikipedia concepts. You can use these links as a “semantic representation” of the text for NLP or machine learning, or simply to point readers to background information on Wikipedia.

Installation

pip install -r requirements.txt
pip install .

Usage

To train a semanticizer, download a Wikipedia database dump from https://dumps.wikimedia.org/. Then issue:

python -m semanticizest.parse_wikidump <dump> <model-filename>

The result will be a semanticizer model (in SQLite 3 format, if you must know).
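
Because the model is an ordinary SQLite 3 file, it can be opened with Python's standard sqlite3 module. The sketch below only lists the tables the file contains; it assumes nothing about semanticizest's schema, and the helper name list_tables is hypothetical, not part of the package.

```python
import sqlite3
from pathlib import Path


def list_tables(model_path):
    """Return the names of all tables in the SQLite file at model_path."""
    # Open read-only through a file: URI so that a mistyped path raises
    # sqlite3.OperationalError instead of silently creating an empty database.
    uri = Path(model_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

Calling list_tables("sco.model") on a trained model would show which tables the file holds; what those tables mean is documented, if anywhere, in the project's own documentation.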

Alternatively, you can use the --download flag to instruct semanticizest to download the latest Wikipedia dump. For example, to download and process the Scots Wikipedia (which is small and useful for testing):

python -m semanticizest.parse_wikidump --download scowiki sco.model

This will download https://dumps.wikimedia.org/scowiki/latest/scowiki-latest-pages-articles.xml.bz2 to scowiki.xml.bz2 and construct the model from it.
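
The URL and local filename above follow a visible pattern based on the wiki name. As a rough illustration, the sketch below generalizes that pattern from the single scowiki example; the function names are hypothetical and are not semanticizest's API, and whether the tool builds its URLs exactly this way for every wiki is an assumption.

```python
def latest_dump_url(wiki):
    """Build the 'latest' pages-articles dump URL for a wiki name such as 'scowiki'."""
    # Pattern inferred from the scowiki example; assumed to hold for other wikis.
    return (
        "https://dumps.wikimedia.org/"
        f"{wiki}/latest/{wiki}-latest-pages-articles.xml.bz2"
    )


def local_dump_name(wiki):
    """Name of the local file the dump is saved to, following the same example."""
    return f"{wiki}.xml.bz2"
```

With this sketch, latest_dump_url("scowiki") reproduces the URL shown above and local_dump_name("scowiki") gives scowiki.xml.bz2.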

Documentation

Full documentation can be found at https://semanticize.github.io/semanticizest/

