Open Source Text Processing Project: GATE

GATE: a full-lifecycle open source solution for text processing

Project Website:

Github Link: None


GATE is…

open source software capable of solving almost any text processing problem
a mature and extensive community of developers, users, educators, students and scientists
a defined and repeatable process for creating robust and maintainable text processing workflows
in active use for all sorts of language processing tasks and applications, including: voice of the customer; cancer research; drug research; decision support; recruitment; web mining; information extraction; semantic annotation
the result of a multi-million euro R&D programme running since 1995, funded by commercial users, the EC, BBSRC, EPSRC, AHRC, JISC, etc.
used by corporations, SMEs, research labs and universities worldwide
the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, the ISO 9001 of Text Mining
a world-class team of language processing developers

GATE has grown over the years to include a desktop client for developers, a workflow-based web application, a Java library, an architecture and a process. GATE is:

an IDE, GATE Developer: an integrated development environment for language processing components bundled with a very widely used Information Extraction system and a comprehensive set of other plugins
a web app, GATE Teamware: a collaborative annotation environment for factory-style semantic annotation projects built around a workflow engine and a heavily-optimised backend service infrastructure
a framework, GATE Embedded: an object library optimised for inclusion in diverse applications giving access to all the services used by GATE Developer and more
an architecture: a high-level organisational picture of how language processing software is composed
a process for the creation of robust and maintainable services

On top of the core functions, GATE includes components for diverse language processing tasks, e.g. parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. GATE Developer and Embedded are supplied with an Information Extraction system (ANNIE) which has been adapted and evaluated very widely (numerous industrial systems, research systems evaluated in MUC, TREC, ACE, DUC, Pascal, NTCIR, etc.). ANNIE is often used to create RDF or OWL (metadata) for unstructured content (semantic annotation).
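
To give a rough feel for what semantic annotation means in practice, here is a minimal sketch in plain Java. It does not use the GATE Embedded API or ANNIE. The class name AnnotationSketch, the Annotation type and the tiny gazetteer are all hypothetical, invented for this illustration. It only shows the general idea of attaching typed, offset-based annotations to unstructured text by looking up known terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: NOT the GATE API. All names here are hypothetical.
public class AnnotationSketch {

    /** A stand-off annotation: a typed span of character offsets over the text. */
    public static final class Annotation {
        public final int start;   // offset of the first character (inclusive)
        public final int end;     // offset after the last character (exclusive)
        public final String type; // semantic type taken from the gazetteer
        public final String text; // the covered string, kept for convenience

        public Annotation(int start, int end, String type, String text) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + "] \"" + text + "\"";
        }
    }

    /**
     * Split the text into word tokens and look each token up in a small
     * gazetteer (term -> type). Matching tokens become annotations.
     */
    public static List<Annotation> annotate(String text, Map<String, String> gazetteer) {
        List<Annotation> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            String type = gazetteer.get(m.group());
            if (type != null) {
                result.add(new Annotation(m.start(), m.end(), type, m.group()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical gazetteer entries, chosen only for the demo.
        Map<String, String> gazetteer = Map.of("Sheffield", "Location", "GATE", "Software");
        String text = "GATE is developed in Sheffield.";
        for (Annotation a : annotate(text, gazetteer)) {
            System.out.println(a);
        }
    }
}
```

A real Information Extraction system does far more than this (sentence splitting, part-of-speech tagging, grammar rules, coreference), but the output has the same basic shape: spans of text paired with a type, which can then be exported as metadata.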

