Open Source Text Processing Project: WhatLanguage

WhatLanguage: A language detection library for Ruby that uses Bloom filters for speed.
Project Website: None
GitHub Link: https://github.com/peterc/whatlanguage
Description: Text language detection. Quick, fast, memory efficient, and all in pure Ruby. Uses Bloom filters for aforementioned speed and memory …
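
The description above credits Bloom filters for WhatLanguage's speed and memory efficiency. As a rough illustration of that idea, and not WhatLanguage's actual code, here is a minimal Python sketch: one Bloom filter per language is filled with known words, and a text is assigned to the language whose filter matches the most words. The class name, the `detect_language` helper, the filter sizes, and the tiny word lists are all hypothetical and chosen only for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash probes per item (illustrative only)."""

    def __init__(self, size_bits=8192, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report a false positive, but never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Tiny hypothetical word lists; a real detector would load large dictionaries.
WORDS = {
    "english": ["the", "and", "is", "of", "to", "quick", "fox"],
    "german": ["der", "und", "ist", "nicht", "das", "schnelle", "fuchs"],
}

# Build one filter per language.
FILTERS = {}
for lang, words in WORDS.items():
    bf = BloomFilter()
    for w in words:
        bf.add(w)
    FILTERS[lang] = bf


def detect_language(text):
    """Return the language whose filter matches the most words, or None."""
    scores = {lang: 0 for lang in FILTERS}
    for word in text.lower().split():
        for lang, bf in FILTERS.items():
            if word in bf:
                scores[lang] += 1
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Each filter stores only a fixed-size bit array rather than the words themselves, which is where the memory saving comes from; the trade-off is an occasional false positive on a word that was never added.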

Open Source Text Processing Project: langid

langid.py: Stand-alone language identification system
Project Website: None
GitHub Link: https://github.com/saffsd/langid.py
Description: langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows:
- Fast
- Pre-trained over a large number of languages (currently 97)
- Not sensitive to domain-specific …