Open Source Text Processing Project: WhatLanguage

WhatLanguage: A language detection library for Ruby that uses bloom filters for speed.

Project Website: None

Github Link:

Description

Text language detection. Quick, fast, memory efficient, and all in pure Ruby. Uses Bloom filters for aforementioned speed and memory benefits.

Works with Dutch, English, Farsi, French, German, Italian, Pinyin, Swedish, Portuguese, Russian, Arabic, Finnish, Greek, Hebrew, Hungarian, Korean, Norwegian, Polish and Spanish out of the box.

Important note

This library was first built in 2007 and has received a few minor updates over the years. There are now more efficient and effective algorithms for doing language detection which I am investigating for a WhatLanguage 2.0.

This library has been updated to be distributed and to work on modern Ruby implementations but other than that, has had no improvements.


Leave a Reply

Your email address will not be published. Required fields are marked *