42 - Language Identification
Language identification determines which language a text is written in.
It is usually one of the first tasks in an NLP pipeline, because the result tells you which language-specific model to select for the steps that follow. In Python you can use the library langdetect (55 languages, reported accuracy of about 99%) or fastText (176 languages, reported accuracy of about 93%).
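from langdetect import detect, detect_langs

# detect() returns the code of the single most likely language
print(detect("War doesn't show who's right, just who's left."))
# prints: en

# detect_langs() returns candidate languages with their probabilities
print(detect_langs("Otec matka syn."))
# prints e.g.: [sk:0.572770823327, pl:0.292872522702, cs:0.134356653968]
# (langdetect is non-deterministic, so the exact values vary between runs)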
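Once a language has been detected, the result can drive the choice of model. The following is a minimal sketch of that routing step, not a definitive implementation. The model names, the MODELS table, the FALLBACK value, the 0.9 threshold and the helper names candidates and pick_model are all hypothetical and chosen for illustration only. The langdetect calls used are detect_langs, DetectorFactory.seed (which makes results repeatable) and LangDetectException (raised when the text contains no usable features).

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from language code to model name (placeholders only).
MODELS: Dict[str, str] = {
    "en": "english-model",
    "cs": "czech-model",
    "sk": "slovak-model",
}
FALLBACK = "multilingual-model"  # hypothetical model used when unsure


def candidates(text: str) -> List[Tuple[str, float]]:
    """Return (language code, probability) pairs reported by langdetect."""
    # Imported lazily so pick_model() can be used without langdetect installed.
    from langdetect import DetectorFactory, detect_langs
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # fix the seed so repeated runs agree
    try:
        return [(c.lang, c.prob) for c in detect_langs(text)]
    except LangDetectException:
        # Raised when no features can be extracted, e.g. for an empty string.
        return []


def pick_model(cands: List[Tuple[str, float]], min_prob: float = 0.9) -> str:
    """Pick a model for the most probable language, or fall back if unsure."""
    if not cands:
        return FALLBACK
    lang, prob = max(cands, key=lambda pair: pair[1])
    if prob < min_prob:
        # Ambiguous detection, typical for very short texts.
        return FALLBACK
    # Languages without a dedicated model also use the fallback.
    return MODELS.get(lang, FALLBACK)
```

With langdetect installed, pick_model(candidates(text)) routes a text to a model name. A result like the one shown for "Otec matka syn.", where the top probability is only about 0.57, stays below the threshold and falls back to the multilingual placeholder. This reflects the design choice that a wrong language-specific model is usually worse than a generic one.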
This article is part of the project Periodic Table of NLP Tasks.