Unsupervised Signaling

57 - Outlier Detection

Finding text that is exceptionally far from the mainstream text.

Outliers, or anomalies, are generally defined as samples that lie exceptionally far from the mainstream of the (textual) data. Where to draw the threshold is subjective and depends on the application. With a fixed vocabulary, for example, an outlier can be defined as a word that is out-of-vocabulary (OOV).
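A minimal sketch of the OOV view, assuming a small hand-written vocabulary and whitespace tokenization. The function name `find_oov` and the sample data are hypothetical and only serve as illustration:

```python
def find_oov(tokens, vocabulary):
    """Return the tokens that do not occur in the vocabulary.

    Comparison is case-insensitive; each OOV token is reported once,
    in order of first appearance.
    """
    vocab = {word.lower() for word in vocabulary}
    seen = set()
    oov = []
    for token in tokens:
        key = token.lower()
        if key not in vocab and key not in seen:
            seen.add(key)
            oov.append(token)
    return oov


# Hypothetical vocabulary and input, for illustration only.
vocabulary = {"the", "cat", "sat", "on", "mat"}
tokens = "The cat sat on the zyxgraph mat".split()

print(find_oov(tokens, vocabulary))  # ['zyxgraph']
```

In practice the vocabulary would come from a training corpus or a tokenizer, and the tokenization would be less naive than splitting on whitespace.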

Alternatively, an outlier can be seen as the result of extreme class imbalance, and its distance from the rest of the data can be measured in terms of its word or document vector.
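The vector-based view can be sketched as follows. This is one possible approach, not a prescribed method: it assumes simple bag-of-words document vectors, cosine distance to the corpus centroid as the measure, and an arbitrary threshold of 0.5. The helper names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words count vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Mean vector over all document vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is empty."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def find_outliers(docs, threshold=0.5):
    """Indices of documents whose distance to the centroid exceeds the threshold.

    The threshold is a subjective choice and should be tuned per dataset.
    """
    vectors = [bow_vector(doc) for doc in docs]
    center = centroid(vectors)
    return [i for i, vec in enumerate(vectors)
            if cosine_distance(vec, center) > threshold]


# Hypothetical corpus: four similar sentences and one off-topic sentence.
docs = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat lay on the rug",
    "the dog lay on the mat",
    "quarterly revenue exceeded analyst forecasts",
]

print(find_outliers(docs))  # [4]
```

With dense embeddings instead of word counts the same idea applies; only `bow_vector` would change. Raising the threshold makes the detector more conservative, which reflects how subjective the definition of an outlier is.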




This article is part of the project Periodic Table of NLP Tasks. Read more about the making of the Periodic Table and the project to systemize NLP tasks.