38 - Readability Scoring

Measuring the readability of a text by looking at the keyword density, syllable count and the average length of sentences and words in a document.

Rob van Zoest
Founder @ innerdoc.com | NLP Expert-Engineer-Enthusiast | Writes about how to get value from textual data | Lives in the Netherlands | Loves to travel around the globe | Dutchman | rob@innerdoc.com

09 Oct 2020 • 1 min read

Readability describes how easily a reader can understand a written text. If a text is too long and complicated, few readers will understand it. Measuring readability is therefore a way of measuring text quality. This can be done by looking at the keyword density, the syllable count and the average length of sentences and words in a document. Checking for simpler synonyms, or for words with a higher word prevalence, can also help. Word prevalence is about word knowledge in the crowd and refers to the share of people who know the word.
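To make these surface features concrete, here is a minimal Python sketch that computes the average sentence length, the average word length and the density of one keyword. The function names (`split_sentences`, `tokenize`, `surface_stats`) are made up for this illustration, and the regex-based sentence splitting and tokenization are simplifying assumptions that will stumble over abbreviations, numbers and non-English text.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    # Lowercased runs of ASCII letters and apostrophes count as words.
    return re.findall(r"[A-Za-z']+", text.lower())


def surface_stats(text, keyword):
    sentences = split_sentences(text)
    words = tokenize(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    counts = Counter(words)
    return {
        "avg_words_per_sentence": len(words) / len(sentences),
        "avg_chars_per_word": sum(len(w) for w in words) / len(words),
        # Share of all tokens that equal the keyword.
        "keyword_density": counts[keyword.lower()] / len(words),
    }


if __name__ == "__main__":
    print(surface_stats("The cat sat. The cat ran fast.", "cat"))
```

A real pipeline would probably swap the regexes for a proper sentencizer and tokenizer, but the three ratios stay the same.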

Well-known readability measures are the Flesch-Kincaid Grade Level and the Coleman-Liau Index. Both were developed for English. For other languages there may be specific variants. However, the best language-agnostic linguistic proxy for readability is (not surprisingly) the average number of words per sentence.
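As a rough illustration of the two measures, the sketch below applies the commonly published formulas: Flesch-Kincaid Grade Level = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, and Coleman-Liau Index = 0.0588 × L − 0.296 × S − 15.8, where L is the number of letters per 100 words and S the number of sentences per 100 words. The helper names `count_syllables` and `readability` are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup, so the scores should be read as approximations.

```python
import re


def count_syllables(word):
    # Heuristic assumption: one syllable per group of consecutive vowels,
    # minus a silent trailing "e" (but not for words ending in "le").
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    # Naive splitting: sentences end at runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    n_sent, n_words = len(sentences), len(words)
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)

    flesch_kincaid = 0.39 * n_words / n_sent + 11.8 * syllables / n_words - 15.59
    letters_per_100 = letters / n_words * 100
    sentences_per_100 = n_sent / n_words * 100
    coleman_liau = 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

    return {
        "words_per_sentence": n_words / n_sent,  # the language-agnostic proxy
        "flesch_kincaid_grade": flesch_kincaid,
        "coleman_liau_index": coleman_liau,
    }


if __name__ == "__main__":
    print(readability("The cat sat on the mat. The dog ran."))
```

Very short, simple texts can produce negative grade levels with both formulas; that is a property of the formulas, not necessarily a bug in the sketch.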

You can try the English readability metrics with this Python package.

Figure: Flesch-Kincaid Grade Levels (source)

This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.
