36 - Paragraph Segmentation
Splitting text into paragraphs requires more custom logic than splitting it into sentences. A paragraph usually carries a more complete meaning than a single sentence.
Paragraph detection is less mainstream than sentence detection. It mostly relies on custom rules such as: split after two line ends, or split before an uppercased title. Layout metadata or explicit paragraph and chapter numbering, where available, can also help.
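The two rules mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a standard algorithm: the function name `split_paragraphs` is hypothetical, and it assumes that a "title" is any line whose letters are all uppercase, which will not hold for every document.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs with two simple heuristics.

    Rule 1: two or more consecutive line ends separate paragraphs.
    Rule 2: a line written entirely in uppercase is treated as a title,
            and a new paragraph starts before it (an assumption).
    """
    paragraphs = []
    # Rule 1: split on runs of line ends, allowing spaces or tabs in between.
    for block in re.split(r"(?:\r?\n[ \t]*){2,}", text.strip()):
        current = []
        for line in block.splitlines():
            stripped = line.strip()
            if not stripped:
                continue
            # Rule 2: close the running paragraph before an uppercase title.
            if stripped.isupper() and current:
                paragraphs.append(" ".join(current))
                current = []
            current.append(stripped)
        if current:
            paragraphs.append(" ".join(current))
    return paragraphs


sample = (
    "First line\nstill first.\n\n"
    "Second paragraph.\nCHAPTER TWO\nText under the title."
)
for paragraph in split_paragraphs(sample):
    print(paragraph)
```

Single line breaks inside a paragraph are joined with a space, so hard-wrapped text stays together. Real documents usually need additional rules, for example for numbered headings or layout information.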
In most cases there is no default way to determine paragraph boundaries, so people tend to work with sentences instead. Still, a paragraph can be a more valuable unit than a sentence. Examples include coreference resolution that spans multiple sentences, questions whose answer is spread across a whole paragraph, and readers who understand a paragraph better than an isolated sentence. A writer's intent is often expressed most clearly at the paragraph level.
This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.