Innerdoc

45 - Explaining Models

Explaining the outcomes of your Language Model is needed to prevent distrust and increase transparency.

44 - Evaluating Models

Evaluating the quality of a Language Model should be done by comparisons based on the right metrics for your model type.
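As an illustration of one such metric set, the sketch below computes precision, recall and F1 for a single class of a classifier. The function name `precision_recall_f1` and the toy labels are assumptions made for this example; other model types would call for other metrics.

```python
def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall and F1 for one class from two label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels, invented for illustration only.
gold = ["spam", "ham", "spam", "spam", "ham"]
pred = ["spam", "spam", "spam", "ham", "ham"]
print(precision_recall_f1(gold, pred, positive="spam"))
```

Running the same function on the outputs of two models gives a like-for-like comparison on the same test set.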

Natural Language Models

43 - Training Models

Training Language Models should start with a simple baseline and be improved with more complex techniques.

Documents

42 - Language Identification

Identifying the language of a text is often done before you select the right language model.
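A minimal sketch of the idea, assuming a tiny hand-picked stopword list per language: the language whose stopwords occur most often in the text wins. The `STOPWORDS` table and `identify_language` are hypothetical names for this example; production systems usually rely on character n-gram models trained on much more data.

```python
# Tiny illustrative stopword lists; real lists are far longer.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "nl": {"de", "het", "en", "is", "van", "een"},
    "de": {"der", "die", "das", "und", "ist", "von"},
}


def identify_language(text):
    """Return the language code with the most stopword hits, or None."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify_language("the cat is in the house"))    # en
print(identify_language("de kat is van een buurman"))  # nl
```

The returned code could then be used as a key to pick the matching language model.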

Documents

41 - Meta-Info Extractor

Extracting text from a file should be accompanied by the extraction of meta-information.

Documents

40 - Raw Text Cleaning

Pre-processing text to increase the quality of subsequent NLP tasks.
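Which cleaning steps are useful depends on the source of the text. The sketch below assumes HTML-like input and shows a few common steps using only the standard library: stripping tags, decoding entities, normalizing Unicode, dropping control characters and collapsing whitespace. The function name `clean_text` is an assumption for this example.

```python
import html
import re
import unicodedata


def clean_text(raw):
    """Apply a few generic cleaning steps to raw, possibly HTML-like text."""
    text = re.sub(r"<[^>]+>", " ", raw)         # drop markup tags
    text = html.unescape(text)                  # &eacute; -> é, &nbsp; -> NBSP
    text = unicodedata.normalize("NFKC", text)  # unify compatibility forms
    # Remove control and format characters, but keep newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces
    text = re.sub(r" ?\n ?", "\n", text)        # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)      # at most one blank line
    return text.strip()


print(clean_text("<p>Caf&eacute;&nbsp; is   open</p>"))
```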

Documents

39 - Deduplication

Finding texts that are exactly the same or highly similar. Similarity can be measured lexically or semantically, using embeddings.
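The lexical side can be sketched with the standard library alone: a hash of the normalized text catches exact duplicates, and Jaccard overlap of word shingles gives a rough similarity score for near-duplicates. The helper names and the shingle size are assumptions for this example; semantic similarity would need an embedding model and is not shown here.

```python
import hashlib


def fingerprint(text):
    """Hash of the case- and whitespace-normalized text, for exact matches."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text, n=2):
    """Set of overlapping word n-grams (shingles) in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b, n=2):
    """Lexical similarity in [0, 1]: shared shingles over all shingles."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(fingerprint("Hello  World") == fingerprint("hello world"))   # True
print(jaccard("the quick brown fox", "the quick brown dog"))       # 0.5
```

A threshold on the Jaccard score (chosen per dataset) would then decide which pairs count as duplicates.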

Sentences and Paragraphs

38 - Readability Scoring

Measuring the readability of a text by looking at the keyword density, syllable count and the average length of sentences and words in a document.
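One well-known score built from these ingredients is the Flesch Reading Ease, which combines average sentence length with average syllables per word. The sketch below uses a naive syllable counter (runs of vowels), so the result is only an approximation; the function names are assumptions for this example and keyword density is not covered.

```python
import re


def count_syllables(word):
    """Naive estimate: each run of vowels counts as one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability_stats(text):
    """Return average lengths and an approximate Flesch Reading Ease."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "words_per_sentence": words_per_sentence,
        "chars_per_word": sum(len(w) for w in words) / len(words),
        "syllables_per_word": syllables_per_word,
        # Higher scores indicate easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
    }


print(readability_stats("The cat sat. The dog ran."))
```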

Sentences and Paragraphs

37 - Grammar Checker

Improving the grammar at the sentence level.

Sentences and Paragraphs

36 - Paragraph Segmentation

Splitting text into paragraphs requires more custom logic than splitting it into sentences. A paragraph can carry a more comprehensive meaning than a single sentence.
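The simplest rule, assuming paragraphs are separated by blank lines, can be sketched as follows. The function name `split_paragraphs` is an assumption for this example; documents that mark paragraphs differently (indentation, headings, layout) need their own rules.

```python
import re


def split_paragraphs(text):
    """Split on blank lines and rejoin hard-wrapped lines in each block."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
        if block.strip()
    ]


sample = "First line\ncontinues here.\n\n\nSecond paragraph."
print(split_paragraphs(sample))
# ['First line continues here.', 'Second paragraph.']
```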

Sentences and Paragraphs

35 - Sentencizer

Finding the words that together form a sentence, or from another viewpoint, detecting sentence boundaries.
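A rule-based sketch of boundary detection: a sentence ends at `.`, `!` or `?` when it is followed by whitespace and a capital letter, unless the last token is a known abbreviation. The `ABBREVIATIONS` set is a small assumed list for this example; trained sentencizers handle many more edge cases.

```python
import re

# Small illustrative list; extend it for your own domain.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def split_sentences(text):
    """Split text at punctuation followed by whitespace and a capital."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+(?=\s+[A-Z])", text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # "Dr." and friends do not end a sentence
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith arrived. He was late! Why?"))
# ['Dr. Smith arrived.', 'He was late!', 'Why?']
```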

Entity Enriching

34 - Text Anonymizer

Removing sensitive information before a document is shared with others. De-identification and obfuscation of persons and organizations rely on Named Entity Recognition.
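Once an NER step has produced entity spans, the replacement itself is straightforward. The sketch below assumes the spans arrive as `(start, end, label)` character offsets, a hypothetical format chosen for this example, and replaces them from right to left so that earlier offsets stay valid.

```python
def anonymize(text, entities):
    """Replace each (start, end, label) span with a [LABEL] placeholder."""
    # Work from the end of the text so earlier offsets are not shifted.
    for start, end, label in sorted(entities, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


text = "Anna works at Acme."
# Spans as an upstream NER model might return them (hand-made here).
entities = [
    (text.index("Anna"), text.index("Anna") + len("Anna"), "PERSON"),
    (text.index("Acme"), text.index("Acme") + len("Acme"), "ORG"),
]
print(anonymize(text, entities))  # [PERSON] works at [ORG].
```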

Entity Enriching

33 - Coreference Resolution

Finding all expressions that refer to the same entity in a text. You can compare this to Named Entity Linking, but it doesn’t necessarily use a knowledge base.

Entity Enriching

32 - Named Entity Linking

Assigning a unique identity from a knowledge base to a named entity.

Entity Enriching

31 - Temporal Parser

Finding strings that contain an indication of time and then extracting a normalized time format from them.
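A minimal sketch for two date notations, assuming English month names and ISO 8601 as the normalized target. The pattern and the name `find_dates` are assumptions for this example; relative expressions such as "next week" need considerably more logic.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches "2021-04-01" or "12 March 2021".
DATE_PATTERN = re.compile(
    r"\b(?:(?P<y1>\d{4})-(?P<m1>\d{2})-(?P<d1>\d{2})"
    r"|(?P<d2>\d{1,2}) (?P<month>[A-Za-z]+) (?P<y2>\d{4}))\b"
)


def find_dates(text):
    """Return (matched string, ISO 8601 date) pairs found in the text."""
    results = []
    for m in DATE_PATTERN.finditer(text):
        try:
            if m.group("y1"):
                value = date(int(m.group("y1")), int(m.group("m1")),
                             int(m.group("d1")))
            else:
                month = MONTHS.get(m.group("month").lower())
                if month is None:
                    continue  # the word between the numbers is not a month
                value = date(int(m.group("y2")), month, int(m.group("d2")))
        except ValueError:
            continue  # impossible dates such as 2021-02-31
        results.append((m.group(0), value.isoformat()))
    return results


print(find_dates("Meeting on 12 March 2021, deadline 2021-04-01."))
```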

Entity Enriching

30 - Geocoding

Parsing text into an address and converting addresses into geographic coordinates like latitude and longitude.

Entity Enriching

29 - Price Parser

Extracting price and currency from raw text and normalizing them into a standard format.
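The sketch below assumes prices are written with a leading currency symbol and normalizes them to a `Decimal` amount plus an ISO 4217 code. It treats a separator followed by exactly two trailing digits as the decimal mark, which is a simplifying assumption; the symbol table and function names are made up for this example.

```python
import re
from decimal import Decimal

CURRENCY_SYMBOLS = {"€": "EUR", "$": "USD", "£": "GBP"}

PRICE_PATTERN = re.compile(
    r"(?P<symbol>[€$£])\s?(?P<amount>\d+(?:[.,]\d{3})*(?:[.,]\d{2})?)"
)


def normalize_amount(raw):
    """Turn '1.299,00' or '1,299.00' into Decimal('1299.00')."""
    decimals = "00"
    # A separator followed by exactly two final digits is the decimal mark.
    if len(raw) > 3 and raw[-3] in ".,":
        raw, decimals = raw[:-3], raw[-2:]
    whole = raw.replace(".", "").replace(",", "")
    return Decimal(f"{whole}.{decimals}")


def parse_prices(text):
    """Return (amount, currency code) pairs for every price in the text."""
    return [
        (normalize_amount(m.group("amount")),
         CURRENCY_SYMBOLS[m.group("symbol")])
        for m in PRICE_PATTERN.finditer(text)
    ]


print(parse_prices("Now €1.299,00 instead of $1,499.99"))
```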

Phrases and Entities

28 - Abbreviation Finder

Abbreviations are an efficient way of writing, but they lower text comprehension. To solve this, identify the long form and use it to enrich the short form.
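A common place to find the long form is right before a short form introduced in parentheses. The sketch below assumes that convention and accepts a candidate only when the initials of the preceding words spell the short form; `find_abbreviations` is a hypothetical name, and abbreviations that skip words or use inner letters are not handled.

```python
import re


def find_abbreviations(text):
    """Map each parenthesized short form to the long form preceding it."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        short = match.group(1)
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(short):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == short:
            found[short] = " ".join(candidate)
    return found


text = ("We use Named Entity Recognition (NER) and "
        "Natural Language Processing (NLP).")
print(find_abbreviations(text))
# {'NER': 'Named Entity Recognition', 'NLP': 'Natural Language Processing'}
```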

Phrases and Entities

27 - Named Entity Recognition

Identifying named entities is the task of assigning an NER category, like Persons, Locations or Organizations, to words in a sentence.

Phrases and Entities

26 - Dependency Nounchunks

Breaking text into verb or noun phrases results in semantically coherent subphrases of a sentence that are derived from the dependency structure.

Phrases and Entities

25 - Rulebased Phrasematcher

Finite-size lookup tables might inspire you to build rule-based searches, especially when linguistic information such as lemmas, POS tags and dependency tags can be used in the search pattern.
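To illustrate the idea without committing to a specific library, the sketch below matches a pattern against tokens that are assumed to be already annotated upstream. Tokens and patterns are plain dictionaries, a hypothetical format for this example: each pattern element lists the attribute values the token at that position must have.

```python
def match_pattern(tokens, pattern):
    """Return the text of every token window that satisfies the pattern."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(
            all(token.get(attr) == value for attr, value in cond.items())
            for token, cond in zip(window, pattern)
        ):
            hits.append(" ".join(token["text"] for token in window))
    return hits


# Hand-annotated tokens standing in for the output of a tagger.
tokens = [
    {"text": "She", "lemma": "she", "pos": "PRON"},
    {"text": "bought", "lemma": "buy", "pos": "VERB"},
    {"text": "red", "lemma": "red", "pos": "ADJ"},
    {"text": "wines", "lemma": "wine", "pos": "NOUN"},
]

# Any adjective followed by a form of the lemma "wine".
print(match_pattern(tokens, [{"pos": "ADJ"}, {"lemma": "wine"}]))
# ['red wines']
```

Matching on the lemma rather than the surface text is what lets one rule cover "wine" and "wines" alike.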

Phrases and Entities

24 - N-grams

Detecting N-grams yields common multi-word expressions with a high probability of occurrence, like the bigram ‘red wine’.
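A minimal frequency-based sketch: slide a window of N tokens over the text and keep the N-grams that occur at least a given number of times. The function names and the `min_count` threshold are assumptions for this example; association measures such as PMI are a common refinement.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """All consecutive n-token tuples in the token list."""
    return list(zip(*(tokens[i:] for i in range(n))))


def frequent_ngrams(text, n=2, min_count=2):
    """N-grams that occur at least min_count times, most frequent first."""
    tokens = text.lower().split()
    counts = Counter(ngrams(tokens, n))
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


text = "red wine and white wine pair well but red wine wins"
print(frequent_ngrams(text))  # [(('red', 'wine'), 2)]
```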
