12 - Textual Data Augmentation

Boost your performance by creating more data out of the data you already have, instead of collecting new data.

Rob van Zoest
Founder @ innerdoc.com | NLP Expert-Engineer-Enthusiast | Writes about how to get value from textual data | Lives in the Netherlands | Loves to travel around the globe | Dutchman | rob@innerdoc.com

13 Sep 2020 • 1 min read

The amount of available textual (training) data influences the performance of many NLP tasks. If collecting more data is not an option, data augmentation offers several techniques for boosting performance on your NLP task with the data you already have.

Data augmentation is a standard part of Computer Vision tasks. However, because text has grammatical structure, augmentation is much more delicate for textual data and Natural Language Generation: a small change can break the grammar or change the meaning of a sentence.

Here are some examples of how the textual data is transformed by Easy Data Augmentation (EDA) techniques and Back Translation:

Figure: Textual Data Augmentation Techniques (source)
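To make the EDA idea more concrete, here is a minimal sketch of three of its operations: synonym replacement, random swap and random deletion. The function names and the tiny SYNONYMS table are assumptions made up for this illustration; a real setup would more likely take synonyms from a thesaurus such as WordNet. Back Translation is left out because it needs a translation model.

```python
import random

# Hypothetical toy synonym table, only for illustration.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
}


def synonym_replacement(words, n, rng, synonyms=SYNONYMS):
    """Replace up to n words that have an entry in the synonym table."""
    words = list(words)
    candidates = [i for i, w in enumerate(words) if w.lower() in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(synonyms[words[i].lower()])
    return words


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, but always keep at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    sentence = "the quick dog is happy today".split()
    print(" ".join(synonym_replacement(sentence, 1, rng)))
    print(" ".join(random_swap(sentence, 1, rng)))
    print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Each function returns a new word list and leaves the input untouched, so one original sentence can be fed through several operations to produce multiple variants. Note that swap and deletion ignore grammar entirely, which is exactly why the output should be checked before it is used for training.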

Data Augmentation might not help, but it’s worth a shot if you are stuck. Whatever you do, do not validate with augmented textual data! Validation scores are only meaningful when they are measured on original, untouched examples.
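One way to follow the validation advice is to split first and augment afterwards, so augmented variants can never end up in the validation set. The sketch below assumes the data is a list of (text, label) pairs and that `augment` is any function returning a list of new texts for one input text; both the helper name `split_then_augment` and its parameters are hypothetical.

```python
import random


def split_then_augment(examples, augment, val_fraction=0.2, seed=0):
    """Split into train/validation first, then augment the training part only.

    examples: list of (text, label) pairs.
    augment:  function mapping one text to a list of augmented texts.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)

    # Hold out the validation examples before any augmentation happens.
    n_val = max(1, int(len(shuffled) * val_fraction))
    val = shuffled[:n_val]
    train = shuffled[n_val:]

    # Augmented texts inherit the label of the example they came from.
    augmented = [
        (new_text, label)
        for text, label in train
        for new_text in augment(text)
    ]
    return train + augmented, val
```

Splitting before augmenting also prevents a near-copy of a validation sentence from leaking into the training set, which would make the validation score look better than it really is.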

This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.
