10 Terminologies in NLP

Image Via Unsplash

01/10

Tokenization

Tokenization splits a larger string of text into smaller pieces called tokens. Depending on the method, tokens can be sentences, words, parts of words, or punctuation marks.
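A minimal word-level sketch using only Python's standard `re` module. The pattern is an illustrative assumption, not a standard tokenizer. It treats a run of word characters (optionally joined by one apostrophe, as in "isn't") as a single token and treats every other non-space character as its own punctuation token. Real tokenizers, especially subword tokenizers, follow much more elaborate rules.

```python
import re

# Illustrative pattern (an assumption): a word, optionally with one
# apostrophe part such as "isn't", OR any single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens; whitespace is skipped."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Tokenization isn't hard, is it?"))
# → ['Tokenization', "isn't", 'hard', ',', 'is', 'it', '?']
```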

02/10

Normalization

Normalization is the process of converting text into a consistent, standard form. Examples include converting all text to the same case and converting numbers to words.
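Here is a small sketch of two of these steps, lowercasing and number-to-word conversion, plus whitespace cleanup. The `DIGIT_WORDS` table is a hypothetical simplification that covers only the single digits 0 to 9. A real pipeline would need a full number-to-words converter.

```python
import re

# Hypothetical lookup table for illustration: covers only the digits 0-9.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def normalize(text: str) -> str:
    """Lowercase text, spell out lone digits, and collapse whitespace."""
    text = text.lower()  # put everything in the same case
    # \b\d\b matches a lone digit, e.g. the 3 in "bought 3 apples",
    # but no digit inside a longer number such as "365"
    text = re.sub(r"\b\d\b", lambda m: DIGIT_WORDS[m.group()], text)
    return " ".join(text.split())  # collapse runs of whitespace

print(normalize("I  Bought 3 Apples"))  # → i bought three apples
```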

03/10

Stemming

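Stemming is the process of reducing a word to its stem by removing affixes (in practice, usually suffixes). For example, eating and eats are both reduced to the stem eat. Irregular forms such as eaten, however, are usually left unchanged by suffix-stripping stemmers.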
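Here is a toy suffix-stripping stemmer that illustrates the idea. It is not the Porter algorithm or any library's stemmer. The rule list and the minimum stem length are arbitrary assumptions, chosen so that the output reproduces the behaviour described for eating, eats and studies.

```python
# Toy rules, checked in order as (suffix, replacement) pairs.
# NOT the Porter algorithm; only a sketch of suffix stripping.
SUFFIX_RULES = [("ing", ""), ("ies", "i"), ("ed", ""), ("es", ""), ("s", "")]
MIN_STEM_LENGTH = 3  # arbitrary guard so short words are not over-stripped

def toy_stem(word: str) -> str:
    """Apply the first matching suffix rule to the lowercased word."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)] + replacement
    return word

print([toy_stem(w) for w in ["eating", "eats", "eaten", "studies"]])
# → ['eat', 'eat', 'eaten', 'studi']
```

Note that eaten passes through unchanged because no suffix rule covers it. Note also that studies becomes the non-word studi. Both are typical limitations of stemming.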

04/10

Lemmatization

Lemmatization is similar to stemming, but instead of a stem it returns the dictionary form of the word, called the lemma. For example, a stemmer may reduce studies to studi, whereas lemmatization returns study.
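Real lemmatizers look words up in a large lexicon and usually need the word's part of speech. The sketch below replaces that lexicon with a tiny hypothetical lookup table (`LEMMA_TABLE`). This is enough to show the key difference from stemming: a known word maps to its dictionary form rather than to a truncated stem.

```python
# Hypothetical miniature lexicon: inflected form -> lemma. A real
# lemmatizer would use a full dictionary and part-of-speech information.
LEMMA_TABLE = {
    "studies": "study",
    "studied": "study",
    "studying": "study",
    "eats": "eat",
    "eating": "eat",
    "ate": "eat",
    "eaten": "eat",
}

def lemmatize(word: str) -> str:
    """Return the lemma if known, otherwise the lowercased word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)

print([lemmatize(w) for w in ["studies", "eaten", "ate", "corpus"]])
# → ['study', 'eat', 'eat', 'corpus']
```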

05/10

Corpus

A corpus is a collection of texts used in NLP. A corpus can be monolingual (one language) or multilingual (several languages).

06/10

Document

A document is a single unit of text in a corpus. Depending on the task, a document can be a sentence, a paragraph, or a whole article. A collection of documents forms a corpus.
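In code, a corpus is often represented simply as a list of documents. Here is a minimal sketch. It assumes each document is a plain string, here one sentence each, and for brevity it uses whitespace splitting with trailing-punctuation stripping to collect the corpus vocabulary.

```python
# A toy corpus: each string is one document (here, one sentence each).
corpus = [
    "NLP helps computers read text.",
    "A corpus is a collection of documents.",
    "Each document can be a sentence or an article.",
]

def vocabulary(docs: list[str]) -> set[str]:
    """Collect the distinct lowercased words across all documents."""
    words = set()
    for doc in docs:
        # strip edge punctuation so "text." and "text" count as one word
        words.update(w.strip(".,!?").lower() for w in doc.split())
    return words

print(len(corpus))              # → 3 documents
print(len(vocabulary(corpus)))  # → 19 distinct words
```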

07/10

Stop Words

Stop words are very common words that contribute little to the meaning of a text, so they are often removed before further processing. For example, ‘a’, ‘and’, and ‘the’ are stop words in English.
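Here is a sketch of stop-word filtering on an already tokenized text. The `STOP_WORDS` set is a small illustrative assumption. Stop-word lists used in practice are much longer and vary between tools.

```python
# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to"}

def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens whose lowercased form is in STOP_WORDS."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "cat", "and", "the", "dog"]))  # → ['cat', 'dog']
```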

08/10

Bag of Words

Bag of words is a representation model that simplifies a text by treating it as an unordered collection of its words. It records how often each word occurs and discards grammar and word order.
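Here is a minimal sketch using the standard `collections.Counter`. It assumes lowercasing and whitespace splitting as the tokenizer. The two example sentences mean different things but get identical bags, which shows that word order is lost. The bags can then be turned into count vectors over a shared vocabulary.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count word occurrences; whitespace splitting is a simplification."""
    return Counter(text.lower().split())

bow1 = bag_of_words("The dog bit the man")
bow2 = bag_of_words("The man bit the dog")
print(bow1 == bow2)  # → True, because word order is discarded

# Turn the bags into count vectors over a shared, sorted vocabulary.
vocab = sorted(set(bow1) | set(bow2))
print(vocab)                     # → ['bit', 'dog', 'man', 'the']
print([bow1[w] for w in vocab])  # → [1, 1, 1, 2]
```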

09/10

N-grams

N-grams are another representation model. Unlike a bag of words, an N-gram model keeps contiguous sequences of N items (typically words or characters) from the text, so some local word order is preserved. Common choices include 2-grams (bigrams) and 3-grams (trigrams).
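Word-level N-grams can be produced with a sliding window over a token list. Here is a minimal sketch, assuming the text has already been split into tokens.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I love natural language processing".split()
print(ngrams(tokens, 2))
# → [('I', 'love'), ('love', 'natural'), ('natural', 'language'), ('language', 'processing')]
print(len(ngrams(tokens, 3)))  # → 3
```

A text with k tokens yields k - n + 1 N-grams. If n is larger than k, the result is an empty list.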

10/10

Regular Expression

A regular expression (regex) is a special text string that describes a search pattern. In NLP, regexes are used to find, extract, or replace the pieces of text that match such a pattern.
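Here is a sketch with Python's standard `re` module that finds and then masks ISO-style dates. The date pattern is an illustrative assumption. It checks only the digit layout, so it would also accept invalid dates such as 2021-13-45.

```python
import re

# Illustrative pattern for dates such as 2021-03-15:
# four digits, a hyphen, two digits, a hyphen, two digits.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

text = "Version 1 shipped on 2021-03-15 and was patched on 2022-01-07."
print(DATE_PATTERN.findall(text))  # → ['2021-03-15', '2022-01-07']
print(DATE_PATTERN.sub("<DATE>", text))
# → Version 1 shipped on <DATE> and was patched on <DATE>.
```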
