This repository contains the code and notes I used while learning NLP (Natural Language Processing). It mainly covers basic spaCy syntax for tokenization, stemming and lemmatization, along with other useful spaCy functions. The material follows the NLP playlist of the codebasics YouTube channel.
General NLP Process Flowchart:
Text Preprocessing and Cleanup:
Basic Syntax - Spacy_basics
Lemmatization and stemming of words with both NLTK and spaCy - Lemmatization and Stemming
POS tags and how they can be used to filter a doc or count how often each tag occurs - POS(Part of Speech)
NER (Named Entity Recognition) and how its detected entities can be modified - NER(Named entity Recognition)
Complete preprocessing pipeline using spaCy - Preprocessing Pipeline
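The cleanup steps above can be sketched roughly as follows, assuming spaCy v3 is installed. This is not the code from the notebooks: it uses `spacy.blank("en")`, a tokenizer-only English pipeline that needs no model download, so it covers tokenization plus stop-word and punctuation filtering but not lemmatization (lemmas via `token.lemma_` need a trained pipeline such as `en_core_web_sm`). The function name `preprocess` is hypothetical.

```python
import spacy

# Tokenizer-only English pipeline (assumption: spaCy v3). It has no trained
# components, so it provides tokens and lexical flags but no lemmas, POS tags or entities.
nlp = spacy.blank("en")


def preprocess(text: str) -> list[str]:
    """Hypothetical helper: tokenize, then keep lowercased tokens that are
    not stop words, punctuation or whitespace."""
    doc = nlp(text)
    return [
        token.lower_
        for token in doc
        if not (token.is_stop or token.is_punct or token.is_space)
    ]


print(preprocess("The quick brown fox jumps over the lazy dog!"))
```

With a trained pipeline loaded through `spacy.load("en_core_web_sm")`, the same filter could return `token.lemma_` instead of `token.lower_` to add the lemmatization step.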
Feature Engineering (converting text to numbers):
Bag of Words (unigrams and n-grams) - BOW(Bag of Words) and Ngrams-BOW
TF-IDF (Term Frequency-Inverse Document Frequency) - TF-IDF(both Ngram and unigram)
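To show what these feature-engineering steps compute, here is a from-scratch sketch in plain Python rather than a library vectorizer. It assumes lowercased whitespace tokenization, raw counts for Bag of Words (optionally over an n-gram range), and the textbook weight tf(t, d) × log(N / df(t)). scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ. The function names `ngrams`, `bag_of_ngrams` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return the contiguous n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(docs: list[str], ngram_range: tuple[int, int] = (1, 1)):
    """Count n-grams for every n in ngram_range.
    Returns (sorted vocabulary, one count vector per document)."""
    low, high = ngram_range
    counts = []
    for doc in docs:
        tokens = doc.lower().split()  # assumption: simple whitespace tokenization
        grams = []
        for n in range(low, high + 1):
            grams.extend(ngrams(tokens, n))
        counts.append(Counter(grams))
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for missing terms, giving a dense vector per document.
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors


def tf_idf(docs: list[str], ngram_range: tuple[int, int] = (1, 1)):
    """Weight each count by term frequency times log(N / document frequency)."""
    vocab, vectors = bag_of_ngrams(docs, ngram_range)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for vec in vectors:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append([count / total * idf[j] for j, count in enumerate(vec)])
    return vocab, weighted


docs = ["Thor ate pizza", "Loki is tall", "Loki is eating pizza"]
vocab, bow = bag_of_ngrams(docs)
print(vocab)  # → ['ate', 'eating', 'is', 'loki', 'pizza', 'tall', 'thor']
print(bow)    # → [[1, 0, 0, 0, 1, 0, 1], [0, 0, 1, 1, 0, 1, 0], [0, 1, 1, 1, 1, 0, 0]]
bigram_vocab, _ = bag_of_ngrams(docs, ngram_range=(1, 2))
print(bigram_vocab)
_, weights = tf_idf(docs)
print([round(w, 3) for w in weights[1]])
```

Terms that appear in every document get an IDF of log(1) = 0, which is why TF-IDF down-weights words that carry little information about any single document.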