This repository contains the code and notes I am using and have used to learn NLP (Natural Language Processing).

DharanHaarish/NLP-practise

NLP-practise

NLP

This repository contains the code and notes I am using and have used to learn NLP (Natural Language Processing). It mainly covers the basic spaCy syntax for tokenization, stemming, and lemmatization, along with the different functions that can be used for them. It was created by following the NLP playlist of the Codebasic channel.

General NLP Process Flowchart:

[Figure: NLP Process flowchart]

Text Preprocessing and cleanup:

Basic Syntax - Spacy_basics

Lemmatization and stemming of words with both NLTK and spaCy - Lemmatization and Stemming

POS tags and how they can be used to filter a doc or count how often each tag occurs - POS(Part of Speech)

NER (Named Entity Recognition) and how it can be modified - NER(Named entity Recognition)

Complete preprocessing pipeline using spaCy - Preprocessing Pipeline
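
To make the difference between stemming and lemmatization concrete, here is a minimal toy sketch in plain Python. It is not the code from the notebooks and does not use NLTK or spaCy: the suffix rules, the lookup table, and the function names `toy_stem` and `toy_lemmatize` are all hypothetical stand-ins, chosen only to show that a stemmer chops suffixes by rule while a lemmatizer maps a word to its dictionary form.

```python
# Toy illustration only. Real projects would use NLTK's stemmers
# or spaCy's lemmatizer instead of these hand-written rules.

# Hypothetical suffix rules, checked in this order.
SUFFIXES = ("ing", "ies", "ed", "s")

# Hypothetical, hand-written lemma dictionary standing in for a real lexicon.
LEMMA_LOOKUP = {
    "ate": "eat",
    "eating": "eat",
    "better": "good",
    "studies": "study",
    "meetings": "meeting",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the lowercased word."""
    word = word.lower()
    return LEMMA_LOOKUP.get(word, word)


words = ["eating", "ate", "studies", "better", "meetings"]
for w in words:
    print(f"{w:10} stem={toy_stem(w):10} lemma={toy_lemmatize(w)}")
```

In this sketch the rule-based stemmer turns "studies" into the non-word "stud" and leaves "ate" untouched, while the lookup-based lemmatizer returns "study" and "eat". That is the usual trade-off: stemming is cheap but crude, lemmatization needs linguistic knowledge.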

Feature Engineering (converting text to numbers):

Bag of Words - BOW(Bag of Words) and Ngrams-BOW

TF-IDF (term frequency-inverse document frequency) - TF-IDF(both Ngram and unigram)
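
As a rough idea of what "converting text to numbers" means for the two techniques above, here is a small plain-Python sketch of bag of words (with optional n-grams) and TF-IDF. It is an illustration under stated assumptions, not the notebooks' code: the whitespace tokenizer, the function names, and the unsmoothed formula idf = log(N / df) are choices made for this example. Library implementations commonly add smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n_max=1):
    """Count 1..n_max-grams per document over a shared, sorted vocabulary."""
    counts = []
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        terms = [g for n in range(1, n_max + 1) for g in ngrams(tokens, n)]
        counts.append(Counter(terms))
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(matrix):
    """Weight raw counts by the textbook idf = log(N / df), without smoothing."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
        for row in matrix
    ]


docs = ["the cat sat", "the dog sat", "the dog barked"]
vocab, bow = bag_of_words(docs)
weights = tf_idf(bow)
print(vocab)
print(bow)
print(weights)
```

Here "the" appears in every document, so its weight drops to zero, while a word like "cat" that appears in only one document gets the highest weight. Calling `bag_of_words(docs, n_max=2)` adds bigrams such as "the cat" to the vocabulary.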
