Text Analysis Pipeline

A comprehensive NLP pipeline for text analysis, including preprocessing, tokenization, lemmatization, POS tagging, NER, sentiment analysis, topic modeling, keyword extraction, dependency parsing, summarization, spell checking, and visualization.

This project was developed as part of my final undergraduate project in 2022.

Installation

pip install -r requirements.txt

How to Use

Clone the repository.
Install dependencies using requirements.txt.
Run main.py to execute the pipeline.

Features

Text Preprocessing: Lowercasing, punctuation removal, and whitespace normalization.
Tokenization: Splitting text into tokens and removing stopwords.
Lemmatization: Reducing words to their base forms.
POS Tagging: Assigning part-of-speech tags to tokens.
Chunking: Grouping tokens into meaningful chunks (e.g., noun phrases).
Named Entity Recognition (NER): Identifying entities like names, dates, and locations.
Sentiment Analysis: Analyzing the sentiment of the text (positive, negative, neutral).
Topic Modeling: Identifying topics in the text using Latent Dirichlet Allocation (LDA).
Keyword Extraction: Extracting important keywords using TF-IDF.
Dependency Parsing: Analyzing grammatical relationships between words.
Text Summarization: Generating a summary of the text using Latent Semantic Analysis (LSA).
Spell Checking: Correcting spelling errors in the text.
Visualization: Word cloud, bar chart, and network graph for insights.
Saving Results: Saving processed data and visualizations to files.
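
To make the first few features concrete, here is a minimal standard-library sketch of text preprocessing, tokenization with stopword removal, and TF-IDF keyword extraction. It is not the code from this repository: the function names (preprocess, tokenize, tfidf_keywords), the tiny stopword list, and the smoothed IDF variant are all assumptions chosen for illustration, and the actual modules may rely on third-party NLP libraries instead.

```python
import math
import re
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "on", "in", "to", "it"}


def preprocess(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split preprocessed text on spaces and drop stopwords."""
    return [tok for tok in preprocess(text).split(" ") if tok and tok not in STOPWORDS]


def tfidf_keywords(docs: list[str], top_k: int = 3) -> list[list[str]]:
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        # Term frequency times a smoothed inverse document frequency.
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    corpus = ["The cat sat on the mat.", "The dog sat on the log."]
    print(tfidf_keywords(corpus, top_k=2))
```

In this sketch, terms that appear in every document (such as "sat" in the sample corpus) receive a lower weight than terms unique to one document, which is the property that makes TF-IDF useful for keyword extraction.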

Repository Files

README.md
dependency_parsing.py
keyword_extraction.py
lemmatization.py
main.py
ner.py
pipeline.py
pos_tagging.py
preprocessing.py
requirements.txt
sentiment_analysis.py
spell_checking.py
summarization.py
tokenization.py
topic_modeling.py
utils.py
visualization.py

solmaznsr/NLP-Text-Analysis-Pipeline
