solmaznsr/NLP-Text-Analysis-Pipeline

Text Analysis Pipeline

A comprehensive NLP pipeline for text analysis, including preprocessing, tokenization, lemmatization, POS tagging, NER, sentiment analysis, topic modeling, keyword extraction, dependency parsing, summarization, spell checking, and visualization.

This project was developed as part of my final undergraduate project in 2022.

Installation

  1. Install the dependencies: pip install -r requirements.txt

How to Use

  1. Clone the repository.
  2. Install dependencies using requirements.txt.
  3. Run main.py to execute the pipeline (for example, python main.py).

Features

  • Text Preprocessing: Lowercasing, punctuation removal, and whitespace normalization.
  • Tokenization: Splitting text into tokens and removing stopwords.
  • Lemmatization: Reducing words to their base forms.
  • POS Tagging: Assigning part-of-speech tags to tokens.
  • Chunking: Grouping tokens into meaningful chunks (e.g., noun phrases).
  • Named Entity Recognition (NER): Identifying entities like names, dates, and locations.
  • Sentiment Analysis: Analyzing the sentiment of the text (positive, negative, neutral).
  • Topic Modeling: Identifying topics in the text using Latent Dirichlet Allocation (LDA).
  • Keyword Extraction: Extracting important keywords using TF-IDF.
  • Dependency Parsing: Analyzing grammatical relationships between words.
  • Text Summarization: Generating a summary of the text using Latent Semantic Analysis (LSA).
  • Spell Checking: Correcting spelling errors in the text.
  • Visualization: Word cloud, bar chart, and network graph for insights.
  • Saving Results: Saving processed data and visualizations to files.
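
To make the first few features concrete, here is a minimal sketch of preprocessing, tokenization with stopword removal, and TF-IDF keyword extraction. It uses only the Python standard library and is not the code in main.py: the function names (preprocess, tokenize, tfidf_keywords), the tiny stopword list, and the smoothed IDF formula are all assumptions chosen for illustration. The project itself may rely on libraries such as NLTK, spaCy, or scikit-learn.

```python
from __future__ import annotations

import math
import re
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real pipeline would
# normally use a fuller list from an NLP library).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def preprocess(text: str) -> str:
    """Lowercase, remove ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split preprocessed text on spaces and drop stopwords."""
    return [tok for tok in preprocess(text).split() if tok not in STOPWORDS]


def tfidf_keywords(docs: list[str], top_k: int = 3) -> list[list[str]]:
    """Return the top_k highest-scoring TF-IDF terms for each document.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one common
    variant (hypothetical choice here, not taken from the project).
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    results = []
    for toks in tokenized:
        term_counts = Counter(toks)
        scores = {
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results


docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Dogs and cats are pets.",
]
print(tfidf_keywords(docs, top_k=2))
```

Terms that occur in several documents (such as "cat" above) receive a lower IDF weight, so the terms specific to one document rank first. No lemmatization is applied in this sketch, so "cat" and "cats" are counted as different terms.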
