Topic Modeling with Non-Negative Matrix Factorization

A concise guide to uncovering hidden themes in text data.

Libraries Used 📚

  • NLTK: For text preprocessing
  • TfidfVectorizer (scikit-learn): To convert text to numerical features
  • Non-Negative Matrix Factorization (NMF): For topic modeling

Data Preprocessing 🧹

Clean the Text with NLTK

  • Convert to lowercase
  • Tokenize text
  • Remove stop words
  • Lemmatize/Stem words
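
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `preprocess` helper and the tiny `STOP_WORDS` set are hypothetical, and stemming with NLTK's `PorterStemmer` is used in place of lemmatization because it needs no corpus download.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical, deliberately tiny stop-word set so the sketch runs offline.
# In practice, nltk.corpus.stopwords.words("english") is the usual choice
# (it requires nltk.download("stopwords") first).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

tokenizer = RegexpTokenizer(r"[a-z]+")  # keep runs of lowercase letters only
stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem a single document."""
    tokens = tokenizer.tokenize(text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(stemmer.stem(t) for t in kept)


print(preprocess("The cats are running"))  # prints "cat run"
```

The helper returns a whitespace-joined string rather than a token list so that its output can be passed straight to a vectorizer.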

Feature Extraction with TfidfVectorizer

  • Create document-term matrix

Model Training 🧠

Initialize NMF

  • Set number of topics
  • Tune hyperparameters
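
A minimal sketch of the initialization, assuming scikit-learn's `NMF` class is the implementation in use. The specific values (two topics, `nndsvda` initialization, 500 iterations) are illustrative assumptions, not settings taken from this project.

```python
from sklearn.decomposition import NMF

model = NMF(
    n_components=2,    # number of topics to extract (assumed value)
    init="nndsvda",    # SVD-based initialization
    max_iter=500,      # upper bound on solver iterations
    random_state=0,    # fixed seed for reproducible results
)

print(model.get_params()["n_components"])
```

The number of topics usually has the largest effect on the result, so it is a sensible first value to vary, with the top terms of each topic inspected after every run.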

Fit the Model

  • Train on preprocessed data