This project implements a BERT-based text classification model to perform sentiment analysis on IMDB movie reviews. It uses PyTorch and the Hugging Face Transformers library to fine-tune a BERT model for binary classification (positive/negative sentiment).
Project structure:
text-classification/
├── data/
│ └── imdb_data.csv
├── src/
│ ├── data/
│ │ └── dataset.py
│ ├── model/
│ │ └── classifier.py
│ └── train.py
├── main.py
├── config.yaml
├── requirements.txt
└── README.md
Setup:
- Clone the repository:
  git clone https://github.com/your-username/text-classification.git
  cd text-classification
- Create and activate a Conda environment (include python so that pip installs into this environment):
  conda create --name bert-classifier python
  conda activate bert-classifier
- Install the required packages:
  pip install -r requirements.txt
- Prepare your data:
  - Place your IMDB dataset (CSV format) in the data/ directory.
  - Ensure your CSV file has 'review' and 'sentiment' columns.
- Configure the project:
  - Edit config.yaml to set your desired parameters, data paths, and model settings.
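Before training, it can help to confirm that the CSV matches what the loader expects. The sketch below is a minimal, hypothetical check written with Python's standard csv module; it is not the project's src/data/dataset.py. It assumes the 'sentiment' column holds the strings positive and negative and maps them to 1 and 0; the names load_reviews and LABEL_TO_ID are illustrative, so adjust the mapping if your file uses a different label encoding.

```python
import csv
from typing import Iterable, List, Tuple

REQUIRED_COLUMNS = {"review", "sentiment"}
# Assumed label vocabulary (hypothetical); change it if your CSV uses e.g. 0/1 or "pos"/"neg".
LABEL_TO_ID = {"negative": 0, "positive": 1}


def load_reviews(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Read review texts and integer labels, failing fast on a malformed CSV."""
    reader = csv.DictReader(lines)
    # fieldnames is None for empty input, so treat that as "all columns missing".
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

    texts: List[str] = []
    labels: List[int] = []
    for record_num, row in enumerate(reader, start=1):
        # Short rows yield None for absent fields; normalise case and whitespace.
        label = (row["sentiment"] or "").strip().lower()
        if label not in LABEL_TO_ID:
            raise ValueError(f"record {record_num}: unexpected sentiment {row['sentiment']!r}")
        texts.append(row["review"])
        labels.append(LABEL_TO_ID[label])
    return texts, labels
```

To check the real file, pass an open handle, for example open("data/imdb_data.csv", newline="", encoding="utf-8"); the csv module documentation recommends newline="" so that quoted reviews containing line breaks are parsed correctly.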
To train the model:
python main.py
Features:
- BERT-based text classification
- Custom attention pooling mechanism
- Training with early stopping
- Weights & Biases integration for experiment tracking (optional)
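The feature list mentions a custom attention pooling mechanism. The project's own implementation is not shown in this README, so the following is only an illustration of one common form of attention pooling, written in plain Python for readability: a single learned query vector (hypothetical here) scores each token's hidden state, a softmax restricted to real (non-padding) tokens turns the scores into weights, and the pooled vector is the weighted sum. In a PyTorch model the same steps would be tensor operations with the query as a trainable parameter; the function name and arguments are assumptions.

```python
import math
from typing import List, Sequence


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    query: Sequence[float],
) -> List[float]:
    """Collapse per-token vectors into one vector with masked softmax attention.

    hidden_states: one vector per token (e.g. BERT's last hidden layer).
    attention_mask: 1 for real tokens, 0 for padding.
    query: scoring vector (hypothetical; it would be learned with the model).
    """
    if not any(attention_mask):
        raise ValueError("attention_mask must contain at least one real token")
    # Score each real token by its dot product with the query; padding gets -inf.
    scores = [
        sum(h * q for h, q in zip(vec, query)) if keep else -math.inf
        for vec, keep in zip(hidden_states, attention_mask)
    ]
    # Numerically stable softmax: subtract the max score before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0, so padding gets no weight
    total = sum(exps)
    weights = [e / total for e in exps]
    # The pooled vector is the attention-weighted sum of the token vectors.
    dim = len(hidden_states[0])
    return [sum(w * vec[d] for w, vec in zip(weights, hidden_states)) for d in range(dim)]


tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # last row is a padding position
print(attention_pool(tokens, [1, 1, 0], query=[0.0, 0.0]))  # [0.5, 0.5]
```

With an all-zero query every real token gets the same score, so the two unmasked vectors are averaged and the padding row is ignored; compared with taking only the [CLS] vector, this lets the model learn which tokens matter most for the sentiment decision.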
The config.yaml file contains all the configurable parameters for the project. Key configurations include:
- Data paths
- Model hyperparameters (e.g., learning rate, batch size)
- Training settings (e.g., number of epochs, early stopping threshold)
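As a rough illustration of how these training settings and an early-stopping threshold could fit together, the sketch below uses hypothetical setting names (learning_rate, batch_size, num_epochs, early_stopping_patience, early_stopping_min_delta) that may not match the real keys in config.yaml, and assumes a patience-based rule on validation loss. It is not the logic in src/train.py; the values and losses in the example are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TrainingSettings:
    """Hypothetical mirror of the training section of config.yaml."""

    learning_rate: float
    batch_size: int
    num_epochs: int
    early_stopping_patience: int  # epochs without improvement before stopping
    early_stopping_min_delta: float = 0.0  # smallest loss drop that counts as improvement

    def __post_init__(self) -> None:
        # Fail fast on obviously invalid values instead of partway through training.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.batch_size < 1 or self.num_epochs < 1:
            raise ValueError("batch_size and num_epochs must be at least 1")
        if self.early_stopping_patience < 1:
            raise ValueError("early_stopping_patience must be at least 1")
        if self.early_stopping_min_delta < 0:
            raise ValueError("early_stopping_min_delta must be non-negative")


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss: Optional[float] = None
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if self.best_loss is None or val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def stop_epoch(settings: TrainingSettings, val_losses: Iterable[float]) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None if it runs to the end."""
    stopper = EarlyStopping(settings.early_stopping_patience, settings.early_stopping_min_delta)
    for epoch, loss in enumerate(val_losses, start=1):
        if epoch > settings.num_epochs:
            break
        if stopper.step(loss):
            return epoch
    return None


# Illustrative values only, not the project's defaults; the losses are made up.
settings = TrainingSettings(
    learning_rate=2e-5, batch_size=16, num_epochs=10, early_stopping_patience=2
)
print(stop_epoch(settings, [0.50, 0.40, 0.41, 0.42, 0.30]))  # 4
```

With a patience of 2, training stops after epoch 4 because epochs 3 and 4 both fail to beat the best loss of 0.40, even though epoch 5 would have improved; a larger patience or a smaller min_delta trades extra training time for more tolerance of noisy validation loss.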
Contributions to this project are welcome! Please fork the repository and submit a pull request with your proposed changes.
Acknowledgments:
- The BERT model implementation is based on the Hugging Face Transformers library.
- The IMDB dataset used for this project is publicly available.