Movie-Revenue-Predictor

A machine learning system that predicts movie revenue from numerical, categorical, and textual features. The project combines BERT text embeddings, Principal Component Analysis (PCA), and a feedforward neural network to forecast revenue, supporting stakeholders in decision-making and resource planning.

Introduction

The movie industry faces significant financial uncertainties. This project aims to mitigate these risks by forecasting box office revenue using historical data and machine learning techniques. The system incorporates heterogeneous data sources, such as:

Numerical: Budget, runtime, etc.
Categorical: Genres, languages.
Textual: Plot summaries, embedded using BERT.

Missing data, high dimensionality, and feature integration are handled through imputation, PCA, and a unified feature pipeline. The model reaches an accuracy of approximately 94% on test data.

Technologies Used

Programming Languages and Libraries

Python: Core programming language.
Machine Learning: Scikit-learn, TensorFlow.
Deep Learning: Transformers (BERT embeddings), TensorFlow.
Data Manipulation: Pandas, NumPy.
Visualization: Matplotlib, Seaborn.

Modeling Techniques

Text Embeddings: BERT for semantic representation of textual data.
Dimensionality Reduction: PCA for reducing high-dimensional data.
Predictive Modeling: Feedforward neural networks for regression.
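
As a rough illustration of the dimensionality reduction step, the sketch below applies scikit-learn's PCA to random vectors that stand in for BERT embeddings. The 768-dimensional width matches BERT-base; the sample count and the choice of 16 components are assumptions made for the example, not values taken from the project.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for BERT embeddings: 50 plot summaries, 768 dimensions each
# (768 is the hidden size of BERT-base; real embeddings would replace this).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 768))

# Keep 16 principal components (an arbitrary choice for illustration).
pca = PCA(n_components=16, random_state=0)
reduced = pca.fit_transform(embeddings)

print(reduced.shape)                        # (50, 16)
print(pca.explained_variance_ratio_.sum())  # share of variance retained
```

On real embeddings, the retained variance is a useful guide for choosing the number of components.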

Dataset Overview

Data Sources

Numerical Features: Budget, runtime, etc.
Categorical Features: Genres, production companies, etc.
Textual Features: Plot summaries.

Preprocessing Steps

Handling Missing Values: Imputation techniques.
Feature Scaling: Normalization and standardization for numerical features.
Categorical Encoding: One-hot encoding for categorical variables.
Dimensionality Reduction: PCA for computational efficiency.
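
The imputation, scaling, and encoding steps listed above could be wired together roughly as follows. This is a minimal sketch with scikit-learn, assuming a toy table with single-valued genres and language columns; the column values, the median imputation strategy, and the use of StandardScaler are illustrative choices, not confirmed details of the project's pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; column names and values are illustrative only.
movies = pd.DataFrame({
    "budget": [150e6, np.nan, 20e6, 5e6],
    "runtime": [140.0, 95.0, np.nan, 110.0],
    "genres": ["Action", "Drama", "Action", "Comedy"],
    "language": ["en", "en", "fr", "en"],
})

# Numerical columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric, ["budget", "runtime"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["genres", "language"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

features = preprocess.fit_transform(movies)
print(features.shape)  # (4, 7): 2 scaled numeric + 3 genre + 2 language columns
```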

Features

Revenue Prediction: Regression analysis to forecast movie revenue.
Text Embeddings: BERT-based embeddings to capture plot semantics.
Integrated Pipeline: Combines numerical, categorical, and textual features into a unified model.
Advanced Feature Engineering:
- PCA for dimensionality reduction.
- Encoding for categorical data.
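
One simple way to picture the integrated pipeline is column-wise concatenation of the three feature blocks. The sketch below uses random placeholder arrays; the block widths (2 numerical, 3 one-hot, 16 text dimensions) are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_movies = 5

numeric_block = rng.normal(size=(n_movies, 2))               # e.g. scaled budget, runtime
categorical_block = np.eye(3)[rng.integers(0, 3, n_movies)]  # one-hot over 3 genres
text_block = rng.normal(size=(n_movies, 16))                 # e.g. PCA-reduced embeddings

# One row per movie, all feature types side by side.
X = np.hstack([numeric_block, categorical_block, text_block])
print(X.shape)  # (5, 21)
```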

Installation

Prerequisites

Ensure Python 3.8 or higher is installed, then install the required libraries:

pip install pandas numpy scikit-learn tensorflow transformers matplotlib seaborn

Usage

Load Data:
- Use datasets with columns like budget, runtime, genres, and plot.
Preprocess Data:
- Clean and preprocess the data using the provided pipeline.
Model Training:
- Train the model using feedforward neural networks with integrated features.
Evaluate:
- Evaluate the model on test data using metrics like RMSE, MAE, and R².
Run Predictions:
- Use the trained model to predict revenue for new movies.
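
The train, evaluate, and predict steps above can be sketched on synthetic data. Note the substitution: this example uses scikit-learn's MLPRegressor as a lightweight stand-in for the project's TensorFlow feedforward network, and the data, layer sizes, and split ratio are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the combined feature matrix and revenue target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two hidden layers; the sizes are arbitrary choices, not the project's.
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions.shape)  # (40,)
print(f"R² on held-out data: {model.score(X_test, y_test):.3f}")
```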

Experimental Results

Accuracy: ~94% on test data.
Sample Prediction:
- Movie: Pirates of the Caribbean: At World's End.
- Actual Revenue: $961M.
- Predicted Revenue: $904M.
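
For the sample prediction above, the error can be worked out directly. The README does not define how accuracy is computed for this regression task, so reading it as one minus the relative error is only an assumption; for this single movie that reading gives a figure close to the reported one.

```python
actual = 961e6      # actual revenue from the sample above, in USD
predicted = 904e6   # predicted revenue from the sample above, in USD

abs_error = abs(actual - predicted)
relative_error = abs_error / actual

print(f"Absolute error: ${abs_error / 1e6:.0f}M")       # Absolute error: $57M
print(f"Relative error: {relative_error:.1%}")           # Relative error: 5.9%
print(f"1 - relative error: {1 - relative_error:.1%}")   # 1 - relative error: 94.1%
```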

Future Enhancements

Additional Features: Include actor popularity and production company success rates.
Hyperparameter Tuning: Use advanced techniques like Optuna or GridSearchCV.
Reframe Problem: Adapt the system to classify movies into revenue categories.
Real-Time Prediction: Develop a web interface using Flask or FastAPI.
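
As a starting point for the "Reframe Problem" item, revenue values can be binned into classes. The boundaries and label names below are hypothetical; the README does not specify any categories.

```python
import numpy as np

# Hypothetical class boundaries in USD; the README does not specify any.
boundaries = np.array([10e6, 100e6, 500e6])
labels = ["low", "medium", "high", "blockbuster"]

revenues = np.array([2e6, 45e6, 250e6, 961e6])

# digitize returns, for each revenue, the index of the bin it falls into.
class_ids = np.digitize(revenues, boundaries)
print([labels[i] for i in class_ids])
# ['low', 'medium', 'high', 'blockbuster']
```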

How to Run the Project

1. Set Up Your Environment

Install Python:
- Ensure Python 3.8 or higher is installed. You can download it from python.org.

Create a Virtual Environment:

python -m venv env
source env/bin/activate  # For Linux/macOS
env\Scripts\activate     # For Windows

Install Dependencies: Install the required libraries with pip:

pip install pandas numpy scikit-learn tensorflow transformers matplotlib seaborn

2. Prepare the Dataset

Place the Dataset Files:
- Ensure your datasets (e.g., movie_dataset.csv, cast_popularity.csv) are in the same directory as the project or in a specified data/ folder.
Verify Column Names:
- The dataset should include essential columns like:
  - budget
  - runtime
  - genres
  - plot
  - revenue (required for training; not needed when only predicting).
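
A quick check for the columns listed above might look like this. The helper name missing_columns and the in-memory CSV are hypothetical, used here so the example runs without a real dataset file.

```python
import io
import pandas as pd

REQUIRED_COLUMNS = ["budget", "runtime", "genres", "plot", "revenue"]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# In-memory CSV standing in for a real file such as movie_dataset.csv.
sample_csv = io.StringIO(
    "budget,runtime,genres,revenue\n"
    "150000000,140,Action,500000000\n"
)
df = pd.read_csv(sample_csv)
print(missing_columns(df))  # ['plot']
```

With a real file, replace the StringIO object with the path to the CSV.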

3. Run the Jupyter Notebook

Launch Jupyter Notebook:
```
jupyter notebook
```
Open the Project Notebook:
- Navigate to the project directory and open the Projet.ipynb file.
Run All Cells:
- Execute the cells in sequence to preprocess the data, train the model, and evaluate predictions.

4. Execute the Script

If a Python script (script.py) is provided for automating predictions:

Run the Script:
```
python script.py
```
Provide Input:
- Ensure the required input files (e.g., movie_dataset.csv) are present.
- The output (predictions) will be saved in a file like predictions.csv.

5. Evaluate Results

Use the notebook's visualizations and metrics (e.g., RMSE, MAE, R²) to analyze model performance.
Review the predictions file to see revenue forecasts.
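
For reference, the three metrics named above can be computed with NumPy alone. The function name regression_metrics and the revenue values are made up for illustration; scikit-learn offers equivalent functions in sklearn.metrics.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    mae = np.mean(np.abs(residuals))
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Made-up revenues in millions of USD.
rmse, mae, r2 = regression_metrics([100, 200, 300, 400], [110, 190, 310, 380])
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R²={r2:.3f}")
# RMSE=13.23  MAE=12.50  R²=0.986
```

RMSE and MAE are in the same units as revenue, while R² is unitless, so the first two are easier to interpret as dollar errors.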

6. Optional Enhancements

Deploy the Model: Use a web framework like Flask or FastAPI to create a real-time API for predictions.
Adjust Hyperparameters: Modify the training configuration in the notebook or script to experiment with model performance.
