This project aims to develop a Movie Recommendation System that leverages the TMDB (The Movie Database) dataset to provide content-based recommendations.
- TMDB: A rich dataset containing movie information, including titles, overviews, genres, and more.
- Content-based Recommendation: This system will recommend movies based on item similarity using the content of the movies.
- Other Types of Recommendations:
- Item-based
- User-based
- Collaborative Filtering
To build this system, we will utilize the following technologies:
-
Data Storage and Processing:
- Snowflake or Kafka: For managing and processing the TMDB dataset.
-
Search and Recommendation Engine:
- VESPA or ElasticSearch: For efficient indexing and searching of movie records.
-
Workflow Management:
- Apache Airflow: To orchestrate the data pipeline.
- DAG 1: A Directed Acyclic Graph (DAG) to read the latest TMDB data into Snowflake, creating a TMDB table.
- DAG 2: A DAG to process the TMDB table records and push the processed data to VESPA for further recommendations.
- Apache Airflow: To orchestrate the data pipeline.
- Clone the repository:
git clone <repository-url> cd <repository-directory>