- Introduction
- Data Preparation
- Rank Estimation with Linear Regression
- College Clustering with K-means Algorithm
- Visualizations
- Summary Statistics
- Usage
- Contributions
- License
- Acknowledgements
With data-driven insights, let's help reimagine what's possible and empower the next generation of medical professionals. We address two problems:
- Challenges in providing accurate counseling due to missing NEET scores and ranks in student profiles.
- Lack of insights into college preferences and attrition rates hinders effective guidance for students' academic choices.
Problem 1: Incomplete Data in Student Scores and Ranks. Around 40% of the dataset is missing NEET scores or ranks.
Our study begins with an in-depth analysis of the data, aiming to surface insights that can inform academic counseling practices. We examined a dataset of 100,000 entries and found that 40% of the data contained gaps, a notable challenge for analysis.
To fill these gaps, our project turned to Rank Estimation with Linear Regression, modeling the relationship between NEET scores and ranks to estimate the missing values.
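The README does not include the regression code, so the following is only a minimal pure-Python sketch of the idea. It assumes each student record is a dict with hypothetical `rank` and `score` keys (`None` when missing) and that a single straight line relates the two. It fits ordinary least squares on the complete rows and uses the fitted lines to fill the gaps; the example numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_scores_and_ranks(records):
    """Fill missing 'score' or 'rank' values ('score'/'rank' are hypothetical keys).

    Both lines are fitted only on rows where score and rank are both present.
    """
    complete = [r for r in records if r["score"] is not None and r["rank"] is not None]
    ranks = [r["rank"] for r in complete]
    scores = [r["score"] for r in complete]
    score_slope, score_icpt = fit_line(ranks, scores)  # score predicted from rank
    rank_slope, rank_icpt = fit_line(scores, ranks)    # rank predicted from score

    filled = []
    for r in records:
        row = dict(r)  # copy so the caller's records are left untouched
        if row["score"] is None and row["rank"] is not None:
            row["score"] = score_slope * row["rank"] + score_icpt
        elif row["rank"] is None and row["score"] is not None:
            row["rank"] = round(rank_slope * row["score"] + rank_icpt)  # ranks are integers
        filled.append(row)
    return filled


# Toy records for illustration only (not project data).
students = [
    {"rank": 1000, "score": 650},
    {"rank": 2000, "score": 600},
    {"rank": 3000, "score": 550},
    {"rank": 2500, "score": None},  # missing score
    {"rank": None, "score": 620},   # missing rank
]
for row in impute_scores_and_ranks(students):
    print(row)
```

Two separate fits are used because regressing score on rank and rank on score give different lines unless the data lie exactly on one line.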
Problem 2: Colleges experience varying levels of attrition (dropout rates) among students, and the factors influencing student attrition are poorly understood.
To address this, our project applied College Clustering with the k-means algorithm to a dataset of 400 colleges, grouping them by their Round 1 closings and attrition rates. To choose the number of clusters, we used the elbow curve and the silhouette score. Our goal was to gain insight into the factors contributing to student attrition.
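A minimal from-scratch sketch of this clustering step, assuming each college is represented as a (Round 1 closing, attrition rate) pair. It is written in plain Python to stay dependency-free; the project itself may well use a library such as scikit-learn. Min-max scaling of the features is an assumption (the README does not say how features were prepared), and centroids are seeded with a deterministic farthest-point rule, a simplification of k-means++. For each k the code reports the inertia plotted on an elbow curve and the mean silhouette score. The college values are made up.

```python
import math


def min_max_scale(points):
    """Rescale each feature to [0, 1] so closing ranks do not swamp attrition rates."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Returns (labels, centroids, inertia); inertia is the within-cluster sum of
    squared distances, the quantity plotted on an elbow curve.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient (needs >= 2 clusters); singletons score 0."""
    clusters = {lab: [p for p, l in zip(points, labels) if l == lab] for lab in set(labels)}
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # p's own distance is 0
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


# Toy (Round 1 closing, attrition %) pairs; NOT project data.
colleges = [
    (1200, 2.0), (1500, 2.5), (1100, 1.8),
    (15000, 9.0), (16000, 10.0), (14500, 8.5),
    (40000, 20.0), (42000, 22.0), (41000, 20.5),
]
X = min_max_scale(colleges)
for k in range(2, 6):
    labels, _, inertia = kmeans(X, k)
    print(k, round(inertia, 4), round(silhouette(X, labels), 3))
```

The elbow is the k after which inertia stops dropping sharply; the silhouette score lies in [-1, 1] and should peak near the same k. On the toy data above it peaks at k = 3.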
Continuing our analysis, we computed an "Attraction Index" for colleges to quantify their appeal. Across the 324 colleges analyzed, the index had a mean of 94.45, reflecting the relative prestige of the institutions.
Through K-means clustering, an unsupervised machine learning algorithm, we categorized the colleges based on their characteristics. This approach sets the stage for developing tailored counseling strategies that reflect the dynamics of each college.
Next, we validated our work by analyzing regional trends and evaluating our predictive models. Box plots show how attrition rates differ across states, offering a view of the regional educational landscape. Our Linear Regression model achieved an R-squared value of 0.8925 during validation, meaning it explains about 89% of the variance in Expected Scores predicted from NEET Ranks.
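To make these validation metrics concrete, here is a hedged sketch of the two quantities involved: the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot the sum of squared deviations from the mean, and the per-state quartiles (Q1, median, Q3) that form the boxes of a box plot. The `state` and `attrition` field names and all numbers are hypothetical.

```python
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def attrition_quartiles_by_state(rows):
    """Per-state (Q1, median, Q3) of attrition rates ('state'/'attrition' are hypothetical keys)."""
    by_state = {}
    for row in rows:
        by_state.setdefault(row["state"], []).append(row["attrition"])
    return {
        state: tuple(statistics.quantiles(values, n=4, method="inclusive"))
        for state, values in by_state.items()
        if len(values) >= 2  # quantiles() needs at least two data points
    }


# Toy numbers for illustration only.
print(r_squared([3, 5, 7, 9], [2.8, 5.1, 7.2, 8.9]))
rows = [{"state": "A", "attrition": v} for v in (2, 4, 6, 8, 10)]
rows += [{"state": "B", "attrition": v} for v in (5, 15)]
print(attrition_quartiles_by_state(rows))
```

An R² of 0.8925 therefore means the residual sum of squares is about 11% of the total sum of squares; it does not express prediction error in score units.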
If you're excited to leverage the potential of our project, follow these steps:
- Clone this repository to your local machine.
- Install the required libraries and dependencies listed in the `requirements.txt` file.
- Execute the provided scripts to experience the power of predictive modeling and college clustering firsthand.
Your contributions are invaluable to us! Whether it's enhancing existing features, introducing novel insights, or refining the codebase, we welcome your input. Feel free to submit issues or pull requests to be a part of this project.