- Project Overview
- Key Features
- Dataset Description
- Model Comparison
- Performance Metrics
- Installation Guide
- Results Visualization
- Model Selection Strategy
- License
- Contact Information
This project compares the performance of three classification algorithms on music genre prediction. The implemented models achieve up to 97.5% accuracy on test data, demonstrating effective genre classification capabilities.
- Comprehensive data preprocessing
- StandardScaler feature normalization
- Cross-validation implementation
- Multiple classifier comparison
- Visual performance analysis
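The preprocessing and cross-validation steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: synthetic data stands in for `music_clean.csv`, and the split/CV parameters are assumptions.

```python
# Feature scaling + cross-validation, with the scaler wrapped in a
# pipeline so it is refit on each CV fold (no train/test leakage).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Stand-in for the music dataset: 11 numeric features, binary target.
X, y = make_classification(n_samples=200, n_features=11,
                           n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X_train, y_train, cv=6)
print(f"CV accuracy: {scores.mean():.2%}")
```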
File: `music_clean.csv`
Samples: 7 shown in the preview (see the notebook for the full dataset)
Features: 12 columns (11 audio characteristics + genre label)
| Feature | Description | Range |
|---|---|---|
| popularity | Track popularity score | 0-100 |
| acousticness | Confidence measure of acoustic sound | 0.0-1.0 |
| danceability | Suitability for dancing | 0.0-1.0 |
| duration_ms | Track length in milliseconds | >0 |
| energy | Perceived intensity/activity | 0.0-1.0 |
| instrumentalness | Absence of vocal content (higher = more instrumental) | 0.0-1.0 |
| liveness | Presence of a live audience | 0.0-1.0 |
| loudness | Overall loudness (dB) | -60 to 0 |
| speechiness | Presence of spoken words | 0.0-1.0 |
| tempo | Estimated beats per minute | >0 |
| valence | Musical positiveness | 0.0-1.0 |
| genre | Target classification label | Categorical |
| Model | Type | Parameters |
|---|---|---|
| Logistic Regression | Linear | Default sklearn params |
| KNN | Instance-based | n_neighbors=5 |
| Decision Tree | Tree-based | Gini impurity, depth=None |
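A minimal sketch of how the three configurations above can be trained and scored, assuming the sklearn defaults the table names. Synthetic data stands in for `music_clean.csv` here, so the printed numbers will not match the reported results, and the exact notebook code may differ.

```python
# Define the three classifier configurations from the table and score
# each with 6-fold CV on the training split plus held-out test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 11 numeric features, as in the dataset description.
X, y = make_classification(n_samples=300, n_features=11,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(),               # linear baseline
    "KNN": KNeighborsClassifier(n_neighbors=5),                # instance-based
    "Decision Tree": DecisionTreeClassifier(criterion="gini",
                                            max_depth=None),   # unpruned tree
}

results = {}
for name, est in models.items():
    # Scaling inside the pipeline is refit per CV fold (no leakage).
    pipe = make_pipeline(StandardScaler(), est)
    cv_mean = cross_val_score(pipe, X_tr, y_tr, cv=6).mean()
    test_acc = pipe.fit(X_tr, y_tr).score(X_te, y_te)
    results[name] = (cv_mean, test_acc)
    print(f"{name}: CV={cv_mean:.1%}  Test={test_acc:.1%}")
```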
| Model | CV Accuracy (Mean) | Test Accuracy |
|---|---|---|
| Logistic Regression | 93% | 86% |
| KNN | 92% | 86% |
| Decision Tree | 100% | 97.5% |
- Clone the repository:

  ```bash
  git clone https://github.com/barisgudul/Classification-Model-Selection.git
  cd Classification-Model-Selection
  ```

- Launch Jupyter:

  ```bash
  jupyter notebook "Classification Model Comparison/main.ipynb"
  ```
Visual comparison of model stability across 6 folds - Decision Tree shows perfect consistency
Final evaluation metrics comparison - Decision Tree achieves 97.5% test accuracy
- **Decision Tree Dominance**

  🚩 Perfect 100% cross-validation accuracy suggests:
  - Potential overfitting to the training data
  - High variance in model structure
  - Possible need for pruning/regularization

- **Algorithm Consistency**

  ✅ Stable performers:
  - KNN: 92% → 86% (CV → Test)
  - Logistic Regression: 93% → 86%

  The minimal performance drop indicates good generalization.
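One way to address the flagged overfitting is to constrain the tree before training (pre-pruning via `max_depth`/`min_samples_leaf`) or to apply cost-complexity pruning via `ccp_alpha`. The hyperparameter values below are illustrative, not taken from the notebook, and synthetic data stands in for the real dataset:

```python
# Compare an unpruned tree against a pre-pruned one under 6-fold CV.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=11,
                           n_informative=6, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0)  # depth=None, as in the table
pruned = DecisionTreeClassifier(max_depth=5,       # cap tree depth
                                min_samples_leaf=5,  # require larger leaves
                                random_state=0)

for name, tree in [("unpruned", unpruned), ("pruned", pruned)]:
    scores = cross_val_score(tree, X, y, cv=6)
    print(f"{name}: mean CV accuracy {scores.mean():.1%}")
```

A pruned tree typically trades a little training accuracy for lower variance across folds, which is exactly the gap flagged above.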
| Algorithm Type | Representative Model | Rationale | Key Characteristics |
|---|---|---|---|
| 🧪 Linear Classifier | Logistic Regression | Baseline performance measurement | Linear decision boundaries |
| 🔎 Instance-Based | KNN (k=5) | Local pattern capture | Distance-based similarity |
| 🌳 Non-Parametric | Decision Tree | Complex relationship modeling | Feature importance analysis |
Permissions:
✅ Free academic/research use
✅ Modification and redistribution
❌ Commercial use requires written consent
Full license terms available in LICENSE file.
Contribution Guidelines:
We welcome collaborations! Please reach out via email before submitting PRs.