- Project Overview
- Key Features
- Dataset Description
- Model Comparison
- Performance Metrics
- Installation Guide
- Results Visualization
- Model Selection Strategy
- License
- Contact Information
This project compares the performance of three classification algorithms on music genre prediction. The implemented models achieve up to 97.5% accuracy on test data, demonstrating effective genre classification capabilities.
- Comprehensive data preprocessing
- StandardScaler feature normalization
- Cross-validation implementation
- Multiple classifier comparison
- Visual performance analysis
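The preprocessing and cross-validation steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: synthetic data stands in for `music_clean.csv`, and the split/CV parameters are assumptions.

```python
# Feature scaling + cross-validation, with the scaler wrapped in a
# pipeline so it is refit on each CV fold (no train/test leakage).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Stand-in for the music dataset: 11 numeric features, binary target.
X, y = make_classification(n_samples=200, n_features=11,
                           n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X_train, y_train, cv=6)
print(f"CV accuracy: {scores.mean():.2%}")
```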
File: `music_clean.csv`
Samples: 7 shown in the preview (see the notebook for the full dataset)
Features: 12 columns (11 audio characteristics + genre label)
| Feature | Description | Range |
|---|---|---|
| popularity | Track popularity score | 0-100 |
| acousticness | Confidence measure of acoustic sound | 0.0-1.0 |
| danceability | Suitability for dancing | 0.0-1.0 |
| duration_ms | Track length in milliseconds | >0 |
| energy | Perceived intensity/activity | 0.0-1.0 |
| instrumentalness | Absence of vocal content (higher = more instrumental) | 0.0-1.0 |
| liveness | Presence of a live audience | 0.0-1.0 |
| loudness | Overall loudness (dB) | -60 to 0 |
| speechiness | Presence of spoken words | 0.0-1.0 |
| tempo | Estimated beats per minute | >0 |
| valence | Musical positiveness | 0.0-1.0 |
| genre | Target classification label | Categorical |
| Model | Type | Parameters |
|---|---|---|
| Logistic Regression | Linear | Default sklearn params |
| KNN | Instance-based | n_neighbors=5 |
| Decision Tree | Tree-based | Gini impurity, depth=None |
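A minimal sketch of how the three configurations above can be trained and scored, assuming the sklearn defaults the table names. Synthetic data stands in for `music_clean.csv` here, so the printed numbers will not match the reported results, and the exact notebook code may differ.

```python
# Define the three classifier configurations from the table and score
# each with 6-fold CV on the training split plus held-out test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 11 numeric features, as in the dataset description.
X, y = make_classification(n_samples=300, n_features=11,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(),               # linear baseline
    "KNN": KNeighborsClassifier(n_neighbors=5),                # instance-based
    "Decision Tree": DecisionTreeClassifier(criterion="gini",
                                            max_depth=None),   # unpruned tree
}

results = {}
for name, est in models.items():
    # Scaling inside the pipeline is refit per CV fold (no leakage).
    pipe = make_pipeline(StandardScaler(), est)
    cv_mean = cross_val_score(pipe, X_tr, y_tr, cv=6).mean()
    test_acc = pipe.fit(X_tr, y_tr).score(X_te, y_te)
    results[name] = (cv_mean, test_acc)
    print(f"{name}: CV={cv_mean:.1%}  Test={test_acc:.1%}")
```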
| Model | CV Accuracy (Mean) | Test Accuracy |
|---|---|---|
| Logistic Regression | 93% | 86% |
| KNN | 92% | 86% |
| Decision Tree | 100% | 97.5% |
- Clone the repository:

  ```bash
  git clone https://github.com/barisgudul/Classification-Model-Selection.git
  cd Classification-Model-Selection
  ```

- Launch Jupyter:

  ```bash
  jupyter notebook "Classification Model Comparison/main.ipynb"
  ```
Visual comparison of model stability across 6 folds - Decision Tree shows perfect consistency
Final evaluation metrics comparison - Decision Tree achieves 97.5% test accuracy
- **Decision Tree Dominance**

  🚩 Perfect 100% cross-validation accuracy suggests:
  - Potential overfitting to the training data
  - High variance in model structure
  - Possible need for pruning/regularization

- **Algorithm Consistency**

  ✅ Stable performers:
  - KNN: 92% → 86% (CV → Test)
  - Logistic Regression: 93% → 86%

  The minimal performance drop indicates good generalization.
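One way to address the flagged overfitting is to constrain the tree before training (pre-pruning via `max_depth`/`min_samples_leaf`) or to apply cost-complexity pruning via `ccp_alpha`. The hyperparameter values below are illustrative, not taken from the notebook, and synthetic data stands in for the real dataset:

```python
# Compare an unpruned tree against a pre-pruned one under 6-fold CV.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=11,
                           n_informative=6, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0)  # depth=None, as in the table
pruned = DecisionTreeClassifier(max_depth=5,       # cap tree depth
                                min_samples_leaf=5,  # require larger leaves
                                random_state=0)

for name, tree in [("unpruned", unpruned), ("pruned", pruned)]:
    scores = cross_val_score(tree, X, y, cv=6)
    print(f"{name}: mean CV accuracy {scores.mean():.1%}")
```

A pruned tree typically trades a little training accuracy for lower variance across folds, which is exactly the gap flagged above.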
| Algorithm Type | Representative Model | Rationale | Key Characteristics |
|---|---|---|---|
| 🧪 Linear Classifier | Logistic Regression | Baseline performance measurement | Linear decision boundaries |
| 🔎 Instance-Based | KNN (k=5) | Local pattern capture | Distance-based similarity |
| 🌳 Non-Parametric | Decision Tree | Complex relationship modeling | Feature importance analysis |
Permissions:
✅ Free academic/research use
✅ Modification and redistribution
❌ Commercial use requires written consent
Full license terms available in LICENSE file.
Contribution Guidelines:
We welcome collaborations! Please reach out via email before submitting PRs.