This repository contains the code for a music genre classifier written in Python using TensorFlow and Flask. You can try it out on Heroku, though it may fail on large files because of Heroku's computational limits.
This project was inspired by the FMA dataset, but because of technical issues with that dataset I used the GTZAN dataset instead.
The model takes a song and splits it into ten short chunks. For each chunk, mel-frequency cepstral coefficients (MFCCs) are extracted over many short frames, producing an image like the one below:
The genre is then predicted using a convolutional neural network, an architecture well suited to image-like data such as this.
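The chunking step and the final song-level decision can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names `split_into_chunks` and `average_predictions` are hypothetical, the alphabetical label order is an assumption, and averaging the per-chunk probabilities is only one plausible way to combine the ten predictions (the description above does not say how they are combined).

```python
# The ten GTZAN genres. Alphabetical order is an assumption; the
# repository may index its labels differently.
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def split_into_chunks(samples, num_chunks=10):
    """Split a 1-D sequence of audio samples into equal-length chunks.

    Trailing samples that do not fill a whole chunk are dropped.
    """
    chunk_len = len(samples) // num_chunks
    if chunk_len == 0:
        raise ValueError("signal is too short to split into chunks")
    return [samples[i * chunk_len:(i + 1) * chunk_len]
            for i in range(num_chunks)]


def average_predictions(chunk_probs):
    """Average per-chunk class probabilities (assumed aggregation rule).

    Returns the index of the most likely class and the averaged
    probabilities.
    """
    num_chunks = len(chunk_probs)
    num_classes = len(chunk_probs[0])
    mean_probs = [sum(p[c] for p in chunk_probs) / num_chunks
                  for c in range(num_classes)]
    best = max(range(num_classes), key=mean_probs.__getitem__)
    return best, mean_probs
```

In a real pipeline each chunk would be turned into an MFCC array and passed to the network before `average_predictions` is called; those steps are omitted here because they depend on the audio and model libraries in use.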
Follow these steps if you wish to try out the code on your own machine.
Install the prerequisites by creating a new Anaconda environment:
conda env create -f environment.yml
conda activate genre_rec
To test the server functionality with just a local Flask server, run the server:
python app.py
Then visit localhost:5000 in your web browser.
If you wish to recreate the training process, first download the GTZAN dataset and refer to the steps below.
Once you have downloaded the GTZAN dataset, run the preprocessing script:
python classifier/preprocess.py
This script will extract MFCCs (mel-frequency cepstral coefficients) from the .wav files and store the data and labels in a .json file.
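To make the output of this step concrete, here is a small sketch of writing and reading such a file with the standard `json` module. The key names (`mapping`, `mfcc`, `labels`) and the helper functions are assumptions made for illustration; check preprocess.py for the layout the script actually produces.

```python
import json
import os
import tempfile


def save_dataset(path, features, labels, mapping):
    """Write MFCC features, integer labels, and genre names to a JSON file.

    The key names are hypothetical, not taken from the repository.
    """
    payload = {"mapping": mapping, "mfcc": features, "labels": labels}
    with open(path, "w") as fp:
        json.dump(payload, fp)


def load_dataset(path):
    """Read the JSON file back and return (features, labels, mapping)."""
    with open(path) as fp:
        payload = json.load(fp)
    return payload["mfcc"], payload["labels"], payload["mapping"]


# Round trip with a tiny made-up example: one chunk, two frames,
# three coefficients per frame.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    save_dataset(path,
                 [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]],
                 [1],
                 ["blues", "classical"])
    features, labels, mapping = load_dataset(path)
    print(mapping[labels[0]])
```

JSON stores the features as nested lists, so a training script would typically convert them back to arrays after loading.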
You can view the models available to train in the models.py file.
Currently, logistic regression and a convolutional neural network are available.
Modify the model creation section in train.py, then run:
python classifier/train.py
This produces a model summary, training information, and evaluation diagnostics.
- Refactor code into scripts
- Get Flask server working
- Ping server with client
- Deploy to Heroku