
Problem Statement

Our task is to take existing music data and train a model on it. The model has to learn the patterns in music that we humans enjoy. Once it has learned them, it should be able to generate new music for us. It cannot simply copy-paste from the training data; it has to understand the patterns of music to generate something new. We do not expect the model to produce professional-quality music, but the result should be melodious and pleasant to listen to.

Understanding the Data

The input data we use to develop the model comes from .mid files. Let's gain some domain knowledge.

A MIDI file is not an audio recording. Rather, it is a set of instructions – for example, for pitch or tempo – and can use a thousand times less disk space than the equivalent recorded audio.
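To get a feel for the size difference, here is a back-of-the-envelope calculation. Every number in it is an assumption chosen for illustration (a three-minute piece, CD-quality stereo audio, 2,000 notes, roughly 8 bytes per note for its note-on and note-off events), not a measurement of any real file.

```python
# Rough size comparison between raw audio and MIDI for the same piece.
# All figures below are illustrative assumptions, not measurements.
DURATION_S = 180          # a three-minute piece
SAMPLE_RATE = 44_100      # CD-quality samples per second
BYTES_PER_SAMPLE = 2      # 16-bit audio
CHANNELS = 2              # stereo

N_NOTES = 2_000           # assumed number of notes in the piece
BYTES_PER_NOTE = 8        # assumed: note-on + note-off events with delta times

audio_bytes = DURATION_S * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
midi_bytes = N_NOTES * BYTES_PER_NOTE

print(f"raw audio: {audio_bytes / 1e6:.1f} MB")
print(f"MIDI:      {midi_bytes / 1e3:.1f} KB")
print(f"ratio:     about {audio_bytes // midi_bytes}x")
```

Under these assumptions the ratio comes out in the low thousands, which is consistent with the "thousand times less" figure quoted above.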

To process these files we use music21.

Music21 is a Python-based toolkit for computer-aided musicology.

People use music21 to answer questions from musicology using computers, to study large datasets of music, to generate musical examples, to teach fundamentals of music theory, to edit musical notation, study music and the brain, and to compose music (both algorithmically and directly).

pip install music21

Importing the necessary Libraries

from music21 import converter, instrument, note, chord
import glob
import pickle
import numpy as np
import pandas as pd
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation
from keras.callbacks import ModelCheckpoint
from keras.utils import plot_model
import os

Extracting the data from music notes

The .mid files are stored in the midi_songs folder. The code below extracts the notes and chords from each file and stores them in the notes list, using music21 modules to parse the files.


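notes = []
for file in glob.glob('midi_songs/*.mid'):
    print("Parsing %s" % file)
    midi = converter.parse(file)
    # use the first instrument part if the file has parts, else the flat notes
    parts = instrument.partitionByInstrument(midi)
    if parts:
        notes_to_parse = parts.parts[0].recurse()
    else:
        notes_to_parse = midi.flat.notes
    for element in notes_to_parse:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))

Saving the data in the notes file for further reuse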

with open('notes', 'wb') as filepath:
    pickle.dump(notes, filepath)
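Reusing the saved data later means reading the file back with pickle. The sketch below shows a round trip with a few made-up tokens in a temporary directory; the helper names save_notes and load_notes are hypothetical and not part of the original code.

```python
import os
import pickle
import tempfile

def save_notes(notes, path):
    """Write the list of note/chord tokens to disk."""
    with open(path, 'wb') as f:
        pickle.dump(notes, f)

def load_notes(path):
    """Read the list of note/chord tokens back from disk."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Round trip with a few made-up tokens in a temporary directory.
example_notes = ['E4', '4.7.11', 'G3']
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes')
    save_notes(example_notes, path)
    restored = load_notes(path)

print(restored == example_notes)
```

Loading the pickled list this way avoids re-parsing every MIDI file each time the notebook is rerun.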

Data preprocessing

The neural network we are creating has LSTM layers after the input layer, so we need to prepare our data to match their requirements. At present our data is just a list of notes. We need to create a list of sequences as features and the note that follows each sequence as the target variable.


Consider a sequence that increases in steps of 10: 10, 20, 30, 40, 50, 60, ...

If we take 3 time steps and our data has a single feature:

x           y
10 20 30    40
20 30 40    50
30 40 50    60

If we then give 40, 50, 60, our model has to predict 70 as the output.
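The windowing described above can be sketched in a few lines of plain Python. The function name make_windows is made up for this illustration; it only mirrors the idea, not the exact code used later.

```python
def make_windows(series, n_steps):
    """Split a sequence into (input window, next value) pairs."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    return X, y

series = [10, 20, 30, 40, 50, 60]
X, y = make_windows(series, n_steps=3)
for window, target in zip(X, y):
    print(window, '->', target)
```

A series of length n yields n - n_steps pairs, because the last n_steps values have no following value to serve as a target.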

Our data example:

Suppose we have only four notes. Let them be A, B, C, D

and the input sequence is AABACCDB.

We create a dictionary mapping them to integers:

A B C D
0 1 2 3

Now our input sequence becomes 0 0 1 0 2 2 3 1.

Now we create a list of sequences X and their targets y:

x        y
0 0 1    0
0 1 0    2
1 0 2    2
0 2 2    3
2 2 3    1

Now y is one-hot encoded.

x        y
0 0 1    1 0 0 0
0 1 0    0 0 1 0
1 0 2    0 0 1 0
0 2 2    0 0 0 1
2 2 3    0 1 0 0

pitchnames = sorted(set(item for item in notes))
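The toy example with the notes A to D can be reproduced end to end in plain Python. This is a sketch for checking the idea by hand; the names vocab and one_hot are made up here, and the real pipeline uses np_utils.to_categorical instead.

```python
sequence = "AABACCDB"
n_steps = 3

# Map each distinct note to an integer, in sorted order.
vocab = sorted(set(sequence))
note_to_int = {name: number for number, name in enumerate(vocab)}
encoded = [note_to_int[name] for name in sequence]

def one_hot(index, size):
    """Return a list of zeros with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]

X, Y = [], []
for i in range(len(encoded) - n_steps):
    X.append(encoded[i:i + n_steps])
    Y.append(one_hot(encoded[i + n_steps], len(vocab)))

print(encoded)
for window, target in zip(X, Y):
    print(window, target)
```

Each row of Y has exactly one 1, in the column of the note that follows the window, which is the format a softmax output layer is trained against.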

Creating a dictionary mapping the pitches to integers


sequence_length = 100
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# size of the vocabulary (number of distinct notes and chords)
n_vocab = len(pitchnames)
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
network_output = []
# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i:i + sequence_length]
    sequence_out = notes[i + sequence_length]
    network_input.append([note_to_int[char] for char in sequence_in])
    network_output.append(note_to_int[sequence_out])
n_patterns = len(network_input)
# reshape the input into a format compatible with LSTM layers
network_input = np.reshape(network_input, (n_patterns, sequence_length, 1))
# normalize input
network_input = network_input / float(n_vocab)
# one-hot encode the output
network_output = np_utils.to_categorical(network_output)
print(network_input.shape)
print(network_output.shape)

(57077, 100, 1)
(57077, 358)

Model Building

Our model takes 100 notes and predicts the 101st. The 102nd note is then produced by feeding in notes 2 to 101, and so on.
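This feed-the-prediction-back-in loop can be sketched without any neural network. In the example below, toy_predictor is a hypothetical stand-in for the trained model (it just continues an arithmetic sequence), and the window has 3 items instead of 100 to keep the output readable.

```python
def generate(seed, predict_next, n_steps):
    """Generate n_steps new items by repeatedly predicting from a sliding window."""
    window = list(seed)
    generated = []
    for _ in range(n_steps):
        nxt = predict_next(window)
        generated.append(nxt)
        # drop the oldest item and append the prediction, keeping the length fixed
        window = window[1:] + [nxt]
    return generated

# Hypothetical stand-in for a trained model: continue an arithmetic sequence.
def toy_predictor(window):
    return window[-1] + (window[-1] - window[-2])

print(generate([10, 20, 30], toy_predictor, n_steps=4))
```

Because each prediction becomes part of the next input, early mistakes can compound, which is one reason generated music tends to drift over long outputs.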

The key layer in our model is the LSTM. Let's learn a little about it.


  1. Forget Gate

Bob is a nice person. Dan, on the other hand, is evil.

As soon as the first full stop after “person” is encountered, the forget gate realizes that there may be a change of context in the next sentence. As a result of this, the subject of the sentence is forgotten and the place for the subject is vacated. And when we start speaking about “Dan” this position of the subject is allocated to “Dan”. This process of forgetting the subject is brought about by the forget gate.

  2. Input Gate

Bob knows swimming. He told me over the phone that he served in the navy for four years.

Now the important information here is that “Bob” knows swimming and that he has served the Navy for four years. This can be added to the cell state, however, the fact that he told all this over the phone is a less important fact and can be ignored. This process of adding some new information can be done via the input gate.

  3. Output Gate

Bob fought single handedly with the enemy and died for his country. For his contributions brave____________

In this phrase, there could be a number of options for the empty space. But we know that the current input, ‘brave’, is an adjective used to describe a noun. Thus, whatever word follows has a strong tendency to be a noun, and Bob could be an apt output.

This job of selecting useful information from the current cell state and showing it out as an output is done via the output gate.
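The three gates can be made concrete with a single step of a scalar LSTM cell. This is a minimal sketch of the standard LSTM update equations, not Keras's implementation; the weights are arbitrary illustrative values, and the names lstm_step and params are made up for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. `p` maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre('forget'))        # how much of the old cell state to keep
    i = sigmoid(pre('input'))         # how much new information to admit
    g = math.tanh(pre('candidate'))   # the new information itself
    o = sigmoid(pre('output'))        # how much of the cell state to expose

    c = f * c_prev + i * g            # updated cell state
    h = o * math.tanh(c)              # updated hidden state (the cell's output)
    return h, c

# Arbitrary illustrative weights, not trained values.
params = {
    'forget': (0.5, 0.1, 0.0),
    'input': (0.6, 0.2, 0.0),
    'candidate': (0.9, 0.3, 0.0),
    'output': (0.4, 0.1, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:+.4f}  c={c:+.4f}")
```

A forget gate near 0 wipes the old cell state (the subject changing from Bob to Dan), an input gate near 0 ignores the current input (the phone call), and the output gate decides how much of the stored state shows up in the output.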

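model = Sequential()
model.add(LSTM(512, return_sequences=True,
               input_shape=(network_input.shape[1], network_input.shape[2])))
model.add(LSTM(512, return_sequences=True))
# assumed completion: a final LSTM and a softmax layer over the n_vocab classes
model.add(LSTM(512))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

Saving the model and Model Graph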

# Save the model
model.save('music_generator.h5')
plot_model(model, to_file='model.png')

Generating a sequence of notes

# pick a random sequence from the input as a starting point for the prediction
start = np.random.randint(0, len(network_input)-1)

int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

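pattern = network_input[start]
prediction_output = []

# generate 500 notes
for note_index in range(500):
    prediction_input = np.reshape(pattern, (1, len(pattern), 1))
    prediction = model.predict(prediction_input, verbose=0)
    index = np.argmax(prediction)
    prediction_output.append(int_to_note[index])
    # slide the window: drop the oldest note, append the normalized prediction
    pattern = np.append(pattern[1:], [[index / float(n_vocab)]], axis=0)
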
Saving the sequence of notes into .mid file

from music21 import instrument, note, stream, chord

offset = 0
output_notes = []

# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            notes.append(new_note)
        new_chord = chord.Chord(notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)

midi_stream.write('midi', fp='new_test_output_168.mid')
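The chord-versus-note test and the fixed offset step can be checked without music21. The sketch below only mirrors the token logic of the loop above using plain tuples; is_chord_token and decode are hypothetical helper names, and the tokens are made up.

```python
def is_chord_token(token):
    """Chords are dot-joined integers ('4.7.11') or a single integer ('7')."""
    return '.' in token or token.isdigit()

def decode(tokens, step=0.5):
    """Turn tokens into (offset, kind, value) tuples, one every `step` offset units."""
    events = []
    offset = 0.0
    for token in tokens:
        if is_chord_token(token):
            events.append((offset, 'chord', [int(p) for p in token.split('.')]))
        else:
            events.append((offset, 'note', token))
        offset += step
    return events

for event in decode(['E4', '4.7.11', '7', 'C#5']):
    print(event)
```

Since every event advances the offset by the same amount, the generated piece has a uniform rhythm; varying note durations would require encoding them in the tokens as well.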