Improving simple melodies in real time!
When we play Guitar Hero or similar games, the music representation is really simple (five buttons plus rhythm), yet it still feels like we're playing, and it clearly has some kind of relationship to the melody. What if we let an AI reconstruct music from a simplified representation like that? Then we could play a really basic melody and have it complete the rest.
Since data availability for guitar hero levels + sheet music/midi isn't great, I will instead focus on transforming a simple midi melody (e.g. played on a keyboard) into full music! This should be possible, since there is plenty of free midi music available as training data.
- parse midi into some text-based format or vector of bytes
- standardize rhythms (just use midi quantization)
- generate simple melody version of training data by stripping chords, drums, fast notes
- transpose to all 12 keys (?)
- train seq2seq model on generating complex data from the simplified version
- implement streaming midi to text/image conversion
- feed into network
- restore output to midi
2020-11-26
Read midi, remove percussion, extract melody (? on last one).
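The percussion-removal step can be sketched like this. In General MIDI, channel 10 (zero-indexed: 9) is reserved for percussion, so dropping that channel is enough; the `(channel, pitch, time)` event tuples here are a hypothetical parsed representation, not an actual midi library's API.

```python
# Sketch: drop percussion events from a parsed MIDI event stream.
# Events are hypothetical (channel, pitch, time) tuples; General MIDI
# reserves channel 10 (zero-indexed: 9) for percussion.
PERCUSSION_CHANNEL = 9

def remove_percussion(events):
    return [e for e in events if e[0] != PERCUSSION_CHANNEL]

events = [(0, 60, 0.0), (9, 36, 0.0), (0, 64, 0.5), (9, 42, 0.5)]
melodic = remove_percussion(events)  # only the channel-0 events remain
```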
2020-11-27
Remove broken files from training data. Discover training data uses full range
of notes from 0 to 127. Implement conversion to string of bytes.
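Since MIDI note numbers span exactly 0 to 127, each note fits in a single byte, which makes the byte-string conversion trivial. A minimal sketch (function names are mine):

```python
# Sketch: encode a monophonic note sequence as bytes.
# MIDI note numbers span 0..127, so each note fits in one byte.
def notes_to_bytes(notes):
    assert all(0 <= n <= 127 for n in notes)
    return bytes(notes)

def bytes_to_notes(data):
    return list(data)

seq = [60, 62, 64, 65]
assert bytes_to_notes(notes_to_bytes(seq)) == seq  # round-trips losslessly
```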
2020-11-28
Simplify melody by removing chords. Implement autocorrelation for finding the
beat, though performance is not amazing (maybe ok enough), and simplify the
rhythm to have no more than one note per calculated beat.
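The autocorrelation idea can be sketched in numpy: correlate an onset vector with itself and take the lag with the highest score as the beat period. The binary onset representation and the minimum-lag cutoff are my assumptions, not necessarily what the code does.

```python
import numpy as np

# Sketch: estimate the beat period of an onset sequence by autocorrelation.
# `onsets` is a hypothetical binary vector, one entry per time step.
def beat_period(onsets, min_lag=2):
    x = np.asarray(onsets, dtype=float)
    x = x - x.mean()  # remove DC offset so silence doesn't correlate
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..len-1
    return min_lag + int(np.argmax(corr[min_lag:]))      # skip trivial lags

# A pulse every 4 steps should yield a period of 4.
onsets = [1, 0, 0, 0] * 8
assert beat_period(onsets) == 4
```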
2020-11-29
Transpose and save files with sensible filenames. Set the transposition range to
-5 -> +6 semitones, to ensure there should be good data for most melodies.
I would do more, but had to save on hard drive space... We'll see if it's enough.
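The -5 to +6 range is exactly one shift per pitch class (12 versions total), which is why it covers most melodies without further duplication. A minimal sketch of the transposition itself:

```python
# Sketch: transpose a note sequence to every shift from -5 to +6
# semitones, covering each of the 12 pitch classes exactly once.
def transpositions(notes, low=-5, high=6):
    return {k: [n + k for n in notes] for k in range(low, high + 1)}

versions = transpositions([60, 64, 67])  # C major triad
assert len(versions) == 12
assert versions[0] == [60, 64, 67]       # shift 0 is the original
assert versions[-5] == [55, 59, 62]
```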
2020-11-30
Implement some seq2seq (following a reference, see code). Also implement loading
data and parsing it into torch tensors.
2020-12-01
Implement model training (same reference). Probably need to read up a bit more on
exactly how torch wants the data, because there are a lot of errors. Either way,
I can probably get rid of the encoding layer? Also: #devember I guess.
2020-12-02
Time to jump back and start with a minimal working example of pytorch seq2seq!
The minimal example seems to have worked, though I changed the data representation
slightly, making each token a rhythmic time step. This way, the rhythm of the
input should be better preserved during the transformation.
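One way to realize the token-per-time-step representation (the exact hold/rest symbols here are my own placeholders, not the scheme from the code): emit the pitch at a note's onset step, a "hold" token while it sustains, and a "rest" token for silence.

```python
# Sketch: tokenize a melody with one token per rhythmic time step.
# Hypothetical scheme: pitch at onset, HOLD while sustaining, REST for silence.
HOLD, REST = -1, -2

def to_steps(notes):
    # notes: list of (pitch, duration_in_steps) pairs; pitch None = rest
    tokens = []
    for pitch, dur in notes:
        if pitch is None:
            tokens.extend([REST] * dur)
        else:
            tokens.append(pitch)              # onset step carries the pitch
            tokens.extend([HOLD] * (dur - 1)) # remaining steps just sustain
    return tokens

assert to_steps([(60, 2), (None, 1), (62, 1)]) == [60, HOLD, REST, 62]
```

Because every token occupies exactly one time step, input and output sequences stay aligned in time, which is what preserves the rhythm through the transformation.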
2020-12-03
Initial work on redoing file conversion to be compatible with the new code. I think
I should do the translation dictionaries right away so that I don't have to save
any intermediate (large) tuple formats.
2020-12-04
New conversion up and running, but in the interest of saving hard drive space
I'm gonna have to switch to dynamically generating the training tensors on demand
rather than precomputing and storing them.
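On-demand generation can be done with the `__len__`/`__getitem__` interface that `torch.utils.data.Dataset` expects, so a plain class like this plugs straight into a `DataLoader`. The file names and `encode` step are hypothetical stand-ins for the real conversion.

```python
# Sketch: generate training examples on demand instead of precomputing
# and storing them. This matches the torch.utils.data.Dataset protocol.
class MidiDataset:
    def __init__(self, paths, encode):
        self.paths = paths    # list of midi file paths
        self.encode = encode  # hypothetical function: path -> (input, target)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Encoding happens at access time, so nothing large is ever
        # written to disk or held in memory up front.
        return self.encode(self.paths[idx])

ds = MidiDataset(["a.mid", "b.mid"], lambda p: (p, p.upper()))
assert len(ds) == 2
assert ds[1] == ("b.mid", "B.MID")
```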
2020-12-05
Dynamic generation has a separate issue, in that we need to process everything
so that we can accurately set the output dimension of the network. I think that
using a single network to produce all the output instruments necessitates an
excessively large output language that just won't work. New idea: Use a single
input network and several output networks that play together.
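The shared-input/multiple-output idea might look like this in PyTorch: one encoder module feeding a head per instrument, so each head only needs a small output vocabulary. All layer types and sizes here are placeholders, not the actual architecture.

```python
import torch
import torch.nn as nn

# Sketch: one shared encoder feeding several per-instrument decoder heads,
# assuming four output voices. Linear layers stand in for the real seq2seq
# components; dimensions are arbitrary placeholders.
class MultiHead(nn.Module):
    def __init__(self, in_dim=16, hidden=32, out_dims=(8, 8, 8, 8)):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden)       # shared input network
        self.decoders = nn.ModuleList(
            [nn.Linear(hidden, d) for d in out_dims]   # one head per voice
        )

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return [dec(h) for dec in self.decoders]

model = MultiHead()
outs = model(torch.zeros(5, 16))
assert len(outs) == 4 and all(o.shape == (5, 8) for o in outs)
```

Each head gets its own loss, but gradients from all four flow back into the shared encoder.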
2020-12-06
New training data generation, intended to train four separate seq2seq networks
simultaneously. I have a suspicion that this four-brained-ness won't sound very
good, but I'm gonna try it before I trash it.
2020-12-07
Model training setup with the new data. Chunked the data a bit for performance.
It is still outrageously slow however. I may have to switch approach again,
though I kinda wanna try training it overnight to see what it learns.
2020-12-09
No progress yesterday, as I messed up and crashed the training by giving it an
html file among the midis. Anyway... I'm gonna train once more, but also save the
whole model, not just the state dict, for easier loading (I think). Also added
some machinery needed to play the result as sound.
2020-12-10
It learned to always be silent. Nice. Time for a new approach.
2020-12-11
I am going to try a convolutional approach. Basically, represent music as a 2d
array of time and pitch. The last (time) column holds just the current simplified
melody; the rest holds all the notes played so far. The output is the full
melody at the current step (i.e. what notes are currently playing). So, task
number one will be to convert midi to such a 2d format (and back).
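The midi-to-2d conversion is essentially rendering a piano roll. A minimal numpy sketch, with hypothetical `(pitch, onset_step, duration_steps)` event tuples standing in for the parsed midi:

```python
import numpy as np

# Sketch: render note events into a 2-D piano-roll array (pitch x time),
# with 1 wherever a note is sounding. Event tuples are hypothetical.
def to_piano_roll(events, n_pitches=128, n_steps=16):
    roll = np.zeros((n_pitches, n_steps), dtype=np.uint8)
    for pitch, onset, dur in events:
        roll[pitch, onset:onset + dur] = 1  # note sounds for `dur` steps
    return roll

roll = to_piano_roll([(60, 0, 4), (64, 4, 4)])
assert roll.shape == (128, 16)
assert roll[60, :4].sum() == 4 and roll[64, 3] == 0
```

Converting back is the inverse scan: walk each pitch row and emit a note event for every run of consecutive ones.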
2020-12-12
Beginning of convnet implementation, so I'm learning how to set it up.
2020-12-14
I think it's working and training, so it's time to run for a bit and see if
it works any better! The network architecture is a mixture between a random
guess and the biggest I could fit in RAM. Next time I buy a computer, I guess
I'm gonna bump it up a bit.
2020-12-16
Wrote some code to generate test output. Had to do some extremely silly numpy-
torch interconversion to avoid a memory leak from assigning slices in place.
Supposedly setting requires_grad=False should stop that, but it didn't seem to work.
In addition, this is far too slow for real time music generation running on my cpu.
It needs to be about 100-1000 times faster I think. Ultimately, it only learned
to play one note, so, that's not very impressive either...
2020-12-19
I'm gonna try a final attempt using a straight up RNN, though the part I'm
struggling with is the new note hinting from a simplified melody. Ultimately,
the network should process a simple note, but not remember it. The output
from that should be the output, but then also fed back into the network and
remembered. It's easy to achieve during final usage I think, but I'm not sure how
to do it during training.
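The inference-time version of "process but don't remember" might look like this: run the hint through the cell while discarding the hidden state it would produce, then feed the resulting output back in to actually update the memory. This is only my reading of the idea, with placeholder sizes, and it says nothing about how to train it.

```python
import torch
import torch.nn as nn

# Sketch of "process a hint without remembering it". RNNCell returns the
# new hidden state, which doubles as the step's output here.
cell = nn.RNNCell(input_size=8, hidden_size=8)
h = torch.zeros(1, 8)           # the state the network actually keeps

hint = torch.randn(1, 8)        # simplified-melody note for this step
out = cell(hint, h)             # process the hint, but discard its state
h = cell(out, h)                # remember only the produced output

assert out.shape == (1, 8) and h.shape == (1, 8)
```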
Why chiptuner?
The training data I have is Game Boy/NES music, so that's what it's gonna learn.