This code accompanies the "Simulating protein sequences using Recurrent Neural Networks" post on the BioLearning blog.
This is a character-level Long Short-Term Memory (LSTM) recurrent neural network (RNN) model for simulating protein sequences. It takes a multi-sequence FASTA file of proteins as training input, generates the desired number of simulated protein sequences, and saves them to a multi-sequence FASTA file. You can tune the model's hyperparameters in train.py, for example the number of training epochs to run before sequence simulation starts.
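As a rough illustration of the input side of such a pipeline, the sketch below reads a multi-sequence FASTA stream and cuts each protein into fixed-length windows paired with the residue that follows, which is the usual training target for a character-level model. It is not taken from train.py: the function names `read_fasta` and `make_windows`, the window length of 5, and the toy sequences are all assumptions chosen for the example.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a multi-sequence FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def make_windows(sequences, window=5, step=1):
    """Cut sequences into (input window, next residue) training pairs."""
    pairs = []
    for seq in sequences:
        for i in range(0, len(seq) - window, step):
            pairs.append((seq[i:i + window], seq[i + window]))
    return pairs


# Toy input standing in for a real FASTA file (hypothetical sequences).
example = io.StringIO(">p1\nMKTAYIAK\nQR\n>p2\nMKVLA\n")
records = list(read_fasta(example))
sequences = [seq for _, seq in records]

# Character vocabulary: one integer index per residue letter seen in the data.
alphabet = sorted(set("".join(sequences)))
char_to_index = {c: i for i, c in enumerate(alphabet)}

pairs = make_windows(sequences, window=5)
encoded = [([char_to_index[c] for c in text], char_to_index[nxt])
           for text, nxt in pairs]
```

Sequences no longer than the window contribute no training pairs in this sketch, so a real run would need a window shorter than the proteins in the input file.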
Requirements:
- Python 3
- TensorFlow > 1.0.1
- Keras 2.0.6
To train the model and generate the simulated sequences, run `python train.py`.
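For readers curious about what the simulation step of a character-level model generally looks like, here is a minimal sketch of sampling a sequence one residue at a time and writing the result as FASTA. It does not reproduce train.py: the `predict_next` callable is a stand-in for a trained model, the temperature parameter is a common but assumed technique, and the names `sample_index`, `simulate`, and `write_fasta` are hypothetical.

```python
import io
import math
import random


def sample_index(probabilities, temperature=1.0, rng=random):
    """Draw an index from a probability vector after temperature rescaling."""
    # Lower temperatures sharpen the distribution, higher ones flatten it.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probabilities]
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def simulate(predict_next, alphabet, seed, length, temperature=1.0, rng=random):
    """Extend `seed` one residue at a time until it reaches `length`."""
    window = len(seed)
    sequence = seed
    while len(sequence) < length:
        probs = predict_next(sequence[-window:])
        sequence += alphabet[sample_index(probs, temperature, rng)]
    return sequence


def write_fasta(handle, sequences, prefix="simulated", width=60):
    """Write sequences as a multi-sequence FASTA file with wrapped lines."""
    for n, seq in enumerate(sequences, start=1):
        handle.write(f">{prefix}_{n}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Stand-in for a trained model (assumption): every residue equally likely.
alphabet = "ACDEFGHIKLMNPQRSTVWY"


def uniform_model(window_text):
    return [1.0 / len(alphabet)] * len(alphabet)


rng = random.Random(0)
proteins = [simulate(uniform_model, alphabet, seed="MKTAY", length=30, rng=rng)
            for _ in range(3)]

output = io.StringIO()
write_fasta(output, proteins)
```

With a trained network, `predict_next` would instead return the model's predicted distribution over the alphabet for the current window.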