The aim of the course was to introduce students with a computer science and mathematics background to data-driven problems in molecular biology, focusing on the analysis of protein and nucleic acid sequences. We were introduced to some of the mathematical models and computational methods used today in molecular sequence analysis.
As we are spending our time in quarantine, I thought we might try reconstructing some phylogenetic trees that tell us something about the evolutionary history of the coronavirus 2019-nCoV. In particular, we will try to replicate (not completely, but in general outline) panels b and c from Figure 2 in this paper. The idea is to build two phylogenetic trees, one from the whole genome sequence of the coronavirus and one from its spike protein, and compare them.
Tree-building method | Complete genome | Spike protein |
---|---|---|
UPGMA | | |
NJ | | |
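
To make the tree-building step concrete, here is a minimal sketch of UPGMA, one of the two methods in the table. It is not the assignment's reference solution: the helper names `p_distance` and `upgma` are hypothetical, the four short sequences are invented toy data, and the distance used (the fraction of mismatching positions) assumes the sequences are already aligned to equal length. A real analysis of the genome or spike protein would start from a multiple sequence alignment and would more likely rely on an existing library implementation.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names, dist):
    """Build a UPGMA tree and return its topology as a Newick string.

    names: list of leaf labels
    dist:  dict mapping frozenset({name_i, name_j}) -> pairwise distance
    """
    # Each active cluster is keyed by its Newick label; the value is its leaf count.
    sizes = {name: 1 for name in names}
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        # Distance from the merged cluster to every other cluster is the
        # average of the old distances, weighted by cluster size.
        for c in sizes:
            if c in (a, b):
                continue
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Toy, already-aligned sequences (invented for illustration only).
seqs = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGTTCAA",
    "D": "TCGTTCAT",
}
dist = {
    frozenset((a, b)): p_distance(seqs[a], seqs[b])
    for a, b in combinations(seqs, 2)
}
tree = upgma(list(seqs), dist)
print(tree)
```

On this toy input the sketch groups A with B and C with D. It reports only the topology; branch lengths are omitted to keep the example short. Neighbor joining (NJ) starts from the same distance matrix but chooses the pair to merge with a criterion that corrects for unequal rates of evolution, so the two methods can produce different trees from the same data.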
In our second assignment, we will try to combine a few of the problems we have encountered: searching for related sequences, identifying regulatory regions and motifs, and enrichment analysis.
Full assignment available here.
Results are available in a report (only in Polish).