Given a set of contigs and a Hi-C signal, the goal is to assemble the genome: orient and order the contigs, and estimate the distances between them.
- Randomly choose the initial state (orientation or order).
- Repeat iteratively:
  - Make a random change (to the orientation or the order).
  - If the likelihood function P has increased, accept the change; otherwise accept it with probability P(new state)/P(old state).
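The steps above can be sketched as a generic Metropolis loop. This is only an illustration, not the code in this repository: the names `metropolis`, `flip_one` and `toy_log_likelihood` are hypothetical, and the toy likelihood simply rewards agreement with a known target orientation instead of using a real Hi-C signal. The sketch works with log-likelihoods, so the ratio P(new state)/P(old state) becomes `exp(new - old)`.

```python
import math
import random


def metropolis(initial_state, propose, log_likelihood, n_iter=1000, rng=None):
    """Run a Metropolis chain and return the final state."""
    rng = rng or random.Random()
    state = initial_state
    current = log_likelihood(state)
    for _ in range(n_iter):
        candidate = propose(state, rng)
        cand = log_likelihood(candidate)
        # Accept if the likelihood did not decrease; otherwise accept with
        # probability P(new)/P(old), computed here as exp(cand - current).
        if cand >= current or rng.random() < math.exp(cand - current):
            state, current = candidate, cand
    return state


def flip_one(state, rng):
    """Random change: flip the orientation of one randomly chosen contig."""
    i = rng.randrange(len(state))
    new_state = list(state)
    new_state[i] = not new_state[i]
    return new_state


def toy_log_likelihood(state, target):
    """Toy stand-in for the Hi-C likelihood (an assumption for this example):
    each contig whose orientation differs from `target` costs 50 log-units."""
    return -50.0 * sum(a != b for a, b in zip(state, target))


# Made-up example: six contigs, True = forward, False = reverse.
target = [True, False, True, True, False, False]
result = metropolis(
    [True] * 6,
    flip_one,
    lambda s: toy_log_likelihood(s, target),
    n_iter=2000,
    rng=random.Random(0),
)
print(result)
```

The same loop would apply to ordering by replacing `flip_one` with a proposal that, for example, swaps two contigs, and by replacing the toy function with a likelihood computed from Hi-C contacts.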
- Run `main_orientation.py` to orient contigs.
- Run `main_ordering.py` to order contigs.
- Run `gap_size.py` to simulate gap size estimation.
- Python 3.7
Python dependencies can be installed with pip:
pip install -r requirements.txt
The input data must describe the Hi-C reads in the following format:
- pairs.txt
| * | name of the contig containing the first part of the read | position of the first part of the read in that contig | name of the contig containing the second part of the read | position of the second part of the read in that contig | * | * |
- len.tsv
| name of contig | its length |
- layout.txt must contain the order of the contigs together with their current orientation
(see data)
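As a rough illustration of the layouts described above, the following sketch reads lines in the pairs.txt and len.tsv column layouts and counts Hi-C contacts per contig pair. It is not part of this repository. It assumes whitespace-separated fields, no header rows, and contig names without spaces; the function names and the sample values are made up for the example. The columns marked `*` are ignored, and layout.txt is not parsed because its exact format is not specified here.

```python
from collections import Counter


def parse_lengths(lines):
    """Parse len.tsv-style lines: contig name, contig length."""
    lengths = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lengths[fields[0]] = int(fields[1])
    return lengths


def parse_pairs(lines):
    """Parse pairs.txt-style lines (7 columns) into
    (contig1, position1, contig2, position2) tuples."""
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        _, contig1, pos1, contig2, pos2, _, _ = fields
        yield contig1, int(pos1), contig2, int(pos2)


def count_contacts(pairs):
    """Count read pairs per unordered pair of contigs."""
    counts = Counter()
    for contig1, _, contig2, _ in pairs:
        counts[tuple(sorted((contig1, contig2)))] += 1
    return counts


# Made-up sample data in the assumed layout.
sample_len = ["ctg1\t5000", "ctg2\t3000"]
sample_pairs = [
    "r1 ctg1 100 ctg2 250 + -",
    "r2 ctg2 40 ctg1 4900 - +",
    "r3 ctg1 10 ctg1 900 + +",
]

lengths = parse_lengths(sample_len)
contacts = count_contacts(parse_pairs(sample_pairs))
print(lengths)
print(contacts)
```

To read real files, pass an open file object instead of a list, for example `parse_pairs(open("pairs.txt"))`.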
- MCMC algorithms http://statweb.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf