Code to create a PRG from a Multiple Sequence Alignment file

bricoletc/make_prg

 
 

make_prg

Code to create a PRG for input to Pandora (https://github.com/rmcolq/pandora) from a Multiple Sequence Alignment file.

Requirements

Requires python3 and nextflow to be installed and on your path, plus a nextflow config file if you are working on a cluster. It can run on Python 2.7+ if the command in the nextflow file is edited. Note that nextflow does not play nicely when files are in mounted or shared folders.

Usage

Usage: nextflow run make_prg_nexflow.nf <arguments>

Required arguments:
  --tsv_in  FILENAME  An index file of MSA to build PRGs of

Optional arguments:

Download and install

git clone https://github.com/rmcolq/make_prg.git
cd make_prg
python3 setup.py test
python3 setup.py install

This installs the script make_prg. The nextflow script make_prg_nexflow.nf assumes that make_prg is installed.
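Because the nextflow script relies on make_prg, python3 and nextflow all being on your path, a quick pre-flight check can catch a missing tool before a run starts. The sketch below is not part of make_prg: the helper names `missing_tools` and `nextflow_command` are hypothetical, and the command it builds is only the one shown under Usage.

```python
import shutil


def missing_tools(tools=("python3", "nextflow", "make_prg"), which=shutil.which):
    """Return the tools from `tools` that cannot be found on the path.

    `which` is injectable so the check can be exercised without the
    real tools installed; by default it is shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]


def nextflow_command(tsv_in, script="make_prg_nexflow.nf"):
    """Build the argument list for the documented nextflow invocation.

    Hypothetical helper: it only assembles the command from the Usage
    section and does not run anything.
    """
    return ["nextflow", "run", script, "--tsv_in", str(tsv_in)]
```

If `missing_tools()` returns an empty list, the result of `nextflow_command("index.tsv")` could be passed to `subprocess.run`, assuming the command is launched from the directory containing make_prg_nexflow.nf.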

Input

Multiple Sequence Alignment files for the genes/DNA sequences for which we want PRGs, and a tab-separated index of these in the form:

sample_id       infile
GC0000001   /absolute/path/to/GC0000001_na_aln.fa.gz
GC0000002   /absolute/path/to/GC0000002_na_aln.fa
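For a large number of alignments, the index can be generated rather than written by hand. The following is a minimal sketch, not part of make_prg: the function name `write_index` is hypothetical, and it assumes that the sample id is the file name up to the first dot with a trailing `_na_aln` removed, as in the example rows above. Adjust `strip_suffix` if your files are named differently.

```python
import csv
import os


def write_index(msa_paths, out_handle, strip_suffix="_na_aln"):
    """Write a tab-separated index (sample_id, infile) for the given MSA files.

    Assumption: sample ids are derived from file names; this naming
    convention is inferred from the example index, not taken from make_prg.
    """
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", "infile"])
    for path in sorted(msa_paths):
        # File name up to the first dot, e.g. "GC0000001_na_aln"
        stem = os.path.basename(path).split(".", 1)[0]
        if strip_suffix and stem.endswith(strip_suffix):
            stem = stem[: -len(strip_suffix)]
        # The index lists absolute paths, so relative inputs are expanded
        writer.writerow([stem, os.path.abspath(path)])
```

A typical call would open the output with `open("index.tsv", "w", newline="")` and pass in a list of alignment paths, for example from `glob.glob`.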

Changing parameters

There are some parameters at the top of the nextflow file which could be changed but which I have not made command line parameters:

max_nesting             The maximum depth of nested bubbles in the PRG. Setting it to 1 allows variants
                        but no nesting.
min_match_length        Controls graph complexity
alignment_format        Any format accepted by biopython's AlignIO
max_forks_make_prg      If you are working on a cluster which allows unlimited parallel jobs per user, nextflow
                        uses this to cap the number of processes of this type that can run in parallel.
max_forks_make_fasta
