kalmSveta/AS

AS

PrecalculateStackingEnergeis.py -k <kmer_length> -g <gt_amount_in_kmer_max>

Needs to be run once before first use. Precalculates stacking energies for kmers.

-k <kmer_length> - minimal length of consecutive stacked nt pairs. Must be the same as used for ./FindPanhandles.py. Recommended = 5
-g <gt_amount_in_kmer_max> - maximal number of GT pairs in a kmer. Recommended for k = 5 is 2
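To illustrate what the -g limit constrains, the sketch below counts G-T (wobble) pairs in a stacked kmer duplex. It is not the code from PrecalculateStackingEnergeis.py: the function names are hypothetical, and the convention that the partner strand is written 3'->5' (so that position i pairs with position i) is an assumption.

```python
# Hypothetical helpers, not taken from the repository.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}


def count_gt_pairs(kmer_5to3, partner_3to5):
    """Return the number of G-T pairs, or None if some position cannot pair."""
    if len(kmer_5to3) != len(partner_3to5):
        raise ValueError("kmers must have the same length")
    gt = 0
    for pair in zip(kmer_5to3.upper(), partner_3to5.upper()):
        if pair in WOBBLE:
            gt += 1
        elif pair not in WATSON_CRICK:
            return None  # mismatch: the duplex is not fully stacked
    return gt


def is_allowed(kmer_5to3, partner_3to5, gt_max=2):
    """True if the duplex is fully paired and has at most gt_max G-T pairs."""
    n = count_gt_pairs(kmer_5to3, partner_3to5)
    return n is not None and n <= gt_max
```

With k = 5 and the recommended limit of 2, `is_allowed("GGCTG", "CTGAT")` accepts a duplex with two G-T pairs, while a duplex with four G-T pairs is rejected.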

#######################################################

SelectIntervals.py -a -c <cons_regions> -l <handle_len_min>

Selects conserved intronic regions in coding genes (input for FindPanhandles.py)

-a - path to annotation file in gtf format
-c <cons_regions> - path to phastConsElements100way file
-l <handle_len_min> - minimal length of panhandle handles
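The selection step can be pictured as intersecting introns with conserved elements and keeping overlaps that are long enough to hold a handle. The sketch below is only an illustration of that idea, not the logic of SelectIntervals.py: the function name is hypothetical, intervals are assumed to be half-open (start, end) pairs on a single chromosome, and each input list is assumed to contain no internally overlapping intervals.

```python
def intersect_with_min_length(introns, conserved, min_len):
    """Intersect two interval lists; keep overlaps of length >= min_len.

    Hypothetical helper. Intervals are half-open (start, end) tuples.
    """
    introns = sorted(introns)
    conserved = sorted(conserved)
    result = []
    i = j = 0
    while i < len(introns) and j < len(conserved):
        start = max(introns[i][0], conserved[j][0])
        end = min(introns[i][1], conserved[j][1])
        if end - start >= min_len:
            result.append((start, end))
        # Advance whichever interval ends first.
        if introns[i][1] < conserved[j][1]:
            i += 1
        else:
            j += 1
    return result


selected = intersect_with_min_length(
    [(100, 200), (300, 400)],
    [(150, 170), (195, 320), (390, 500)],
    min_len=10,
)
```

Here the 5 nt overlap between (100, 200) and (195, 320) is dropped because it is shorter than the minimal handle length.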

##########################################################

FindPanhandles.py -i <intervals_df> -g <genome.fa> -k <kmer_length> -p <panhandle_len_max> -a <handle_len_min> -t -e <energy_max> -s <have_seqs_in_input_df> -u <need_suboptimal>

Predicts panhandles with handles in the intervals.

where -i <intervals_df> is a tab-separated (tsv) file with a header, like this:

chr   start_interval  end_interval  strand  start_gene  end_gene
chr1  136219          136245        -       134901      139379
chr1  136282          136300        -       134901      139379
chr1  136330          136355        -       134901      139379
chr1  136382          136403        -       134901      139379

or like this:

chr   start_interval  end_interval  strand  start_gene  end_gene  sequences
chr1  136219          136245        -       134901      139379    GGCTTTGATAAAAA
chr1  136282          136300        -       134901      139379    TTTTTATAAAGCC
chr1  136330          136355        -       134901      139379    GGCCAGCAGATGG
chr1  136382          136403        -       134901      139379    TGACAAACCACAGGACACTACAC

If the 'sequences' column is absent, -g <genome.fa> must be provided. It is a fasta file with sequences that include the interval sequences, e.g. a whole human genome. The 'sequences' column will then be extracted automatically.
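Before running the prediction it can help to check which of the two input layouts a file uses. The sketch below reads such a table with the standard library and reports whether the 'sequences' column is present. The function name `read_intervals` is hypothetical and this is not how FindPanhandles.py itself loads the file; only the column names come from the examples above.

```python
import csv
import io

REQUIRED = ["chr", "start_interval", "end_interval",
            "strand", "start_gene", "end_gene"]


def read_intervals(handle):
    """Parse a tab-separated intervals table (hypothetical helper).

    Returns (rows, has_seqs); has_seqs tells whether -g would be needed.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    has_seqs = "sequences" in fieldnames
    rows = []
    for row in reader:
        # Coordinates are stored as text in the file; convert them to int.
        for col in ("start_interval", "end_interval", "start_gene", "end_gene"):
            row[col] = int(row[col])
        rows.append(row)
    return rows, has_seqs


# First data line of the example table, without the 'sequences' column.
header = "\t".join(REQUIRED)
line = "\t".join(["chr1", "136219", "136245", "-", "134901", "139379"])
rows, has_seqs = read_intervals(io.StringIO(header + "\n" + line + "\n"))
```

If `has_seqs` is False, as in this example, a genome fasta has to be passed with -g.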

-k <kmer_length> - minimal length of consecutive stacked nt pairs. Must be the same as used for PrecalculateStackingEnergeis.py. Recommended = 5
-p <panhandle_len_max> - maximum length of the looped-out region of a panhandle. Recommended = 10000
-a <handle_len_min> - minimal length of handles. Recommended = 10
-t - number of threads run in parallel
-e <energy_max> - maximum energy in kcal/mol. Recommended = -15
-u <need_suboptimal> - if True, will try to find suboptimal structures
-d <gt_amount_in_kmer_max> - maximal number of GT pairs in a kmer. Recommended for k = 5 is 2

###################################################################

AssessFNRate.py -i <intervals_df> -g <genome.fa> -k <kmer_length> -p <panhandle_len_max> -a <handle_len_min> -t -e <energy_max> -u <need_suboptimal> -d <gt_threshold> -N <N_seeds>

Predicts panhandles with handles in the intervals, but randomizes genes

where -i <intervals_df> is a tab-separated (tsv) file with a header, like this:

chr   start_interval  end_interval  strand  start_gene  end_gene
chr1  136219          136245        -       134901      139379
chr1  136282          136300        -       134901      139379
chr1  136330          136355        -       134901      139379
chr1  136382          136403        -       134901      139379

or like this:

chr   start_interval  end_interval  strand  start_gene  end_gene  sequences
chr1  136219          136245        -       134901      139379    GGCTTTGATAAAAA
chr1  136282          136300        -       134901      139379    TTTTTATAAAGCC
chr1  136330          136355        -       134901      139379    GGCCAGCAGATGG
chr1  136382          136403        -       134901      139379    TGACAAACCACAGGACACTACAC

If the 'sequences' column is absent, -g <genome.fa> must be provided. It is a fasta file with sequences that include the interval sequences, e.g. a whole human genome. The 'sequences' column will then be extracted automatically.

-k <kmer_length> - minimal length of consecutive stacked nt pairs. Must be the same as used for PrecalculateStackingEnergeis.py. Recommended = 5
-p <panhandle_len_max> - maximum length of the looped-out region of a panhandle. Recommended = 10000
-a <handle_len_min> - minimal length of handles. Recommended = 10
-t - number of threads run in parallel
-e <energy_max> - maximum energy in kcal/mol. Recommended = -15
-u <need_suboptimal> - if True, will try to find suboptimal structures
-d <gt_amount_in_kmer_max> - maximal number of GT pairs in a kmer. Recommended for k = 5 is 2
-N <N_seeds> - repeats the randomization N times

###################################################################

fold.py -f <first_seq> -s <second_seq> -k <kmer_length> -a <handle_len_min> -e <energy_max> -u <need_subopt>

Looks for panhandles in 2 sequences

-f <first_seq> -s <second_seq> - two DNA sequences in which panhandle handles might appear
-a <handle_len_min> - minimal length of handles. Recommended = 10
-k <kmer_length> - minimal length of consecutive stacked nt pairs. Must be the same as used for PrecalculateStackingEnergeis.py. Recommended = 5
-e <energy_max> - maximum energy in kcal/mol. Recommended = -15
-u <need_suboptimal> - if True, will try to find suboptimal structures
-d <gt_amount_in_kmer_max> - maximal number of GT pairs in a kmer. Recommended for k = 5 is 2

Outputs a list of structures. Each element consists of: energy in kcal/mol, start_alignment_1, end_alignment1, start_alignment2, end_alignment2, alignment1_sequence1, alignment1_sequence2, structure_in_dot_bracket_notation
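The last field of each element is a structure string in dot-bracket notation, where "(" and ")" mark paired positions and "." marks unpaired ones. The sketch below converts such a string into index pairs. It handles plain dot-bracket strings only; how fold.py lays out the two handles within one string is not described here, so the function name and the 0-based indexing are assumptions.

```python
def dot_bracket_pairs(structure):
    """Return sorted 0-based (open, close) index pairs of a dot-bracket string.

    Hypothetical helper, not part of fold.py.
    """
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


pairs = dot_bracket_pairs("((..))")
```

For "((..))" the outer brackets pair positions 0 and 5, and the inner brackets pair positions 1 and 4.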

##############################################################

MakePretty.py -p <path_to_ph> -g <need_genes>

Takes the output file of FindPanhandles.py as input and makes it human-readable.

-p <path_to_ph> - path to FindPanhandles.py output file
-g <need_genes> - set to True to add gene id and gene name from the annotation file

################################################################

RecalculateAfterMutations.py -p <path_to_ph> -t -r <need_random> -i <title> -m <path_to_mut> -w <path_to_ph_with_mut>

Takes a file with panhandles and the coordinates of mutations as input. Intersects panhandle handles with mutations. Recalculates the energies of panhandles after the mutations are introduced. Also recalculates the energies of panhandles with mutations at the same positions but to a random nt (chosen from the 2 nt that are neither the original nor the mutant one). Alternatively, it can take a file with already intersected panhandles and mutations as input and just perform the recalculation of energies and the shuffling.

-p <path_to_ph> - path to input table with panhandles
-m <path_to_mut> - path to mutations file
OR
-w <path_to_ph_with_mut> - path to file with panhandles and mutations

-t - number of threads to run in parallel
-r <need_random> - whether to recalculate energies for random mutations
-i <title> - additional title for the files
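The random control described above replaces the nucleotide at a mutated position with one of the two nucleotides that are neither the original nor the observed mutant. A minimal sketch of that choice follows. The function names and the 0-based position are assumptions made for illustration, and the code is not taken from RecalculateAfterMutations.py.

```python
import random

NUCLEOTIDES = "ACGT"


def random_control_nt(ref, alt, rng=random):
    """Pick one of the two nucleotides that are neither ref nor alt."""
    taken = {ref.upper(), alt.upper()}
    candidates = [n for n in NUCLEOTIDES if n not in taken]
    return rng.choice(candidates)


def apply_substitution(seq, pos, nt):
    """Return seq with the nucleotide at 0-based position pos replaced by nt."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside the sequence")
    return seq[:pos] + nt + seq[pos + 1:]


# Observed mutation C -> A at position 2, plus a matched random control.
handle = "GGCTTTGATAAAAA"
mutated = apply_substitution(handle, 2, "A")
control = apply_substitution(handle, 2, random_control_nt("C", "A"))
```

Both `mutated` and `control` would then be refolded, and the two energy changes compared.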

################################################################

./Make_big_bed.R ../python_scripts/folding_pretty_copy/out/panhandles_preprocessed.tsv 'all_panhandles'

Makes bed and bigbed from standard panhandle file

################################################################

./Make_handles.R

Makes bed4 from all (left AND right) handles
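A bed4 record has four tab-separated fields: chromosome, start, end and name. The sketch below shows how one panhandle could yield two such records, one per handle. It is written in Python for illustration and is not a translation of Make_handles.R: the function name, the "_left" / "_right" naming scheme and the assumption that the coordinates are already 0-based and half-open are all hypothetical.

```python
def handles_to_bed4(chrom, left, right, panhandle_id):
    """Return two bed4 lines (left and right handle) for one panhandle.

    Hypothetical helper. left and right are (start, end) tuples.
    """
    lines = []
    for side, (start, end) in (("left", left), ("right", right)):
        if start >= end:
            raise ValueError(f"{side} handle has non-positive length")
        lines.append(f"{chrom}\t{start}\t{end}\t{panhandle_id}_{side}")
    return lines


bed_lines = handles_to_bed4("chr1", (136219, 136245), (136282, 136300), "ph1")
```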

#################################################################

./TestMutDensity.R path.to.gr1 path.to.gr2 path.to.mut

Performs a Poisson test for mutation density enrichment in target intervals versus the population of intervals
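As a rough picture of such a test, the sketch below computes a one-sided Poisson p-value: given the mutation rate per nt in the population of intervals, how likely is it to see at least the observed number of mutations in the target intervals? This is an assumption about the general approach and not a port of TestMutDensity.R. It treats the population rate as known exactly, and the function names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the first k terms of the pmf."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def enrichment_p_value(target_mut, target_len, population_mut, population_len):
    """One-sided p-value for mutation enrichment in the target intervals."""
    rate = population_mut / population_len  # mutations per nt
    expected = rate * target_len
    return poisson_sf(target_mut, expected)


# 10 mutations in 1000 nt of targets vs 100 mutations in 100000 nt overall.
p = enrichment_p_value(10, 1000, 100, 100000)
```

With one mutation expected and ten observed, the p-value is very small, which would indicate enrichment under these assumptions.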

#################################################################
