tskir/CWL_viral_pipeline

CWL_viral_pipeline

Running full pipeline from CLI

Structure of pipeline

Input: assembly file NAME.fasta

1. Filtering contigs
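Filter contigs by a length threshold given in kb (default: 5 kb). The output file name used throughout this document, NAME_filt500bp.fasta, corresponds to a 0.5 kb (500 bp) threshold.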
Input: NAME.fasta
Output: NAME_filt500bp.fasta
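
A minimal sketch of the length filter described in this step, assuming a plain FASTA input whose sequences may span several lines. The names `read_fasta`, `filter_contigs` and `min_len_kb` are hypothetical and are not taken from the pipeline's own script. Treating the threshold as inclusive is also an assumption.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines: Iterable[str], min_len_kb: float) -> List[Tuple[str, str]]:
    """Keep contigs at least min_len_kb kilobases long (assumed inclusive)."""
    min_len_bp = min_len_kb * 1000
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_len_bp]


# Two toy contigs: 600 bp (kept at 0.5 kb) and 120 bp (dropped).
example = [">contig_1", "A" * 300, "C" * 300, ">contig_2", "G" * 120]
kept = filter_contigs(example, min_len_kb=0.5)
print([header for header, _ in kept])
```

In practice the lines would come from an open file handle for NAME.fasta, and the kept records would be written to NAME_filt500bp.fasta.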

2.1. VirSorter
Mines viral signal from microbial genomic data. The tool generates the folder Predicted_viral_sequences; the relevant files are VIRSorter_cat-[123].fasta and VIRSorter_prophages_cat-[45].fasta.
Input: NAME_filt500bp.fasta
Output: Predicted_viral_sequences

2.2. VirFinder
R package for identifying viral sequences from metagenomic data using sequence signatures.
Input: NAME_filt500bp.fasta
Output: VirFinder_output.tsv

3. Parsing virus files
Based on the results of the previous steps, the script generates the High_confidence, Low_confidence and Prophages files. Some of the output files may be missing.
Input:
- NAME_filt500bp.fasta
- VirFinder_output.tsv
- Predicted_viral_sequences
Output:
- High_confidence.fna
- Low_confidence.fna
- Prophages.fna

4. Prodigal
Predicts proteins for each input FASTA file.
Input: output files of step #3
- High_confidence.fna
- Low_confidence.fna
- Prophages.fna
Output:
- High_confidence_prodigal.faa
- Low_confidence_prodigal.faa
- Prophages_prodigal.faa

5. HMMSCAN
HMMSCAN is used to search protein sequences against collections of protein profiles.
Input: output files of step #4
- High_confidence_prodigal.faa
- Low_confidence_prodigal.faa
- Prophages_prodigal.faa
Output:
- High_confidence_prodigal_hmmscan.tbl
- Low_confidence_prodigal_hmmscan.tbl
- Prophages_prodigal_hmmscan.tbl

6. Table(s) processing
Scripts add titles to columns and separate columns with tabs.
Input:
- High_confidence_prodigal_hmmscan.tbl
- Low_confidence_prodigal_hmmscan.tbl
- Prophages_prodigal_hmmscan.tbl
Output:
- High_confidence_prodigal_hmmscan_modified.faa
- Low_confidence_prodigal_hmmscan_modified.faa
- Prophages_prodigal_hmmscan_modified.faa
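
The table processing in this step can be sketched as follows. This is an illustration only, not the pipeline's script: the function name `table_to_tsv`, the three-column layout and the example rows are hypothetical, and the real column titles depend on which hmmscan output format the pipeline uses. The sketch assumes comment lines start with `#` and that only the last column may contain spaces.

```python
from typing import Iterable, List, Sequence


def table_to_tsv(lines: Iterable[str], columns: Sequence[str]) -> List[str]:
    """Return tab-separated rows, preceded by a header row.

    Lines starting with '#' are treated as comments and skipped.
    The last column absorbs any remaining text, so a free-text
    description containing spaces stays in one field.
    """
    rows = ["\t".join(columns)]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on runs of whitespace, at most len(columns) fields.
        fields = line.split(None, len(columns) - 1)
        rows.append("\t".join(fields))
    return rows


# Hypothetical three-column layout and rows, for illustration only.
raw = [
    "# target  query  description",
    "ViPhOG1   prot_1  some free text",
    "ViPhOG2   prot_2  -",
]
tsv = table_to_tsv(raw, ["target", "query", "description"])
print("\n".join(tsv))
```

Passing the column titles as a parameter keeps the sketch independent of the exact table layout.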

7. Ratio evalue table
Generates a tabular file (File_informative_ViPhOG.tsv) listing results per protein, including the ratio of the aligned target profile and the absolute value of the total E-value.
Input:
- High_confidence_prodigal_hmmscan_modified.faa
- Low_confidence_prodigal_hmmscan_modified.faa
- Prophages_prodigal_hmmscan_modified.faa
Output:
- High_confidence_prodigal_hmmscan_modified_informative.tsv
- Low_confidence_prodigal_hmmscan_modified_informative.tsv
- Prophages_prodigal_hmmscan_modified_informative.tsv

8. Annotation
The script generates a tabular output for each viral prediction file, summarizing the ViPhOG annotations for all the corresponding predicted proteins.
Input:
- High_confidence.fna
- High_confidence_prodigal_hmmscan_modified_informative.tsv
- Low_confidence.fna
- Low_confidence_prodigal_hmmscan_modified_informative.tsv
- Prophages.fna
- Prophages_prodigal_hmmscan_modified_informative.tsv
Output:

- High_confidence_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
- Low_confidence_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
- Prophages_prodigal_hmmscan_modified_informative_prot_ann_table.tsv

9.1. Mapping
The script creates an output directory for each viral prediction file and generates a contig map in PDF format for each viral contig. The maps are stored in the created output directory.
Input:

- High_confidence_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
- Low_confidence_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
- Prophages_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
Output:

- High_confidence_mapping_results
- Low_confidence_mapping_results
- Prophages_mapping_results

9.2. Assign taxonomy
The script generates a tabular file with the taxonomic assignment of viral contigs based on the ViPhOG annotations.
Input:

- High_confidence_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
- Low_confidence_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
- Prophages_prodigal_hmmscan_modified_informative_prot_ann_table.tsv
Output:
- High_confidence_prodigal_hmmscan_modified_informative_prot_ann_table_tax_assign.tsv
- Low_confidence_prodigal_hmmscan_modified_informative_prot_ann_table_tax_assign.tsv
- Prophages_prodigal_hmmscan_modified_informative_prot_ann_table_tax_assign.tsv
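
The output names in steps 4 to 9.2 follow a regular pattern: each step appends a suffix to the previous file stem. The sketch below reproduces that chain for one prediction category. The helper `expected_outputs` and the `STEP_SUFFIXES` table are hypothetical names introduced for illustration. The suffixes and extensions themselves are copied from the file lists above.

```python
from typing import Dict

# (step, suffix appended to the stem, file extension), as listed in the steps above.
STEP_SUFFIXES = [
    ("4. Prodigal", "_prodigal", ".faa"),
    ("5. HMMSCAN", "_hmmscan", ".tbl"),
    ("6. Table(s) processing", "_modified", ".faa"),
    ("7. Ratio evalue table", "_informative", ".tsv"),
    ("8. Annotation", "_prot_ann_table", ".tsv"),
    ("9.2. Assign taxonomy", "_tax_assign", ".tsv"),
]


def expected_outputs(category: str) -> Dict[str, str]:
    """Map each step to its expected output name for one category."""
    outputs = {}
    stem = category
    for step, suffix, ext in STEP_SUFFIXES:
        stem += suffix
        outputs[step] = stem + ext
    # Step 9.1 writes a directory named after the category, not the full stem.
    outputs["9.1. Mapping"] = category + "_mapping_results"
    return outputs


for category in ("High_confidence", "Low_confidence", "Prophages"):
    for step, name in expected_outputs(category).items():
        print(f"{step}: {name}")
```

Because some of the step 3 files may be missing, a caller would apply this only to the categories whose .fna file actually exists.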

          Assembly
             |
          Length filter
             |        \
             |         \
          VirFinder  VirSorter
             |         /
             |        /
          Parsing virus files
                   |
                   |
                Prodigal             -- S
                   |    \               u
               HMMscan   \              b
                   |      \             W
            Modification   |            o
                   |      /             r
                   |     /              k
                  Annotation            F
                     |    \             l
                     |     \            o
                  Mapping   Assign   -- w
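
The diagram above can also be written as a dependency graph and sorted into a valid execution order. This is a sketch of one reading of the diagram, not the pipeline's CWL definition: the `DEPENDS_ON` mapping is a hypothetical name, and the edges (including the direct one from Prodigal to Annotation) are read off the drawing. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on, as drawn in the diagram.
DEPENDS_ON = {
    "Length filter": {"Assembly"},
    "VirFinder": {"Length filter"},
    "VirSorter": {"Length filter"},
    "Parsing virus files": {"VirFinder", "VirSorter"},
    "Prodigal": {"Parsing virus files"},
    "HMMscan": {"Prodigal"},
    "Modification": {"HMMscan"},
    "Annotation": {"Modification", "Prodigal"},
    "Mapping": {"Annotation"},
    "Assign": {"Annotation"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Steps with no path between them, such as VirFinder and VirSorter or Mapping and Assign, may appear in either order and could run in parallel.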
                                              

Example output directory structure

