Fonda

Fonda is a framework for scalable, automated analysis of multiple types of next-generation sequencing (NGS) data.

Fonda Prebuilt binaries
Required environment setup
Build Fonda
Fonda installation
Available workflows in Fonda
Before running Fonda…
Run Fonda
Contributors
Publications

Fonda Prebuilt binaries

All binaries built by the CI process (described in CONTRIBUTING.md) are available from the Download page and the GitHub Release page.

Required environment setup

Unix
Java 8
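
A quick way to confirm the Java requirement is to parse the version string printed by `java -version`. The sketch below is illustrative and not part of Fonda: the helper name `java_major_version` is hypothetical, and it assumes the common version-string shapes `1.8.0_292` (Java 8 and earlier) and `11.0.2` (Java 9 and later).

```shell
#!/bin/sh
# Hypothetical helper (not part of Fonda): extract the major Java version
# from a version string such as "1.8.0_292" (major 8) or "11.0.2" (major 11).
java_major_version() {
    version=$1
    case "$version" in
        1.*) version=${version#1.} ;;   # legacy scheme: drop the leading "1."
    esac
    # Keep only the field before the first dot.
    printf '%s\n' "${version%%.*}"
}

# Example use (commented out so the sketch has no side effects):
# raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
# [ "$(java_major_version "$raw")" = "8" ] || echo "Fonda lists Java 8 as its requirement" >&2
```

Note that `java -version` writes to standard error, which is why the commented example redirects it with `2>&1`.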

Build Fonda

To run all unit and integration tests, use the command:

./gradlew test

To run all unit and integration tests, perform static source code analysis (via PMD), check adherence to the coding standard (via Checkstyle), and measure code coverage (via JaCoCo), use the command:

./gradlew check

To build Fonda, run the command:

./gradlew clean build zip

clean - deletes the Fonda build directory for a fresh compile
build - creates the Fonda .jar file and the src folder in build/libs
zip - packs the Fonda .jar file and the src folder into a zip file located in build/distributions

Note: before building a specific Fonda version, check that the version set in the build.gradle file is correct.

Fonda installation

The Fonda package contains two components:

Fonda .jar file
src folder

If the src_scripts option in the global config is not set, make sure the src folder and the .jar file are placed in the same parent directory. This is necessary because some pipelines call external scripts from the src folder (python and R subfolders).
Before executing a specific Fonda pipeline, make sure its software prerequisites are properly installed. The required software and databases are listed in the global_config files.
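
The layout requirement above can be checked before a run. The following is a minimal sketch, not part of Fonda: the helper name `check_fonda_layout` is hypothetical, and it assumes the jar follows the `fonda-<VERSION>.jar` naming used elsewhere in this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fonda): succeed only if DIR holds both a
# src folder and at least one fonda-*.jar file, which is the layout needed
# when src_scripts is not set in the global config.
check_fonda_layout() {
    dir=$1
    [ -d "$dir/src" ] || { echo "missing: $dir/src" >&2; return 1; }
    for jar in "$dir"/fonda-*.jar; do
        [ -f "$jar" ] && return 0      # at least one matching jar exists
    done
    echo "missing: $dir/fonda-*.jar" >&2
    return 1
}
```

For example, `check_fonda_layout /path_to_data/fonda/<VERSION>` returns a non-zero status and names the missing component if either piece is absent.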

Available workflows in Fonda

Workflow	Description
DnaCaptureVar_Fastq	DNA Captured sequencing data for genomic variant detection using fastq data
DnaCaptureVar_Bam	DNA Captured sequencing data for genomic variant detection using bam data
DnaAmpliconVar_Fastq	DNA Amplicon sequencing data for genomic variant detection using fastq data
DnaAmpliconVar_Bam	DNA Amplicon sequencing data for genomic variant detection using bam data
DnaWgsVar_Fastq	DNA whole genome sequencing data for genomic variant detection using fastq data
DnaWgsVar_Bam	DNA whole genome sequencing data for genomic variant detection using bam data
RnaCaptureVar_Fastq	RNA Captured sequencing data for genomic variant detection using fastq data
HlaTyping_Fastq	DNA sequencing data for genomic HLA type prediction using fastq data
Bam2Fastq	Convert bam file to fastq files
RnaExpression_Fastq	RNA sequencing data for gene expression analysis using fastq data
RnaExpression_Bam	RNA sequencing data for gene expression analysis using bam data
scRnaExpression_Fastq	Single cell RNA sequencing data for gene expression analysis using fastq data
scRnaExpression_CellRanger_Fastq	10X single cell RNA/TCR/BCR sequencing data for gene expression and immune profiling analysis using fastq data
scRnaExpression_Bam	Single cell RNA sequencing data for gene expression analysis using bam data
RnaFusion_Fastq	RNA sequencing data for gene fusion detection using fastq data
TcrRepertoire_Fastq	DNA or RNA sequencing data for TCR or BCR repertoire detection using fastq data

Before running Fonda…

Show help message

java -jar fonda-<VERSION>.jar -help

Possible options:

Option	Description
Required
-global_config <arg>	Configuration file for the particular workflow
-study_config <arg>	Configuration file for the specific study
Optional
-detail	Show the details of the Fonda framework
-local	Default: no. Run the job on the local machine
-test	Default: no. Test the commands without actually running the job
-sync	Default: no. Run Fonda in synchronous mode, waiting for all tasks to complete
-master	Default: no. Run the main master script to manage all scripts created by Fonda
-help	Show the help message

Elaboration of required config arguments

-global_config file - sets the configuration file for a particular pipeline version (such as RnaExpression_Fastq 1095.1). The config file has four sections:

[all_tools] - contains paths to used tools
[Databases] - contains input data/paths to input datasets
[Pipeline_Info] - contains workflow and toolset settings
[Queue_Parameters] - contains SGE (Sun Grid Engine) settings

To change a parameter, create and record a new pipeline version rather than editing an existing one. Different studies can share the same pipeline version.
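
The four-section layout described above can be sanity-checked before launching a run. The sketch below is illustrative and not part of Fonda: the helper name `missing_global_sections` is hypothetical, and it only looks for the section headers as literal text, without validating anything inside them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fonda): print each of the four expected
# section headers that does not appear in the given global_config file.
missing_global_sections() {
    config=$1
    for section in all_tools Databases Pipeline_Info Queue_Parameters; do
        # -F: treat the header as a fixed string, -q: print nothing on match
        grep -Fq "[$section]" "$config" || printf '%s\n' "[$section]"
    done
}
```

An empty result means all four headers were found; otherwise each missing header is printed on its own line.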

The available parameter options for the global_config files are listed here.
Examples of global_config files can be found here.

Keep in mind that each global_config file includes only the tools and databases required to execute that specific pipeline version.
For example, global_config_RnaExpression_Fastq_v1.1.txt may list the databases, tools and parameters for RnaExpression_Fastq pipeline version 1.1. Later on, global_config_RnaExpression_Fastq_v1.2.txt may be prepared for RnaExpression_Fastq pipeline version 1.2. The databases, tools and parameters required in the second config might be quite different from those in the first.
Therefore, all potential databases, tools and parameter options for each available workflow should be listed so that users can take full advantage of Fonda in different projects.

The line_ending option in the [Pipeline_Info] section controls line-ending behavior. It accepts LF (Unix-style end-of-line marker) or CRLF (Windows-style end-of-line marker). If the option is not specified, LF is used by default.
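
To confirm which end-of-line marker a given file actually uses, a small check like the one below may help. It is a sketch and not part of Fonda: the helper name `detect_line_ending` is hypothetical, and a file with mixed endings is reported as CRLF.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fonda): report whether a text file uses
# CRLF or LF line endings by looking for a carriage return at end of line.
detect_line_ending() {
    cr=$(printf '\r')                 # a literal carriage return character
    if grep -q "$cr\$" "$1"; then     # CR immediately before end of line
        echo CRLF
    else
        echo LF
    fi
}
```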

-study_config file - sets the configuration file for a particular study, that is, the study whose NGS data is to be analyzed. This config file has one section: [Series_Info].
Required parameters for each workflow:

Parameter	Description
job_name	Sets the job ID
dir_out	Sets the output directory for the analysis
fastq_list / bam_list	Sets the path to the input manifest file
LibraryType	Sets the sequencing library type - DNAWholeExomeSeq_Paired, DNAWholeExomeSeq_Single, DNATargetSeq_Paired, DNATargetSeq_Single, DNAAmpliconSeq_Paired, RNASeq_Paired, RNASeq_Single, etc.
DataGenerationSource	Sets the data generation source - Internal, IGR, Broad, etc.
Date	Sets the sequencing run date
Project	Sets the project ID
Run	Sets the run ID
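
The required parameters in the table above can be checked mechanically. The sketch below is not part of Fonda: the helper name `missing_study_params` is hypothetical, and it assumes each parameter is written as `name = value` on its own line, which this README does not confirm (see the linked study_config examples for the real syntax).

```shell
#!/bin/sh
# Hypothetical helper (not part of Fonda): print every required study_config
# parameter that is not assigned in the given file.
# Assumption: parameters are written as "name = value", one per line.
missing_study_params() {
    config=$1
    for key in job_name dir_out LibraryType DataGenerationSource Date Project Run; do
        grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$config" || printf '%s\n' "$key"
    done
    # The input manifest may be given as either fastq_list or bam_list.
    grep -Eq '^[[:space:]]*(fastq_list|bam_list)[[:space:]]*=' "$config" \
        || printf '%s\n' "fastq_list / bam_list"
}
```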

The format of the input manifest files is described here.
Examples of study_config files can be found here.

Elaboration of additional arguments

-help - to show the help message
-detail - to show the workflow details available in the current Fonda framework
-local - to run the job on the local machine without being submitted to the cluster
-test - to have a pilot run in the command line interface without actually submitting jobs to the cluster

Run Fonda: actual example for RnaExpression_Fastq workflow

Test mode

java -jar /path_to_data/fonda/<VERSION>/fonda-<VERSION>.jar -global_config /path_to_data/fonda/global_config/global_config_RnaExpression_Fastq_v1.1.txt -study_config /path_to_data/config_RnaExpression_Fastq_test.txt -test

In test mode, no jobs are submitted to the cluster. Instead, you can check whether the contents of each generated shell script are properly organized, which is useful for debugging.

Submit jobs to cluster

java -jar /path_to_data/fonda/<VERSION>/fonda-<VERSION>.jar -global_config /path_to_data/fonda/global_config/global_config_RnaExpression_Fastq_v1.1.txt -study_config /path_to_data/config_RnaExpression_Fastq_test.txt

Local machine mode

java -jar /path_to_data/fonda/<VERSION>/fonda-<VERSION>.jar -global_config /path_to_data/fonda/global_config/global_config_RnaExpression_Fastq_v1.1.txt -study_config /path_to_data/config_RnaExpression_Fastq_test.txt -local

In local machine mode, the individual jobs run on the local machine instead of being submitted to the cluster.
The generated scripts are the same as in cluster mode; the only difference is where the jobs run. This is useful for debugging.
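
The three invocations above differ only in the trailing flag, so they can be generated from one place. The following is a minimal sketch, not part of Fonda: the function name `fonda_command` and the mode names `test`, `local` and `cluster` are hypothetical, while the `-global_config`, `-study_config`, `-test` and `-local` options are the ones documented in this README. Paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Fonda): print the java command line for
# the requested mode so it can be reviewed before it is run.
# Modes: test adds -test, local adds -local, cluster adds no extra flag.
fonda_command() {
    jar=$1 global_config=$2 study_config=$3 mode=${4:-cluster}
    case "$mode" in
        test)    flag=" -test" ;;
        local)   flag=" -local" ;;
        cluster) flag="" ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    printf 'java -jar %s -global_config %s -study_config %s%s\n' \
        "$jar" "$global_config" "$study_config" "$flag"
}
```

Printing the command instead of executing it keeps the sketch free of side effects; starting with `test` mode before switching to `cluster` follows the debugging advice above.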

Contributors

Shu Yan ¹
Tenghui Chen ¹
Joon Sang Lee ¹
Chandra Sekhar Pedamallu ¹
Mark Magid ¹
Quan Wan ¹
Ei-Wen Yang ¹
Donald Jackson ¹
Jack Pollard ¹
Aleksandr Sidoruk ²
Mariia Zueva ²
Mikhail Alperovich ²
Yulia Kamyshova ²

¹ Sanofi, 270 Albany Street, Cambridge, MA, USA

² EPAM Systems, Inc.

Publications

Links to publications that contain Fonda references

A Comprehensive Sample Tracking and Data Processing Workflow for Next Generation Sequencing
