
MetamORF: A repository of unique short Open Reading Frames identified by both experimental and computational approaches for gene-level and meta analysis

Goal of the repository

This GitHub repository contains the instructions and material to reproduce the build of the MetamORF database (web interface available at http://metamorf.hb.univ-amu.fr). Extensive documentation, source code, scripts and containers are available in this repository. Built Docker/Singularity images are available for download, and the required data may be downloaded from their original sources. The instructions necessary to reproduce the analysis are provided below.

To build the database, you first need to prepare the environments and then follow the steps described below.

Description of the datasets

In order to reproduce the build of MetamORF, it is first necessary to download data from the 6 original data sources. Data have to be downloaded manually from the editors' or authors' websites.

| Name of the data source | Species | Associated publication | DOI | URL to the publication | Description |
| --- | --- | --- | --- | --- | --- |
| Erhard2018 | H.sapiens | Erhard et al., Nat. Meth., 2018 | 10.1038/nmeth.4631 | https://www.nature.com/articles/nmeth.4631 | "Supplementary Table 3: Identified ORFs (Union of all ORFs detected either by PRICE,RP-BP or ORF-RATER, or contained in the annotation (Ensembl V75))". The first two lines of the file have to be removed manually. |
| Johnstone2016 | H.sapiens | Johnstone et al., EMBO, 2016 | 10.15252/embj.201592759 | http://emboj.embopress.org/content/35/7/706.long | "Dataset EV2: Location and translation data for all analyzed transcripts and ORFs in human" |
| Johnstone2016 | M.musculus | Johnstone et al., EMBO, 2016 | 10.15252/embj.201592759 | http://emboj.embopress.org/content/35/7/706.long | "Dataset EV3: Location and translation data for all analyzed transcripts and ORFs in mouse" |
| Laumont2016 | H.sapiens | Laumont et al., Nat. Commun., 2016 | 10.1038/ncomms10238 | https://www.nature.com/articles/ncomms10238 | "Supplementary Data 2: List of all cryptic MAPs detected in subject 1. Table presenting the genomic and proteomic features of all cryptic MAPs". The first two rows have to be removed manually. |
| Mackowiak2015 | H.sapiens | Mackowiak et al., Genome Biol., 2015 | 10.1186/s13059-015-0742-x | https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0742-x | "Additional file 2: Table S1. All sORF information for human". The file header (first 45 rows) has to be removed manually. |
| Mackowiak2015 | M.musculus | Mackowiak et al., Genome Biol., 2015 | 10.1186/s13059-015-0742-x | https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0742-x | "Additional file 3: Table S2. All sORF information for mouse". The file header (first 45 rows) has to be removed manually. |
| Samandi2017 | H.sapiens | Samandi et al., eLIFE, 2017 | 10.7554/eLife.27860 | https://elifesciences.org/articles/27860 | "Homo sapiens alternative protein predictions based on RefSeq GRCh38 (hg38) based on assembly GCF_000001405.26. Release date 01/01/2016". The TSV file has been used. |
| Samandi2017 | M.musculus | Samandi et al., eLIFE, 2017 | 10.7554/eLife.27860 | https://elifesciences.org/articles/27860 | "Mus musculus alternative protein predictions based on annotation version GRCm38. Release date 01/01/2016". The TSV file has been used. |
| sORFs_org_Human | H.sapiens | Olexiouk et al., Nucl. Ac. Res., 2018 | 10.1093/nar/gkx1130 | https://academic.oup.com/nar/article/46/D1/D497/4621340 | H. sapiens database downloaded from sORFs.org using the BioMart graphical user interface. The following parameters were used to query the database: "Homo sapiens" > "no filters" > "select all MAIN_ATTRIBUTES" > "results" > "download data". |
| sORFs_org_Mouse | M.musculus | Olexiouk et al., Nucl. Ac. Res., 2018 | 10.1093/nar/gkx1130 | https://academic.oup.com/nar/article/46/D1/D497/4621340 | M. musculus database downloaded from sORFs.org using the BioMart graphical user interface. The following parameters were used to query the database: "Mus musculus" > "no filters" > "select all MAIN_ATTRIBUTES" > "results" > "download data". |

System requirements and dependencies

The source code must be executed with Python 2.7. The MetamORF database has been built on a Linux system using Docker and Singularity containers, but it can be run on any operating system where the dependencies are satisfied. A minimum of 62 GB of RAM and 12 threads is recommended to run the software; if you intend to build the full database, we strongly recommend a system with at least 40 threads and 192 GB of RAM. A stable Internet connection is also required, as some information is queried from databases accessible online.
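
For instance, on a Linux system, the available resources can be quickly checked with the following standard commands:

nproc     # number of available threads
free -g   # total and available RAM, in GiB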

Environment preparation

In order to prepare the environment for analysis execution, it is required to:

  1. Clone the current GitHub repository and set the WORKING_DIR environment variable
  2. Download the data sources and the cross-references
  3. Download the Docker image (.tar.gz) and Singularity image (.img) files
  4. Install Docker, [Docker-compose](https://docs.docker.com/compose/) and Singularity
  5. Load the Docker images on your system and start the containers

This section provides additional information for each of these steps.

Clone the GitHub repository

Use your favorite method to clone this repository in a chosen folder (see the GitHub documentation for more information). This will create a folder called MetamORF containing all the code and documentation.
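
For instance, on Linux, the repository may be cloned over HTTPS with the following command (assuming this repository's GitHub URL; adapt the path and protocol to your setup):

cd /home/choteaus/workspace \
  && git clone https://github.com/TAGC-NetworkBiology/MetamORF.git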

Then, set an environment variable called WORKING_DIR with a value set to the path to this folder. For instance, if you cloned the repository in /home/choteaus/workspace, then the WORKING_DIR variable needs to be set to /home/choteaus/workspace/MetamORF.

On Linux:

export WORKING_DIR=/home/choteaus/workspace/MetamORF

Download the data sources and the cross-references

Cross-references

Cross-references have been downloaded manually from the HGNC and NCBI websites. Cross-references from the HUGO Gene Nomenclature Committee (HGNC) have been downloaded using the graphical user interface. Cross-references for M.musculus from the National Center for Biotechnology Information (NCBI) may be downloaded using the following command line (on Linux):

mkdir -p $WORKING_DIR/07_input/cross_references \
  && curl -o $WORKING_DIR/07_input/cross_references/mmusculus.gene_info.gz \
    ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz \
  && gunzip $WORKING_DIR/07_input/cross_references/mmusculus.gene_info.gz

As these databases evolve quickly, a copy of the cross-reference files we used to build MetamORF is available on Zenodo (DOI) as a .tar.gz archive. On Linux, run the following command line to download the archive and uncompress it at the appropriate location:

wget https://zenodo.org/record/4014738/files/MetamORF_07_input.tar.gz?download=1 -O $WORKING_DIR/MetamORF_07_input.tar.gz \
  && tar xzvf $WORKING_DIR/MetamORF_07_input.tar.gz --directory $WORKING_DIR \
  && rm $WORKING_DIR/MetamORF_07_input.tar.gz

| Name of the cross-reference | Species | Filename |
| --- | --- | --- |
| HGNC | H.sapiens | hsapiens_HGNC.txt |
| NCBI | M.musculus | mmusculus.gene_info |

Data sources

Raw data need to be downloaded manually from the editors' or authors' websites using the information provided in the Description of the datasets section of this documentation. Once downloaded, the files need to be saved in the $WORKING_DIR/07_input/ORF_datasources folder and renamed according to the following rules (an example command is sketched after the table):

| Name of the data source | Species | Filename |
| --- | --- | --- |
| Erhard2018 | H.sapiens | hsapiens_Erhard2018.csv |
| Johnstone2016 | H.sapiens | hsapiens_Johnstone2016.txt |
| Johnstone2016 | M.musculus | mmusculus_Johnstone2016.txt |
| Laumont2016 | H.sapiens | hsapiens_Laumont2016.csv |
| Mackowiak2015 | H.sapiens | hsapiens_Mackowiak2015.txt |
| Mackowiak2015 | M.musculus | mmusculus_Mackowiak2015.txt |
| Samandi2017 | H.sapiens | hsapiens_Samandi2017.tsv |
| Samandi2017 | M.musculus | mmusculus_Samandi2017.tsv |
| sORFs_org_Human | H.sapiens | hsapiens_sORFs.org.txt |
| sORFs_org_Mouse | M.musculus | mmusculus_sORFs.org.txt |
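
For instance, on Linux, saving and renaming one downloaded file could look like the following sketch (the source filename Dataset_EV2.txt is hypothetical and depends on the file served by the editor's website):

mkdir -p $WORKING_DIR/07_input/ORF_datasources \
  && mv ~/Downloads/Dataset_EV2.txt \
    $WORKING_DIR/07_input/ORF_datasources/hsapiens_Johnstone2016.txt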

Once done, you should obtain the following subfolder structure:

07_input
├── cross_references
│   ├── hsapiens_HGNC.txt
│   └── mmusculus.gene_info
└── ORF_datasources
    ├── hsapiens_Erhard2018.csv
    ├── hsapiens_Johnstone2016.txt
    ├── mmusculus_Johnstone2016.txt
    ├── hsapiens_Laumont2016.csv
    ├── hsapiens_Mackowiak2015.txt
    ├── mmusculus_Mackowiak2015.txt
    ├── hsapiens_Samandi2017.tsv
    ├── mmusculus_Samandi2017.tsv
    ├── hsapiens_sORFs.org.txt
    └── mmusculus_sORFs.org.txt

Download the Docker and Singularity images

The Docker image tar files and the Singularity img file are available for download on Zenodo (DOI).

If you intend to build the containers yourself, the dockerfiles and their documentation are available in the 02_container folder. Note that some links in the dockerfiles may be obsolete, so we strongly suggest using the containers stored on Zenodo.

To download the containers from Zenodo, open a shell and execute the following commands to download the tarball file and untar it (on Linux):

rm -R $WORKING_DIR/02_container \
  && wget https://zenodo.org/record/4014738/files/MetamORF_02_container.tar.gz?download=1 -O $WORKING_DIR/MetamORF_02_container.tar.gz \
  && tar xzvf $WORKING_DIR/MetamORF_02_container.tar.gz --directory $WORKING_DIR \
  && rm $WORKING_DIR/MetamORF_02_container.tar.gz

These commands will replace the 02_container folder and you should obtain the following subfolder structure:

02_container/
├── mysql
│   ├── docker-compose.yml
│   └── readme.txt
├── script
│   ├── 01_crossreferences_download
│   │   ├── dockerfile
│   │   └── readme.txt
│   └── 03_orf_datasources_analysis
│       ├── dockerfile
│       └── readme.txt
└── src
    ├── dockerfile
    ├── readme_dev.txt
    ├── readme.txt
    └── tagc-uorf-orf-datafreeze-src.img

Install Docker and Singularity

You need to install Docker (v18.09), Docker-compose (v1.24) and Singularity (v2.5) on your system. Please read their official documentation for more information.
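
Once installed, the versions available on your system can be checked with:

docker --version \
  && docker-compose --version \
  && singularity --version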

Load Docker images on the system

In order to build the MetamORF database, you must load the provided Docker images into your local Docker installation. Docker must therefore be installed on your system.

Start the MySQL service

  1. Update lines 12 and 13 of the 02_container/mysql/docker-compose.yml file. The ~/MetamORF/data/MySQL path needs to be changed to the location where you want to store the MySQL files. Lines 11 to 13 may be removed if you do not intend to mount a volume on the container.
  2. Optionally, update lines 15 to 17 to change the MySQL user name (line 15), user password (line 16) or root password (line 17).
  3. Optionally, update the ports used by the server. If you do not know which ports to use, we strongly suggest keeping the ones defined there. A quick way to validate the edited file is sketched after this list.
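
After editing the file, Docker Compose can parse and print the resolved configuration as a quick sanity check (this only validates the syntax and assumes the command is run from the folder containing docker-compose.yml):

cd $WORKING_DIR/02_container/mysql \
  && docker-compose config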

Load the MySQL Docker image using one of the following commands (on Linux):

docker load -i $WORKING_DIR/02_container/mysql/mysql:8.0.16.tar.gz

or

docker pull mysql:8.0.16

Load the Adminer Docker image using one of the following commands (on Linux):

docker load -i $WORKING_DIR/02_container/mysql/adminer:4.7.1.tar.gz

or

docker pull adminer:4.7.1

Start the containers using the following command (on Linux):

cd $WORKING_DIR/02_container/mysql \
  && docker-compose up -d

Please note that Adminer is not necessary to build the database; it is only provided to allow the use and management of the database through a user-friendly web interface. More recent versions of Adminer or other services (such as phpMyAdmin) may therefore be used instead. You may also wish to run solely the MySQL container, without Adminer.
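
You may check that the containers are up and running with the standard Docker command:

docker ps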

Source code docker image

To run the source code, we advise using the Singularity image, which does not require root privileges. Nevertheless, if you prefer to use Docker instead of Singularity, the archives necessary to do so are available. In that case, you need to load the Docker image and create a new container. Please read the documentation associated with the containers and the official Docker documentation for more information.
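
A minimal sketch of this Docker alternative, assuming the archive path shown in the tree view at the end of this document (the exact image name and tag are printed by docker load; the <image:tag> placeholder must be replaced accordingly):

# Load the image from the provided archive
docker load -i $WORKING_DIR/02_container/src/tagc-uorf-orf-datafreeze-src.tar.gz
# List the loaded images to retrieve the exact image name and tag
docker images
# Create a container mounting the working directory (replace <image:tag>)
docker run -it -v $WORKING_DIR:$WORKING_DIR <image:tag>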

Other containers

Other containers provided in this repository are not necessary to run the source code and build the database. They have been created to allow the download of cross-references (from NCBI, Ensembl and MGI) and to estimate the proportion of short Open Reading Frames that are spliced in each data source. Please read their documentation for more information.

Dependencies

If you do not wish to use Docker and Singularity images, then you need to ensure the following dependencies are successfully installed on your system:

  • Python 2.7, with packages:
    • SQLAlchemy
    • Pandas
    • PyEnsembl
    • PyBiomart
    • PyLiftOver
    • wget
    • statistics
    • BioPython
    • mysql-connector-python
    • pathos
  • R, with packages:
    • getopt
    • devtools
    • Bioconductor: ensembldb, AnnotationHub
  • MySQL
  • SQLite3
  • MUSCLE (Multiple sequence alignment software)
  • UCSC utils
    • fetchChromSizes
    • bedToBigBed

Please note that we highly recommend using the Docker and Singularity images we provide in order to ensure the reproducibility of the results. Please see the official documentation of these software packages for more information regarding their installation.
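
If you nevertheless choose a manual installation, the Python packages listed above can be installed with pip. A minimal sketch (the names below are the usual PyPI package names; versions are not pinned here and may need to be adjusted for Python 2.7 compatibility):

pip install SQLAlchemy pandas pyensembl pybiomart pyliftover wget \
  statistics biopython mysql-connector-python pathos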

Build the database - Quick start

Extensive documentation about the tools provided by the MetamORF builder is available in the manual. This section provides the most important steps and some complementary information to build the MetamORF database. The source code needs to be run once for each species and will result in the creation of two databases (DS and PRO) for each.

Prepare the configuration file

The config file needs to be edited manually. Examples of config files are available in the 04_config folder.

The following lines need to be checked / updated:

  • In the DATABASE section:

    • The name of the DS database: DS_DATABASE_NAME.
    • The name of the PRO database: PRO_DATABASE_NAME.
    • The name of the species: DATABASE_SPECIES (Hsapiens and Mmusculus allowed).
    • The IP and the port of the MySQL host: DATABASE_HOST_IP and DATABASE_PORT (NB: SQLite database may also be used, see the manual for more information).
    • The database username and password: DATABASE_USER_NAME and DATABASE_USER_PASSWD.
  • In the GENE_LIST and DATASOURCE sections, all the occurrences of $WORKING_DIR must be replaced by its actual value (i.e. by the absolute path to the working directory), as sketched below.
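
For instance, on Linux, the following sketch replaces these occurrences in the H.sapiens config file (assuming 04_config/HsapiensConfigfile is the file to edit and that it contains the literal string $WORKING_DIR):

sed -i "s|\$WORKING_DIR|$WORKING_DIR|g" $WORKING_DIR/04_config/HsapiensConfigfile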

Numerous options may be set in the config file. Please read the manual for more information regarding this topic. If you do not know how to configure the config file, we advise keeping the default settings.

Prepare the running scripts

The MetamORF H.sapiens and M.musculus databases have been built by sequentially running several strategies, in the following order:

  • DatabaseCheck
  • Insertion
  • LiftOver
  • Merge
  • ComputeMissingInfo
  • ComputeRelCoord
  • ComputeKozakContext
  • AnnotateORF
  • GenerateBEDContent

If you intend to run the strategies in a different order or to run other strategies, update 03_workflow/datafreeze/full_build.sh or create a new bash script in this folder. The 03_workflow/datafreeze/model.sh file may be used as a template to create a new script. Several other scripts are available in this folder if you wish to run a particular strategy.

To get more information about the available strategies, their execution, the options that may be used, and their inputs and outputs, please read the manual available in PDF and HTML formats in 01_documentation/manual. If you do not know which strategies to run, we advise using the full_build.sh script.

Build the database

To build the database, move to the working directory and start the running script previously created.

You may use one of the following command lines (on Linux):

singularity exec 02_container/src/tagc-uorf-orf-datafreeze-src.img \
  03_workflow/datafreeze/full_build_min.sh \
  --config $configfileName \
  > 09_log/MetamORF.log

or

singularity exec 02_container/src/tagc-uorf-orf-datafreeze-src.img \
  03_workflow/datafreeze/full_build.sh \
  --config $configfileName \
  --dbtype MySQL \
  --dsdbname $DS_DB \
  --prodbname $PRO_DB \
  --dbhost $DB_HOST \
  --dbport $DB_PORT \
  --dbuser $DB_USER \
  --dbpassword $DB_PASSWD \
  > 09_log/MetamORF.log

The variable $configfileName needs to be replaced by the name of the config file.

The variables $DS_DB, $PRO_DB, $DB_HOST, $DB_PORT, $DB_USER and $DB_PASSWD need to be replaced by the appropriate values. Note that the options to provide may depend on the strategy. This second command line allows adding information about the release to the Metadata table of the databases.
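
For instance, the variables could be declared as follows (a sketch with example values; adapt them to your MySQL configuration and config file, or see the 03_workflow/datafreeze/declare_variables.sh script):

export configfileName=$WORKING_DIR/04_config/HsapiensConfigfile
export DS_DB=DS_Hsapiens          # example DS database name
export PRO_DB=PRO_Hsapiens        # example PRO database name
export DB_HOST=127.0.0.1
export DB_PORT=3306
export DB_USER=metamorf           # example user name, as defined in docker-compose.yml
export DB_PASSWD=my_password      # example password, as defined in docker-compose.yml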

The same procedure should be followed for both H.sapiens and M.musculus, using the appropriate configuration files.

Additional information

Documentation

A user manual is available in PDF and HTML formats in 01_documentation/manual and provides extensive information about the methods.

Documentation dedicated to developers is available in the 01_documentation/src/html folder (generated with Doxygen). Open the index.html file with a web browser to display and navigate through this documentation.
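
On most Linux desktops, for instance (assuming xdg-open is available):

xdg-open $WORKING_DIR/01_documentation/src/html/index.html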

Dates of download

Data sources

| Name of the data source | Species | Date of download |
| --- | --- | --- |
| Erhard2018 | H.sapiens | 04/01/2019 |
| Johnstone2016 | H.sapiens | 04/01/2019 |
| Johnstone2016 | M.musculus | 04/01/2019 |
| Laumont2016 | H.sapiens | 04/01/2019 |
| Mackowiak2015 | H.sapiens | 20/03/2019 |
| Mackowiak2015 | M.musculus | 20/03/2019 |
| Samandi2017 | H.sapiens | 04/01/2019 |
| Samandi2017 | M.musculus | 04/01/2019 |
| sORFs_org_Human | H.sapiens | 08/06/2020 |
| sORFs_org_Mouse | M.musculus | 08/06/2020 |

Cross-references

| Name of the cross-reference | Species | Date of download |
| --- | --- | --- |
| HGNC | H.sapiens | 27/06/2019 |
| NCBI | M.musculus | 06/06/2020 |

Tree view

At the end of the procedure, i.e. after having successfully:

  • Cloned the GitHub repository
  • Downloaded the data sources and cross-references
  • Downloaded the Docker and Singularity images
  • Installed Docker, Docker-compose and Singularity
  • Loaded the Docker images on your system and started the containers
  • Run the source code to build the databases for one species (either H.sapiens or M.musculus)

the folder tree should look like the following:

.
│
├── 01_documentation                            [Documentation of the project]
│   ├── manual                                  [User's manual]
│   │   ├── manual.html
│   │   └── manual.pdf
│   ├── metadata                                [Metadata of the ORF datafreeze]
│   │   ├── DCMI_schema.xsd
│   │   └── metadata.xml
│   ├── src                                     [Source code documentation, generated with Doxygen in HTML format]
│   │   └── html
│   └── workbook                                [Useful information regarding the source code]
│       ├── appendices                          [Workbook appendices helping development and use of the source code]
│       │   ├── cell_contexts                   [Information regarding the cell contexts registered in the database]
│       │   │   └── cell_context.csv
│       │   ├── db_schema                       [Database UML schemas]
│       │   │   └── database_schema.pdf
│       │   ├── log_codes                       [List of error and warning codes that could be logged by the source code]
│       │   │   └── LogCodes.ods
│       │   └── miscellaneous                   [Miscellaneous information that may help users and developers]
│       │       └── Ensembl_biotypes.csv
│       ├── computation_times.csv               [Information regarding the expected execution time of the source code]
│       └── datafreeze_workflow.png             [Advised datafreeze workflow in PNG format]
├── 02_container                                [Dockerfiles and Singularity images]
│   ├── mysql                                   [Docker compose to start MySQL and Adminer servers]
│   │   ├── adminer:4.7.1.tar.gz
│   │   ├── docker-compose.yml
│   │   ├── mysql:8.0.16.tar.gz
│   │   └── readme.txt
│   ├── script                                  [Containers for scripts]
│   │   ├── 01_crossreferences_download         [Containers for 01_crossreferences_download scripts]
│   │   │   ├── dockerfile
│   │   │   ├── readme.txt
│   │   │   └── tagc-uorf-orf-datafreeze-script-cross_ref_dl.tar.gz
│   │   └── 03_orf_datasources_analysis         [Containers for 03_orf_datasources_analysis scripts]
│   │       ├── dockerfile
│   │       ├── readme.txt
│   │       └── tagc-uorf-orf-datafreeze-script-orf_ds_analysis.tar.gz
│   └── src                                     [Container to run the source code]
│       ├── dockerfile
│       ├── readme.txt
│       ├── tagc-uorf-orf-datafreeze-src.img
│       └── tagc-uorf-orf-datafreeze-src.tar.gz
├── 03_workflow                                 [Scripts and files necessary to run the source code]
│   └── datafreeze                              [Scripts used to build the datafreeze]
│       ├── declare_variables.sh                [Script declaring the environment variables necessary to run the source code]
│       ├── model.sh                            [Template script to run one or several strategies]
│       ├── AnnotateORF.sh
│       ├── AssessDatabaseContent.sh
│       ├── ComputeKozakContext.sh
│       ├── ComputeMissingInfo.sh
│       ├── ComputeRelCoord.sh
│       ├── DatabaseCheck.sh
│       ├── full_build_min.sh
│       ├── full_build.sh
│       ├── GenerateBEDContent.sh
│       ├── GenerateBEDFile.sh
│       ├── GenerateFastaFile.sh
│       ├── GenerateGFFFile.sh
│       ├── GenerateStatFiles.sh
│       ├── GenerateTrackDbFile.sh
│       ├── Insertion.sh
│       ├── LiftOver.sh
│       ├── Merge.sh
│       └── ResumeMerge.sh
├── 04_config                                   [Config files necessary to run the source code]
│   ├── HsapiensConfigfile
│   └── MmusculusConfigfile
├── 05_script                                   [Scripts for pre-processing analysis]
│   ├── 01_crossreferences_download             [Scripts used to download the cross-references]
│   │   ├── DefaultOutputFolder.txt
│   │   ├── download.sh
│   │   ├── ensembl_gene_lists.R
│   │   └── readme.txt
│   ├── 02_orf_datasources_download             [Scripts used to download the ORF data sources - Empty]
│   └── 03_orf_datasources_analysis             [Scripts for a preliminary analysis of the ORF data sources]
│       ├── estimate_splicing
│       └── readme.txt
├── 06_src                                      [See the Doxygen documentation for full documentation of this folder's content]
├── 07_input                                    [Input files]
│   ├── cross_references                        [Cross-references]
│   │   ├── hsapiens_HGNC.txt
│   │   └── mmusculus.gene_info
│   └── ORF_datasources                         [ORF data sources]
│       ├── hsapiens_Erhard2018.csv
│       ├── hsapiens_Johnstone2016.txt
│       ├── hsapiens_Laumont2016.csv
│       ├── hsapiens_Mackowiak2015.txt
│       ├── hsapiens_Samandi2017.tsv
│       ├── hsapiens_sORFs.org.txt
│       ├── mmusculus_Johnstone2016.txt
│       ├── mmusculus_Mackowiak2015.txt
│       ├── mmusculus_Samandi2017.tsv
│       └── mmusculus_sORFs.org.txt
├── 08_output                                   [Output files]
│   └── datafreeze
│       ├── execution.log
│       ├── generefwarnings.log
│       ├── merged_data_analysis
│       │   ├── dsota_count_for_orf_tr.csv
│       │   └── orf_tr_count_for_dsota.csv
│       ├── content_consistency_assessment
│       │   ├── DSDatabaseAssessment.tsv
│       │   └── PRODatabaseAssessment.tsv
│       ├── bed_files
│       │   ├── MetamORF_Hsapiens.bed
│       │   └── MetamORF_Hsapiens_without_scaffold.bed
│       ├── fasta_files
│       │   ├── MetamORF_Hsapiens_aa.fasta
│       │   ├── MetamORF_Hsapiens_aa_wo_seq_with_stop.fasta
│       │   ├── MetamORF_Hsapiens_nt.fasta
│       │   └── MetamORF_Hsapiens_nt_wo_seq_with_stop.fasta
│       ├── track_files
│       │   ├── hg38.chrom.sizes
│       │   ├── MetamORF.as
│       │   ├── MetamORF.bb
│       │   ├── MetamORF.bed
│       │   └── trackDb.txt
│       └── stat_files
│           ├── log_code_counts.csv
│           └── log_level_counts.csv
└── README.md

NB: Descriptions of files and directories and additional information are provided between [brackets].
