Standardize_Metadata

A collection of tools to download, parse, and standardize sequence metadata from NCBI databases.
Written by Remi Marchand between May 13, 2016 and August 26, 2016.

Scope

By default, these tools operate on data from NCBI's Sequence Read Archive (SRA) database.
The database is available at http://www.ncbi.nlm.nih.gov/sra
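SRA metadata is delivered as XML. The sketch below gives a rough idea of what the parsing step involves. It pulls the accession, the organism, and the sample attribute TAG/VALUE pairs out of a trimmed record shaped like an SRA EXPERIMENT_PACKAGE. It is not taken from this repository. The sample record (including its placeholder accession) and the function name parse_samples are hypothetical, and real records carry many more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed record in the general shape of an SRA
# EXPERIMENT_PACKAGE; the accession is a placeholder, not a real sample.
SAMPLE_XML = """
<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <SAMPLE accession="SRS000000">
      <SAMPLE_NAME>
        <SCIENTIFIC_NAME>Escherichia coli</SCIENTIFIC_NAME>
      </SAMPLE_NAME>
      <SAMPLE_ATTRIBUTES>
        <SAMPLE_ATTRIBUTE><TAG>collection_date</TAG><VALUE>2015-06-01</VALUE></SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE><TAG>geo_loc_name</TAG><VALUE>Canada</VALUE></SAMPLE_ATTRIBUTE>
      </SAMPLE_ATTRIBUTES>
    </SAMPLE>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>
"""


def parse_samples(xml_text):
    """Return one dict per SAMPLE: accession, organism, and its TAG/VALUE attributes."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for sample in root.iter("SAMPLE"):  # matches <SAMPLE> only, not <SAMPLE_ATTRIBUTES>
        row = {
            "accession": sample.get("accession", ""),
            "organism": sample.findtext("SAMPLE_NAME/SCIENTIFIC_NAME", default=""),
        }
        # Each free-text attribute becomes its own column keyed by its TAG.
        for attr in sample.iter("SAMPLE_ATTRIBUTE"):
            tag = attr.findtext("TAG", default="").strip()
            if tag:
                row[tag] = attr.findtext("VALUE", default="").strip()
        rows.append(row)
    return rows
```

Calling parse_samples(SAMPLE_XML) returns a one-element list. Its dict has the keys accession, organism, collection_date, and geo_loc_name.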

Usage

metadata.py

Main program: queries the SRA and downloads XML files matching an organism name and date.
Usage: python metadata.py [options] (run python metadata.py -h to list the options)
Download in bulk: bash download.sh organism start_date end_date
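The README does not document how metadata.py talks to NCBI. A common route for an organism-plus-date query is the NCBI E-utilities. esearch returns the SRA UIDs that match the query, and efetch returns the XML for those UIDs. The sketch below only builds the request URLs. Several details are assumptions and do not reflect the script's actual behavior: the helper names esearch_url and efetch_url, the use of publication date (datetype=pdat) as the date filter, and the retmax default.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(organism, start_date, end_date, retmax=500):
    """URL for an esearch query: SRA records of `organism` published
    between start_date and end_date (both given as YYYY/MM/DD)."""
    params = {
        "db": "sra",
        "term": f'"{organism}"[Organism]',
        "datetype": "pdat",   # assumption: filter on publication date
        "mindate": start_date,
        "maxdate": end_date,
        "retmax": retmax,     # maximum number of UIDs returned
    }
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_url(uids):
    """URL for an efetch request returning the SRA XML for the given UIDs."""
    params = {"db": "sra", "id": ",".join(uids), "retmode": "xml"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)
```

Dates use the YYYY/MM/DD form that E-utilities accepts for mindate and maxdate. NCBI limits clients without an API key to a few requests per second, so any bulk loop over many date windows should pause between requests.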

Standard_Tools/standardize.py

Main program: standardizes the relevant columns of the input CSV files.
Usage: python standardize.py csv_files
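The README does not spell out which columns standardize.py touches or which rules it applies. As one hedged example of this kind of normalization, the sketch below rewrites a free-text date column into ISO 8601 and keeps the precision of the original value. The column name collection_date, the accepted DATE_FORMATS, and the function names are illustrative assumptions.

```python
import csv
import io
from datetime import datetime

# Hypothetical date spellings seen in free-text metadata; extend as needed.
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%d-%b-%Y", "%d %B %Y", "%b-%Y", "%Y-%m", "%Y"]


def standardize_date(value):
    """Return value as YYYY-MM-DD, YYYY-MM or YYYY; unrecognized values are returned unchanged."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Keep only the precision the original value carried.
        if "%d" in fmt:
            return parsed.strftime("%Y-%m-%d")
        if "%m" in fmt or "%b" in fmt:
            return parsed.strftime("%Y-%m")
        return parsed.strftime("%Y")
    return value


def standardize_column(csv_text, column):
    """Return a copy of the CSV text with the dates in `column` standardized."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = standardize_date(row[column])
        writer.writerow(row)
    return out.getvalue()
```

Values that match none of the formats are left unchanged, so nothing is silently discarded. A real pipeline might instead log these values for review.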

Installation

You may need to install the following modules

Add the repository to the Python path:

  • On a Mac: export PYTHONPATH="${PYTHONPATH}:Path_to_Standardize_Metadata"
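The export line lasts only for the current shell session. Adding it to a shell startup file such as ~/.bash_profile (or ~/.zshrc under zsh) makes it persistent. As an alternative, a script can put the repository on sys.path itself at run time, since the directories listed in PYTHONPATH end up in sys.path. A minimal sketch follows; add_repo_to_path is a hypothetical helper name.

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Prepend repo_dir to sys.path for the running interpreter only.

    Imports then resolve much as if the directory were listed in
    PYTHONPATH, but the change disappears when the process exits.
    """
    repo_dir = os.path.abspath(repo_dir)
    if repo_dir not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.insert(0, repo_dir)
    return repo_dir
```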
