DB Virus

Pipeline for transcriptomic data analysis with the goal of identifying novel virus sequences in publicly available data.

Architecture

The pipeline consists of three main steps:

  1. Searching SRA for a given query and downloading the results metadata
  2. Downloading the RNA sequences found by step 1
  3. Analyzing the sequences acquired by step 2

From a system design point of view, it makes sense to implement each step as a separate module, with adjacent modules agreeing on a shared contract. Step one promises step two that it will store the metadata in an agreed format, and step two promises step three that it will download and store the RNA sequences in another agreed format. In computer science parlance, this structure is usually called a 'microservices-oriented architecture', or simply 'microservices'. It stands in direct opposition to a monolith: a single, and probably quite large, program that handles all the tasks.

Each step will be implemented as a separate Python program.
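
The hand-off between steps can be illustrated with a minimal sketch. It is not the project's actual code: the directory layout, the function names (`search_step`, `download_step`, `analysis_step`, `run_demo`), the `fetch` callback and the placeholder accession IDs are all assumptions made for illustration. No network access is involved, so the SRA search and download are replaced by stand-ins. The point is only that each step reads what the previous step promised to write and knows nothing else about it.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical on-disk contract between the steps (an assumption, not the
# project's real layout):
#   step 1 writes <workdir>/metadata/<accession>.json
#   step 2 writes <workdir>/sequences/<accession>.fasta
#   step 3 writes <workdir>/reports/<accession>.json


def search_step(query, workdir, fake_hits):
    """Step 1: store one metadata file per search hit.

    A real implementation would query SRA; here the hits are passed in.
    """
    out = Path(workdir) / "metadata"
    out.mkdir(parents=True, exist_ok=True)
    for accession in fake_hits:
        record = {"accession": accession, "query": query}
        (out / f"{accession}.json").write_text(json.dumps(record))
    return out


def download_step(workdir, fetch):
    """Step 2: read step 1's metadata, store one sequence file per record.

    `fetch` is a stand-in for the real download and returns FASTA text.
    """
    meta_dir = Path(workdir) / "metadata"
    out = Path(workdir) / "sequences"
    out.mkdir(parents=True, exist_ok=True)
    for meta_file in sorted(meta_dir.glob("*.json")):
        accession = json.loads(meta_file.read_text())["accession"]
        (out / f"{accession}.fasta").write_text(fetch(accession))
    return out


def analysis_step(workdir):
    """Step 3: read step 2's sequences and write a tiny report for each.

    The 'analysis' here is just the sequence length, as a placeholder.
    """
    seq_dir = Path(workdir) / "sequences"
    out = Path(workdir) / "reports"
    out.mkdir(parents=True, exist_ok=True)
    for fasta in sorted(seq_dir.glob("*.fasta")):
        lines = fasta.read_text().splitlines()
        sequence = "".join(line for line in lines if not line.startswith(">"))
        report = {"accession": fasta.stem, "length": len(sequence)}
        (out / f"{fasta.stem}.json").write_text(json.dumps(report))
    return out


def run_demo():
    """Run the three steps in a temporary directory and return the reports."""
    with tempfile.TemporaryDirectory() as workdir:
        # Placeholder accession IDs, made up for the example.
        search_step("virus", workdir, fake_hits=["SRR0000001", "SRR0000002"])
        download_step(workdir, fetch=lambda acc: f">{acc}\nACGT\nACGT\n")
        report_dir = analysis_step(workdir)
        return {
            path.stem: json.loads(path.read_text())
            for path in sorted(report_dir.glob("*.json"))
        }


if __name__ == "__main__":
    print(run_demo())
```

Because the steps only meet at the file layout, any one of them could be rewritten, or run on a different machine, as long as it keeps writing files in the agreed place and format.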
