GitHub - rna-seq/raisin.recipe.extract: A Buildout recipe for loading data into the Raisin data warehouse

=====================
raisin.recipe.extract
=====================
-----------------------------------------
Extract Raisin data warehouse information
-----------------------------------------

**raisin.recipe.extract** extracts information for the Raisin data warehouse.

Background
==========

The Raisin project builds a data warehouse for Grape (Grape RNA-Seq Analysis Pipeline
Environment). Grape is a pipeline for processing and analyzing RNA-Seq data developed at 
the Bioinformatics and Genomics unit of the Centre for Genomic Regulation (CRG) in 
Barcelona. 

Important Note
==============

The raisin.recipe.extract package is a Buildout recipe used by Grape. It is not
a standalone Python package, and is only useful when installed by the
grape.buildout package.

To learn more about Grape, and to download and install it, go to the Bioinformatics 
and Genomics website at:

http://big.crg.cat/services/grape

Motivation
==========

The Raisin data warehouse is used to configure the Raisin web server. It is also
useful for any project that needs access to metadata about the RNA-Seq pipelines.

Here at the CRG, we configure all our RNA-Seq pipeline runs in a central place
before running the Grape pipelines. A data warehouse holding all the metadata
provides an overview of all RNA-Seq projects.

Installation
============

The raisin.recipe.extract package is already installed by grape.buildout, so
you don't have to install it yourself.

Configuration
=============

The buildout part that configures raisin.recipe.extract needs to know the
location of a few files and folders::

    [extract]
    recipe = raisin.recipe.extract
    workspace = ${buildout:directory}/etl/workspace
    annotations_file = ../../annotations/db.cfg
    genomes_file = ../../genomes/db.cfg
    pipeline_dumps = ../../workflows/pipelines/rnaseq_pipeline_common_dump/output/dump

The workspace is the folder where raisin.recipe.extract puts the extracted
information.

To enrich the annotation and genome information, a central file containing more
metadata can be read for each (annotations_file and genomes_file).

The pipeline dumps are MySQL database dumps that can optionally be extracted. If
they exist, they give additional information about the current state of the
database.
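
To check a configuration before running buildout, the options described above
can be read and resolved with a few lines of Python. This is only a minimal
sketch and not the recipe's own code: the function names
``read_extract_options`` and ``summarize`` are hypothetical, and it assumes
that relative paths are resolved against the buildout directory, which may not
match what the recipe actually does.

```python
import configparser
from pathlib import Path

# The example part from the Configuration section, copied verbatim.
EXAMPLE_PART = """\
[extract]
recipe = raisin.recipe.extract
workspace = ${buildout:directory}/etl/workspace
annotations_file = ../../annotations/db.cfg
genomes_file = ../../genomes/db.cfg
pipeline_dumps = ../../workflows/pipelines/rnaseq_pipeline_common_dump/output/dump
"""


def read_extract_options(text, buildout_directory):
    """Return the path options of the [extract] part as Path objects.

    Hypothetical helper: buildout performs its own ${...} substitution;
    here only ${buildout:directory} is replaced by hand.
    """
    # interpolation=None keeps "${...}" literal instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    base = Path(buildout_directory)
    options = {}
    for key, value in parser["extract"].items():
        if key == "recipe":
            continue  # the recipe name is not a path
        value = value.replace("${buildout:directory}", str(base))
        path = Path(value)
        if not path.is_absolute():
            # Assumption: relative paths are relative to the buildout directory.
            path = base / path
        options[key] = path
    return options


def summarize(options):
    """Map each option name to True if its path exists on disk."""
    return {key: path.exists() for key, path in options.items()}


if __name__ == "__main__":
    options = read_extract_options(EXAMPLE_PART, "/tmp/buildout")
    for key, present in summarize(options).items():
        print(key, options[key], "found" if present else "missing")
```

Because the pipeline dumps are optional, a "missing" result for
``pipeline_dumps`` is not necessarily an error.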