
xubo245/spark_eqtl

Spark eQTL

This code enables eQTL analysis in Apache Spark using Spark's Python API (PySpark). It has been tested with Spark 1.3.1 and 1.4.0.
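The README does not describe the statistical test itself. As a rough illustration only, one common way to test a single SNP-gene pair is a simple linear regression of gene expression on genotype dosage (0, 1 or 2 copies of an allele). The sketch below assumes that model and uses scipy.stats.linregress from the listed requirements. The function name eqtl_pair_test, its record layout and the example values are hypothetical and are not taken from trans_analysis.py.

```python
from scipy.stats import linregress


def eqtl_pair_test(record):
    """Simple linear regression of expression on genotype for one SNP-gene pair.

    Hypothetical record layout (not taken from trans_analysis.py):
        ((snp_id, gene_id), (genotypes, expression))
    genotypes are dosages (0, 1, 2); expression values are aligned by sample.
    Returns (snp_id, gene_id, slope, r, p_value), or None for a monomorphic SNP.
    """
    (snp_id, gene_id), (genotypes, expression) = record
    if len(set(genotypes)) < 2:
        # No genotype variation: the regression is undefined, so skip the pair.
        return None
    fit = linregress(genotypes, expression)
    return (snp_id, gene_id, fit.slope, fit.rvalue, fit.pvalue)


if __name__ == "__main__":
    # Made-up example data: six samples, one SNP, one gene.
    example = (("rs0001", "GENE_A"),
               ([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]))
    print(eqtl_pair_test(example))
```

Because each pair is tested independently, a function like this maps naturally onto Spark, for example sc.parallelize(records).map(eqtl_pair_test).filter(lambda r: r is not None). The repository's own way of distributing the work may differ.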

Click here for the corresponding report, which explains the motivation, design and outline of the algorithm.

Requirements:

  • Spark and Python installations
  • scipy.stats (part of SciPy)
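To confirm that both requirements are importable from the Python interpreter that will drive Spark, a small check such as the one below can help. It is a sketch, not part of the repository, and it assumes Spark's Python bindings are importable as the pyspark module (for example via PYTHONPATH or by starting the shell through Spark's bin/pyspark). The helper name missing_requirements is hypothetical.

```python
import importlib.util
import sys

# Top-level module names for the requirements listed above. "scipy" is checked
# rather than "scipy.stats" because find_spec on a submodule imports the parent
# package first and raises if that package is missing.
REQUIRED_MODULES = ("pyspark", "scipy")


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the names of required modules that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_requirements()
    print("missing:", ", ".join(missing) if missing else "none")
```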

Quick start guide:

  1. Start a Spark master and launch some workers that connect to it.
  2. Set up your Spark context within a Python shell (see spark_context.py for an example; the number of cores and the amount of memory are defined there).
  3. Define the paths to your data inside trans_analysis.py.
  4. Logging behaviour is configured in your Spark directory (it is very verbose by default).
  5. Within the Python shell, run trans_analysis.py. Two command-line arguments define the output name and the chromosome to be analyzed, e.g.: %run trans_analysis.py 'full_analysis_chrom_1' 'chr1'
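The steps above can be sketched as a small driver skeleton. This is not the repository's spark_context.py: the default core count, memory size and application name are hypothetical, and only standard Spark settings (spark.cores.max, spark.executor.memory) and PySpark calls (SparkConf.setAll, SparkContext.setLogLevel) are used. pyspark is imported inside make_context so the other helpers work without Spark installed.

```python
# Hypothetical defaults; the repository's spark_context.py sets its own values.
DEFAULT_CORES = 8
DEFAULT_MEMORY = "4g"


def spark_settings(cores=DEFAULT_CORES, memory=DEFAULT_MEMORY):
    """Key/value pairs for SparkConf.setAll(): total cores and memory per executor."""
    return [("spark.cores.max", str(cores)),
            ("spark.executor.memory", memory)]


def make_context(master_url, app_name="spark_eqtl", **settings):
    """Step 2: connect to the standalone master started in step 1."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkConf, SparkContext
    conf = (SparkConf()
            .setMaster(master_url)
            .setAppName(app_name)
            .setAll(spark_settings(**settings)))
    sc = SparkContext(conf=conf)
    # Step 4: quieter logs. setLogLevel exists from Spark 1.4.0 on; with 1.3.1,
    # edit conf/log4j.properties in the Spark directory instead.
    sc.setLogLevel("WARN")
    return sc


def parse_cli(argv):
    """Step 5: read the output name and chromosome from sys.argv[1:]."""
    if len(argv) != 2:
        raise SystemExit("usage: trans_analysis.py OUTPUT_NAME CHROMOSOME")
    output_name, chromosome = argv
    return output_name, chromosome
```

In a plain Python or IPython shell, sc = make_context("spark://HOST:7077", cores=16, memory="8g") would create the context, where HOST is a placeholder for your master's hostname and 7077 is the standalone master's default port. (Under bin/pyspark a context named sc already exists, and a second one cannot be created.) IPython's %run sets sys.argv, so parse_cli(sys.argv[1:]) would return ('full_analysis_chrom_1', 'chr1') for the example invocation in step 5.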

A shell script that automates all of these tasks and analyzes the whole genome is provided: run_full_example.sh
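The README does not show what run_full_example.sh contains. Assuming it essentially repeats the final quick-start step for every chromosome, the same loop could be sketched in Python as below. The chromosome list (autosomes only), the output-name prefix (copied from the full_analysis_chrom_1 example) and the spark-submit launcher are all assumptions; running trans_analysis.py outside an interactive shell would also require it to create its own SparkContext.

```python
import subprocess

# Hypothetical: autosomes only, named like the "chr1" example.
# Add e.g. "chrX" if your data uses it.
CHROMOSOMES = ["chr%d" % i for i in range(1, 23)]

# Hypothetical launcher; run_full_example.sh may start the analysis differently.
LAUNCHER = ("spark-submit", "trans_analysis.py")


def whole_genome_jobs(prefix="full_analysis_chrom_", chromosomes=CHROMOSOMES):
    """(output_name, chromosome) pairs, e.g. ('full_analysis_chrom_1', 'chr1')."""
    return [(prefix + chrom[len("chr"):], chrom) for chrom in chromosomes]


def run_all(jobs, launcher=LAUNCHER, dry_run=True):
    """Analyze each chromosome in turn; with dry_run=True only print the commands."""
    for output_name, chrom in jobs:
        cmd = list(launcher) + [output_name, chrom]
        if dry_run:
            print(" ".join(cmd))
        else:
            # Raises CalledProcessError, stopping the loop, if a run fails.
            subprocess.check_call(cmd)


if __name__ == "__main__":
    run_all(whole_genome_jobs())
```

Running the chromosomes one after another keeps each Spark job's memory footprint bounded; the dry-run default lets you inspect the generated commands before launching anything.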
