Overview

This repository holds directories of config files for published studies that can be processed locally with hubward.

The goal is for users to mix-and-match (and contribute!) studies from this repository to build their own custom track hubs, enabling easy visualization of published data alongside their own data.

The manifest lists what tracks are currently available in the repo.

Getting started

This first section will get you set up and running a small test data set to confirm that everything is set up correctly.

Install required software. It's all available in bioconda. There are two versions: just the Python package, or Python package plus all outside dependencies like UCSC utilities, BEDTools, and more. If you're unsure, choose the second option:

# option 1: just the Python package
conda install -c bioconda hubward

# option 2: Python package plus extras
conda install -c bioconda hubward-all

Clone the github repo:

git clone https://github.com/daler/hubward-studies.git

Go to that directory, and install any additional requirements:

conda install -c r -c bioconda --file requirements.txt

and process the test studies:

cd hubward-studies
hubward process test/lieberman-2009 test/yip-2012

You should get a lot of log output something like this:

[2015-12-03 14:03:54,666] hubward INFO Study: lieberman-2009, in "test/lieberman-2009"
[2015-12-03 14:03:54,667] hubward INFO     test/lieberman-2009/processed-data/chr11_5200000_5299999.bigwig does not exist
[2015-12-03 14:03:56,454] hubward INFO     test/lieberman-2009/processed-data/chr11_5200000_5299999.bigwig is up to date
[2015-12-03 14:03:56,455] hubward INFO     test/lieberman-2009/processed-data/chr11_5200000_5299999.bigbed does not exist
[2015-12-03 14:03:57,713] hubward INFO     test/lieberman-2009/processed-data/chr11_5200000_5299999.bigbed is up to date
[2015-12-03 14:03:57,760] hubward INFO Study: yip-2012, in "test/yip-2012"
[2015-12-03 14:03:57,761] hubward INFO     test/yip-2012/processed-data/drm_k562_enhancers.bigbed does not exist
pass1 - making usageList (23 chroms): 2 millis
pass2 - checking and writing primary data (12046 records, 4 fields): 14 millis
[2015-12-03 14:03:57,816] hubward INFO     test/yip-2012/processed-data/drm_k562_enhancers.bigbed is up to date
[2015-12-03 14:03:57,816] hubward INFO     test/yip-2012/processed-data/drm_k562_weak-enhancers.bigbed does not exist
pass1 - making usageList (23 chroms): 2 millis
pass2 - checking and writing primary data (10787 records, 4 fields): 13 millis
[2015-12-03 14:03:57,872] hubward INFO     test/yip-2012/processed-data/drm_k562_weak-enhancers.bigbed is up to date

If you get errors saying that a program can't be found (like fetchChromSizes or bedToBigBed), then try installing all the external dependencies as in option 2 above.

The Lieberman et al data are in hg18 coordinates, so lift them over:

hubward liftover \
    --from_assembly hg18 \
    --to_assembly hg19 \
    test/lieberman-2009 \
    lieberman-2009-hg19

You can now use the processed data for downstream analysis, or if you have a server available you can upload the data to a trackhub to visualize in the UCSC Genome Browser. To do so, edit the example-group.yaml file to reflect your server information. The studies section is already filled in to include the two test studies. When you're ready, upload:

hubward upload example-group.yaml

Preparing all configured studies

Run the preprocessing script metaprocess.py. This will:

look for all directories containing a metadata.yaml file
exclude those that have been lifted over so that only the original studies are documented
create a manifest.rst file
add the directory, study description, and configured tracks to the manifest.rst file as documentation
add a line to to_process.sh for each original study
add a final line to to_process.sh that will perform the liftovers as needed.

Then edit the liftovers.tsv file to indicate which studies should be lifted over. Note that this file is processed sequentially, so studies that need to be incrementally lifted over are handled properly.

Then run the to_process.sh script to process and update all studies as needed.

Finally, re-run metaprocess.py to get an updated list of which studies to upload. If a study was lifted over to another assembly, only the lifted-over version will be included.

A summary:

# initial run to find all configured studies
python metaprocess.py

# then edit liftovers.tsv to add any liftovers

# process all configured studies and then perform liftovers. May take a while.
bash to_process.sh

# re-run to get an updated `to_upload.txt`
python metaprocess.py

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
alvarez-dominguez-2014		alvarez-dominguez-2014
dowen-2014-cohesin-chiapet		dowen-2014-cohesin-chiapet
filion-2010		filion-2010
fuda-2015		fuda-2015
hou-2012		hou-2012
jain-2015		jain-2015
moshkovich-2011		moshkovich-2011
sahlen-2015-hicap		sahlen-2015-hicap
schoenfelder-2015-capture-HiC		schoenfelder-2015-capture-HiC
sexton-2012		sexton-2012
soruco-2013		soruco-2013
test		test
vanBemmel-2010		vanBemmel-2010
wood-2011		wood-2011
.gitignore		.gitignore
README.md		README.md
do_liftovers.py		do_liftovers.py
example-group.yaml		example-group.yaml
liftovers.tsv		liftovers.tsv
manifest.rst		manifest.rst
manifest_template.rst		manifest_template.rst
metaprocess.py		metaprocess.py
requirements.txt		requirements.txt
setup_env.sh		setup_env.sh

daler/hubward-studies

Folders and files

Latest commit

History

Repository files navigation

Overview

Getting started

Preparing all configured studies

About

Resources

Stars

Watchers

Forks

Languages