RNA-sequence-tools

Tools for RNA-seq analysis and gene annotation.

Note: Most of the single-cell analysis has migrated to scicast (https://github.com/iandriver/scicast) and is no longer maintained here.

Tophat_Cluster_submission contains scripts for processing raw RNA-seq files and submitting the jobs to a Linux cluster (specifically the QB3 cluster at UCSF). It also contains scripts for testing command formatting and for managing files on a cluster.

FPKM_Parsing contains scripts for turning multiple TopHat/Cufflinks output files into FPKM matrices suitable for analysis with Fluidigm's SINGuLAR R package.

RNA_Seq_analysis contains scripts for clustering and PCA of RNA-seq data.

Gene_Ontology contains files for fetching and organizing Entrez Gene ontology information for lists of genes.
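As a rough illustration of what a per-sample cluster job can look like, the sketch below builds a job script that runs TopHat and then Cufflinks on the resulting alignments. It assumes an SGE-style scheduler that is driven by `qsub`. The function name `make_job_script`, the output directory layout, the `smp` parallel environment and the default thread count are all hypothetical. None of them is taken from the repository's scripts.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def make_job_script(sample: str, fastqs: List[str], index: str, gtf: str,
                    outdir: str, threads: int = 8) -> str:
    """Return an SGE-style job script running TopHat, then Cufflinks, for one sample.

    Hypothetical helper: paths, the 'smp' parallel environment and the
    thread count are placeholders to adapt to a specific cluster.
    """
    out = PurePosixPath(outdir) / sample          # cluster paths are POSIX
    tophat_out = out / "tophat"
    cuff_out = out / "cufflinks"
    # TopHat aligns reads against a Bowtie index, guided by the GTF annotation.
    tophat = ["tophat", "-p", str(threads), "-G", gtf,
              "-o", str(tophat_out), index, *fastqs]
    # Cufflinks quantifies the accepted_hits.bam that TopHat writes to its output dir.
    cufflinks = ["cufflinks", "-p", str(threads), "-G", gtf,
                 "-o", str(cuff_out), str(tophat_out / "accepted_hits.bam")]
    lines = [
        "#!/bin/bash",
        "#$ -cwd",                      # run from the submission directory
        "#$ -S /bin/bash",
        f"#$ -N tophat_{sample}",       # job name shown by qstat
        f"#$ -pe smp {threads}",        # assumed parallel-environment name
        "set -e",                       # stop if TopHat fails
        shlex.join(tophat),             # shell-quote every argument safely
        shlex.join(cufflinks),
    ]
    return "\n".join(lines) + "\n"
```

Each returned script could be written to a file and submitted with `qsub`. Quoting the arguments with `shlex.join` keeps sample names or paths that contain unusual characters from breaking the command line.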
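The core of the FPKM-parsing step is joining many per-sample Cufflinks `genes.fpkm_tracking` tables into one gene × sample matrix. Here is a minimal sketch of that join, using only the standard library. It keys rows on the `gene_short_name` and `FPKM` columns of the tracking file, sums rows that share a gene name, and fills genes absent from a sample with 0.0. The summing and zero-fill rules, and the function names, are assumptions and not the repository's exact behaviour.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_fpkm_tracking(text: str) -> Dict[str, float]:
    """Map gene_short_name -> FPKM for one genes.fpkm_tracking file's text.

    Rows that share a gene name (e.g. several loci) are summed; this
    aggregation rule is an assumption.
    """
    fpkm: Dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        name = row["gene_short_name"]
        fpkm[name] = fpkm.get(name, 0.0) + float(row["FPKM"])
    return fpkm


def fpkm_matrix(samples: Dict[str, str]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Combine per-sample tracking files into a gene x sample FPKM matrix.

    `samples` maps a sample name to the text of its genes.fpkm_tracking
    file. Returns (genes, sample_names, rows) with genes and samples sorted;
    a gene missing from a sample is filled with 0.0.
    """
    per_sample = {name: parse_fpkm_tracking(text) for name, text in samples.items()}
    sample_names = sorted(per_sample)
    genes = sorted(set().union(*(d.keys() for d in per_sample.values())))
    rows = [[per_sample[s].get(g, 0.0) for s in sample_names] for g in genes]
    return genes, sample_names, rows
```

In practice the text would come from something like `Path(...).read_text()` for each sample's Cufflinks output. The layout that a downstream package such as SINGuLAR expects (orientation, header format) should be checked against that package's documentation.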

Sample FPKM workflow:

1) Use tophat_qsub.py to submit sequencing runs to the cluster -> Output: TopHat alignments and Cufflinks FPKM files
2) Use cuffnorm_qsub to create a sample sheet and normalize sequencing reads with cuffnorm -> Output: cuffnorm gene FPKM table
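cuffnorm can read its inputs from a tab-delimited sample sheet, passed with `--use-sample-sheet`. Each row of the sheet pairs an input file (a cuffquant `.cxb` or a SAM/BAM alignment) with a group label, under a `sample_id`/`group_label` header. The following sketch writes such a sheet. The function name and the example paths and labels are hypothetical, and the column names should be checked against the Cufflinks manual for the installed version.

```python
import csv
import io
from typing import Iterable, Tuple


def make_sample_sheet(entries: Iterable[Tuple[str, str]]) -> str:
    """Return the text of a cuffnorm sample sheet.

    `entries` are (file_path, group_label) pairs; file_path points to a
    cuffquant .cxb file or a SAM/BAM alignment for one sample.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", "group_label"])  # header row expected by cuffnorm
    for path, group in entries:
        writer.writerow([path, group])
    return buf.getvalue()
```

Saved as, for example, `sample_sheet.txt`, the sheet would typically be supplied as `cuffnorm --use-sample-sheet -o <outdir> <transcripts.gtf> sample_sheet.txt`. Treat that invocation as an assumption to confirm against the manual.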

Sample count-based workflow:

1) Use tophat_qsub.py to submit sequencing runs to the cluster -> Output: TopHat alignments and Cufflinks FPKM files
2) Use sort_htseq_count.py to clean up the accepted hits and generate HTSeq counts and Picard metrics (5'-to-3' bias, GC content, etc.)
3) Use the R scripts (DESeq or edgeR) to process the raw counts.
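Before DESeq or edgeR can use the per-sample htseq-count outputs, those outputs have to be merged into one raw count table. htseq-count writes one `gene_id<TAB>count` line per feature, followed by summary counters whose names start with `__` (such as `__no_feature` and `__ambiguous`). Those counters must be dropped before the merge. The sketch below makes two assumptions: the function names are hypothetical, and genes missing from a sample are taken to have zero counts.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_htseq_counts(text: str) -> Dict[str, int]:
    """Parse one htseq-count output, skipping the '__' summary counters."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line or line.startswith("__"):
            continue
        fields = line.split("\t")
        counts[fields[0]] = int(fields[-1])  # first column id, last column count
    return counts


def count_matrix(samples: Dict[str, str]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Merge htseq-count outputs (sample name -> file text) into genes x samples."""
    per_sample = {name: parse_htseq_counts(text) for name, text in samples.items()}
    names = sorted(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [[per_sample[n].get(g, 0) for n in names] for g in genes]
    return genes, names, rows


def to_tsv(genes: List[str], names: List[str], rows: List[List[int]]) -> str:
    """Render the matrix as a tab-separated table with a gene_id header column."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", *names])
    for gene, row in zip(genes, rows):
        writer.writerow([gene, *row])
    return buf.getvalue()
```

The result has genes as rows, samples as columns and integer counts. This is the general shape of raw count table that DESeq and edgeR take as input.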

Data Analysis Tools:

1) Use filter_outliers to filter cells on mapping rate, number of genes expressed, or other rule-based metrics -> Output: outlier_filtered matrix
2) Run cluster.py for a broad, unbiased clustering and subclustering search. It produces hierarchical clustering, PCA and correlation groups for all cells and cell subgroups (down to a defined minimum number of cells per subgroup).
3) Use cluster1.py for a more targeted search: it uses selected cells or gene files to establish a starting point and performs targeted significance searching based on a single gene.
4) corr_search.py searches a whole matrix for correlation with any one gene.
5) make_monocle_data.py only works with cuffnorm output (per Monocle's requirements: http://cole-trapnell-lab.github.io/monocle-release/) and produces the 3 files necessary for running Monocle (including the gene and cell feature sheets). It requires gene_lookup.py, which fetches GO terms to populate the gene feature sheet. It also requires a table of Fluidigm capture data (whether or not each capture was a single clean cell). Terms can be added to classify populations by gene expression.
6) run_monocle.R takes the make_monocle_data.py output and runs a basic Monocle workflow (building the spanning tree and plotting genes in pseudotime), then finds the cluster groups and plots representative groups in pseudotime.
7) make_scLVM.R performs ERCC normalization and cell-cycle analysis using the scLVM package (https://github.com/PMBio/scLVM).
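The rule-based filtering done by filter_outliers can be sketched as a set of per-cell checks: keep a cell only if its mapping rate and its number of expressed genes both clear a threshold. This is a minimal sketch, not the script itself. The function name, the cell-to-profile data layout, and the default cutoffs (0.5 mapping rate, 1000 genes, expression ≥ 1.0) are hypothetical placeholders to tune per experiment.

```python
from typing import Dict


def filter_outlier_cells(
    matrix: Dict[str, Dict[str, float]],   # cell -> {gene: expression value}
    mapping_rate: Dict[str, float],        # cell -> fraction of reads mapped (0-1)
    min_mapping_rate: float = 0.5,         # placeholder threshold
    min_genes: int = 1000,                 # placeholder threshold
    expr_cutoff: float = 1.0,              # a gene counts as expressed at >= this value
) -> Dict[str, Dict[str, float]]:
    """Return only the cells that pass both the mapping-rate and gene-count rules."""
    kept: Dict[str, Dict[str, float]] = {}
    for cell, profile in matrix.items():
        n_expressed = sum(1 for v in profile.values() if v >= expr_cutoff)
        # Cells with no recorded mapping rate are treated as failing (rate 0.0).
        if mapping_rate.get(cell, 0.0) >= min_mapping_rate and n_expressed >= min_genes:
            kept[cell] = profile
    return kept
```

Further rules, for example a maximum fraction of mitochondrial reads, could be added as extra conditions in the same loop.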
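The single-gene correlation search done by corr_search.py amounts to scoring every gene in the matrix against one target gene's expression across cells and then ranking the scores. The sketch below makes several assumptions: it uses Pearson correlation (corr_search.py may use a different measure), the function names are hypothetical, and genes with constant expression are skipped because their correlation is undefined.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def corr_search(matrix: Dict[str, List[float]], target: str,
                top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank genes by correlation with `target` (gene -> values across cells)."""
    ref = matrix[target]
    scores: List[Tuple[str, float]] = []
    for gene, values in matrix.items():
        if gene == target:
            continue
        r = pearson(ref, values)
        if not math.isnan(r):          # skip constant genes
            scores.append((gene, r))
    # Highest positive correlation first; strongly anticorrelated genes sort last.
    scores.sort(key=lambda gr: gr[1], reverse=True)
    return scores[:top_n]
```

Sorting on the signed coefficient keeps positively and negatively correlated genes at opposite ends of the list. To rank by strength of association regardless of direction, sort on `abs(gr[1])` instead.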
