scRNA

Python framework for single-cell RNA-seq clustering with a special focus on transfer learning. This package contains methods for generating artificial data, clustering, and transferring knowledge from a source dataset to a target dataset.

This software was written by Nico Goernitz, Bettina Mieth, Marina Vidovic, Alex Gutteridge.

News

(3.4.17) Added Travis-CI
(3.4.17) Added string label support
Simple example available
Website is up and running
Wiki with detailed information (e.g. command line arguments)
Please report bugs or other inconveniences
scRNA can now be conveniently installed using the pip install git+https://github.com/nicococo/scRNA.git command (see Installation for further information)
Command line script available

Getting started

Installation

We assume that Python >2.7 is installed and that the pip command is callable from the command line. If starting from scratch, we recommend installing the Anaconda open data science platform (with Python 2.7), which comes with many of the most useful packages for scientific computing.

The scRNA software package can be installed using the pip install git+https://github.com/nicococo/scRNA.git command. After successful completion, three command line scripts will be available (MacOS and Linux only):

scRNA-generate-data.sh
scRNA-source.sh
scRNA-target.sh

Example

Step 1: Installation with pip install git+https://github.com/nicococo/scRNA.git

Step 2: Check the scripts
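
One way to carry out this check is the short sketch below. It only looks up the three script names listed under Installation on the PATH. The helper name find_scrna_scripts is made up for illustration and is not part of scRNA, and the snippet assumes Python 3.3 or newer for shutil.which.

```python
import shutil

# Script names as listed in the Installation section.
SCRIPTS = ("scRNA-generate-data.sh", "scRNA-source.sh", "scRNA-target.sh")


def find_scrna_scripts(names=SCRIPTS):
    """Map each script name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_scrna_scripts().items():
        print("{:<25} {}".format(name, path or "NOT FOUND"))
```

If a script is reported as not found, the pip installation most likely did not complete or its script directory is not on the PATH.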

Step 3: Create a directory /foo and change into it. Generate some artificial data by calling scRNA-generate-data.sh (using only default parameters).

This will result in a number of files:

Gene ids
Source and target data
Source and target ground truth labels

Step 4: Run NMF on the source data using the provided gene ids and source data. That is, we want to turn off the cell and gene filters as well as the log transformation. You can provide source labels to be used as a starting point for NMF. If you do not, those labels will be generated via NMF clustering. Potential problems:

If an "Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so." occurs and the Anaconda open data science platform is used, run conda install mkl first.
Depending on the data and cluster range, this step can take time. However, you can speed up the process by turning off the t-SNE plots using the --no-tsne flag (see Wiki for further information).
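
As a rough illustration of how the two options mentioned in this README fit together, the sketch below assembles an argument list for scRNA-source.sh. It uses only --cluster-range and --no-tsne, and it assumes that the cluster range is passed as a comma-separated value after the flag. The arguments for the input files are deliberately left out because they are documented in the Wiki, not here. The helper build_source_command is hypothetical and not part of scRNA.

```python
def build_source_command(cluster_range=(6, 7, 8), tsne=True):
    """Assemble an argument list for scRNA-source.sh (assumed flag syntax)."""
    cmd = [
        "scRNA-source.sh",
        # Assumption: the range is given as one comma-separated value.
        "--cluster-range", ",".join(str(k) for k in cluster_range),
    ]
    if not tsne:
        # Skip the t-SNE plots to save time.
        cmd.append("--no-tsne")
    return cmd


cmd = build_source_command(tsne=False)
print(" ".join(cmd))

# To run it, first append the input-file arguments described in the Wiki:
# import subprocess
# subprocess.check_call(cmd)
```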

This will result in a number of files:

t-SNE plots (.png) for every number of clusters specified in the --cluster-range argument (default 6,7,8)
Output source model in .npz format for every number of clusters specified in the --cluster-range argument (default 6,7,8)
A summarizing .png figure
True cluster labels (either as provided by the user or as generated via NMF clustering) and corresponding cell ids in .tsv format for every number of clusters specified in the --cluster-range argument (default 6,7,8)
Model cluster labels after NMF (and corresponding cell ids) in .tsv format for every number of clusters specified in the --cluster-range argument (default 6,7,8)
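
The source models are ordinary NumPy .npz archives, so their contents can be inspected before moving on. The sketch below lists the arrays stored in such a file together with their shapes and dtypes. The array names inside the scRNA models are not documented in this README, so the code makes no assumptions about them; describe_npz is a hypothetical helper.

```python
import numpy as np


def describe_npz(path):
    """Return {array name: (shape, dtype)} for every array in an .npz file."""
    summary = {}
    with np.load(path) as data:
        for name in data.files:
            try:
                arr = data[name]
                summary[name] = (arr.shape, str(arr.dtype))
            except ValueError:
                # Pickled object arrays are not loaded by default
                # (they would need allow_pickle=True).
                summary[name] = None
    return summary
```

For example, describe_npz("src_c8.npz") would summarize the model with 8 clusters, provided a file of that name exists in the current directory.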

Step 5: Now it is time to cluster the target data and transfer knowledge from the source model to our target data. To do so, we need to choose a source data model that was generated in Step 4. In this example, we will pick the model with 8 clusters (src_c8.npz).

Depending on the data, the cluster range and the mixture range, this step can take a long time. However, you can speed up the process by turning off the t-SNE plots using the --no-tsne flag (see Wiki for further information).

This results in a number of files (for each value in the cluster range):

Predicted cluster labels after transfer learning (and corresponding cell ids) in .tsv format for every number of clusters specified in the --cluster-range argument (default 6,7,8)
t-SNE plots with predicted labels (.png)
Data and gene ids in .tsv files
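
The predicted labels can be post-processed with a few lines of Python. The sketch below reads a tab-separated label file and counts the cells per cluster. It assumes two columns without a header, cell id first and cluster label second; the exact layout and file names written by scRNA-target.sh are not specified here, so check them against the actual output. Both helper functions are hypothetical.

```python
import csv
from collections import Counter


def read_labels(path):
    """Read a two-column, tab-separated file into {cell id: label}."""
    labels = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            # Assumed column order: cell id, then cluster label.
            labels[row[0]] = row[1]
    return labels


def cluster_sizes(labels):
    """Count how many cells were assigned to each cluster label."""
    return dict(Counter(labels.values()))
```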

In addition, there is a summarizing .png figure of all accuracies and, if the real target labels were provided, a t-SNE plot with those labels.

The command line output shows a number of results: unsupervised and supervised accuracy measures (if no ground truth labels are given, the supervised measures remain 0).
