Guiding longitudinal sampling in inflammatory bowel diseases cohorts

Supplementary Materials

This repository includes all the source code, tests, and notebooks needed to generate the figures used in the paper (Vazquez-Baeza et al. 2017).

Notes to the reader

We do not provide all the data in this repository, but we do provide the BIOM tables, sample information, and alpha and beta diversity tables. The raw and quality-controlled sequences from which the BIOM table originates can be found in Qiita study 2538 (remember to log in).

Notebooks

This is a brief description of the computations contained in each notebook. The first three sections depend on QIIME 1.9.1 and Python 2; the fourth section depends on the scientific Python stack and Python 3. The environment files used to create these environments can be found in the env-files directory.

Note: for the QIIME 1.9.1 environment, you'll also need to install the diptest package from source. These notebooks used the repository at SHA-1 bf005a8662d6e866842d5c0f387a011f773c5b04.

Setup

In the notebook 01.1-setup.ipynb, we remove blank samples, add alpha and beta diversity, and add some of this information to the metadata so it can be used in other notebooks. We include the tabular files with the collated alpha diversity, but intermediate files and plots are not included.
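The two operations described above (dropping blanks, then attaching a diversity value to each sample's metadata) can be sketched in plain Python. This is only an illustration, not the code in 01.1-setup.ipynb: the sample identifiers, the "BLANK" prefix convention, the `diagnosis` column, and the Shannon values are all made-up assumptions.

```python
def drop_blanks(metadata, is_blank):
    """Return a copy of the metadata without the samples flagged as blanks."""
    return {sid: row for sid, row in metadata.items() if not is_blank(sid)}


def add_column(metadata, column, values):
    """Return a copy of the metadata with one extra column per sample.

    Samples without a value in `values` get None for that column.
    """
    merged = {}
    for sid, row in metadata.items():
        new_row = dict(row)  # copy so the input mapping is left untouched
        new_row[column] = values.get(sid)
        merged[sid] = new_row
    return merged


# Hypothetical sample sheet; the naming convention for blanks is an assumption.
metadata = {
    "S1": {"diagnosis": "CD"},
    "S2": {"diagnosis": "UC"},
    "BLANK.1": {"diagnosis": "NA"},
}
# Hypothetical, precomputed alpha diversity values keyed by sample identifier.
shannon = {"S1": 4.2, "S2": 3.1}

clean = drop_blanks(metadata, lambda sid: sid.upper().startswith("BLANK"))
annotated = add_column(clean, "shannon", shannon)
print(annotated)
```

Returning new dictionaries rather than editing in place keeps the original sample sheet available for later steps.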

Alpha

In the notebook 02.1-alpha-diversity.ipynb, we calculate a few measures of alpha diversity, and compare distributions by diagnosis of IBD and whether or not the subjects underwent surgery.
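As a rough illustration of what computing an alpha diversity measure and comparing it across groups involves, the sketch below calculates the Shannon index for a few count vectors and summarizes it per group. It is not the notebook's code (which relies on QIIME 1.9.1): the counts, sample identifiers, and group labels are invented for the example and do not reflect the study's results, and the notebook may use other measures and formal statistical tests.

```python
import math
from collections import defaultdict
from statistics import median


def shannon(counts, base=2):
    """Shannon index of a vector of feature counts (0.0 for an empty sample)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def median_by_group(values, groups):
    """Median of `values` within each group label found in `groups`."""
    grouped = defaultdict(list)
    for sample_id, value in values.items():
        grouped[groups[sample_id]].append(value)
    return {group: median(vals) for group, vals in grouped.items()}


# Made-up counts and labels, for illustration only.
otu_counts = {
    "S1": [10, 10, 10, 10],
    "S2": [25, 10, 4, 1],
    "S3": [40, 0, 0, 0],
    "S4": [20, 20, 0, 0],
}
diagnosis = {"S1": "control", "S2": "control", "S3": "IBD", "S4": "IBD"}

alpha = {sid: shannon(counts) for sid, counts in otu_counts.items()}
summary = median_by_group(alpha, diagnosis)
print(summary)
```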

Beta

Beta diversity is used in three steps. First, in 03.1-beta-diversity-stats.ipynb, we compare the groups using PERMANOVA and ANOSIM. Then, in 03.2-beta-diversity-distributions.ipynb, we calculate the beta-diversity stability over time. Finally, in 03.3-beta-diversity-regressions.ipynb, we compare the microbial stability to the microbial dysbiosis index.
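For readers unfamiliar with PERMANOVA, the following is a minimal pure-Python sketch of the idea: compute a pseudo-F statistic from a distance matrix and a grouping, then estimate a p-value by shuffling the group labels. It is not the implementation used in the notebooks, which depend on QIIME 1.9.1. The Bray-Curtis distance, the toy counts, the group labels, and the function names are assumptions chosen for illustration.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F statistic from a square distance matrix.

    Assumes at least two groups and a non-zero within-group sum of squares.
    """
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        pairs = combinations(members, 2)
        ss_within += sum(dist[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for the given grouping."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy count vectors: two clearly separated groups of four samples each.
counts = [
    [30, 10, 5], [28, 12, 6], [31, 9, 4], [29, 11, 5],
    [5, 10, 30], [6, 12, 28], [4, 9, 31], [5, 11, 29],
]
labels = ["control"] * 4 + ["IBD"] * 4
dist = [[bray_curtis(u, v) for v in counts] for u in counts]

f_stat, p_value = permanova(dist, labels)
print(f_stat, p_value)
```

With only eight samples the number of distinct label arrangements is small, so the p-value cannot get arbitrarily close to zero no matter how well separated the groups are.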

Classification

This section is the most computationally expensive. While prototyping, we executed the tests through the Jupyter notebook interface. However, to test the pipeline with a reasonable number of iterations, we ran the comparisons on a dedicated compute cluster using the script provided here.

In 04.1-classification.ipynb and 04.2-classification-jansson.ipynb, we compare how good a classification you can achieve depending on the number of samples used per subject. The main difference between these two notebooks is the data they use.
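The core manipulation behind this comparison, restricting each subject to a fixed number of samples before training a classifier, can be sketched as follows. This is a hypothetical helper, not the notebooks' code: the record layout, the choice to skip subjects with too few samples, and the random selection are all assumptions, and the classifier itself is left out.

```python
import random
from collections import defaultdict


def subsample_per_subject(samples, n, seed=0):
    """Pick exactly `n` samples at random from every subject.

    Subjects with fewer than `n` samples are skipped (an assumption made
    for this sketch).
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample["subject"]].append(sample)
    chosen = []
    for subject in sorted(by_subject):
        pool = by_subject[subject]
        if len(pool) >= n:
            chosen.extend(rng.sample(pool, n))
    return chosen


# Made-up longitudinal records: subject A has 3 samples, B has 2, C has 1.
samples = [
    {"id": "A.1", "subject": "A"}, {"id": "A.2", "subject": "A"},
    {"id": "A.3", "subject": "A"}, {"id": "B.1", "subject": "B"},
    {"id": "B.2", "subject": "B"}, {"id": "C.1", "subject": "C"},
]

for n in (1, 2, 3):
    subset = subsample_per_subject(samples, n)
    # A classifier would be trained and evaluated on `subset` here.
    print(n, [s["id"] for s in subset])
```

Repeating the selection with different seeds gives a sense of how much the result depends on which samples happen to be drawn.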

The ROC curves used in the paper are also included here; see notebooks/roc-curves/gg/. While we expect to see some variation from re-running this, we've observed that the same trends hold (as reported in the paper).
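As a reminder of how a ROC curve is built from classifier output, the sketch below sweeps the decision threshold over a set of scores and integrates the resulting curve with the trapezoidal rule. The labels and scores are invented, the function names are hypothetical, and this is not the code that produced the curves in notebooks/roc-curves/gg/.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    `labels` holds 1 for positives and 0 for negatives; both classes must
    be present. Samples with tied scores are consumed together.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a curve given as ordered (x, y) points (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print(curve, auc(curve))
```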

randomeffect/longitudinal-ibd