Below is a list of recent public projects and talks I have been a part of:
- Genomics of Drug Sensitivity in Cancer [notebook] - Summary of results for predictive models applied to genomic data to determine patient sensitivity to a large number of therapeutic cancer drugs. The primary challenge in this research was not only to predict sensitivity well but also to give relatively concise explanations of the rationale behind those predictions. This was achieved with a Bayesian transfer learning model (built using TensorFlow and Edward).
- Deep Learning [slides | notebook] - A Charleston Data Analytics talk covering neural networks, theory on how network depth affects expressiveness, gradient descent, and TensorFlow. Also includes examples of how to build and apply custom TensorFlow models to clinical research data (Alzheimer's disease in this case).
- Bayesian Analysis [slides | notebook] - Covers several ideas in Bayesian modeling and reasoning, including:
  - Bayesian ranking and modeling approaches for smaller data sets (examples in Python and Stan)
  - Hierarchical Maximum Likelihood modeling in the context of forecasting crime rates for various Caribbean countries
  - Creating a paint-by-numbers from a digital image using nonparametric Bayesian clustering algorithms (i.e. the Dirichlet Process)
- Predicting Sales Through Music Anatomy [project] - Analyzing the relationship between iTunes sales and musical traits such as tempo, loudness, danceability, and acousticness (Forbes.com article).
- HBlocks (Java, Pig, Oozie, Bash, HDFS, MySQL) - White paper on the production storage system at Next Big Sound, which spans multiple Hadoop subsystems to create a large-scale (many-terabyte) data revision control platform. No code uploaded yet, just the paper for now.
- High Performance Transformations for 10M+ Record Impala Result Sets (R) - data.table optimizations applied to common transformations on large data frames in R. This was helpful at Next Big Sound for processing huge data frames when base R or plyr functions wouldn't cut it.
- Next Big Sound Chart Calculator (Pig) - Computes a list of artists most likely to appear on the Billboard 200 by combining likelihoods produced by a supervised learning technique (stored in HBase), semi-structured event data from MongoDB, and artist metadata from MySQL.
- HDFS Disaster Recovery (Bash) - Shell script used to back up critical HDFS paths into rolling "archive" directories for offsite delivery or immediate disaster recovery.