
Project
Data Science Specialization
Author	Expertise	Tool	Industry
Darryl Buswell	Data Applications; Exploratory Analysis; Machine Learning; Statistical Inference	R/R-Studio; Shiny	Energy; Environment; Health Care; Healthcare; Information Technology; Transportation
Description
Concepts and tools needed throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. Includes:
- Practical application of statistical computing: reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code.
- Basic data cleaning of an 'activity recognition dataset' of 30 subjects who wore waist-mounted smartphone sensors. Includes R code to load the raw dataset and processing instructions formalized in a markdown-based codebook.
- Exploratory analysis techniques in R for summarizing data, including how to implement multivariate statistical techniques and use plotting systems to summarize high-dimensional data.
- Use of R tools to generate data analyses in a markdown document, with a focus on results that can be easily reproduced. The R Markdown code integrates live R code, knitr, and related tools.
- Collection of R scripts that employ the fundamentals of statistical inference, including the frequentist, Bayesian, and likelihood approaches.
- Regression analysis performed on a collection of cars to explore the relationship between car features and fuel consumption. Includes special cases of the regression model (ANOVA and ANCOVA), with analysis of dummy variables, multivariable adjustment, residuals, and variability.
- Application of machine learning algorithms (decision tree, random forest, and generalized boosted regression) in R to explore personal activity data and predict the manner in which individuals completed particular exercises.
- A simple yet scalable web application built using Shiny, R packages, and interactive graphics, with a focus on automating statistical inference on a dataset of passengers onboard the Titanic.
Dataset
- Air pollution monitoring data at 332 locations in the US. [link]
- Patient quality of care statistics for over 4,000 US hospitals from the Medicare.gov Hospital Compare service. [link]
- Activity recognition dataset built from the recordings of 30 subjects performing basic activities and postural transitions while carrying a waist-mounted smartphone with embedded inertial sensors, from the UCI Machine Learning Repository. [link]
- Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years, from the UCI Machine Learning Repository. [link]
- Fine particulate matter (PM2.5) air pollutant data for the US for the period 1999-2008, from the EPA National Emissions Inventory. [link]
- Data from a monitoring device (number of steps taken) worn by an anonymous individual between Oct-Nov 2014. [link]
- 'Storm Data' publication data from the National Oceanic and Atmospheric Administration (NOAA) for the period 1950-2011. [link]
- The response in the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs that received one of three dose levels of vitamin C by one of two delivery methods. [link]
- Data extracted from the 1974 Motor Trend US magazine, comprising fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). [link]
- Weight lifting exercise data from accelerometers on the belt, forearm, arm, and dumbbell of six participants. [link]
- Data on passengers (age, gender, fare, cabin, etc.) who were onboard the Titanic. [link]
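
The regression item in the description above relates car features to fuel consumption. As a rough illustration only, the one-variable case can be sketched as an ordinary least-squares fit. The project itself is written in R; the Python function name `fit_line` is hypothetical, and the numbers are made-up toy values, not rows from the Motor Trend data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up toy values for illustration (NOT taken from the Motor Trend dataset).
weight = [2.0, 3.0, 4.0, 5.0]    # hypothetical car weights
mpg = [30.0, 25.0, 20.0, 15.0]   # hypothetical fuel economy

intercept, slope = fit_line(weight, mpg)
print(f"mpg = {intercept:.1f} + ({slope:.1f}) * weight")
```

A negative slope here would indicate that heavier cars in the toy sample use more fuel; the multivariable adjustment mentioned in the description would add further predictors to the same model.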

Project
Fundamentals of Computing Specialization
Author	Expertise	Tool	Industry
Darryl Buswell	Data Applications; Statistical Inference	Python	Entertainment; Information Technology
Description
Introduction to Python, with a focus on mathematical and programming techniques, and on mathematical tools for reasoning about the correctness and efficiency of algorithms. Includes:
- A number of basic interactive applications (games) built using Python, including 'Rock-Paper-Scissors-Lizard-Spock', 'Guess the Number', 'Stopwatch: The Game', and 'Pong'.
- A number of basic/intermediate interactive applications (games) built using Python, including 'Memory', 'Blackjack', and 'Spacerocks'.
- A number of intermediate interactive applications (games) built using Python, including 'Solitaire Mancala', '2048', and 'Tic-Tac-Toe'.
- Algorithmic thinking to solve real-world problems, including: 1) understanding the problem; 2) formulating the problem mathematically; 3) designing an algorithm; 4) implementing the algorithm; and 5) solving the original scientific problem.
Dataset

Project
Machine Learning
Author	Expertise	Tool	Industry
Darryl Buswell	Machine Learning	Matlab/Octave	Education; Environment; Food, Beverages and Tobacco; Housing; Information Technology; Manufacturing
Description
Machine learning, data mining, and statistical pattern recognition using GNU Octave, covering: 1) supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks); 2) unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning); and 3) best practices in machine learning (bias/variance theory and the innovation process in machine learning and AI). Includes:
- Implementation of linear regression with one variable (city population) to predict profits for a food truck that is to operate in different cities.
- Linear regression with multiple variables (including living area size and number of bedrooms) to predict house prices in Portland, Oregon.
- Implementation of logistic regression to predict the chance that a student would be admitted into a university based on their results from two standardized tests.
- Prediction of whether microchips from a fabrication plant would pass quality assurance standards based on results from two tests.
- Logistic regression and a feedforward neural network, used to recognize images of handwritten digits (from 0 to 9).
- Backpropagation algorithm to learn the parameters of a neural network, used to recognize images of handwritten digits (from 0 to 9).
- Regularized linear regression to predict the amount of water flowing out of a dam using the change of water level in a reservoir.
- Implementation of a support vector machine classifier to build an email spam filter.
- Implementation of the K-means clustering algorithm to compress the size of an image by reducing its number of colors.
- Principal component analysis (PCA) to perform dimensionality reduction on a dataset of 5,000 face images.
- Anomaly detection algorithm, applied to detect failing servers on a network.
- Use of collaborative filtering to build a recommender system for movies.
Dataset
- Food truck profit and population data for the various cities in which those food trucks operate.
- Housing data for Portland, Oregon, including house price, living area, and number of bedrooms.
- Dataset representing 80 students who were or were not admitted into college based on the results of two standardized tests.
- Quality assurance data for microchips from a fabrication plant.
- Examples of handwritten digits from the MNIST database. [link]
- Dataset of a dam's water level through time.
- Collection of spam and non-spam emails from a subset of the SpamAssassin Public Corpus. [link]
- Image of a small bird.
- 5,000 face images.
- Server performance data, including throughput (mb/s) and latency (ms) for 307 servers.
- Dataset of movie ratings for 1,682 movies, rated by 943 users on a scale from 1 to 5. [link]
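
The K-means image compression item in the description above can be sketched as follows. This is a minimal illustration in Python, not the repository's Octave implementation: the function name `kmeans_quantize` is hypothetical, pixels are assumed to be RGB tuples, and the centroids are initialized from the first `k` distinct pixels purely to keep the example reproducible (random initialization is the more common choice).

```python
def kmeans_quantize(pixels, k, iterations=10):
    """Cluster RGB pixels into k colours and map each pixel to its centroid."""
    # Assumption: deterministic initialisation from the first k distinct pixels.
    centroids = []
    for p in pixels:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("image has fewer than k distinct colours")

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance.
        return min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )

    for _ in range(iterations):
        # Assignment step: label every pixel with its closest centroid.
        labels = [nearest(p) for p in pixels]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))

    # The compressed image stores one of only k colours per pixel.
    quantized = [centroids[nearest(p)] for p in pixels]
    return quantized, centroids


# Tiny made-up "image": three reddish and two bluish pixels.
image = [(250, 0, 0), (0, 0, 250), (255, 0, 0), (0, 0, 255), (245, 0, 0)]
quantized, palette = kmeans_quantize(image, k=2)
print(len(set(quantized)), "colours remain")
```

The compression comes from storing a small palette plus one palette index per pixel instead of a full RGB value per pixel.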

SyedTauhidUllahShah/Coursera
