jtorrente/nyc-data-analysis: Analysis of NYC subway ridership data using Python

NOTE: Project revised according to the feedback received in the first round of review.

This repository contains my first project for Udacity's Data Science Nanodegree (Module 1): https://www.udacity.com/course/data-analyst-nanodegree--nd002. It is the project for the module "Introduction to Data Science", titled "Analyzing the New York Subway Dataset".

The answers to the 'short questions' can be found in the docs folder (https://github.com/jtorrente/nyc-data-analysis/tree/master/docs), along with an example of the output the program generates and high-resolution versions of the visualizations created. Direct link to the short questions PDF file: https://github.com/jtorrente/nyc-data-analysis/blob/master/docs/Answers.pdf

The source code of the project can be found in the folder https://github.com/jtorrente/nyc-data-analysis/tree/master/nycsubway/module1. The main file is called 'project1.py'. Direct link to this file: https://github.com/jtorrente/nyc-data-analysis/blob/master/nycsubway/module1/project1.py

This repository also contains the code used to complete problem sets 1-4 of the course. That code includes many contributions from Udacity, so I cannot be considered its sole author. It is located in the folder https://github.com/jtorrente/nyc-data-analysis/tree/master/nycsubway/module1/problemsets.

The references used for this project are described in the 'short questions' file. The most relevant source of information and content has been the Udacity course materials. Most of the code needed to complete this project was provided by Udacity to help students complete the problems and exercises of the Intro to Data Science course. On top of that code base, I have written new code and improved the existing code to complete the project.

Data files included in the data folder (https://github.com/jtorrente/nyc-data-analysis/tree/master/data) have been downloaded directly from the downloads section of the Udacity course.

Apart from Udacity’s materials, I have used additional sources to get deeper insight into the Mann-Whitney U test, especially how effect sizes should be reported for this test. It is often argued that, when reporting statistical analyses, inference tests should be accompanied not only by the value of the statistic used (e.g. ‘t’ or ‘U’) and the p-value (the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true), but also by an estimator of the effect size. This has several benefits. First, it allows for discussing how important the relationship found between the dependent and independent variables is. Second, it facilitates meta-review of research results on a particular topic.

In this regard, I have used the rank-biserial coefficient as an estimator of effect size. I have used the following three references on this topic:

http://yatani.jp/teaching/doku.php?id=hcistats:mannwhitney

https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Rank-biserial_correlation

Wendt, H. W. (1972). Dealing with a common problem in Social science: A simplified rank-biserial coefficient of correlation based on the U statistic. European Journal of Social Psychology, 2(4), 463-465. http://doi.org/10.1002/ejsp.2420020412
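As a rough illustration of the effect size discussed above, the following is a minimal sketch of how a rank-biserial coefficient can be derived from the U statistic. It is not the code in 'project1.py'; the function names and the sample values are hypothetical and made up for the example. It computes a signed coefficient, 2U/(n1·n2) − 1, whose magnitude matches Wendt's 1 − 2U/(n1·n2) when U is taken as the smaller of the two U values.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: number of (a, b) pairs with a > b.

    Ties count as half a pair. Brute-force pair counting is used for
    clarity; it is O(n1 * n2), so it only suits small samples.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def rank_biserial(sample_a, sample_b):
    """Signed rank-biserial coefficient in [-1, 1].

    Positive values mean sample_a tends to be larger than sample_b.
    """
    n_pairs = len(sample_a) * len(sample_b)
    if n_pairs == 0:
        raise ValueError("both samples must be non-empty")
    return 2.0 * mann_whitney_u(sample_a, sample_b) / n_pairs - 1.0


if __name__ == "__main__":
    # Made-up hourly entry counts, for illustration only.
    rainy = [1200, 950, 1430, 1100]
    dry = [1000, 900, 1250, 870]
    print("U =", mann_whitney_u(rainy, dry))
    print("r =", rank_biserial(rainy, dry))
```

A value near 0 suggests little difference between the two groups, while values near 1 or -1 indicate that one group almost always has the larger observation.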

I have also used the online documentation of ggplot, numpy and pandas to answer questions and fix problems that came up along the way. I have also consulted some threads on Stack Overflow, but no code was copied from any of these sources: http://stackoverflow.com/questions/22543776/python-ggplot-issues-plotting-8-stocks-and-legend-is-cutoff and http://stackoverflow.com/questions/3606697/how-to-set-limits-for-axes-in-ggplot2-r-plots

