This repository is a collection of examples using different data analysis libraries, tools, and techniques.
CSV-parsing: Python, NumPy, Pandas
Reading a CSV file line-by-line is a common problem with a lot of solutions. Which is fastest?
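As a rough illustration of the kind of comparison this example makes, here is a minimal sketch using only the standard library. The function names (`parse_with_split`, `parse_with_csv_module`, `make_sample`, `time_parsers`) and the synthetic data are assumptions made for illustration, not code from this repository, and it does not cover the NumPy or Pandas readers.

```python
import csv
import io
import timeit


def parse_with_split(text):
    """Naive parser: split each line on commas (no support for quoted fields)."""
    return [line.split(",") for line in text.splitlines()]


def parse_with_csv_module(text):
    """Standard-library parser: also handles quoted fields and embedded commas."""
    return list(csv.reader(io.StringIO(text)))


def make_sample(n_rows=1000):
    """Build a synthetic CSV string with three integer columns (hypothetical data)."""
    return "\n".join(f"{i},{i * 2},{i * 3}" for i in range(n_rows))


def time_parsers(text, repeat=5):
    """Return the best wall-clock time in seconds for each parser."""
    parsers = {"split": parse_with_split, "csv": parse_with_csv_module}
    return {
        name: min(timeit.repeat(lambda f=func: f(text), number=1, repeat=repeat))
        for name, func in parsers.items()
    }


if __name__ == "__main__":
    sample = make_sample()
    for name, seconds in time_parsers(sample).items():
        print(f"{name}: {seconds:.6f} s")
```

Timings depend on the machine and on the shape of the data, so treat any single run as indicative only. The naive `split` approach is only a fair competitor on files without quoted fields.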
outlier identification: IPython, SciPy, matplotlib
We'll implement the generalized ESD test to automate the task of finding bad data points.
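For reference, the generalized ESD test as usually stated (following Rosner's formulation) can be summarized as below. The symbols are assumptions introduced here: $n$ is the sample size, $r$ is the chosen upper bound on the number of outliers, $\alpha$ is the significance level, and $t_{p,\nu}$ is the $p$-quantile of Student's $t$ distribution with $\nu$ degrees of freedom.

```latex
% Step i = 1, ..., r: compute the statistic on the n - i + 1 points that remain,
% then remove the point that maximizes |x_j - \bar{x}| before the next step.
R_i = \frac{\max_j \lvert x_j - \bar{x} \rvert}{s}

% Critical value for step i:
\lambda_i = \frac{(n - i)\, t_{p,\, n-i-1}}
                 {\sqrt{\left(n - i - 1 + t_{p,\, n-i-1}^{2}\right)(n - i + 1)}},
\qquad p = 1 - \frac{\alpha}{2\,(n - i + 1)}

% Estimated number of outliers: the largest i such that R_i > \lambda_i.
```

Here $\bar{x}$ and $s$ are the mean and sample standard deviation of the points remaining at step $i$, so both are recomputed after each removal.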
machine learning, classification: IPython, scikit-learn
Learning about supervised and unsupervised classification with scikit-learn.
web scraping, data analysis: IPython, requests, BeautifulSoup, Pandas
At 41 books, Discworld is one of the longest book series ever. If you haven't read any of it, where should you start? Let's use Goodreads to find out.
web scraping, data analysis: IPython, Pandas
Everyone knows their favorite actors. Only movie buffs know their favorite directors. But no one knows their favorite movie writers. Let's use IMDb to find out more.
web scraping: IPython, BeautifulSoup, requests
Randall Munroe posited that if you start on any Wikipedia article and take the first link, then take the first link on that article, and repeat, you always end up on the Philosophy page. Let's automate that.
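The link-following loop can be sketched with the standard library alone. This is a simplified illustration rather than the notebook's implementation: the names `FirstParagraphLink`, `first_link`, and `follow_first_links` are hypothetical, the page fetcher is passed in as a function so the logic can be tried offline, and the sketch simply takes the first `/wiki/` link inside a paragraph without skipping parenthesized or italicized links.

```python
from html.parser import HTMLParser


class FirstParagraphLink(HTMLParser):
    """Record the href of the first /wiki/ link found inside a <p> element."""

    def __init__(self):
        super().__init__()
        self.p_depth = 0        # how many <p> elements are currently open
        self.first_href = None  # first qualifying link seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "a" and self.p_depth > 0 and self.first_href is None:
            href = dict(attrs).get("href")
            if href and href.startswith("/wiki/"):
                self.first_href = href

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1


def first_link(html):
    """Return the first in-paragraph /wiki/ link in `html`, or None."""
    parser = FirstParagraphLink()
    parser.feed(html)
    return parser.first_href


def follow_first_links(start, fetch, target="/wiki/Philosophy", max_steps=100):
    """Follow first links from `start` until `target`, a loop, a dead end, or max_steps.

    `fetch` is any callable mapping a path such as "/wiki/Python" to page HTML.
    """
    path = [start]
    while path[-1] != target and len(path) <= max_steps:
        nxt = first_link(fetch(path[-1]))
        if nxt is None or nxt in path:  # dead end, or we would revisit a page
            break
        path.append(nxt)
    return path
```

Against the live site, `fetch` could be something like `lambda path: requests.get("https://en.wikipedia.org" + path).text`, assuming `requests` is installed. Keeping the fetcher separate means the traversal itself can be checked with a small dictionary of fake pages.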