abelsonlive/scrape-the-gibson

Scrape the Gibson

These code snippets are the core of a post I wrote about web scraping in Python. The post is aimed at people who have already done a bit of coding and want to explore scraping in Python in more depth. The workshop will be much easier if you have a Mac or a Linux-based computer.

Dependencies

  1. Download (or git clone) the repo: https://github.com/abelsonlive/scrape-the-gibson

  2. Install dependencies

  • If you don't have pip installed, type:
sudo easy_install pip
  • Change into the repo directory:
cd scrape-the-gibson
  • Now run:
sudo pip install -r requirements.txt
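Once the install finishes, a small check script can confirm the packages are importable before you start. This is a sketch, not part of the repo: the module names `requests`, `bs4` (BeautifulSoup's import name) and `dataset` are assumed from the Topics list, since the contents of requirements.txt aren't shown here.

```python
import importlib.util

# Assumed import names for the libraries in the Topics list:
# requests, BeautifulSoup (imported as bs4) and dataset.
REQUIRED = ["requests", "bs4", "dataset"]


def missing_modules(names=REQUIRED):
    """Return the module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Save it under any name (check_deps.py, say) and run it with the same `python` that pip installed into.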

Topics

Introduction

  • Getting started with scraping in Python using requests
  • Exploring HTML documents and extracting data with BeautifulSoup
  • Saving scraped data to a database with dataset

Advanced

  • Thinking about ETL (Extract, Transform, Load)
  • Keeping your source data around
  • Running multiple requests in parallel to scrape faster
  • Using regular expressions to extract more data
  • Crawling entire sites programmatically
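One way to combine the ETL idea with keeping source data around, sketched under assumptions: pages are fetched elsewhere (e.g. with requests) and handed to `extract`, which writes each raw page to disk before anything is parsed. `transform` then works only from those saved files, and `load` writes records to SQLite. To stay dependency-free the sketch swaps in the standard library's `html.parser` and `sqlite3` for BeautifulSoup and dataset; all function and table names are hypothetical.

```python
import hashlib
import json
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(url, html, raw_dir):
    """Extract: save the raw page untouched, named by a SHA-1 hash of its URL."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / (hashlib.sha1(url.encode("utf-8")).hexdigest() + ".json")
    path.write_text(json.dumps({"url": url, "html": html}), encoding="utf-8")
    return path


def transform(path):
    """Transform: parse a saved page into a flat record."""
    page = json.loads(Path(path).read_text(encoding="utf-8"))
    parser = TitleParser()
    parser.feed(page["html"])
    return {"url": page["url"], "title": (parser.title or "").strip()}


def load(records, conn):
    """Load: upsert records into a SQLite table keyed by URL."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (:url, :title)", records
    )
    conn.commit()
```

Because the raw HTML stays on disk, a bug in `transform` can be fixed and rerun over every saved page without downloading anything again.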
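Scraping spends most of its time waiting on the network, so a thread pool is a simple way to keep several requests in flight at once. The sketch below uses the standard library's `concurrent.futures`; `fetch_all` is a hypothetical helper, and `fetch` is any function that takes a URL and returns its content.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=8):
    """Run fetch(url) for every URL on a thread pool.

    Returns {url: result}; if a fetch raised, the exception is stored
    in place of the result so one bad page doesn't stop the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

With requests installed, for example: `fetch_all(urls, lambda u: requests.get(u, timeout=10).text)`. Threads work here because the workers spend their time blocked on I/O; keep `max_workers` modest, since many simultaneous requests can get you rate-limited or blocked.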
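Regular expressions are handy for pulling structured values out of text that BeautifulSoup has already isolated, such as prices or dates inside a paragraph. The two patterns below (US-style dollar amounts and `YYYY-MM-DD` dates) are hypothetical examples; real pages usually need patterns tuned to their own formatting.

```python
import re

# Hypothetical patterns for illustration only.
# Dollar amounts like $5, $12345 or $1,299.99 (capture group excludes the "$").
PRICE_RE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")
# ISO-style dates like 2015-03-01, captured as (year, month, day).
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_prices(text):
    """Return every dollar amount in text as a float."""
    return [float(amount.replace(",", "")) for amount in PRICE_RE.findall(text)]


def extract_dates(text):
    """Return (year, month, day) integer tuples for every YYYY-MM-DD date."""
    return [tuple(int(part) for part in match) for match in DATE_RE.findall(text)]
```

For example, `extract_prices("Was $1,299.99, now $999")` returns `[1299.99, 999.0]`.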
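Crawling a whole site amounts to a breadth-first search over its links: fetch a page, collect its `<a href>` targets, resolve them against the page URL, and queue any same-host URL not seen before. This sketch uses the standard library's `html.parser` and `urllib.parse`; `crawl`, `LinkParser`, `max_pages` and the injected `fetch` function are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects non-empty href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of start_url's host; fetch(url) must return HTML text."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

A real crawler should also honor robots.txt (the standard library's `urllib.robotparser` can read it) and pause between requests.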

Links

There are plenty of existing resources on scraping. A few links:

About

Code snippets for a workshop on web scraping.
