
hackerup/coursera_scraper

 
 


Coursera Scraper App.... Scrapin' Away...

  1. Synopsis

  2. Project Information

  3. Technologies Used

  4. How to run this town....(I mean script)

  5. License

Synopsis

Want to find out what new and hip courses are out on Coursera? Want to see what Coursera is offering? As long as Coursera doesn't change its page structure, this fabulous program will scrape https://www.coursera.org/courses?languages=en for each course's organization, author, title, start date, and duration.

Project Information

The Coursera scraper has a few components (see below). It downloads the asynchronously loaded Coursera course page by mimicking browser behavior. The data is then stored in PostgreSQL, in tables defined by the SQLAlchemy schema.

  • scraper
  • model
  • seed
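As a rough illustration of the scraper component, here is a minimal sketch of pulling the five fields out of already-rendered HTML with BeautifulSoup. The markup, the class names, and the `parse_courses` helper are all hypothetical placeholders, not Coursera's real page structure or this repo's actual code. In the real scraper the HTML would presumably come from Selenium (for example `driver.page_source`) once the page has finished loading.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: these class names are placeholders for illustration,
# not Coursera's real ones.
SAMPLE_HTML = """
<div class="course">
  <span class="organization">Example University</span>
  <span class="author">Jane Doe</span>
  <h2 class="title">Intro to Examples</h2>
  <span class="start-date">Jan 5</span>
  <span class="duration">6 weeks</span>
</div>
"""

# Maps each output field to the (assumed) CSS class that holds it.
FIELDS = {
    "organization": "organization",
    "author": "author",
    "title": "title",
    "start_date": "start-date",
    "duration": "duration",
}


def parse_courses(html):
    """Return one dict per course card found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    courses = []
    for card in soup.find_all(class_="course"):
        course = {}
        for field, css_class in FIELDS.items():
            node = card.find(class_=css_class)
            # A missing element becomes None instead of raising an error.
            course[field] = node.get_text(strip=True) if node else None
        courses.append(course)
    return courses


courses = parse_courses(SAMPLE_HTML)
```

Keeping the field-to-class mapping in one dict means a change in page structure only needs an update in one place.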

Executing script

  • Fork this repo
  • cd coursera_scraper_app/spiders
  • pip install -r requirements.txt
  • python scraper.py
  • Download and install PostgreSQL
  • Uncomment the line in main(), then run python seed.py. This initializes the db tables.
  • Comment out the same line in main() again, then run python seed.py
  • See the PostgreSQL docs for more info on queries and the like
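The "initialize the tables once, then seed" steps above can be sketched with SQLAlchemy roughly as follows. This is an assumption-laden sketch, not the repo's actual model.py or seed.py. The `Course` class, the column names, the `seed` function, and the `init_tables` flag (standing in for the line you comment and uncomment) are all hypothetical. It assumes SQLAlchemy 1.4 or newer, and it uses an in-memory SQLite database so it runs without a PostgreSQL server.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Course(Base):
    """Hypothetical schema holding the five scraped fields."""
    __tablename__ = "courses"

    id = Column(Integer, primary_key=True)
    organization = Column(String)
    author = Column(String)
    title = Column(String)
    start_date = Column(String)
    duration = Column(String)


def seed(engine, rows, init_tables=False):
    """Insert scraped rows; create the tables first when init_tables is True."""
    if init_tables:
        # Plays the role of the line that gets uncommented on the first run.
        Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    try:
        session.add_all([Course(**row) for row in rows])
        session.commit()
    finally:
        session.close()


# In-memory SQLite for the sketch. For PostgreSQL the URL would look like
# "postgresql://user:password@localhost/dbname" (placeholder values).
engine = create_engine("sqlite:///:memory:")
seed(
    engine,
    [{
        "organization": "Example University",
        "author": "Jane Doe",
        "title": "Intro to Examples",
        "start_date": "Jan 5",
        "duration": "6 weeks",
    }],
    init_tables=True,
)
```

A flag like this avoids editing the source between runs, though the comment and uncomment approach described above works just as well for a one-off setup.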

Technologies Used

  • Python 3.4.2
  • PostgreSQL
  • Selenium
  • BeautifulSoup
  • SQLAlchemy

License

Private end-user license agreement
