GitHub - paradigm72/ndn_scraper: HTML scraping example using Python and BeautifulSoup

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
.idea		.idea
django_code		django_code
README.txt		README.txt
SQLiteDB		SQLiteDB
getData.js		getData.js
hostingstart-python.html		hostingstart-python.html
index.html		index.html
jquery-1.8.3.min.js		jquery-1.8.3.min.js
style.css		style.css
web.config		web.config

Repository files navigation

Example of a web-scraping web app that uses Python (and the BeautifulSoup parsing library). It abuses the Django framework just to get a quick way to execute a Python script from an AJAX call - not good Django practice at all I don't think.

Mostly done as an exercise to learn a real-world application of Python, and to create a version of NDNation that is easier to read. Could be adapted to other websites by updating the logic in NDNScraper_private.py, as much as of that is site-specific.

Update 12/27/18: In the process of porting this over to run on an Azure instance, I'm working through changing this to be served using a newer instance of Django, which has temporarily broken things. The core scraping logic would be portable pretty easily to a different framework if needed. This isn't currently running on my site due to the cost of keeping a Python instance running on Azure.

Reminder - boot sequence:
1. Prereq: python 2.7 installed
2. Navigate to /django_code
3. python manage.py runserver
4. Connect browser to the site (domain/~user/HTMLScraper/index.html)