NuvoHome/FlaskScrapper

 
 

Flask Web App With Selenium and Beautiful Soup

A simple Flask app integrated with Selenium to mine data from Facebook. It also uses an embedded Twisted reactor along with the Gunicorn WSGI server for Heroku usage.

This project scrapes the likers and commenters of a given Facebook post URL. It also collects the likes of those likers and commenters from their profiles and stores all the data in MongoDB collections. All collected data is publicly available on Facebook.

Installation & Setup (Development Environment)

OS X & Linux & Windows:

git clone https://github.com/PandorAstrum/FlaskScrapper.git
cd FlaskScrapper
pip install -r requirements.txt

Download (Extras):

Configuration: the settings for Flask are in the config.cfg file. Edit it to set the values you need.
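As an illustration of how such a file is typically read: Flask's `app.config.from_pyfile` executes the file as Python and keeps only the uppercase names. The sketch below mimics that behaviour with the standard library so it runs without Flask. Whether this project loads config.cfg this way is an assumption, and the keys `DEBUG` and `MONGO_URI`, as well as the `load_cfg` helper, are hypothetical.

```python
import os
import tempfile


def load_cfg(path):
    """Load a Python-syntax config file, keeping only UPPERCASE names."""
    namespace = {}
    with open(path, encoding="utf-8") as handle:
        # The file is executed as Python, so only load files you trust.
        exec(compile(handle.read(), path, "exec"), namespace)
    return {key: value for key, value in namespace.items() if key.isupper()}


# Hypothetical settings; the real keys in config.cfg may differ.
SAMPLE_CFG = (
    "DEBUG = True\n"
    'MONGO_URI = "mongodb://localhost:27017/scraper"\n'
    "helper = 1  # lowercase names are ignored\n"
)

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "config.cfg")
    with open(cfg_path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_CFG)
    settings = load_cfg(cfg_path)

print(settings)
```

Keeping only uppercase names lets the config file use lowercase helper variables without leaking them into the application settings.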

The binary folders contain all the web drivers for both Mac and Windows.

Usage example (Development Environment)

To run the Flask project on a Mac:

Make sure the webdriver for Mac has read and write access:

  • Select the webdriver, then choose File > Get Info, or press Command-I.
  • Click the disclosure triangle next to Sharing & Permissions to expand the section.
  • Click the pop-up menu next to your user name to see the permissions settings. ...
  • Change the permissions to Read & Write.
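The same permission change can also be made from a script rather than through Finder. This is a minimal sketch using only the standard library; the `ensure_read_write` helper is hypothetical and not part of this repository, and a temporary file stands in for the webdriver binary.

```python
import os
import stat
import tempfile


def ensure_read_write(path):
    """Grant the owning user read and write permission on path if missing."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode | stat.S_IRUSR | stat.S_IWUSR
    if wanted != mode:
        os.chmod(path, wanted)
    # Report whether the current user can now read and write the file.
    return os.access(path, os.R_OK | os.W_OK)


# Demonstration on a throwaway file standing in for the webdriver binary.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    driver_path = handle.name

os.chmod(driver_path, stat.S_IRUSR)  # start as read-only for the owner
is_writable = ensure_read_write(driver_path)
os.remove(driver_path)

print(is_writable)
```

In practice, `driver_path` would point at the webdriver file inside the binary folder.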

N.B.: MongoDB is used here with a URI from mLab. To use your own MongoDB environment, change the database URL in the config.cfg file.

Navigate to the root folder and run:

sudo python app.py

Helpers Library

csv_helpers.py

Library that helps with writing and reading CSV files.
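For orientation, helpers of this kind might look like the sketch below. The function names `write_csv` and `read_csv`, the column names, and the sample row are all hypothetical; the actual functions in csv_helpers.py may differ.

```python
import csv
import os
import tempfile


def write_csv(path, fieldnames, rows):
    """Write a list of dicts to path with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


# Hypothetical sample data for the round trip.
rows = [{"name": "Example User", "profile_url": "https://example.com/user"}]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "likers.csv")
    write_csv(csv_path, ["name", "profile_url"], rows)
    loaded = read_csv(csv_path)

print(loaded)
```

Opening the file with `newline=""` is what the `csv` module documentation recommends, so that line endings are handled consistently across platforms.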

generic_helpers.py

Library that helps with various tasks, such as getting a download file name or getting the time in a provided format.
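The two tasks mentioned above could be sketched roughly as follows. The names `get_time` and `get_download_file_name`, their parameters, and the timestamped naming scheme are assumptions made for illustration, not the actual contents of generic_helpers.py.

```python
from datetime import datetime


def get_time(fmt="%Y-%m-%d %H:%M:%S", moment=None):
    """Return a moment (default: now) formatted with a strftime pattern."""
    if moment is None:
        moment = datetime.now()
    return moment.strftime(fmt)


def get_download_file_name(prefix, extension="csv", moment=None):
    """Build a timestamped file name such as 'prefix_YYYYmmdd_HHMMSS.csv'."""
    stamp = get_time("%Y%m%d_%H%M%S", moment)
    return f"{prefix}_{stamp}.{extension}"


# A fixed moment keeps the example deterministic.
fixed = datetime(2020, 1, 2, 3, 4, 5)
file_name = get_download_file_name("likers", moment=fixed)
formatted = get_time("%d/%m/%Y", fixed)

print(file_name)
print(formatted)
```

Passing the moment in explicitly makes the helpers easy to test; when it is omitted, the current time is used.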

scrapper.py

Library that helps with the scraping process.

Release History

  • 1.0.0
    • Add: flask web app
    • Add: MongoDB integrations
    • Add: Scrapy framework integrations
    • Add: helpers library

Meta

Ashiquzzaman Khan – @dreadlordn

Distributed under the Apache License 2.0. See LICENSE for more information.

https://github.com/PandorAstrum/FlaskScrapper

About

A scraper built on the Flask framework

Languages

  • HTML 43.5%
  • Python 35.0%
  • JavaScript 21.4%
  • Shell 0.1%