mxizhang/ScrapyCounty

ScrapyCounty

Implemented with Python 2.7 by Xi Zhang for Noah Luk at LNH Holding LLC

Overview:

Collects sales data from <http://salesweb.civilview.com/>.
Counties supported:
- Morris
- Essex
- Bergen
- Hunterdon  <http://www.co.hunterdon.nj.us/sheriff/SALES/sold.pdf>
- Middlesex  <http://www.middlesexcountynj.gov/Government/Departments/PSH/Pages/Foreclosures.aspx>
- Mercer
- Union
- Monmouth and Passaic (coming soon)

Installation:

Check pip:

  • pip is already installed if you're using Python 2 >=2.7.9 or Python 3 >=3.4 downloaded from python.org, but you'll need to upgrade it: pip install -U pip
  • Otherwise, download pip: https://pip.pypa.io/en/stable/installing/
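
The version rule above can be checked from Python itself. The following is a minimal sketch, not part of the project; the helper names `pip_is_bundled` and `pip_is_importable` are made up for illustration. It only encodes the two thresholds stated above (2.7.9 and 3.4) and does not know whether your interpreter came from python.org.

```python
import sys


def pip_is_bundled(version_info=None):
    """Return True if this Python version meets the thresholds listed above."""
    v = tuple((version_info or sys.version_info)[:3])
    if v[0] == 2:
        return v >= (2, 7, 9)
    return v >= (3, 4)


def pip_is_importable():
    """Return True if the pip package can actually be imported."""
    try:
        import pip  # noqa: F401
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    print("Version new enough to ship pip: %s" % pip_is_bundled())
    print("pip importable: %s" % pip_is_importable())
```

If the first line prints False, or the second prints False, follow the download link above before continuing.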

Part 1: (for the Scrapy crawler)

  • Scrapy http://scrapy.org/

    Install: $ pip install scrapy

    Windows x32:

    $ pip install pypiwin32

    If the install fails on Windows with the error ** make sure the development packages of libxml2 and libxslt are installed **:

    1. Download the lxml and Twisted wheels from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/

    2. Install each downloaded wheel with pip, for example: pip install C:\Users\Home\Downloads\lxml- ......... .whl

  • Selenium https://pypi.python.org/pypi/selenium

    Install: $ pip install selenium

  • PhantomJS http://phantomjs.org/

    Install: $ sudo pkg install phantomjs

      Tip for Windows:
      	Set the path to the PhantomJS executable first
      Recommended path:
      	C:/phantomjs-2.1.1-windows/bin/phantomjs.exe
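
One way to handle the Windows path is sketched below. This is an illustration under assumptions, not the project's actual code: `phantomjs_path` and `make_driver` are hypothetical names, and the sketch assumes that on non-Windows systems the `phantomjs` binary is already on PATH. `webdriver.PhantomJS` is available in the Selenium releases contemporary with Python 2.7 and was removed in Selenium 4.

```python
import os
import sys

# Path recommended in the Windows tip above.
WINDOWS_PHANTOMJS = "C:/phantomjs-2.1.1-windows/bin/phantomjs.exe"


def phantomjs_path(platform=None):
    """Pick the PhantomJS executable for the given platform string."""
    platform = platform or sys.platform
    if platform.startswith("win"):
        return WINDOWS_PHANTOMJS
    # Assumption: elsewhere the package manager put phantomjs on PATH.
    return "phantomjs"


def make_driver():
    """Create a PhantomJS-backed Selenium driver (older Selenium only)."""
    # Imported lazily so phantomjs_path() works without Selenium installed.
    from selenium import webdriver

    path = phantomjs_path()
    if path != "phantomjs" and not os.path.isfile(path):
        raise IOError("PhantomJS not found at %s" % path)
    return webdriver.PhantomJS(executable_path=path)
```

Failing early when the file is missing gives a clearer message than the error Selenium raises when it cannot start the executable.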
    

Part 2:

Before running:

* Make sure to change the spreadsheet address
* Make sure the spreadsheets are shared with the client listed in the credentials
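
To find the address the spreadsheets need to be shared with, a small helper like the one below may help. It is only a sketch and rests on an assumption the README does not confirm: that the credentials are a Google service-account style JSON file, which stores the account address under the `client_email` key. The function name and the `credentials.json` file name are hypothetical.

```python
import json


def client_email(credentials_path):
    """Return the client_email stored in a service-account style JSON file."""
    with open(credentials_path) as handle:
        data = json.load(handle)
    # Assumption: the file has a "client_email" key; returns None otherwise.
    return data.get("client_email")


if __name__ == "__main__":
    # Hypothetical file name; point this at your own credentials file.
    print(client_email("credentials.json"))
```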

Run:

python scrapycounty.py
