pystock-crawler

pystock-crawler is a utility for crawling historical data of US stocks, including:

Ticker symbols listed in NYSE and NASDAQ from NASDAQ
Daily prices from Yahoo Finance
Fundamentals from 10-Q and 10-K filings on SEC EDGAR

Example Output

NYSE ticker symbols:

DDD   3D Systems Corporation
MMM   3M Company
WBAI  500.com Limited
...

Apple's daily prices:

symbol,date,open,high,low,close,volume,adj_close
AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09
AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94
AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77
...

Google's fundamentals:

symbol,end_date,amend,period_focus,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
GOOG,2009-06-30,False,Q2,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
GOOG,2009-09-30,False,Q3,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0
GOOG,2009-12-31,False,FY,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0
...

Installation

Prerequisites:

Python 2.7

pystock-crawler is based on Scrapy, so you will also need to install prerequisites such as lxml and libffi for Scrapy and its dependencies. See Scrapy's installation guide for more details.

Install with virtualenv (recommended):

pip install pystock-crawler

Or do system-wide installation:

sudo pip install pystock-crawler

Quickstart

Example 1. Google's and Yahoo's daily prices ordered by date:

pystock-crawler prices GOOG,YHOO -o out.csv --sort

Example 2. Daily prices of all companies listed in ./symbols.txt:

pystock-crawler prices ./symbols.txt -o out.csv

Example 3. Facebook's fundamentals during 2013:

pystock-crawler reports FB -o out.csv -s 20130101 -e 20131231

Example 4. Fundamentals all companies in ./nyse.txt and direct the logs to ./crawling.log:

pystock-crawler reports ./nyse.txt -o out.csv -l ./crawling.log

Example 5. All ticker symbols in NYSE and NASDAQ:

pystock-crawler symbols NYSE,NASDAQ -o out.txt

Usage

Type pystock-crawler -h to see command help:

Usage:
  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR] [--sort]
  pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD] [-l LOGFILE] [-w WORKING_DIR] [--sort]
  pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]  [-l LOGFILE] [-w WORKING_DIR] [--sort]
  pystock-crawler (-h | --help)
  pystock-crawler (-v | --version)

Options:
  -h --help       Show this screen
  -o OUTPUT       Output file
  -s YYYYMMDD     Start date [default: ]
  -e YYYYMMDD     End date [default: ]
  -l LOGFILE      Log output [default: ]
  -w WORKING_DIR  Working directory [default: .]
  --sort          Sort the result

There are three commands available:

pystock-crawler symbols grabs ticker symbol lists
pystock-crawler prices grabs daily prices
pystock-crawler reports grabs fundamentals

<exchanges> is a comma-separated string that specifies the stock exchanges you want to include. Only NYSE and NASDAQ are supported.

The output file of pystock-crawler symbols can be used for <symbols> argument in pystock-crawler prices and pystock-crawler reports commands.

<symbols> can be an inline string separated with commas or a text file that lists symbols line by line. For example, the inline string can be something like AAPL,GOOG,FB. And the text file may look like this:

# This line is comment
AAPL    Put anything you want here
GOOG    Since the text here is ignored
FB

Use -o to specify the output file. For pystock-crawler symbols command, the output format is a simple text file. For pystock-crawler prices and pystock-crawler reports the output format is CSV.

-l is where the crawling logs go to. If not specified, the logs go to stdout.

By default, the crawler uses the current directory as the working directory. If you don't want to use the current directoy, you can specify it with -w option. The crawler keeps HTTP cache in a directory named .scrapy under the working directory. The cache can save your time by avoid downloading the same web pages. However, the cache can be quite huge. If you don't need it, just delete the .scrapy directory after you've done crawling.

The rows in the output file are in an arbitrary order by default. Use --sort option to sort them by symbols and dates. But if you have a large output file, don't use --sort because it will be slow and eat a lot of memory.

Developer Guide

Installing Dependencies

pip install -r requirements.txt

Running Test

Install test requirements:

pip install -r requirements-test.txt

Then run the test:

py.test

This will download the test data (a lot of XML files) from from SEC EDGAR on the fly, so it will take some time and disk space. The test data is saved to pystock_crawler/tests/sample_data directory. It can be reused on the next time you run the test. If you don't need them, just delete the sample_data directory.

Name		Name	Last commit message	Last commit date
Latest commit History 470 Commits
bin		bin
pystock_crawler		pystock_crawler
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.rst		README.rst
pytest.ini		pytest.ini
requirements-test.txt		requirements-test.txt
requirements.txt		requirements.txt
scrapy.cfg		scrapy.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bin

bin

pystock_crawler

pystock_crawler

.gitignore

.gitignore

.travis.yml

.travis.yml

LICENSE

LICENSE

MANIFEST.in

MANIFEST.in

README.rst

README.rst

pytest.ini

pytest.ini

requirements-test.txt

requirements-test.txt

requirements.txt

requirements.txt

scrapy.cfg

scrapy.cfg

setup.py

setup.py

Repository files navigation

pystock-crawler

Example Output

Installation

Quickstart

Usage

Developer Guide

Installing Dependencies

Running Test

About

Releases

Packages

License

justinleoye/pystock-crawler

Folders and files

Latest commit

History

Repository files navigation

pystock-crawler

Example Output

Installation

Quickstart

Usage

Developer Guide

Installing Dependencies

Running Test

About

Resources

License

Stars

Watchers

Forks