An app that retrieves daily 8-K and Form D filings from EDGAR database for certain states. This is a data journalism project of the Missouri Business Alert.

kolgusheva/edgar-sec-scraping

The Edgar SEC Scraper

This app retrieves records of 8-K, 10-K, 10-Q and Form D filings from the EDGAR database for all companies registered in Missouri and in nearby areas of interest to journalists (in this case, the Kansas City and St. Louis areas). We use the EDGAR FTP server for this project.

This is a data journalism project of the Missouri Business Alert.

App logic

  • The app is designed to be run by CRON every morning, since data on EDGAR is updated every evening or so, except on holidays and weekends. The file that CRON (or you) needs to run is application.sh.
  • When run, the app opens the index of all filings for yesterday at ftp://ftp.sec.gov/edgar/daily-index/form.YYYYMMDD.idx. You can check that URL by substituting the proper date and opening the link in a browser.
    • Scrapes that page and looks for filings for the preset forms
    • Opens the page of every filing whose form type is in the preset list and checks its ZIP code to make sure it is in the area we're interested in.
  • A filing counts as a match when both its form type and the ZIP code in its Business Address field match. At the end, all matches are written to an HTML page generated by Flask and Frozen-Flask, and to a plain CSV file.
  • The app opens a preset Google Spreadsheet with a list of email addresses. This way it's easy to manage who receives emails from the scraper.
  • The app emails the plain and HTML versions to the recipients in the list.
  • The app also creates an SQLite database file for you and updates it with new matches each time you run it (or CRON runs it).
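
The first steps above can be sketched roughly as follows. This is an illustration only, not the code in this repo: the function names, the ZIP prefixes, and the assumption that the index columns are separated by two or more spaces are all hypothetical, and the form-type strings are a guess at how the forms named in this README appear in the index.

```python
import re
from datetime import date, timedelta

# Form types named in this README; the exact strings in the index are an assumption.
FORM_TYPES = ("8-K", "10-K", "10-Q", "D")

# Hypothetical ZIP prefixes for the area of interest; adjust to your own area.
AREA_ZIP_PREFIXES = ("63", "64", "65")


def yesterday(today=None):
    """Return the date one day before `today` (defaults to the current date)."""
    today = today or date.today()
    return today - timedelta(days=1)


def daily_index_url(day):
    """Build the daily form index URL using the pattern given in this README."""
    return "ftp://ftp.sec.gov/edgar/daily-index/form.%s.idx" % day.strftime("%Y%m%d")


def matching_filings(index_text, form_types=FORM_TYPES):
    """Keep index rows whose form type is in `form_types`.

    Assumes each data row has five columns (form type, company, CIK,
    date filed, file name) separated by runs of two or more spaces.
    """
    keys = ("form", "company", "cik", "date", "path")
    matches = []
    for line in index_text.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) == 5 and fields[0] in form_types:
            matches.append(dict(zip(keys, fields)))
    return matches


def in_area(zip_code, prefixes=AREA_ZIP_PREFIXES):
    """Return True when the ZIP code starts with one of the area prefixes."""
    return zip_code.strip().startswith(prefixes)
```

For example, daily_index_url(yesterday()) gives the URL to fetch each morning, and in_area() would be applied to the ZIP code read from each matching filing's Business Address field.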

Setting things up

There are several important things you need to do to run this app:

  • You can create a Gmail account the app will use to send emails, or use an existing one, but make sure you allow less secure apps to use it. Here's a good tutorial on what is needed: https://support.google.com/accounts/answer/6010255. Alternatively, you can set up your own email server or use more secure login techniques.
  • If you want to use a Google Spreadsheet to manage email recipients, which is built in by default, you will be using gspread and will need to set things up for using it with Google and OAuth. You will want to follow this tutorial: http://gspread.readthedocs.org/en/latest/oauth2.html. Alternatively you could just hard-code the list of recipients into the app code (it's declared in send_email.py).
    • In the spreadsheet, you should use the first column (column A) for emails of recipients, except for the very first cell (cell A1) because that one is for the column name.
  • Make sure you rename example_cred.json from this repo to just cred.json, and update the file with the information you get from setting up the Google Drive API (if you will be using a Google Spreadsheet). The JSON file structure should be self-explanatory.
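
The spreadsheet layout described above (emails in column A, with cell A1 reserved for the column name) could be handled along these lines. This is a minimal sketch with a hypothetical helper name, not the code in send_email.py; it assumes the column has already been read into a list of strings, for instance with gspread's col_values(1) on a worksheet.

```python
def recipients_from_column(column_values):
    """Return email addresses from column A.

    Skips the first cell (A1, the column name) and any blank cells,
    and trims surrounding whitespace from each address.
    """
    return [value.strip() for value in column_values[1:] if value and value.strip()]


# Hypothetical column contents, as they might come back from the spreadsheet.
column_a = ["Email", "reporter@example.com", "", " editor@example.com "]
recipients = recipients_from_column(column_a)
```

Filtering out blank cells keeps an empty row in the spreadsheet from turning into an empty recipient address.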

Requirements

All requirements were generated via pip freeze. You can install them automatically using the requirements.txt file provided in this repo. To do that, just run pip install -r requirements.txt.

BeautifulSoup==3.2.1
beautifulsoup4==4.4.0
cssutils==1.0
Flask==0.10.1
Frozen-Flask==0.11
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
peewee==2.6.3
pynliner==0.5.2
Werkzeug==0.10.4
wheel==0.24.0

All suggestions and comments are welcome. Released under the Apache license.
