Extracts and builds a database of preliminary financial statement items from the SEC Edgar system.

swidoff/edgar_prelim

edgar_prelim

Extracts and builds a database of preliminary financial statement items from the SEC Edgar system.

  • Browse the database on Heroku
  • Play with the notebooks on Binder

US companies publish their annual (10-K) and quarterly (10-Q) financial statements in (mostly) machine-readable XBRL format on the SEC Edgar website. In the weeks before the final announcement date, many companies also file preliminary, unaudited versions as 8-Ks, which nevertheless preview useful information. Early access to these preliminaries can let an investor calculate more accurate valuations ahead of the announcement. However, preliminary announcements are usually submitted as loosely structured HTML documents whose format is inconsistent across companies. If you run a systematic strategy on a large universe, combing through the HTML by hand is prohibitively time-consuming.

Here is a snippet of the CITIGROUP 2019Q1 preliminary income statement:

[Image: CITIGROUP 2019Q1 raw]

edgar_prelim systematically discovers, scrapes and conforms HTML preliminary announcement data from the SEC Edgar website with as few errors as possible.

Here's what edgar_prelim collects:

[Image: CITIGROUP 2019Q1 edgar_prelim]

View the full CITIGROUP 2019Q1 edgar_prelim report.

The committed SQLite database file contains the following preliminary income statement items and ratios, taken from the preliminary announcements of the 373 largest US banks (by total assets) through 2019Q1:

  • Book value per share
  • Interest income
  • Net income
  • Net interest income
  • Provision for loan losses
  • Total revenue (where applicable)
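
As a rough illustration of how a database of this kind could be queried, here is a minimal sketch using Python's standard `sqlite3` module. The table name `prelim`, its columns, the item keys and the rows are all hypothetical placeholders invented for this example; the schema of the committed file may well differ, so list the real tables first with `list_tables`.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def item_history(conn, company, item):
    """Return (fiscal_period, value) pairs for one item, oldest first.

    Assumes the hypothetical `prelim` table defined below; periods such as
    '2019Q1' sort chronologically as plain strings.
    """
    cursor = conn.execute(
        "SELECT fiscal_period, value FROM prelim "
        "WHERE company = ? AND item = ? ORDER BY fiscal_period",
        (company, item),
    )
    return cursor.fetchall()


# In-memory stand-in for the committed database file (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prelim (company TEXT, fiscal_period TEXT, item TEXT, value REAL)"
)
# Placeholder rows with made-up values, only to make the sketch runnable.
conn.executemany(
    "INSERT INTO prelim VALUES (?, ?, ?, ?)",
    [
        ("EXAMPLE BANK", "2019Q1", "net_income", 2.0),
        ("EXAMPLE BANK", "2018Q4", "net_income", 1.0),
        ("EXAMPLE BANK", "2019Q1", "book_value_per_share", 3.0),
    ],
)

tables = list_tables(conn)
history = item_history(conn, "EXAMPLE BANK", "net_income")
print(tables)
print(history)
```

To try this against the real file, replace `":memory:"` with the path to the database and adjust the query to whatever `list_tables` reports.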

The format of the income statement tables varies a lot between companies, but the variety is not unlimited. edgar_prelim combs through all company 8-Ks and:

  • Identifies those that are preliminary announcements
  • Extracts all tables, using heuristics to find their titles and the table units
  • Filters tables to those whose titles are relevant to the items according to item metadata
  • Conforms the tables to a standardized format through a series of transformations
  • Locates the item rows and the most recent fiscal period columns, again using heuristics
  • Parses the values and collects them in a clean, tabular format
  • Produces a report for each company that:
    • Maps each value to its location in the source report
    • Plots the values over time, which makes it easy to spot outliers
    • Flags missing values and other potential errors
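
The table-extraction, unit-detection, row-location and value-parsing steps above can be sketched in a few lines of standard-library Python. This is not edgar_prelim's implementation, only a simplified illustration under stated assumptions: the sample HTML and its numbers are made up, nested tables are ignored, the units are read from an "in millions"-style phrase, and the first numeric cell in a row is assumed to be the most recent fiscal period. All function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser

UNIT_MULTIPLIERS = {"thousands": 1_000, "millions": 1_000_000, "billions": 1_000_000_000}


class TableExtractor(HTMLParser):
    """Collects each <table> as a list of rows of cell text (no nested tables)."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._rows = self._row = self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = self._row = self._cell = None


def detect_multiplier(text):
    """Return the unit multiplier implied by a phrase such as 'in millions'."""
    match = re.search(r"\bin\s+(thousands|millions|billions)\b", text, re.IGNORECASE)
    return UNIT_MULTIPLIERS[match.group(1).lower()] if match else 1


def parse_number(text, multiplier=1):
    """Parse '1,234', '$ 8' or '(56.7)' (negative) into a float, else None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1].strip()
    if not re.fullmatch(r"-?\d+(\.\d+)?", cleaned):
        return None
    value = float(cleaned) * multiplier
    return -value if negative else value


def find_item(table, label_pattern, multiplier=1):
    """Return the first numeric value in the first row whose label matches."""
    for row in table:
        if row and re.search(label_pattern, row[0], re.IGNORECASE):
            for cell in row[1:]:
                value = parse_number(cell, multiplier)
                if value is not None:
                    return value
    return None


# Made-up sample document; the figures are placeholders, not real filings.
SAMPLE_HTML = """
<p>CONSOLIDATED STATEMENT OF INCOME (in millions)</p>
<table>
  <tr><th></th><th>Q1 2019</th><th>Q4 2018</th></tr>
  <tr><td>Net interest income</td><td>$ 1,250</td><td>$ 1,200</td></tr>
  <tr><td>Provision for loan losses</td><td>(75)</td><td>(80)</td></tr>
  <tr><td>Net income</td><td>400</td><td>390</td></tr>
</table>
"""

extractor = TableExtractor()
extractor.feed(SAMPLE_HTML)
multiplier = detect_multiplier(SAMPLE_HTML)
income_table = extractor.tables[0]
net_income = find_item(income_table, r"^net income", multiplier)
provision = find_item(income_table, r"^provision for loan losses", multiplier)
print(net_income, provision)
```

Anchoring the label patterns with `^` keeps "Net income" from matching rows such as "Net interest income". Real filings need far more tolerance than this (merged cells, footnote markers, period headers in varying order), which is where the heuristics and per-company reports described above come in.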
