This project was created for scraping Active Foreign Principals from FARA.
This project uses Pipenv for dependency management. To install Pipenv, follow the Pipenv installation guide.
To install dependencies, run:
pipenv sync
If you prefer not to install Pipenv, you can install the requirements with:
pip install -r requirements.txt
To run the scraper, issue the following command:
pipenv run scrapy crawl principals -o output.json
The output will be written to the output.json file. A single row has the following format:
{
    "url": "https://efile.fara.gov/pls/apex/f?p=185:200:5957581211008::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:6367,Exhibit%20AB,JORDAN",
    "foreign_principal": "Royal Hashemite Court of Jordan",
    "date": "2016-08-10T00:00:00Z",
    "address": "Amman",
    "state": null,
    "country": "JORDAN",
    "registrant": "West Wing Writers, LLC",
    "reg_num": "6367",
    "exhibit_url": "http://www.fara.gov/docs/6317-Exhibit-AB-20180417-5.pdf"
}
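The rows above can be consumed with the standard json module. The sketch below checks that a row carries every field shown in the sample format; the choice of validation (a plain key-set comparison) is illustrative, not part of this project:

```python
import json

# Field names taken from the sample row above.
EXPECTED_KEYS = {
    "url", "foreign_principal", "date", "address",
    "state", "country", "registrant", "reg_num", "exhibit_url",
}

# In practice you would iterate over json.load(open("output.json"));
# this inline sample stands in for one such row.
sample = json.loads('''{
    "url": "https://efile.fara.gov/pls/apex/f?p=185:200:5957581211008::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:6367,Exhibit%20AB,JORDAN",
    "foreign_principal": "Royal Hashemite Court of Jordan",
    "date": "2016-08-10T00:00:00Z",
    "address": "Amman",
    "state": null,
    "country": "JORDAN",
    "registrant": "West Wing Writers, LLC",
    "reg_num": "6367",
    "exhibit_url": "http://www.fara.gov/docs/6317-Exhibit-AB-20180417-5.pdf"
}''')

missing = EXPECTED_KEYS - sample.keys()
print(sorted(missing))  # [] when every expected field is present
```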
If you want to change the number of principals that are downloaded at once, you can provide an extra argument:
pipenv run scrapy crawl principals -o output.json -a rows=30
By default, 30 rows are downloaded.
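For context, Scrapy passes each `-a key=value` pair to the spider's __init__ as a keyword argument, always as a string. The sketch below illustrates that mechanism with a stand-in class; the real spider lives in scraping_foreign_principals, and only the spider name "principals" is taken from the commands above:

```python
# Stand-in for the project's spider, showing how `-a rows=...` arrives.
class PrincipalsSpider:
    name = "principals"

    def __init__(self, rows="30", **kwargs):
        # Command-line values arrive as strings, so cast explicitly.
        self.rows = int(rows)

spider = PrincipalsSpider(rows="15")  # what `-a rows=15` effectively does
print(spider.rows)  # 15
```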
To run tests, run:
pipenv run python scraping_foreign_principals/tests.py
To run formatting checks (pylint and black), first install the dev dependencies with:
pipenv sync --dev
Then issue the following commands:
pipenv run pylint scraping_foreign_principals
pipenv run black --check -Sl 80 scraping_foreign_principals