This repository has been archived by the owner on Mar 29, 2022. It is now read-only.

Agrendalath/scraping_foreign_principals


Scraping Foreign Principals


This project was created for scraping Active Foreign Principals from FARA (Foreign Agents Registration Act) filings.

Installation

This project uses Pipenv for dependency management. To install Pipenv, please follow the Pipenv installation guide.

To install dependencies, run:

pipenv sync

If you prefer not to install Pipenv, you can install the requirements with pip:

pip install -r requirements.txt

Usage

Scraping

To run the scraper, issue the following command:

pipenv run scrapy crawl principals -o output.json

The output will be written to output.json. Each row has the following format:

{
  "url": "https://efile.fara.gov/pls/apex/f?p=185:200:5957581211008::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:6367,Exhibit%20AB,JORDAN",
  "foreign_principal": "Royal Hashemite Court of Jordan",
  "date": "2016-08-10T00:00:00Z",
  "address": "Amman",
  "state": null,
  "country": "JORDAN",
  "registrant": "West Wing Writers, LLC",
  "reg_num": "6367",
  "exhibit_url": "http://www.fara.gov/docs/6317-Exhibit-AB-20180417-5.pdf"
}
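The fields above can be sanity-checked after a crawl. The following is a minimal sketch, not part of the project. It assumes that output.json is the JSON array Scrapy's feed export writes for `-o output.json`, that `date` always uses the format shown above, and that `url` follows the standard Oracle APEX `f?p=App:Page:Session:Request:Debug:ClearCache:ItemNames:ItemValues` layout, so `reg_num` and `country` can be cross-checked against the `P200_REG_NUMBER` and `P200_COUNTRY` items. The helper names `apex_items`, `check_row` and `check_file` are hypothetical.

```python
import json
from datetime import datetime
from urllib.parse import parse_qs, urlparse

# Field names taken from the sample row in this README.
EXPECTED_KEYS = {
    "url", "foreign_principal", "date", "address", "state",
    "country", "registrant", "reg_num", "exhibit_url",
}


def apex_items(url):
    """Map APEX item names to values from an f?p=... URL (hypothetical helper).

    Assumes the Oracle APEX layout
    App:Page:Session:Request:Debug:ClearCache:ItemNames:ItemValues
    and that no item value contains ':' or ','.
    """
    p = parse_qs(urlparse(url).query).get("p", [""])[0]  # parse_qs decodes %20
    parts = p.split(":")
    if len(parts) < 8:
        return {}
    names, values = parts[6].split(","), parts[7].split(",")
    return dict(zip(names, values))


def check_row(row):
    """Return a list of problems found in one scraped row (empty if none)."""
    missing = EXPECTED_KEYS - set(row)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    try:
        # Assumed format, e.g. "2016-08-10T00:00:00Z".
        datetime.strptime(row["date"], "%Y-%m-%dT%H:%M:%SZ")
    except (TypeError, ValueError):
        problems.append(f"unexpected date format: {row['date']!r}")
    items = apex_items(row["url"])
    if items.get("P200_REG_NUMBER") != row["reg_num"]:
        problems.append("reg_num does not match the URL")
    if items.get("P200_COUNTRY") != row["country"]:
        problems.append("country does not match the URL")
    return problems


def check_file(path="output.json"):
    """Load the crawl output (a JSON list of rows) and report rows with problems."""
    with open(path, encoding="utf-8") as f:
        rows = json.load(f)
    report = {}
    for index, row in enumerate(rows):
        problems = check_row(row)
        if problems:
            report[index] = problems
    return report
```

`check_row` returns an empty list for the sample row above, and `check_file()` returns only the rows that had problems, keyed by their position in the file. `exhibit_url` is deliberately not checked, since its layout is not documented here.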

Additional parameters

To change the number of principals downloaded at once, pass the rows argument:

pipenv run scrapy crawl principals -o output.json -a rows=30

By default, 30 rows are downloaded at once.
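Scrapy passes each `-a name=value` pair to the spider's constructor as a keyword argument, and spider arguments are always strings, so `rows=30` arrives as `"30"`. The sketch below shows one way a spider could validate it. `PrincipalsSpiderSketch` and `batch_offsets` are hypothetical; the class does not inherit from `scrapy.Spider`, so it runs without Scrapy installed, and it assumes the spider fetches principals in consecutive batches of `rows`.

```python
DEFAULT_ROWS = 30  # default stated in this README


class PrincipalsSpiderSketch:
    """Hypothetical stand-in for the project's spider, not its actual code."""

    name = "principals"

    def __init__(self, rows=DEFAULT_ROWS, **kwargs):
        # In a real scrapy.Spider subclass, kwargs would be forwarded to
        # scrapy.Spider.__init__; values given with -a arrive as strings.
        self.rows = self._parse_rows(rows)

    @staticmethod
    def _parse_rows(value):
        """Convert the raw argument to a positive int, rejecting bad input."""
        try:
            rows = int(value)
        except (TypeError, ValueError):
            raise ValueError(f"rows must be an integer, got {value!r}") from None
        if rows < 1:
            raise ValueError(f"rows must be positive, got {rows}")
        return rows

    def batch_offsets(self, total):
        """Assumed paging: 0-based offset of each batch needed to cover `total` rows."""
        return list(range(0, total, self.rows))
```

Under that assumption, a larger `rows` value means fewer requests for the same number of principals: 120 principals take three requests with `rows=50` but four with the default of 30.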

Testing

To run the tests, use:

pipenv run python scraping_foreign_principals/tests.py 

Linters

To run the linter and formatting checks (pylint and black), first install the dev dependencies with:

pipenv sync --dev

Then issue the following commands:

pipenv run pylint scraping_foreign_principals
pipenv run black --check -Sl 80 scraping_foreign_principals
