This project was created for scraping Active Foreign Principals from FARA.
This project uses Pipenv for dependency management. To install Pipenv, follow the Pipenv installation guide.
To install dependencies, run:
pipenv sync
If you prefer not to install Pipenv, you can install the requirements with:
pip install -r requirements.txt
To run the scraper, issue the following command:
pipenv run scrapy crawl principals -o output.json
The output will be written to the output.json file. A single row has the following format:
{
    "url": "https://efile.fara.gov/pls/apex/f?p=185:200:5957581211008::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:6367,Exhibit%20AB,JORDAN",
    "foreign_principal": "Royal Hashemite Court of Jordan",
    "date": "2016-08-10T00:00:00Z",
    "address": "Amman",
    "state": null,
    "country": "JORDAN",
    "registrant": "West Wing Writers, LLC",
    "reg_num": "6367",
    "exhibit_url": "http://www.fara.gov/docs/6317-Exhibit-AB-20180417-5.pdf"
}
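The rows above can be consumed with the standard json module. The sketch below checks that a row carries every field shown in the sample format; the choice of validation (a plain key-set comparison) is illustrative, not part of this project:

```python
import json

# Field names taken from the sample row above.
EXPECTED_KEYS = {
    "url", "foreign_principal", "date", "address",
    "state", "country", "registrant", "reg_num", "exhibit_url",
}

# In practice you would iterate over json.load(open("output.json"));
# this inline sample stands in for one such row.
sample = json.loads('''{
    "url": "https://efile.fara.gov/pls/apex/f?p=185:200:5957581211008::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:6367,Exhibit%20AB,JORDAN",
    "foreign_principal": "Royal Hashemite Court of Jordan",
    "date": "2016-08-10T00:00:00Z",
    "address": "Amman",
    "state": null,
    "country": "JORDAN",
    "registrant": "West Wing Writers, LLC",
    "reg_num": "6367",
    "exhibit_url": "http://www.fara.gov/docs/6317-Exhibit-AB-20180417-5.pdf"
}''')

missing = EXPECTED_KEYS - sample.keys()
print(sorted(missing))  # [] when every expected field is present
```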
If you want to change the number of principals that are downloaded at once, you can provide an extra argument:
pipenv run scrapy crawl principals -o output.json -a rows=30
By default, 30 rows are downloaded.
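For context, Scrapy passes each `-a key=value` pair to the spider's __init__ as a keyword argument, always as a string. The sketch below illustrates that mechanism with a stand-in class; the real spider lives in scraping_foreign_principals, and only the spider name "principals" is taken from the commands above:

```python
# Stand-in for the project's spider, showing how `-a rows=...` arrives.
class PrincipalsSpider:
    name = "principals"

    def __init__(self, rows="30", **kwargs):
        # Command-line values arrive as strings, so cast explicitly.
        self.rows = int(rows)

spider = PrincipalsSpider(rows="15")  # what `-a rows=15` effectively does
print(spider.rows)  # 15
```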
To run tests, run:
pipenv run python scraping_foreign_principals/tests.py
To run formatting checks (pylint and black), first install the dev dependencies with:
pipenv sync --dev
Then issue the following commands:
pipenv run pylint scraping_foreign_principals
pipenv run black --check -Sl 80 scraping_foreign_principals