This program runs in three steps:
- Logs into the EPC feed website with a pre-registered email/token combination. The token is obtained from the email link provided when registering.
- Downloads the feed into local scratch space.
- Scans through the feed zip for the CSVs of each local authority. Each CSV is loaded into a pandas dataframe and filtered to the required columns. Once a dataframe is loaded, it is upserted into a target table in a local postgres instance.
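
The scanning step can be sketched as follows. This is a minimal illustration, not the project's actual code: it uses the standard library's `csv` module in place of pandas so that it runs without dependencies, and the function name `iter_filtered_rows` and the column names are hypothetical, since the required columns are not listed here.

```python
import csv
import io
import zipfile

# Hypothetical column names, chosen only for illustration.
REQUIRED_COLUMNS = ["LMK_KEY", "POSTCODE"]


def iter_filtered_rows(zip_source, columns):
    """Yield (member_name, row) for every CSV in the zip, keeping only `columns`.

    `zip_source` may be a filesystem path or a binary file-like object.
    Members are read straight from the archive, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip anything that is not a CSV (licence files, folders, ...).
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    # Columns missing from the file come back as None.
                    yield name, {col: row.get(col) for col in columns}
```

With pandas, the handle returned by `archive.open(name)` could instead be passed to `pandas.read_csv(..., usecols=columns)`, which performs the same column filtering while loading the dataframe.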
The project is unfinished. It can be run by adding a valid `LOGIN_TOKEN` and `LOGIN_EMAIL` to the `docker-compose.yml` and executing `docker-compose up --build`.
- Tests:
  - HEAD request to check that the login worked before downloading
  - Check the data structure with a dummy CSV
  - Check that the upsert succeeds with a dummy CSV
- Use an ORM for creating the table
- Fix errors around upserts
- Error handling and monitoring
- Linting
- Scaling imports
  - Currently needs scratch space as large as the feed zip
  - Could either use the API for updates or use cloud storage (S3) to sink data out without touching local disk
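
For the upsert items above, one possible starting point is to generate the statement in a small pure function, which can then be unit tested without a database. The sketch below assumes PostgreSQL's `INSERT ... ON CONFLICT ... DO UPDATE` syntax and `%s` placeholders as used by psycopg; the helper name `build_upsert_sql` and the table and column names are hypothetical and not taken from this project.

```python
def build_upsert_sql(table, columns, key):
    """Build a PostgreSQL upsert statement with one %s placeholder per column.

    `key` must be one of `columns` and needs a unique constraint (or primary
    key) on the target table, otherwise PostgreSQL rejects ON CONFLICT.
    Identifiers are double-quoted, which makes them case-sensitive.
    """
    if key not in columns:
        raise ValueError(f"key column {key!r} is not in columns")

    column_list = ", ".join(f'"{col}"' for col in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # Every non-key column is overwritten with the incoming (EXCLUDED) value.
    updates = ", ".join(
        f'"{col}" = EXCLUDED."{col}"' for col in columns if col != key
    )
    # With no non-key columns there is nothing to update on conflict.
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"

    return (
        f'INSERT INTO "{table}" ({column_list}) '
        f"VALUES ({placeholders}) "
        f'ON CONFLICT ("{key}") {action}'
    )
```

The resulting string can be passed to a cursor's `executemany` together with the dataframe rows as tuples. Row values stay out of the SQL text, but table and column names are interpolated directly, so they should come from trusted configuration rather than from the feed.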