mzyndul/sw

Intro

The project was tested on Python 3.8. It requires three libraries to run:

  1. Django
  2. petl
  3. requests

The project relies on the external API at https://swapi.dev.

For simplicity, it uses an SQLite database.
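
A typical Django SQLite configuration looks like the snippet below; this is a generic sketch, not necessarily the exact settings.py of this project:

from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

# Standard Django SQLite backend; the database is a single file next to the project.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    }
}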

Installation

  1. Create a new virtualenv and install the requirements:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
  2. Run migrations:
python manage.py migrate
  3. Run the server:
python manage.py runserver
  4. Make sure the ./media folder has write access
  5. Open http://localhost:8000 in your browser

Tests

To run tests:

python manage.py test

Developer Notes

  1. SECRET_KEY is hardcoded in settings for simplicity
  2. The SWAPI helper (star_wars_people.sw_api.py:SWAPI) uses a generator to load data in chunks (a rough sketch of the idea follows this list).
  3. Project test coverage is 100%, but the tests could probably still be better.
  4. I assumed that all characters would eventually need all planets, so planets are downloaded up front. They are kept in memory, but if this grew huge it could be replaced with a cache (e.g. Redis) or the database.
  5. Actually this was the first time I used petl. Nice library.
  6. The first page is not paginated; it would probably be good to implement pagination if a lot of records are expected.
  7. Filtering and "load more" work together.
  8. CI is set up on GitHub (Flake8 and tests): https://github.com/mzyndul/sw/actions
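
A minimal sketch of the generator idea from point 2, using requests against the public people endpoint; the real helper lives in star_wars_people/sw_api.py and may differ:

import requests

def iter_pages(url="https://swapi.dev/api/people/"):
    """Yield SWAPI results one page at a time instead of loading everything at once."""
    while url:
        response = requests.get(url)
        response.raise_for_status()
        payload = response.json()
        yield payload["results"]  # a single page of records
        url = payload["next"]     # None on the last page, which ends the loop

# Each iteration hands back one page, so memory use stays flat regardless of the total count.
for page in iter_pages():
    print(f"got {len(page)} records")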

Speed Notes / Improvements

  1. Running more threads/processes would increase the number of pages that can be downloaded at once (on fetch). Each process would create its own CSV file, and once all processes finished, the last step would be to merge all files together. This would prevent any write-access collisions (see the first sketch after this list).
    1. Adjust the page size - with multiple threads it could be worth experimenting with different page sizes to find the most efficient one.
  2. For a better frontend experience it would be good to implement an SPA, or at least partial AJAX (for "load more", for example).
  3. Rendering the table directly from petl to HTML would probably be faster (see the second sketch after this list).
  4. Initially I thought that on a second "fetch" I could use the "edited" flag to minimize the number of requests, but that data is only available on the details endpoint, so I had to download it again anyway. On the other hand, if this were the solution for mass data, fetching a record and checking whether it already exists in some local storage would save the time spent saving that record.
  5. Since the API is quite limited, I was not able to change the page_size limit or filter by the "edited" field, either of which would have helped minimize requests.
  6. Some Django middlewares were removed for speed improvements (micro-optimization).
  7. In the current shape, storing files on a ramdisk could speed up read/write operations (on "collection details" the file could be copied to the ramdisk and the ETL run from there, which should be faster, though it comes with high memory usage, something petl tries to avoid).
  8. Implementing a queue system like Celery would speed up the process.
  9. Storing everything in SQL/NoSQL would speed up operations.
  10. Implementing caching in Django would speed up page rendering (template cache, default collection view cache).
  11. Using ETags on requests would speed up initial page rendering.
  12. Assuming SWAPI can be hosted on multiple servers:
    1. use multithreading to use all available CPUs (or more machines if needed)
    2. load data from multiple places, avoiding the network limitation of a single API provider. For example, with 10 API endpoints for the same database under different URLs, the system could calculate how to split all requests across the available servers.
    3. put the data in Celery (or another queue system)
    4. process the data on a separate server dedicated to Celery task processing
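
A rough sketch of the parallel-fetch idea from point 1. Function names, file names and the worker setup are illustrative, not taken from the project:

import csv
from concurrent.futures import ProcessPoolExecutor

import requests

def fetch_page_to_csv(page_number):
    # Each worker writes its own CSV file, so there are no write-access collisions.
    response = requests.get("https://swapi.dev/api/people/", params={"page": page_number})
    response.raise_for_status()
    rows = response.json()["results"]
    filename = f"people_page_{page_number}.csv"
    with open(filename, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return filename

def fetch_all(pages):
    # Download pages in parallel, then merge the partial files once every worker is done.
    with ProcessPoolExecutor() as pool:
        partial_files = list(pool.map(fetch_page_to_csv, range(1, pages + 1)))
    with open("people_merged.csv", "w", newline="") as merged:
        for index, name in enumerate(partial_files):
            with open(name, newline="") as part:
                lines = part.readlines()
                merged.writelines(lines if index == 0 else lines[1:])  # keep a single header row
    return "people_merged.csv"

# Call fetch_all() from an `if __name__ == "__main__":` block when using a process pool.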

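A minimal sketch of point 3, rendering a petl table straight to HTML instead of going through a Django template; file names are illustrative:

import petl as etl

# Load the merged CSV and write it out as a plain HTML table.
table = etl.fromcsv("people_merged.csv")
etl.tohtml(table, "people.html", caption="Star Wars characters")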