This app parses and indexes RSS feeds, so that their entries can be searched and queried via API calls.
- Overview and Features
- Tech Stack and Dependencies
- Setup and Configuration
- Admin and Authentication
- RSS Feeds and Entries
- API Reference
- Contributing
- License
The service allows you to:
- Register any valid RSS 2.0 feed for parsing
- Index multiple feeds in a single PostgreSQL database backend
- Set schedule and frequency for retrieving newly published entries from feed sources
- Expose the indexed entries via REST API endpoints
- Enable filtering based on different fields (date published, keyword, publisher, etc.) (to be implemented)
- Handle user management and permissions/authentication
The app needs the following things to work:
- Python 3.6
- Django
- Django Rest Framework
- feedparser
- Celery
- Redis
- PostgreSQL
- Gunicorn
- Nginx
- Docker
This service can be run directly in your local environment (suitable for development) or as a multi-container Docker app (recommended for production).
The app needs the following environment variables set in a .env file in the project's root directory:
- SECRET_KEY: a random string, preferably very long and very hard to guess
- POSTGRES_USER: name of the user that owns the app's database
- POSTGRES_PASSWORD: password of the above user
- POSTGRES_DB: name of the database used by the app
- DB_PORT: database port number (optional, defaults to 5432)
- ADMIN_USER: username of default admin user (optional)
- ADMIN_PASSWORD: password of default admin user (optional)
- ADMIN_EMAIL: email address of default admin user (optional)
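Before starting the app, it can help to confirm that the required variables are actually present. The following is a minimal sketch, not part of the project: the helper name missing_env_vars is hypothetical, and it assumes the values from .env have already been loaded into the process environment (or are passed in as a plain dict).

```python
import os

# Names taken from the list above; the optional variables are not checked.
REQUIRED_VARS = ("SECRET_KEY", "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to os.environ; pass a dict to check other sources.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means all required variables are set; otherwise the list names the ones still missing.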
- Create a Postgres database with the same details as specified in your environment variables
- Create and activate a virtual environment
- Install the development dependencies:
$ pip install -r requirements.txt
- Run the needed migrations:
$ python manage.py migrate
- Create a superuser:
$ python manage.py createsuperuser
- Check if setup is ok:
$ pytest
- Run the Django development server:
$ python manage.py runserver
- Run a Redis server accessible via port 6379
- Open a new terminal, cd into the project root, and run a Celery worker:
$ celery -A rss_apifier worker -l INFO
- Open a new terminal, cd into the project root, and run Celery Beat:
$ celery -A rss_apifier beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
Notes:
- The above steps should run the service on localhost:8000; try making an API call via curl and check that the response status is 200:
curl http://localhost:8000/api/entries/
- See section below on generating an authentication token for admin users
- See section below on adding new feeds
- See section on configuring periodic schedules
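The same check as the curl call in the notes above can be scripted. This is a rough sketch using only the standard library; the function name entries_endpoint_ok and the opener parameter (which exists only so the check can be exercised without a running server) are assumptions for illustration.

```python
from urllib.request import urlopen

# Assumed default from the notes above: the development server on localhost:8000.
ENTRIES_URL = "http://localhost:8000/api/entries/"


def entries_endpoint_ok(url=ENTRIES_URL, opener=urlopen):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with opener(url) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError; treat them as "not ok".
        return False
```

With the development server running, calling entries_endpoint_ok() with no arguments should return True.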
- Build the images:
$ docker-compose build
- Run the whole app:
$ docker-compose up
Notes:
- The docker-compose up command runs the app in a production-ready configuration:
- gunicorn as app server behind nginx listening on port 80
- PostgreSQL database, Redis, Celery worker, and Celery Beat in separate containers
- Django production settings
- Try making an API call with curl:
curl http://localhost/api/entries/
In order to add, modify, and delete RSS feeds, a user needs to have admin privileges and must be authenticated with a token. The app provides several ways to obtain these requirements:
When launching the app through Docker (using the docker-compose.yml file in the root directory), a default admin user and auth token will be created based on the values of the environment variables ADMIN_USER, ADMIN_PASSWORD, and ADMIN_EMAIL. The default admin user and token will be created only if all three variables have valid values.
When running the app directly on your local environment, however, the default admin user will not be created automatically, so you need to create this yourself and then follow the instructions in the next sections to generate an auth token.
The Site administration page (available via hostname/admin) allows you to set and change auth tokens for any valid user.
1. Log on to the Site admin page
2. Click 'Add' under the AUTH TOKEN table
3. Choose the user you want to generate an auth token for
4. Click 'Save'
Notes:
- The above steps require a user with superuser privileges.
- The steps are similar to how you change a user's existing auth token.
The app also exposes an API endpoint for obtaining an existing user's current auth token via the following URL:
/accounts/token/
Notes:
- The endpoint accepts POST requests only and expects a JSON payload that contains {"username": "some_username", "password": "some_password"}.
- If the user credentials are valid, the endpoint returns a JSON object that contains {"token": "SOMEAUTH_TOKEN"}.
- The endpoint generates a new auth token if the user doesn't have one yet.
- For more, see the section on obtaining an auth token via API endpoint
Another way to generate an auth token for a user is to use DRF's custom management command:
$ python manage.py drf_create_token NAME_OF_SUPERUSER
The app ships with several features for easily managing feeds and entries, as well as setting schedules for fetching and updating newly published items.
Only admin users with the appropriate authentication token can add, view, edit, and delete RSS feeds. Feeds can be managed in the following ways:
- The Feeds table on the Site admin page: To add an RSS feed, you need to provide only the feed's URL. The app automatically fetches a feed's details (e.g., name, description, RSS version) once you hit the 'Save' button. You can also edit a feed's details or delete a feed altogether on the Site admin page.
- Various API endpoints: The app exposes a number of API endpoints for admin users to manage feeds (see the Feed section under API Reference for more).
The app automatically fetches, parses, and saves new entries from each registered RSS feed. To control how often to check feeds for newly published items, please do the following steps:
1. Log in to the Site admin page
2. Click 'Add' on either the Crontabs or Intervals row of the PERIODIC TASKS table
3. Specify the values you want for your task schedule and hit 'Save'
4. Go back to the PERIODIC TASKS table and click 'Add' on the Periodic tasks row
5. Enter an appropriate name for the scheduled task, then choose 'fetch-entries' from the 'Task' dropdown menu
6. Choose the schedule you created in step 2 from either the 'Interval Schedule' or 'Crontab Schedule' dropdown menu
7. Specify values in the other fields as appropriate and click 'Save'
Note: For more on managing periodic tasks, see https://github.com/celery/django-celery-beat
This section gives a brief overview on the service's API endpoints, requests, and responses.
The service exposes API endpoints for interacting with saved RSS feeds, indexed feed entries, and registered users. Here's a brief rundown of these endpoints organized by resource.
Contains details associated with a published news article, blog post, or other content. Details include link, title, summary, and published date.
Description:
Retrieves all feed entries currently on record
Endpoint:
GET /api/entries/
Path Parameters:
None
Query Parameters:
See section Query parameters for endpoints that return paginated results
Data Parameters:
None
Success Response:
- Status Code: 200
- Content: See section Response body for endpoints that return paginated results
Contains information about a saved RSS feed such as title, description, link, RSS version, etc.
Description:
Retrieves all RSS feeds on record
Endpoint:
GET /api/feeds/
Path Parameters:
None
Query Parameters:
See section Query parameters for endpoints that return paginated results
Data Parameters:
None
Request Headers: See section Request header for endpoints that require authentication
Success Response:
- Status Code: 200
- Content: See section Response body for endpoints that return paginated results
Description:
Retrieves a single RSS feed using the feed's ID
Endpoint:
GET /api/feeds/{feed_id}/
Path Parameters:
- feed_id (integer): the feed's unique ID (required)
Query Parameters:
None
Data Parameters:
None
Request Headers: See Request header for endpoints that require authentication
Success Response:
- Status Code: 200 OK
- Content: See section Feed objects in response content
Description:
Saves a new RSS feed object into the database
Endpoint:
POST /api/feeds/
Path Parameters:
None
Query Parameters:
None
Data Parameters:
This endpoint expects a JSON payload with the following fields/values:
- link (string): URL that points to the RSS feed (required), maximum of 400 characters
- title (string): the feed's title (optional), maximum of 1,024 characters
- description (string): the feed's description (optional), maximum of 2,048 characters
Request Headers: See Request header for endpoints that require authentication
Success Response:
- Status Code: 201 CREATED
- Content: See section Feed objects in response content
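A client may want to check the documented field limits before sending the request. The sketch below builds the JSON-ready payload for this endpoint; build_feed_payload is a hypothetical helper, and the limits are simply the ones listed under Data Parameters above (the server is assumed to perform its own validation as well).

```python
# Maximum lengths as documented for the feed fields.
MAX_LENGTHS = {"link": 400, "title": 1024, "description": 2048}


def build_feed_payload(link, title=None, description=None):
    """Return a dict suitable for json.dumps(), omitting unset optional fields."""
    if not link:
        raise ValueError("link is required")
    fields = {"link": link, "title": title, "description": description}
    payload = {}
    for name, value in fields.items():
        if value is None:
            continue  # optional field left out of the payload
        if len(value) > MAX_LENGTHS[name]:
            raise ValueError(
                "{} exceeds {} characters".format(name, MAX_LENGTHS[name])
            )
        payload[name] = value
    return payload
```

For example, build_feed_payload("https://example.com/rss.xml") yields a payload containing only the link field, which is all the endpoint requires.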
Description:
Changes or updates details of a particular feed
Endpoint:
PUT /api/feeds/{feed_id}/
Path Parameters:
- feed_id (integer): the feed's unique ID (required)
Query Parameters:
None
Data Parameters:
This endpoint expects a JSON payload with the following fields/values:
- link (string): URL that points to the RSS feed (required), maximum of 400 characters
- title (string): the feed's title (optional), maximum of 1,024 characters
- description (string): the feed's description (optional), maximum of 2,048 characters
Request Headers: See Request header for endpoints that require authentication
Success Response:
- Status Code: 200
- Content: See section Feed objects in response content
Description:
Removes a saved RSS feed from the database
Endpoint:
DELETE /api/feeds/{feed_id}/
Path Parameters:
- feed_id (integer): the feed's unique ID (required)
Query Parameters:
None
Data Parameters:
None
Request Headers: See Request header for endpoints that require authentication
Success Response:
- Status Code: 204
- Content: None
Description:
Retrieves all entries associated with a given RSS feed
Endpoint:
GET /api/feeds/{feed_id}/entries/
Path Parameters:
- feed_id (integer): the feed's unique ID (required)
Query parameters:
See section Query parameters for endpoints that return paginated results
Data parameters:
None
Request Headers: See Request header for endpoints that require authentication
Success Response:
- Status Code: 200
- Content: See section Response body for endpoints that return paginated results
Includes information on users, permissions, and authentication details
Description:
Obtains a user's current auth key or creates a new one if it doesn't already exist
Endpoint:
POST /api/accounts/token/
Path Parameters:
None
Query Parameters:
None
Data Parameters:
This endpoint expects a JSON payload with the following fields/values:
username
password
Success Response:
- Status Code: 200
- Content: JSON object with field token
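As an illustration of the request and response shapes for this endpoint, here is a small sketch using only the standard library. The helper names build_token_request and parse_token are hypothetical, and the base URL is whatever host the service runs on.

```python
import json
from urllib.request import Request

# Path as listed for this endpoint in the API reference.
TOKEN_PATH = "/api/accounts/token/"


def build_token_request(base_url, username, password, path=TOKEN_PATH):
    """Build (but do not send) the POST request carrying the JSON credentials."""
    payload = json.dumps({"username": username, "password": password})
    return Request(
        base_url.rstrip("/") + path,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_token(response_body):
    """Extract the token field from the endpoint's JSON response body."""
    return json.loads(response_body)["token"]
```

The request can then be sent with urllib.request.urlopen(request), and the body read from the response passed to parse_token.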
This section dives into some parameters and request attributes common to most (if not all) of the service's API endpoints.
By default, all endpoints that fetch a collection of objects automatically paginate their results. This behavior can be controlled with the following query parameters:
- page (integer): the results page number to return (optional)
- page_size (integer): the number of entries per page to return (optional, defaults to 100)
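To show how these two parameters end up in a request URL, here is a minimal sketch. The function name paginated_url is an assumption made for illustration; only the parameter names page and page_size come from the list above.

```python
from urllib.parse import urlencode


def paginated_url(base_url, page=None, page_size=None):
    """Append the optional pagination parameters to a collection URL."""
    params = {}
    if page is not None:
        params["page"] = page
    if page_size is not None:
        params["page_size"] = page_size
    query = urlencode(params)
    return base_url + ("?" + query if query else "")


# Second page of entries, 50 per page:
# paginated_url("http://localhost:8000/api/entries/", page=2, page_size=50)
```

Leaving both arguments unset returns the URL unchanged, in which case the service's defaults apply.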
All API endpoints that interact with feed objects require authentication. These endpoints expect the user's auth token to be included in the request header as follows:
Authorization: Token 705cf7xa9303e013b3c2300408c3dpd6390qcwdf
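In client code, the header can be attached when the request object is built. The following sketch uses the standard library; authenticated_request is a hypothetical helper name, and the token value is a placeholder.

```python
from urllib.request import Request


def authenticated_request(url, token, method="GET"):
    """Build a request that carries the auth token in the Authorization header."""
    return Request(
        url,
        headers={"Authorization": "Token " + token},
        method=method,
    )


# Example: list all feeds as an admin user (the token here is a placeholder).
# request = authenticated_request("http://localhost:8000/api/feeds/", "YOUR_TOKEN")
```

Note the literal word Token followed by a single space before the key, matching the header format shown above.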
This section goes over some response content and schemas returned by most of the service's API endpoints.
API endpoints that return paginated results have the following JSON response content:
- count: total number of items found
- next: URL to next results page
- previous: URL to previous results page
- results: array of objects; this can either be an array of feed objects or array of entry objects
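A client can walk through every page by following the next field until it is empty. The sketch below assumes a hypothetical fetch_json callable that takes a URL and returns the decoded JSON response as a dict; how that callable performs the HTTP request (and adds any auth header) is left open.

```python
def iter_results(first_url, fetch_json):
    """Yield every item across all pages, following the `next` links."""
    url = first_url
    while url:
        page = fetch_json(url)
        for item in page["results"]:
            yield item
        # `next` is expected to be null/None on the last page.
        url = page.get("next")
```

Because this is a generator, pages are fetched lazily: a later page is requested only once the items of the current page have been consumed.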
A feed entry is represented by the following JSON object:
- link: URL to the article/blog post/content
- title: the entry's title
- summary: the entry's summary
- published: ISO-formatted datetime string
An RSS feed is represented by the following JSON object:
- id: the feed's ID
- title: the feed's title
- description: the feed's description
- link: URL that points to the RSS feed
- version: the feed's RSS version
- entries_count: the total number of entries associated with this feed
- entries_list: URL that points to the list of entries associated with this feed
- Fork this repo at https://github.com/ralphqq/rss-apifier
- Clone your fork into your local machine
- Follow steps in development setup
- Create your feature branch:
$ git checkout -b feature/some-new-thing
- Commit your changes:
$ git commit -m "Develop new thing"
- Push to the branch:
$ git push origin feature/some-new-thing
- Create a pull request