rss-apifier

This app parses and indexes RSS feeds, so that their entries can be searched and queried via API calls.

Overview and Features

The service allows you to:

Register any valid RSS 2.0 feed for parsing
Index multiple feeds in a single PostgreSQL database backend
Set schedule and frequency for retrieving newly published entries from feed sources
Expose the indexed entries via REST API endpoints
Enable filtering based on different fields (date published, keyword, publisher, etc.) (to be implemented)
Handle user management and permissions/authentication

Tech Stack and Dependencies

The app needs the following things to work:

Python 3.6
Django
Django Rest Framework
feedparser
Celery
Redis
PostgreSQL
Gunicorn
Nginx
Docker

Setup and Configuration

This service can be run directly in your local environment (suitable for development) or as a multi-container Docker app (recommended for production).

Setting environment variables

The app needs the following environment variables set in a .env file in the project's root directory:

SECRET_KEY - a random string, preferably very long, and very hard to guess
POSTGRES_USER - name of user that owns the app's database
POSTGRES_PASSWORD - password of above user
POSTGRES_DB - name of the database used by the app
DB_PORT - database port number (optional, defaults to 5432)
ADMIN_USER - username of default admin user (optional)
ADMIN_PASSWORD - password of default admin user (optional)
ADMIN_EMAIL - email address of default admin user (optional)
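Before starting the app, it can help to confirm that the .env file defines everything listed above. The helpers below are a minimal, hypothetical sketch and are not part of the project. `parse_env` and `check_env` are illustrative names. The sketch assumes simple KEY=VALUE lines and applies only the documented DB_PORT default. The sample values, such as rss_user and rss_apifier, are placeholders.

```python
import secrets

# Required by this README; DB_PORT and the ADMIN_* variables are optional.
REQUIRED_VARS = ("SECRET_KEY", "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")
DEFAULTS = {"DB_PORT": "5432"}  # documented default for DB_PORT


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_env(env):
    """Merge env over the defaults; raise if a required variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return {**DEFAULTS, **env}


# Placeholder values for illustration; use your own credentials.
sample = f"""
SECRET_KEY={secrets.token_urlsafe(50)}
POSTGRES_USER=rss_user
POSTGRES_PASSWORD=change-me
POSTGRES_DB=rss_apifier
"""
settings = check_env(parse_env(sample))
print(settings["DB_PORT"])  # prints 5432 because DB_PORT is not set
```

`secrets.token_urlsafe` is one convenient way to produce a long, hard-to-guess SECRET_KEY.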

Running in local environment

Create a Postgres database with the same details as specified in your environment variables
Create and activate a virtual environment
Install the development dependencies:
```
$ pip install -r requirements.txt
```
Run the needed migrations:
```
$ python manage.py migrate
```
Create a superuser:
```
$ python manage.py createsuperuser
```
Check if setup is ok:
```
$ pytest
```
Run the Django development server:
```
$ python manage.py runserver
```
Run a Redis server accessible via port 6379
Open a new terminal, cd into project root, and run a Celery worker:
```
$ celery -A rss_apifier worker -l INFO
```

Open a new terminal, cd into project root, and run Celery Beat:

```
$ celery -A rss_apifier beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
```

Notes:

The above steps should run the service on localhost:8000. To check, make an API call with curl and confirm that the response status is 200: curl http://localhost:8000/api/entries/
See the Admin and Authentication section below on generating an auth token for admin users
See the section below on adding and managing RSS feeds
See the section below on fetching entries from feeds for configuring periodic schedules
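The curl check above can also be scripted. This sketch uses only Python's standard library. `entries_endpoint_ok` is a hypothetical helper and is not part of the project. For the Docker setup described below, pass http://localhost as the base URL.

```python
from urllib.request import urlopen


def entries_endpoint_ok(base_url="http://localhost:8000", timeout=5.0):
    """Return True if GET {base_url}/api/entries/ answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/entries/"
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both subclass OSError, so refused
        # connections, timeouts, and error statuses all end up here.
        return False
```

Calling `entries_endpoint_ok()` while the development server is running should return True.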

Running with Docker

Build the images:
```
$ docker-compose build
```
Run the whole app:
```
$ docker-compose up
```

Notes:

Step two runs the app in a production-ready configuration:
- gunicorn as app server behind nginx listening on port 80
- PostgreSQL database, Redis, Celery worker, and Celery Beat in separate containers
- Django production settings
Try making an API call with curl: curl http://localhost/api/entries/

Admin and Authentication

In order to add, modify, and delete RSS feeds, a user needs admin privileges and must be authenticated with a token. The app provides several ways to meet these requirements:

Using auth token from default admin user

When launching the app through Docker (using the docker-compose.yml file in the root directory), a default admin user and auth token are created from the values of the environment variables ADMIN_USER, ADMIN_PASSWORD, and ADMIN_EMAIL. The default admin user and token are created only if all three variables have valid values. When running the app directly in your local environment, however, the default admin user is not created automatically. You need to create it yourself (for example, with python manage.py createsuperuser) and then follow the instructions in the next sections to generate an auth token.
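The rule above can be expressed as a small check. This sketch illustrates the behavior described in this section only. It is not the project's actual startup code, which this README does not show. It also assumes that "valid" means non-empty, which is an assumption on our part. `default_admin_settings` is a hypothetical name.

```python
import os


def default_admin_settings(env=None):
    """Return (user, password, email) if all three ADMIN_* values are set, else None."""
    env = os.environ if env is None else env
    names = ("ADMIN_USER", "ADMIN_PASSWORD", "ADMIN_EMAIL")
    # Assumption: a value counts as valid when it is present and not blank.
    values = tuple(env.get(name, "").strip() for name in names)
    return values if all(values) else None
```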

Generating/Changing auth tokens via the Site admin page

The Site administration page (available via hostname/admin) allows you to set and change auth tokens for any valid user.

1. Log on to the Site admin page
2. Click 'Add' under the AUTH TOKEN table
3. Choose the user you want to generate an auth token for
4. Click 'Save'

Notes:

The above steps require a user with superuser privileges.
Changing a user's existing auth token follows similar steps.

Obtaining an auth token via API endpoint

The app also exposes an API endpoint for obtaining an existing user's current auth token via the following URL:

/accounts/token/

Notes:

The endpoint accepts POST requests only and expects a JSON payload that contains {"username": "some_username", "password": "some_password"}.
If the user credentials are valid, the endpoint returns a JSON object that contains {"token": "SOMEAUTH_TOKEN"}.
The endpoint generates a new auth token if the user does not have one yet.
For more, see Obtain authentication token for a user under API Reference
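A client can call this endpoint with Python's standard library. In this sketch, `build_token_request` and `parse_token` are hypothetical helpers. The path defaults to /accounts/token/, as given in this section. The API Reference lists the same endpoint as POST /api/accounts/token/, so the path is kept as a parameter you can adjust for your deployment.

```python
import json
from urllib.request import Request


def build_token_request(base_url, username, password, path="/accounts/token/"):
    """Build a POST request carrying the credentials as a JSON payload."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_token(response_body):
    """Extract the token from a {"token": "..."} JSON response body."""
    return json.loads(response_body)["token"]
```

To send the request to a running instance, pass it to `urllib.request.urlopen` and hand the response body to `parse_token`.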

Generating an auth token in the command line

Another way to generate an auth token for a user is to use DRF's drf_create_token management command:

```
$ python manage.py drf_create_token NAME_OF_SUPERUSER
```

RSS Feeds and Entries

The app ships with several features for easily managing feeds and entries, as well as setting schedules for fetching and updating newly published items.

Adding and managing RSS feeds

Only admin users with the appropriate authentication token can add, view, edit, and delete RSS feeds. Feeds can be managed in either of the following ways:

The Feeds table on the Site admin page: To add an RSS feed, you need to provide only the feed's URL. The app automatically fetches a feed's details (e.g., name, description, RSS version, etc.) once you hit the 'Save' button. You can also edit a feed's details or delete a feed altogether on the Site admin page.
Various API endpoints: The app exposes a number of API endpoints for admin users to manage feeds. See the Feed section under API Reference for more.

Fetching entries from feeds

The app automatically fetches, parses, and saves new entries from each registered RSS feed. To control how often feeds are checked for newly published items, follow these steps:

Log in to the Site admin page
Click 'Add' on either the Crontabs or Intervals row of the PERIODIC TASKS table
Specify the values you want for your task schedule and hit 'Save'
Go back to the PERIODIC TASKS table and click 'Add' on the Periodic tasks row
Enter an appropriate name for the scheduled task, then choose 'fetch-entries' from the 'Task' dropdown menu
Choose the schedule you created earlier from either the 'Interval Schedule' or 'Crontab Schedule' dropdown menu
Specify values in the other fields as appropriate and click 'Save'

Note: For more on managing periodic tasks, see https://github.com/celery/django-celery-beat

API Reference

This section gives a brief overview of the service's API endpoints, requests, and responses.

Resources and Endpoints

The service exposes API endpoints for interacting with saved RSS feeds, indexed feed entries, and registered users. Here's a brief rundown of these endpoints organized by resource.

Entry

Contains details associated with a published news article, blog post, or other content. Details include link, title, summary, and published date.

Retrieve all feed entries

Description:
Retrieves all feed entries currently on record

Endpoint:
GET /api/entries/

Path Parameters:
None

Query Parameters:
See section Query parameters for endpoints that return paginated results

Data Parameters:
None

Success Response:

Status Code: 200
Content: See section Response body for endpoints that return paginated results

Feed

Contains information about a saved RSS feed such as title, description, link, RSS version, etc.

Retrieve all saved RSS feeds

Description:
Retrieves all RSS feeds on record

Endpoint:
GET /api/feeds/

Path Parameters:
None

Query Parameters:
See section Query parameters for endpoints that return paginated results

Data Parameters:
None

Request Headers: See section Request header for endpoints that require authentication

Success Response:

Status Code: 200
Content: See section Response body for endpoints that return paginated results

Retrieve a single RSS feed

Description:
Retrieves a single RSS feed using the feed's ID

Endpoint:
GET /api/feeds/{feed_id}/

Path Parameters:

feed_id (integer): the feed's unique ID (required)

Query Parameters:
None

Data Parameters:
None

Request Headers: See Request header for endpoints that require authentication

Success Response:

Status Code: 200 OK
Content: See section Feed objects in response content

Add a new RSS feed

Description:
Saves a new RSS feed object into the database

Endpoint:
POST /api/feeds/

Path Parameters:
None

Query Parameters:
None

Data Parameters:
This endpoint expects a JSON payload with the following fields/values:

link (string): URL that points to the RSS feed (required), maximum of 400 characters
title (string): the feed's title (optional), maximum of 1,024 characters
description (string): the feed's description (optional), maximum of 2,048 characters

Request Headers: See Request header for endpoints that require authentication

Success Response:

Status Code: 201 CREATED
Content: See section Feed objects in response content
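The constraints above can be checked on the client before the request is sent. The sketch below builds an authenticated POST /api/feeds/ request using the standard library. `feed_payload` and `build_add_feed_request` are hypothetical names. The checks mirror only the documented length limits, so the server may still reject a payload for other reasons.

```python
import json
from urllib.request import Request

# Maximum lengths as documented for POST /api/feeds/.
MAX_LENGTHS = {"link": 400, "title": 1024, "description": 2048}


def feed_payload(link, title=None, description=None):
    """Build the JSON body for adding a feed, enforcing the documented limits."""
    if not link:
        raise ValueError("link is required")
    payload = {"link": link}
    if title is not None:
        payload["title"] = title
    if description is not None:
        payload["description"] = description
    for field, value in payload.items():
        if len(value) > MAX_LENGTHS[field]:
            raise ValueError(f"{field} exceeds {MAX_LENGTHS[field]} characters")
    return payload


def build_add_feed_request(base_url, token, link, **optional):
    """Build an authenticated POST /api/feeds/ request."""
    body = json.dumps(feed_payload(link, **optional)).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/api/feeds/",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Token {token}"},
        method="POST",
    )
```

A successful request should return 201 CREATED along with the new feed object.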

Modify an existing RSS feed

Description:
Changes or updates details of a particular feed

Endpoint:
PUT /api/feeds/{feed_id}/

Path Parameters:

feed_id (integer): the feed's unique ID (required)

Query Parameters:
None

Data Parameters:
This endpoint expects a JSON payload with the following fields/values:

link (string): URL that points to the RSS feed (required), maximum of 400 characters
title (string): the feed's title (optional), maximum of 1,024 characters
description (string): the feed's description (optional), maximum of 2,048 characters

Request Headers: See Request header for endpoints that require authentication

Success Response:

Status Code: 200
Content: See section Feed objects in response content

Delete an existing RSS feed

Description:
Removes a saved RSS feed from the database

Endpoint:
DELETE /api/feeds/{feed_id}/

Path Parameters:

feed_id (integer): the feed's unique ID (required)

Query Parameters:
None

Data Parameters:
None

Request Headers: See Request header for endpoints that require authentication

Success Response:

Status Code: 204
Content: None
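Modifying and deleting a feed both target /api/feeds/{feed_id}/ and differ only in the HTTP method and the request body. `feed_detail_request` is a hypothetical helper that sketches both cases using the standard library. Based on the sections above, a successful PUT should return 200 and a successful DELETE should return 204 with no content.

```python
import json
from urllib.request import Request


def feed_detail_request(base_url, token, feed_id, method, payload=None):
    """Build an authenticated PUT or DELETE request for /api/feeds/{feed_id}/."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT or DELETE")
    if method == "PUT" and (not payload or "link" not in payload):
        raise ValueError("PUT requires a payload with at least 'link'")
    url = f"{base_url.rstrip('/')}/api/feeds/{int(feed_id)}/"
    headers = {"Authorization": f"Token {token}"}
    data = None
    if method == "PUT":
        # PUT sends the full set of fields as JSON; DELETE has no body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)
```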

Retrieve all entries from a specific RSS feed

Description:
Retrieves all entries associated with a given RSS feed

Endpoint:
GET /api/feeds/{feed_id}/entries/

Path Parameters:

feed_id (integer): the feed's unique ID (required)

Query Parameters:
See section Query parameters for endpoints that return paginated results

Data Parameters:
None

Request Headers: See Request header for endpoints that require authentication

Success Response:

Status Code: 200
Content: See section Response body for endpoints that return paginated results

Account

Includes information on users, permissions, and authentication details

Obtain authentication token for a user

Description:
Obtains a user's current auth token or creates a new one if it doesn't already exist

Endpoint:
POST /api/accounts/token/

Path Parameters:
None

Query Parameters:
None

Data Parameters:
This endpoint expects a JSON payload with the following fields/values:

username (string): the user's username (required)
password (string): the user's password (required)

Success Response:

Status Code: 200
Content: JSON object with field token

Parameters and Requests

This section covers parameters and request attributes common to most, if not all, of the service's API endpoints.

Query parameters for endpoints that return paginated results

By default, all endpoints that fetch a collection of objects automatically paginate their results. This behavior can be controlled with the following query parameters:

page (integer): the results page number to return (optional)
page_size (integer): the number of items per page to return (optional, defaults to 100)
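A client might build these URLs as shown below. `collection_url` is a hypothetical helper. If you leave out both parameters, the request should return the endpoint's first page with the default page size.

```python
from urllib.parse import urlencode


def collection_url(base_url, path="/api/entries/", page=None, page_size=None):
    """Build a URL for a paginated collection endpoint; both parameters are optional."""
    params = {}
    if page is not None:
        params["page"] = int(page)
    if page_size is not None:
        params["page_size"] = int(page_size)
    url = base_url.rstrip("/") + path
    return f"{url}?{urlencode(params)}" if params else url
```

The same helper works for /api/feeds/ and /api/feeds/{feed_id}/entries/ when you pass the matching path.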

Request header for endpoints that require authentication

All API endpoints that interact with feed objects require authentication. These endpoints expect the user's auth token to be included in the request header as follows:

```
Authorization: Token 705cf7xa9303e013b3c2300408c3dpd6390qcwdf
```

Response Schemas

This section describes the response content and schemas returned by most of the service's API endpoints.

Response body for endpoints that return paginated results

API endpoints that return paginated results have the following JSON response content:

count: total number of items found
next: URL to next results page
previous: URL to previous results page
results: array of objects; this can either be an array of feed objects or array of entry objects
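Each page links to the next one, so a client can collect every result by following next until no further page remains. This sketch assumes next is null (None once decoded) on the last page, as in Django REST Framework's page-number pagination. `iter_results` accepts any function that returns one decoded page, which keeps it independent of the HTTP library. `fetch_json` is a hypothetical urllib-based fetcher that you could pass to it.

```python
import json
from urllib.request import Request, urlopen


def iter_results(first_url, fetch_page):
    """Yield every object across all pages by following the 'next' links.

    fetch_page(url) must return one decoded page: a dict with
    'count', 'next', 'previous', and 'results'.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page["results"]
        url = page["next"]  # assumed to be None on the last page


def fetch_json(url, token=None):
    """Hypothetical fetcher: GET a URL (optionally with a token) and decode the JSON body."""
    request = Request(url)
    if token:
        request.add_header("Authorization", f"Token {token}")
    with urlopen(request) as response:
        return json.loads(response.read())
```

For example, `list(iter_results("http://localhost:8000/api/entries/", fetch_json))` would gather all entries from a running instance.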

Entry objects in response content

A feed entry is represented by the following JSON object:

link: URL to the article/blog post/content
title: the entry's title
summary: the entry's summary
published: ISO-formatted datetime string
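A client may want published as a real datetime rather than a string. This sketch assumes the string follows ISO 8601 and may end in Z for UTC, which is how Django REST Framework typically renders UTC datetimes. `Entry` and `entry_from_json` are hypothetical names. `datetime.fromisoformat` requires Python 3.7 or later.

```python
from datetime import datetime
from typing import NamedTuple


class Entry(NamedTuple):
    link: str
    title: str
    summary: str
    published: datetime


def parse_published(value):
    """Parse an ISO 8601 string. A trailing 'Z' is rewritten as +00:00
    because datetime.fromisoformat() accepts 'Z' only from Python 3.11."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def entry_from_json(obj):
    """Build an Entry from one object in a response's results array."""
    return Entry(obj["link"], obj["title"], obj["summary"], parse_published(obj["published"]))
```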

Feed objects in response content

An RSS feed is represented by the following JSON object:

id: the feed's ID
title: the feed's title
description: the feed's description
link: URL that points to the RSS feed
version: the feed's RSS version
entries_count: the total number of entries associated with this feed
entries_list: URL that points to the list of entries associated with this feed
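On the client side, the same schema can be represented as a typed record. `Feed` and `feed_from_json` are hypothetical names used for illustration. The field names come from the list above, and the sketch does not check value types.

```python
from typing import NamedTuple


class Feed(NamedTuple):
    id: int
    title: str
    description: str
    link: str
    version: str
    entries_count: int
    entries_list: str


def feed_from_json(obj):
    """Build a Feed from one response object, ignoring keys not listed above."""
    return Feed(**{field: obj[field] for field in Feed._fields})
```

Because extra keys are ignored, the sketch continues to work if the API adds fields later.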

Contributing

Fork this repo at https://github.com/ralphqq/rss-apifier
Clone your fork into your local machine
Follow the steps in Running in local environment to set up your development environment

Create your feature branch:

```
$ git checkout -b feature/some-new-thing
```

Commit your changes:
```
$ git commit -m "Develop new thing"
```

Push to the branch:

```
$ git push origin feature/some-new-thing
```

Create a pull request

License

MIT license
