Tournesol: collaborative content recommendation

This GitHub repository hosts the code of the Tournesol.app platform.

See the wiki page Contribute to Tournesol for details.

We use TensorFlow to compute the aggregated scores, Django for the backend, and React.js for the front-end.

How to launch (tested on Ubuntu in WSL)

First, clone this repository and cd into it.

Prerequisites

Install the latest Node.js and npm, and install Python 3.
Install the front-end dependencies:
```
$ cd frontend
frontend $ npm install
```

Create a virtual environment for the backend and install its dependencies:

```
$ python3 -m venv venv

# activate the virtual environment (needed in every new shell)
$ source venv/bin/activate
(venv) $ pip install -r backend/requirements.txt
```

Run the tests to check that the installation is correct:

```
$ ./tests.sh
```

Building and running front-end

```
$ cd frontend

# watch the front-end source code for changes and rebuild automatically
frontend $ npm run dev
```

Running back-end

```
(venv) $ . ./debug_export.sh  # set environment variables
(venv) $ cd backend

# (optional) run the ML training
(venv) backend $ python manage.py ml_train

# (optional) download the latest video metadata
(venv) backend $ python manage.py add_videos

# (optional) create a superuser for yourself
(venv) backend $ python manage.py createsuperuser

# start the development server, then open localhost:8000
(venv) backend $ python manage.py runserver
```

Creating a superuser is highly recommended for testing the website locally and for contributing to the codebase. 💯

API

The API is implemented with Django REST framework, using Spectacular (drf-spectacular) for OpenAPI 3.0-compliant annotations:

API v2: api_v2, served at api/v2/.

For API v2, the OpenAPI 3 schema is available in schema.json or at schema/.

To generate it, run:

```
backend $ python manage.py spectacular --file schema.json --format openapi-json --validate
```

For API v2, auto-generated documentation is available as well:
- Via Swagger: schema/swagger-ui/
- Via ReDoc: schema/redoc/
~~Old API (v1): api.py, running at api_explorer/~~ (deprecated)
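To see which operations the generated schema exposes, a small script can walk the schema's paths object. This is a minimal sketch: the helper name list_endpoints is hypothetical, and it assumes that the v2 paths in schema.json start with /api/v2/.

```python
from typing import Any

# HTTP methods that OpenAPI 3 allows as operation keys in a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_endpoints(schema: dict[str, Any], prefix: str = "/api/v2/") -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs for every operation under `prefix`."""
    endpoints = []
    for path, item in schema.get("paths", {}).items():
        if not path.startswith(prefix):
            continue
        for key in item:
            # Path items can also contain non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                endpoints.append((key.upper(), path))
    return sorted(endpoints)
```

For example, list_endpoints(json.load(open("schema.json"))) returns one (method, path) pair per v2 operation, which is a quick way to check the schema after regenerating it.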

Documentation

- Backend documentation (Sphinx): backend/doc/build/html/index.html
- API v2 documentation in Markdown (auto-generated): API/README.md
- API v2 documentation for the auto-generated JavaScript code: frontend/api/README.md

Website structure

- Main page -- loads the React.js template
- /admin -- Django admin panel; log in with the superuser you created
- /files -- training artifacts

Machine learning model

The video fields (reliability, ...) are described in rating_fields.py.
The model transforms expert ratings (pairwise comparisons, stored in the ExpertRating model) into aggregated scores for each Video.
Per-expert scores are written to the VideoRating model.
To run the model training, call backend $ python manage.py ml_train, which runs ml_train.py:
- The script saves weights and plots to backend/../.models/
- The config file is specified with --config (a default one is used if the flag is omitted)
- To run hyperparameter tuning with Ray Tune, add the --tune option and use a matching config file, such as featureless_config_hparam_search.gin. Tuning writes TensorBoard logs and the best/worst predictions to ~/ray_results.
There are two model variants in the project:
- Embedding model: uses the Video.embedding field to represent a video
- Featureless model (currently used): there is one variable for each video and each expert
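To make "one variable for each video and each expert" concrete, here is a toy pure-Python featureless model trained by gradient descent. It is an illustration only: the function name, the input format, and the logistic (Bradley-Terry-style) pairwise loss with an L2 penalty are assumptions, not the loss implemented in preference_aggregation_featureless.py.

```python
import math
from collections import defaultdict


def train_featureless(comparisons, epochs=200, lr=0.1, l2=0.01):
    """Learn one score per (expert, video) pair from pairwise comparisons.

    comparisons: iterable of (expert, winner_video, loser_video) tuples.
    Assumed loss: -log(sigmoid(s[winner] - s[loser])) + l2 * s**2 per variable.
    """
    comparisons = list(comparisons)
    scores = defaultdict(float)  # (expert, video) -> score, initialised to 0.0
    for _ in range(epochs):
        grads = defaultdict(float)
        for expert, winner, loser in comparisons:
            diff = scores[(expert, winner)] - scores[(expert, loser)]
            # Derivative of -log(sigmoid(diff)) with respect to diff.
            g = -(1.0 - 1.0 / (1.0 + math.exp(-diff)))
            grads[(expert, winner)] += g
            grads[(expert, loser)] -= g
        # Full-batch gradient step, including the L2 penalty term.
        for key in set(scores) | set(grads):
            scores[key] -= lr * (grads[key] + 2 * l2 * scores[key])
    return dict(scores)
```

Each expert's variables are only coupled through that expert's own comparisons, which is what makes the model "featureless": no video features are used, only the learned per-pair variables.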
For the code structure of the ML models, see backend/backend/ml_model:
1. preference_aggregation.py defines the abstract preference aggregation model, which is not specific to Tournesol
  - Constructor creates the model, fit() trains it, __call__() is for prediction.
  - MedianPreferenceAggregator takes outputs of many models and computes the median
  - preference_aggregation_featureless.py Featureless implementation
    - VariableIndexLayer defines the Keras layer with a variable which takes indices as inputs and outputs variable[index]
    - AllRatingsWithCommon defines the wrapper around VariableIndexLayer with user-friendly access (indices are converted into names and vice-versa), as well as checkpointing
    - FeaturelessPreferenceLearningModel defines a wrapper around AllRatingsWithCommon that implements prediction for a particular user and stores the ratings
    - FeaturelessMedianPreferenceAverageRegularizationAggregator implements the losses, minibatch computation and the plotting of losses
  - preference_aggregation_with_embeddings.py Embedding implementation
2. client_server/database_learner.py Abstract class that moves data between the database and the Preference Aggregation model
  - The constructor loads the data, fit() trains the model and update_features() saves the results; load() and save() are for checkpointing
  - django_ml_featureless.py Featureless implementation
  - django_ml_embedding.py Embedding implementation
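The idea behind MedianPreferenceAggregator (combining the outputs of several models by taking their median) can be sketched per video as follows. The function median_aggregate and its dict-based input format are hypothetical; the actual class operates on the project's own model outputs.

```python
from statistics import median


def median_aggregate(per_model_scores):
    """Combine several models' video scores by taking the per-video median.

    per_model_scores: list of dicts, one per model, mapping video id -> score.
    A video missing from some models is aggregated over the models that score it.
    """
    videos = set().union(*per_model_scores)
    return {
        video: median(scores[video] for scores in per_model_scores if video in scores)
        for video in videos
    }
```

A median is less sensitive than a mean to a single model producing an extreme score, which is a common reason to prefer it when aggregating many independently trained models.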

Where to add online updates

The rough plan to add online updates would be to:

1. Create a DatabasePreferenceLearnerFeatureless as a global object inside the Django code (do this once; it takes time).
2. Load the current weights from a checkpoint with .load() (do this once; it takes time).
3. Load the ratings into the learner with .fit() and 0 epochs.
4. Do custom updates (write your own tf.function that re-computes the weights).
5. Get the results and send them via the API.
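A custom update in step 4 could be as small as the following toy sketch, in which a single new comparison only moves the two variables it touches. It is written in plain Python rather than as a tf.function, and the function name, the (expert, video) dict layout, and the logistic pairwise loss are all assumptions for illustration.

```python
import math


def online_update(scores, expert, winner, loser, steps=10, lr=0.1):
    """Nudge two scores after one new comparison by `expert`.

    scores: dict mapping (expert, video) -> float, e.g. restored from a checkpoint.
    Runs `steps` gradient steps on the assumed loss -log(sigmoid(s_w - s_l)),
    updating only the winner's and loser's variables. Unseen videos start at 0.0.
    Returns a new dict; the input dict is not modified.
    """
    updated = dict(scores)
    w, l = (expert, winner), (expert, loser)
    for _ in range(steps):
        diff = updated.get(w, 0.0) - updated.get(l, 0.0)
        g = 1.0 - 1.0 / (1.0 + math.exp(-diff))  # 1 - sigmoid(diff), always in (0, 1)
        updated[w] = updated.get(w, 0.0) + lr * g
        updated[l] = updated.get(l, 0.0) - lr * g
    return updated
```

Touching only the affected variables keeps each update cheap, at the cost of ignoring any regularization that couples experts; a periodic full retraining would still be needed.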

For quick development, you can use Jupyter notebooks (running by default on port 8899 if started via launch.sh).

Directory structure

- notebooks -- research and development
- frontend -- React.js code
- backend -- Django/TensorFlow code
- backend/db.sqlite3 -- database with videos, preferences, ratings
- backend/api.py -- API definition
- backend/models.py -- model definitions. After updating, run (venv) backend $ python manage.py makemigrations && python manage.py migrate
- backend/ml_models.py -- machine learning part of the project (server definition included)
- backend/ml_client.py _trainr
- backend/preference_aggregation.py -- code to aggregate expert ratings
- backend/rating_fields.py -- video fields (rating)
- backend/video_search.py -- code to search for a video by name in the Django database
- backend/add_videos.py -- code to import videos from YouTube
- backend/management -- code to automate import/ML server tasks
- config -- server configuration files
- scripts -- server scripts

Useful things

- To run a Jupyter notebook that can access the Django models: (venv) backend $ python manage.py shell_plus --notebook
- To populate the database with videos: notebooks/populate-database.ipynb (run it in a Django-enabled notebook, as above)
- To show the PCA of all embeddings and other statistics: notebooks/video-database-stats.ipynb

Development workflow

- The dev-server branch contains the code running at dev.tournesol.app, including pull requests not yet merged into master.
- The master branch contains tested code; dev-server is merged into it when the corresponding pull request is merged.
- Before committing, run tests.sh.
- To run the GitHub Actions locally (also useful to test the dependency installation), run act -b --reuse (requires act).
- Integration tests produce videos named integration_test_xxxx.avi. A frame is grabbed each time an attribute is requested from a driver (this is a bit hacky).
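The frame-grabbing trick described above can be implemented with a proxy whose __getattr__ runs a hook before forwarding each attribute lookup to the wrapped driver. This is a hypothetical sketch (FrameGrabbingProxy and on_access are illustrative names, not the project's code). Assuming a Selenium WebDriver, which the README does not confirm, the hook could call the driver's get_screenshot_as_png() and append the image as the next video frame.

```python
class FrameGrabbingProxy:
    """Wrap a driver and call `on_access(name)` on every forwarded attribute lookup."""

    def __init__(self, driver, on_access):
        # Stored as normal instance attributes, so looking them up later
        # does not go through __getattr__.
        self._driver = driver
        self._on_access = on_access

    def __getattr__(self, name):
        # Python calls __getattr__ only for names not found on the proxy itself,
        # i.e. for everything that belongs to the wrapped driver.
        self._on_access(name)
        return getattr(self._driver, name)
```

Test code then uses the proxy exactly like the driver, so each driver interaction (for example, a method call such as get or an attribute read) produces one frame without changing the tests themselves.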

