
TheDuckWhisperer/tournesol

 
 


Tournesol: collaborative content recommendation

This GitHub repository hosts the code of the Tournesol.app platform.

See the wiki page Contribute to Tournesol for details.

Home page of Tournesol.app

We use TensorFlow to compute the aggregated scores, Django for the backend, and React.js for the front-end.

How to launch (tested on Ubuntu in WSL)

Continuous Integration


First, clone this repo and cd to it.

Prerequisites

  1. Install the latest Node.js and npm, and install Python 3

  2. Install dependencies for front-end

    $ cd frontend
    frontend $ npm install
    
  3. Create a virtual environment for backend and install its dependencies:

    $ python3 -m venv venv
    
    # activate the virtual environment
    $ source venv/bin/activate
    (venv) $ pip install -r backend/requirements.txt
    
  4. Run tests to see that the installation is correct: ./tests.sh

Building and running front-end

$ cd frontend

# will watch for changes made to the frontend source code and re-build automatically:
frontend $ npm run dev

Running back-end

(venv) $ . ./debug_export.sh # to set env vars
(venv) $ cd backend

# (optional) run training
(venv) backend $ python manage.py ml_train

# (optional) download latest video metadata
(venv) backend $ python manage.py add_videos

# optional: create a user for yourself
(venv) backend $ python manage.py createsuperuser

# now go to localhost:8000
(venv) backend $ python manage.py runserver

Note that creating a super user is highly recommended for testing the website locally and contributing to the codebase. 💯

API


The API is implemented with Django REST framework, using drf-spectacular for annotations compliant with OpenAPI 3.0:

Documentation

Website structure

  • Main page -- loads the React.js template
  • /admin -- Django admin panel. Log in with the superuser account you created
  • /files -- training artifacts

Machine learning model

  • The video fields (reliability, ...) are described in rating_fields.py.
  • The model transforms expert ratings (pairwise comparisons, stored in the ExpertRating model) into aggregated scores for each Video
  • Per-expert scores are written to the VideoRating model
  • To train the model, run backend $ python manage.py ml_train, which executes ml_train.py
    • The script saves weights and plots to backend/../.models/
    • The script uses the config file specified by --config, or the default one if the option is omitted
    • To run hyperparameter tuning with Ray Tune, add the --tune option and use a corresponding config file, such as featureless_config_hparam_search.gin. This generates TensorBoard logs and best/worst predictions in ~/ray_results.
  • The project contains two kinds of model:
    • Embedding model. Represents each video by its Video.embedding field
    • Featureless model (currently used). For each video and each expert, there is a variable
  • For the code structure of the ML models, see backend/backend/ml_model:
    1. preference_aggregation.py defines the abstract preference aggregation model without application to Tournesol
      • Constructor creates the model, fit() trains it, __call__() is for prediction.
      • MedianPreferenceAggregator takes outputs of many models and computes the median
      • preference_aggregation_featureless.py Featureless implementation
        • VariableIndexLayer defines the Keras layer with a variable which takes indices as inputs and outputs variable[index]
        • AllRatingsWithCommon defines the wrapper around VariableIndexLayer with user-friendly access (indices are converted into names and vice-versa), as well as checkpointing
        • FeaturelessPreferenceLearningModel defines a wrapper around AllRatingsWithCommon which implements prediction for a particular user, and ratings storage
        • FeaturelessMedianPreferenceAverageRegularizationAggregator implements the losses, minibatch computation and the plotting of losses
      • preference_aggregation_with_embeddings.py Embedding implementation
    2. client_server/database_learner.py Abstract class that moves data between the database and the preference aggregation model
      • Constructor loads data, the fit() method trains the model, update_features() saves results. load() and save() are for checkpointing
      • django_ml_featureless.py Featureless implementation
      • django_ml_embedding.py Embedding implementation
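
The interface described above (a constructor that builds the model, fit() for training, __call__() for prediction, and a median aggregator on top of several models) can be illustrated in plain Python. This is only a minimal sketch: the names PreferencePredictor, TablePredictor and MedianAggregator are hypothetical, and the real classes in backend/backend/ml_model are built on TensorFlow and may differ substantially.

```python
from abc import ABC, abstractmethod
from statistics import median
from typing import Dict, Iterable


class PreferencePredictor(ABC):
    """Hypothetical stand-in for an abstract preference model."""

    @abstractmethod
    def fit(self, epochs: int = 1) -> None:
        """Train the model."""

    @abstractmethod
    def __call__(self, video_id: str) -> float:
        """Predict a score for one video."""


class TablePredictor(PreferencePredictor):
    """Toy 'featureless' model: one stored number per video, no video features."""

    def __init__(self, scores: Dict[str, float]):
        self.scores = dict(scores)

    def fit(self, epochs: int = 1) -> None:
        pass  # nothing to train in this toy example

    def __call__(self, video_id: str) -> float:
        # Videos this expert has not rated default to 0.0 (an assumption).
        return self.scores.get(video_id, 0.0)


class MedianAggregator:
    """Takes the outputs of many models and returns their median."""

    def __init__(self, models: Iterable[PreferencePredictor]):
        self.models = list(models)

    def __call__(self, video_id: str) -> float:
        return median([model(video_id) for model in self.models])


# Three toy "experts" with made-up scores.
experts = [
    TablePredictor({"a": 1.0, "b": 0.0}),
    TablePredictor({"a": 3.0, "b": 0.5}),
    TablePredictor({"a": 2.0}),
]
aggregate = MedianAggregator(experts)
score_a = aggregate("a")  # median of 1.0, 3.0, 2.0
score_b = aggregate("b")  # median of 0.0, 0.5, 0.0
```

The median is used here because it is less sensitive than the mean to a single expert with extreme scores.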

Where to add online updates

The rough plan to add online updates would be to:

  1. Create a DatabasePreferenceLearnerFeatureless as a global object inside the Django code (do once, it would take time)
  2. Load current weights data from a checkpoint (do once, it would take time), use .load()
  3. Load the ratings into the learner, use .fit() with 0 epochs
  4. Do custom updates (write your own tf.function that will re-compute the weights)
  5. Get the results and send via API
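
The plan above could look roughly like the following. This is a sketch under strong assumptions: StubLearner is a hypothetical stand-in for DatabasePreferenceLearnerFeatureless (its real constructor arguments and method signatures are not reproduced here), and online_update is a placeholder for the custom tf.function mentioned in step 4.

```python
import threading


class StubLearner:
    """Hypothetical stand-in; NOT the real DatabasePreferenceLearnerFeatureless."""

    def __init__(self):
        self.weights = {}
        self.ratings_loaded = False

    def load(self):
        # Step 2: restore weights from a checkpoint (faked here).
        self.weights = {"video_1": 0.0}

    def fit(self, epochs=0):
        # Step 3: with 0 epochs this only loads the ratings, without training.
        self.ratings_loaded = True

    def online_update(self, video_id, delta):
        # Step 4: placeholder for a custom update rule.
        self.weights[video_id] = self.weights.get(video_id, 0.0) + delta
        return self.weights[video_id]


_learner = None
_learner_lock = threading.Lock()


def get_learner():
    """Step 1: create the (slow to build) learner once and reuse it."""
    global _learner
    with _learner_lock:
        if _learner is None:
            learner = StubLearner()
            learner.load()
            learner.fit(epochs=0)
            _learner = learner
    return _learner


def handle_new_rating(video_id, delta):
    """Step 5: the payload an API view could return after an update."""
    score = get_learner().online_update(video_id, delta)
    return {"video_id": video_id, "score": score}


response = handle_new_rating("video_1", 0.5)
```

The lock keeps the expensive initialization from running twice if several requests arrive at the same time. Whether a module-level global is appropriate depends on how the Django server is deployed (for example, with several worker processes each one would hold its own learner).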

For quick development, you can use Jupyter notebooks (running by default on port 8899 if started via launch.sh).

Directory structure

  • notebooks -- research and development
  • frontend -- react.js code
  • backend -- django/tensorflow code
  • backend/db.sqlite3 -- database with videos, preferences, ratings
  • backend/api.py -- API definition
  • backend/models.py -- Models definition. After updating, run (venv) backend $ python manage.py makemigrations && python manage.py migrate
  • backend/ml_models.py -- Machine Learning part of the project (server definition included)
  • backend/ml_client.py _trainr
  • backend/preference_aggregation.py -- code to aggregate expert ratings
  • backend/rating_fields.py -- video fields (rating)
  • backend/video_search.py -- code to search for a video by name in Django database
  • backend/add_videos.py -- code to import videos from YouTube
  • backend/management -- code to automate import/ml server tasks
  • config -- server configuration files
  • scripts -- server scripts

Useful things

  • Run a notebook with access to the Django models: (venv) backend $ python manage.py shell_plus --notebook
  • Populate the database with videos: notebooks/populate-database.ipynb (open it in a notebook with Django, as above)
  • Show the PCA of all embeddings and other stats: notebooks/video-database-stats.ipynb

Development workflow

  • The dev-server branch contains code running at dev.tournesol.app and contains unmerged pull-requests
  • The master branch contains tested code. dev-server gets merged into it when the pull-request gets merged
  • Before committing, run tests.sh to run the tests
  • To run the GitHub Actions workflow locally (useful to test dependency installation as well), run act -b --reuse using the act tool
  • Integration tests produce videos named integration_test_xxxx.avi. A frame is grabbed each time an attribute is requested from a driver (this is a bit hacky)

About

Tournesol aims to identify top videos of public utility by eliciting contributors' judgments on content quality.

Languages

  • Python 54.3%
  • JavaScript 45.2%
  • Other 0.5%