This GitHub repository hosts the code of the Tournesol.app platform.
See the wiki page Contribute to Tournesol for details.
We use TensorFlow to compute the aggregated scores, Django for the backend, and React.js for the front-end.
First, clone this repo and cd into it.
- Install the latest Node.js and npm, and install Python 3.
- Install the front-end dependencies:
$ cd frontend
frontend $ npm install
frontend $ cd ..
- Create a virtual environment for the backend and install its dependencies:
$ python3 -m venv venv
# activate the virtual environment
$ source venv/bin/activate
(venv) $ pip install -r backend/requirements.txt
- Run the tests to check that the installation is correct:
(venv) $ ./tests.sh
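Before going through the steps above, a small helper can check that the required tools are on the PATH. This is a sketch that is not part of the repository: the function names missing_tools and python_is_supported and the list of tools are assumptions based on the prerequisites listed here.

```python
import shutil
import sys
from typing import Callable, Iterable, List, Optional

# Tools the installation steps rely on (assumed list, adjust as needed).
REQUIRED_TOOLS = ("node", "npm", "python3")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check that the interpreter is Python 3, as required above."""
    return version_info[0] == 3


if __name__ == "__main__":
    # Only reports; it does not install anything.
    print(missing_tools() or "All prerequisites found.")
```

Passing a custom which function makes the check easy to test without touching the real PATH.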
$ cd frontend
# will watch for changes made to the frontend source code and re-build automatically:
frontend $ npm run dev
(venv) $ . ./debug_export.sh # to set env vars
(venv) $ cd backend
# (optional) run training
(venv) backend $ python manage.py ml_train
# (optional) download latest video metadata
(venv) backend $ python manage.py add_videos
# (optional) create a superuser for yourself
(venv) backend $ python manage.py createsuperuser
# start the server, then open http://localhost:8000
(venv) backend $ python manage.py runserver
Note that creating a superuser is highly recommended for testing the website locally and contributing to the codebase. 💯
The API is implemented with Django REST Framework, using drf-spectacular for annotations compliant with OpenAPI 3.0:
- For API v2, the OpenAPI 3 schema is available at schema.json or at schema/
  - To generate it, run
backend $ python manage.py spectacular --file schema.json --format openapi-json --validate
For API v2, auto-generated documentation is available as well:
- Via Swagger: schema/swagger-ui/
- Via ReDoc: schema/redoc/
-
Old API (v1): api.py, will run at api_explorer/deprecated
- Backend documentation (Sphinx): backend/doc/build/html/index.html
- API v2 documentation in Markdown (auto-generated): API/README.md
- API v2 documentation for JavaScript auto-generated code: frontend/api/README.md
- Main page -- loads the React.js template
- /admin -- Django admin panel. Use the superuser login you created.
- /files -- training artifacts
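To get a quick overview of the v2 endpoints from the generated schema.json, a short script along these lines can help. It is a hypothetical helper (list_operations and load_schema are not part of the codebase) and relies only on the standard OpenAPI 3 layout, where each entry under "paths" maps HTTP methods to operation objects.

```python
import json
from typing import Dict, List, Tuple

# HTTP methods that OpenAPI 3 allows as keys of a path item.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def list_operations(schema: Dict) -> List[Tuple[str, str, str]]:
    """Return (METHOD, path, operationId) triples, sorted by path."""
    operations = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for method in HTTP_METHODS:
            # Skip non-method keys such as "parameters" or "summary".
            if method in item:
                op_id = item[method].get("operationId", "")
                operations.append((method.upper(), path, op_id))
    return operations


def load_schema(filename: str = "schema.json") -> Dict:
    """Load a schema written by the spectacular management command."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)
```

For example, after generating the schema with the spectacular command above, call list_operations(load_schema("schema.json")) from the backend directory and print the resulting triples.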
- The video fields (reliability, ...) are described in rating_fields.py.
- The model transforms expert ratings (pairwise comparisons, stored in the ExpertRating model) into aggregated scores for each Video.
- Per-expert scores are written to the VideoRating model.
- To run the model training, call the following (this runs ml_train.py):
backend $ python manage.py ml_train
  - The script saves weights and plots to backend/../.models/
  - The script uses the config file passed with --config, or the default config file if none is given.
  - To run hyperparameter tuning with Ray Tune, add the --tune option and use a corresponding config file, such as featureless_config_hparam_search.gin. Tuning writes TensorBoard logs and the best/worst predictions to ~/ray_results.
- There are two model types used in the project:
  - Embedding model. Uses the Video.embedding field to represent a video.
  - (Currently used) Featureless model. For each video and each expert, there is a variable.
- Code structure for the ML models (see backend/backend/ml_model):
  - preference_aggregation.py defines the abstract preference aggregation model, independent of Tournesol
    - The constructor creates the model, fit() trains it, and __call__() is for prediction.
    - MedianPreferenceAggregator takes the outputs of many models and computes their median.
  - preference_aggregation_featureless.py: featureless implementation
    - VariableIndexLayer defines the Keras layer holding a variable; it takes indices as inputs and outputs variable[index].
    - AllRatingsWithCommon defines a wrapper around VariableIndexLayer with user-friendly access (indices are converted into names and vice versa), as well as checkpointing.
    - FeaturelessPreferenceLearningModel defines a wrapper around AllRatingsWithCommon which implements prediction for a particular user, and ratings storage.
    - FeaturelessMedianPreferenceAverageRegularizationAggregator implements the losses, minibatch computation and the plotting of losses.
  - preference_aggregation_with_embeddings.py: embedding implementation
  - client_server/database_learner.py: abstract class that loads data from the database into the preference aggregation model and writes the results back
    - The constructor loads the data, fit() trains the model, update_features() saves the results, and load() and save() are for checkpointing.
  - django_ml_featureless.py: featureless implementation
  - django_ml_embedding.py: embedding implementation
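The idea behind the featureless model and the median aggregation can be illustrated with a small pure-Python sketch. It is not the TensorFlow implementation: FeaturelessSketch and median_score are hypothetical names, the comparison encoding and the squared loss with L2 regularization are assumptions, and the common variable handled by AllRatingsWithCommon is left out. Each (expert, video) pair gets one scalar (playing the role of variable[index] in VariableIndexLayer), the scalars are fitted so that score differences match the comparisons, and the per-expert scores of a video are combined with a median.

```python
import statistics
from typing import Dict, Optional, Sequence, Tuple

# (expert, video_a, video_b, preference); preference > 0 means video_a is preferred.
# This encoding is an assumption made for the sketch.
Comparison = Tuple[str, str, str, float]


class FeaturelessSketch:
    """One trainable scalar per (expert, video), fitted from pairwise comparisons."""

    def __init__(self, l2: float = 0.1, lr: float = 0.05):
        self.l2 = l2  # L2 regularization pulling scores towards 0
        self.lr = lr  # gradient-descent step size
        self.scores: Dict[Tuple[str, str], float] = {}

    def fit(self, comparisons: Sequence[Comparison], epochs: int = 500) -> "FeaturelessSketch":
        keys = {(e, v) for e, a, b, _ in comparisons for v in (a, b)}
        for key in keys:
            self.scores.setdefault(key, 0.0)
        for _ in range(epochs):
            grads = dict.fromkeys(keys, 0.0)
            for expert, a, b, y in comparisons:
                # Gradient of the assumed loss (s_a - s_b - y)^2
                err = self.scores[(expert, a)] - self.scores[(expert, b)] - y
                grads[(expert, a)] += 2 * err
                grads[(expert, b)] -= 2 * err
            for key in keys:
                # Add the gradient of l2 * s^2, then take one full-batch step
                grads[key] += 2 * self.l2 * self.scores[key]
                self.scores[key] -= self.lr * grads[key]
        return self

    def __call__(self, expert: str, video: str) -> Optional[float]:
        """Score of `video` for `expert`, or None if that expert never rated it."""
        return self.scores.get((expert, video))


def median_score(model: FeaturelessSketch, experts: Sequence[str], video: str) -> Optional[float]:
    """Aggregate the per-expert scores of one video with a median."""
    values = [s for s in (model(e, video) for e in experts) if s is not None]
    return statistics.median(values) if values else None
```

Using a median rather than a mean means that a single outlying expert cannot drag a video's aggregated score arbitrarily far.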
The rough plan to add online updates would be to:
- Create a DatabasePreferenceLearnerFeatureless as a global object inside the Django code (done once, as it takes time).
- Load the current weights from a checkpoint with .load() (done once, as it takes time).
- Load the ratings into the learner with .fit() using 0 epochs.
- Do custom updates (write your own tf.function that re-computes the weights).
- Get the results and send them via the API.
For quick development, you can use Jupyter notebooks (running by default on port 8899 when started via launch.sh).
- notebooks -- research and development
- frontend -- react.js code
- backend -- django/tensorflow code
- backend/db.sqlite3 -- database with videos, preferences, ratings
- backend/api.py -- API definition
- backend/models.py -- Models definition. After updating, run
(venv) backend $ python manage.py makemigrations && python manage.py migrate
- backend/ml_models.py -- Machine Learning part of the project (server definition included)
- backend/ml_client.py _trainr
- backend/preference_aggregation.py -- code to aggregate expert ratings
- backend/rating_fields.py -- video fields (rating)
- backend/video_search.py -- code to search for a video by name in Django database
- backend/add_videos.py -- code to import videos from YouTube
- backend/management -- code to automate import and ML server tasks
- config -- server configuration files
- scripts -- server scripts
- To run a notebook with access to the Django models:
(venv) backend $ python manage.py shell_plus --notebook
- To populate the database with videos, use notebooks/populate-database.ipynb with the Django notebook started as above.
- To show the PCA of all embeddings and other stats, use notebooks/video-database-stats.ipynb.
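For readers without the notebook at hand, the first principal component of a set of embeddings can be computed with power iteration on their covariance matrix. This is a stdlib-only illustration with hypothetical function names (first_principal_component, project); the notebook itself may rely on a library implementation instead.

```python
import math
import random
from typing import List, Sequence


def first_principal_component(
    vectors: Sequence[Sequence[float]], iterations: int = 100, seed: int = 0
) -> List[float]:
    """Unit vector of largest variance of the centered vectors (power iteration)."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    # Covariance matrix: C[i][j] = average over samples of x_i * x_j
    cov = [[sum(x[i] * x[j] for x in centered) / n for j in range(dim)] for i in range(dim)]
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance along the current direction
        w = [c / norm for c in w]
    # PCA axes have no sign; make the largest coordinate positive for stable output.
    top = max(range(dim), key=lambda i: abs(w[i]))
    return [-c for c in w] if w[top] < 0 else w


def project(vectors: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:
    """1-D coordinate of each centered vector along `component`."""
    n, dim = len(vectors), len(component)
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    return [sum((v[j] - mean[j]) * component[j] for j in range(dim)) for v in vectors]
```

Plotting project(embeddings, first_principal_component(embeddings)) gives a one-dimensional view of how the videos spread out.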
- The dev-server branch contains the code running at dev.tournesol.app, including unmerged pull requests.
- The master branch contains tested code. dev-server gets merged into it when a pull request gets merged.
- Before committing, run tests.sh to run the tests.
- To run the GitHub Actions workflow locally (also useful to test dependency installation), run act -b --reuse with act.
- Integration tests produce videos named integration_test_xxxx.avi. A frame is grabbed each time an attribute is requested from a driver (this is a bit hacky).
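The frame-grabbing trick can be reproduced with a thin proxy whose __getattr__ hook runs before forwarding each attribute lookup to the wrapped driver. FrameGrabbingDriver and grab_frame are hypothetical names; in a Selenium setup, grab_frame could for instance call driver.get_screenshot_as_png() and append the frame to the video.

```python
from typing import Any, Callable


class FrameGrabbingDriver:
    """Hypothetical proxy: calls `grab_frame` each time a driver attribute is requested."""

    def __init__(self, driver: Any, grab_frame: Callable[[Any], None]):
        self._driver = driver
        self._grab_frame = grab_frame

    def __getattr__(self, name: str) -> Any:
        # Python only calls __getattr__ when normal lookup fails, so the proxy's own
        # attributes (_driver, _grab_frame) never trigger a frame grab.
        self._grab_frame(self._driver)
        return getattr(self._driver, name)
```

Test code then uses the proxy exactly like the real driver, and every attribute access (including method lookups such as driver.get) records one frame.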