KCIDB

Kcidb is a package for submitting and querying Linux Kernel CI reports, and for maintaining the service behind that.

See the collected results on our dashboard. Write to kernelci@groups.io if you want to start submitting results from your CI system, or if you want to receive automatic notifications of arriving results.

Installation

Kcidb requires Python v3.6 or later.

To install the package for the current user, run this command:

pip3 install --user <SOURCE>

Where <SOURCE> is the location of the package source, e.g. a git repo:

pip3 install --user git+https://github.com/kernelci/kcidb.git

or a directory path:

pip3 install --user .

In any case, make sure your PATH includes the ~/.local/bin directory, e.g. with:

export PATH="$PATH":~/.local/bin

Before you execute any of the tools, make sure the GOOGLE_APPLICATION_CREDENTIALS environment variable holds the path to your Google Cloud credentials. E.g.:

export GOOGLE_APPLICATION_CREDENTIALS=~/.credentials.json

User guide

Submitting and querying

To submit records, use kcidb-submit; to query records, use kcidb-query. Both use the same JSON schema, which kcidb-submit accepts on standard input and kcidb-query produces on standard output, and which can be displayed with kcidb-schema. You can validate data without submitting it using the kcidb-validate tool.
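
For example (the kcidb-submit options shown here are assumptions, check kcidb-submit --help for the exact interface):

# Display the accepted JSON schema
kcidb-schema

# Check a report without submitting it
kcidb-validate < report.json

# Submit the report (the -p/-t options are assumptions)
kcidb-submit -p kernelci-production -t kernelci_new < report.json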

See Submission HOWTO for details.

API

You can use the kcidb module to do everything the command-line tools do.

First, make sure you have the GOOGLE_APPLICATION_CREDENTIALS environment variable set and pointing at your Google Cloud credentials file. Then you can create the client with kcidb.Client(...) and call its submit(...) and query(...) methods.

You can find the I/O schema in kcidb.io.schema.LATEST.json and use kcidb.io.schema.validate() to validate your I/O data.
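
Put together, a minimal sketch of that flow (the Client constructor arguments are elided here, just as in the text above; see the source for the exact signature):

import json
import kcidb

# Read a report and check it against the latest I/O schema
with open("report.json") as report_file:
    report = json.load(report_file)
kcidb.io.schema.validate(report)

# Constructor arguments elided; see the source code
client = kcidb.Client(...)
client.submit(report)
results = client.query()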

See the source code for additional documentation.

Administrator guide

Kcidb infrastructure is mostly based on Google Cloud services at the moment:

=== Hosts ===  ======================= Google Cloud Project ========================

~~ Staging ~~                                                    ~~~~ BigQuery ~~~~~
kcidb-grafana <-------------------------------------------------  . kernelciXX .
                                                                 :   revisions  :
~~ Client ~~~                                                    :   builds     :
kcidb-query <--------------------------------------------------- :   tests      :
                                                                  ''''''''''''''
                ~~ Pub/Sub ~~       ~~~~ Cloud Functions ~~~~            ^
                kernelci_trigger -------.                                |
                                        v                                |
kcidb-submit -> kernelci_new -----> kcidb_load_queue --------------------'
                                        |
                      .-----------------'
                      v                                          ~~~~ Firestore ~~~~
                kernelci_loaded --> kcidb_spool_notifications -> notifications
                                                                       |
                                               .-----------------------'
                                               |
                                               v                 ~ Secret Manager ~~
                                    kcidb_send_notification <--- kcidb_smtp_password
                                               |
                                               |                 ~~~~~~ GMail ~~~~~~
                                               `---------------> bot@kernelci.org

BigQuery stores the report dataset and serves it to Grafana dashboards hosted on staging.kernelci.org, as well as to any clients invoking kcidb-query or using the kcidb library to query the database.

Whenever a client submits reports, either via kcidb-submit or the kcidb library, they go to a Pub/Sub message queue topic named kernelci_new, and from there to the kcidb_load_queue "Cloud Function", which loads the data into the BigQuery dataset and then pushes it to the kernelci_loaded topic. The kcidb_load_queue function is triggered periodically via messages to the kernelci_trigger topic, pushed there by the Cloud Scheduler service.
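
The loading can also be triggered manually by publishing a message to the trigger topic, the same way the Cloud Scheduler service does:

gcloud pubsub topics publish kernelci_trigger --message='{}'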

The kernelci_loaded topic is watched by the kcidb_spool_notifications function, which picks up the data, generates report notifications, and stores them in a Firestore collection named notifications.

The last "Cloud Function", kcidb_send_notification, picks up the created notifications from the Firestore collection, and sends them out through GMail, using the bot@kernelci.org account, authenticating with the password stored in kcidb_smtp_password secret, within Secret Manager.

Setup

To set up and manage most of the Google Cloud services you will need the gcloud tool, which is part of the Google Cloud SDK. You can install it and create a Google Cloud project by following one of the official quickstart guides. The instructions below assume the created project ID is kernelci-production (yours will likely differ).

Authenticate the gcloud tool with your Google account:

gcloud auth login

Select the project you just created:

gcloud config set project kernelci-production

Create an administrative service account (kernelci-production-admin from here on):

gcloud iam service-accounts create kernelci-production-admin

Grant the administrative service account the project owner permissions:

gcloud projects add-iam-policy-binding kernelci-production \
       --member "serviceAccount:kernelci-production-admin@kernelci-production.iam.gserviceaccount.com" \
       --role "roles/owner"

Generate the account key file (kernelci-production-admin.json here):

gcloud iam service-accounts keys create kernelci-production-admin.json \
       --iam-account kernelci-production-admin@kernelci-production.iam.gserviceaccount.com

NOTE: This key allows anyone to do anything with the specified Google Cloud project, so keep it safe.

Select the account key for use with Google Cloud API (which kcidb uses):

export GOOGLE_APPLICATION_CREDENTIALS=`pwd`/kernelci-production-admin.json

Install kcidb as described above.

BigQuery

Create a BigQuery dataset (kernelci03 here):

bq mk kernelci03

Check it was created successfully:

bq ls

Initialize the dataset:

kcidb-db-init -d kernelci03

Pub/Sub

Enable the Pub/Sub API:

gcloud services enable pubsub.googleapis.com

Create the kernelci_trigger topic, used to trigger the execution of the kcidb_load_queue function:

gcloud pubsub topics create kernelci_trigger

Create the kernelci_new topic:

kcidb-mq-publisher-init -p kernelci-production -t kernelci_new

Create the kernelci_new_load subscription, used by the kcidb_load_queue function to receive submissions from the kernelci_new topic:

kcidb-mq-subscriber-init -p kernelci-production \
                         -t kernelci_new \
                         -s kernelci_new_load

Set the kernelci_new_load ACK deadline to match the maximum runtime of the kcidb_load_queue function:

gcloud pubsub subscriptions update kernelci_new_load --ack-deadline=540

Create the kernelci_loaded topic:

kcidb-mq-publisher-init -p kernelci-production -t kernelci_loaded

Firestore

Create a native Firestore database by following the quickstart guide.

Enable the Firestore API:

gcloud services enable firestore.googleapis.com

Secret Manager

Enable the Secret Manager API:

gcloud services enable secretmanager.googleapis.com

Add the kcidb_smtp_password secret containing the GMail password (here PASSWORD) for bot@kernelci.org:

echo -n 'PASSWORD' | gcloud secrets create kcidb_smtp_password \
                            --replication-policy automatic \
                            --data-file=-

NOTE: For a more secure alternative, pass a file containing the secret to the --data-file option instead.
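
E.g., assuming the password is stored in a file named smtp_password.txt:

gcloud secrets create kcidb_smtp_password \
       --replication-policy automatic \
       --data-file=smtp_password.txt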

Cloud Functions

Requires all the services above to be set up first.

Enable the Cloud Functions API:

gcloud services enable cloudfunctions.googleapis.com

Allow the default Cloud Functions account access to the SMTP password:

gcloud secrets add-iam-policy-binding kcidb_smtp_password \
       --role roles/secretmanager.secretAccessor \
       --member serviceAccount:kernelci-production@appspot.gserviceaccount.com

Download and unpack, or clone the kcidb version being deployed, and change into the source directory. E.g.:

git clone https://github.com/kernelci/kcidb.git
cd kcidb

Make sure the functions' environment variables specify the setup correctly, amend if not:

cat main.env.yaml

Deploy the functions (do not allow unauthenticated invocations when prompted):

gcloud functions deploy kcidb_load_queue \
                        --runtime python37 \
                        --trigger-topic kernelci_trigger \
                        --env-vars-file main.env.yaml \
                        --timeout=540

gcloud functions deploy kcidb_spool_notifications \
                        --runtime python37 \
                        --trigger-topic kernelci_loaded \
                        --env-vars-file main.env.yaml \
                        --retry \
                        --timeout=540

gcloud functions deploy kcidb_send_notification \
                        --runtime python37 \
                        --trigger-event providers/cloud.firestore/eventTypes/document.create \
                        --trigger-resource 'projects/kernelci-production/databases/(default)/documents/notifications/{notification_id}' \
                        --env-vars-file main.env.yaml \
                        --retry \
                        --timeout=540

NOTE: If you get a 403 Access Denied response to the first gcloud functions deploy invocation, try again. It might be a Google infrastructure quirk and could work the second time.
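
Once the functions are deployed, you can check their status with:

gcloud functions list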

Cloud Scheduler

Enable the Cloud Scheduler API:

gcloud services enable cloudscheduler.googleapis.com

Create a scheduler job triggering the kcidb_load_queue function every minute via the kernelci_trigger topic:

gcloud scheduler jobs create pubsub kernelci_trigger \
                                    --schedule='* * * * *' \
                                    --topic=kernelci_trigger \
                                    --message-body='{}'

Grafana

See kcidb-grafana README.md for setup instructions.

CI System Accounts

Each submitting or querying CI system needs a service account created, permissions assigned, and an account key generated. Below is an example for a CI system called "CKI", with an account named kernelci-production-ci-cki.

Create the service account:

gcloud iam service-accounts create kernelci-production-ci-cki

Grant the account query permissions for the BigQuery database:

gcloud projects add-iam-policy-binding kernelci-production \
       --member "serviceAccount:kernelci-production-ci-cki@kernelci-production.iam.gserviceaccount.com" \
       --role "roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding kernelci-production \
       --member "serviceAccount:kernelci-production-ci-cki@kernelci-production.iam.gserviceaccount.com" \
       --role "roles/bigquery.jobUser"

Grant the account permissions to submit to the kernelci_new Pub/Sub topic:

gcloud pubsub topics add-iam-policy-binding kernelci_new \
                     --member="serviceAccount:kernelci-production-ci-cki@kernelci-production.iam.gserviceaccount.com" \
                     --role=roles/pubsub.publisher

Generate the account key file (kernelci-production-ci-cki.json here) for use by the CI system:

gcloud iam service-accounts keys create kernelci-production-ci-cki.json \
       --iam-account kernelci-production-ci-cki@kernelci-production.iam.gserviceaccount.com
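
The CI system can then authenticate with the key the same way the user guide describes, e.g.:

export GOOGLE_APPLICATION_CREDENTIALS=`pwd`/kernelci-production-ci-cki.json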

Upgrading

BigQuery

To upgrade the dataset schema, do the following.

  1. Authenticate to Google Cloud with the key file (~/.kernelci-bq.json here):

     gcloud auth activate-service-account --key-file ~/.kernelci-bq.json
    

    or log in with your credentials (entered via a browser window):

     gcloud auth login
    
  2. Create a new dataset (kernelci02 in project kernelci here) with the new schema:

     bq mk --project_id=kernelci kernelci02
     # Using new-schema kcidb
     kcidb-db-init -d kernelci02
    
  3. Switch all data submitters to using new-schema kcidb and the newly-created dataset.

  4. Create a new dataset named after the old one (kernelci01 here), but with an _archive suffix, using the old-schema kcidb:

     # Using old-schema kcidb
     kcidb-db-init -d kernelci01_archive
    
  5. Using the BigQuery management console, schedule copying the old dataset to the archive dataset created in the previous step. When that is done, remove the old dataset.

  6. Transfer data from the copy of the old dataset (named kernelci01_archive here) to the new dataset (named kernelci02 here) using old-schema kcidb-db-dump and new-schema kcidb-db-load.

     # Using old-schema kcidb
     kcidb-db-dump -d kernelci01_archive > kernelci01_archive.json
     # Using new-schema kcidb
     kcidb-db-load -d kernelci02 < kernelci01_archive.json
    

Developer guide

Hacking

If you want to hack on the source code, install the package in editable mode with the -e/--editable option, and with the "dev" extra included. E.g.:

pip3 install --user --editable '.[dev]'

This installs kcidb executables which use the modules from the source directory, so changes to the source take effect immediately, without reinstalling. It also installs extra development tools, such as flake8 and pylint.
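
E.g., to run the linters on the source tree (the exact invocations the project expects are assumptions):

flake8 kcidb
pylint kcidb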

Releasing

Before releasing make sure the README.md and SUBMISSION_HOWTO.md are up to date.

To make a release, tag the release commit with v<NUMBER>, where <NUMBER> is the next release number, e.g. v3. The very next commit after the tag should update the version number in setup.py to the next one, i.e. 4 in this example.
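
E.g., for release 3 (the numbers are illustrative):

git tag v3
git push origin v3
# The very next commit should set the version to "4" in setup.py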
