
USING PRE-TRAINED MODELS TO PARTIALLY AUTOMATE CODE REVIEW ACTIVITIES

In this work, we investigate the capabilities of T5 (Text-To-Text Transfer Transformer), a pre-trained transformer model, to support code review activities.

How to replicate our results

Step 1 - Set up a GCS Bucket

This GCS bucket will hold all the data needed for setting up, pre-training, fine-tuning, and testing our T5 model. To set up a new GCS bucket, please follow the original guide provided by Google. Alternatively, the bucket can be created programmatically, as in the sketch below.
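
A minimal sketch using the google-cloud-storage Python client; the project ID, bucket name, and region are placeholders, not values from our setup:

```python
# Minimal sketch: create the GCS bucket that will hold all the T5 data.
# Assumes `pip install google-cloud-storage` and that you are already
# authenticated (e.g., via `gcloud auth application-default login`).
# PROJECT_ID, BUCKET_NAME, and LOCATION are placeholders.
from google.cloud import storage

PROJECT_ID = "my-gcp-project"      # replace with your GCP project
BUCKET_NAME = "my-t5-code-review"  # replace with a globally unique name
LOCATION = "us-central1"           # pick a region close to your TPU

client = storage.Client(project=PROJECT_ID)
bucket = client.create_bucket(BUCKET_NAME, location=LOCATION)
print(f"Created bucket gs://{bucket.name} in {LOCATION}")
```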

Step 2 - Get the datasets and all our utilities

You need to copy this folder to your GCS bucket. It contains all of our data and some utilities needed to replicate our results; a small sketch for checking that everything landed in the bucket follows the list below.

In particular, you will have:

  • Pre-Training dataset: obtained by mining Stack Overflow and CodeSearchNet data.
  • Fine-Tuning datasets: we fine-tune our T5 small model on different datasets obtained by mining code review data from Gerrit and GitHub repositories.
    • Fine-Tuning dataset v1 (Small): the same dataset used by Tufano et al., with non-abstracted code and raw comments.
    • Fine-Tuning dataset v2 (Small): the same dataset used by Tufano et al., with non-abstracted code and cleaned comments.
    • Fine-Tuning dataset (Large): our new large dataset.
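
As a quick sanity check after the copy, the sketch below lists what is in the bucket. The "datasets/" prefix is a hypothetical layout, so adjust it to wherever you placed the folder:

```python
# Minimal sketch: check that the datasets folder landed in the bucket.
# The "datasets/" prefix is a hypothetical layout; adjust it to the
# actual location of the copied folder inside your bucket.
from google.cloud import storage

BUCKET_NAME = "my-t5-code-review"  # the bucket created in Step 1

client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix="datasets/"):
    print(f"gs://{BUCKET_NAME}/{blob.name}  ({blob.size} bytes)")
```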

(optional) Step 2.5 - Process the raw datasets

All our datasets are already processed, so everything is set up to start pre-training and fine-tuning the models.

However, if you want to replicate our pre-processing steps, you just need to follow this Colab notebook. There we clean our raw datasets and train the SentencePiece model to accommodate the expanded vocabulary introduced by the pre-training dataset.
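
For orientation, the tokenizer training at the heart of that notebook boils down to a single SentencePiece call. The sketch below is a minimal version: the input file, model prefix, vocabulary size, and model type are illustrative placeholders, not the exact settings from the notebook.

```python
# Minimal sketch: train a SentencePiece model on the cleaned corpus.
# Assumes `pip install sentencepiece`. The input file, model prefix,
# vocab size, and model type are illustrative placeholders; the exact
# settings we used are in the pre-processing Colab notebook.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="pretraining_corpus.txt",  # one training instance per line
    model_prefix="code_review_sp",   # produces .model and .vocab files
    vocab_size=32000,                # expanded to cover code tokens
    model_type="bpe",                # hypothetical choice
)

# Load and sanity-check the trained tokenizer.
sp = spm.SentencePieceProcessor(model_file="code_review_sp.model")
print(sp.encode("if (x == null) return;", out_type=str))
```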

Step 3 - Pre-Training and Fine-Tuning

To pre-train and then fine-tune T5, please follow the Colab notebooks provided. The general shape of the procedure is sketched below.
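
The notebooks drive the t5 library on a Colab TPU roughly as in the following sketch. The task names, paths, batch size, and step counts are placeholders; the exact configuration, including the task registration, is in the notebooks.

```python
# Minimal sketch of the pre-train / fine-tune loop with the `t5`
# library on a Colab TPU. Task names, paths, batch size, and step
# counts are placeholders; the tasks are assumed to already be
# registered with t5.data.TaskRegistry, as done in the notebooks.
import os
import t5

# On (older) Colab TPU runtimes the TPU address is exposed like this.
TPU_ADDRESS = "grpc://" + os.environ["COLAB_TPU_ADDR"]

MODEL_DIR = "gs://my-t5-code-review/models/small"  # hypothetical path

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=TPU_ADDRESS,
    model_parallelism=1,
    batch_size=128,
    sequence_length={"inputs": 512, "targets": 512},
)

# Pre-train on the Stack Overflow / CodeSearchNet dataset.
model.train(mixture_or_task_name="pretraining_task", steps=200000)

# Fine-tune on one of the code review datasets, starting from the
# pre-trained checkpoint written to MODEL_DIR above.
model.finetune(
    mixture_or_task_name="code_review_task",
    finetune_steps=100000,
    pretrained_model_dir=MODEL_DIR,
)
```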

Step 4 - Generate the predictions

We generate results with different beam sizes by converting the model to PyTorch. If you want to generate predictions using a beam size of 1, you can directly use the fine-tuning Colab notebook linked above: once the model is fine-tuned, you can generate custom predictions. To convert the model, use this Colab notebook, which also provides all the functionality to compute perfect predictions, almost-perfect predictions, CodeBLEU, and BLEU. A minimal sketch of the conversion and beam-search generation follows.
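
The snippet below loads a fine-tuned TensorFlow checkpoint into a PyTorch model with the Hugging Face transformers library and generates predictions with a wider beam. The checkpoint path, tokenizer file, and beam size are placeholders; the full pipeline and metric computation live in the Colab notebook.

```python
# Minimal sketch: load the fine-tuned TF checkpoint into PyTorch and
# generate predictions with beam search. Paths and the beam size are
# placeholders; the full pipeline (plus BLEU / CodeBLEU / perfect-
# prediction metrics) is in the conversion Colab notebook.
from transformers import T5Config, T5ForConditionalGeneration, T5Tokenizer

config = T5Config.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained(
    "path/to/tf_checkpoint",  # fine-tuned checkpoint from Step 3
    from_tf=True,
    config=config,
)
# Hypothetical path: the SentencePiece model trained in Step 2.5.
tokenizer = T5Tokenizer("code_review_sp.model")

inputs = tokenizer("public void foo() { return; }", return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    num_beams=5,              # beam size > 1
    num_return_sequences=5,   # one candidate per beam
    max_length=512,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```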

Here you can see our results.
