stjordanis/commit-message-generation

 
 


This repository contains the code needed to replicate the results of the paper "A Sketch-Based Neural Model for Generating Commit Messages from Diffs".

Table of contents

Datasets cleaned (datasets_cleaned)

Contains the cleaned dataset (Liu et al. dataset) and the datasets derived from the cleaned dataset.

  1. all - contains the cleaned dataset
  2. gitignore - contains the dataset with gitignore files
  3. gradle - contains the dataset with gradle files
  4. java - contains the dataset with java files
  5. java_template - contains the dataset with java template files
  6. md - contains the dataset with md files
  7. others_v1 - contains the dataset with files that are not gitrepo, gradle, java, txt or xml
  8. others_v2 - contains the dataset with files that are not gitrepo, gitignore, gradle, java, md, properties, txt, xml or yml
  9. properties - contains the dataset with properties files
  10. txt - contains the dataset with txt files
  11. xml - contains the dataset with xml files
  12. yml - contains the dataset with yml files
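The split by file type listed above could be approximated as follows. This is only a minimal sketch: `bucket_for_path` and `KNOWN_BUCKETS` are hypothetical names, the mapping assumes that a file's bucket is determined purely by its name or extension, and the gitrepo and java_template buckets are left out because this README does not say how those files are identified.

```python
import os

# Extension-based buckets named after the dataset folders listed above.
# Assumption: a file belongs to a bucket purely because of its extension.
KNOWN_BUCKETS = {"gitignore", "gradle", "java", "md", "properties", "txt", "xml", "yml"}


def bucket_for_path(path):
    """Map a changed file's path to a dataset bucket (hypothetical helper)."""
    name = os.path.basename(path)
    # A bare ".gitignore" has no extension according to os.path.splitext,
    # so it is handled explicitly.
    if name == ".gitignore":
        return "gitignore"
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    # Anything that does not match a known bucket falls into "others".
    return ext if ext in KNOWN_BUCKETS else "others"
```

For example, `bucket_for_path("src/main/Foo.java")` returns `"java"`, while a file such as `Makefile` falls into `"others"`.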

Datasets original (datasets_original)

Contains the original dataset (Jiang et al. dataset) and the datasets derived from the original dataset.

  1. all - contains the original dataset
  2. gitignore - contains the dataset with gitignore files
  3. gitrepo - contains the dataset with gitrepo files
  4. gradle - contains the dataset with gradle files
  5. java - contains the dataset with java files
  6. java_template - contains the dataset with java template files
  7. md - contains the dataset with md files
  8. others_v1 - contains the dataset with files that are not gitrepo, gradle, java, txt or xml
  9. others_v2 - contains the dataset with files that are not gitrepo, gitignore, gradle, java, md, properties, txt, xml or yml
  10. properties - contains the dataset with properties files
  11. txt - contains the dataset with txt files
  12. xml - contains the dataset with xml files
  13. yml - contains the dataset with yml files

distributions_plot.py - Plots the word distributions of the diffs and messages for the gitrepo, java and xml files.
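As an illustration of the kind of statistic such a plot is built on, the sketch below computes relative word frequencies over a list of diffs or messages. It assumes that "distribution" means relative word frequency and that tokens are separated by whitespace; `word_distribution` is a hypothetical helper, not a function taken from distributions_plot.py.

```python
from collections import Counter


def word_distribution(texts):
    """Return relative word frequencies over whitespace-tokenized texts.

    Assumption: tokens are whitespace-separated; the actual script may
    tokenize or aggregate differently.
    """
    counts = Counter(word for text in texts for word in text.split())
    total = sum(counts.values())
    # most_common() yields the words from most to least frequent.
    return {word: count / total for word, count in counts.most_common()}
```

The resulting mapping is ordered from most to least frequent word, so it can be passed directly to a bar chart in a plotting library.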

Models

NNGen

Our implementation of the NNGen algorithm introduced by Liu et al.

  • main.py - contains the implementation
  • run.sh - runs the implementation on all datasets in the datasets_original
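For orientation, the nearest-neighbor idea behind NNGen as described by Liu et al. can be sketched as follows: rank the training diffs by bag-of-words cosine similarity to the new diff, keep the top k, pick the candidate with the highest BLEU-4 score against the new diff, and reuse its commit message. This is a minimal sketch, not the code in main.py: the function names, the whitespace tokenization and the add-one smoothing in the simplified BLEU-4 are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bleu4(candidate, reference):
    """Simplified sentence-level BLEU-4 with add-one smoothing (an assumption)."""
    log_precision = 0.0
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / 4
    # Brevity penalty for candidates shorter than the reference.
    if len(candidate) >= len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / max(len(candidate), 1))
    return penalty * math.exp(log_precision)


def nngen(train_diffs, train_msgs, new_diff, k=5):
    """Return the message of the training diff most similar to new_diff."""
    query = new_diff.split()
    query_bow = Counter(query)
    # Step 1: rank training diffs by cosine similarity and keep the top k.
    ranked = sorted(
        range(len(train_diffs)),
        key=lambda i: cosine(query_bow, Counter(train_diffs[i].split())),
        reverse=True,
    )[:k]
    # Step 2: among those, pick the diff with the highest BLEU-4 against the query.
    best = max(ranked, key=lambda i: bleu4(train_diffs[i].split(), query))
    # Step 3: reuse that diff's commit message as the prediction.
    return train_msgs[best]
```

Because the method only retrieves an existing message, it needs no training phase; its output quality depends entirely on how close the nearest training diff is to the new one.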

Predictions

The predictions folder contains two folders (original, cleaned), each containing three files:

  • nmt8-ft.txt - predictions of the nmt8-ft ensemble
  • nmt8-ft-jt.txt - predictions of the nmt8-ft-jt ensemble
  • target_for_nmt_ensemble.msg - target messages reordered based on the file type

seq2seq

A modified version of Google's seq2seq that supports beam search with a copying mechanism.

utils
