This repository contains the code needed to replicate the results of the paper "A Sketch-Based Neural Model for Generating Commit Messages from Diffs".
- Datasets cleaned (datasets_cleaned)
- Datasets original (datasets_original)
- Models
- NNGen
- Predictions
- seq2seq
- utils
Contains the cleaned dataset (the Liu et al. dataset) and the datasets derived from it.
- all - contains the cleaned dataset
- gitignore - contains the dataset with gitignore files
- gradle - contains the dataset with gradle files
- java - contains the dataset with java files
- java_template - contains the dataset with java template files
- md - contains the dataset with md files
- others_v1 - contains the dataset with files which are not gitrepo, gradle, java, txt and xml
- others_v2 - contains the dataset with files which are not gitrepo, gitignore, gradle, java, md, properties, txt, xml and yml
- properties - contains the dataset with properties files
- txt - contains the dataset with txt files
- xml - contains the dataset with xml files
- yml - contains the dataset with yml files
Contains the original dataset (the Jiang et al. dataset) and the datasets derived from it.
- all - contains the original dataset
- gitignore - contains the dataset with gitignore files
- gitrepo - contains the dataset with gitrepo files
- gradle - contains the dataset with gradle files
- java - contains the dataset with java files
- java_template - contains the dataset with java template files
- md - contains the dataset with md files
- others_v1 - contains the dataset with files which are not gitrepo, gradle, java, txt and xml
- others_v2 - contains the dataset with files which are not gitrepo, gitignore, gradle, java, md, properties, txt, xml and yml
- properties - contains the dataset with properties files
- txt - contains the dataset with txt files
- xml - contains the dataset with xml files
- yml - contains the dataset with yml files
distributions_plot.py - Plots the word distributions over the diffs and messages for the gitrepo, java and xml files.
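The core of such a distribution plot is a token-frequency count over each corpus. A minimal sketch of that counting step (whitespace tokenization is an assumption; see distributions_plot.py for the actual tokenization and plotting):

```python
from collections import Counter

def word_distribution(lines):
    """Count token frequencies across a corpus of diffs or messages."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())  # assumed whitespace tokenization
    return counts
```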
- nmt2.yml - contains the model with two layers
- nmt4.yml - contains the model with four layers and residual connections
- nmt8.yml - contains the model with eight layers and residual connections
- predict-beam5.sh - runs prediction with beam search with width 5
- predict-beam10-pen1-replace-unk.sh - runs prediction with beam search with width 10, length penalty 1, and the copying mechanism
- predict-beam10-pen1.sh - runs prediction with beam search with width 10 and length penalty 1
- predict-beam10-replace-unk.sh - runs prediction with beam search with width 10 and the copying mechanism
- predict-beam10.sh - runs prediction with beam search with width 10
- predict-normal.sh - runs prediction without beam search or the copying mechanism
- predict-replace-unk.sh - runs prediction with copying mechanism
- predict.sh - runs all the prediction scripts
- text_metrics.yml - contains the metrics for training
- train_seq2seq.yml - sets the training bucket sizes
- train.sh - runs the training
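For reference, the length penalty used by the beam-search scripts most likely follows the GNMT-style normalization, where a hypothesis's total log-probability is divided by ((5 + |Y|) / 6)^α, with a penalty of 1 corresponding to α = 1. A minimal rescoring sketch (illustrative only; the actual logic lives inside seq2seq):

```python
def length_penalty(length, alpha=1.0):
    # GNMT-style length normalization: ((5 + |Y|) / 6) ** alpha
    return ((5 + length) / 6) ** alpha

def rescore(hypotheses, alpha=1.0):
    """hypotheses: list of (token_list, total_log_prob) pairs.
    Returns them sorted by length-normalized score, best first."""
    return sorted(hypotheses,
                  key=lambda h: h[1] / length_penalty(len(h[0]), alpha),
                  reverse=True)
```

With α = 0 the penalty is 1 for every length and the raw log-probabilities decide; with α = 1 longer hypotheses are penalized less harshly, which typically favors longer commit messages.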
Our implementation of the NNGen algorithm introduced by Liu et al.
- main.py - contains the implementation
- run.sh - runs the implementation on all datasets in the datasets_original folder
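The idea behind NNGen (Liu et al.) can be sketched as follows: represent each diff as a bag-of-words vector, retrieve the top-k most similar training diffs by cosine similarity, then reuse the message of the candidate whose diff scores highest against the query under a BLEU-like measure. Names and the simplified similarity below are illustrative, not the repository's exact implementation (see main.py):

```python
from collections import Counter
import math

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bleu_like(ref_tokens, hyp_tokens):
    # Simplified unigram precision, standing in for sentence-level BLEU.
    ref_counts, hyp_counts = Counter(ref_tokens), Counter(hyp_tokens)
    overlap = sum(min(c, ref_counts[t]) for t, c in hyp_counts.items())
    return overlap / max(len(hyp_tokens), 1)

def nngen(query_diff, train_diffs, train_msgs, k=5):
    query = Counter(query_diff.split())
    vecs = [Counter(d.split()) for d in train_diffs]
    top_k = sorted(range(len(train_diffs)),
                   key=lambda i: cosine(query, vecs[i]), reverse=True)[:k]
    best = max(top_k, key=lambda i: bleu_like(query_diff.split(),
                                              train_diffs[i].split()))
    return train_msgs[best]
```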
The predictions folder contains two folders (original, cleaned), each containing three files:
- nmt8-ft.txt - predictions of the nmt8-ft ensemble
- nmt8-ft-jt.txt - predictions of the nmt8-ft-jt ensemble
- target_for_nmt_ensemble.msg - target messages reordered based on the file type
A modified version of Google's seq2seq that supports beam search with a copying mechanism.
- calculate_bleu.sh - calculates the ensemble BLEU score from the per-dataset predictions
- create_dataset_by_file_type.py - generates the gitignore, gitrepo, gradle, java, md, properties, txt, xml and yml datasets
- create_dataset_without_file_types.py - generates the others_v1 and others_v2 datasets
- find_top_k_file_types.py - calculates the top 10 file types found in the datasets
- generate_vocabs.py - generates the reduced vocabulary for each file type
- prepare_diffs.py - generates the template java diff and saves the constant, class, function and variable tokens in a mapper
- prepare_msgs.py - replaces tokens in the messages based on the tokens found in the mapper
- replace.py - replaces the tokens in the predicted java template messages with the tokens found in the mapper
- utils.py
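Together, prepare_diffs.py, prepare_msgs.py and replace.py implement the sketch pipeline: identifiers in a Java diff are swapped for placeholder tokens, a mapper records which placeholder stands for which identifier, and after prediction the placeholders in the generated message are replaced back. A hedged round-trip sketch (the `VAR0`-style placeholder scheme is an assumption; the repository distinguishes constants, classes, functions and variables):

```python
import re

def templatize(diff, identifiers):
    """Replace known identifiers with placeholders; return (template, mapper)."""
    mapper = {}
    out = diff
    for i, ident in enumerate(identifiers):
        placeholder = f"VAR{i}"  # assumed naming scheme, for illustration
        mapper[placeholder] = ident
        out = re.sub(rf"\b{re.escape(ident)}\b", placeholder, out)
    return out, mapper

def detemplatize(msg, mapper):
    """Restore the original identifiers in a predicted template message."""
    for placeholder, ident in mapper.items():
        msg = msg.replace(placeholder, ident)
    return msg
```

A usage example: templatizing `"int userCount = getUsers().size();"` with identifiers `["userCount", "getUsers"]` yields `"int VAR0 = VAR1().size();"`, and the mapper then restores both names in any predicted message that mentions `VAR0` or `VAR1`.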