Manas-Embold/code_clone_detection


Code Clone Detection

Setup

  • Use the provided Dockerfile <3

Files

The most important notebooks are listed below (the number in parentheses after each embedding is the score reported for that configuration):

  • notebooks/clone_detection_baseline.ipynb: Uses an LSTM with code2vec (0.86), fastText (0.82), or random (0.83) embeddings.
  • notebooks/model_play-seasme.ipynb: Uses a Siamese network whose base model is GraphConv + TopKPooling, with node attributes assigned using code2vec (0.56) or fastText (0.90).
  • notebooks/model_play.ipynb: Uses GraphConv + TopKPooling with code2vec (0.84) or fastText (0.90).
  • notebooks/dgl_model_play.ipynb: Uses GraphConv alone with code2vec (0.56).
  • notebooks/data_preprocssing_main.ipynb: Tries different kinds of processing on the AST graph, builds the vocabulary, and trains fastText embeddings.

The other notebooks contain experiments that we were not able to run successfully because of one or more errors.
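The Siamese setup in notebooks/model_play-seasme.ipynb runs the same base model over both code graphs and compares the two resulting embeddings. The stdlib-only sketch below shows just that inference-time structure, under loudly stated assumptions: a mean over hypothetical per-node-type embedding vectors stands in for the GraphConv + TopKPooling encoder, and cosine similarity with a hypothetical threshold of 0.9 stands in for whatever decision head the notebook actually uses. `encode_graph`, `is_clone`, and the toy vectors are illustrative names, not code from this repository.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def encode_graph(node_labels: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Shared encoder (assumption): average the embedding of every node label.

    This mean pooling is only a stand-in for GraphConv + TopKPooling.
    Labels missing from the embedding table are skipped.
    """
    vectors = [embeddings[label] for label in node_labels if label in embeddings]
    if not vectors:
        raise ValueError("graph has no node labels with a known embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_clone(
    graph_a: Sequence[str],
    graph_b: Sequence[str],
    embeddings: Dict[str, Vector],
    threshold: float = 0.9,  # hypothetical decision threshold
) -> bool:
    # Siamese structure: the SAME encoder and embedding table embed both inputs.
    emb_a = encode_graph(graph_a, embeddings)
    emb_b = encode_graph(graph_b, embeddings)
    return cosine(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Toy, made-up 2-d embeddings for a few AST node types.
    toy = {
        "MethodDeclaration": [1.0, 0.0],
        "ForStatement": [0.0, 1.0],
        "WhileStatement": [0.1, 0.9],
        "ReturnStatement": [1.0, 1.0],
    }
    loop_for = ["MethodDeclaration", "ForStatement", "ReturnStatement"]
    loop_while = ["MethodDeclaration", "WhileStatement", "ReturnStatement"]
    print(is_clone(loop_for, loop_while, toy))
```

Because both inputs pass through one encoder with one embedding table, swapping the arguments gives the same answer. In the real model that weight sharing is what makes the network Siamese, and the weights are learned during training rather than fixed by hand.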

A list of important code files:

  • src/code_parser.py: Parses a string of Java code into an AST, then builds a NetworkX graph from the AST and combines it.
  • src/dataset.py: Builds several kinds of torch_geometric datasets.
  • src/data_prep.py: Script for data preprocessing and splitting the data.
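src/code_parser.py turns Java source into an AST and then into a graph. The sketch below illustrates only the tree-to-graph step, assuming a hypothetical hand-written tree of `(node_type, children)` tuples in place of real Java parser output. It numbers nodes in preorder and emits one parent-to-child edge per AST edge as a `[sources, targets]` list, the two-row COO layout torch_geometric uses for `edge_index`. `ast_to_graph` is an illustrative name, not a function from this repository.

```python
from typing import List, Tuple

# Hypothetical minimal AST node: (node_type, [child nodes]).
Tree = Tuple[str, list]


def ast_to_graph(root: Tree) -> Tuple[List[str], List[List[int]]]:
    """Flatten a tree into node labels plus a COO edge list.

    Nodes are numbered in preorder (parent before children, children
    left to right); each AST edge becomes one parent -> child edge.
    """
    labels: List[str] = []
    sources: List[int] = []
    targets: List[int] = []
    stack = [(root, -1)]  # (node, index of its parent; -1 for the root)
    while stack:
        (node_type, children), parent = stack.pop()
        idx = len(labels)
        labels.append(node_type)
        if parent >= 0:
            sources.append(parent)
            targets.append(idx)
        # Push children in reverse so they are popped left to right.
        for child in reversed(children):
            stack.append((child, idx))
    return labels, [sources, targets]


if __name__ == "__main__":
    tree = ("MethodDeclaration", [
        ("FormalParameter", []),
        ("ReturnStatement", [("BinaryOperation", [])]),
    ])
    node_labels, edge_index = ast_to_graph(tree)
    print(node_labels)
    print(edge_index)
```

If the GNN should treat the AST as undirected, one option is to append each edge reversed before building the dataset.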

About

Code Clone Detection: MLN Project

Languages

  • Jupyter Notebook 98.1%
  • Python 1.9%