BossCodersHQ/Machine-Learning-Source-Code-Plagiarism

Machine-Learning-Source-Code-Plagiarism

Uses machine learning combined with attribute-counting and structure-based methods to analyse source files accurately for source code plagiarism. Utilises the Rabin–Karp algorithm and ASTs (abstract syntax trees) for improved performance.

Screenshots: front-page, main-page-1, main-page-2, main-page-3, results-page-1
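
To make the description above more concrete, here is a minimal sketch of how attribute counting, AST structure, and a Rabin–Karp-style rolling hash can fit together. It is not the code in this repository: the function names (`attribute_counts`, `structure_tokens`, `rolling_fingerprints`, `similarity`), the window size `k`, and the constants `BASE` and `MOD` are all assumptions chosen for illustration, and it only handles Python source via the standard `ast` module.

```python
from __future__ import annotations

import ast
import zlib
from collections import Counter

BASE = 257            # assumed hash base, illustrative only
MOD = 1_000_000_007   # assumed prime modulus, illustrative only


def attribute_counts(source: str) -> Counter:
    """Attribute counting: how often each AST node type occurs."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def structure_tokens(source: str) -> list[str]:
    """Flatten the AST into node-type names, dropping identifiers and literals."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def rolling_fingerprints(tokens: list[str], k: int = 5) -> set[int]:
    """Hash every window of k tokens with a Rabin-Karp-style rolling hash."""
    if len(tokens) < k:
        return set()
    # crc32 gives a deterministic integer code per token (unlike built-in hash()).
    codes = [zlib.crc32(t.encode("utf-8")) % MOD for t in tokens]
    high = pow(BASE, k - 1, MOD)  # weight of the token leaving the window

    h = 0
    for code in codes[:k]:
        h = (h * BASE + code) % MOD
    fingerprints = {h}

    for i in range(len(codes) - k):
        # Remove the oldest token, shift, then append the newest token.
        h = ((h - codes[i] * high) * BASE + codes[i + k]) % MOD
        fingerprints.add(h)
    return fingerprints


def similarity(source_a: str, source_b: str, k: int = 5) -> float:
    """Jaccard similarity of the two files' structural fingerprints."""
    fa = rolling_fingerprints(structure_tokens(source_a), k)
    fb = rolling_fingerprints(structure_tokens(source_b), k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because the fingerprints are built from node types rather than names, renaming variables alone does not change the score in this sketch. In a setup like the one described above, values such as `similarity(...)` and the `attribute_counts(...)` profile would be candidate features for a classifier; how this project actually builds its features is not shown here.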

Getting the Data

  • The data used to train this model was taken from the PAN 2014 dataset. This dataset is not included in this repository, but details about it can be found here
  • The actual data can be found here
  • You may have to request access to the data from the PAN organisers
  • If you're not able to acquire the data there, please contact me and I'll share the data I have with you

Running the Code

  • The entry point is gui.py
  • Please run these commands
# Make sure you're in the base directory first where the poetry.lock file is
poetry install
poetry shell
python scp/gui.py
  • Note that if you're running this via SSH or WSL, you will need extra steps to set up the GUI so that it displays properly
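
As a rough aid for the SSH/WSL note above, the sketch below checks whether the current session appears to have a graphical display before you launch the GUI. It is an assumption-laden helper, not part of this repository: `has_display` is a hypothetical name, it only looks at the standard `DISPLAY` (X11) and `WAYLAND_DISPLAY` (Wayland) environment variables on Linux, and it simply assumes a display exists on other platforms.

```python
import os
import sys


def has_display(env=None, platform=None) -> bool:
    """Return True if a graphical display appears to be available.

    Hypothetical helper: on Linux (which includes WSL) it checks the
    DISPLAY and WAYLAND_DISPLAY variables; elsewhere it assumes True.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return True  # assumption: a native display is present
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if has_display():
        print("A display seems to be available; try: python scp/gui.py")
    else:
        print("No display detected; set up X forwarding or an X server first.")
```

Over SSH, X11 forwarding (`ssh -X`) normally sets `DISPLAY` for you; whether that is sufficient for this GUI has not been verified here.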
