Machine-Learning-Source-Code-Plagiarism

Uses machine learning combined with attribute-counting and structure-based methods to analyse files for source code plagiarism. It utilises the Rabin–Karp algorithm and abstract syntax trees (ASTs) for improved performance.
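As a rough illustration of how these ideas can fit together, here is a minimal sketch rather than the code in this repository. It assumes Python input parsed with the standard `ast` module, whereas the project's own parsers and features may differ. The helper names (`ast_node_counts`, `node_sequence`, `rabin_karp_fingerprints`, `structural_similarity`) and the parameters `k`, `base` and `mod` are hypothetical. It counts AST node types (attribute counting), then hashes every window of `k` node types with a Rabin–Karp rolling hash and compares the two fingerprint sets (structure-based).

```python
import ast
import zlib
from collections import Counter


def ast_node_counts(source: str) -> Counter:
    """Attribute counting: tally how often each AST node type occurs."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def node_sequence(source: str) -> list:
    """Structure: node type names in ast.walk order (identifier names are ignored)."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def rabin_karp_fingerprints(tokens, k=3, base=257, mod=1_000_000_007):
    """Return the set of Rabin-Karp rolling hashes over every k-token window."""
    # Map each token to a stable integer code (crc32 is deterministic across runs).
    codes = [zlib.crc32(t.encode("utf-8")) % mod for t in tokens]
    if len(codes) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the token that leaves the window
    h = 0
    for c in codes[:k]:
        h = (h * base + c) % mod
    prints = {h}
    for i in range(len(codes) - k):
        # Drop codes[i], shift the window, and append codes[i + k] in O(1).
        h = ((h - codes[i] * high) * base + codes[i + k]) % mod
        prints.add(h)
    return prints


def structural_similarity(source_a: str, source_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two files' fingerprint sets (an assumed metric)."""
    fa = rabin_karp_fingerprints(node_sequence(source_a), k)
    fb = rabin_karp_fingerprints(node_sequence(source_b), k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "def add(x, y):\n    return x + y\n"
renamed = "def total(a, b):\n    return a + b\n"
unrelated = "for i in range(10):\n    print(i)\n"

print(ast_node_counts(original)["FunctionDef"])
print(structural_similarity(original, renamed))
print(structural_similarity(original, unrelated))
```

Because only node types are hashed, renaming identifiers leaves the fingerprints unchanged, which is why the renamed copy scores as structurally identical. In a pipeline like the one described above, counts and similarity scores of this kind could serve as input features for the machine learning model.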

Getting the Data

The data used to train this model was taken from the PAN 2014 dataset. This dataset is not included in this repository, but details about it can be found here.
The actual data can be found here.
You may have to request access to the data from the PAN organisers.
If you're not able to acquire it from there, please contact me and I'll share the data I have with you.

Running the Code

The entry point is scp/gui.py.
Please run these commands:

# Make sure you're in the base directory first where the poetry.lock file is
poetry install
poetry shell
python scp/gui.py

Note that if you're running this via SSH or WSL, you will need to take extra steps to set up the GUI so that it displays properly.
