data-mining-proj

Make sure you have the latest version of Anaconda installed.
Existing users: run conda update conda, then conda update --all, to make sure the latest packages are installed.
At your conda prompt, type conda create --name mining --file dependencies.txt to create the environment. To use Python 3.6, type conda create --name mining python=3.6 --file dependencies.txt instead.
Navigate to the project directory and activate the environment. On Windows: activate mining. On Mac/Linux: source activate mining.
Run python main.py to run all classifiers. Results are printed to the terminal and graphs are produced.

In the main code (i.e. main.py) there is a sandbox area where you can write the code for your part.
Once it works, move that code into a function in your own Python module.
Then, in main.py, import that function and store its output in a variable so that the next person down the pipeline can easily access your data if needed.
For an example, look at the importcsv.py and processdata.py modules and at how I call those functions as one-liners in main.py.
Pull before pushing, or create your own branch, to make sure there are no code clashes! Merge conflicts are a pain in the ass to resolve.
If you ever add a new dependency, remember to install it using conda, e.g. conda install PACKAGE_NAME. After that, update the dependency list using conda list --explicit > dependencies.txt
If the dependencies have changed, run conda install --yes --file dependencies.txt while in the mining environment.
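The sandbox-to-module workflow described above can be sketched roughly as follows. The module name cleaning.py, the function drop_incomplete_rows, and the "?" missing-value marker are all hypothetical and chosen only for illustration; the real modules to copy from are importcsv.py and processdata.py.

```python
# --- contents of a hypothetical module, e.g. cleaning.py ---
def drop_incomplete_rows(rows, missing="?"):
    """Return only the rows that contain no missing-value marker."""
    return [row for row in rows if missing not in row]


# --- in main.py ---
# In the real project this would be an import such as:
#   from cleaning import drop_incomplete_rows
# (hypothetical module name; the function is defined above so this runs as-is).
raw_rows = [
    ["63", "1", "145"],
    ["67", "?", "160"],
    ["41", "0", "130"],
]

# One-liner: store the output in a variable so the next stage can reuse it.
clean_rows = drop_incomplete_rows(raw_rows)
print(clean_rows)
```

Keeping each stage to a single call in main.py means the next person only needs the variable name (here clean_rows), not the details of your module.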

Have a good time :)

For each classifier you create (e.g. SVM, random forest), please write a predict function that takes in testX and returns the corresponding predictions.
For example, write a function that can be called as bayesPredictions = bayesian.naiveBayes(testX, testY, trainX, trainY), where bayesPredictions holds the predicted classes for testX.
Place this function call in main.py.
You can look at the current code in main.py to get an idea.
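As a minimal sketch of the shape such a function could take, here is a small Gaussian naive Bayes written with only the standard library. It is not the code in bayesian.py; the toy data and the choice to use testY only for an accuracy printout are assumptions made for illustration.

```python
import math
from collections import defaultdict


def naiveBayes(testX, testY, trainX, trainY):
    """Fit Gaussian naive Bayes on the training data and classify testX.

    testY is accepted to match the call signature; it is used only to
    report accuracy, never to make the predictions (an assumption here).
    """
    # Group training rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(trainX, trainY):
        by_class[label].append(row)

    # Per class: prior probability, per-feature mean and variance.
    stats = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small term avoids division by zero
            for col, m in zip(columns, means)
        ]
        stats[label] = (n / len(trainX), means, variances)

    def log_posterior(row, label):
        # log prior + sum of per-feature Gaussian log-likelihoods
        prior, means, variances = stats[label]
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    # Pick the class with the highest log-posterior for each test row.
    predictions = [max(stats, key=lambda c: log_posterior(row, c)) for row in testX]
    accuracy = sum(p == y for p, y in zip(predictions, testY)) / len(testY)
    print(f"naive Bayes accuracy: {accuracy:.2f}")
    return predictions


# Toy data (made up): two well-separated classes with two features each.
trainX = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]]
trainY = [0, 0, 0, 1, 1, 1]
testX = [[1.1, 2.0], [5.1, 6.0]]
testY = [0, 1]

# In main.py the call would be bayesian.naiveBayes(...) after importing the module.
bayesPredictions = naiveBayes(testX, testY, trainX, trainY)
```

Whatever classifier you write, returning a plain list (or array) with one prediction per row of testX keeps the results easy to compare across classifiers.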


dschuan/data-mining-proj