Sapphirine/Santander-Customer-Satisfaction
Steps to run the program

While the program runs, you may be asked to install packages, including but not limited to scikit-learn, spark-sklearn, pandas, and metrics.

1. Go to the folder project_folder/code.
2. Move the data to project_folder/code/input.
3. Extract features by running extract_features.sh.
4. Confirm that the features are saved in project_folder/code/features.
5. Run train_models.py to train models using different combinations of classifiers and features.
6. Run models_combine.r to combine the results from the different models in a way that optimizes the area under the ROC curve (AUC).
7. Find the results in project_folder/code/submission.

What each source file does

Main interface:
- models_combine.r: combines the results from the different models in a way that optimizes the area under the ROC curve (AUC).
- train_models.py: trains models using different combinations of classifiers and features.

Basic source code:
- santander_preprocess.py: preprocesses the raw data.
- tsne_features.py: produces t-SNE features.
- pca_features.py: produces two PCA features (the number of PCA components can be edited).
- kmeans_features.py: produces k-means features with 2 to 10 clusters.
- rgf.py: the regularized greedy forest (RGF) classification algorithm.
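
To illustrate what combining model results to optimize AUC can look like, here is a minimal Python sketch. It is not the code in models_combine.r (which is written in R and whose method is not described here). It assumes a simple weighted average of model predictions, with the weights chosen by grid search. The function names `auc`, `blend`, and `best_blend`, the `steps` parameter, and the toy data are all hypothetical.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def blend(predictions, weights):
    """Weighted average of several models' prediction lists."""
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n_rows)
    ]


def best_blend(labels, predictions, steps=10):
    """Grid-search non-negative weights (summing to 1) that maximize AUC."""
    best_w, best_auc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(predictions)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = [c / steps for c in combo]
        score = auc(labels, blend(predictions, weights))
        if score > best_auc:
            best_w, best_auc = weights, score
    return best_w, best_auc


# Toy data (made up for illustration): two models scored on six rows.
labels = [0, 0, 1, 1, 0, 1]
model_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.3]
model_b = [0.3, 0.2, 0.6, 0.5, 0.7, 0.9]

weights, blended_auc = best_blend(labels, [model_a, model_b])
print("weights:", weights, "AUC:", blended_auc)
```

In practice the weights would be chosen on held-out validation predictions rather than on the same rows used for training, since the grid search can otherwise overfit. The pairwise AUC computation is quadratic in the number of rows, so a rank-based implementation would be preferable on the full dataset.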
Project ID: 201712-27
Team members: jc4805, jq2261, ll2873