wbenica/csc466lab4
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Names: Sarah Bae, Wesley Benica Emails: shbae@calpoly.edu, wbenica@calpoly.edu Programming language: Python with libraries pandas, numpy, itertools, math, copy, and json Instructions to run: python3 hclustering.py <filepath to input csv> <optional: threshold number (max circumference of clusters)> hclustering.py: A program read and parse input files of datasets to use hierarchical clustering and output a dendrogram showing the hierarchical cluster tree in an output file "outputDendrogram.json". If a threshold for maximum circumference is given as a parameter, we also output the clusters from cutting the dendrogram at the threshold in "clusters.txt". Clusters are made by single link distance, and distance is computed with the euclidian formula. An optional last parameter after the threshold is a "c" for complete linkage instead of single link distance. kmeans.py: A program to read csv file dataset and output clusters and analysis based on kmeans clustering. Will plot 2- and 3-D datasets. usage: python3 kmeans.py <filepath to input csv> <k> [SSE stopping threshold] dbscan.py: A program to read csv file dataset and output clusters and analysis based on density-based scan clustering. Will plot 2- and 3-D datasets. usage: python3 dbscan.py <filepath to input csv> <epsilon> <min points> Additional files and programs: kmeans_tuning.py: contains functions to optimize k-values for a dataset and a function for comparing initial centroid selection methods and normalized vs. raw data. If run as a a program, it will output optimal k values for this assignment's datasets dbscan_tuning.py: contains functions to optimize epsilon and min_points for a dataset. If run as a program, it will output epsilon and min_points values for this assignment's datasets, though they will require hand-tuning. kmeans_results.py, dbscan_results.py: When run as programs, will perform the respective clustering algorithm on the assignment's datasets and analysis utils.py: Functions for euclidean distance (raw and normalized), plotting clusters/centroids, and evaluating clusters by distances from centroids and sum-squared-error. constants.py: constants relating to this assignment, including file paths, optimal k-values, epsilons, and min_points.
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published