roysgitprojects/Unsupervised-Learning-Middle-Project

Unsupervised-Learning-Middle-Project

Middle project in Prof. Louzoun's Unsupervised Learning course.

Abstract

Three data sets were analyzed using five unsupervised learning methods. The first data set describes online shoppers' purchasing intention. The second covers a decade (1999-2008) of clinical care for patients with diabetes at 130 US hospitals. The third contains click-stream data from an online store offering clothing for pregnant women. For each data set, the goal was to cluster the data, visualize the clustering results, measure how well each clustering method fits the external classification, determine which clustering algorithm performs best, and explain the differences between the algorithms. Of the five algorithms tested, Hierarchical Complete with four clusters performed best on the online shoppers' intention and e-shop clothing data, whereas K Means with three clusters gave the best results on the clinical data.
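The comparison described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the data here are synthetic blobs, the variable names are hypothetical, and adjusted mutual information is only assumed as one possible measure of fit to an external classification (the README does not name the measure used).

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_mutual_info_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a preprocessed data set (assumption: numeric features).
# external_labels plays the role of the external classification.
X, external_labels = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# Two of the algorithm families named in the abstract; the cluster counts
# are illustrative and not tuned.
models = {
    "K Means (k=3)": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Hierarchical Complete (k=4)": AgglomerativeClustering(
        n_clusters=4, linkage="complete"
    ),
}

# Score each clustering against the external labels.
scores = {}
for name, model in models.items():
    predicted = model.fit_predict(X)
    scores[name] = adjusted_mutual_info_score(external_labels, predicted)

# Print the algorithms from best to worst agreement.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AMI = {score:.3f}")
```

A score near 1 means the clusters closely match the external classes, while a score near 0 means the agreement is no better than chance.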

Data Sets

The data are too large to upload. They can be found here:

  1. Online Shoppers Purchasing Intention Dataset Data Set
  2. Diabetes 130-US hospitals for years 1999-2008 Data Set
  3. clickstream data for online shopping Data Set

To run the code, download the data sets and place them in a directory named 'dataset'.
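As a quick check that the files are in place, something like the following sketch could be used. It assumes the downloads are CSV files; the helper name `load_csv_files` is hypothetical and not part of the project, and the delimiter is sniffed because it may differ between the data sets.

```python
from pathlib import Path

import pandas as pd

# Directory name taken from the README; the helper itself is illustrative.
DATA_DIR = Path("dataset")


def load_csv_files(data_dir=DATA_DIR):
    """Return a {file stem: DataFrame} mapping for every CSV file in data_dir."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a directory named '{data_dir}'")
    frames = {}
    for path in sorted(data_dir.glob("*.csv")):
        # sep=None with the python engine lets pandas detect the delimiter.
        frames[path.stem] = pd.read_csv(path, sep=None, engine="python")
    return frames


# Example usage (requires the downloaded files):
# for name, frame in load_csv_files().items():
#     print(name, frame.shape)
```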

Python Modules

The main modules used in this project are:

  • Sklearn
  • Matplotlib
  • Skfuzzy
  • Numpy
  • Pandas
  • Scipy
  • Yellowbrick
