Notice: This work is at an early stage and more updates will follow. Stay tuned!
Doko - A smart tour guide for world travelers. Explore and travel without planning. This is a sample of the work from a 14-day capstone project for the Data Science Immersive program at Galvanize.
The ultimate goal is to create a web app that helps people explore new places. I apply unsupervised learning models, including K-means clustering, to segment users into groups with similar behavior and to build business ID recommendations from those groups. The data comes from the Yelp Dataset Challenge and is used under its academic/educational terms.
This is a simple demo of K-means clustering that runs on a local machine with 1,000 users. I am testing more models on Spark on AWS with around 2 TB of data. More work will be added later.
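To make the idea concrete, here is a minimal sketch of clustering users and recommending business IDs from cluster peers. It is not the project's actual code: the user IDs, business IDs, category features, and the `kmeans` and `recommend` helpers are all hypothetical, and it uses only the Python standard library. Seeding the centroids with the first `k` points is a simplification that keeps the example deterministic; a library implementation would usually use k-means++ instead.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    # Simplification: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
    return centroids, labels


# Hypothetical toy data: review counts per user in three made-up
# categories (food, outdoors, nightlife).
user_features = {
    "u1": [9, 1, 0],
    "u2": [8, 2, 1],
    "u3": [1, 9, 0],
    "u4": [0, 8, 1],
    "u5": [7, 0, 2],
}
# Hypothetical businesses each user has already reviewed.
user_businesses = {
    "u1": {"b_ramen", "b_tacos"},
    "u2": {"b_ramen", "b_bakery"},
    "u3": {"b_trail", "b_kayak"},
    "u4": {"b_trail"},
    "u5": {"b_bakery", "b_tacos", "b_ramen"},
}

user_ids = list(user_features)
_, labels = kmeans([user_features[u] for u in user_ids], k=2)
cluster_of = dict(zip(user_ids, labels))


def recommend(user_id, top_n=3):
    """Rank businesses reviewed by cluster peers that the user has not seen."""
    counts = Counter()
    for peer, cluster in cluster_of.items():
        if peer != user_id and cluster == cluster_of[user_id]:
            counts.update(user_businesses[peer] - user_businesses[user_id])
    return [business for business, _ in counts.most_common(top_n)]


print(recommend("u1"))
print(recommend("u4"))
```

In this toy data, `u1`, `u2`, and `u5` mostly review food while `u3` and `u4` mostly review outdoor activities, so each user's recommendations come from the peers who share that preference.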
My next step is to apply Topological Data Analysis (computational topology), implemented on Spark, to study the shape of the data and to generate a better similarity matrix from a cluster network. It is a very promising approach for high-dimensional, complex data, but the field still needs a lot of work.
I would like to thank Galvanize for my time in the Data Science Immersive program, which gave me a great deal of experience on my journey of learning.
- “NEO-PI-R - Manual.” Accessed October 20, 2016. http://www.unifr.ch/ztd/HTS/inftest/WEB-Informationssystem/en/4en001/d590668ef5a34f17908121d3edf2d1dc/hb.htm
- Yelp Inc. “Yelp Dataset Challenge.” Accessed October 20, 2016. https://www.yelp.com/dataset_challenge
- Wikipedia, s.v. “Maven.” Wikimedia Foundation, 2016. Accessed October 20, 2016. https://en.wikipedia.org/wiki/Maven
- Wikipedia, s.v. “Topological data analysis.” Wikimedia Foundation, 2016. Accessed October 20, 2016. https://en.wikipedia.org/wiki/Topological_data_analysis