This project clusters tweets by their word2vec embeddings, using hierarchical clustering implemented in CUDA.
generateData - generates the word2vec data consumed by the CUDA hierarchical clustering code
hierarchicalGPU.cu - CUDA code; outputs the hierarchical sequences
generatePlotReduceDendro - generates clusters from the hierarchical sequences based on a threshold
generate_cluster_labels - generates cluster labels based on percentile and frequency
populateGraph - generates a graph visualization of the basic clusters
generateDendrogram - generates the hierarchy for the clusters in the graph
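
As a rough illustration of the thresholding step that generatePlotReduceDendro performs, the sketch below cuts a merge sequence into flat clusters. It is not the repository's code: the merge format (two point indices plus a merge distance), the function name cut_merges, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def cut_merges(n_points, merges, threshold):
    """Apply only the merges whose distance is <= threshold.

    merges is a hypothetical format: (point_a, point_b, distance) tuples,
    where point_a and point_b are members of the two clusters being merged.
    Returns the flat clusters as sorted lists of point indices.
    """
    parent = list(range(n_points))  # union-find forest, one root per cluster

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, dist in merges:
        if dist <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for point in range(n_points):
        clusters[find(point)].append(point)
    return sorted(clusters.values())


# Made-up merge sequence for four points.
merges = [(0, 1, 0.2), (2, 3, 0.3), (0, 2, 0.9)]
flat = cut_merges(4, merges, threshold=0.5)
```

With these sample values, the last merge (distance 0.9) exceeds the threshold, so points 0 and 1 stay in one cluster and points 2 and 3 in another. Raising the threshold above 0.9 would collapse all four points into a single cluster.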
For the full result, download: https://raw.githubusercontent.com/shikhar-b/hierarchical_gpu/master/dendrogram_with_labels.html
Clusters with labels: https://github.com/shikhar-b/hierarchical_gpu/blob/master/levelsclusters.txt
Logs (with performance and clustering results): https://github.com/shikhar-b/hierarchical_gpu/blob/master/logs.txt