My homework and project code for CSCI 5510 Big Data Analytics.
- MapReduce
- Apriori algorithm
- Locality Sensitive Hashing
- Mining Data Streams
  - DGIM algorithm for counting ones in a bit stream
  - Flajolet-Martin algorithm for estimating the number of distinct elements by hashing each element to a sufficiently long bit string
- Scalable Clustering
  - Using MapReduce to accelerate the k-means algorithm
  - BFR algorithm
- Principal component analysis (PCA)
- Probabilistic Matrix Factorization (PMF)
- PageRank algorithms
- Soft Margin SVM using the stochastic gradient descent (SGD) method
- Analysis of Massive Graphs
- Spectral Clustering method
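
As an illustration of the Flajolet-Martin item in the list above, here is a minimal sketch of one common variant: hash every element, track the largest number of trailing zero bits `R` seen per hash function, and use `2^R` as the estimate. It is not necessarily the code in this repository. The helper names (`hash64`, `trailing_zeros`, `fm_estimate`), the salted BLAKE2b hash, and the parameters `num_hashes` and `group_size` are assumptions chosen for the example.

```python
import hashlib
import statistics


def trailing_zeros(x: int, width: int = 64) -> int:
    """Count trailing zero bits of x; by convention return width when x == 0."""
    if x == 0:
        return width
    # x & -x isolates the lowest set bit; its bit_length gives position + 1.
    return (x & -x).bit_length() - 1


def hash64(item: str, seed: int) -> int:
    """Hypothetical helper: a 64-bit hash of item, salted with seed."""
    data = f"{seed}:{item}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def fm_estimate(stream, num_hashes: int = 64, group_size: int = 8) -> float:
    """Estimate the number of distinct elements in stream.

    Assumes a non-empty stream. Each of the num_hashes salted hash
    functions yields one estimate 2^R; estimates are averaged within
    groups and the median of the group averages is returned, which
    dampens the heavy upper tail of 2^R.
    """
    max_r = [0] * num_hashes
    for item in stream:
        key = str(item)
        for seed in range(num_hashes):
            r = trailing_zeros(hash64(key, seed))
            if r > max_r[seed]:
                max_r[seed] = r

    estimates = [2.0 ** r for r in max_r]
    groups = [estimates[i:i + group_size]
              for i in range(0, num_hashes, group_size)]
    group_means = [sum(g) / len(g) for g in groups]
    return statistics.median(group_means)


if __name__ == "__main__":
    # Duplicates do not change the estimate: only the per-hash maximum matters.
    words = ["a", "b", "c", "a", "b", "a"]
    print(fm_estimate(words))
```

The estimate is coarse by design: it uses only `num_hashes` small counters regardless of stream length, and the accuracy improves as more hash functions are combined.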