yuguang/reddit-comments

Redditor's Club

Redditor's Club is a data pipeline for analyzing trends on Reddit, an interest-based social media network popular among young males in North America.

Technology Stack

This project currently uses the following technologies:

  • Amazon DynamoDB
  • Apache Spark 1.6.1 with Hadoop 2.7
  • MySQL
  • AWS S3
  • AWS Redshift
  • Django 1.9.6 with the following front-end libraries: Highcharts, jQuery, Bootstrap

The figure below shows the flow of data through the pipeline:

[Figure: pipeline]

The data used in this project is the set of Reddit comments compiled and published by Reddit user /u/Stuck_In_the_Matrix. The comments from October 2007 to December 2015 total more than 1 TB.
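
To give a feel for the kind of aggregation a trend pipeline performs on this data, here is a minimal sketch in plain Python. It is not code from this project, which uses Spark for processing. It assumes the comments arrive as one JSON object per line with `subreddit` and `created_utc` fields (a Unix timestamp, as an integer or a numeric string); those field names, the function name `monthly_subreddit_counts`, and the sample records are assumptions made for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def monthly_subreddit_counts(lines):
    """Count comments per (YYYY-MM, subreddit) from JSON-lines input.

    Assumes each record has "subreddit" and "created_utc" keys;
    this layout is an assumption, not taken from the project.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        comment = json.loads(line)
        # created_utc may be an int or a numeric string, so normalize it.
        created = datetime.fromtimestamp(int(comment["created_utc"]), tz=timezone.utc)
        counts[(created.strftime("%Y-%m"), comment["subreddit"])] += 1
    return counts


# Hypothetical sample records, made up for this example.
sample = [
    '{"subreddit": "python", "created_utc": "1192450635", "body": "hello"}',
    '{"subreddit": "python", "created_utc": "1192450700", "body": "world"}',
    '{"subreddit": "AskReddit", "created_utc": 1449000000, "body": "hi"}',
]

counts = monthly_subreddit_counts(sample)
for (month, subreddit), n in sorted(counts.items()):
    print(month, subreddit, n)
```

On the full dataset the same grouping would run as a distributed job rather than a single-process loop, since the input is larger than 1 TB.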
