samuelkahn/Datascience205


# Datascience205 --- Introduction to Information Retrieval

## Coursework for information retrieval, including parts of the Final Project

### HW1 --- Process XML Data

### HW2 --- SQL Queries

### HW3 --- Store Twitter Stream in MongoDB and plot top 50 most common words in tweets... No stopwords used.

### HW4 --- Simple MapReduce Code for calculating top users with more than 2 posts

### HW5 --- Pig Script for processing a log file

### Final-Project --- MrJob code to process ~50 GB of Census data on AWS S3 using EMR across 15 EC2 Instances
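
For readers unfamiliar with the MapReduce pattern that HW4 and the Final-Project rely on, here is a minimal pure-Python sketch of a "users with more than 2 posts" job. It is not the code from this repository: the `(user, text)` record format, the function names `mapper`, `reducer`, and `run_job`, and the sample data are all assumptions made for illustration, and the shuffle step is simulated in memory rather than run on Hadoop or EMR.

```python
from collections import defaultdict


def mapper(record):
    """Emit (user, 1) for one post. The (user, text) layout is an assumption."""
    user, _text = record
    yield user, 1


def reducer(user, counts):
    """Sum the counts for one user; emit only users with more than 2 posts."""
    total = sum(counts)
    if total > 2:
        yield user, total


def run_job(records):
    """Simulate map, shuffle, and reduce in memory (hypothetical driver)."""
    # Map + shuffle: group every emitted value under its key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)

    # Reduce: process each key's values independently.
    results = []
    for key, values in grouped.items():
        results.extend(reducer(key, values))

    # Rank by post count (descending), then by user name for stable ties.
    return sorted(results, key=lambda kv: (-kv[1], kv[0]))


# Made-up sample data, for illustration only.
posts = [
    ("alice", "a"), ("bob", "b"), ("alice", "c"),
    ("alice", "d"), ("carol", "e"), ("bob", "f"),
    ("carol", "g"), ("carol", "h"), ("carol", "i"),
]
top_users = run_job(posts)
```

In this sketch `bob` is dropped because he has exactly 2 posts, which does not satisfy "more than 2". In a framework such as MrJob, the mapper and reducer would typically become methods on a job class and the grouping step would be handled by the framework.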
