Data Challenge 1

Use Case description

Build a streaming application that reads tweets (via the Twitter API) and calculates the top hashtags used, broken down by the following aspects (a minimal sketch of this aggregation follows the list):

- Language  
- Date  
- Source (e.g. Twitter for iPhone)
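
A minimal PySpark sketch of the aggregation described above, run here on a toy in-memory DataFrame rather than the live stream. The column names and the top-10 cutoff are illustrative assumptions, not taken from the actual count_hashtags job:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("top_hashtags_example").getOrCreate()

# Toy tweets; in the real pipeline these arrive from the Twitter API via NiFi and Kafka
tweets = spark.createDataFrame(
    [("#spark is great", "en", "2020-05-01", "Twitter for iPhone"),
     ("#spark and #kafka demo", "en", "2020-05-01", "Twitter Web App"),
     ("probando #kafka", "es", "2020-05-01", "Twitter for Android")],
    ["text", "lang", "date", "source"],
)

# Split the text into tokens and keep only the hashtags
hashtags = (tweets
            .withColumn("hashtag", F.explode(F.split("text", r"\s+")))
            .filter(F.col("hashtag").startswith("#")))

# Count hashtags per (language, date, source) and keep the top 10 of each group
counts = hashtags.groupBy("lang", "date", "source", "hashtag").count()
w = Window.partitionBy("lang", "date", "source").orderBy(F.desc("count"))
top = counts.withColumn("rank", F.row_number().over(w)).filter("rank <= 10")
top.show(truncate=False)
```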

High Level Architecture

Environment Setup

  1. Spin up an EC2 instance and deploy Minikube.

  2. Deploy the Apache NiFi Helm chart:

     helm repo add cetic https://cetic.github.io/helm-charts
     helm install tweets cetic/nifi

  3. Deploy the Apache Kafka Helm chart:

     helm repo add bitnami https://charts.bitnami.com/bitnami
     helm install tweets-kafka bitnami/kafka --set zookeeper.enabled=false,externalZookeeper.servers=tweets-zookeeper:2181

  4. Deploy the Cassandra Helm chart:

     helm install tweets-db --set dbUser.user=admin,dbUser.password=<password> bitnami/cassandra

  5. Upload the NiFi template via the UI.

  6. Insert the Twitter API tokens in the GetTwitter processor.

Run the Application

  1. Start all processors in Apache NiFi. Browse to:

     http://<ec2-public-ip>:8081/nifi

  2. Deploy the count_hashtags container in Minikube (a quick check of the results is sketched below):

     sudo kubectl run count-hashtags --image gcr.io/pmoraesm/count_hashtags:0.4
    
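Once the count_hashtags pod is running, one quick way to verify that results are landing in Cassandra is to query it with the Python cassandra-driver. This is only a sketch: the keyspace/table name (tweets.hashtag_counts) and the port-forward target are assumptions and should be adjusted to whatever the job actually writes:

```python
# Sketch only: assumes `kubectl port-forward svc/tweets-db-cassandra 9042:9042`
# is running locally, and that count_hashtags writes to tweets.hashtag_counts.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth = PlainTextAuthProvider(username="admin", password="<password>")
cluster = Cluster(["127.0.0.1"], port=9042, auth_provider=auth)
session = cluster.connect()

# Print a small sample of the aggregated hashtag counts
rows = session.execute("SELECT * FROM tweets.hashtag_counts LIMIT 20")
for row in rows:
    print(row)

cluster.shutdown()
```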

Docker Images

The Docker images used in this project are hosted at https://gcr.io/pmoraesm. Two images are available:

- pyspark: an interactive PySpark environment, used for testing purposes.
- count_hashtags: the Spark application that counts the hashtags and saves the results to the database; it starts processing and writing to Cassandra automatically (a condensed sketch of such a job is shown below).
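
For reference, here is a condensed sketch of what a job like count_hashtags could look like with Spark Structured Streaming. The Kafka bootstrap address, topic name, JSON schema, and Cassandra keyspace/table are assumptions based on the Helm release names above, not the actual contents of the image:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("count_hashtags_sketch").getOrCreate()

# Assumed shape of the JSON records the NiFi flow publishes to Kafka
schema = StructType([
    StructField("text", StringType()),
    StructField("lang", StringType()),
    StructField("created_at", StringType()),  # assumed already normalized to yyyy-MM-dd
    StructField("source", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "tweets-kafka:9092")  # assumed service:port
       .option("subscribe", "tweets")                           # assumed topic name
       .load())

tweets = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
          .select("t.*"))

# Extract hashtags and keep a running count per (language, date, source)
counts = (tweets
          .withColumn("hashtag", F.explode(F.split("text", r"\s+")))
          .filter(F.col("hashtag").startswith("#"))
          .withColumn("date", F.to_date("created_at"))
          .groupBy("lang", "date", "source", "hashtag")
          .count())

def write_batch(batch_df, epoch_id):
    # Requires the spark-cassandra-connector package; keyspace/table are assumed
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="tweets", table="hashtag_counts")
     .mode("append")
     .save())

(counts.writeStream
 .outputMode("update")
 .foreachBatch(write_batch)
 .start()
 .awaitTermination())
```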