GitHub - randy-chng/demo_pyspark_streaming

Overview

This project demonstrates a three-stage streaming pipeline:

streaming of tweets (twitter_app.py)
processing of the streamed tweets using Spark (spark_app.py)
storing the processed tweets in SQLite (database_app.py)
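
To make the data flow concrete, here is a minimal standard-library sketch of the same three stages. It is not the repository's code: the README does not say what the Spark job computes, so counting hashtags is an assumption, and the function names and the `hashtag_counts` table are hypothetical. A plain list stands in for the Twitter stream and a `Counter` stands in for the Spark step.

```python
import sqlite3
from collections import Counter


def extract_hashtags(text):
    """Return the lowercase hashtags found in one tweet's text."""
    tags = []
    for word in text.split():
        word = word.strip(".,!?:;")  # drop trailing punctuation such as "#spark!"
        if word.startswith("#") and len(word) > 1:
            tags.append(word.lower())
    return tags


def count_hashtags(tweets):
    """Aggregate hashtag counts over one batch of tweets (stand-in for Spark)."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts


def store_counts(conn, counts):
    """Add one batch of counts to a hypothetical hashtag_counts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashtag_counts "
        "(hashtag TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    for hashtag, total in counts.items():
        # Create the row on first sight, then accumulate across batches.
        conn.execute(
            "INSERT OR IGNORE INTO hashtag_counts (hashtag, total) VALUES (?, 0)",
            (hashtag,),
        )
        conn.execute(
            "UPDATE hashtag_counts SET total = total + ? WHERE hashtag = ?",
            (total, hashtag),
        )
    conn.commit()
```

Calling `store_counts` once per batch mirrors how a micro-batch streaming job would write its results: each batch adds to the running totals rather than overwriting them.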

Installation (Docker)

Step 1

Spin up a Google Compute Engine (GCE) instance running Container-Optimized OS

Step 2

SSH into the instance and clone the project

git clone https://github.com/randy-chng/demo_pyspark_streaming.git

Step 3

Build the image, replacing [W], [X], [Y] and [Z] with your Twitter access token, access secret, consumer key and consumer secret respectively

cd demo_pyspark_streaming
docker image build --tag streaming_pipeline_i --build-arg access_token=[W] --build-arg access_secret=[X] --build-arg consumer_key=[Y] --build-arg consumer_secret=[Z] --file Dockerfile .
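
The build command is long and easy to mistype. As an optional convenience, the sketch below assembles the same command from a dictionary of credentials and refuses to proceed if one is missing. The helper names `build_command` and `render` are hypothetical and not part of the repository; the flags and build-arg names are copied from the command above.

```python
import shlex

# Build-arg names exactly as they appear in the docker command above.
BUILD_ARG_NAMES = ("access_token", "access_secret", "consumer_key", "consumer_secret")


def build_command(credentials, tag="streaming_pipeline_i"):
    """Return the docker build command as an argument list."""
    missing = [name for name in BUILD_ARG_NAMES if not credentials.get(name)]
    if missing:
        raise ValueError("missing Twitter credentials: " + ", ".join(missing))
    command = ["docker", "image", "build", "--tag", tag]
    for name in BUILD_ARG_NAMES:
        command += ["--build-arg", f"{name}={credentials[name]}"]
    command += ["--file", "Dockerfile", "."]
    return command


def render(command):
    """Return a shell-quoted string, useful for checking the command before running it."""
    return shlex.join(command)
```

The argument list can be passed to `subprocess.run(command, check=True)` from inside the `demo_pyspark_streaming` directory; passing a list rather than a string avoids shell quoting problems if a credential contains special characters.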

Step 4

Start a detached container from the image, publishing port 5555 (used later by JupyterLab)

docker run --name streaming_pipeline_c --publish 5555:5555 -di streaming_pipeline_i

Step 5

Open a shell inside the running container

docker exec -it streaming_pipeline_c /bin/bash

Step 6

Run the following commands to start the pipeline in the background; each process writes its output to its own log file

nohup python -u twitter_app.py > twitter_app_output.log 2>&1 &
nohup python -u spark_app.py > spark_app_output.log 2>&1 &
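
Because both processes run in the background, the log files are the only place to see whether they started cleanly. The sketch below prints the last few lines of each log; the `tail` helper is hypothetical, and the file names are the ones used in the commands above.

```python
from collections import deque
from pathlib import Path


def tail(path, lines=10):
    """Return the last `lines` lines of a text file, or [] if it does not exist yet."""
    log = Path(path)
    if not log.exists():
        return []
    with log.open(encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the final lines without loading the whole file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


if __name__ == "__main__":
    for name in ("twitter_app_output.log", "spark_app_output.log"):
        print(f"== {name} ==")
        for line in tail(name):
            print(line)
```

Run it from the same directory in which the pipeline was started. An empty section for a log usually means the process has not written anything yet or the file was created elsewhere.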

Query SQLite via Jupyter

For sample output, refer to notebook.ipynb

Step 1

Open a shell inside the running container

docker exec -it streaming_pipeline_c /bin/bash

Step 2

Run the following command to start JupyterLab

jupyter lab --ip=0.0.0.0 --port=5555 --allow-root

Step 3

Open the link printed by JupyterLab in your local web browser

http://127.0.0.1:5555/?token=[SOME RANDOM GENERATED TOKEN VALUE]
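
Once JupyterLab is open, the database can be explored from a notebook cell. The README does not name the database file or its tables, so the sketch below discovers the tables instead of assuming them. The helpers `list_tables` and `preview` are hypothetical, and `tweets.db` in the usage comment is a placeholder for whatever file database_app.py actually writes.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def preview(conn, table, limit=5):
    """Return (column_names, rows) for the first `limit` rows of a table."""
    # Only accept names that really exist, since table names cannot be bound as parameters.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# Usage in a notebook cell (the file name is a placeholder):
# conn = sqlite3.connect("tweets.db")
# for table in list_tables(conn):
#     print(table, preview(conn, table))
```

For the queries the author actually ran, notebook.ipynb remains the reference.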
