Real-time Twitter Source Analysis

Running on Linux is recommended; on other systems HBase has to be started manually (see start-hbase.sh).
Set your Twitter tokens in reader.py and your MySQL credentials in processor.py.
Configure MySQL in /etc/mysql/my.cnf with bind-address=0.0.0.0 so that it accepts external connections from Docker.
Change the IPDODOCKER variable (the Docker IP) and copy mysqlcatalog.properties to /etc/presto/catalog inside the Presto container so that queries can be run from Presto.
Run the MySQL container, replacing the host side of the -v volume with the path to your local mysql directory: docker run --name some-mysql -p 3306:3306 -p 8080:8080 -v C:\Users\U003675\Desktop\DataEngineerTweetProcessing\mysql:/etc/mysql/conf.d -e MYSQL_ROOT_PASSWORD=123db4 -d mysql:latest
If you have never run this project before, run export KAFKA_TOPICS="TweetsTopic:1:3" (use SET instead of export on Windows) so that the Kafka topic is created when the image comes up.
Run start-all.sh (it starts the HBase and Kafka Docker containers and services).
Run python reader.py to read the tweets and publish them with a Kafka producer.
Run spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 processor.py to read from Kafka and process the messages.
Run python webapp.py to serve the page that visualizes the data.
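
The KAFKA_TOPICS value used in the steps above packs three fields into one string. The sketch below shows one way to read it, assuming the name:partitions:replicas layout that some Kafka Docker images use for topic auto-creation. This README does not say which image is used, so check docker-compose.yml for the actual meaning. TopicSpec and parse_kafka_topics are hypothetical helper names and are not part of this repository.

```python
from typing import List, NamedTuple


class TopicSpec(NamedTuple):
    """One topic entry (hypothetical helper, not part of the project)."""
    name: str
    partitions: int
    replicas: int


def parse_kafka_topics(value: str) -> List[TopicSpec]:
    """Parse a comma-separated list of name:partitions:replicas entries.

    Assumption: the field order is name, partition count, replica count.
    Any extra trailing fields are ignored.
    """
    specs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty items such as a trailing comma
        parts = entry.split(":")
        if len(parts) < 3:
            raise ValueError(
                f"expected name:partitions:replicas, got {entry!r}"
            )
        name, partitions, replicas = parts[:3]
        specs.append(TopicSpec(name, int(partitions), int(replicas)))
    return specs


# Inspect the value from the setup steps.
for spec in parse_kafka_topics("TweetsTopic:1:3"):
    print(spec.name, spec.partitions, spec.replicas)
```

Under that assumption, "TweetsTopic:1:3" asks for a topic named TweetsTopic with 1 partition and 3 replicas, which would need at least three brokers.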

guilmarques/datatweet