ADBI TOP K

Follow the steps below, in order, to run the application.

Installing Project Dependencies

Create a CSC591_ADBI_v3 virtual machine in VCL
Log in to the VM
Create an apps folder in the root directory of the VCL machine: mkdir /apps
Download the project source code and unzip it into /apps
Change into the project folder: cd /apps/adbi-top-k
Install the Python dependencies: pip3 install -r requirements.txt
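To confirm that the install step succeeded before moving on, a small check like the one below can help. It is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper names requirement_names and missing_packages are hypothetical and not part of the project. It reads bare package names out of requirements.txt and reports any that are not installed for the current interpreter.

```python
import re
from importlib import metadata
from typing import List


def requirement_names(text: str) -> List[str]:
    """Extract bare distribution names from requirements.txt content."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options (-r, -e, ...)
            continue
        # The name ends at the first version specifier, extra, marker, or space.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: List[str]) -> List[str]:
    """Return the names that are not installed for the current interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Run it from /apps/adbi-top-k with the same python3 that pip3 installs into, for example by passing the contents of requirements.txt to requirement_names and printing missing_packages of the result; an empty list means every dependency resolved.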

Project Configuration

We have added Twitter API keys to config.ini. If these keys do not work for you, feel free to replace them with your own.
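If you swap in your own keys, it is worth checking that none were left blank. The sketch below uses Python's standard configparser to load them and fail loudly on empty values. The section name "twitter" and the four key names are assumptions chosen for illustration; check config.ini for the names the project actually uses and adjust the constants.

```python
import configparser
from typing import Dict

# Hypothetical section and key names: replace them with the ones in config.ini.
TWITTER_SECTION = "twitter"
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def load_twitter_keys(path: str = "config.ini") -> Dict[str, str]:
    """Read Twitter credentials from an INI file and fail loudly if any are blank."""
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    section = parser[TWITTER_SECTION]  # raises KeyError if the section is absent
    keys = {name: section.get(name, fallback="").strip() for name in REQUIRED_KEYS}
    blank = [name for name, value in keys.items() if not value]
    if blank:
        raise ValueError("missing Twitter keys: " + ", ".join(blank))
    return keys
```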

Installing and Starting ElasticSearch on VCL

Run the following commands to install and start ElasticSearch on VCL. For convenience, these commands are also captured in the script scripts/install.elastic.sh.

sudo apt-get --yes --assume-yes install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
sudo apt-get --yes --assume-yes update && sudo apt-get --yes --assume-yes --allow-unauthenticated install elasticsearch
sudo systemctl start elasticsearch.service
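The service can take a few seconds to come up after systemctl returns. The sketch below polls Elasticsearch's standard _cluster/health endpoint, assuming the default HTTP port 9200 on localhost; the function names are hypothetical. A single-node cluster reports yellow when indices have replicas that cannot be assigned, so yellow is treated as usable here.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # default Elasticsearch HTTP port; adjust if changed


def parse_health(body: str) -> bool:
    """Return True when the cluster reports green or yellow status."""
    status = json.loads(body).get("status")
    # A single node cannot host replicas, so "yellow" is normal once indices exist.
    return status in ("green", "yellow")


def elasticsearch_ready(base_url: str = ES_URL, timeout: float = 5.0) -> bool:
    """Query _cluster/health; any connection or HTTP error counts as not ready."""
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False
```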

Starting Kafka

Start ZooKeeper by running $KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
Start Kafka by executing $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties

Note: for convenience, these steps are captured in scripts/start.kafka.sh, which launches ZooKeeper and Kafka in the background and redirects their output to log files.
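The same two steps can be expressed in Python, which makes the ordering explicit: Kafka needs ZooKeeper to be accepting connections before the broker starts. This is a sketch, not the contents of scripts/start.kafka.sh; it assumes the default ports from the stock config files (2181 for ZooKeeper, 9092 for the broker), and the log file names and helper names are hypothetical.

```python
import os
import socket
import subprocess
import time
from typing import List, Tuple


def kafka_commands(kafka_home: str) -> List[Tuple[List[str], str]]:
    """Return (command, log file) pairs for ZooKeeper and then the Kafka broker."""
    bin_dir = os.path.join(kafka_home, "bin")
    cfg_dir = os.path.join(kafka_home, "config")
    return [
        ([os.path.join(bin_dir, "zookeeper-server-start.sh"),
          os.path.join(cfg_dir, "zookeeper.properties")], "zookeeper.log"),
        ([os.path.join(bin_dir, "kafka-server-start.sh"),
          os.path.join(cfg_dir, "server.properties")], "kafka.log"),
    ]


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def start_kafka(kafka_home: str) -> None:
    """Launch ZooKeeper in the background, wait for port 2181, then launch the broker."""
    (zk_cmd, zk_log), (kafka_cmd, kafka_log) = kafka_commands(kafka_home)
    with open(zk_log, "ab") as log:
        # The child keeps its own copy of the file descriptor after we close ours.
        subprocess.Popen(zk_cmd, stdout=log, stderr=subprocess.STDOUT)
    if not wait_for_port("localhost", 2181):
        raise RuntimeError("ZooKeeper did not start; see " + zk_log)
    with open(kafka_log, "ab") as log:
        subprocess.Popen(kafka_cmd, stdout=log, stderr=subprocess.STDOUT)
```

Usage would be start_kafka(os.environ["KAFKA_HOME"]); afterwards wait_for_port("localhost", 9092) confirms the broker is listening.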

Starting Application

Once ElasticSearch and Kafka are running, start the ADBI Top K application with: python3 web_server.py

Collecting Data

Once the ADBI Top K server is running, you need to start data ingestion and indexing. We have implemented a REST API to start and stop these processes. By default, the web server listens on port 5000, so make sure this port is forwarded to your local machine in order to access the web server. Follow one of the following guides to expose the port.

Note: we were asked to forward ports for an assignment in this course, and we hope the TAs and instructor can do the same.

Once you have exposed port 5000, open http://localhost:5000/api

To start ingestion and indexing:

Click on heavyhitters
Select heavyhitters/ingestion/start
Click Try It Out
Click Execute
Select heavyhitters/indexing/start
Click Try It Out
Click Execute
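The same two calls can also be scripted instead of clicked through the Swagger page. The sketch below is hedged heavily: it assumes the routes live under http://localhost:5000/api with the paths shown on the page (heavyhitters/ingestion/start, heavyhitters/indexing/start), and that stop routes follow the same pattern. The HTTP method is a guess (POST by default); confirm both the paths and the method in the Swagger UI before relying on this. The helper names are hypothetical.

```python
import urllib.request

BASE_URL = "http://localhost:5000/api"  # assumption: routes sit under /api like the Swagger page


def build_request(process: str, action: str, method: str = "POST") -> urllib.request.Request:
    """Build a request for heavyhitters/<process>/<action>.

    process is "ingestion" or "indexing"; action is "start" or "stop" (assumed).
    The default HTTP method is a guess; check it on the Swagger page.
    """
    if process not in ("ingestion", "indexing") or action not in ("start", "stop"):
        raise ValueError("unsupported endpoint: %s/%s" % (process, action))
    url = "%s/heavyhitters/%s/%s" % (BASE_URL, process, action)
    return urllib.request.Request(url, method=method)


def call(process: str, action: str, method: str = "POST") -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(process, action, method), timeout=10) as resp:
        return resp.status
```

To mirror the steps above, call call("ingestion", "start") and then call("indexing", "start").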

If everything is running correctly, you should see ElasticSearch indexing log entries reporting that data is being posted every few milliseconds. These entries do not include the messages themselves unless you set the log level in logging.conf to DEBUG.
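Assuming logging.conf uses the INI format read by Python's logging.config.fileConfig (sections named logger_* and handler_*, each with a level key), the switch to DEBUG can be done with the sketch below. Both the logger and the handler levels matter, because a handler set to INFO still drops DEBUG records. The function name is hypothetical, and configparser does not preserve comments, so keep a copy of the original file; restart the server afterwards so the new levels are loaded.

```python
import configparser
from typing import List


def set_debug_level(path: str = "logging.conf") -> List[str]:
    """Set level=DEBUG in every logger and handler section of an INI logging config.

    Returns the names of the sections that were changed. configparser drops
    comments when it rewrites the file, so back up the original first.
    """
    parser = configparser.ConfigParser(interpolation=None)  # format strings contain %(...)s
    parser.optionxform = str  # keep key case exactly as written
    parser.read(path)
    changed = []
    for name in parser.sections():
        if name.startswith(("logger_", "handler_")) and "level" in parser[name]:
            parser[name]["level"] = "DEBUG"
            changed.append(name)
    with open(path, "w") as fh:
        parser.write(fh)
    return changed
```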

Let the application ingest and index data for at least five minutes before querying for top K.
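Rather than relying only on elapsed time, you can confirm that documents are actually accumulating by asking Elasticsearch's standard _count endpoint. This sketch assumes Elasticsearch on localhost:9200; the index name is not stated here, so "tweets" in the usage below is purely hypothetical and should be replaced with the index the indexing module writes to.

```python
import json
import urllib.request


def parse_count(body: str) -> int:
    """Extract the document count from an Elasticsearch _count response."""
    return int(json.loads(body)["count"])


def indexed_documents(index: str, base_url: str = "http://localhost:9200") -> int:
    """Return how many documents the given index currently holds."""
    with urllib.request.urlopen("%s/%s/_count" % (base_url, index), timeout=5) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling indexed_documents("tweets") a minute apart should show the count growing while ingestion and indexing run.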

Executing Queries

We have developed a GUI client for the application, available at http://localhost:5000/. The interface should be intuitive, but there are a few gotchas:

Timestamps can be relative (now, now-1d/d, etc.) or absolute
Absolute timestamps must follow the ISO 8601 format
- Make sure to remove the timezone offset from the timestamp
- This is the last five characters (for example, -0700)
- For example: 2019-04-19T12:02:23 instead of 2019-04-19T12:02:23-0700
- All timestamps in the system are in UTC
The algorithm takes 5 to 30 minutes to execute, depending on the size of the data set
- This can be sped up by adding more Spark worker nodes
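Because the system works in UTC, simply deleting the offset keeps the local wall-clock time, which matches UTC only when the offset was +0000. The sketch below (a hypothetical helper, not part of the GUI) converts an offset timestamp to UTC first and then drops the offset, passes relative expressions starting with "now" through unchanged, and assumes a timestamp without an offset is already in UTC.

```python
from datetime import datetime, timezone


def to_query_timestamp(value: str) -> str:
    """Normalize a timestamp for the query form.

    Relative expressions (now, now-1d/d, ...) are returned unchanged.
    Timestamps with an offset are converted to UTC, then written without it;
    timestamps without an offset are assumed to be UTC already.
    """
    value = value.strip()
    if value.startswith("now"):
        return value
    try:
        # %z accepts offsets such as -0700 (and -07:00 on Python 3.7+).
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
```

For example, 2019-04-19T12:02:23-0700 becomes 2019-04-19T19:02:23, the same instant expressed in UTC.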

Once you submit a query, you will be able to see Spark logs in the shell where you launched the server. Please watch our recorded demo for more details.

Demo video

See the live demo video at this link.

justin830827/Twitter-Trend-Analysis
