BDEProject

Students

AGOUGILE Ramzi
BOUTAKHOT Jonathan
DO BARREIRO Jordan

This is a big data ecosystem project carried out by engineering students. The goal is to use big data tools to build and train a prediction model. We chose Twitter sentiment analysis, and we used tools such as Kafka and Spark.

What to do

To use every part of this project, you will need to install and set up a few tools. We worked on a Linux operating system, but the project may also work on macOS.

The Kafka installation folder is already in the GitHub repository, so you just have to clone it.
To run Kafka, first start ZooKeeper and then the Kafka server:
- Run ZooKeeper: in the Kafka installation folder, use the following command to start the ZooKeeper server
  
  ./bin/zookeeper-server-start.sh config/zookeeper.properties
  
  Note: You can check the zookeeper.properties file if you want to change the client port (2181 by default).
- Start the Kafka server: use the following command to run Kafka
  
  ./bin/kafka-server-start.sh config/server.properties
  
  Note: You can also check the server.properties file if you want to change the ZooKeeper address to connect to, the directory of the log files, etc.
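
Before going further, it can help to confirm that both servers are listening. The following is a minimal sketch, not part of the repository: the helper name `port_is_open` is ours, and it assumes Kafka listens on its default port 9092 and that both servers run on localhost.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 2181 is the ZooKeeper client port mentioned above; 9092 is Kafka's
    # default listener port (assumed unchanged in server.properties).
    for name, port in [("ZooKeeper", 2181), ("Kafka", 9092)]:
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening there, so this is a quick sanity check rather than a full health check.
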
To access the Twitter streaming APIs, you need to sign up for a Twitter developer account and get the following OAuth authentication details:
- ConsumerKey
- ConsumerSecret
- AccessToken
- AccessTokenSecret

We have left our credentials in the repository for you, M. LEONARD, but we intend to delete them after your grading.
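
Once the credentials are removed from the repository, you will have to supply your own. One possible way, shown here as a sketch only, is to read them from environment variables. The variable names and the `load_twitter_credentials` helper are hypothetical and are not used by the scripts in this repository.

```python
import os

# Hypothetical environment variable names, one per OAuth value listed above.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(environ=os.environ) -> dict:
    """Return the four OAuth values, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing early with the list of missing names avoids a less readable authentication error later, when the stream is opened.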

You can now watch the streaming flow with the following command (still in the Kafka installation folder). Tweets should start appearing in your console:

python twitter_streaming.py

Note: You may need to install some packages, depending on what you already have. For example, if you get the error "No module named kafka", just run pip install kafka.

Here the topic is twitterstream. If you want to rename it, you can change it in [twitter_streaming.py](kafka-2.7.0-src/twitter_streaming.py). You can also change the subject of the search: we used "Vaccine", but it can be anything else.
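
To give an idea of how tweets can end up in the twitterstream topic, here is a minimal sketch. It is not the code of twitter_streaming.py: the function names are ours, tweets are assumed to be plain dictionaries, the broker address localhost:9092 is an assumption, and `make_kafka_sender` assumes the kafka-python package is installed.

```python
import json


def encode_tweet(tweet: dict) -> bytes:
    """Serialize a tweet dictionary to UTF-8 encoded JSON bytes."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def publish_tweets(tweets, send, topic: str = "twitterstream") -> int:
    """Pass each encoded tweet to send(topic, payload); return how many were sent."""
    count = 0
    for tweet in tweets:
        send(topic, encode_tweet(tweet))
        count += 1
    return count


def make_kafka_sender(bootstrap_servers: str = "localhost:9092"):
    """Return (send, flush) callables backed by a kafka-python producer."""
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)

    def send(topic: str, payload: bytes) -> None:
        # send() is asynchronous: the message is only queued here.
        producer.send(topic, payload)

    # Call flush() before exiting so queued messages are actually delivered.
    return send, producer.flush
```

With a running broker, `send, flush = make_kafka_sender()` followed by `publish_tweets(tweets, send)` and `flush()` would push the tweets to the topic. Keeping `send` as a parameter also lets you try the logic without Kafka, for example by appending to a list.
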
- Now that Kafka streaming is working, we have to connect Spark to this stream.
- To do that, you just have to execute the following command (this time in your base project directory):
  
  python3 spark_streaming.py
  
  Note: You may need to install some packages, depending on what you already have. For example, if you get the error "No module named tweepy", just run pip3 install tweepy.
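
For readers curious about what connecting Spark to a Kafka topic can look like, here is a hedged sketch based on Spark Structured Streaming. It is an illustration only: spark_streaming.py may use a different API, the function names are ours, and the broker address localhost:9092 is an assumption.

```python
def kafka_source_options(topic: str = "twitterstream",
                         servers: str = "localhost:9092") -> dict:
    """Options for Spark's Kafka source: broker address and topic to subscribe to."""
    return {"kafka.bootstrap.servers": servers, "subscribe": topic}


def read_tweet_stream(spark, topic: str = "twitterstream"):
    """Return a streaming DataFrame with a single string column named tweet.

    spark is an existing SparkSession. The spark-sql-kafka connector must be
    available to Spark for format("kafka") to resolve.
    """
    raw = spark.readStream.format("kafka").options(**kafka_source_options(topic)).load()
    # Kafka delivers the message body as bytes in the value column.
    return raw.selectExpr("CAST(value AS STRING) AS tweet")
```

The returned DataFrame does nothing until a streaming query is started on it, for instance with `writeStream`.
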
You should now see the tweets and, just below each one, the beginning of the tweet (often the user's name) followed by its prediction. The prediction values are: 0 = negative sentiment, 2 = neutral sentiment, 4 = positive sentiment. All the tweets and predictions are saved in the [stream_data](stream_data) folder.
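
If you post-process the saved predictions, a small lookup makes the numeric codes easier to read. This is a sketch: the `describe_prediction` helper is ours, and it assumes a prediction may arrive as an integer, a float such as 4.0, or a string.

```python
# Mapping taken from the label convention described above.
SENTIMENT_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def describe_prediction(prediction) -> str:
    """Translate a prediction code (0, 2 or 4) into a readable label."""
    try:
        code = int(float(prediction))
    except (TypeError, ValueError):
        return f"unknown ({prediction})"
    return SENTIMENT_LABELS.get(code, f"unknown ({prediction})")


print(describe_prediction(4.0))  # prints "positive"
```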

More

The GitHub repository also contains the code for the model we trained, in [train_model.py](train_model.py).

We downloaded the training data from Kaggle: https://www.kaggle.com/kazanova/sentiment140
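
If you want to look at the dataset yourself, the sketch below reads it with the standard library. It relies on the published Sentiment140 layout (no header row, six columns, Latin-1 encoding). The `read_sentiment140` helper is ours and is not taken from train_model.py, and the sample row is made up for illustration.

```python
import csv
import io

# Column order of the Sentiment140 CSV, which has no header row.
COLUMNS = ("target", "ids", "date", "flag", "user", "text")


def read_sentiment140(handle):
    """Yield (label, text) pairs from an open Sentiment140 CSV file."""
    for row in csv.reader(handle):
        record = dict(zip(COLUMNS, row))
        yield int(record["target"]), record["text"]


# Made-up row in the dataset's format, so the example runs without the file.
sample = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","some_user","made-up tweet, with a comma"\n'
)
print(list(read_sentiment140(sample)))  # prints [(0, 'made-up tweet, with a comma')]
```

For the real file, unzip it and open it with `open(path, encoding="latin-1", newline="")` before passing the handle to `read_sentiment140`.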

