GitHub - saivivek15/Event-Detection: Victim entities are detected from news events using Python, Apache Spark, Apache Kafka, CoreNLP, SEMAFOR, MongoDB

Execution:

- python parse_json.py
# output dir: parsed_files 
# Training Dataset files are parsed

- python semaforAll.py 
# output dir: output_files 
# Semafor Tool is executed on all training files

- python parse_semafor.py 
# output dir: semafor_filter_files 
# Filter the frames having Victim entities

- python feature_matrix.py 
# output file: training_data.txt
# Generate the feature matrix from the filtered news feeds

Start Kafka and ZooKeeper:
- bin/zookeeper-server-start.sh config/zookeeper.properties
- bin/kafka-server-start.sh config/server.properties	

- python scraper-master/scraper.py 
# Kafka producer client is started

- spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0 consumer.py localhost:2181 news
# Kafka Consumer is started and semafor frames are generated for real time rss feeds

- python parse_semafor.py
# change the input directory to output_test_files
# parse semafor files
# generate feature_matrix for test feeds

- python tfidf.py 
# output dir: tempFile 
# predict with the built models

Notes:

Modified the Scraper referenced from below by adding Kafka Producer Client and Connecting it to MongoDB
The input_files contain only a sample of training data set for referecne. Please contact me at sai.vivek15@gmail.com, if interested in discussing more about the project

References:

-> Scraper - https://github.com/openeventdata/scraper

-> Semafor Tool(CMU)- https://github.com/Noahs-ARK/semafor (http://www.ark.cs.cmu.edu/SEMAFOR)

-> Dataset: http://eventdata.parusanalytics.com/data.dir/atrocities.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

input_files

input_files

output_files

output_files

output_test_files

output_test_files

parsed_files

parsed_files

scraper

scraper

semafor_filter_files

semafor_filter_files

tempFile

tempFile

.gitignore

.gitignore

README.md

README.md

feature_matrix.py

feature_matrix.py

parse_json.py

parse_json.py

parse_semafor.py

parse_semafor.py

semaforAll.py

semaforAll.py

tfidf.py

tfidf.py

training_data.txt

training_data.txt

Repository files navigation

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
input_files		input_files
output_files		output_files
output_test_files		output_test_files
parsed_files		parsed_files
scraper		scraper
semafor_filter_files		semafor_filter_files
tempFile		tempFile
.gitignore		.gitignore
README.md		README.md
feature_matrix.py		feature_matrix.py
parse_json.py		parse_json.py
parse_semafor.py		parse_semafor.py
semaforAll.py		semaforAll.py
tfidf.py		tfidf.py
training_data.txt		training_data.txt

saivivek15/Event-Detection

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Languages