Implementation of a fake news detection system

Big data integration course

Contribution

Multiprocess crawler that retrieves multiple news articles.
The crawler contains a URL retriever module, which collects all news source URLs starting from baomoi.com/covid-19. (done)
After that, multiple HTML source retriever modules run in separate processes and retrieve the HTML sources in parallel. (the single-threaded version is done; it can be scaled out to run faster)
The HTML source retriever module splits each HTML page into its URL and sentences. (done)
The retriever module then sends each {url, sentence} JSON record to Spark Streaming. (done)
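
The crawler steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the repository's actual code: `html_to_records`, `process_pages`, and `to_json_line` are hypothetical names, the sentence splitter is a naive regex, and the transport to Spark Streaming is not specified here, so the sketch stops at producing one JSON line per record.

```python
import json
import re
from html.parser import HTMLParser
from multiprocessing import Pool


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_records(page):
    """Turn one (url, html) pair into a list of {url, sentence} dicts."""
    url, html = page
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [{"url": url, "sentence": s} for s in sentences]


def process_pages(pages, workers=2):
    """Parse several (url, html) pairs in parallel worker processes."""
    with Pool(processes=workers) as pool:
        per_page = pool.map(html_to_records, pages)
    # Flatten the per-page lists into one list of records.
    return [record for records in per_page for record in records]


def to_json_line(record):
    """Serialize one record as a single JSON line (assumed wire format)."""
    return json.dumps(record, ensure_ascii=False)
```

Calling `process_pages` with a list of already downloaded `(url, html)` pairs distributes the parsing across worker processes; on platforms that spawn rather than fork, the call should sit under an `if __name__ == "__main__":` guard. `ensure_ascii=False` keeps Vietnamese text readable in the serialized output.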
Spark Streaming processes each batch of data:
- Map {url, sentence} to {url, sentence, flag}, where flag=0 means no information, flag=1 means a true sentence, and flag=2 means a fake sentence (done)
- Filter to keep the URLs whose sentences are in our domain
- Reduce to count the number of no-information, true, and fake sentences {url, sentences}
- Update the statistics in the database after each batch
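
The map, filter, reduce, and update steps can be illustrated on a single batch in plain Python. This is a sketch of the logic only, not Spark code and not the project's classifier: the keyword list, the `KNOWN_CLAIMS` lookup table, the `url_stats` table, and all function names are assumptions made for illustration, and SQLite stands in for whatever database the project uses.

```python
import sqlite3
from collections import Counter, defaultdict

NO_INFO, TRUTH, FAKE = 0, 1, 2

# Hypothetical stand-ins for the real domain check and fact base.
DOMAIN_KEYWORDS = ("covid", "vaccine", "virus")
KNOWN_CLAIMS = {
    "vaccines reduce severe covid cases.": TRUTH,
    "garlic cures covid.": FAKE,
}


def add_flag(record):
    """Map step: {url, sentence} -> {url, sentence, flag}."""
    flag = KNOWN_CLAIMS.get(record["sentence"].strip().lower(), NO_INFO)
    return {**record, "flag": flag}


def in_domain(record):
    """Filter step: keep only sentences that mention a domain keyword."""
    text = record["sentence"].lower()
    return any(keyword in text for keyword in DOMAIN_KEYWORDS)


def count_flags(records):
    """Reduce step: count sentences per flag for each URL."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["url"]][record["flag"]] += 1
    return {
        url: {"no_info": c[NO_INFO], "truth": c[TRUTH], "fake": c[FAKE]}
        for url, c in counts.items()
    }


def process_batch(batch):
    """Run map -> filter -> reduce over one batch of records."""
    return count_flags(filter(in_domain, map(add_flag, batch)))


def save_counts(conn, counts):
    """Update step: add this batch's counts to the running totals."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_stats ("
        "url TEXT PRIMARY KEY, "
        "no_info INTEGER NOT NULL DEFAULT 0, "
        "truth INTEGER NOT NULL DEFAULT 0, "
        "fake INTEGER NOT NULL DEFAULT 0)"
    )
    for url, c in counts.items():
        # Create the row on first sight, then accumulate into it.
        conn.execute("INSERT OR IGNORE INTO url_stats (url) VALUES (?)", (url,))
        conn.execute(
            "UPDATE url_stats SET no_info = no_info + ?, truth = truth + ?, "
            "fake = fake + ? WHERE url = ?",
            (c["no_info"], c["truth"], c["fake"], url),
        )
    conn.commit()
```

Because `save_counts` adds to existing totals instead of overwriting them, calling it once per batch keeps a running count per URL across the whole stream.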

Repository contents (7 commits):
- analyzer
- config
- crawler
- saver
- .gitignore
- LICENSE
- README.md
- activate
- analyzed.csv
- requirements.txt
- run_analyzer.py
- run_crawler.py
- run_saver.py
- util.py

ltthacker/bdi_final
