GitHub - bcipolli/mls-aihack-mother-repo: Hackathon project for Machine Learning Society in San Diego

Goal:

Match news articles that cover the same event
Find the text common to those articles and remove it
Cluster the remaining "residual words" via topic modeling
Apply the resulting "topics" to all residuals from a news source to estimate its "bias"
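
The common-text removal step can be sketched in a few lines of plain Python. This is a minimal illustration and not the code in main.py: the names `tokenize`, `residual_words`, and `min_share`, and the two example sentences, are assumptions made up for this sketch. A word counts as "common" when it appears in at least `min_share` of the articles for one event.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def residual_words(articles, min_share=1.0):
    """Split each article's words into event-wide common words and residuals.

    articles:  dict mapping a source name to the article text, all for one event.
    min_share: hypothetical parameter; the fraction of articles a word must
               appear in to be treated as common.
    """
    tokens = {src: tokenize(text) for src, text in articles.items()}

    # Count, for every word, how many articles contain it at least once.
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    threshold = min_share * len(articles)
    common = {w for w, n in doc_freq.items() if n >= threshold}

    # Whatever is left after removing common words is the residual.
    residuals = {
        src: [w for w in words if w not in common]
        for src, words in tokens.items()
    }
    return common, residuals


# Made-up example event with two sources.
event = {
    "source_a": "The senate passed the bill after a heated debate",
    "source_b": "The senate passed the bill in a stunning victory",
}
common, residuals = residual_words(event)
print(sorted(common))
print(residuals)
```

The residual word lists are what would then be passed to a topic model; lowering `min_share` treats words shared by most (rather than all) articles as common.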

Setup & Installation:

pip install -r requirements.txt
python -c "import nltk; nltk.download()"

Then choose to install the "popular" collection.

NLP Demo

This demo shows stemming, lemmatizing, and word counting (including tf-idf)

python nlp_demo.py
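
For readers unfamiliar with tf-idf, the weighting can be sketched with the standard library alone. This is not the code in nlp_demo.py; the function name `tf_idf` and the two example documents are assumptions for illustration. It uses the textbook unsmoothed formula (term frequency times log of inverse document frequency), whereas libraries such as scikit-learn apply a smoothed variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    docs: list of non-empty token lists.
    weight = (count of word in doc / doc length) * log(n_docs / docs containing word)
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Made-up example: two near-identical sentences that differ in one word.
docs = [
    "the vote was a triumph".split(),
    "the vote was a disaster".split(),
]
scores = tf_idf(docs)
print(scores[0])
```

Words shared by every document get a weight of zero, while the word that distinguishes each document gets a positive weight, which is the property the project relies on when looking for source-specific wording.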

Downloading data

Run

python registry_data.py

You can tweak parameters, such as the minimum number of articles per event or the API key, within the script.

Modeling

python main.py

Viewing Results

At the end of the modeling process, a 3D graph is generated to visualize the results.

Results

Found common words across news articles within an event.
When clustering "residual" words via LDA, many emotion words appear.
Sources did not separate by topic.
- MAYBE: sources use emotional words to describe the news, but not consistently from event to event.

Future Directions

Model news source bias within a particular topic
Boost / attenuate emotion words via sentiment analysis
See if there's bias by author
Include and apply a fake news dataset

