project7_nlp_TopicModeling

Team member names:

Saba Yahyaa
Mikael Dominguez
Adam Flasse


nlp_Topic Modeling is an NLP application that uses LDA-MALLET (Latent Dirichlet Allocation topic modeling with MALLET) and XGBoost to classify the topics of news articles.

Features:

Input data: news_data.json in the data folder.

Preprocess each text (document) with preprocess_text.py (text cleaning).
Create X (features) with Creating_X_TFIDF.py, which applies TF-IDF vectorization to each document to produce numerical features.
Create y (label) for each document with LDA-MALLET: each y is the document's dominant topic.
Split the data (X and y) into training and test sets and train XGBoost.
Use XGBoost to predict the label (dominant topic) of a new document.
Use LDA-MALLET to find the topics of a new document.
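The feature and label steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in preprocess_text.py or Creating_X_TFIDF.py. The cleaning rules and stopword list are hypothetical. The TF-IDF weighting uses raw term counts, the smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 row normalization, which are the default weighting choices of scikit-learn's TfidfVectorizer. The document-topic probabilities are invented values standing in for LDA-MALLET output.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the project's preprocess_text.py may clean text differently.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_matrix(docs):
    """Return (vocabulary, X) where X[i][j] is the TF-IDF weight of vocabulary[j] in docs[i].

    TF is the raw term count, IDF is ln((1 + n) / (1 + df)) + 1,
    and each row is L2-normalized.
    """
    tokenized = [preprocess(d) for d in docs]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocabulary}
    X = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero for empty documents
        X.append([v / norm for v in row])
    return vocabulary, X


def dominant_topic(topic_distribution):
    """Label y for one document: the index of its highest-probability topic."""
    return max(range(len(topic_distribution)), key=topic_distribution.__getitem__)


docs = [
    "AI chips power the new hardware",
    "Robots learn in the factory",
    "New AI chips for robots",
]
vocabulary, X = tfidf_matrix(docs)
# Hypothetical document-topic distributions (3 topics) standing in for LDA-MALLET output.
theta = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.4, 0.5, 0.1]]
y = [dominant_topic(t) for t in theta]
print(vocabulary)  # ['ai', 'chips', 'factory', 'hardware', 'learn', 'new', 'power', 'robots']
print(y)  # [0, 1, 1]
```

Each row of X is one document's feature vector, and y holds the dominant-topic index that serves as the training label for the classifier (XGBoost in this project).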

We specified the following 17 topics:

0. AI in fake news (Social Media Marketing)
1. AI in Human Discrimination
2. AI in Material and Energy research
3. AI in Voice Assistant
4. AI in Business
5. AI in Autonomous Vehicles
6. AI in Hardware (Chip)
7. AI in Human question/problem answering
8. AI in Image (NN)
9. AI in Arts
10. AI in Robots
11. AI in Medicine
12. AI in Legal Service
13. AI in Industry (Silicon Valley)
14. AI in Academic Research
15. AI in Games
16. AI in DNN and Machine Learning Applications
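To find the topics of a new document, a topic model produces one probability for each of the 17 topics above. The sketch below ranks those probabilities and maps topic indices to names. It assumes that topic id i corresponds to the i-th entry of the list above. The example distribution is invented, and the code does not call MALLET.

```python
# Assumption: topic id i is the i-th topic in the list above.
TOPIC_NAMES = [
    "AI in fake news (Social Media Marketing)",
    "AI in Human Discrimination",
    "AI in Material and Energy research",
    "AI in Voice Assistant",
    "AI in Business",
    "AI in Autonomous Vehicles",
    "AI in Hardware (Chip)",
    "AI in Human question/problem answering",
    "AI in Image (NN)",
    "AI in Arts",
    "AI in Robots",
    "AI in Medicine",
    "AI in Legal Service",
    "AI in Industry (Silicon Valley)",
    "AI in Academic Research",
    "AI in Games",
    "AI in DNN and Machine Learning Applications",
]


def top_topics(topic_distribution, k=3, min_prob=0.0):
    """Return up to k (topic name, probability) pairs, most probable first,
    keeping only topics whose probability is at least min_prob."""
    if len(topic_distribution) != len(TOPIC_NAMES):
        raise ValueError("expected one probability per topic")
    ranked = sorted(enumerate(topic_distribution), key=lambda pair: pair[1], reverse=True)
    return [(TOPIC_NAMES[i], p) for i, p in ranked[:k] if p >= min_prob]


# Invented distribution for a new document: mostly topic 2, then topic 10, then topic 15.
theta = [0.0] * len(TOPIC_NAMES)
theta[2], theta[10], theta[15] = 0.5, 0.3, 0.2
print(top_topics(theta, k=2))
# [('AI in Material and Energy research', 0.5), ('AI in Robots', 0.3)]
```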

Highlights:

We created the newslAItter app (https://ainewspaper.herokuapp.com/). Select one or more of the available AI topics and enter your email address, and the news for all the selected topics is sent to your email.
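The app's email step could look roughly like the following sketch. It uses Python's standard email package to compose (but not send) one message covering the selected topics. The sender address, the articles_by_topic mapping, and the function name build_digest are hypothetical. This is not the code in newslAItters.py. Delivering the message would also need an SMTP connection (for example through smtplib), which is omitted here.

```python
from email.message import EmailMessage


def build_digest(recipient, selected_topics, articles_by_topic, sender="newsletter@example.com"):
    """Compose (but do not send) one email listing the articles for each selected topic.

    articles_by_topic is a hypothetical mapping: topic name -> list of (title, url) pairs.
    """
    lines = []
    for topic in selected_topics:
        lines.append(topic)
        for title, url in articles_by_topic.get(topic, []):
            lines.append(f"  - {title}: {url}")
        lines.append("")  # blank line between topics
    msg = EmailMessage()
    msg["Subject"] = "Your AI news: " + ", ".join(selected_topics)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical articles; in the app they would presumably come from the classified news data.
articles = {
    "AI in Robots": [("Example robot story", "https://example.com/robots")],
    "AI in Medicine": [("Example medical story", "https://example.com/medicine")],
}
msg = build_digest("reader@example.com", ["AI in Robots", "AI in Medicine"], articles)
print(msg["Subject"])  # Your AI news: AI in Robots, AI in Medicine
```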

Repository files:

data
templates
Creating_X_TFIDF.py
Dockerfile
LDA_model.py
LDA_model_Sklearn.py
LdaMallet_Optimat_no_Topics.py
LdaMallet_TrainTest_Data.py
LdaMallet_model.py
README.md
apply_LDAMALLET_unseen.py
apply_classification_unseen.py
newslAItters.py
preprocess_text.py
requirements.txt
testing_LDA_sklearn.py
urls_product_17_topic.py
