- Saba Yahyaa
- Mikael Dominguez
- Adam Flasse
nlp_Topic Modeling is an NLP application that uses LDA-MALLET (Latent Dirichlet Allocation topic modeling) and XGBoost to classify newspaper articles by topic.
The pipeline works on news_data.json in the data folder:
- Preprocess each text (document) with preprocessed_text.py (text cleaning).
- Create X (features) with Creating_X_TFIDF.py, which applies TF-IDF vectorization to the documents to produce numerical features.
- Create y (label) for each document with LDA-MALLET. Each y is the document's dominant topic.
- Split the data (X and y) and train XGBoost.
- Use XGBoost to predict the label (dominant topic) of a new document.
- Use LDA-MALLET to find the topics of a new document.
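
The cleaning and feature steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in preprocessed_text.py or Creating_X_TFIDF.py: the helper names `clean` and `tfidf` are hypothetical, the two documents are toy examples, and the weighting shown (raw term count times log(N / document frequency)) is only one common TF-IDF variant. Library vectorizers typically add smoothing and normalization on top of this.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep alphabetic tokens only (stand-in for text cleaning)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return the sorted vocabulary and one TF-IDF row (list of floats) per document."""
    tokenized = [clean(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # A term found in every document gets weight 0 because log(1) == 0.
        rows.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


# Toy documents, not taken from news_data.json.
docs = [
    "Robots learn to play a game.",
    "Chips power robots.",
]
vocab, X = tfidf(docs)
```

Each row of `X` has one column per vocabulary term, so it can be paired with a label y and passed to a classifier. Terms shared by all documents (here `robots`) carry no weight, which is what makes TF-IDF favor words that distinguish one document from another.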
We specified the following 17 topics:
0. AI in fake news (Social Media Marketing)
1. AI in Human Discrimination
2. AI in Material and Energy research
3. AI in Voice Assistant
4. AI in Business
5. AI in Autonomous Vehicles
6. AI in Hardware (Chip)
7. AI in Human question/problem answering
8. AI in Image (NN)
9. AI in Arts
10. AI in Robots
11. AI in Medicine
12. AI in Legal Service
13. AI in Industry (Silicon Valley)
14. AI in Academic Research
15. AI in Games
16. AI in DNN and Machine Learning applications
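
To illustrate how a dominant topic becomes a label, here is a small sketch. It assumes the topics are indexed 0 to 16 in the order listed above and that a topic model returns one weight per topic for a document. The `TOPICS` list, the `dominant_topic` helper, and the weights are hypothetical and are not real LDA-MALLET output.

```python
# Topic names in the order listed above; the 0-16 indexing is an assumption.
TOPICS = [
    "AI in fake news (Social Media Marketing)",
    "AI in Human Discrimination",
    "AI in Material and Energy research",
    "AI in Voice Assistant",
    "AI in Business",
    "AI in Autonomous Vehicles",
    "AI in Hardware (Chip)",
    "AI in Human question/problem answering",
    "AI in Image (NN)",
    "AI in Arts",
    "AI in Robots",
    "AI in Medicine",
    "AI in Legal Service",
    "AI in Industry (Silicon Valley)",
    "AI in Academic Research",
    "AI in Games",
    "AI in DNN and Machine Learning applications",
]


def dominant_topic(distribution):
    """Return (index, name, weight) of the topic with the largest weight."""
    if len(distribution) != len(TOPICS):
        raise ValueError("expected one weight per topic")
    idx = max(range(len(distribution)), key=distribution.__getitem__)
    return idx, TOPICS[idx], distribution[idx]


# Toy distribution for one document: most of the weight sits on topic 10.
weights = [0.02] * len(TOPICS)
weights[10] = 0.68
label = dominant_topic(weights)
```

The index in `label[0]` plays the role of y for that document, and `label[1]` is the readable topic name.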
We created the [newslAItter](https://ainewspaper.herokuapp.com/) app. Select the topics you want from the available AI topics and enter your email address. The selected topics are then sent to that address.