NLP(Natural Language Processing)

Spam Detector:

dataset: uci machine learning repository.

Sci-kit learn

Naive Bayes

accuracy 87%

binary classifier

Sentiment Analyzer

dataset: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html

NLTK

Lemmatizer

Logstic Regression

sentiment = how positive or negative some text is

Latent Sentiment Analyzer

multiple words with the same meaning(synonymy) one word with multiple meanings(polysemy)

PCA(Principal Components Analysis) and SVD(Singular value decomposition)

Article Spinning

[Take an article and slightly modify it(synonyms, replace phrases that have the same meaning). Results in a new article with different terms, but same meaning.]

dataset: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html

Trigram

Word Analogy

[find relation among words by Euclidean distance and cosine distance]

dataset: the brown corpus & wikipedia data

word2vec

CBOW = use context to predict middle word

Skip-gram = use middle word to predict context

two version: Numpy & Theano

Recommender System

[Pretend you are Amazon - match users to products & Netflix - match users to movies.]

frame it as a typical prediction ML problem:

rating = neural_work.predict(X = [user_attributes + product_attributes])

Matrix Factorization

GLoVe: term-term matrix

Numpy Gradient Descent, Theano Gradient Descent, Alternating Least Squares

Parts-of-Speech Tagging

POS tag is a category you can assign a word, according toits syntactic function.

dataset: http://www.cnts.ua.ac.be/conll2000/chunking/

Logistic regression

Decision Tree

Baseline F1 score: 0.895, 0.862

RNN F1 score: 0.803, 0.782

HMM F1 score: 0.923, 0.862

Sentiment Analysis

dataset: http://nlp.stanford.edu/sentiment/

RNN

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
BrownCorpus(word2vec)		BrownCorpus(word2vec)
Parting-of-Speech Tagging		Parting-of-Speech Tagging
articleSpinning		articleSpinning
latentSentiment		latentSentiment
recommenderSystem		recommenderSystem
sentimentAnalysis(RNN)		sentimentAnalysis(RNN)
sentimentAnalyzer		sentimentAnalyzer
spamDetector		spamDetector
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BrownCorpus(word2vec)

BrownCorpus(word2vec)

Parting-of-Speech Tagging

Parting-of-Speech Tagging

articleSpinning

articleSpinning

latentSentiment

latentSentiment

recommenderSystem

recommenderSystem

sentimentAnalysis(RNN)

sentimentAnalysis(RNN)

sentimentAnalyzer

sentimentAnalyzer

spamDetector

spamDetector

README.md

README.md

Repository files navigation

NLP(Natural Language Processing)

Spam Detector:

Sentiment Analyzer

Latent Sentiment Analyzer

Article Spinning

Word Analogy

Recommender System

Parts-of-Speech Tagging

Sentiment Analysis

About

Releases

Packages

Languages

rongdilin/NLP

Folders and files

Latest commit

History

Repository files navigation

NLP(Natural Language Processing)

Spam Detector:

Sentiment Analyzer

Latent Sentiment Analyzer

Article Spinning

Word Analogy

Recommender System

Parts-of-Speech Tagging

Sentiment Analysis

About

Resources

Stars

Watchers

Forks

Languages