Sentiment Analysis

1. Dataset

English

Stanford Sentiment Treebank

The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews.
Each phrase is labeled as negative, somewhat negative, neutral, somewhat positive, or positive (SST-5, also called SST fine-grained).
- train: 8,544
- dev: 1,101
- test: 2,210
https://paperswithcode.com/dataset/sst
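
The five classes are conventionally derived from the treebank's real-valued sentiment scores in [0, 1], using cutoffs at 0.2, 0.4, 0.6, and 0.8. The sketch below illustrates that mapping and checks that the split sizes listed above sum to 11,855. The names `score_to_label`, `LABELS`, and `SPLITS` are hypothetical and are not taken from this repository's code.

```python
# Class names in the order used in the description above.
LABELS = [
    "negative",
    "somewhat negative",
    "neutral",
    "somewhat positive",
    "positive",
]

# Upper bounds of the intervals [0, 0.2], (0.2, 0.4], ..., (0.8, 1.0].
UPPER_BOUNDS = (0.2, 0.4, 0.6, 0.8, 1.0)

# Split sizes as listed in this README.
SPLITS = {"train": 8544, "dev": 1101, "test": 2210}


def score_to_label(score: float) -> int:
    """Map a sentiment score in [0, 1] to a class index in 0..4."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score}")
    for index, upper in enumerate(UPPER_BOUNDS):
        if score <= upper:
            return index
    return len(UPPER_BOUNDS) - 1  # unreachable for valid input


if __name__ == "__main__":
    print(LABELS[score_to_label(0.5)])
    print(sum(SPLITS.values()))
```

Whether this repository starts from the raw scores or from pre-binned labels is not stated here, so treat the mapping as background rather than as the project's preprocessing.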

Korean

Naver Sentiment Movie Corpus

The corpus is a movie review dataset in the Korean language. Reviews were scraped from Naver Movies.
200K reviews in total
- ratings_train.txt: 150K reviews for training
- ratings_test.txt: 50K reviews held out for testing
https://github.com/e9t/nsmc/
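
Both files are tab-separated with a header row of `id`, `document`, and `label`, where the label is 0 for negative and 1 for positive. A minimal loading sketch under that assumption follows. The helper `read_nsmc` is hypothetical, and the two sample rows are made up for illustration rather than taken from the corpus.

```python
import csv
import io

# Made-up rows in the NSMC column layout (id, document, label).
SAMPLE = (
    "id\tdocument\tlabel\n"
    "1\t정말 재미있는 영화\t1\n"
    "2\t시간이 아까웠다\t0\n"
)


def read_nsmc(handle):
    """Return a list of (text, label) pairs from an open NSMC-style file."""
    # QUOTE_NONE keeps quotation marks inside reviews as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    examples = []
    for row in reader:
        text = row["document"]
        label = row["label"]
        # Skip rows whose review text or label is missing.
        if not text or label is None:
            continue
        examples.append((text, int(label)))
    return examples


examples = read_nsmc(io.StringIO(SAMPLE))

if __name__ == "__main__":
    for text, label in examples:
        print(label, text)
```

For the real files, the same function could be called as `read_nsmc(open("ratings_train.txt", encoding="utf-8"))`.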

2. Embedding

English

GloVe
BERT (work in progress)
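
Pretrained GloVe vectors are distributed as plain text, one token per line followed by its space-separated vector components. The sketch below shows one way such a file could be read and used for lookup. The functions `load_glove` and `embed` are hypothetical, the three-dimensional sample vectors are invented, and lowercasing plus zero vectors for unknown words are assumptions rather than choices confirmed by this repository.

```python
import io

# Invented 3-dimensional vectors in the GloVe text layout.
SAMPLE = "the 0.1 -0.2 0.3\nmovie 0.5 0.0 -0.4\n"


def load_glove(handle):
    """Parse GloVe-format lines into a dict of token -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def embed(tokens, vectors, dim):
    """Look up each token; unknown tokens get a zero vector (an assumption)."""
    zero = [0.0] * dim
    return [vectors.get(token.lower(), zero) for token in tokens]


vectors = load_glove(io.StringIO(SAMPLE))

if __name__ == "__main__":
    print(embed(["The", "unseen", "movie"], vectors, dim=3))
```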

Korean

Character Embedding
BERT (work in progress)
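
A character embedding needs each review turned into a sequence of integer ids before an embedding layer can look them up. The sketch below shows one possible encoding, assuming that each Hangul syllable block counts as one character. The README does not say whether the project splits text at the syllable or the jamo level, and `build_char_vocab`, `encode`, and the `<pad>`/`<unk>` tokens are hypothetical names.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_char_vocab(texts, min_count=1):
    """Assign an integer id to every character seen at least min_count times."""
    counts = Counter(ch for text in texts for ch in text)
    vocab = {PAD: 0, UNK: 1}
    # Sorting makes the id assignment deterministic across runs.
    for ch, count in sorted(counts.items()):
        if count >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Convert text to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_char_vocab(["영화", "영상"])

if __name__ == "__main__":
    print(vocab)
    print(encode("영화관", vocab, max_len=5))
```

Characters absent from the training vocabulary map to the unknown id, so unseen text at test time still encodes to a fixed-length sequence.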

3. Results

English

GloVe + CNN

Validation accuracy

Test accuracy

GloVe + BiLSTM

Validation accuracy

Test accuracy

Korean

Character Embedding + CNN

Test accuracy

Character Embedding + BiLSTM

Test accuracy
