dbrehmer/Knowself

Knowself

What can be learned about personality from writing samples, as determined from Twitter streams, Facebook posts and stream-of-consciousness essays.

The Datasets:

Three datasets of writing samples with personality labels were found:

  1. 14,166 Twitter posts from 152 unique people.
  2. 9,917 Facebook posts from 250 unique people.
  3. Stream-of-consciousness essays from 2,468 unique people.

All three are labeled with "Big 5" personality traits. The Big 5 model of personality evolved from the lexical hypothesis: that the range of human perspectives on the world must be encoded in the language we use. The Big 5, or Five Factor, model claims that the personality characteristics represented by the terms in our language cluster into five groups, each corresponding to a scale on which personality can be measured: openness, conscientiousness, extraversion, agreeableness and neuroticism.

The Application:

The plan is to use the model to predict a person's personality from their Twitter stream. Given that goal, the model will probably be best developed on the first dataset, but it is also of interest to see how well a model trained on the other two datasets performs when applied to tweets. Once the tags, user mentions and links are filtered out, the remaining Twitter text may resemble the stream-of-consciousness essays or similarly filtered Facebook posts.
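The filtering step described above could look roughly like the sketch below. It is only an illustration, not the project's actual preprocessing: the function name `clean_tweet` and the regular expressions are hypothetical, and "tags" is assumed to mean hashtags.

```python
import re

# Hypothetical patterns; the project's real filtering rules are not specified.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # user mentions such as @alice
HASHTAG_RE = re.compile(r"#\w+")                # tags, assumed to be hashtags
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove links, user mentions and hashtags, then collapse whitespace."""
    text = URL_RE.sub(" ", text)       # strip links first so '@' or '#' inside URLs go with them
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("Reading a great book @alice #books http://example.com"))
```

Applying the same function to the Facebook posts would give the "similarly filtered" text mentioned above, so that all three sources reach the model in a comparable form.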

The application will also include a short personality test to gather additional data.

About

Data Science for Introspection