Example #1
"""
From: https://www.figure-eight.com/data-for-everyone/

Contributors looked at over 10,000 tweets culled with a variety of searches like “ablaze”, “quarantine”, and “pandemonium”, then noted whether the tweet referred to a disaster event (as opposed to a joke using the word, a movie review, or something otherwise non-disastrous).
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/socialmedia-disaster-tweets-DFE.csv",
        text_column="text",
        target_column="choose_one",
        filename="SocialMediaDisasters.csv")
Example #2
"""
From: https://www.figure-eight.com/data-for-everyone/

Movie review task from SentEval.  Note that performance on this dataset is not comparable to official SentEval scores because of differences in data splitting.
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(url="https://s3.amazonaws.com/enso-data/MovieReviews.csv",
                     text_column="Text",
                     target_column="Target",
                     filename="MovieReviews.csv")
Example #3
"""
From: https://www.figure-eight.com/data-for-everyone/

Contributors read sentences in which both a chemical (like Aspirin) and a disease (or side-effect) were present. They then determined whether the chemical directly contributed to the disease or caused it. The dataset includes chemical names, disease names, and aggregated judgments from five (as opposed to the usual three) contributors.
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/chemicals-and-disease-DFE.csv",
        text_column="form_sentence",
        target_column="verify_relationship",
        filename="ChemicalDiseaseCauses.csv")
Example #4
"""
From: https://www.figure-eight.com/data-for-everyone/

Subjectivity task from SentEval.  Note that performance on this dataset is not comparable to official SentEval scores because of differences in data splitting.
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(url="https://s3.amazonaws.com/enso-data/Subjectivity.csv",
                     text_column="Text",
                     target_column="Target",
                     filename="Subjectivity.csv")
Example #5
"""
From: https://www.figure-eight.com/data-for-everyone/

In a variation on the popular task of sentiment analysis, this dataset contains labels for the emotional content (such as happiness, sadness, and anger) of texts. Hundreds to thousands of examples across 13 labels.
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/07/text_emotion.csv",
        text_column="content",
        target_column="sentiment",
        filename="Emotion.csv")
Example #6
"""
From: https://www.figure-eight.com/data-for-everyone/
Caused side effects – [Drug] gave me [symptom]
Was effective against a condition – [Drug] helped my [disease]
Is prescribed for a certain disease – [Drug] was given to help my [disease]
Is contraindicated in – [Drug] should not be taken if you have [disease or symptom]
The second similarity was more about the statement itself. Those broke down into:

Personal experiences – I started [drug] for [disease]
Personal experiences negated – [Drug] did not cause [symptom]
Impersonal experiences – I’ve heard [drug] causes [symptom]
Impersonal experiences negated – I’ve read [drug] doesn’t cause [symptom]
Question – Have you tried [drug]?
"""
from enso.download import generic_download, html_to_text

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/drug-relation-dfe.csv",
        text_column="text",
        target_column="human_relation",
        text_transformation=html_to_text,
        filename="DrugReviewType.csv")

    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/drug-relation-dfe.csv",
        text_column="text",
        target_column="human_relation_type",
        text_transformation=html_to_text,
        filename="DrugReviewIntent.csv")
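Example #6 additionally imports `html_to_text` to clean markup out of the review text before saving. A minimal sketch of such a helper using only the standard library (the class name `_TextExtractor` and all details are assumptions; the actual enso helper may behave differently):

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities like &amp;
        # before they reach handle_data, so no extra unescaping is needed.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    """Strip tags and decode entities, returning the plain text."""
    extractor = _TextExtractor()
    extractor.feed(markup)
    return "".join(extractor.parts)
```

Passed as `text_transformation`, this would run on each value of the text column before the cleaned row is written out.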
Example #7
"""
From: https://www.figure-eight.com/data-for-everyone/
Contributors evaluated tweets for belief in the existence of global warming or climate change. The possible answers were “Yes” if the tweet suggests global warming is occurring, “No” if the tweet suggests global warming is not occurring, and “I can’t tell” if the tweet is ambiguous or unrelated to global warming. We also provide a confidence score for the classification of each tweet.
"""

from enso.download import generic_download

def words_to_char(val):
    # Map "Yes"/"No" labels to their single-character form; any other value
    # (e.g. "I can't tell") passes through unchanged.
    conversion = {
        "Yes": 'Y',
        "No": 'N'
    }
    return conversion.get(val, val)


if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/1377884570_tweet_global_warming.csv",
        text_column="tweet",
        target_column="existence",
        target_transformation=words_to_char,
        filename="GlobalWarming.csv"
    )
Example #8
"""
From: https://www.figure-eight.com/data-for-everyone/

Contributors viewed tweets regarding a variety of left-leaning issues like legalization of abortion, feminism, Hillary Clinton, etc. They then classified whether the tweets in question were for, against, or neutral on the issue (with an option for none of the above). After this, they further classified each statement as to whether it expressed a subjective opinion or stated facts.
"""
from enso.download import generic_download


if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/progressive-tweet-sentiment.csv",
        text_column="tweet",
        target_column="q1_from_reading_the_tweet_which_of_the_options_below_is_most_likely_to_be_true_about_the_stance_or_outlook_of_the_tweeter_towards_the_target",
        filename="PoliticalTweetAlignment.csv"
    )

    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/progressive-tweet-sentiment.csv",
        text_column="tweet",
        target_column="q2_which_of_the_options_below_is_true_about_the_opinion_in_the_tweet",
        filename="PoliticalTweetSubjectivity.csv"
    )

    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/progressive-tweet-sentiment.csv",
        text_column="tweet",
        target_column="target",
        filename="PoliticalTweetTarget.csv"
    )
Example #9
"""
From: https://www.figure-eight.com/data-for-everyone/
Contributors evaluated tweets about multiple brands and products. The crowd was asked if the tweet expressed positive, negative, or no emotion towards a brand and/or product. If some emotion was expressed, they were also asked to say which brand or product was the target of that emotion.
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/judge-1377884607_tweet_product_company.csv",
        text_column="tweet_text",
        target_column="is_there_an_emotion_directed_at_a_brand_or_product",
        filename="BrandEmotion.csv")
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/judge-1377884607_tweet_product_company.csv",
        text_column="tweet_text",
        target_column="emotion_in_tweet_is_directed_at",
        filename="BrandEmotionCause.csv")
Example #10
"""
From: https://www.figure-eight.com/data-for-everyone/

Contributors read excerpts from U.S. economic news articles and rated how positive each was about the economy on a numeric scale. The scores are bucketed into negative, neutral, and positive categories below.
"""

from enso.download import generic_download


def convert_score_to_category(score):
    # Bucket a numeric positivity score into three coarse categories:
    # 1-3 negative, 4-6 neutral, everything else positive.
    if 1 <= score <= 3:
        return "negative"
    elif 4 <= score <= 6:
        return "neutral"
    else:
        return "positive"


if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/us-economic-newspaper.csv",
        text_column="text",
        target_column="positivity",
        target_transformation=convert_score_to_category,
        filename="Economy.csv")
Example #11
"""
From: https://www.figure-eight.com/data-for-everyone/
A Twitter topic analysis of users’ 2015 New Year’s resolutions.
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/New-years-resolutions-DFE.csv",
        text_column="text",
        target_column="Resolution_Category",
        filename="NewYearsResolutions.csv")
Example #12
"""
From: https://www.figure-eight.com/data-for-everyone/
Contributors looked at thousands of social media messages from US Senators and other American politicians to classify their content. Messages were broken down by audience (national or the tweeter’s constituency), by bias (neutral/bipartisan or biased/partisan), and finally by the actual substance of the message itself (options ranged from informational statements to announcements of media appearances to attacks on other candidates).
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/Political-media-DFE.csv",
        text_column="text",
        target_column="message",
        filename="PoliticalTweetClassification.csv"
    )

    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/Political-media-DFE.csv",
        text_column="text",
        target_column="bias",
        filename="PoliticalTweetBias.csv"
    )
Example #13
"""
From: https://www.figure-eight.com/data-for-everyone/
A data categorization job concerning what corporations actually talk about on social media. Contributors were asked to classify statements as information (objective statements about the company or its activities), dialog (replies to users, etc.), or action (messages that ask for votes or ask users to click on links, etc.).
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/Corporate-messaging-DFE.csv",
        text_column="text",
        target_column="category",
        filename="CorporateMessaging.csv")
Example #14
"""
From: https://www.figure-eight.com/data-for-everyone/

Contributors looked at a single sentence and rated its emotional content based on Plutchik’s wheel of emotions. 18 emotional choices were presented to contributors for grading.
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/primary-plutchik-wheel-DFE.csv",
        text_column="sentence",
        target_column="emotion",
        filename="DetailedEmotion.csv")
Example #15
"""
From: https://www.figure-eight.com/data-for-everyone/
A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015, and contributors were asked first to classify tweets as positive, negative, or neutral, and then to categorize the reasons behind negative tweets (such as “late flight” or “rude service”).
"""

from enso.download import generic_download

if __name__ == "__main__":
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/Airline-Sentiment-2-w-AA.csv",
        text_column="text",
        target_column="negativereason",
        filename="AirlineNegativity.csv")
    generic_download(
        url="https://www.figure-eight.com/wp-content/uploads/2016/03/Airline-Sentiment-2-w-AA.csv",
        text_column="text",
        target_column="airline_sentiment",
        filename="AirlineSentiment.csv")