10 Best Datasets for Sentiment Analysis [2023]

Sentiment analysis has become a core part of data analysis in fields such as marketing, customer service, and politics, where understanding the opinions and emotions of customers, voters, and consumers is invaluable. This article covers the 10 best datasets for sentiment analysis in 2023, listing the size, download link, and a short description of each, to help you gain a deeper understanding of the emotions and opinions expressed in text data.

| Dataset Name | Size | Download Link | Description |
|---|---|---|---|
| IMDB | 25,000 training reviews and 25,000 test reviews | https://ai.stanford.edu/~amaas/data/sentiment/ | Movie reviews labeled as positive or negative sentiment. |
| Sentiment140 | 1.6 million tweets | http://help.sentiment140.com/for-students | A large collection of tweets labeled by sentiment. The 1.6 million training tweets are labeled positive or negative; neutral tweets appear only in the small test set. |
| Yelp Review Polarity | 560,000 training reviews | https://www.kaggle.com/yelp-dataset/yelp-dataset | Yelp reviews labeled as positive or negative sentiment. |
| Amazon Review Polarity | 3.6 million training reviews and 400,000 test reviews | https://nijianmo.github.io/amazon/index.html | Amazon product reviews labeled as positive or negative sentiment. |
| Twitter Sentiment Analysis | 1.6 million tweets | https://www.kaggle.com/kazanova/sentiment140 | A Kaggle-hosted copy of Sentiment140, containing the same 1.6 million tweets and labels. |
| Stanford Sentiment Treebank | 215,154 phrases | https://nlp.stanford.edu/sentiment/index.html | 215,154 phrases drawn from 11,855 movie-review sentences, each labeled with sentiment polarity. |
| SST-2 | About 67,000 training examples | https://nlp.stanford.edu/sentiment/index.html | SST-2 is the binary subset of the Stanford Sentiment Treebank, with examples labeled as positive or negative. |
| MPQA Opinion Corpus | 10,606 sentences | http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/ | Sentences labeled with sentiment polarity. |
| TREC | 5,452 questions | https://cogcomp.org/Data/QA/QC/ | A question classification dataset from the Text REtrieval Conference (TREC). Its 5,452 training questions are labeled by question type, not sentiment polarity, so it serves as a general text classification benchmark rather than a sentiment dataset. |
| SentiWordNet | Not stated | http://sentiwordnet.isti.cnr.it/ | A lexical resource that assigns sentiment scores (positivity, negativity, and objectivity) to WordNet synsets. |
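
Once a dataset is downloaded, reading it into memory usually takes only a few lines. The following is a minimal sketch for the IMDB reviews, not an official loader. It assumes the archive has been extracted to a folder (called aclImdb here) that holds one text file per review under train/pos, train/neg, test/pos, and test/neg. The helper name load_imdb_split and the 1/0 label convention are choices made for this example.

```python
from pathlib import Path


def load_imdb_split(root, split="train"):
    """Read one split of an extracted IMDB folder into (text, label) pairs.

    Assumed layout: <root>/<split>/pos/*.txt and <root>/<split>/neg/*.txt.
    Labels are 1 for positive and 0 for negative (a convention for this sketch).
    """
    examples = []
    for folder, label in (("pos", 1), ("neg", 0)):
        directory = Path(root) / split / folder
        # Sort the file names so the order is the same on every run.
        for path in sorted(directory.glob("*.txt")):
            examples.append((path.read_text(encoding="utf-8"), label))
    return examples


if __name__ == "__main__":
    import sys

    # Folder name is an assumption; pass the real path as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "aclImdb"
    train = load_imdb_split(root, "train")
    positives = sum(label for _, label in train)
    print(f"{len(train)} training reviews, {positives} positive")
```

The function returns a plain list of (text, label) tuples, which is easy to hand to most tokenizers or classifiers. For the larger datasets in the table, streaming the files instead of loading everything at once may be a better fit.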
