Sentiment analysis has become a core part of data analysis in fields such as marketing, customer service, and politics, where understanding the opinions and emotions of customers, voters, and consumers is invaluable. In this article, we explore 10 of the best datasets for sentiment analysis in 2023 to help you gain a deeper understanding of the emotions and opinions expressed in text data.
Dataset Name | Size | Download Link | Description |
---|---|---|---|
IMDB | 25,000 training and 25,000 test movie reviews | https://ai.stanford.edu/~amaas/data/sentiment/ | The IMDB dataset contains movie reviews labeled as positive or negative, split evenly between a training set and a test set. |
Sentiment140 | 1.6 million tweets | http://help.sentiment140.com/for-students | Sentiment140 is a large dataset of 1.6 million tweets labeled automatically, based on emoticons, as positive or negative. Neutral labels appear only in the small, manually annotated test set. |
Yelp Review Polarity | 560,000 training and 38,000 test reviews | https://www.kaggle.com/yelp-dataset/yelp-dataset | The Yelp Review Polarity dataset contains 560,000 training reviews and 38,000 test reviews labeled as positive or negative. |
Amazon Review Polarity | 3.6 million training and 400,000 test reviews | https://nijianmo.github.io/amazon/index.html | The Amazon Review Polarity dataset contains 3.6 million training reviews and 400,000 test reviews labeled as positive or negative. |
Twitter Sentiment Analysis | 1.6 million tweets | https://www.kaggle.com/kazanova/sentiment140 | This Kaggle listing hosts the same 1.6 million tweets as Sentiment140 in CSV form, labeled as positive or negative. |
Stanford Sentiment Treebank | 215,154 phrases from 11,855 sentences | https://nlp.stanford.edu/sentiment/index.html | The Stanford Sentiment Treebank contains 215,154 unique phrases, taken from the parse trees of 11,855 movie-review sentences, each labeled with fine-grained sentiment. |
SST-2 | About 67,000 training examples | https://nlp.stanford.edu/sentiment/index.html | SST-2 is the binary version of the Stanford Sentiment Treebank: neutral examples are dropped and the rest are labeled as positive or negative. |
MPQA Opinion Corpus | 10,606 sentences | http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/ | The MPQA Opinion Corpus is a dataset of 10,606 sentences labeled with sentiment polarity. |
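TREC | 5,452 training questions | https://cogcomp.org/Data/QA/QC/ | The Text REtrieval Conference (TREC) question dataset contains 5,452 training questions labeled by question type (for example person, location, or number) rather than by sentiment polarity. It is a text-classification benchmark that is often evaluated alongside sentiment datasets. |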
SentiWordNet | N/A (word-level lexicon) | http://sentiwordnet.isti.cnr.it/ | SentiWordNet is a lexical resource that assigns positivity, negativity, and objectivity scores to WordNet synsets. |
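
Once a dataset is downloaded, the first practical step is usually to turn its raw files into (label, text) pairs. The sketch below shows one way to do that for a Sentiment140-style CSV, assuming the commonly distributed layout: no header row, six columns (polarity, id, date, query, user, text), and polarity coded as 0, 2, or 4. The helper name `read_sentiment140` and the two sample rows are made up for illustration; they are not part of the dataset or of any library.

```python
import csv
import io

# Sentiment140 encodes polarity as 0 (negative), 2 (neutral), 4 (positive).
LABELS = {0: "negative", 2: "neutral", 4: "positive"}

# Assumed column order of the CSV, which ships without a header row.
COLUMNS = ["polarity", "id", "date", "query", "user", "text"]


def read_sentiment140(handle):
    """Yield (label, text) pairs from an open Sentiment140-style CSV handle.

    Hypothetical helper for illustration; not an official loader.
    """
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows rather than guessing at their layout
        record = dict(zip(COLUMNS, row))
        yield LABELS[int(record["polarity"])], record["text"]


# Two invented rows in the same layout, used here instead of the real file.
SAMPLE = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","my phone died again"\n'
    '"4","2","Mon Apr 06 22:20:01 PDT 2009","NO_QUERY","user_b","loving the new album"\n'
)

pairs = list(read_sentiment140(io.StringIO(SAMPLE)))
print(pairs)
```

To read the real download instead of the sample, pass a file handle such as `open(path, encoding="latin-1", newline="")`, where `path` points at your local copy. The `latin-1` encoding is an assumption based on how the file is usually distributed, so adjust it if your copy differs.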