Logistic regression is a widely used machine learning algorithm for classification, most often for predicting a binary outcome. Training and evaluating logistic regression models requires high-quality labeled datasets. The following table lists 10 datasets that provide a solid starting point for researchers and practitioners who want to apply logistic regression to real-world problems. These datasets cover a range of topics, from predicting the survival status of Titanic passengers to identifying the quality of wine based on chemical analysis. Each entry includes a brief description, a reference paper where one is listed, and a link to the source where the dataset can be downloaded.
|Dataset Name||Description||Reference Paper||Download Link|
|Titanic Dataset||This dataset contains information about passengers on the Titanic, including demographics, fare information, and survival status.||N/A||https://www.kaggle.com/c/titanic/data|
|Pima Indians Diabetes Dataset||This dataset contains diagnostic measurements for female patients of Pima Indian heritage, along with a label indicating whether each patient has diabetes.||N/A||https://www.kaggle.com/uciml/pima-indians-diabetes-database|
|Credit Card Default Dataset||This dataset contains information on default payments, demographic information, and credit card usage for customers of a credit card company.||I.-C. Yeh and C.-H. Lien, “The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients,” Expert Systems with Applications, 2009.||https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset|
|Bank Marketing Dataset||This dataset contains records from a Portuguese bank’s telemarketing campaigns, including client attributes, details of each contact, and whether the customer subscribed to the bank’s term deposit.||S. Moro, P. Cortez, and P. Rita, “A Data-Driven Approach to Predict the Success of Bank Telemarketing,” Decision Support Systems, 2014.||https://archive.ics.uci.edu/ml/datasets/Bank+Marketing|
|Iris Flower Dataset||This dataset contains measurements of iris flowers, including sepal length, sepal width, petal length, petal width, and species information.||R.A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, 1936.||https://archive.ics.uci.edu/ml/datasets/Iris|
|Digits Recognition Dataset||This dataset contains images of handwritten digits, with the goal of training a classifier to recognize the digits.||N/A||https://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset|
|SMS Spam Dataset||This dataset contains a collection of SMS text messages labeled as spam or non-spam (ham), with the goal of training a classifier to distinguish between the two.||N/A||https://www.kaggle.com/uciml/sms-spam-collection-dataset|
|Wine Quality Dataset||This dataset contains wine chemical analysis information, including pH, alcohol content, and other features, with the goal of predicting wine quality.||P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis, “Modeling wine preferences by data mining from physicochemical properties,” Decision Support Systems, 2009.||https://archive.ics.uci.edu/ml/datasets/Wine+Quality|
|Adult Income Dataset||This dataset contains information about individuals, including age, education, occupation, and income, with the goal of predicting whether an individual earns over $50,000 per year.||N/A||https://www.kaggle.com/uciml/adult-census-income|
|Student Performance Dataset||This dataset contains information about students in secondary education, including demographic information, study habits, and grades, with the goal of predicting academic performance.||N/A||https://archive.ics.uci.edu/ml/datasets/Student+Performance|
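Not every target in the table is binary as distributed. Wine quality, for instance, is recorded as an integer score, so a binary logistic regression needs the score converted into two classes first. The snippet below is a minimal sketch of that step; the function name `binarize_quality`, the cutoff of 7, and the sample scores are illustrative assumptions, not part of the dataset or its reference paper.

```python
def binarize_quality(scores, threshold=7):
    """Map integer quality scores to 1 (score >= threshold) or 0 (below it).

    The default threshold of 7 is an assumed cutoff for "good" wine;
    choose it based on the class balance you want.
    """
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical scores for illustration only, not rows from the real dataset.
sample_scores = [5, 6, 7, 4, 8]
labels = binarize_quality(sample_scores)
print(labels)
```

The same idea applies to the student performance data, where a numeric final grade can be turned into a pass/fail label. A higher threshold produces a rarer positive class, which is worth checking before training.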
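The datasets range from small and straightforward, such as Iris with 150 samples, to larger and more complex, such as Adult Income with tens of thousands of records. Some targets are binary (survival, loan default, product subscription), while others (iris species, digit class, wine quality score) take more than two values and call for multinomial logistic regression or a binarized target. This variety allows a range of logistic regression use cases to be tested and evaluated.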
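As a starting point, the following sketch fits a logistic regression model to the Iris dataset and reports accuracy on a held-out split. It assumes scikit-learn is installed and uses the copy of Iris bundled with the library (`load_iris`) rather than the UCI download. The helper name `fit_iris_logreg`, the 25% test split, the random seed, and the feature-scaling step are choices made for this example, not requirements of the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_iris_logreg(test_size=0.25, seed=0):
    """Fit logistic regression on Iris; return (model, held-out accuracy)."""
    X, y = load_iris(return_X_y=True)
    # Stratify so each species appears in both splits in equal proportion.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Standardizing features helps the solver converge; max_iter is raised
    # above the default as a precaution.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


model, accuracy = fit_iris_logreg()
print(f"held-out accuracy: {accuracy:.3f}")
```

Because Iris has three species, scikit-learn fits a multiclass model here. For the binary datasets in the table, the same pipeline applies once the features are numeric and the target is encoded as 0 or 1.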