
10 Best Datasets for Logistic Regression [2023]

Logistic regression is a popular machine learning algorithm for classification: it models the probability that an example belongs to a given class. Training and evaluating logistic regression models requires high-quality labeled datasets. The following 10 datasets provide a solid starting point for researchers and practitioners who want to apply logistic regression to real-world problems. They cover a range of topics, from predicting the survival of Titanic passengers to predicting the quality of wine from chemical analysis. Each entry includes a brief description, a reference paper where one is listed, and a link to the source where the data can be downloaded.

| Dataset Name | Description | Reference Paper | Download Link |
| --- | --- | --- | --- |
| Titanic Dataset | This dataset contains information about passengers on the Titanic, including demographics, fare information, and survival status. | N/A | https://www.kaggle.com/c/titanic/data |
| Pima Indians Diabetes Dataset | This dataset contains diagnostic measurements for female patients of Pima Indian heritage, with a label indicating whether the patient has diabetes. | N/A | https://www.kaggle.com/uciml/pima-indians-diabetes-database |
| Credit Card Default Dataset | This dataset contains information on default payments, demographic factors, and credit card usage for credit card clients in Taiwan. | I.-C. Yeh and C.-H. Lien, “The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients,” Expert Systems with Applications, 2009. | https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset |
| Bank Marketing Dataset | This dataset contains information about a bank’s past telemarketing campaigns, including client attributes, details of each contact, and whether the client subscribed to a term deposit. | M. Moro, R. Laureano, and P. Cortez, “A Data-Driven Approach to Predict the Success of Bank Telemarketing,” Decision Support Systems, 2014. | https://archive.ics.uci.edu/ml/datasets/Bank+Marketing |
| Iris Flower Dataset | This dataset contains measurements of iris flowers, including sepal length, sepal width, petal length, petal width, and species information. | R.A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, 1936. | https://archive.ics.uci.edu/ml/datasets/Iris |
| Digits Recognition Dataset | This dataset contains images of handwritten digits, with the goal of training a classifier to recognize the digits. | N/A | https://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset |
| SMS Spam Collection Dataset | This dataset contains a collection of spam and non-spam SMS messages, with the goal of training a classifier to distinguish between spam and non-spam messages. | N/A | https://www.kaggle.com/uciml/sms-spam-collection-dataset |
| Wine Quality Dataset | This dataset contains wine chemical analysis information, including pH, alcohol content, and other features, with the goal of predicting a wine quality score. | P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis, “Modeling wine preferences by data mining from physicochemical properties,” Decision Support Systems, 2009. | https://archive.ics.uci.edu/ml/datasets/Wine+Quality |
| Adult Income Dataset | This dataset contains information about individuals, including age, education, occupation, and income, with the goal of predicting whether an individual earns over $50,000 per year. | N/A | https://www.kaggle.com/uciml/adult-census-income |
| Student Performance Dataset | This dataset contains information about students in secondary education, including demographic information, study habits, and grades, with the goal of predicting academic performance. | N/A | https://archive.ics.uci.edu/ml/datasets/Student+Performance |
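As a quick way to try one of these datasets without downloading anything, the sketch below fits a logistic regression model on the Iris data using the copy bundled with scikit-learn. It assumes scikit-learn is installed. The function name `fit_iris_logistic_regression`, the 75/25 split, the feature scaling step, and `max_iter=1000` are illustrative choices, not recommendations from the dataset authors.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_iris_logistic_regression(test_size=0.25, random_state=0):
    """Fit logistic regression on Iris; return (model, held-out accuracy).

    The helper name and its default arguments are assumptions made for
    this example only.
    """
    # Features: sepal length/width and petal length/width; target: species (0, 1, 2).
    X, y = load_iris(return_X_y=True)

    # Stratify so each species keeps the same share in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # Standardize the features, then fit the classifier on the training split.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Accuracy is measured on the split the model never saw during fitting.
    return model, model.score(X_test, y_test)


model, accuracy = fit_iris_logistic_regression()
print(f"Held-out accuracy: {accuracy:.3f}")
```

The same pattern should carry over to the digits dataset by swapping `load_iris` for `load_digits`. The Kaggle and UCI datasets would first need to be downloaded and loaded from their files.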

The datasets range from small and straightforward, such as Iris with its 150 samples, to larger and more complex, so a range of logistic regression use cases can be tested and evaluated. Most have a binary target. Iris and the digits dataset are multiclass problems, while Wine Quality and Student Performance record numeric scores or grades that are usually converted into classes before a logistic regression model is fitted.
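Some of the datasets listed, such as Wine Quality, have a numeric score as the target rather than a yes/no label. One common way to use them with binary logistic regression is to threshold the score. The sketch below shows that step in plain Python. The helper names, the threshold of 7, and the sample scores are hypothetical values chosen for illustration; they are not taken from the actual dataset.

```python
def binarize_scores(scores, threshold):
    """Map each numeric score to 1 if it is >= threshold, otherwise 0."""
    return [1 if score >= threshold else 0 for score in scores]


def positive_rate(labels):
    """Return the fraction of labels equal to 1 (a quick class-balance check)."""
    return sum(labels) / len(labels)


# Made-up quality scores standing in for a real target column.
example_scores = [5, 6, 7, 4, 8, 6, 5, 7]

# Assumed cut-off: scores of 7 or more count as the positive class.
labels = binarize_scores(example_scores, threshold=7)

print(labels)                 # [0, 0, 1, 0, 1, 0, 0, 1]
print(positive_rate(labels))  # 0.375
```

Checking the positive rate after thresholding is worthwhile because a high cut-off can leave very few positive examples, which affects how accuracy should be interpreted.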
