NumPy arrays are a powerful data structure for scientific computing in Python. They are fast, efficient, and easy to use, and they are widely used in machine learning and deep learning. In this blog, we will look at how to save an image dataset into NumPy arrays. I wrote this code while working on an image segmentation task with histopathology and mammography images, where I converted the images and their respective masks into NumPy arrays. Having the dataset in NumPy array format helps me work faster on Google Colab and Kaggle.
Step 1: Import Required Libraries
import os
import cv2
import numpy as np
from matplotlib import pyplot as plt
Step 2: Store Dataset Locations into Variables
Images_Dir = 'Input_Images/'
Masks_Dir = 'Mask_Images/'
Step 3: Get the Path of all Images in a Directory and Save into List
Images_List = [ f for f in os.listdir(Images_Dir)]
Masks_List = [ f for f in os.listdir(Masks_Dir)]
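One thing to watch for here: os.listdir() returns every entry in the folder, including hidden files and subfolders, and cv2.imread() returns None for anything it cannot decode. A minimal sketch of filtering the names by extension is shown below. The IMAGE_EXTENSIONS set and the list_image_files() helper are hypothetical names introduced for illustration, and the extension list is an assumption that you should adapt to your own dataset.

```python
import os

# Assumed set of extensions; adjust it to match the files in your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}

def list_image_files(names):
    """Keep only the names whose extension looks like an image file."""
    return [
        name for name in names
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    ]

# Example with a plain list of names, as os.listdir() would return them.
example_names = ["scan_01.PNG", "notes.txt", ".DS_Store", "scan_02.jpg"]
filtered_names = list_image_files(example_names)
print(filtered_names)
```

In the steps above, you could call list_image_files(os.listdir(Images_Dir)) instead of using the raw listing.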
Step 4: Sort the List of the Names using np.sort()
Sorted_Images = np.sort(Images_List)
Sorted_Masks = np.sort(Masks_List)
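Sorting matters because os.listdir() returns names in arbitrary order, and the images and masks must line up index by index. Note that np.sort() on strings sorts lexicographically, so 'img_10.png' comes before 'img_2.png' when the numbers are not zero-padded. If your file names look like that, a natural sort may be closer to what you expect. The sketch below is one possible way to do it; natural_key() is a hypothetical helper, not part of NumPy or OpenCV.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so that 10 sorts after 2."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]

names = ["img_10.png", "img_2.png", "img_1.png"]

# Plain string sorting compares character by character.
lexicographic = sorted(names)

# Natural sorting compares the embedded numbers as integers.
natural = sorted(names, key=natural_key)

print(lexicographic)
print(natural)
```

Whichever ordering you choose, apply the same one to both the image names and the mask names so that each image keeps its matching mask.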
Step 5: Create an Empty List in which we want to Store Images
Images_Data = []
Masks_Data = []
Step 6: Read all Images from their Paths and Store in the Empty List
In this step, we read every image with the OpenCV library. Each file name stored in Sorted_Images is joined with Images_Dir to form the image path, and the loaded image is appended to the empty list Images_Data.
for img_path in Sorted_Images:
    images = cv2.imread(Images_Dir + img_path)
    Images_Data.append(images)
Perform the same Step with the Image Mask Dataset
for mask_path in Sorted_Masks:
    masks = cv2.imread(Masks_Dir + mask_path)
    Masks_Data.append(masks)
Step 7: Convert the List into NumPy Array
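Now, our image data is a Python list of arrays. Most machine learning and deep learning workflows expect a single NumPy array, so we convert the lists. The conversion gives a clean array only when every image has the same height, width, and number of channels.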
Images_Data = np.asarray(Images_Data)
Masks_Data = np.asarray(Masks_Data)
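Because the conversion depends on all images having the same shape, it can help to check this before building the array. The sketch below is one possible check, not part of the original workflow: stack_images() is a hypothetical helper that also catches None entries, which cv2.imread() returns when a file cannot be read. The demo list stands in for Images_Data.

```python
import numpy as np

def stack_images(images):
    """Stack equally sized image arrays into one array, failing loudly otherwise."""
    if not images:
        raise ValueError("no images to stack")
    for index, image in enumerate(images):
        if image is None:
            raise ValueError(f"image at index {index} could not be read")
    shapes = {image.shape for image in images}
    if len(shapes) != 1:
        raise ValueError(f"images have different shapes: {sorted(shapes)}")
    # np.stack adds a new leading axis: (number of images, height, width, channels).
    return np.stack(images)

# Stand-in data: five blank 4x6 colour images.
demo_images = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(5)]
stacked = stack_images(demo_images)
print(stacked.shape, stacked.dtype)
```

If the check fails on your dataset, the images would need to be resized to a common shape before they can be stored in one array.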
Step 8: Finally, Save the NumPy Array into .npy Format
np.save("Images_Dataset.npy", Images_Data)
np.save("Masks_Dataset.npy", Masks_Data)
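As an alternative to two separate .npy files, NumPy can also store several arrays in one compressed .npz archive with np.savez_compressed(). The sketch below is only an illustration of that option: the small arrays stand in for Images_Data and Masks_Data, and the file name Dataset.npz and the keys "images" and "masks" are assumptions chosen for this example.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays; in the tutorial these would be Images_Data and Masks_Data.
images = np.zeros((3, 8, 8, 3), dtype=np.uint8)
masks = np.ones((3, 8, 8, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, "Dataset.npz")

    # Each keyword argument becomes a named array inside the archive.
    np.savez_compressed(archive_path, images=images, masks=masks)

    # np.load on an .npz file returns a dict-like object keyed by those names.
    with np.load(archive_path) as archive:
        loaded_images = archive["images"]
        loaded_masks = archive["masks"]

print(loaded_images.shape, loaded_masks.shape)
```

Compression trades some saving and loading time for a smaller file, which may or may not be worth it depending on how often you reload the dataset.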
If you want to load the data back into variables and display it in your notebook, follow the two steps below.
Load Dataset from NumPy Arrays
images_dataset = np.load('Images_Dataset.npy')
masks_dataset = np.load('Masks_Dataset.npy')
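If the saved file is large compared to the memory available in your notebook, np.load() can also memory-map the file with mmap_mode="r", so that data is read from disk only when you index into it. The sketch below shows the idea on a small temporary file; the stand-in array and the temporary path are assumptions made for this example.

```python
import os
import tempfile

import numpy as np

# Stand-in dataset: two tiny 4x4 colour images.
data = np.arange(2 * 4 * 4 * 3, dtype=np.uint8).reshape(2, 4, 4, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Images_Dataset.npy")
    np.save(path, data)

    # Memory-map the file instead of reading it all into RAM.
    lazy_dataset = np.load(path, mmap_mode="r")
    lazy_shape = lazy_dataset.shape

    # Indexing reads only the requested image; np.array makes an in-memory copy.
    first_image = np.array(lazy_dataset[0])

    # Drop the memory map before the temporary file is deleted.
    del lazy_dataset

print(lazy_shape, first_image.shape)
```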
Show Images from NumPy Arrays using Matplotlib Library
plt.imshow(images_dataset[98])

plt.imshow(masks_dataset[98])
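A note on colours: cv2.imread() loads colour images in BGR channel order, while plt.imshow() interprets a 3-channel array as RGB, so the images above may appear with red and blue swapped. A minimal sketch of reversing the channel axis with NumPy slicing is shown below; bgr_to_rgb() is a hypothetical helper name. cv2.cvtColor(image, cv2.COLOR_BGR2RGB) does the same conversion for a single image.

```python
import numpy as np

def bgr_to_rgb(image):
    """Reverse the last (channel) axis of a 3-channel image or batch of images."""
    return image[..., ::-1]

# A single pure-blue pixel in BGR order: blue=255, green=0, red=0.
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb_pixel = bgr_to_rgb(bgr_pixel)

# After the conversion, the blue value sits in the last channel.
print(rgb_pixel[0, 0])
```

With the dataset loaded as above, plt.imshow(bgr_to_rgb(images_dataset[98])) would then show the image with its original colours.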

Complete Code:
This is the complete code to store an image dataset in a NumPy array. You just need to set the path of your directory in the Images_Dir variable. The code below converts the images from a single directory.
# import libraries
import os
import cv2
import numpy as np
from matplotlib import pyplot as plt
# Store Dataset Locations into Variables
Images_Dir = 'Input_Images/'
# Get the Path of all Images in a Directory and Save into List
Images_List = [ f for f in os.listdir(Images_Dir)]
# Sort the List of the Names using np.sort()
Sorted_Images = np.sort(Images_List)
# Create an Empty List in which we want to Store Images
Images_Data = []
# Read all Images from their Paths and Store in the Empty List
for img_path in Sorted_Images:
    images = cv2.imread(Images_Dir + img_path)
    Images_Data.append(images)
# Convert the List into NumPy Array
Images_Data = np.asarray(Images_Data)
# Finally, Save the NumPy Array into .npy Format
np.save("Images_Dataset.npy", Images_Data)