How to Save Image Dataset into NumPy Arrays

NumPy arrays are a powerful data structure for scientific computing in Python. They are fast, memory-efficient, and easy to use, and they are the standard input format for most machine learning and deep learning libraries. In this blog, we will look at how to save an image dataset into NumPy arrays. I wrote this code while working on an image segmentation task with histopathology and mammography images, where I converted the images and their respective masks into NumPy arrays. Having the dataset in NumPy array format helps me work faster on Google Colab and Kaggle.

Step 1: Import Required Libraries

import os
import cv2
import numpy as np
from matplotlib import pyplot as plt

Step 2: Store Dataset Locations into Variables

Images_Dir = 'Input_Images/'
Masks_Dir = 'Mask_Images/'

Step 3: Get the Path of all Images in a Directory and Save into List

Images_List = [ f for f in  os.listdir(Images_Dir)]
Masks_List = [ f for f in  os.listdir(Masks_Dir)]
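os.listdir() returns every entry in the directory, including sub-folders and hidden files, which cv2.imread() cannot read later. The sketch below shows one way to keep only image files. The helper name list_image_files and the extension list are my own assumptions, so adjust them to your dataset.

```python
import os

# Assumed list of image extensions; extend or shorten it for your dataset.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")


def list_image_files(directory, extensions=IMAGE_EXTENSIONS):
    """Return the names of the files in `directory` that look like images."""
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip sub-directories and files with other extensions (e.g. .DS_Store).
        if os.path.isfile(full_path) and name.lower().endswith(extensions):
            names.append(name)
    return names
```

With this helper, Images_List = list_image_files(Images_Dir) could replace the plain os.listdir() call.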

Step 4: Sort the List of the Names using np.sort()

Sorted_Images = np.sort(Images_List)
Sorted_Masks = np.sort(Masks_List)
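Sorting both lists only keeps images and masks aligned if every image has a mask with a matching name. A minimal sketch of a sanity check is shown here. It assumes that a mask shares the file name (without extension) of its image, which may not hold for your dataset, and check_pairs is a hypothetical helper name.

```python
import os


def check_pairs(image_names, mask_names):
    """Raise ValueError unless both lists hold the same file stems in the same order."""
    image_stems = [os.path.splitext(name)[0] for name in image_names]
    mask_stems = [os.path.splitext(name)[0] for name in mask_names]
    if image_stems != mask_stems:
        missing_masks = sorted(set(image_stems) - set(mask_stems))
        missing_images = sorted(set(mask_stems) - set(image_stems))
        raise ValueError(
            f"Images without a mask: {missing_masks}; "
            f"masks without an image: {missing_images}"
        )
    # Number of aligned image/mask pairs.
    return len(image_stems)
```

Calling check_pairs(Sorted_Images, Sorted_Masks) before reading the files catches a missing or misnamed mask early.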

Step 5: Create an Empty List in which we want to Store Images

Images_Data = []
Masks_Data = []

Step 6: Read all Images from their Paths and Store in the Empty List

In this step, we read every image with the OpenCV library, using the file names stored in the Sorted_Images variable, and append the image data to the empty list Images_Data.

for img_path in Sorted_Images:
    images = cv2.imread(Images_Dir + img_path)
    Images_Data.append(images)

Perform the same Step with the Image Mask Dataset

for mask_path in Sorted_Masks:
    masks = cv2.imread(Masks_Dir + mask_path)
    Masks_Data.append(masks)

Step 7: Convert the List into NumPy Array

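Now our image data is stored in a Python list. Most machine learning and deep learning libraries expect a single NumPy array, so we convert the list. Note that this only gives a regular array of shape (number of images, height, width, channels) when all images have the same size.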

Images_Data = np.asarray(Images_Data)
Masks_Data = np.asarray(Masks_Data)
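If the images differ in size, np.asarray() either raises an error or produces an object array, depending on the NumPy version. The sketch below checks the shapes first and then stacks the images. stack_images is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def stack_images(images):
    """Stack equally shaped images into one array of shape (N, H, W[, C])."""
    shapes = {image.shape for image in images}
    if len(shapes) != 1:
        raise ValueError(f"Expected one common image shape, found: {sorted(shapes)}")
    return np.stack(images)
```

If the check fails, the images need to be resized or cropped to a common size before they are converted.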

Step 8: Finally, Save the NumPy Array into .npy Format

np.save("Images_Dataset.npy", Images_Data)
np.save("Masks_Dataset.npy", Masks_Data)
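As an alternative to two separate .npy files, NumPy can also write several arrays into one compressed .npz archive with np.savez_compressed(). This is only a sketch: the helper names and the keys "images" and "masks" are my own choices.

```python
import numpy as np


def save_dataset(path, images, masks):
    """Write both arrays into a single compressed .npz archive."""
    np.savez_compressed(path, images=images, masks=masks)


def load_dataset(path):
    """Read the arrays back from the archive created by save_dataset."""
    with np.load(path) as archive:
        return archive["images"], archive["masks"]
```

Compression usually helps most with masks, which contain large uniform regions, at the cost of slower saving and loading.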

If you want to load the saved data back into variables and display it in your notebook, follow the two steps below.

Load Dataset from NumPy Arrays

images_dataset = np.load('Images_Dataset.npy')
masks_dataset = np.load('Masks_Dataset.npy')
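When the saved file is larger than the available RAM, which can happen on free Colab or Kaggle sessions, np.load() can open it as a memory map so that only the slices you index are read from disk. A minimal sketch follows, with open_dataset as a hypothetical helper name.

```python
import numpy as np


def open_dataset(path):
    """Open a .npy file as a read-only memory map instead of loading it fully."""
    return np.load(path, mmap_mode="r")
```

For example, images = open_dataset('Images_Dataset.npy') followed by np.asarray(images[0:16]) copies only the first 16 images into memory.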

Show Images from NumPy Arrays using Matplotlib Library

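# OpenCV stores colors as BGR, so convert to RGB before showing with Matplotlib
plt.imshow(cv2.cvtColor(images_dataset[98], cv2.COLOR_BGR2RGB))
plt.show()
plt.imshow(masks_dataset[98])
plt.show()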
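To compare an image with its mask, it can be handy to draw both side by side. The sketch below assumes the image was read with OpenCV and is therefore in BGR channel order. show_pair is a hypothetical helper name.

```python
from matplotlib import pyplot as plt


def show_pair(image_bgr, mask):
    """Plot an OpenCV (BGR) image next to its mask and return the figure."""
    fig, (ax_image, ax_mask) = plt.subplots(1, 2, figsize=(8, 4))
    # Reverse the channel axis so Matplotlib receives RGB instead of BGR.
    ax_image.imshow(image_bgr[..., ::-1])
    ax_image.set_title("Image")
    ax_mask.imshow(mask)
    ax_mask.set_title("Mask")
    for ax in (ax_image, ax_mask):
        ax.axis("off")
    return fig
```

In a notebook, show_pair(images_dataset[98], masks_dataset[98]) followed by plt.show() displays the pair, provided the dataset holds at least 99 images.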

Complete Code:

This is the complete working code to store an image dataset in a NumPy array. You only need to set the path of your directory in the Images_Dir variable. The code below converts the images from a single directory.

# import libraries

import os
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Store Dataset Locations into Variables
Images_Dir = 'Input_Images/'

# Get the Path of all Images in a Directory and Save into List
Images_List = [ f for f in  os.listdir(Images_Dir)]

# Sort the List of the Names using np.sort()
Sorted_Images = np.sort(Images_List)

# Create an Empty List in which we want to Store Images
Images_Data = []

# Read all Images from their Paths and Store in the List
for img_path in Sorted_Images:
    images = cv2.imread(Images_Dir + img_path)
    Images_Data.append(images)

# Convert the List into NumPy Array
Images_Data = np.asarray(Images_Data)

# Finally, Save the NumPy Array into .npy Format
np.save("Images_Dataset.npy", Images_Data)
