Table of Content
- Introduction
- Importing Libraries
- Data Visualization
- Data Pre-Processing
- Building Model
- Model Summary
- Training Model
- Evaluating Model
- Predictions Through Model
- Predictions Using Real Time Data
- Confusion Matrix
- Heatmap
- Conclusion and Summary
Introduction
Convolution Neural Network (CNN) is a deep learning algorithm which takes image as an input then applies feature extraction on it through different hidden layers of neural network and be able to differentiate it from other images. Here the task of labeling the images is done by hidden layers present in our network. The architecture of CNN is similar to that of neurons present in a human brain. We intend to use the CNN implementation to create a automated tagging workflow which identifies the fashion apparel in the inventory of an online or offline retail store. This would reduce the time taken for human classification of inventory.
Various Layers of CNN are –
- Input
- Feature Extraction
- Convolution + RELU(Rectified Linear Unit)
- Pooling
- Dropout
- Classification
- Flatten
- Fully Connected
- Dropout
- Softmax
- Output
Importing Libraries
# Importing Libraries and Dataset import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns import keras import cv2 from keras.layers import Dropout from keras.datasets import fashion_mnist from keras.utils import to_categorical from keras.models import Sequential from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense from keras.optimizers import Adam from sklearn.metrics import confusion_matrix, classification_report # Load the fashion-mnist data and Split into train test (X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data() Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz 32768/29515 [=================================] - 0s 2us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz 26427392/26421880 [==============================] - 14s 1us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz 8192/5148 [===============================================] - 0s 0us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz 4423680/4422102 [==============================] - 3s 1us/step
X_train.shape
(60000, 28, 28)
X_test.shape
(10000, 28, 28)
Y_train.shape
(60000,)
Y_test.shape
(10000,)
Data Visualization
There are 10 different classes of images, as following:
Label 0 : T-shirt/top
Label 1: Trouser
Label 2: Pullover
Label 3: Dress
Label 4: Coat
Label 5: Sandal
Label 6: Shirt
Label 7: Sneaker
Label 8: Bag
Label 9: Ankle boot
plt.imshow(np.reshape(X_train[1], (28,28)), cmap = 'gray')
plt.title("Label: %i" %Y_train[1])
plt.show()
Figure 1 : Image Sample 1 from MNIST fashion dataset
plt.imshow(np.reshape(X_train[650], (28,28)), cmap = 'gray')
plt.title("Label: %i" %Y_train[650])
plt.show()
Figure 2 : Image Sample 2 from MNIST fashion dataset
Y_train[0:10]
array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)
Data Pre-Processing
# Define Labels
fashion_labels = ['T-shirt/top',
'Trouser',
'Pullover',
'Dress',
'Coat',
'Sandal',
'Shirt',
'Sneaker',
'Bag',
'Ankle boot']
# Image pixel Normalization
X_train = X_train.astype('float32')/255
X_test = X_test.astype('float32')/255
X_train[0]
image_height = 28
image_width = 28
# Grayscale image with num_channels (Rank = 1)
num_channels = 1
# Reshaping of Image = (60000, 28, 28, 1)
train_digits = np.reshape(X_train, newshape=(60000, image_height, image_width, num_channels))
test_digits = np.reshape(X_test, newshape=(10000, image_height, image_width, num_channels))
# 0 - 9 num_classes = 10
# 7 - [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
# 5 - [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
num_classes = 10
train_labels_class = to_categorical(Y_train, num_classes)
test_labels_class = to_categorical(Y_test, num_classes)
train_labels_class
array([[0., 0., 0., ..., 0., 0., 1.],
[1., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
Building Model
def build_model():
model = Sequential()
# Layer - I (Padding = 'same' --> zero padding)
model.add(Conv2D(filters = 32, kernel_size=(3,3), strides=(1,1), padding = 'same', activation='relu',
input_shape = (image_height, image_width, num_channels)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters = 64, kernel_size=(3,3), strides=(1,1), padding = 'same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters = 128, kernel_size=(3,3), strides=(1,1), padding = 'same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
# Flatten Matrix
model.add(Flatten())
# Fully Connected Layer
model.add(Dense(units=128, activation='relu'))
model.add(Dropout(0.30))
# Output Layer
model.add(Dense(units=10, activation='softmax'))
# Model Compile
optimizers = Adam(learning_rate = 0.001)
# categorical_crossentropy - used for multiclass classification
model.compile(loss = 'categorical_crossentropy', optimizer = optimizers, metrics = ['accuracy'])
return model
model = build_model()
Model Summary
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 32) 320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 14, 14, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 7, 7, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 7, 7, 128) 73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 3, 128) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 3, 3, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 1152) 0
_________________________________________________________________
dense (Dense) (None, 128) 147584
_________________________________________________________________
dropout_3 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 241,546
Trainable params: 241,546
Non-trainable params: 0
_________________________________________________________________
Training Model
result = model.fit(train_digits, train_labels_class, epochs=50, batch_size=64, validation_split=0.1)
Epoch 1/50
844/844 [==============================] - 70s 80ms/step - loss: 0.9277 - accuracy: 0.6530 - val_loss: 0.3862 - val_accuracy: 0.8540
Epoch 2/50
844/844 [==============================] - 77s 92ms/step - loss: 0.4095 - accuracy: 0.8517 - val_loss: 0.3166 - val_accuracy: 0.8808
Epoch 3/50
844/844 [==============================] - 81s 96ms/step - loss: 0.3535 - accuracy: 0.8710 - val_loss: 0.2824 - val_accuracy: 0.8957
Epoch 4/50
844/844 [==============================] - 81s 96ms/step - loss: 0.3218 - accuracy: 0.8808 - val_loss: 0.2593 - val_accuracy: 0.9037
Epoch 5/50
844/844 [==============================] - 79s 93ms/step - loss: 0.3020 - accuracy: 0.8852 - val_loss: 0.2501 - val_accuracy: 0.9062
...
...
...
...
...
Epoch 45/50
844/844 [==============================] - 108s 128ms/step - loss: 0.1585 - accuracy: 0.9389 - val_loss: 0.1922 - val_accuracy: 0.9283
Epoch 46/50
844/844 [==============================] - 99s 117ms/step - loss: 0.1595 - accuracy: 0.9394 - val_loss: 0.1901 - val_accuracy: 0.9305
Epoch 47/50
844/844 [==============================] - 80s 95ms/step - loss: 0.1565 - accuracy: 0.9402 - val_loss: 0.1953 - val_accuracy: 0.9295
Epoch 48/50
844/844 [==============================] - 81s 95ms/step - loss: 0.1540 - accuracy: 0.9411 - val_loss: 0.1937 - val_accuracy: 0.9288
Epoch 49/50
844/844 [==============================] - 111s 132ms/step - loss: 0.1587 - accuracy: 0.9392 - val_loss: 0.2026 - val_accuracy: 0.9268
Epoch 50/50
844/844 [==============================] - 90s 107ms/step - loss: 0.1568 - accuracy: 0.9406 - val_loss: 0.1945 - val_accuracy: 0.9285
Model Evaluation
model.evaluate(test_digits, test_labels_class)
313/313 [==============================] - 5s 15ms/step - loss: 0.2267 - accuracy: 0.9284
Out[11]: [0.22667253017425537, 0.9283999800682068]
pd.DataFrame(result.history)
loss accuracy val_loss val_accuracy
0 0.270684 0.900074 0.235163 0.910500
1 0.262369 0.903852 0.235229 0.910667
2 0.254143 0.905611 0.225067 0.916833
3 0.244829 0.908963 0.227146 0.916833
4 0.237125 0.911722 0.220977 0.918167
5 0.231625 0.913889 0.219834 0.914667
45 0.159503 0.939389 0.190102 0.930500
46 0.156466 0.940204 0.195315 0.929500
47 0.153997 0.941148 0.193682 0.928833
48 0.158681 0.939167 0.202637 0.926833
49 0.156842 0.940630 0.194501 0.928500
pd.DataFrame(result.history)[['accuracy', 'val_accuracy']].plot()
![]()
Figure 3 : Accuracy Chart for model evaluation of CNN on MNIST fashion data
pd.DataFrame(result.history)[['loss', 'val_loss']].plot()
![]()
Figure 4 : Loss Chart for model evaluation of CNN on MNIST fashion data
Predictions Through the Model
# Converts Categorical o/p into integar o/p yhat = np.argmax(predictions, axis = 1) yhat = np.argmax(model.predict(np.reshape(test_digits[5], (1, 28, 28, 1)))) plt.imshow(np.reshape(test_digits[5], (28,28)), cmap = 'gray') plt.title("Label: %i Prediction: %i" %(Y_train[6], yhat)) plt.show()
![]()
Figure 5 : Predicted Image sample for CNN on MNIST fashion dataset
Predictions Using Real Time Data
# 0 - Gray Scale
# sample data to ingest
![]()
Figure 6 : Our data for testing CNN built on MNIST fashion dataset
img = cv2.imread('test-tshirt.jpg', 0)
img.shape
(1571, 1600)
test_digits.shape
(10000, 28, 28, 1)
img_data = cv2.resize(img, (28, 28))
plt.imshow(img_data, cmap = 'gray')
Figure 7 : Ingest our data for testing CNN built on MNIST fashion dataset
# Bitwise operation not for image samples
img_data = cv2.bitwise_not(img_data)
img_new = np.reshape(img_data, (1, image_height, image_width, num_channels))
model.predict(img_new)
array([[1.0000000e+00, 0.0000000e+00, 5.3774135e-33, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00]], dtype=float32)
plt.imshow(img_data, cmap = 'gray')
plt.title("Predicted O/P :%i" %np.argmax(model.predict(img_new)))
Text(0.5, 1.0, 'Predicted O/P :0')
Figure 8 : Predicted Class of Ingested Image for CNN on MNIST Fashion dataset
Confusion Matrix
predictions = model.predict(test_digits)
yhat = np.argmax(predictions, axis = 1)
confusion_matrix(Y_test, yhat)
array([[853, 0, 15, 11, 4, 1, 110, 0, 6, 0],
[ 0, 985, 0, 9, 2, 0, 3, 0, 1, 0],
[ 18, 1, 872, 6, 56, 0, 46, 0, 1, 0],
[ 7, 4, 8, 940, 20, 0, 21, 0, 0, 0],
[ 0, 0, 21, 20, 907, 0, 52, 0, 0, 0],
[ 0, 0, 0, 0, 0, 987, 0, 9, 0, 4],
[ 68, 0, 36, 27, 66, 0, 800, 0, 3, 0],
[ 0, 0, 0, 0, 0, 7, 0, 980, 0, 13],
[ 1, 1, 1, 2, 1, 2, 2, 0, 990, 0],
[ 0, 0, 0, 0, 0, 4, 1, 25, 0, 970]], dtype=int64)
Heatmap
plt.figure(figsize = (10, 10))
sns.heatmap(confusion_matrix(Y_test, yhat), annot = True, fmt = '0.0f')
![]()
Figure 8 : Cross Tabulation of CNN model on MNIST Fashion data
Conclusion and Summary
In this tutorial we discussed how to predict apparels using Deep Learning Convolution Neural Network (CNN) in Python. Also, we learned how to build model, add different layers, train model and test model using MNIST Fashion Dataset. We also made some predictions by importing a real time image of a T-Shirt and out model predicted correct. An application like this can help in real time tagging for inventory management. Confusion matrix and Heatmap displays strength and weakness of our model. Read this interesting article on MNIST digit classification using logistic regression.
About the Author's:
Write A Public Review