Introduction
This project finds facial keypoints using OpenCV and a CNN. Once the keypoints are located, the program overlays sunglasses on the detected faces, both in still images and in a live camera stream. Along the way, I compare the performance of several deep learning models.
Process
- Step 0: Import datasets
- Step 1: Build CNN model to find keypoints on face
- Step 2: Compare various CNN models
- Step 3: Face Detection with OpenCV
- Step 4: Final product
Environment
- Laptop with CUDA
- Jupyter Notebook
- Python 3.5, Keras 2.2, TensorFlow-gpu 1.8
Step 0: Import Datasets
I used the Facial Keypoints Detection dataset from Kaggle [1]. It contains 2,140 training images and 1,783 test images. Each row of the CSV file holds 15 keypoints (x, y pairs, i.e. 30 values) plus the pixel values (0-255) of a 96x96 grayscale face image (1 channel). The data is normalized to help training: pixel values are scaled to [0.0, 1.0] and the keypoint coordinates to [-1.0, 1.0].
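load_data comes from a small utils module that isn't shown in this post. Here is a minimal sketch of what it might look like, assuming the Kaggle training.csv/test.csv layout (30 keypoint columns plus an Image column of space-separated pixel values):
import numpy as np
import pandas as pd
def load_data(test=False):
    # Hypothetical re-implementation of utils.load_data (a sketch, not the
    # original code), based on the Kaggle CSV layout
    df = pd.read_csv('test.csv' if test else 'training.csv')
    # 'Image' holds 96*96 space-separated pixel values in [0, 255]
    df['Image'] = df['Image'].apply(lambda im: np.array(im.split(), dtype=np.float32))
    df = df.dropna()  # keep only rows where all 15 keypoints are present
    X = np.vstack(df['Image'].values) / 255.0  # scale pixels to [0.0, 1.0]
    X = X.reshape(-1, 96, 96, 1)
    if test:
        return X, None
    y = df[df.columns[:-1]].values.astype(np.float32)
    y = (y - 48) / 48  # scale keypoint coordinates to [-1.0, 1.0]
    return X, y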
from utils import *
# Load training set
X_train, y_train = load_data()
print("X_train.shape == {}".format(X_train.shape))
print("y_train.shape == {}; y_train.min == {:.3f}; y_train.max == {:.3f}".format(
y_train.shape, y_train.min(), y_train.max()))
# Load testing set
X_test, _ = load_data(test=True)
print("X_test.shape == {}".format(X_test.shape))
X_train.shape == (2140, 96, 96, 1)
y_train.shape == (2140, 30); y_train.min == -0.920; y_train.max == 0.996
X_test.shape == (1783, 96, 96, 1)
Visualize the training data
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(20,20))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(9):
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    plot_data(X_train[i], y_train[i], ax)
Step 1: Build CNN model to find keypoints on face
I tested a variety of models to find the best result: single and stacked conv + max-pool blocks with one to three fully connected layers at the end. I also tried dropout on the conv layers, which did not help; conv layers already regularize themselves to some degree through their local connectivity and weight sharing. Dropout is therefore applied only to the fully connected layer to reduce overfitting. I used ReLU activations throughout the model except for the final layer, where I chose tanh because the target outputs lie in [-1, 1]. Among the optimizers, the adaptive Adam produced the best results. The loss function is mean squared error, which suits continuous numerical outputs, and early stopping halts training when the validation loss stops improving for several epochs. The final model below is inspired by VGG-16 and VGG-19.
# Import deep learning resources from Keras
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout
from keras.layers import Flatten, Dense
model = Sequential()
model.add(Convolution2D(filters=16, kernel_size=3, padding='same', activation='relu',
input_shape=(96, 96, 1)))
model.add(Convolution2D(filters=16, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Convolution2D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(Convolution2D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Convolution2D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(Convolution2D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(Convolution2D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Convolution2D(filters=128, kernel_size=3, padding='same', activation='relu'))
model.add(Convolution2D(filters=128, kernel_size=3, padding='same', activation='relu'))
model.add(Convolution2D(filters=128, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(30, activation='tanh'))
# Summarize the model
model.summary()
Layer (type) | Output Shape | Param # |
---|---|---|
conv2d_11 (Conv2D) | (None, 96, 96, 16) | 160 |
conv2d_12 (Conv2D) | (None, 96, 96, 16) | 2320 |
max_pooling2d_5 (MaxPooling2 | (None, 48, 48, 16) | 0 |
conv2d_13 (Conv2D) | (None, 48, 48, 32) | 4640 |
conv2d_14 (Conv2D) | (None, 48, 48, 32) | 9248 |
max_pooling2d_6 (MaxPooling2 | (None, 24, 24, 32) | 0 |
conv2d_15 (Conv2D) | (None, 24, 24, 64) | 18496 |
conv2d_16 (Conv2D) | (None, 24, 24, 64) | 36928 |
conv2d_17 (Conv2D) | (None, 24, 24, 64) | 36928 |
max_pooling2d_7 (MaxPooling2 | (None, 12, 12, 64) | 0 |
conv2d_18 (Conv2D) | (None, 12, 12, 128) | 73856 |
conv2d_19 (Conv2D) | (None, 12, 12, 128) | 147584 |
conv2d_20 (Conv2D) | (None, 12, 12, 128) | 147584 |
max_pooling2d_8 (MaxPooling2 | (None, 6, 6, 128) | 0 |
flatten_2 (Flatten) | (None, 4608) | 0 |
dense_3 (Dense) | (None, 512) | 2359808 |
dropout_2 (Dropout) | (None, 512) | 0 |
dense_4 (Dense) | (None, 30) | 15390 |
Total params: 2,852,942
Trainable params: 2,852,942
Non-trainable params: 0
Training and loss/accuracy
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam
from keras.callbacks import EarlyStopping, ModelCheckpoint
adam = Adam() #Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False) default epsilon = 1e-07
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mae', 'acc'])
earlystop = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=10, verbose=1, mode='auto')
checkpointer = ModelCheckpoint(filepath='best_my_model.hdf5', monitor='val_loss', save_best_only=True)  # keep only the best weights
callbacks_list = [earlystop, checkpointer]
hist = model.fit(X_train, y_train, batch_size=16, epochs=3000, verbose=2, callbacks=callbacks_list, validation_split=0.2, shuffle=True)
model.save('my_model.h5')
Epoch 44/3000 - 145s - loss: 0.0010 - mean_absolute_error: 0.0239 - acc: 0.8002 - val_loss: 9.6063e-04 - val_mean_absolute_error: 0.0219 - val_acc: 0.8037
Epoch 45/3000 - 140s - loss: 0.0010 - mean_absolute_error: 0.0237 - acc: 0.8020 - val_loss: 9.4013e-04 - val_mean_absolute_error: 0.0217 - val_acc: 0.7827
Epoch 46/3000 - 145s - loss: 0.0010 - mean_absolute_error: 0.0235 - acc: 0.7967 - val_loss: 9.1961e-04 - val_mean_absolute_error: 0.0214 - val_acc: 0.7944
Epoch 47/3000 - 147s - loss: 0.0010 - mean_absolute_error: 0.0237 - acc: 0.8072 - val_loss: 9.0973e-04 - val_mean_absolute_error: 0.0213 - val_acc: 0.7874
Epoch 48/3000 - 148s - loss: 9.7474e-04 - mean_absolute_error: 0.0231 - acc: 0.8078 - val_loss: 9.3755e-04 - val_mean_absolute_error: 0.0216 - val_acc: 0.8084
Epoch 49/3000 - 149s - loss: 9.7831e-04 - mean_absolute_error: 0.0232 - acc: 0.7956 - val_loss: 8.6974e-04 - val_mean_absolute_error: 0.0210 - val_acc: 0.8154
Epoch 00049: early stopping
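The loss and accuracy logged above can also be plotted from the hist object returned by fit(); a quick sketch:
# Plot training vs. validation MSE loss recorded by Keras during fit()
plt.figure(figsize=(8, 5))
plt.plot(hist.history['loss'], label='train loss')
plt.plot(hist.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()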
Step 2: Compare various CNN models
I tried many different models to improve the validation accuracy; the table below summarizes the results.
- Condition: MSE loss function, Adam optimizer, Early Stopping, ReLU activation (except final layer), kernel size = 3x3, Dropout (fully connected layer) = 50%
- Note: the numbers in the structure column indicate filter counts; "X" in the Last Activation column means no final activation (linear output)
ex. C(32) = Convolution2D(filters=32, kernel_size=3, padding='same', activation='relu')
            MaxPooling2D(pool_size=2)
ex. C(32x2) = Convolution2D(filters=32, kernel_size=3, padding='same', activation='relu')
              Convolution2D(filters=32, kernel_size=3, padding='same', activation='relu')
              MaxPooling2D(pool_size=2)
ex. D(256) = Dense(256, activation='relu')
Structure | Last Activation | Batch Size | Max Val Acc |
---|---|---|---|
C(32)-C(64)-C(128)-D(256)-D(1000)-D(30) | Tanh | 256 | 71.96% |
C(32)-C(64)-C(128)-D(512)-D(30) | Tanh | 256 | 71.50% |
C(32)-C(64)-C(128)-D(512)-D(30) | X | 256 | 71.03% |
C(16)-C(32)-C(64)-C(128)-D(512)-D(30) | Tanh | 256 | 71.96% |
C(16)-C(32)-C(64)-C(128)-D(512)-D(30) | X | 256 | 74.30% |
C(16)-C(32)-C(64)-C(128)-D(512)-D(30) | X | 64 | 74.77% |
C(16)-C(32)-C(64)-C(128)-D(512)-D(30) | X | 16 | 78.74% |
C(16)-C(32)-C(64)-C(128)-D(512)-D(30) | Tanh | 16 | 75.93% |
C(16x2)-C(32x2)-C(64x3)-C(128x3)-D(512)-D(30) | Tanh | 16 | 81.54% |
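To make the notation concrete, here is a hypothetical helper (not part of the original code) that builds one of the models above from a list of (filters, repeats) conv blocks and a list of dense-layer sizes; an "X" last activation corresponds to last_activation=None (linear):
def build_model(conv_blocks, dense_units, last_activation=None):
    # Sketch: e.g. C(16x2)-C(32x2)-C(64x3)-C(128x3)-D(512)-D(30) becomes
    # build_model([(16, 2), (32, 2), (64, 3), (128, 3)], [512], 'tanh')
    model = Sequential()
    model.add(Convolution2D(filters=conv_blocks[0][0], kernel_size=3, padding='same',
                            activation='relu', input_shape=(96, 96, 1)))
    for i, (filters, repeats) in enumerate(conv_blocks):
        for j in range(1 if i == 0 else 0, repeats):  # first conv layer already added
            model.add(Convolution2D(filters=filters, kernel_size=3, padding='same',
                                    activation='relu'))
        model.add(MaxPooling2D(pool_size=2))
    model.add(Flatten())
    for units in dense_units:
        model.add(Dense(units, activation='relu'))
        model.add(Dropout(0.5))  # 50% dropout on fully connected layers, per the conditions above
    model.add(Dense(30, activation=last_activation))
    return model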
Step 3: Face Detection with OpenCV and Wearing Sunglasses on Faces
To feed a face into the CNN, we need a normalized 96x96 grayscale image at the input layer. So I convert the original image to grayscale, extract each face with an OpenCV cascade classifier, and normalize the crop to [0.0, 1.0]. Finally, I load a sunglasses image and place it on the face using the predicted keypoints.
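In isolation, the preprocessing for a single frame looks roughly like this (a sketch; the full helper below folds these steps into the sunglasses overlay):
import cv2
import numpy as np
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
def preprocess_faces(bgr_image):
    # Detect faces on a grayscale copy, then build a batch of normalized crops
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.25, 6)
    batch = np.zeros((len(faces), 96, 96, 1), dtype='float32')
    for i, (x, y, w, h) in enumerate(faces):
        crop = cv2.resize(gray[y:y+h, x:x+w], (96, 96))
        batch[i, :, :, 0] = crop / 255.0  # same [0.0, 1.0] scaling as training
    return faces, batch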
# Load the sunglasses image. cv2.IMREAD_UNCHANGED keeps the 4th (alpha)
# channel, which controls how transparent each pixel is.
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
sunglasses = cv2.imread("images/sunglasses_4.png", cv2.IMREAD_UNCHANGED)
# Plot the image
fig = plt.figure(figsize = (6,6))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.imshow(sunglasses)
ax1.axis('off');
import numpy as np
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
def wear_sunglasses(face_image, sunglasses_image):
    # Detect faces on a grayscale copy of the frame (OpenCV frames are BGR)
    gray_image = cv2.cvtColor(face_image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray_image, 1.25, 6)
    # Build a batch of normalized 96x96 crops, one per detected face
    faces_cnn_data = np.zeros(shape=(len(faces), 96, 96, 1), dtype='float32')
    for i, (x, y, w, h) in enumerate(faces):
        faces_cnn_data[i, :, :, 0] = cv2.resize(gray_image[y:y+h, x:x+w], (96, 96),
                                                interpolation=cv2.INTER_LINEAR) / 255
    # Predict keypoints with the CNN trained above
    faces_keypoints = model.predict(faces_cnn_data)
    for i, (x, y, w, h) in enumerate(faces):
        # Rescale keypoints from [-1, 1] back to image coordinates:
        # pixel = keypoint * (size/2) + (size/2) + bounding-box offset
        faces_keypoints[i, 0::2] = faces_keypoints[i, 0::2] * (w / 2) + (w / 2) + x
        faces_keypoints[i, 1::2] = faces_keypoints[i, 1::2] * (h / 2) + (h / 2) + y
        # Sunglasses width spans the outer eye corners (keypoints 3 and 5
        # in the Kaggle ordering), padded by 20%
        w_sunglasses = int(np.linalg.norm(faces_keypoints[i, 5*2:5*2+2] -
                                          faces_keypoints[i, 3*2:3*2+2]) * 1.20)
        # Keep the width/height ratio of the sunglasses image
        h_sunglasses = int(sunglasses_image.shape[0] / sunglasses_image.shape[1] * w_sunglasses)
        resized_sunglasses = cv2.resize(sunglasses_image, (w_sunglasses, h_sunglasses),
                                        interpolation=cv2.INTER_LINEAR)
        # Mask of non-transparent pixels from the alpha (4th) channel
        sunglasses_mask = resized_sunglasses[:, :, 3] > 0
        # Center the sunglasses on the midpoint of the inner eye corners (keypoints 2 and 4)
        sunglasses_x = int((faces_keypoints[i, 2*2] + faces_keypoints[i, 4*2]) / 2 - w_sunglasses / 2)
        sunglasses_y = int((faces_keypoints[i, 2*2+1] + faces_keypoints[i, 4*2+1]) / 2 - h_sunglasses / 2)
        # Paint the opaque sunglasses pixels onto the frame
        face_image[sunglasses_y:sunglasses_y + h_sunglasses,
                   sunglasses_x:sunglasses_x + w_sunglasses, :][sunglasses_mask] = \
            resized_sunglasses[:, :, 0:3][sunglasses_mask]
    return face_image
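A quick sanity check on a still image before going live (the file path here is hypothetical; any photo with a frontal face works):
# Try the overlay on a still image first (hypothetical file path)
test_image = cv2.imread('images/test_face.jpg')
result = wear_sunglasses(test_image, sunglasses)
plt.imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR; matplotlib expects RGB
plt.axis('off');
Step 4: Final product
With the overlay working on still images, the final step runs the model on a live camera stream.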
import cv2
import time
from keras.models import load_model
import numpy as np
def laptop_camera_go():
    # Create instance of video capturer
    cv2.namedWindow("face detection activated")
    vc = cv2.VideoCapture(0)
    # Try to get the first frame
    if vc.isOpened():
        rval, frame = vc.read()
    else:
        rval = False
    # Keep video stream open
    while rval:
        # Show the frame from the camera with the sunglasses painted on
        frame = wear_sunglasses(frame, sunglasses)
        cv2.imshow("face detection activated", frame)
        # Exit functionality - press any key to exit laptop video
        key = cv2.waitKey(20)
        if key != -1:  # waitKey returns -1 when no key is pressed
            # Destroy windows
            cv2.destroyAllWindows()
            for i in range(1, 5):
                cv2.waitKey(1)
            return
        # Read next frame
        time.sleep(0.05)  # control framerate for computation - roughly 20 frames per sec
        rval, frame = vc.read()
# Load facial keypoint detector model
model = load_model('my_model.h5')
# Run sunglasses painter
laptop_camera_go()
Reference
[1] Facial Keypoints Detection | Kaggle. (n.d.). Retrieved from https://www.kaggle.com/c/facial-keypoints-detection/data