Approach

Our model detects face regions in a photo, crops each face, and classifies whether the face is wearing a mask. It then checks whether the mask is worn properly by detecting the position of the nose. The process consists of three phases.
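The flow described above can be sketched as a small function that chains the three phases. This is only an illustration: the function name, the label strings, and the four callables passed in (face detector, cropper, mask classifier, nose detector) are hypothetical placeholders, not the project's actual code.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def analyze_photo(
    image,
    detect_faces: Callable[[object], Sequence[Box]],
    crop: Callable[[object, Box], object],
    has_mask: Callable[[object], bool],
    nose_visible: Callable[[object], bool],
) -> List[Tuple[Box, str]]:
    """Run the three phases on one photo and label every detected face."""
    results = []
    for box in detect_faces(image):      # phase 1: face detection
        face = crop(image, box)
        if not has_mask(face):           # phase 2: mask vs. no mask
            label = "no mask"
        elif nose_visible(face):         # phase 3: is the nose exposed?
            label = "mask worn improperly"
        else:
            label = "mask worn properly"
        results.append((box, label))
    return results
```

Passing the phases in as callables keeps the sketch independent of any particular detector or classifier, so each phase can be swapped without touching the others.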

First Phase: Face detection
The MTCNN face detection model from facenet-pytorch is used to detect the face regions in the image.
We first tried the Haar Cascade Classifier for face detection, but it sometimes failed when the person in the image was not facing the camera. The MTCNN model handled these cases, so we used it instead of the Haar Cascade Classifier (see the “Experiments & Results” section for more detail).
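As a minimal sketch of this phase, the code below detects faces with the MTCNN class from facenet-pytorch and crops them out of the photo. The helper names, the 0.9 confidence cut-off, and the rounding and clamping of box coordinates are assumptions made for illustration, not settings taken from the project.

```python
from typing import List, Tuple


def to_crop_box(box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Round a float (x1, y1, x2, y2) box and clamp it to the image bounds."""
    x1, y1, x2, y2 = (int(round(float(v))) for v in box)
    x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
    y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
    return x1, y1, x2, y2


def detect_and_crop(image_path: str, min_prob: float = 0.9) -> List:
    """Detect faces with MTCNN and return the cropped PIL face images."""
    # Imported here so the helper above stays usable without these packages.
    from PIL import Image
    from facenet_pytorch import MTCNN

    image = Image.open(image_path).convert("RGB")
    detector = MTCNN(keep_all=True)        # keep every detected face
    boxes, probs = detector.detect(image)  # boxes is None when no face is found
    if boxes is None:
        return []
    crops = []
    for box, prob in zip(boxes, probs):
        if prob < min_prob:                # hypothetical confidence cut-off
            continue
        x1, y1, x2, y2 = to_crop_box(box, image.width, image.height)
        if x2 > x1 and y2 > y1:            # skip boxes that collapse to nothing
            crops.append(image.crop((x1, y1, x2, y2)))
    return crops
```

Clamping matters because a detector can return boxes that extend slightly past the image border, which would otherwise pad the crop with empty pixels.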

Second Phase: Face mask detection
VGG19 is used for face mask detection. On our test data, it has classified whether a person is wearing a mask very accurately so far.
We first built two ResNet models to see which performs better at detecting face masks. One started from untrained (randomly initialized) weights, and the other started from pretrained weights (transfer learning). The untrained model was also trained on a smaller dataset than the pretrained model. The pretrained ResNet34 model showed higher accuracy (99.6%) than the untrained ResNet50 model (69%), so we initially decided to use the pretrained ResNet model for face mask detection. However, when we applied transfer learning to VGG16 and VGG19, we got interesting results. Although VGG16 overfitted, VGG19 returned a slightly higher test accuracy than the ResNet34 model. VGG19 even reached 100% accuracy, but we need to keep in mind that the accuracy could differ on other datasets (see the “Experiments & Results” section for more detail).
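For readers unfamiliar with transfer learning, the sketch below shows one common way to adapt a pretrained torchvision VGG19 to a two-class mask problem. It assumes torchvision 0.13 or later (for the `weights=` argument). Freezing the convolutional layers, the class names, and their order are illustrative assumptions; the project's own training setup may differ.

```python
CLASS_NAMES = ["with_mask", "without_mask"]  # hypothetical label order


def build_mask_classifier(num_classes: int = len(CLASS_NAMES)):
    """Load an ImageNet-pretrained VGG19 and replace its last layer."""
    # Imported here so predicted_label below works without PyTorch installed.
    import torch.nn as nn
    from torchvision import models

    model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
    for param in model.features.parameters():
        param.requires_grad = False        # assumption: freeze the conv layers
    in_features = model.classifier[6].in_features
    model.classifier[6] = nn.Linear(in_features, num_classes)
    return model


def predicted_label(scores) -> str:
    """Map one row of class scores to the name of the highest-scoring class."""
    values = [float(s) for s in scores]
    return CLASS_NAMES[values.index(max(values))]
```

Only the new final layer (and the unfrozen classifier layers) would be updated during training, which is why a pretrained model can reach high accuracy with relatively few images.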

Final Phase: Face mask properly worn
To wear a face mask properly, the nose and the mouth should be completely covered. Thus, the Haar Cascade Classifier is used to check whether the person in the image is wearing the mask properly by detecting the person's nose and mouth. As mentioned before, the Haar Cascade Classifier struggles when the person is not facing the camera. A different model should therefore be used in the future to detect improperly worn masks, such as a mask worn below the nose or pulled under the chin onto the neck, with higher accuracy.
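The check described above might look roughly like the following with OpenCV's CascadeClassifier. The function names, the two XML path parameters, and the scaleFactor and minNeighbors values are assumptions for illustration; the rule that any detected nose or mouth means an improperly worn mask is a simplification of the idea in the text.

```python
def is_properly_worn(nose_boxes, mouth_boxes) -> bool:
    """A mask counts as properly worn when neither nose nor mouth is visible."""
    return len(nose_boxes) == 0 and len(mouth_boxes) == 0


def find_features(face_bgr, nose_xml: str, mouth_xml: str):
    """Run two Haar cascades on a cropped face image (BGR NumPy array)."""
    import cv2  # imported here so is_properly_worn works without OpenCV

    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    found = []
    for xml_path in (nose_xml, mouth_xml):  # hypothetical cascade file paths
        cascade = cv2.CascadeClassifier(xml_path)
        if cascade.empty():
            raise FileNotFoundError(f"could not load cascade: {xml_path}")
        # scaleFactor and minNeighbors are illustrative values, not tuned ones
        boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        found.append(list(boxes))
    return found[0], found[1]
```

A typical use would be `nose, mouth = find_features(face, nose_xml, mouth_xml)` followed by `is_properly_worn(nose, mouth)` for each face already classified as masked.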

This large 19 GB dataset, which labels different ways of wearing a mask (such as under the nose or under the chin), is a potential dataset for making our mask detection smarter.

Experiments & Results

All source code used in these experiments can be found in the experiment folder of the "face-mask-detection" repository on GitHub.

a) Face detection: Haar Cascade vs. MTCNN

By comparing the two output images after applying each face detection model, we could see which model performed better. While the Haar Cascade model detected only two faces out of the many people in the image, the MTCNN model detected almost everyone. The MTCNN model even detected the face of a person with her head down.

b) Face mask detection: Pretrained ResNet vs. Untrained ResNet

The untrained ResNet model and the pretrained ResNet model showed a distinct difference in train loss and test accuracy. Based on the graphs, the train loss of the untrained model was much higher than that of the pretrained model, and the test accuracy of the untrained model was around 67%, much lower than the 99.7% test accuracy of the pretrained model.

c) Face mask detection: ResNet-18 vs. ResNet-34 vs. VGG16 vs. VGG19

ResNet18: Avg. accuracy is 99.496%
ResNet34: Avg. accuracy is 99.899%
VGG16: Avg. accuracy is 99.798%
VGG19: Avg. accuracy is 100%

We tested four models for face mask detection, ResNet-18, ResNet-34, VGG16, and VGG19, to find the one that performs best. All the models achieved over 99% accuracy, but VGG19 had the highest accuracy at 100%. From VGG16's loss graph, we can see that the train loss is lower than the validation loss starting at epoch 6. From this we conclude that VGG16 is overfitting. This could be fixed by training on more images; however, that may cause the accuracy of the model to shift from what it is now. VGG19 is currently at 100% accuracy, but only on the limited datasets that we picked. When we test it on other datasets, the accuracy may change. Nonetheless, based on these results, VGG19 has the best performance in face mask detection.
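One rough way to read an overfitting signal off loss curves is to look for the first epoch from which training loss stays below validation loss. The sketch below implements that heuristic; the function name and the `patience` parameter are hypothetical, and the numbers in the usage lines are made up for illustration, not taken from our experiments.

```python
from typing import Optional, Sequence


def overfit_start(train_loss: Sequence[float], val_loss: Sequence[float],
                  patience: int = 3) -> Optional[int]:
    """Return the first 1-based epoch of a run of `patience` consecutive
    epochs in which training loss is below validation loss, or None."""
    run = 0
    for epoch, (t, v) in enumerate(zip(train_loss, val_loss), start=1):
        run = run + 1 if t < v else 0      # reset the run when the gap closes
        if run == patience:
            return epoch - patience + 1    # epoch where the run began
    return None


# Illustrative values only.
train = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.03]
val = [0.8, 0.55, 0.45, 0.28, 0.25, 0.24, 0.26, 0.27]
start = overfit_start(train, val)
```

Requiring several consecutive epochs avoids flagging a single noisy epoch. A gap between the two curves is only a hint, so it is worth confirming on a held-out test set.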

Discussion

Due to the pandemic situation, it is impossible to enter public facilities without wearing a face mask. It is not efficient to check people one by one to see if they are wearing a mask or not. Thus, the face mask detection model can be developed into a program that automatically determines if the person is properly wearing a mask or not by using a camera.

Changing from Haar cascade to MTCNN

As we saw earlier, changing from the Haar Cascade Classifier to MTCNN was a game changer. MTCNN detected far more people, and the people in the image did not need to be facing the camera. However, we are currently still using the Haar Cascade Classifier to tell whether people are wearing the mask properly. This is because MTCNN is so accurate that it locates people's noses and mouths even under the mask. As a result, it classified a person wearing a mask properly as “not properly wearing a mask”. Thus, our future work on this project consists of finding a good middle ground that combines the Haar Cascade Classifier and MTCNN, and of making a smarter model by training on more datasets.

Pretrained vs. Untrained

The difference between the untrained model and the pretrained model was immense. The train loss of the untrained model was above 1, while the train loss of the pretrained model was below 1. The accuracy of the untrained model peaked at 67%, while the accuracy of the pretrained model averaged 99%. Additionally, the pretrained model was more stable than the untrained model. For the pretrained model, the train loss kept decreasing as the epochs increased, unlike the untrained model's fluctuating train loss. The same goes for accuracy: the pretrained model held steady at 99%, but the untrained model often fluctuated between 33% and 67%.

ResNet vs VGG (VGG is more accurate but ResNet is faster)

Though we settled on VGG, ResNet was not strictly worse than VGG. ResNet still had 99.5% accuracy, which is excellent, and it was faster than VGG. According to the result photos below, VGG can detect face masks more accurately than ResNet, especially in crowded places, and both models can sometimes recognize an improperly worn mask without using Haar Cascade nose detection.

Special case

For a mask that makes a person look as if they are not wearing a mask (see the 4th photo), we noticed that ResNet classifies the person as not wearing a mask. VGG did better than ResNet because it correctly classified the person as wearing a mask. However, the Haar Cascade Classifier failed here and indicated that the mask was below the nose. Thus, as mentioned before, we need to find a good middle ground that combines the Haar Cascade Classifier and MTCNN, and make a smarter model by training on more datasets.

Presentation Video

References

Datasets

face-mask-12k-images-dataset: this dataset consists of almost 12K images and is used to train our models (ResNet18, ResNet34, VGG16, and VGG19).
With/Without Mask: this dataset consists of almost 1K images and is used to train our first ResNet50.

Source code

face-mask-detection: this repository contains the source code for the website, the face mask detector app, and the Google Colab notebooks used in the experiments.

Models

Face Recognition Using PyTorch: this repository contains the MTCNN model that we used to detect faces in a photo.
TORCHVISION.MODELS: this document describes how to load the pretrained VGG16/19 and ResNet18/34 models from the TorchVision module.
Repository for OpenCV's extra modules: this repository contains the Haar Cascade model, as an XML file, that we used to detect a person's nose in a photo.
Deep Residual Learning for Image Recognition: the research paper introducing ResNet.
Very Deep Convolutional Networks for Large-Scale Image Recognition: the research paper introducing VGG.

Others

An Efficient Face Mask Detector with PyTorch and Deep Learning: this research paper inspired us to use ResNet in the first experiment.

Detect Face Masks In Photos Using Computer Vision

Our Team

Hideyuki Komaki

Elizabeth Lin

Christopher Kim

Hyeona Jang