
Hideyuki Komaki
UW Seattle
It has been more than a year since the WHO declared a pandemic on March 11, 2020. According to the WHO, there have been more than 170 million confirmed cases of COVID-19 globally, including 3.69 million deaths. Because the virus that causes COVID-19 spreads through respiratory droplets, wearing a face mask in everyday life has been emphasized. According to the scientific journal The Lancet, a properly worn mask reduces the risk of infection by about 85 percent. This made us curious: can we build a model that detects whether people in an image are wearing face masks? And if they are, can the model detect whether the mask is worn properly? We used the Face Mask 12k Kaggle dataset to get photos of people with and without face masks.
Our model detects face regions in a photo, crops each face, and classifies whether the face is wearing a mask. It then checks whether the mask is worn properly by detecting the position of the nose. The process therefore has three steps: face detection, mask classification, and nose detection.
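The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names (`crop`, `assess_faces`) and the three callables passed in (`detect_faces`, `has_mask`, `nose_visible`) are hypothetical stand-ins for the face detector, the mask classifier, and the nose detector, and the image is assumed to be a simple grid of pixel rows.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x, y, width, height) in pixels
Image = Sequence[Sequence[object]]   # rows of pixels; stand-in for an image array


def crop(image: Image, box: Box) -> List[Sequence[object]]:
    """Return the part of `image` covered by `box`, clamped to the image bounds."""
    x, y, w, h = box
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = max(x + w, 0), max(y + h, 0)
    # Slicing past the end of a sequence is safe in Python, so only the
    # lower bounds need explicit clamping.
    return [row[x0:x1] for row in image[y0:y1]]


def assess_faces(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],   # hypothetical: step 1
    has_mask: Callable[[Image], bool],            # hypothetical: step 2
    nose_visible: Callable[[Image], bool],        # hypothetical: step 3
) -> List[Tuple[Box, str]]:
    """Label every detected face as unmasked, properly masked, or improperly masked."""
    results = []
    for box in detect_faces(image):        # step 1: face detection
        face = crop(image, box)
        if not has_mask(face):             # step 2: mask classification
            label = "no mask"
        elif nose_visible(face):           # step 3: nose detection
            label = "mask worn improperly"
        else:
            label = "mask worn properly"
        results.append((box, label))
    return results
```

Passing the three stages in as callables keeps the sketch independent of any particular detector or classifier, so each stage can be swapped without touching the decision logic.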
All source code used in this experiment can be found in the experiment folder of the "face-mask-detection" repository on GitHub.
Comparing the two output images, one per face detection model, showed which model performed better. While the Haar Cascade model detected only two faces among the many people in the image, the MTCNN model detected almost everyone. The MTCNN model even detected the face of a person with her head down.
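The comparison above was made by eye. If hand-labelled face boxes were available for a test image (an assumption; the experiment does not mention any), the same comparison could be expressed as a number: the fraction of labelled faces that a detector finds. The sketch below matches detections to labelled boxes by intersection over union (IoU); the function names and the 0.5 threshold are illustrative choices, not part of the original experiment.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def recall(detected: List[Box], truth: List[Box], threshold: float = 0.5) -> float:
    """Fraction of labelled faces matched by a distinct detection.

    Greedy matching: each labelled box takes the best remaining detection,
    and the pair counts as a hit when their IoU reaches `threshold`.
    """
    unused = list(detected)
    hits = 0
    for t in truth:
        best = max(unused, key=lambda d: iou(d, t), default=None)
        if best is not None and iou(best, t) >= threshold:
            unused.remove(best)
            hits += 1
    return hits / len(truth) if truth else 0.0
```

Running `recall` once on the Haar Cascade boxes and once on the MTCNN boxes for the same image would give two directly comparable scores.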
The untrained ResNet model and the pretrained ResNet model showed a distinct difference in train loss and test accuracy. Based on the graphs, the train loss of the untrained model was much higher than that of the pretrained model, and the test accuracy of the untrained model was around 67%, much lower than the 99.7% test accuracy of the pretrained model.
| Model | Avg. accuracy |
|---|---|
| ResNet18 | 99.496% |
| ResNet34 | 99.899% |
| VGG16 | 99.798% |
| VGG19 | 100% |
We tested four models for face mask detection, ResNet-18, ResNet-34, VGG16, and VGG19, to find the one that performs best. All of them achieved over 99% accuracy, and VGG19 had the highest at 100%. In VGG16's loss graph, the train loss is lower than the validation loss from epoch 6 onward, which suggests that VGG16 is overfitting. Training on more data could fix this, although the accuracy of the model may then differ from what it is now. VGG19 is currently at 100% accuracy, but only on the limited dataset we picked; on other datasets the accuracy may change. Nonetheless, based on these results, VGG19 has the best performance in face mask detection.
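The overfitting observation for VGG16 can also be checked from the logged numbers instead of the graph. The sketch below assumes the comparison is between per-epoch training loss and validation loss, and reports the first epoch from which validation loss stays above training loss. The function name and the `patience` parameter are hypothetical, and a gap between the two losses is only a rough signal; a validation loss that keeps rising is stronger evidence.

```python
from typing import Optional, Sequence


def overfitting_onset(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    patience: int = 2,
) -> Optional[int]:
    """Return the 1-based epoch at which validation loss first exceeds
    training loss for `patience` consecutive epochs, or None if it never does."""
    run = 0  # length of the current streak of epochs with val > train
    for epoch, (t, v) in enumerate(zip(train_loss, val_loss), start=1):
        run = run + 1 if v > t else 0
        if run == patience:
            return epoch - patience + 1  # first epoch of the streak
    return None
```

Requiring a short streak rather than a single epoch avoids flagging a one-off crossing of the two curves.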
During the pandemic, many public facilities cannot be entered without a face mask, and checking people one by one is inefficient. The face mask detection model could therefore be developed into a program that uses a camera to determine automatically whether a person is wearing a mask properly.
As we saw earlier, switching from the Haar Cascade Classifier to MTCNN was a game changer: it detected far more people, and they did not need to be facing the camera. However, we still use the Haar Cascade Classifier to tell whether a mask is worn properly. This is because MTCNN locates people's noses and mouths even under a mask, so it classified a person properly wearing a mask as “not properly wearing a mask”. Our future work therefore consists of finding a good middle ground between the Haar Cascade Classifier and MTCNN and making a smarter model by training with more data.
The difference between an untrained model and a pretrained model was immense. The train loss of the untrained model was above 1, while the train loss of the pretrained model was below 1. The accuracy of the untrained model peaked at 67%, while the accuracy of the pretrained model averaged 99%. The pretrained model was also more stable: its train loss kept decreasing as the epochs progressed, unlike the untrained model's fluctuating train loss. The same goes for accuracy: the pretrained model held a steady 99%, but the untrained model often fluctuated between 67% and 33%.
Though we settled on VGG, ResNet was not strictly worse. ResNet still reached 99.5% accuracy, and it was faster than VGG. According to the result photos below, VGG detects face masks more accurately than ResNet, especially in crowded places, and both models can sometimes recognize an improperly worn mask without using Haar Cascade nose detection.
For a mask that makes a person look as if they are not wearing one (see the 4th photo), ResNet classified the person as not wearing a mask. VGG did better, because it correctly classified the person as wearing a mask. However, the Haar Cascade Classifier partly failed and indicated that the mask was below the nose. Thus, as mentioned before, we need to find a good middle ground between the Haar Cascade Classifier and MTCNN and make a smarter model by training with more data.