Hoosthere - An embedded security system based on face recognition

I. Abstract

The project aims to design and develop an embedded system that detects and recognizes human faces in a live video stream by implementing a state-of-the-art Human Face Recognition application. The project combines several major technologies and concepts: Internet of Things (IoT), using a Raspberry Pi board with its complementary modules, namely the Pi Camera and an LCD touch screen, and mobile application development, using Swift 3 on iOS devices. The system collects visual input via the camera module attached to the Raspberry Pi board and then performs Human Face Recognition while interacting with the mobile application in real time by means of visual and audio notifications. Furthermore, the project extends its scope to speaker and speech recognition to add further security layers to the application. This report discusses the main motivations and the methods applied in the project, and evaluates the application in terms of achieving an acceptable success rate in face and speaker recognition through a robust mobile and IoT application.

II. Introduction

a. Background

Human Face Recognition (HFR) has been one of the most active research topics in computer science over the past several decades, as one of its most challenging and fastest-growing areas of study, and it has also offered practical solutions in some of the most controversial areas of society, e.g. information security, surveillance, and identification systems [1][2]. As Sharma et al. indicate, one of the fascinating abilities of human vision and the brain is to successfully detect and recognize faces, especially human faces, in various environments, even from unfavorable angles or under poor lighting conditions. Not surprisingly, designing a machine that accomplishes the same task with comparable success is not easy. Thus, Mirhosseini et al. claim that Human Face Recognition is likely to maintain, if not increase, its significance in computer vision and image processing research in the future [3]. Nevertheless, computer scientists, including Turk et al., Kirby et al. and Belhumeur et al., have solved this problem to a certain degree and have proposed various methods and theories over the last two decades. Although a large number of different algorithms and methods have been proposed for human face recognition, nearly all approaches follow the same workflow: detect and then identify human faces in given video inputs by running algorithms against a database of faces, often called the training set. This process usually involves face detection in captured scenes and feature extraction from various regions of the face, e.g. eyes, nose, eyebrows, and mouth. Over the last two decades, with the rapid improvements in machine learning and artificial intelligence, as Rowley et al. revealed, the focus has shifted to the design of machine learning algorithms that automatically detect and recognize faces after being trained on a large set of images [9].

b. Literature Review

Among current alternatives, appearance-based face recognition methods are usually considered fairly efficient and accurate approaches to HFR. The Eigenfaces and the Fisherfaces methods are two of the leading appearance-based approaches and, with the emergence of deep neural network methods, their efficiency and accuracy are currently being debated. The Eigenfaces method, which is based on the Principal Component Analysis (PCA) algorithm, recognizes human faces by expressing them as linear combinations of the singular vectors of each set of faces. The Fisherfaces method, on the other hand, performs dimension reduction for distinguishing images by applying Linear Discriminant Analysis (LDA). In this project, the Fisherfaces method is chosen due to its practicability, accuracy, and capability to recognize human faces under various experimental conditions, such as different photo angles, lighting conditions, and processing performance. The Fisherfaces method reduces dimensions by means of a linear projection while still preserving linear separability among the projected vectors. LDA, a term often used interchangeably with the Fisherfaces method, aims to distinguish the classes in the data by shaping the scatter both within and between the classes to provide a better classification.

More formally, the Fisherfaces method defines two quantities: the scatter between classes (between-class scatter) and the scatter within classes (within-class scatter). As defined by Belhumeur et al., the main difference between the Eigenfaces and the Fisherfaces approaches is that the Fisherfaces method applies a class-specific linear projection to the sample set [4]. The optimal projection, Wopt, is then chosen as the matrix with orthonormal columns that maximizes the ratio of the determinant of the between-class scatter to the determinant of the within-class scatter of the projected samples [4][5]. The variance among the faces in the sample image space may be related to external factors, such as facial expressions, accessories, or changing illumination. Moreover, the variation due to such external factors may exceed the variation between the faces of different people, which is undesirable. Hence, LDA seeks a basis for a linear projection that minimizes the variation within classes while preserving the variation between classes. Instead of modeling the overall variation in the images, as the Eigenfaces model does, it projects the given sample set of images linearly into a lower-dimensional subspace in a way that highlights the deviations between classes. In doing so, Belhumeur et al. argue, the Fisherfaces model eliminates the influence of external factors to a certain degree by minimizing the variation within classes, and therefore enhances the accuracy of face recognition applications under varying conditions.
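The scatter matrices and the optimal projection described above can be written out as follows, after Belhumeur et al. [4]. The notation is an assumption made for this sketch: there are c classes, class X_i contains N_i samples x_k and has mean \mu_i, and \mu is the mean of all samples.

```latex
S_B = \sum_{i=1}^{c} N_i \, (\mu_i - \mu)(\mu_i - \mu)^T

S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T

W_{opt} = \arg\max_{W} \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|}
```

Here S_B is the between-class scatter and S_W is the within-class scatter. A large numerator spreads the class means apart after projection, while a small denominator keeps the samples of each class close together.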

As for face detection, the Haar-like feature technique offers a practicable and computationally efficient way to detect faces, and objects in general, in a given image. Intuitively, feature calculation and extraction are computationally expensive when working solely on image intensities, i.e. the RGB values at each pixel. As Papageorgiou et al. [7] propose, a feature set based on Haar wavelets can accomplish such tasks with less computational effort than generic image-intensity methods. Viola and Jones adopted this concept and built their method on Haar wavelets, also known as Haar-like features [6]. To put it simply, the Haar-like features method analyzes rectangular regions at specific locations in the image, sums the pixel intensities in each region, and calculates the difference between distinctly patterned areas of the image. Since eyes are relatively darker than cheeks, and noses and eyebrows are lighter than eyes, such regions are likely to form two adjacent Haar-like rectangles when the method is applied. These assumptions can then be used to build a model for a specific task, such as face detection, and thereby reduce the computational effort required for object detection in images.

In the project, the Haar-like features model, a term used interchangeably with Haar-cascade, is preferred due to its advantages over other methods, such as requiring less computational effort and making it easy to define new Haar models. The project later uses this flexibility to implement a detection algorithm for cats.
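To make the rectangle arithmetic concrete, here is a minimal pure-Python sketch of a two-rectangle Haar-like feature. It uses a summed-area table (the integral image from Viola and Jones [6]) so that any rectangle sum costs four lookups. The function names and the toy 4x4 patch are assumptions made for illustration and are not taken from the project code.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle feature: lower half minus upper half of the window."""
    half = h // 2
    upper = rect_sum(ii, x, y, w, half)
    lower = rect_sum(ii, x, y + half, w, half)
    return lower - upper


# Toy patch (hypothetical values): a dark band, like the eyes,
# above a bright band, like the cheeks.
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
ii = integral_image(patch)
feature = two_rect_feature(ii, 0, 0, 4, 4)
print(feature)
```

A large positive value indicates a dark-over-bright pattern inside the window, while a uniform patch gives a value near zero. A cascade combines many such features evaluated at different positions and scales.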

III. System Design

a. Software Design

In terms of software design, the project is composed of three main components: the face detection and recognition algorithm on the Raspberry Pi 3 board, the backend on the cloud, and the mobile application that enables users to interact with their visitors in real time. To start with, the Raspberry Pi 3 board hosts the main code, based on Python and the OpenCV libraries, for the face and speaker recognition algorithms of the application. The main reason for deploying the most computationally demanding parts of the application on the Pi board, rather than on the cloud, was concern over the security of user data. In doing so, the project ensures that the data of each visitor is securely protected and not shared against the will of any user. However, this approach comes at a cost: performance on large data sets is slow on the Pi board due to its computational limits. The application would likely run much faster if cloud services were used for these parts, and the computational power of the cloud would also allow better recognition methods, e.g. deep neural networks, to be applied. Nonetheless, the application is observed to perform with acceptable accuracy and speed under varying conditions, such as different numbers of sample sets and of people in the system.

As for the cloud services, Amazon Web Services (AWS) is chosen as the sole cloud provider for several reasons, namely the free tier packages available for non-commercial use, the deployment speed and performance on the cloud, high scalability if needed, and strong security measures. The main AWS products used in the project are Simple Notification Service (SNS), an EC2 Amazon Linux server, and Simple Storage Service (S3), all provided under the umbrella of Elastic Beanstalk (EB). While EB provides the backbone of the backend by gathering some of the most useful features of AWS in one package, each of these components can be modified individually. EC2 serves as the main Linux host for the cloud; SNS provides the notification system for the application; and S3 holds the transient data collected by the cloud before it is sent to the mobile application. Last but not least, Django, an open-source web application framework for Python, is chosen for coding the backend due to its available libraries, its community, and its flexibility to work with other languages and technologies.

On the mobile side, since the team members are more comfortable with the iOS platform, iOS and Swift 3 were the mobile platform and the language used throughout the project. Despite being relatively simple, the mobile application accomplishes its designed tasks successfully, e.g. notifying the users when needed, storing the information of recent visitors, and allowing house owners to communicate with the visitor in real time via messages.

b. Hardware Design

Compared to the software design of the project, the hardware design is relatively simple and consists of fewer parts. Nevertheless, all the hardware components are crucial for the application, as they make up the physical interface of the project. The main parts used are a Raspberry Pi 3 Model B, a Raspberry Pi LCD touch screen, and a Pi Camera. Further, some extra components are added to the artefact to enhance its appearance and extend the scope of the project, e.g. a 3D printed box, a sound adapter and a microphone. Although some difficulties were experienced during the project, the overall results are considered fairly satisfying in terms of both appearance and functionality.

The two major obstacles encountered in the project were creating the 3D printed box and functionality issues related to the touch screen. Firstly, the box design had to be revised repeatedly to keep up with the modifications made to the hardware design. Although we were quite comfortable with redesigning the box from scratch a few times, the main issue was our inability to print the designs on campus. Nonetheless, the box was realized after several attempts at both design and printing. Secondly, the application faced functionality problems due to issues with the touch screen, e.g. an unresponsive touch screen and delayed responses.

To draw the 3D design for the hard cover, Siemens NX, a software tool for integrated product design, engineering and manufacturing, is chosen. After the dimensions of the hardware were carefully measured, the box was designed so that the hardware fits tightly against its inner walls in the most stable way. Further, several holes are placed both on the sides and on the front cover to expose the input slots of the Pi board and to mount the camera module above the touch screen.

IV. Analysis & Results

At the end of the semester, we delivered three major products: a Raspberry Pi with a screen and a camera that is capable of recognizing human faces and interacting with the cloud, a mobile application that allows users to control the Raspberry Pi, and a backend server that acts as a bridge between the Raspberry Pi and the mobile application. Moreover, in terms of performance and features, we delivered more than we promised in our proposal. These three parts of the project are analyzed step by step in the rest of this section.

The first and most important part of the project is the customized Raspberry Pi and the software running on it. We created a black box containing the Raspberry Pi with a camera and a touch screen. All hardware components worked flawlessly throughout the project except the touch feature of the screen; we had to use a mouse when it stopped working. The software running on the device was written entirely in Python. The following stages are repeated while the system is running.

1. Face and cat detection: The Haar-cascade method is used for both face detection and cat detection. We configured the detector to require at least 13 matching feature points for a detection. For face detector training we used the MUCT Face Database [8], and for cat detector training we used the Oxford-IIIT Pet Dataset [9]. With these configurations, we obtained remarkably good true positive rates in testing: 99.6% for human faces and 98.8% for cats. Detection on the Raspberry Pi takes less than 100 milliseconds per video frame on average.
2. Face recognition: As mentioned earlier, we implemented the Fisherfaces recognition algorithm for this job, and it produced highly acceptable results. During the algorithm tests we observed that the true positive rate increases with the number of training images and decreases with the number of classes, i.e. residents. Face recognition takes 138 milliseconds per face on average. The test results with respect to these variables are given in Table 1. In each test we used 50 images per person, all different from the training images.
3. Voice recognition: We also added a voice recognition layer as an extra level of security. Its effect on the test results can be seen in Table 2. Voice recognition takes 8 seconds on average, depending on the length of the test sentence.
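The reported timings can be combined into a rough per-visit latency estimate. The sketch below assumes that the three stages run one after another and that their average times simply add up; the source does not state how the stages are scheduled, so this is an illustration only. The constant and function names are hypothetical.

```python
# Average timings reported above, in milliseconds.
DETECTION_MS = 100      # upper bound per video frame (reported as "less than 100")
RECOGNITION_MS = 138    # per detected face
VOICE_MS = 8000         # per voice check, depends on sentence length


def estimate_latency_ms(num_faces, with_voice=False):
    """Rough upper-bound estimate, assuming strictly sequential stages."""
    total = DETECTION_MS + RECOGNITION_MS * num_faces
    if with_voice:
        total += VOICE_MS
    return total


print(estimate_latency_ms(1))                   # one face, no voice check
print(estimate_latency_ms(2, with_voice=True))  # two faces plus voice check
```

Under these assumptions the voice check dominates the total, which suggests why it is treated as an optional extra layer rather than part of the per-frame loop.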

Secondly, we developed an iOS application to give the user control over the Raspberry Pi. The app notifies the user when there is a visitor and allows the user to communicate with the visitor. It lists every visit on the main page, and it both retrieves information from and sends information to the AWS services. Unless there is a problem with the internet connection, all interactions between the phone and the cloud complete in under 200 milliseconds.

Lastly, we created a backend service on AWS with Python frameworks. The service keeps all the information in a database. Communication between the cloud and the phone or the Raspberry Pi is done via REST API endpoints.
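As an illustration of the REST communication, the following sketch builds a JSON POST request of the kind the Raspberry Pi might send to the backend when a visitor appears. The base URL, the `/visits/` path and the payload fields are all hypothetical; the actual endpoints of the project are not documented here. Only the Python standard library is used, and the request is constructed but not sent.

```python
import json
from urllib import request

API_BASE = "https://example.com/api"  # hypothetical base URL, not the project's


def build_visit_request(visitor_name, recognized):
    """Build (but do not send) a JSON POST request describing a visit."""
    payload = {"visitor": visitor_name, "recognized": recognized}  # assumed fields
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url=f"{API_BASE}/visits/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_visit_request("unknown", False)
print(req.get_method(), req.full_url)
# Sending would be: request.urlopen(req, timeout=5)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access.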

V. Conclusions

As a result, Hoosthere has fulfilled all the promises made in the proposal form: creating a mobile app, implementing face recognition on the Raspberry Pi, and establishing a solid backend service. Additionally, we implemented a feature that detects cats and grants a different type of access. We allowed our users to view short videos of their visitors on the mobile phone to add another security layer to our application. Moreover, the speaker recognition feature, which was added later in the project, increases the security level remarkably. Overall, we believe that we have created an end-to-end product with exciting features that can be delivered directly to customers. In the future, we want to improve our product by replacing the Fisherfaces face recognition algorithm with a more reliable neural network model. We could also add an infrared sensor to the face recognition process to check whether the recognized person is real or just a picture, which would harden the system against photo spoofing without requiring an additional algorithm. Lastly, instead of using a Raspberry Pi, which comes with many features we do not need, we could design and produce our own circuit board. In conclusion, although the project encountered a large number of difficulties during its course, we are proud to have developed a fully working product that can easily be delivered to end users.

VI. References

[1] Sharma, R., and M. S. Patterh. "A broad review about face recognition – feature extraction and recognition techniques." The Imaging Science Journal 63.7 (2015): 361-77.
[2] Wang, Xing, Meng Yang, and Linlin Shen. "Structured regularized robust coding for face recognition." Neurocomputing 216 (2016): 18-27.
[3] Mirhosseini, Ali Reza, Hong Yan, Kin-Man Lam, and Tuan Pham. "Human Face Image Recognition: An Evidence Aggregation Approach." Computer Vision and Image Understanding 71.2 (1998): 213-30.
[4] Belhumeur, P. N., J. P. Hespanha, and D. J. Kriegman. "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection." IEEE Trans. Patt. Anal. Mach. Intell. 19 (1997): 711-720.
[5] Sharkas, M., and M. A. Elenien. "Eigenfaces vs. Fisherfaces vs. ICA for face recognition: a comparative study." Proc. Signal Proc. (2008).
[6] Viola and Jones. "Rapid object detection using a boosted cascade of simple features." Computer Vision and Pattern Recognition (2001).
[7] Papageorgiou, Oren, and Poggio. "A general framework for object detection." International Conference on Computer Vision (1998).
[8] "The MUCT Face Database." Milbo.org, 2017. [Online]. Available: http://www.milbo.org/muct/. [Accessed: 31 May 2017].
[9] "Visual Geometry Group: Oxford-IIIT Pet Dataset." Robots.ox.ac.uk, 2017. [Online]. Available: http://www.robots.ox.ac.uk/~vgg/data/pets/. [Accessed: 31 May 2017].
