Face Recognition

One other very important application of Convolutional NN is face recognition.

Face Recognition is not the same as Face Verification:

  • FR has a database of K persons
  • Gets an input image
  • Output ID if the image is any of the K persons (or person not recognized)
  • In FR you only get one input image and you need to identify it, this is called a One Shot Learning and NN are not good at just one input data which is a tiny dataset
  • This is solved by learning a similarity function
  • So we measure the (d) = degree of difference between the input and the images in the database
  • Then you compare the image to all the others and select the one with the smallest d
  • We can do that with the Siamese Network

Siamese Network

  • We input an image X1 and end up with a feature vector of 128 units
  • Then you feed the second image through the network and arrive at the output feature vector
  • So now we have two encodings of the two images then we calculate the difference between the two encodings
  • Note these two networks below have the same parameters because all we do in the second instance is just feed an image

  • In a nutshell we have to learn the parameters so that the difference is small if xi and xj are the same person

One way to learn the parameters so they give us a good encoding for our pictures of faces, is to define and apply gradient descent on the triplet loss function.

Triplet Loss Function


To learn the parameters of the NN, you have to look at several pictures at the same time. So you want to use images of the same person. So you want to look at one anchor image and one positive image (meaning of the same person) and you calculate the difference. Then you want to compare the anchor image to a negative which has a much larger difference. So we have an anchor, positive and negative APAN. Where you want the distance of the A & P to be less than the distance of the A & N.

In the image below we want the difference to be much smaller than 0. We set a margin \(\alpha\) to force the distance pairs to be further apart from each other.

So you have to have multiple images of the same person or it will not work.

In order to have an effective NN we need to choose hard to train triplets and not randomly, because randomly chosen ones might contain some that are an easy match

So you choose pairs of same and different persons to train a triplet loss NN