Once the preprocessing is done, HOG feature vectors are computed for these images. Histogram of Oriented Gradients (HOG) describes an image patch by histograms of its local gradient directions, which capture shape and edge structure while being fairly robust to illumination changes.
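As a minimal sketch of this step, the snippet below extracts a HOG vector with scikit-image. The parameter values (orientations, cell and block sizes) are illustrative assumptions, not necessarily the ones used in this project.

```python
import numpy as np
from skimage.feature import hog

# A dummy grayscale crop standing in for one preprocessed image patch.
patch = np.random.rand(64, 64)

features = hog(
    patch,
    orientations=9,            # number of gradient-orientation bins
    pixels_per_cell=(8, 8),    # cell size over which histograms are built
    cells_per_block=(2, 2),    # blocks used for local contrast normalization
    block_norm="L2-Hys",
)
print(features.shape)          # → (1764,): one flat feature vector per patch
```

The output length follows from the geometry: an 8×8 grid of cells gives 7×7 overlapping 2×2 blocks, each contributing 2·2·9 = 36 values, so 7·7·36 = 1764.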
After computing the HOG feature vectors for these images, they are fed to a Support Vector Classifier from scikit-learn. A Support Vector Machine finds the maximum-margin boundary separating the two classes in feature space. We used a polynomial kernel of degree 3, which performed best among the kernels we tried.
Fitting produces a model that can later be used to predict whether the object to be detected (a phone) is present in an image or not.
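The training step can be sketched as follows; the synthetic feature vectors here only stand in for the real HOG vectors, and `probability=True` is an assumption made so the detection stage can rank cells by probability later.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for HOG vectors of "phone" and background patches.
X_pos = rng.normal(1.0, 0.5, size=(50, 1764))
X_neg = rng.normal(-1.0, 0.5, size=(50, 1764))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

# Degree-3 polynomial kernel, as in the text; probability=True enables
# predict_proba, used later to pick the most probable grid cell.
clf = SVC(kernel="poly", degree=3, probability=True)
clf.fit(X, y)
print(clf.predict(X[:2]))
```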
But the work is not yet done: the model still produces a lot of false positives. To get rid of these, we use a method called hard negative mining. As the name suggests, we mine for false positives, i.e. images that do not actually contain the object yet are classified as positive by our model. We run the freshly trained model over the negative images; for every negative image the predictor labels as positive, we add that image's feature vector to the negative training set and retrain the model. The retrained model is much more robust to false positives. It is saved as a .joblib file that can be imported later to predict whether a phone is present or not.
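A hedged sketch of that mining loop, using synthetic data in place of real HOG vectors; the function name and the .joblib filename are illustrative, not the project's actual identifiers.

```python
import numpy as np
from joblib import dump
from sklearn.svm import SVC

def hard_negative_mining(clf, X, y, neg_features):
    """Append false positives from `neg_features` as negatives and retrain."""
    preds = clf.predict(neg_features)
    hard = neg_features[preds == 1]          # "hard negatives": false positives
    if len(hard):
        X = np.vstack([X, hard])
        y = np.concatenate([y, np.zeros(len(hard), dtype=int)])
        clf.fit(X, y)                        # retrain on the augmented set
    return clf

# Synthetic stand-in data; real code would use the HOG vectors from above.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1, 1, (40, 16)), rng.normal(-1, 1, (40, 16))])
y = np.array([1] * 40 + [0] * 40)
clf = SVC(kernel="poly", degree=3, probability=True).fit(X, y)

neg_features = rng.normal(-0.5, 1.5, (100, 16))  # extra negatives to mine
clf = hard_negative_mining(clf, X, y, neg_features)
dump(clf, "phone_detector.joblib")  # hypothetical filename; load() it later
```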
To detect whether a phone is present in a given image, the image is divided into the same grid as in the training stage. HOG feature vectors are computed for all the grid cells and fed to the model for prediction. If the prediction is negative for every cell, the phone is not present in the image. If multiple adjoining grid cells are predicted positive, the cell with the maximum probability is selected as the location of the phone.
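The detection pass described above can be sketched like this. The grid size is an assumed parameter, and `hog_of` here is a toy mean-intensity feature standing in for the real HOG extractor, purely to keep the sketch self-contained and runnable.

```python
import numpy as np
from sklearn.svm import SVC

def detect(image, clf, hog_of, grid=(4, 4)):
    """Return (row, col) of the most probable positive grid cell, else None."""
    h, w = image.shape[:2]
    gh, gw = h // grid[0], w // grid[1]
    best, best_prob = None, 0.0
    for r in range(grid[0]):
        for c in range(grid[1]):
            patch = image[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw]
            feat = hog_of(patch).reshape(1, -1)
            if clf.predict(feat)[0] == 1:            # cell predicted "phone"
                prob = clf.predict_proba(feat)[0, 1]  # P(phone | cell)
                if prob > best_prob:                  # keep the best cell
                    best, best_prob = (r, c), prob
    return best

# Toy demo: the "phone" is a bright patch; hog_of is a stand-in feature.
hog_of = lambda patch: np.array([patch.mean()])
rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(0.1, 0.02, 20),
                    rng.normal(0.9, 0.02, 20)]).reshape(-1, 1)
y = np.array([0] * 20 + [1] * 20)
clf = SVC(kernel="poly", degree=3, probability=True).fit(X, y)

image = np.full((64, 64), 0.1)
image[16:32, 32:48] = 0.95       # bright patch lands in grid cell (1, 2)
print(detect(image, clf, hog_of))
```

Returning `None` when no cell is predicted positive mirrors the "phone not observed" case in the text.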