Sunday, June 9, 2013

Improving an Object Detector and Extracting Regions using Superpixels (CVPR 2013)

As usual, there will be short papers in our journal club on Thursday. The one I picked, titled "Improving an Object Detector and Extracting Regions using Superpixels", is from the upcoming CVPR. The authors are Guang Shu, Afshin Dehghan and Mubarak Shah from the Computer Vision Lab at the University of Central Florida. A PDF can be obtained here.
Their goal is to improve offline-trained object detectors such as the Deformable Parts Model (DPM), which are generally trained on data that do not necessarily represent the test data they are later applied to. Variations in illumination, background and camera viewpoint degrade detector performance. The authors' approach can be summarized in four steps:

Initial Detection

Using a DPM detector with a low threshold \(t_d\) (design parameter), they obtain a large number of true detections but also many false alarms. Based on the detector's confidence scores, each detection is classified as either a positive or a hard example. Negative examples are taken from the background, without overlapping any detection.
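A minimal sketch of this initial step, assuming detections arrive as `(x1, y1, x2, y2)` boxes with DPM scores. The post does not give the score that separates positives from hard examples, so the threshold `t_pos` is hypothetical, as is sampling fixed-size random negative boxes; "without overlap" is read as zero intersection with every detection.

```python
import random

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def split_detections(boxes, scores, t_d=-2.0, t_pos=0.0):
    """Discard scores below t_d; confident detections are positives, the rest hard.

    t_pos is a hypothetical parameter, not taken from the paper.
    """
    positives, hard = [], []
    for box, s in zip(boxes, scores):
        if s < t_d:
            continue  # below the (deliberately low) detection threshold
        (positives if s >= t_pos else hard).append(box)
    return positives, hard

def sample_negatives(detections, img_w, img_h, box_w, box_h, n, seed=0, max_tries=10000):
    """Draw random background boxes that do not overlap any detection."""
    rng = random.Random(seed)
    negatives = []
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        x = rng.uniform(0, img_w - box_w)
        y = rng.uniform(0, img_h - box_h)
        cand = (x, y, x + box_w, y + box_h)
        if all(iou(cand, d) == 0.0 for d in detections):
            negatives.append(cand)
    return negatives
```

The low `t_d` is what makes the hard set large: most false alarms land there, and the later steps have to sort them out.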

Superpixels and Appearance Model

Using SLIC superpixels, each target is segmented into a given number \(N_{sp}\) of superpixels (design parameter, chosen such that each superpixel is roughly uniform in color and preserves object boundaries). Each superpixel is described by a five-dimensional feature vector containing the average CIELAB color value and the average location of all its pixels.
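The five-dimensional descriptor can be sketched with NumPy alone, assuming the target crop has already been converted to CIELAB and segmented, for example with scikit-image's `rgb2lab` and `slic(image, n_segments=100)`. Locations here are raw pixel coordinates inside the crop; whether the authors normalize them is not stated in the post.

```python
import numpy as np

def superpixel_features(lab, labels):
    """Return one 5-D feature (mean L, a, b, mean x, mean y) per superpixel.

    lab    : (H, W, 3) float array, the target crop in CIELAB
    labels : (H, W) int array of superpixel ids 0..K-1 (e.g. from SLIC)
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index of every pixel
    flat = labels.ravel()
    k = int(flat.max()) + 1
    # pixels per superpixel; clamp to 1 so unused ids do not divide by zero
    counts = np.maximum(np.bincount(flat, minlength=k), 1).astype(float)
    feats = np.empty((k, 5))
    for c in range(3):  # mean L, a, b
        feats[:, c] = np.bincount(flat, weights=lab[..., c].ravel(), minlength=k) / counts
    feats[:, 3] = np.bincount(flat, weights=xs.ravel(), minlength=k) / counts  # mean x
    feats[:, 4] = np.bincount(flat, weights=ys.ravel(), minlength=k) / counts  # mean y
    return feats
```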
Using k-means, an \(M\)-word vocabulary (design parameter) is created, and the superpixels of each target are aggregated into an \(M\)-bin, L2-normalized histogram, i.e. a Bag-of-Words (BoW) representation.
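A sketch of the vocabulary and histogram step, using a plain Lloyd's k-means written out in NumPy (an assumption; any k-means implementation would do). Each superpixel votes for its nearest word, and the counts are L2-normalized.

```python
import numpy as np

def kmeans(X, m, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the (m, d) cluster centres (the visual words)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(m):
            members = X[assign == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(0)
    return centers

def bow_histogram(feats, vocab):
    """Assign each superpixel to its nearest word, count, then L2-normalize."""
    d = ((feats[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(vocab)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

In the paper's setting the vocabulary would be built from the superpixel features of all training examples with \(M = 400\), and each target then yields one 400-bin histogram.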

Classification

A support vector machine (SVM) is trained on the positive and negative examples and then used to classify the hard examples: hard examples with high scores are labeled positive, those with low scores negative. The SVM is retrained with the newly labeled examples, and this is repeated until no example label changes anymore.
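A sketch of this self-training loop on BoW histograms. To stay dependency-free it trains a linear SVM with Pegasos-style sub-gradient descent; the post does not say which SVM variant or kernel the authors use, and splitting "high" from "low" scores at 0 is also an assumption.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the hinge loss (y in {-1, +1})."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # constant feature acts as the bias
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1  # margin check uses w before the update
            w *= 1.0 - eta * lam
            if violates:
                w += eta * y[i] * Xb[i]
    return w

def svm_score(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def label_hard_examples(pos, neg, hard, max_rounds=10):
    """Label hard examples (+1/-1), retraining with them until labels stop changing."""
    labels = np.zeros(len(hard), dtype=int)  # 0 = not labeled yet
    for _ in range(max_rounds):
        known = labels != 0
        X = np.vstack([pos, neg, hard[known]])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg)), labels[known]])
        w = train_linear_svm(X, y)
        new = np.where(svm_score(w, hard) >= 0, 1, -1)
        if np.array_equal(new, labels):
            break  # labels stable: stop retraining
        labels = new
    return labels
```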

Region Extraction

To get an estimate of the actual shape of the target rather than just a bounding box, a confidence map is computed that gives, for each superpixel, the confidence that it belongs to the target. First, all superpixels of the negative examples are clustered into \(M_n\) (design parameter) clusters using their CIELAB color features. For every superpixel of a positive example, the similarity to each of the negative cluster centers is measured by
\[ W_{i,j} = \exp\left(-\|Sp(i) - clst(j)\|\right)\times prior(j)\]
where \(W_{i,j}\) is the corresponding entry of the similarity matrix, \(Sp(i)\) is the i-th superpixel of a positive example, \(clst(j)\) is the j-th negative cluster center, and \(prior(j)\) is the prior probability that cluster j belongs to the background, derived from the number of superpixels in that cluster.
The similarity matrix is then used to calculate the confidence \(Q(i)\) that superpixel i belongs to the target:
\[Q(i) = 1 - \max_j W_{i,j}\]
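A sketch of the confidence map, assuming the similarity decays with color distance and is weighted by the cluster prior, \(W_{i,j} = \exp(-\|Sp(i) - clst(j)\|)\times prior(j)\), and assuming the prior is each cluster's share of all background superpixels. The bandwidth `sigma` is a hypothetical addition (raw CIELAB distances are large, so \(\exp(-d)\) decays very quickly); `sigma=1` reproduces the formula exactly.

```python
import numpy as np

def target_confidence(pos_colors, neg_centers, neg_counts, sigma=1.0):
    """Q(i) = 1 - max_j W[i, j] for every superpixel of a positive example.

    pos_colors  : (n, 3) mean CIELAB color of each superpixel of the target
    neg_centers : (M_n, 3) k-means centres of the background superpixel colors
    neg_counts  : (M_n,) number of background superpixels in each cluster
    sigma       : hypothetical distance scale, not from the paper
    """
    prior = neg_counts / neg_counts.sum()  # assumed: cluster size as a fraction
    dist = np.linalg.norm(pos_colors[:, None, :] - neg_centers[None, :, :], axis=-1)
    W = np.exp(-dist / sigma) * prior[None, :]  # similarity to each background cluster
    return 1.0 - W.max(axis=1)
```

A superpixel whose color matches a large background cluster gets a low \(Q\); one unlike any background cluster gets \(Q\) close to 1.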
The confidence map is used to form the unary potentials \(\Psi(c_i|s_i)\), while pairwise edge potentials \(\Phi(c_i,c_j|s_i,s_j)\) enforce smoothness between neighboring superpixels. The most probable labeling is obtained by minimizing the energy of the resulting Conditional Random Field (CRF):
\[E=\sum_{s_i\in Sp}\Psi(c_i|s_i)+\omega\sum_{(s_i,s_j)\in Edge}\Phi(c_i, c_j|s_i, s_j)\]
Here \(c_i\) is the label of superpixel \(s_i\), and \(\Phi\) is specified in the paper. The weight \(\omega\) between the unary and pairwise terms is another design parameter.
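The post does not reproduce \(\Phi\), so the following sketch swaps in assumptions throughout: a Potts pairwise term (cost \(\omega\) whenever two neighbors disagree), unaries \(-\log Q(i)\) for "target" and \(-\log(1-Q(i))\) for "background", and iterated conditional modes (ICM), a greedy minimizer that only reaches a local minimum. It is not necessarily the solver the authors use; for a binary Potts energy like this one, graph cuts would find the exact minimum.

```python
import numpy as np

def unary_costs(Q, eps=1e-6):
    """Column 0: cost of labeling background, column 1: cost of labeling target."""
    Q = np.asarray(Q, dtype=float)
    return np.stack([-np.log(1.0 - Q + eps), -np.log(Q + eps)], axis=1)

def crf_energy(labels, unary, edges, omega):
    """Unary sum plus omega times the number of edges whose endpoints disagree (Potts)."""
    e = unary[np.arange(len(labels)), labels].sum()
    return e + omega * sum(int(labels[i] != labels[j]) for i, j in edges)

def icm(Q, edges, omega, iters=10):
    """Greedily relabel one superpixel at a time while the energy keeps dropping."""
    unary = unary_costs(Q)
    n = len(unary)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    labels = (unary[:, 1] < unary[:, 0]).astype(int)  # start from the unary-only optimum
    for _ in range(iters):
        changed = False
        for i in range(n):
            cost = [unary[i, c] + omega * sum(int(labels[k] != c) for k in nbrs[i])
                    for c in (0, 1)]
            best = int(np.argmin(cost))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

On a chain of five superpixels with \(Q = (0.9, 0.9, 0.4, 0.9, 0.1)\) and \(\omega = 1\), the uncertain middle superpixel is pulled to "target" by its two confident neighbors, which is the smoothing effect the pairwise term is meant to provide.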

Experiments

The proposed method is compared to the original DPM approach and outperforms it on all datasets. The values of the design parameters are reported (except for \(\omega\)), but the authors do not state how they were chosen:
\[t_d = -2, N_{sp} = 100, M = 400, M_n = 200, \omega=?\]
It would be very interesting to know how problem-specific or general those parameters are.

Conclusion

The paper is quite straightforward, well written and easy to understand. A few typos and misaligned images suggest that the paper was finished just before the deadline.
When I first read the title and abstract, I was hoping for a method that did not need candidate regions and would not introduce many design parameters. Both hopes were disappointed (the candidates come from the DPM detector, and there are five design parameters), but the approach might still be interesting for the cell tracking project I am working on.
