Segmentation Results: VOC2011 BETA

Competition "comp5" (train on VOC2011 data)

This leaderboard shows only those submissions that have been marked as public, and so the displayed rankings should not be considered as definitive.

Average Precision (AP %)

Submission | mean | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | dining table | dog | horse | motorbike | person | potted plant | sheep | sofa | train | tv/monitor | Submission date
O2P_SVRSEGM_CPMC_CSI | 48.8 | 67.8 | 31.1 | 44.5 | 35.6 | 48.4 | 64.1 | 63.1 | 49.2 | 17.3 | 54.0 | 30.7 | 41.4 | 59.7 | 58.1 | 55.1 | 33.6 | 57.3 | 33.6 | 49.2 | 45.2 | 20-Nov-2012
BONN_SVRSEGM | 43.3 | 54.3 | 23.9 | 39.5 | 35.3 | 42.6 | 65.4 | 53.5 | 46.1 | 15.0 | 47.4 | 30.1 | 33.9 | 48.8 | 54.4 | 46.4 | 28.8 | 51.3 | 26.2 | 44.9 | 37.2 | 13-Oct-2011
BONN_FGT_SEGM | 41.4 | 51.7 | 23.7 | 46.0 | 33.9 | 49.4 | 66.2 | 56.2 | 41.7 | 10.4 | 41.9 | 29.6 | 24.4 | 49.1 | 50.5 | 39.6 | 19.9 | 44.9 | 26.1 | 40.0 | 41.6 | 13-Oct-2011
NUS_SEG_DET_MASK_CLS_CRF | 37.7 | 41.5 | 20.2 | 30.4 | 29.1 | 47.4 | 61.2 | 47.7 | 35.0 | 8.5 | 38.3 | 14.5 | 28.6 | 36.5 | 47.8 | 42.5 | 28.5 | 37.8 | 26.4 | 43.5 | 45.8 | 13-Oct-2011
NUS_Context_SVM | 35.1 | 40.5 | 19.0 | 28.4 | 27.8 | 40.7 | 56.4 | 45.0 | 33.1 | 7.2 | 37.4 | 17.4 | 26.8 | 33.7 | 46.6 | 40.6 | 23.3 | 33.4 | 23.9 | 41.2 | 38.6 | 05-Oct-2011
Struct_Det_CRF | 31.3 | 36.6 | 18.6 | 9.2 | 11.0 | 29.8 | 59.0 | 50.3 | 25.5 | 11.8 | 29.0 | 24.8 | 16.0 | 29.1 | 47.9 | 41.9 | 16.1 | 34.0 | 11.6 | 43.3 | 31.7 | 13-Oct-2011
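
To make the per-class columns easier to compare, here is a small sketch that loads the leaderboard rows and finds the best submission for each class. The names CLASSES, SCORES, best_per_class and mean_over_listed are illustrative and not part of any official toolkit. For the top entry, the plain average of the 20 listed classes is about 46.95, which is below its reported mean of 48.8. The likely reason is that the official mean also includes the background class, which VOC segmentation evaluates but this table does not list. That explanation is an assumption.

```python
# Per-class scores copied from the leaderboard rows; the "mean" column is left out.
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

SCORES = {
    "O2P_SVRSEGM_CPMC_CSI": [67.8, 31.1, 44.5, 35.6, 48.4, 64.1, 63.1, 49.2, 17.3, 54.0,
                             30.7, 41.4, 59.7, 58.1, 55.1, 33.6, 57.3, 33.6, 49.2, 45.2],
    "BONN_SVRSEGM": [54.3, 23.9, 39.5, 35.3, 42.6, 65.4, 53.5, 46.1, 15.0, 47.4,
                     30.1, 33.9, 48.8, 54.4, 46.4, 28.8, 51.3, 26.2, 44.9, 37.2],
    "BONN_FGT_SEGM": [51.7, 23.7, 46.0, 33.9, 49.4, 66.2, 56.2, 41.7, 10.4, 41.9,
                      29.6, 24.4, 49.1, 50.5, 39.6, 19.9, 44.9, 26.1, 40.0, 41.6],
    "NUS_SEG_DET_MASK_CLS_CRF": [41.5, 20.2, 30.4, 29.1, 47.4, 61.2, 47.7, 35.0, 8.5, 38.3,
                                 14.5, 28.6, 36.5, 47.8, 42.5, 28.5, 37.8, 26.4, 43.5, 45.8],
    "NUS_Context_SVM": [40.5, 19.0, 28.4, 27.8, 40.7, 56.4, 45.0, 33.1, 7.2, 37.4,
                        17.4, 26.8, 33.7, 46.6, 40.6, 23.3, 33.4, 23.9, 41.2, 38.6],
    "Struct_Det_CRF": [36.6, 18.6, 9.2, 11.0, 29.8, 59.0, 50.3, 25.5, 11.8, 29.0,
                       24.8, 16.0, 29.1, 47.9, 41.9, 16.1, 34.0, 11.6, 43.3, 31.7],
}
assert all(len(row) == len(CLASSES) for row in SCORES.values())


def best_per_class(scores):
    """Return {class: name of the submission with the highest score for that class}."""
    return {
        cls: max(scores, key=lambda name: scores[name][i])
        for i, cls in enumerate(CLASSES)
    }


def mean_over_listed(scores, name):
    """Unweighted mean over the 20 listed object classes only (no background)."""
    row = scores[name]
    return sum(row) / len(row)


if __name__ == "__main__":
    for cls, winner in best_per_class(SCORES).items():
        print(f"{cls:12s} {winner}")
    print(round(mean_over_listed(SCORES, "O2P_SVRSEGM_CPMC_CSI"), 2))
```

The sketch shows that no single entry wins every class. For example, BONN_FGT_SEGM is best on bus, while NUS_SEG_DET_MASK_CLS_CRF is best on tv/monitor.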

Abbreviations

Title: BONN_FGT_SEGM
Method: BONN_FGT_SEGM
Affiliation: ¹University of Bonn, ²Vienna University of Technology, ³Georgia Institute of Technology
Contributors: Joao Carreira¹, Adrian Ion², Fuxin Li³, Cristian Sminchisescu¹
Description: We present a joint image segmentation and labeling model. It takes a bag of figure-ground segment hypotheses, extracted at multiple image locations and scales using CPMC (Carreira and Sminchisescu, CVPR 2010). From these it constructs a joint probability distribution over two things: the compatible image interpretations (tilings, or image segmentations) composed from those segments, and their labeling into categories. Drawing samples from the joint distribution can be interpreted as two steps. First, tilings are sampled; they are modeled as maximal cliques of a graph that connects spatially non-overlapping segments in the bag (Ion, Carreira, Sminchisescu, ICCV11). Second, labels are sampled for those segments, conditioned on the chosen tiling. We learn the segmentation and labeling parameters jointly by Maximum Likelihood, using a novel Incremental Saddle Point estimation procedure (Ion, Carreira, Sminchisescu, NIPS 2011).
Date: 2011-10-13 23:04:11
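
The tiling step above treats segments as nodes and joins two segments when they do not overlap. Each maximal clique is then a candidate tiling. Below is a minimal sketch that enumerates those cliques with the Bron–Kerbosch algorithm (with pivoting). It assumes each segment is represented as a set of pixel indices. The code enumerates tilings rather than sampling them, so it does not reproduce the authors' probabilistic model. The function names are hypothetical.

```python
from itertools import combinations


def compatibility_graph(segments):
    """Adjacency sets: segments i and j are connected if their pixel sets are disjoint."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if segments[i].isdisjoint(segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Bron–Kerbosch with pivoting; yields every maximal clique as a frozenset of segment ids."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    # Toy segments over pixels 0..5 (indices 0..4 correspond to A..E).
    segs = [{0, 1, 2}, {3, 4, 5}, {0, 1}, {2, 3}, {4, 5}]
    for clique in maximal_cliques(compatibility_graph(segs)):
        print(sorted(clique))
```

In this toy example, the clique {0, 4} leaves pixel 3 uncovered. This shows that a maximal set of mutually non-overlapping segments need not cover the whole image.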

Title: SVR on CPMC-generated Figure-ground segmentations
Method: BONN_SVRSEGM
Affiliation: University of Bonn
Contributors: Joao Carreira, Fuxin Li, Cristian Sminchisescu
Description: We present a recognition system based on sequential figure-ground ranking. We extract a bag of figure-ground segments using CPMC (Carreira and Sminchisescu, CVPR 2010). A class-independent ranker then filters the bag down to 100 segments. Using features computed on these segments, we learn one nonlinear Support Vector Regressor (SVR) per category, which predicts the overlap between each segment and an object from that category. A complete image interpretation is obtained by selecting segments sequentially, using combination and non-maxima suppression schemes. Details can be found in F. Li, J. Carreira, C. Sminchisescu (CVPR 2010; IJCV11). In addition, the system is trained with both object segmentation layouts and weak annotations from bounding boxes.
Date: 2011-10-13 23:21:00
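
The regression target and the sequential selection described above can be sketched as follows. It assumes segments are pixel sets and that "overlap" means intersection-over-union. The per-category SVR itself is not shown; plain numbers stand in for its outputs. The selection step is a generic greedy non-maxima suppression, not the combination scheme from the cited papers. The thresholds max_overlap and min_score are hypothetical.

```python
def overlap(a, b):
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def regression_targets(segments, gt_objects):
    """Training target per segment: best overlap with any ground-truth object of one category."""
    return [max((overlap(s, g) for g in gt_objects), default=0.0) for s in segments]


def greedy_select(segments, scores, max_overlap=0.5, min_score=0.2):
    """Visit segments by descending predicted score; keep one only if it does not
    overlap an already kept segment by more than max_overlap (hypothetical values)."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        if scores[i] < min_score:
            break  # all remaining segments score even lower
        if all(overlap(segments[i], segments[j]) <= max_overlap for j in chosen):
            chosen.append(i)
    return chosen


if __name__ == "__main__":
    segs = [{0, 1, 2, 3}, {0, 1, 2}, {4, 5}, {5, 6}]
    print(regression_targets(segs, [{0, 1, 2, 3}]))
    print(greedy_select(segs, [0.9, 0.8, 0.7, 0.1]))
```

Here segment 1 is suppressed because it overlaps the higher-scoring segment 0 with an IoU of 0.75. Segment 3 is dropped because its score is below min_score.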

Title: Context-SVM based submission for 3 tasks
Method: NUS_Context_SVM
Affiliation: National University of Singapore
Contributors: Zheng Song, Qiang Chen, Shuicheng Yan
Description: Classification uses the bag-of-words (BoW) framework. Dense SIFT, HOG^2, LBP and color moment features are extracted. For feature coding we use vector quantization (VQ) and Fisher vectors, and to generate image representations we use spatial pyramid matching (SPM) and Generalized Pyramid Matching (GPM). Context-aware features are also extracted based on [1]. The classification models are learnt with kernel SVMs, and the final classification scores are refined with kernel mapping [2]. Detection and segmentation build on the baseline of [3], using HOG and LBP features. Based on [1], we then learn a context model and use it to refine the detection results. For the final segmentation, each rectangular detection box is replaced by the average mask learnt for its detection component from the segmentation training set.
References:
[1] Zheng Song*, Qiang Chen*, Zhongyang Huang, Yang Hua, and Shuicheng Yan. Contextualizing Object Detection and Classification.
[2] http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/workshop/nuspsl.pdf
[3] http://people.cs.uchicago.edu/~pff/latent/
Date: 2011-10-05 09:01:23
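
The final step above replaces each detection box with a learnt average mask. Below is a minimal sketch of that idea under several assumptions. Training masks are taken to be already resized to a common grid. The grid is stretched into the box by nearest-neighbour lookup. Cells whose average is at least 0.5 are kept. Both the threshold and the function names are hypothetical, not taken from the submission.

```python
def average_mask(masks):
    """Per-cell mean of equally sized binary masks (lists of lists of 0/1)."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) / len(masks) for c in range(w)] for r in range(h)]


def paste_mask(avg, box, image_h, image_w, threshold=0.5):
    """Stretch the average mask over box = (x0, y0, x1, y1), end-exclusive, using
    nearest-neighbour lookup, and return the set of (row, col) pixels kept."""
    x0, y0, x1, y1 = box
    gh, gw = len(avg), len(avg[0])
    bh, bw = y1 - y0, x1 - x0
    pixels = set()
    for r in range(max(0, y0), min(image_h, y1)):
        for c in range(max(0, x0), min(image_w, x1)):
            gr = (r - y0) * gh // bh  # grid row covering image row r
            gc = (c - x0) * gw // bw  # grid column covering image column c
            if avg[gr][gc] >= threshold:
                pixels.add((r, c))
    return pixels


if __name__ == "__main__":
    avg = average_mask([[[1, 0], [1, 1]], [[1, 0], [0, 1]]])
    print(avg)
    print(sorted(paste_mask(avg, (0, 0, 4, 4), 4, 4)))
```

In this example the box becomes a 12-pixel shape instead of a full 4x4 rectangle. This is the effect the description relies on: the mask trims box pixels that rarely belong to the object.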

Title: Segmentation Using CRF with Detection Mask
Method: NUS_SEG_DET_MASK_CLS_CRF
Affiliation: National University of Singapore
Contributors: Wei XIA, Zheng SONG, Qiang CHEN, Shuicheng YAN, Loong Fah CHEONG
Description: The solution is based on a CRF model, and its key contribution is the use of various types of binary regularization terms. Object detection also plays a significant role in guiding semantic object segmentation. The CRF model integrates the global classification score with local unary and binary information to perform semantic segmentation. In addition, detection masks are used as extra unary and smoothness terms in the CRF model. These masks are obtained by applying a hard threshold to the detection confidence maps. Some high-confidence masks are also used in a post-processing stage to refine the mask boundaries.
Date: 2011-10-13 18:10:08
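
The role of the thresholded detection masks can be illustrated on a tiny chain of pixels. This sketch makes several assumptions of its own. The pairwise term is a Potts penalty. The detection mask adds a unary-style penalty whenever a masked pixel is not given the detected class. The energy is minimized with iterated conditional modes (ICM), a simple substitute for whatever inference the authors used. The global classification score is omitted, and all weights and names are hypothetical.

```python
def detection_mask(conf_map, threshold):
    """Hard-threshold a detection confidence map {pixel: confidence} into a pixel set."""
    return {p for p, v in conf_map.items() if v >= threshold}


def energy(labels, unary, edges, det_masks, w_pair, w_det):
    """Unary costs + Potts pairwise costs + a penalty per masked pixel not given the detected class."""
    e = sum(unary[p][labels[p]] for p in labels)
    e += sum(w_pair for p, q in edges if labels[p] != labels[q])
    for cls, mask in det_masks.items():
        e += sum(w_det for p in mask if labels[p] != cls)
    return e


def icm(unary, edges, det_masks, w_pair=0.3, w_det=0.5, sweeps=10):
    """Iterated conditional modes: relabel one pixel at a time whenever the total energy drops."""
    labels = {p: min(range(len(u)), key=u.__getitem__) for p, u in unary.items()}
    for _ in range(sweeps):
        changed = False
        for p in sorted(labels):
            current = energy(labels, unary, edges, det_masks, w_pair, w_det)
            for lab in range(len(unary[p])):
                trial = {**labels, p: lab}
                e = energy(trial, unary, edges, det_masks, w_pair, w_det)
                if e < current - 1e-9:
                    labels, current, changed = trial, e, True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    unary = {0: [0.1, 0.9], 1: [0.4, 0.6], 2: [0.9, 0.1], 3: [0.1, 0.9]}  # [background, object]
    edges = [(0, 1), (1, 2), (2, 3)]
    mask = detection_mask({0: 0.1, 1: 0.8, 2: 0.9, 3: 0.2}, threshold=0.5)
    print(icm(unary, edges, {}))
    print(icm(unary, edges, {1: mask}))
```

Without the mask, pixel 1 stays background because its unary slightly prefers it. With the mask term for class 1, the energy favors extending the object over pixels 1 and 2.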

Title: O2P+SVRSEGM Regressor + Composite Statistical Inference
Method: O2P_SVRSEGM_CPMC_CSI
Affiliation: (1) Georgia Institute of Technology, (2) University of California, Berkeley, (3) Amazon Inc., (4) Lund University
Contributors: Fuxin Li (1), Joao Carreira (2), Guy Lebanon (3), Cristian Sminchisescu (4)
Description: We apply a novel probabilistic inference procedure, Composite Statistical Inference (CSI) [1], to semantic segmentation using predictions on overlapping figure-ground hypotheses. Each regressor prediction of a segment's overlap with the ground-truth object is modelled as the true overlap plus noise. The true overlap is parametrized by the unknown fraction of each superpixel that belongs to the unknown ground truth. A joint optimization over all superpixels and all categories then maximizes the likelihood of the SVR predictions. The optimization has a tight convex relaxation, so solutions can be expected to be close to the global optimum. A fast and optimal search algorithm is then applied to retrieve each object.

CSI builds on an intuition behind the SVRSEGM inference algorithm: multiple predictions on similar segments can be combined to better consolidate the segment mask. CSI develops this idea fully by constructing a probabilistic framework and performing maximum composite likelihood estimation jointly over all segments and categories. As a result, it can recover better object boundaries and handle hard cases where objects interact closely and heavily occlude each other.

For each image, the inference algorithm takes three inputs:
- 150 overlapping figure-ground hypotheses generated by the CPMC algorithm (Carreira and Sminchisescu, PAMI 2012);
- the SVRSEGM results;
- linear SVR predictions on these hypotheses using the novel second-order O2P features (Carreira, Caseiro, Batista, Sminchisescu, ECCV 2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN).
Reference:
[1] Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu. Composite Statistical Inference for Semantic Segmentation. CVPR 2013.
Date: 2012-11-20 18:36:07
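
The observation model above can be sketched for a single category. Each segment is a set of superpixels, and the object is described by a fraction per superpixel. Given those fractions, the implied overlap of a segment with the object follows directly. The sketch makes two assumptions. First, the noise is Gaussian, so maximizing the likelihood reduces to minimizing squared error. Second, instead of the convex relaxation and the fast search from the description, it simply brute-forces binary fractions on a tiny example. That only works for a handful of superpixels. All names and numbers are hypothetical.

```python
from itertools import product


def implied_overlap(segment, fractions, areas):
    """IoU between a segment (set of superpixel ids) and the object implied by
    per-superpixel fractions in [0, 1], weighted by superpixel area."""
    inter = sum(areas[s] * fractions[s] for s in segment)
    obj = sum(areas[s] * fractions[s] for s in areas)
    seg = sum(areas[s] for s in segment)
    union = seg + obj - inter
    return inter / union if union > 0 else 0.0


def squared_error(fractions, segments, predictions, areas):
    """Negative log-likelihood up to constants, assuming Gaussian prediction noise."""
    return sum((y - implied_overlap(seg, fractions, areas)) ** 2
               for seg, y in zip(segments, predictions))


def fit_binary_fractions(segments, predictions, areas):
    """Exhaustive search over 0/1 fractions; a toy stand-in for the convex relaxation."""
    ids = sorted(areas)
    best_err, best = float("inf"), None
    for bits in product((0.0, 1.0), repeat=len(ids)):
        f = dict(zip(ids, bits))
        err = squared_error(f, segments, predictions, areas)
        if err < best_err:
            best_err, best = err, f
    return best


if __name__ == "__main__":
    areas = {"a": 1, "b": 1, "c": 1, "d": 1}
    segments = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}]
    predictions = [0.9, 0.6, 0.05]  # hypothetical SVR overlap predictions
    print(fit_binary_fractions(segments, predictions, areas))
```

Each prediction is noisy on its own, but taken together they agree best with an object made of superpixels a and b. This is the kind of consolidation across overlapping hypotheses that the description attributes to CSI.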

Title: Structured Detection and Segmentation CRF
Method: Struct_Det_CRF
Affiliation: Oxford Brookes University
Contributors: Jonathan Warrell, Vibhav Vineet, Paul Sturgess, Philip Torr
Description: We form a hierarchical CRF that jointly models a pool of candidate detections and the multiclass pixel segmentation of an image. Attractive and repulsive pairwise terms are allowed between detection nodes (cf. Desai et al., ICCV 2009). These terms are integrated into a P^n Potts based hierarchical segmentation energy (cf. Ladicky et al., ECCV 2010). The model is trained with a cutting-plane algorithm, using approximate MAP inference. We form a joint loss that combines segmentation and detection components: it pays a penalty both for each incorrectly labelled pixel and for each false detection node that is active in a solution. Different weightings of this loss are used to train the model for detection and for segmentation. The segmentation results therefore make use of the bounding box annotations. The candidate detections are generated with the Felzenszwalb et al. (CVPR 2008/2010) detector. As features for segmentation we use textons, SIFT, LBPs and the detection response surfaces themselves.
Date: 2011-10-13 03:27:02
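
The joint loss and the P^n Potts term mentioned above can be written down directly. The sketch below makes several assumptions. Labelings are dicts from pixel to label, and detections are hashable ids. The weights w_seg and w_det are the kind of weighting the description varies between the detection and segmentation models. The clique potential is the basic (non-robust) P^n Potts form: it costs gamma[label] when every pixel in the clique agrees and gamma_max otherwise. The exact variant in the cited hierarchical energy may differ, and all names are hypothetical.

```python
def joint_loss(pred_labels, gt_labels, active_dets, true_dets, w_seg=1.0, w_det=1.0):
    """Weighted count of mislabelled pixels plus active detections that are not true detections."""
    pixel_errors = sum(1 for p, lab in pred_labels.items() if lab != gt_labels[p])
    false_dets = len(set(active_dets) - set(true_dets))
    return w_seg * pixel_errors + w_det * false_dets


def pn_potts(clique_labels, gamma, gamma_max):
    """Basic P^n Potts clique potential for a non-empty clique: gamma[label] if all
    pixels share one label (0.0 if unspecified), otherwise gamma_max."""
    first = clique_labels[0]
    if all(lab == first for lab in clique_labels):
        return gamma.get(first, 0.0)
    return gamma_max


if __name__ == "__main__":
    pred = {0: 1, 1: 1, 2: 0}
    gt = {0: 1, 1: 0, 2: 0}
    print(joint_loss(pred, gt, {"d1", "d2"}, {"d1"}, w_det=2.0))
    print(pn_potts([1, 1, 1], {1: 0.0}, 1.0), pn_potts([1, 2, 1], {1: 0.0}, 1.0))
```

Raising w_det relative to w_seg makes false detections more costly during training. This mirrors how the description uses different loss weightings for its detection and segmentation models.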