PASCAL VOC Challenge performance evaluation server

Segmentation Results: VOC2011 (BETA)

Competition "comp5" (train on VOC2011 data)

This leaderboard shows only those submissions that have been marked as public, and so the displayed rankings should not be considered as definitive.

The highest scoring entry in each column is shown in bold.
Clicking on the blue arrow symbol at the top of a column orders the submissions from high to low by performance on that column.

Average Precision (AP %)

	method	mean	aeroplane	bicycle	bird	boat	bottle	bus	car	cat	chair	cow	dining table	dog	horse	motorbike	person	potted plant	sheep	sofa	train	tv/monitor	submission date
	O2P_SVRSEGM_CPMC_CSI	48.8	67.8	31.1	44.5	35.6	48.4	64.1	63.1	49.2	17.3	54.0	30.7	41.4	59.7	58.1	55.1	33.6	57.3	33.6	49.2	45.2	20-Nov-2012
	BONN_SVRSEGM	43.3	54.3	23.9	39.5	35.3	42.6	65.4	53.5	46.1	15.0	47.4	30.1	33.9	48.8	54.4	46.4	28.8	51.3	26.2	44.9	37.2	13-Oct-2011
	BONN_FGT_SEGM	41.4	51.7	23.7	46.0	33.9	49.4	66.2	56.2	41.7	10.4	41.9	29.6	24.4	49.1	50.5	39.6	19.9	44.9	26.1	40.0	41.6	13-Oct-2011
	NUS_SEG_DET_MASK_CLS_CRF	37.7	41.5	20.2	30.4	29.1	47.4	61.2	47.7	35.0	8.5	38.3	14.5	28.6	36.5	47.8	42.5	28.5	37.8	26.4	43.5	45.8	13-Oct-2011
	NUS_Context_SVM	35.1	40.5	19.0	28.4	27.8	40.7	56.4	45.0	33.1	7.2	37.4	17.4	26.8	33.7	46.6	40.6	23.3	33.4	23.9	41.2	38.6	05-Oct-2011
	Struct_Det_CRF	31.3	36.6	18.6	9.2	11.0	29.8	59.0	50.3	25.5	11.8	29.0	24.8	16.0	29.1	47.9	41.9	16.1	34.0	11.6	43.3	31.7	13-Oct-2011
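
The column ordering and the "highest entry per column" rule described above can be reproduced offline. The following is a minimal sketch in Python, assuming the table has been copied by hand into a list of tuples. Only the mean, bird and bus columns are included for brevity, and the names COLUMNS, ROWS, rank_by and best_in_column are illustrative; they are not part of the evaluation server.

```python
# Subset of the leaderboard above: (method, mean, bird, bus), values in AP %.
# The choice of columns is arbitrary; the numbers are copied from the table.
COLUMNS = ("mean", "bird", "bus")
ROWS = [
    ("O2P_SVRSEGM_CPMC_CSI", 48.8, 44.5, 64.1),
    ("BONN_SVRSEGM", 43.3, 39.5, 65.4),
    ("BONN_FGT_SEGM", 41.4, 46.0, 66.2),
    ("NUS_SEG_DET_MASK_CLS_CRF", 37.7, 30.4, 61.2),
    ("NUS_Context_SVM", 35.1, 28.4, 56.4),
    ("Struct_Det_CRF", 31.3, 9.2, 59.0),
]


def rank_by(column, rows=ROWS):
    """Return method names ordered from high to low on the given column."""
    idx = COLUMNS.index(column) + 1  # +1 skips the method-name field
    ordered = sorted(rows, key=lambda row: row[idx], reverse=True)
    return [row[0] for row in ordered]


def best_in_column(column, rows=ROWS):
    """Return the method with the highest score in the given column."""
    idx = COLUMNS.index(column) + 1
    return max(rows, key=lambda row: row[idx])[0]


if __name__ == "__main__":
    for column in COLUMNS:
        print(column, best_in_column(column), rank_by(column))
```

As the sketch makes visible, the ranking depends on the chosen column: the entry with the highest mean is not the highest on every class.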

Abbreviations

Title	Method	Affiliation	Contributors	Description	Date
BONN_FGT_SEGM	BONN_FGT_SEGM	¹University of Bonn, ²Vienna University of Technology, ³Georgia Institute of Technology	Joao Carreira¹, Adrian Ion², Fuxin Li³, Cristian Sminchisescu¹	We present a joint image segmentation and labeling model which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales using CPMC (Carreira and Sminchisescu, CVPR2010), constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag (Ion, Carreira, Sminchisescu, ICCV11), followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure (Ion, Carreira, Sminchisescu, NIPS2011).	2011-10-13 23:04:11
SVR on CPMC-generated Figure-ground segmentations	BONN_SVRSEGM	University of Bonn	Joao Carreira, Fuxin Li, Cristian Sminchisescu	We present a recognition system based on sequential figure-ground ranking. We extract a bag of figure-ground segments using CPMC (Carreira and Sminchisescu, CVPR 2010). The bag is then filtered down to 100 segments using a class-independent ranker. Using these features we learn one nonlinear Support Vector Regressor (SVR) for each category that predicts the overlap between each segment and an object from that category. A complete image interpretation is obtained by sequentially selecting segments using combination and non-maxima suppression schemes. Details can be found in (F. Li, J. Carreira, C. Sminchisescu, CVPR 2010, IJCV11). Additionally, the system is trained with both object segmentation layouts and weak annotations from bounding boxes.	2011-10-13 23:21:00
Context-SVM based submission for 3 tasks	NUS_Context_SVM	National University of Singapore	Zheng Song, Qiang Chen, Shuicheng Yan	Classification uses the BoW framework. Dense-SIFT, HOG^2, LBP and color moment features are extracted. We then use VQ and Fisher vectors for feature coding, and SPM and Generalized Pyramid Matching (GPM) to generate image representations. Context-aware features are also extracted based on [1]. The classification models are learnt via kernel SVM. The final classification scores are then refined with kernel mapping [2]. Detection and segmentation results use the baseline of [3] with HOG and LBP features. Based on [1], we then learn a context model and refine the detection results. The final segmentation result replaces the rectangular detection boxes with average masks learnt for each detection component from the segmentation training set. [1] Zheng Song, Qiang Chen, Zhongyang Huang, Yang Hua, and Shuicheng Yan. Contextualizing Object Detection and Classification. [2] http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/workshop/nuspsl.pdf [3] http://people.cs.uchicago.edu/~pff/latent/	2011-10-05 09:01:23
Segmentation Using CRF with Detection Mask	NUS_SEG_DET_MASK_CLS_CRF	National University of Singapore	Wei XIA, Zheng SONG, Qiang CHEN, Shuicheng YAN, Loong Fah CHEONG	The solution is based on a CRF model, and the key contribution is the use of various types of binary regularization terms. Object detection also plays a significant role in guiding semantic object segmentation. In this solution, the CRF model is built to integrate the global classification score with local unary and binary information to perform semantic segmentation. In addition, the detection masks trained by setting a hard threshold on the detection confidence maps are applied as extra unary and smooth terms in the CRF model. Some of the masks with high confidence are also used in the post-processing stage to refine the mask boundaries.	2011-10-13 18:10:08
O2P+SVRSEGM Regressor + Composite Statistical Inference	O2P_SVRSEGM_CPMC_CSI	(1) Georgia Institute of Technology (2) University of California - Berkeley (3) Amazon Inc. (4) Lund University	Fuxin Li (1), Joao Carreira (2), Guy Lebanon (3), Cristian Sminchisescu (4)	We utilize a novel probabilistic inference procedure, Composite Statistical Inference (CSI) [1], on semantic segmentation using predictions on overlapping figure-ground hypotheses. Regressor predictions on segment overlaps to the ground truth object are modelled as generated by the true overlap with the ground truth segment plus noise, parametrized on the unknown percentage of each superpixel that belongs to the unknown ground truth. A joint optimization on all the superpixels and all the categories is then performed in order to maximize the likelihood of the SVR predictions. The optimization has a tight convex relaxation so solutions can be expected to be close to the global optimum. A fast and optimal search algorithm is then applied to retrieve each object. CSI takes the intuition from the SVRSEGM inference algorithm that multiple predictions on similar segments can be combined to better consolidate the segment mask. But it fully develops the idea by constructing a probabilistic framework and performing maximum composite likelihood jointly on all segments and categories. Therefore it is able to consolidate better object boundaries and handle hard cases when objects interact closely and heavily occlude each other. For each image, we use 150 overlapping figure-ground hypotheses generated by the CPMC algorithm (Carreira and Sminchisescu, PAMI 2012), SVRSEGM results, and linear SVR predictions on them with the novel second order O2P features (Carreira, Caseiro, Batista, Sminchisescu, ECCV2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN) as the input to the inference algorithm. [1] Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu. Composite Statistical Inference for Semantic Segmentation. CVPR 2013.	2012-11-20 18:36:07
Structured Detection and Segmentation CRF	Struct_Det_CRF	Oxford Brookes University	Jonathan Warrell, Vibhav Vineet, Paul Sturgess, Philip Torr	We form a hierarchical CRF which jointly models a pool of candidate detections and the multiclass pixel segmentation of an image. Attractive and repulsive pairwise terms are allowed between detection nodes (cf Desai et al, ICCV 2009), which are integrated into a Pn-Potts based hierarchical segmentation energy (cf Ladicky et al, ECCV 2010). A cutting-plane algorithm is used to train the model, using approximate MAP inference. We form a joint loss which combines segmentation and detection components (i.e. paying a penalty both for each pixel incorrectly labelled, and each false detection node which is active in a solution), and use different weightings of this loss to train the model to perform detection and segmentation. The segmentation results thus make use of the bounding box annotations. The candidate detections are generated using the Felzenszwalb et al. CVPR 2008/2010 detector, and as features for segmentation we use textons, SIFT, LBPs and the detection response surfaces themselves.	2011-10-13 03:27:02
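
The BONN_SVRSEGM and O2P_SVRSEGM_CPMC_CSI descriptions both rest on the overlap between a candidate segment and an object, and BONN_SVRSEGM mentions non-maxima suppression when selecting segments. The following is a minimal sketch in Python of those two ideas. It assumes that overlap is measured as intersection over union of pixel sets, which the descriptions do not spell out, and it uses a simple greedy suppression rule. The function names, the 0.5 threshold and the toy rectangular masks are illustrative and are not taken from the submissions.

```python
def overlap(seg_a, seg_b):
    """Intersection over union of two segments given as sets of (row, col) pixels.

    Assumption: the descriptions above do not define "overlap"; IoU is used here.
    """
    union = len(seg_a | seg_b)
    return len(seg_a & seg_b) / union if union else 0.0


def greedy_nms(segments, scores, max_overlap=0.5):
    """Keep segments in decreasing score order, dropping any segment whose
    overlap with an already kept segment exceeds max_overlap.

    Returns the indices of the kept segments. This is a generic greedy rule,
    not the selection scheme of any particular submission.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(overlap(segments[i], segments[j]) <= max_overlap for j in kept):
            kept.append(i)
    return kept


def rect(r0, c0, r1, c1):
    """Toy helper: the pixel set of a rectangle with exclusive end coordinates."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


if __name__ == "__main__":
    segments = [rect(0, 0, 4, 4), rect(0, 0, 4, 5), rect(10, 10, 12, 12)]
    scores = [0.9, 0.7, 0.6]  # made-up predicted scores, for illustration only
    print(overlap(segments[0], segments[1]))
    print(greedy_nms(segments, scores))
```

In this toy example the second segment almost coincides with the first, so it is suppressed, while the third, which is disjoint from both, is kept.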