PASCAL VOC Challenge performance evaluation and download server

Method | mean | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | dining table | dog | horse | motorbike | person | potted plant | sheep | sofa | train | tv/monitor | submission date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
O2P_SVRSEGM_CPMC_CSI | 48.8 | 67.8 | 31.1 | 44.5 | 35.6 | 48.4 | 64.1 | 63.1 | 49.2 | 17.3 | 54.0 | 30.7 | 41.4 | 59.7 | 58.1 | 55.1 | 33.6 | 57.3 | 33.6 | 49.2 | 45.2 | 20-Nov-2012 |
BONN_SVRSEGM | 43.3 | 54.3 | 23.9 | 39.5 | 35.3 | 42.6 | 65.4 | 53.5 | 46.1 | 15.0 | 47.4 | 30.1 | 33.9 | 48.8 | 54.4 | 46.4 | 28.8 | 51.3 | 26.2 | 44.9 | 37.2 | 13-Oct-2011 |
BONN_FGT_SEGM | 41.4 | 51.7 | 23.7 | 46.0 | 33.9 | 49.4 | 66.2 | 56.2 | 41.7 | 10.4 | 41.9 | 29.6 | 24.4 | 49.1 | 50.5 | 39.6 | 19.9 | 44.9 | 26.1 | 40.0 | 41.6 | 13-Oct-2011 |
NUS_SEG_DET_MASK_CLS_CRF | 37.7 | 41.5 | 20.2 | 30.4 | 29.1 | 47.4 | 61.2 | 47.7 | 35.0 | 8.5 | 38.3 | 14.5 | 28.6 | 36.5 | 47.8 | 42.5 | 28.5 | 37.8 | 26.4 | 43.5 | 45.8 | 13-Oct-2011 |
NUS_Context_SVM | 35.1 | 40.5 | 19.0 | 28.4 | 27.8 | 40.7 | 56.4 | 45.0 | 33.1 | 7.2 | 37.4 | 17.4 | 26.8 | 33.7 | 46.6 | 40.6 | 23.3 | 33.4 | 23.9 | 41.2 | 38.6 | 05-Oct-2011 |
Struct_Det_CRF | 31.3 | 36.6 | 18.6 | 9.2 | 11.0 | 29.8 | 59.0 | 50.3 | 25.5 | 11.8 | 29.0 | 24.8 | 16.0 | 29.1 | 47.9 | 41.9 | 16.1 | 34.0 | 11.6 | 43.3 | 31.7 | 13-Oct-2011 |

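
When reading the leaderboard, it can help to check how the mean column relates to the per-class columns. The sketch below uses the values from the O2P_SVRSEGM_CPMC_CSI row; the variable names are illustrative only. The plain average of the 20 displayed classes comes to about 46.95, which is lower than the listed mean of 48.8. One possible explanation, which is an assumption and not stated on this page, is that the mean also counts a class that is not shown in the table (for example a background class).

```python
# Per-class scores copied from the O2P_SVRSEGM_CPMC_CSI row (20 displayed classes).
per_class = [
    67.8, 31.1, 44.5, 35.6, 48.4, 64.1, 63.1, 49.2, 17.3, 54.0,
    30.7, 41.4, 59.7, 58.1, 55.1, 33.6, 57.3, 33.6, 49.2, 45.2,
]
listed_mean = 48.8  # value from the "mean" column of the same row

# Plain arithmetic average over the classes that the table displays.
class_avg = sum(per_class) / len(per_class)

# Difference between the listed mean and the displayed-class average.
gap = listed_mean - class_avg

print(f"average of the {len(per_class)} displayed classes: {class_avg:.2f}")
print(f"listed mean: {listed_mean:.1f} (difference {gap:+.2f})")
```

The same check can be repeated for any other row by swapping in its values.
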
Title | Method | Affiliation | Contributors | Description | Date |
---|---|---|---|---|---|
BONN_FGT_SEGM | BONN_FGT_SEGM | ¹University of Bonn, ²Vienna University of Technology, ³Georgia Institute of Technology | Joao Carreira¹, Adrian Ion², Fuxin Li³, Cristian Sminchisescu¹ | We present a joint image segmentation and labeling model which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales using CPMC (Carreira and Sminchisescu, CVPR 2010), constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag (Ion, Carreira, Sminchisescu, ICCV11), followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure (Ion, Carreira, Sminchisescu, NIPS 2011). | 2011-10-13 23:04:11 |
SVR on CPMC-generated Figure-ground segmentations | BONN_SVRSEGM | University of Bonn | Joao Carreira, Fuxin Li, Cristian Sminchisescu | We present a recognition system based on sequential figure-ground ranking. We extract a bag of figure-ground segments using CPMC (Carreira and Sminchisescu, CVPR 2010). The bag is then filtered down to 100 segments using a class-independent ranker. Using these features we learn one nonlinear Support Vector Regressor (SVR) for each category that predicts the overlap between each segment and an object from that category. A complete image interpretation is obtained by sequentially selecting segments using combination and non-maxima suppression schemes. Details can be found in (F. Li, J. Carreira, C. Sminchisescu, CVPR 2010 and IJCV11). Additionally, the system is trained with both object segmentation layouts and weak annotations from bounding boxes. | 2011-10-13 23:21:00 |
Context-SVM based submission for 3 tasks | NUS_Context_SVM | National University of Singapore | Zheng Song, Qiang Chen, Shuicheng Yan | Classification uses the BoW framework. Dense-SIFT, HOG^2, LBP and color moment features are extracted. We then use VQ and Fisher vectors for feature coding, and SPM and Generalized Pyramid Matching (GPM) to generate image representations. Context-aware features are also extracted based on [1]. The classification models are learnt via kernel SVM. The final classification scores are then refined with kernel mapping [2]. Detection and segmentation results use the baseline of [3] with HOG and LBP features. Based on [1], we further learn a context model and refine the detection results. The final segmentation result replaces the rectangular detection boxes with average masks learnt for each detection component on the segmentation training set. [1] Zheng Song*, Qiang Chen*, Zhongyang Huang, Yang Hua, and Shuicheng Yan. Contextualizing Object Detection and Classification. [2] http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/workshop/nuspsl.pdf [3] http://people.cs.uchicago.edu/~pff/latent/ | 2011-10-05 09:01:23 |
Segmentation Using CRF with Detection Mask | NUS_SEG_DET_MASK_CLS_CRF | National University of Singapore | Wei XIA, Zheng SONG, Qiang CHEN, Shuicheng YAN, Loong Fah CHEONG | The solution is based on a CRF model, and the key contribution is the use of various types of binary regularization terms. Object detection also plays a significant role in guiding semantic object segmentation. In this solution, the CRF model is built to integrate the global classification score with local unary and binary information to perform semantic segmentation. In addition, the detection masks, generated by setting a hard threshold on the detection confidence maps, are applied as extra unary and smoothness terms in the CRF model. Some of the masks with high confidence are also used in the post-processing stage to refine the mask boundaries. | 2011-10-13 18:10:08 |
O2P+SVRSEGM Regressor + Composite Statistical Inference | O2P_SVRSEGM_CPMC_CSI | (1) Georgia Institute of Technology (2) University of California - Berkeley (3) Amazon Inc. (4) Lund University | Fuxin Li | We utilize a novel probabilistic inference procedure, Composite Statistical Inference (CSI) [1], on semantic segmentation using predictions on overlapping figure-ground hypotheses. Regressor predictions on segment overlaps to the ground truth object are modelled as generated by the true overlap with the ground truth segment plus noise, parametrized on the unknown percentage of each superpixel that belongs to the unknown ground truth. A joint optimization on all the superpixels and all the categories is then performed in order to maximize the likelihood of the SVR predictions. The optimization has a tight convex relaxation so solutions can be expected to be close to the global optimum. A fast and optimal search algorithm is then applied to retrieve each object. CSI takes the intuition from the SVRSEGM inference algorithm that multiple predictions on similar segments can be combined to better consolidate the segment mask. But it fully develops the idea by constructing a probabilistic framework and performing maximum composite likelihood jointly on all segments and categories. Therefore it is able to consolidate better object boundaries and handle hard cases when objects interact closely and heavily occlude each other. For each image, we use 150 overlapping figure-ground hypotheses generated by the CPMC algorithm (Carreira and Sminchisescu, PAMI 2012), SVRSEGM results, and linear SVR predictions on them with the novel second order O2P features (Carreira, Caseiro, Batista, Sminchisescu, ECCV2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN) as the input to the inference algorithm. [1] Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu. Composite Statistical Inference for Semantic Segmentation. CVPR 2013. | 2012-11-20 18:36:07 |
Structured Detection and Segmentation CRF | Struct_Det_CRF | Oxford Brookes University | Jonathan Warrell, Vibhav Vineet, Paul Sturgess, Philip Torr | We form a hierarchical CRF which jointly models a pool of candidate detections and the multiclass pixel segmentation of an image. Attractive and repulsive pairwise terms are allowed between detection nodes (cf. Desai et al., ICCV 2009), which are integrated into a Pn-Potts based hierarchical segmentation energy (cf. Ladicky et al., ECCV 2010). A cutting-plane algorithm is used to train the model, using approximate MAP inference. We form a joint loss which combines segmentation and detection components (i.e. paying a penalty both for each pixel incorrectly labelled, and each false detection node which is active in a solution), and use different weightings of this loss to train the model to perform detection and segmentation. The segmentation results thus make use of the bounding box annotations. The candidate detections are generated using the Felzenszwalb et al. CVPR 2008/2010 detector, and as features for segmentation we use textons, SIFT, LBPs and the detection response surfaces themselves. | 2011-10-13 03:27:02 |
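
The BONN_SVRSEGM description mentions predicting the overlap between a segment and an object, and then selecting segments sequentially with non-maxima suppression. The following is a minimal sketch of that general idea, not the submission's actual procedure. It assumes that overlap means intersection-over-union of pixel sets and that selection is a plain greedy pass; the function names, the `rect` helper, the toy scores, and the 0.5 threshold are all hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Pixel = Tuple[int, int]
Segment = FrozenSet[Pixel]


def overlap(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def greedy_nms(segments: Sequence[Segment], scores: Sequence[float],
               max_overlap: float = 0.5) -> List[int]:
    """Return indices of kept segments, visiting them from highest score down.

    A segment is kept only if its overlap with every already-kept segment
    is at most `max_overlap` (an assumed threshold).
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(overlap(segments[i], segments[j]) <= max_overlap for j in kept):
            kept.append(i)
    return kept


def rect(r0: int, c0: int, r1: int, c1: int) -> Segment:
    """Hypothetical helper: a rectangular segment covering rows r0..r1-1, cols c0..c1-1."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


# Toy input: two heavily overlapping segments and one separate segment.
# The scores stand in for per-category regressor predictions.
segments = [rect(0, 0, 4, 4), rect(0, 1, 4, 5), rect(6, 6, 8, 8)]
scores = [0.9, 0.8, 0.6]

kept = greedy_nms(segments, scores)
print(kept)
```

In this toy input the first two segments share 12 of 20 pixels in their union (overlap 0.6), so the lower-scoring one is suppressed and the third, disjoint segment is kept.
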