PASCAL VOC Challenge performance evaluation and download server
Submission | Mean | Aeroplane | Bicycle | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow | Dining table | Dog | Horse | Motorbike | Person | Potted plant | Sheep | Sofa | Train | TV/monitor | Submission date
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
hrnet_baseline | 79.1 | 92.9 | 43.2 | 82.1 | 64.4 | 83.0 | 95.2 | 91.0 | 93.9 | 47.6 | 90.1 | 58.8 | 90.7 | 89.9 | 89.1 | 88.0 | 66.7 | 90.0 | 47.6 | 88.7 | 74.0 | 25-Jan-2020
Deeplab v3 | 75.7 | 85.8 | 40.1 | 75.2 | 66.6 | 75.7 | 93.3 | 85.8 | 90.1 | 38.8 | 85.0 | 60.7 | 87.0 | 86.0 | 86.4 | 83.4 | 66.6 | 84.6 | 50.9 | 83.5 | 71.0 | 22-Oct-2018
Oxford_TVG_CRF_RNN_COCO | 75.0 | 90.7 | 58.1 | 88.1 | 66.6 | 70.8 | 90.7 | 81.8 | 84.3 | 34.8 | 81.6 | 63.4 | 79.2 | 83.5 | 86.7 | 79.1 | 59.2 | 79.9 | 52.8 | 79.9 | 69.7 | 22-Apr-2015
Oxford_TVG_CRF_RNN_VOC | 72.4 | 87.1 | 40.3 | 77.2 | 66.8 | 69.5 | 90.6 | 80.4 | 84.5 | 33.0 | 83.6 | 58.0 | 81.3 | 80.5 | 82.9 | 79.2 | 60.6 | 80.1 | 44.9 | 79.4 | 66.8 | 22-Apr-2015
Weak_manifold_CNN | 65.8 | 81.4 | 33.5 | 73.0 | 60.1 | 64.0 | 87.7 | 74.0 | 77.7 | 29.5 | 66.7 | 52.0 | 72.8 | 72.6 | 71.3 | 72.6 | 54.9 | 67.7 | 43.4 | 73.7 | 61.7 | 14-Nov-2016
CRF_RNN | 65.4 | 83.1 | 34.7 | 70.3 | 51.6 | 64.0 | 83.4 | 76.9 | 80.1 | 26.2 | 71.5 | 50.4 | 73.9 | 71.4 | 76.4 | 75.2 | 49.0 | 72.1 | 40.2 | 71.8 | 59.0 | 10-Feb-2015
TTI_zoomout | 64.1 | 80.8 | 36.8 | 76.8 | 55.7 | 57.9 | 81.7 | 74.9 | 78.2 | 23.5 | 70.8 | 51.4 | 73.1 | 76.9 | 76.0 | 67.7 | 44.6 | 67.3 | 37.0 | 67.9 | 56.4 | 25-Nov-2014
FCN-8s | 62.7 | 79.1 | 35.1 | 65.4 | 49.2 | 61.9 | 81.5 | 75.7 | 78.0 | 23.4 | 67.3 | 45.6 | 71.2 | 67.3 | 75.7 | 72.1 | 46.1 | 70.6 | 35.1 | 70.9 | 55.6 | 12-Nov-2014
Berkeley_Region_Classify | 39.1 | 48.9 | 20.0 | 32.8 | 28.2 | 41.1 | 53.9 | 48.3 | 48.0 | 6.0 | 34.9 | 27.5 | 35.0 | 47.2 | 47.3 | 48.4 | 20.6 | 52.7 | 25.0 | 36.6 | 35.4 | 13-Oct-2011
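The per-class numbers above are segmentation accuracies as defined by the VOC development kit: for each class, the intersection over union between predicted and ground-truth pixels accumulated over the test set, reported as a percentage, together with their mean in the first numeric column. The sketch below shows how such scores are typically computed from a pixel-level confusion matrix; it is a minimal illustration, not the devkit code, and the function names, the 21-class layout (background plus 20 object classes) and the `ignore_index=255` handling of void/border pixels are assumptions based on common VOC practice.

```python
import numpy as np

def per_class_iou(conf):
    """Per-class intersection over union from a confusion matrix.

    conf[i, j] counts pixels whose ground-truth class is i and whose
    predicted class is j, accumulated over the whole evaluation set.
    """
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp          # predicted as class c but labelled otherwise
    fn = conf.sum(axis=1) - tp          # labelled class c but predicted otherwise
    return tp / np.maximum(tp + fp + fn, 1)

def accumulate_confusion(conf, pred, gt, num_classes=21, ignore_index=255):
    """Add one image's predictions to the running confusion matrix.

    Pixels labelled `ignore_index` (the void/border label in VOC) are skipped.
    """
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

# Example: 21 classes (background + 20 object classes)
conf = np.zeros((21, 21), dtype=np.int64)
# ... call accumulate_confusion(conf, pred, gt) for every evaluation image ...
iou = per_class_iou(conf)
print("per-class IoU (%):", 100 * iou)
print("mean IoU (%):", 100 * iou.mean())
```

Accumulating the confusion matrix over the full set before dividing (rather than averaging per-image IoUs) matches the usual VOC-style evaluation, since it weights every labelled pixel equally.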
Title | Method | Affiliation | Contributors | Description | Date |
---|---|---|---|---|---|
Classification of low-level regions | Berkeley_Region_Classify | UC Berkeley | Pablo Arbelaez, Bharath Hariharan, Saurabh Gupta, Chunhui Gu, Lubomir Bourdev and Jitendra Malik | We propose a semantic segmentation approach that represents and classifies generic regions from low-level segmentation. We extract object candidates using ultrametric contour maps (Arbelaez et al., TPAMI 2011) at several image resolutions. We represent each region using mid- and high-level features that capture its appearance (color, shape, texture) as well as its compatibility with the activations of a part detector (we use the poselets of Bourdev et al., ECCV 2010). A category label is assigned to each region using a hierarchy of IKSVM classifiers (Maji et al., CVPR 2008). | 2011-10-13 22:22:06 |
CRF as RNN | CRF_RNN | University of Oxford | Shuai Zheng | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses a conditional random field (CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. See the paper "Conditional Random Fields as Recurrent Neural Networks". A simplified sketch of the unrolled mean-field update appears after this table. | 2015-02-10 10:57:12 |
Deeplab v3_MS-COCO | Deeplab v3 | Peking University | Wuxiaochun, Renxiaohang | We used a resnet_v2_101 model pretrained on VOC2012 as the pre-trained model, and used the MS-COCO dataset to train a DeepLab v3 model. We revisit atrous convolution, a powerful tool to explicitly adjust a filter's field-of-view as well as control the resolution of feature responses computed by deep convolutional neural networks. We propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context, further boosting performance. See http://arxiv.org/abs/1706.05587 for further information. A sketch of the ASPP module appears after this table. | 2018-10-22 06:39:58 |
Fully convolutional net | FCN-8s | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. A sketch of the skip fusion and upsampling appears after this table. | 2014-11-12 08:57:33 |
Oxford_TVG_CRF_RNN_COCO | Oxford_TVG_CRF_RNN_COCO | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, Berkeley augmented data and a subset of COCO 2014 train data. More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 14:00:29 |
Oxford_TVG_CRF_RNN_VOC | Oxford_TVG_CRF_RNN_VOC | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, and Berkeley augmented data (COCO dataset was not used). More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 10:30:12 |
Feedforward segmentation with zoom-out features | TTI_zoomout | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Same as before, except using VGG 16-layer network instead of VGG CNN-S network. Fine-tuning on VOC-2012 was not performed. See http://arxiv.org/abs/1412.0774 for details. | 2014-11-25 18:43:36 |
CNN segmentation based on manifold learning | Weak_manifold_CNN | University of Central Florida | Marzieh Edraki | Manifold learning is used to train a deep convolutional neural network in a weakly supervised manner; the only required annotation is bounding boxes. The model was trained on all training samples of PASCAL VOC 2011. It is based on the VGG16 architecture, with the fully connected layers replaced by convolutional layers as in the FCN model. We exploit the hierarchical feature-generation property of deep convolutional neural networks to design a new cost function that can be applied on top of most deep CNN semantic segmentation models and needs only bounding boxes during training. | 2016-11-14 05:41:33 |
High-resolution network baseline | hrnet_baseline | University of Chinese Academy of Sciences | xiaoyang | This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. A sketch of the multi-resolution fusion step appears after this table. | 2020-01-25 12:42:00 |
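The CRF_RNN and Oxford_TVG_CRF_RNN entries above describe unrolling mean-field inference of a dense CRF into a recurrent module placed on top of CNN logits and trained end-to-end. The following is a deliberately simplified sketch of that idea, not the authors' implementation: the real method uses dense Gaussian pairwise potentials (spatial and bilateral kernels computed with a permutohedral lattice), whereas here message passing is stood in for by a fixed per-class spatial blur, and the class count, iteration count and module names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCRFRNN(nn.Module):
    """Unrolled mean-field inference treated as a recurrent block.

    NOTE: illustrative only. The published CRF-RNN uses dense Gaussian
    pairwise potentials; message passing here is approximated with a
    fixed spatial smoothing filter applied per class.
    """

    def __init__(self, num_classes=21, num_iters=5, kernel_size=7):
        super().__init__()
        self.num_iters = num_iters
        # Fixed per-class smoothing kernel (stand-in for dense message passing).
        blur = torch.ones(num_classes, 1, kernel_size, kernel_size)
        blur = blur / blur.sum(dim=(2, 3), keepdim=True)
        self.register_buffer("blur", blur)
        # Learnable label-compatibility transform (1x1 conv over class scores).
        self.compat = nn.Conv2d(num_classes, num_classes, kernel_size=1, bias=False)

    def forward(self, unary):
        # unary: CNN logits of shape (N, num_classes, H, W)
        q = F.softmax(unary, dim=1)
        for _ in range(self.num_iters):
            msg = F.conv2d(q, self.blur, padding=self.blur.shape[-1] // 2,
                           groups=q.shape[1])          # message passing
            pairwise = self.compat(msg)                # compatibility transform
            q = F.softmax(unary - pairwise, dim=1)     # local update + normalisation
        return q
```

Because every step is differentiable, gradients flow through the iterations back into the underlying CNN, which is the property the submissions emphasise.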
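The Deeplab v3 entry describes atrous convolution and an Atrous Spatial Pyramid Pooling (ASPP) module augmented with image-level features encoding global context (arXiv:1706.05587). A minimal PyTorch-style sketch of such a module follows; the channel widths and dilation rates are commonly used values rather than the submitted model's exact configuration, and batch normalisation/ReLU are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling with an image-level branch (sketch)."""

    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # One 1x1 branch plus several atrous (dilated) 3x3 branches.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates]
        )
        # Image-level (global context) branch.
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        g = self.global_pool(x)
        feats.append(F.interpolate(g, size=(h, w), mode="bilinear",
                                   align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```

The different dilation rates probe the same feature map with different effective fields of view, while the pooled branch injects global context, which is the combination the entry describes.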
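The FCN-8s entry describes in-network upsampling plus skip layers that fuse deep, coarse, semantic scores with finer, shallower features. The sketch below illustrates that fusion pattern under stated assumptions: it presumes a backbone exposing pool3, pool4 and final feature maps with the listed channel counts, and it replaces the paper's learned transposed convolutions with bilinear interpolation for brevity.

```python
import torch.nn as nn
import torch.nn.functional as F

class FCN8sHead(nn.Module):
    """Skip-layer fusion and in-network upsampling in the style of FCN-8s (sketch)."""

    def __init__(self, num_classes=21, c_pool3=256, c_pool4=512, c_final=4096):
        super().__init__()
        # 1x1 convolutions that score each feature map per class.
        self.score_final = nn.Conv2d(c_final, num_classes, 1)
        self.score_pool4 = nn.Conv2d(c_pool4, num_classes, 1)
        self.score_pool3 = nn.Conv2d(c_pool3, num_classes, 1)

    def forward(self, pool3, pool4, final, out_size):
        s = self.score_final(final)
        # Upsample coarse scores and fuse with the finer pool4 scores.
        s = F.interpolate(s, size=pool4.shape[-2:], mode="bilinear",
                          align_corners=False) + self.score_pool4(pool4)
        # Fuse again with the even finer pool3 scores.
        s = F.interpolate(s, size=pool3.shape[-2:], mode="bilinear",
                          align_corners=False) + self.score_pool3(pool3)
        # Final upsampling to input resolution (the original uses learned
        # transposed convolutions; bilinear interpolation stands in here).
        return F.interpolate(s, size=out_size, mode="bilinear",
                             align_corners=False)
```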
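The hrnet_baseline entry describes parallel multi-resolution subnetworks with repeated multi-scale fusion. The sketch below is an illustrative exchange unit in that spirit, not the official implementation: channel widths are placeholders, downsampling uses plain strided convolutions, and upsampling uses a 1x1 convolution followed by bilinear interpolation.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Fuse parallel streams so every stream receives information from all others.

    xs[0] is the highest-resolution stream; resolution halves with each index.
    """

    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        n = len(channels)
        self.convs = nn.ModuleList()
        for i in range(n):           # output stream i
            row = nn.ModuleList()
            for j in range(n):       # input stream j
                if j == i:
                    row.append(nn.Identity())
                elif j < i:
                    # Higher-resolution input: downsample with strided 3x3 convs.
                    ops, ch = [], channels[j]
                    for _ in range(i - j):
                        ops.append(nn.Conv2d(ch, channels[i], 3, stride=2, padding=1))
                        ch = channels[i]
                    row.append(nn.Sequential(*ops))
                else:
                    # Lower-resolution input: 1x1 conv, then upsample in forward().
                    row.append(nn.Conv2d(channels[j], channels[i], 1))
            self.convs.append(row)

    def forward(self, xs):
        outs = []
        for i, row in enumerate(self.convs):
            target = xs[i].shape[-2:]
            acc = 0
            for j, op in enumerate(row):
                y = op(xs[j])
                if y.shape[-2:] != target:
                    y = F.interpolate(y, size=target, mode="bilinear",
                                      align_corners=False)
                acc = acc + y
            outs.append(F.relu(acc))
        return outs
```

Repeating such exchange units throughout the network is what keeps a high-resolution representation alive from start to finish, which is the property the entry emphasises.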