PASCAL VOC Challenge performance evaluation and download server
Submission | Mean | Aeroplane | Bicycle | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow | Dining table | Dog | Horse | Motorbike | Person | Potted plant | Sheep | Sofa | Train | TV/monitor | Submission date
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
hrnet_baseline | 79.1 | 92.9 | 43.2 | 82.1 | 64.4 | 83.0 | 95.2 | 91.0 | 93.9 | 47.6 | 90.1 | 58.8 | 90.7 | 89.9 | 89.1 | 88.0 | 66.7 | 90.0 | 47.6 | 88.7 | 74.0 | 25-Jan-2020
Deeplab v3 | 75.7 | 85.8 | 40.1 | 75.2 | 66.6 | 75.7 | 93.3 | 85.8 | 90.1 | 38.8 | 85.0 | 60.7 | 87.0 | 86.0 | 86.4 | 83.4 | 66.6 | 84.6 | 50.9 | 83.5 | 71.0 | 22-Oct-2018
Oxford_TVG_CRF_RNN_COCO | 75.0 | 90.7 | 58.1 | 88.1 | 66.6 | 70.8 | 90.7 | 81.8 | 84.3 | 34.8 | 81.6 | 63.4 | 79.2 | 83.5 | 86.7 | 79.1 | 59.2 | 79.9 | 52.8 | 79.9 | 69.7 | 22-Apr-2015
Oxford_TVG_CRF_RNN_VOC | 72.4 | 87.1 | 40.3 | 77.2 | 66.8 | 69.5 | 90.6 | 80.4 | 84.5 | 33.0 | 83.6 | 58.0 | 81.3 | 80.5 | 82.9 | 79.2 | 60.6 | 80.1 | 44.9 | 79.4 | 66.8 | 22-Apr-2015
Weak_manifold_CNN | 65.8 | 81.4 | 33.5 | 73.0 | 60.1 | 64.0 | 87.7 | 74.0 | 77.7 | 29.5 | 66.7 | 52.0 | 72.8 | 72.6 | 71.3 | 72.6 | 54.9 | 67.7 | 43.4 | 73.7 | 61.7 | 14-Nov-2016
CRF_RNN | 65.4 | 83.1 | 34.7 | 70.3 | 51.6 | 64.0 | 83.4 | 76.9 | 80.1 | 26.2 | 71.5 | 50.4 | 73.9 | 71.4 | 76.4 | 75.2 | 49.0 | 72.1 | 40.2 | 71.8 | 59.0 | 10-Feb-2015
TTI_zoomout | 64.1 | 80.8 | 36.8 | 76.8 | 55.7 | 57.9 | 81.7 | 74.9 | 78.2 | 23.5 | 70.8 | 51.4 | 73.1 | 76.9 | 76.0 | 67.7 | 44.6 | 67.3 | 37.0 | 67.9 | 56.4 | 25-Nov-2014
FCN-8s | 62.7 | 79.1 | 35.1 | 65.4 | 49.2 | 61.9 | 81.5 | 75.7 | 78.0 | 23.4 | 67.3 | 45.6 | 71.2 | 67.3 | 75.7 | 72.1 | 46.1 | 70.6 | 35.1 | 70.9 | 55.6 | 12-Nov-2014
Berkeley_Region_Classify | 39.1 | 48.9 | 20.0 | 32.8 | 28.2 | 41.1 | 53.9 | 48.3 | 48.0 | 6.0 | 34.9 | 27.5 | 35.0 | 47.2 | 47.3 | 48.4 | 20.6 | 52.7 | 25.0 | 36.6 | 35.4 | 13-Oct-2011
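The per-class numbers above are segmentation accuracies as defined by the VOC development kit: for each class, the intersection over union between predicted and ground-truth pixels accumulated over the test set, reported as a percentage, together with their mean in the first numeric column. The sketch below shows how such scores are typically computed from a pixel-level confusion matrix; it is a minimal illustration, not the devkit code, and the function names, the 21-class layout (background plus 20 object classes) and the `ignore_index=255` handling of void/border pixels are assumptions based on common VOC practice.

```python
import numpy as np

def per_class_iou(conf):
    """Per-class intersection over union from a confusion matrix.

    conf[i, j] counts pixels whose ground-truth class is i and whose
    predicted class is j, accumulated over the whole evaluation set.
    """
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp          # predicted as class c but labelled otherwise
    fn = conf.sum(axis=1) - tp          # labelled class c but predicted otherwise
    return tp / np.maximum(tp + fp + fn, 1)

def accumulate_confusion(conf, pred, gt, num_classes=21, ignore_index=255):
    """Add one image's predictions to the running confusion matrix.

    Pixels labelled `ignore_index` (the void/border label in VOC) are skipped.
    """
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

# Example: 21 classes (background + 20 object classes)
conf = np.zeros((21, 21), dtype=np.int64)
# ... call accumulate_confusion(conf, pred, gt) for every evaluation image ...
iou = per_class_iou(conf)
print("per-class IoU (%):", 100 * iou)
print("mean IoU (%):", 100 * iou.mean())
```

Accumulating the confusion matrix over the full set before dividing (rather than averaging per-image IoUs) matches the usual VOC-style evaluation, since it weights every labelled pixel equally.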
Title | Method | Affiliation | Contributors | Description | Date |
---|---|---|---|---|---|
Classification of low-level regions | Berkeley_Region_Classify | UC Berkeley | Pablo Arbelaez, Bharath Hariharan, Saurabh Gupta, Chunhui Gu, Lubomir Bourdev and Jitendra Malik | We propose a semantic segmentation approach that represents and classifies generic regions from low-level segmentation. We extract object candidates using ultrametric contour maps (Arbelaez et al., TPAMI 2011) at several image resolutions. We represent each region using mid- and high-level features that capture its appearance (color, shape, texture) as well as its compatibility with the activations of a part detector (we use the poselets of Bourdev et al., ECCV 2010). A category label is assigned to each region using a hierarchy of IKSVM classifiers (Maji et al., CVPR 2008). | 2011-10-13 22:22:06 |
CRF as RNN | CRF_RNN | University of Oxford | Shuai Zheng | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses a conditional random field (CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. See the paper "Conditional Random Fields as Recurrent Neural Networks". A simplified sketch of the unrolled mean-field update appears after this table. | 2015-02-10 10:57:12 |
Deeplab v3_MS-COCO | Deeplab v3 | Peking University | Wuxiaochun, Renxiaohang | We used a resnet_v2_101 model pretrained on VOC2012 as the pre-trained model, and used the MS-COCO dataset to train a DeepLab v3 model. We revisit atrous convolution, a powerful tool to explicitly adjust a filter's field-of-view as well as control the resolution of feature responses computed by deep convolutional neural networks. We propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context, further boosting performance. See http://arxiv.org/abs/1706.05587 for further information. A sketch of the ASPP module appears after this table. | 2018-10-22 06:39:58 |
Fully convolutional net | FCN-8s | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. A sketch of the skip fusion and upsampling appears after this table. | 2014-11-12 08:57:33 |
Oxford_TVG_CRF_RNN_COCO | Oxford_TVG_CRF_RNN_COCO | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, Berkeley augmented data and a subset of COCO 2014 train data. More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 14:00:29 |
Oxford_TVG_CRF_RNN_VOC | Oxford_TVG_CRF_RNN_VOC | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, and Berkeley augmented data (COCO dataset was not used). More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 10:30:12 |
Feedforward segmentation with zoom-out features | TTI_zoomout | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Same as before, except using VGG 16-layer network instead of VGG CNN-S network. Fine-tuning on VOC-2012 was not performed. See http://arxiv.org/abs/1412.0774 for details. | 2014-11-25 18:43:36 |
CNN segmentation based on manifold learning | Weak_manifold_CNN | University of Central Florida | Marzieh Edraki | Manifold learning is used to train a deep convolutional neural network in a weakly supervised manner; the only required annotation is bounding boxes. The model was trained on all training samples of PASCAL VOC 2011. It is based on the VGG16 architecture, with the fully connected layers replaced by convolutional layers as in the FCN model. We exploit the hierarchical feature-generation property of deep convolutional neural networks to design a new cost function that can be applied on top of most deep CNN semantic segmentation models and needs only bounding boxes during training. | 2016-11-14 05:41:33 |
High-resolution network baseline | hrnet_baseline | University of Chinese Academy of Sciences | xiaoyang | This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. A sketch of the multi-resolution fusion step appears after this table. | 2020-01-25 12:42:00 |
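The CRF_RNN and Oxford_TVG_CRF_RNN entries above describe unrolling mean-field inference of a dense CRF into a recurrent module placed on top of CNN logits and trained end-to-end. The following is a deliberately simplified sketch of that idea, not the authors' implementation: the real method uses dense Gaussian pairwise potentials (spatial and bilateral kernels computed with a permutohedral lattice), whereas here message passing is stood in for by a fixed per-class spatial blur, and the class count, iteration count and module names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCRFRNN(nn.Module):
    """Unrolled mean-field inference treated as a recurrent block.

    NOTE: illustrative only. The published CRF-RNN uses dense Gaussian
    pairwise potentials; message passing here is approximated with a
    fixed spatial smoothing filter applied per class.
    """

    def __init__(self, num_classes=21, num_iters=5, kernel_size=7):
        super().__init__()
        self.num_iters = num_iters
        # Fixed per-class smoothing kernel (stand-in for dense message passing).
        blur = torch.ones(num_classes, 1, kernel_size, kernel_size)
        blur = blur / blur.sum(dim=(2, 3), keepdim=True)
        self.register_buffer("blur", blur)
        # Learnable label-compatibility transform (1x1 conv over class scores).
        self.compat = nn.Conv2d(num_classes, num_classes, kernel_size=1, bias=False)

    def forward(self, unary):
        # unary: CNN logits of shape (N, num_classes, H, W)
        q = F.softmax(unary, dim=1)
        for _ in range(self.num_iters):
            msg = F.conv2d(q, self.blur, padding=self.blur.shape[-1] // 2,
                           groups=q.shape[1])          # message passing
            pairwise = self.compat(msg)                # compatibility transform
            q = F.softmax(unary - pairwise, dim=1)     # local update + normalisation
        return q
```

Because every step is differentiable, gradients flow through the iterations back into the underlying CNN, which is the property the submissions emphasise.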
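The Deeplab v3 entry describes atrous convolution and an Atrous Spatial Pyramid Pooling (ASPP) module augmented with image-level features encoding global context (arXiv:1706.05587). A minimal PyTorch-style sketch of such a module follows; the channel widths and dilation rates are commonly used values rather than the submitted model's exact configuration, and batch normalisation/ReLU are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling with an image-level branch (sketch)."""

    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # One 1x1 branch plus several atrous (dilated) 3x3 branches.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates]
        )
        # Image-level (global context) branch.
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        g = self.global_pool(x)
        feats.append(F.interpolate(g, size=(h, w), mode="bilinear",
                                   align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```

The different dilation rates probe the same feature map with different effective fields of view, while the pooled branch injects global context, which is the combination the entry describes.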
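The FCN-8s entry describes in-network upsampling plus skip layers that fuse deep, coarse, semantic scores with finer, shallower features. The sketch below illustrates that fusion pattern under stated assumptions: it presumes a backbone exposing pool3, pool4 and final feature maps with the listed channel counts, and it replaces the paper's learned transposed convolutions with bilinear interpolation for brevity.

```python
import torch.nn as nn
import torch.nn.functional as F

class FCN8sHead(nn.Module):
    """Skip-layer fusion and in-network upsampling in the style of FCN-8s (sketch)."""

    def __init__(self, num_classes=21, c_pool3=256, c_pool4=512, c_final=4096):
        super().__init__()
        # 1x1 convolutions that score each feature map per class.
        self.score_final = nn.Conv2d(c_final, num_classes, 1)
        self.score_pool4 = nn.Conv2d(c_pool4, num_classes, 1)
        self.score_pool3 = nn.Conv2d(c_pool3, num_classes, 1)

    def forward(self, pool3, pool4, final, out_size):
        s = self.score_final(final)
        # Upsample coarse scores and fuse with the finer pool4 scores.
        s = F.interpolate(s, size=pool4.shape[-2:], mode="bilinear",
                          align_corners=False) + self.score_pool4(pool4)
        # Fuse again with the even finer pool3 scores.
        s = F.interpolate(s, size=pool3.shape[-2:], mode="bilinear",
                          align_corners=False) + self.score_pool3(pool3)
        # Final upsampling to input resolution (the original uses learned
        # transposed convolutions; bilinear interpolation stands in here).
        return F.interpolate(s, size=out_size, mode="bilinear",
                             align_corners=False)
```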
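The hrnet_baseline entry describes parallel multi-resolution subnetworks with repeated multi-scale fusion. The sketch below is an illustrative exchange unit in that spirit, not the official implementation: channel widths are placeholders, downsampling uses plain strided convolutions, and upsampling uses a 1x1 convolution followed by bilinear interpolation.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Fuse parallel streams so every stream receives information from all others.

    xs[0] is the highest-resolution stream; resolution halves with each index.
    """

    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        n = len(channels)
        self.convs = nn.ModuleList()
        for i in range(n):           # output stream i
            row = nn.ModuleList()
            for j in range(n):       # input stream j
                if j == i:
                    row.append(nn.Identity())
                elif j < i:
                    # Higher-resolution input: downsample with strided 3x3 convs.
                    ops, ch = [], channels[j]
                    for _ in range(i - j):
                        ops.append(nn.Conv2d(ch, channels[i], 3, stride=2, padding=1))
                        ch = channels[i]
                    row.append(nn.Sequential(*ops))
                else:
                    # Lower-resolution input: 1x1 conv, then upsample in forward().
                    row.append(nn.Conv2d(channels[j], channels[i], 1))
            self.convs.append(row)

    def forward(self, xs):
        outs = []
        for i, row in enumerate(self.convs):
            target = xs[i].shape[-2:]
            acc = 0
            for j, op in enumerate(row):
                y = op(xs[j])
                if y.shape[-2:] != target:
                    y = F.interpolate(y, size=target, mode="bilinear",
                                      align_corners=False)
                acc = acc + y
            outs.append(F.relu(acc))
        return outs
```

Repeating such exchange units throughout the network is what keeps a high-resolution representation alive from start to finish, which is the property the entry emphasises.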