Segmentation Results: VOC2011 BETA

Competition "comp6" (train on own data)

This leaderboard shows only those submissions that have been marked as public, so the displayed rankings should not be considered definitive.

Average Precision (AP %)

| submission | mean | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | dining table | dog | horse | motorbike | person | potted plant | sheep | sofa | train | tv/monitor | submission date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hrnet_baseline | 79.1 | 92.9 | 43.2 | 82.1 | 64.4 | 83.0 | 95.2 | 91.0 | 93.9 | 47.6 | 90.1 | 58.8 | 90.7 | 89.9 | 89.1 | 88.0 | 66.7 | 90.0 | 47.6 | 88.7 | 74.0 | 25-Jan-2020 |
| Deeplab v3 | 75.7 | 85.8 | 40.1 | 75.2 | 66.6 | 75.7 | 93.3 | 85.8 | 90.1 | 38.8 | 85.0 | 60.7 | 87.0 | 86.0 | 86.4 | 83.4 | 66.6 | 84.6 | 50.9 | 83.5 | 71.0 | 22-Oct-2018 |
| Oxford_TVG_CRF_RNN_COCO | 75.0 | 90.7 | 58.1 | 88.1 | 66.6 | 70.8 | 90.7 | 81.8 | 84.3 | 34.8 | 81.6 | 63.4 | 79.2 | 83.5 | 86.7 | 79.1 | 59.2 | 79.9 | 52.8 | 79.9 | 69.7 | 22-Apr-2015 |
| Oxford_TVG_CRF_RNN_VOC | 72.4 | 87.1 | 40.3 | 77.2 | 66.8 | 69.5 | 90.6 | 80.4 | 84.5 | 33.0 | 83.6 | 58.0 | 81.3 | 80.5 | 82.9 | 79.2 | 60.6 | 80.1 | 44.9 | 79.4 | 66.8 | 22-Apr-2015 |
| Weak_manifold_CNN | 65.8 | 81.4 | 33.5 | 73.0 | 60.1 | 64.0 | 87.7 | 74.0 | 77.7 | 29.5 | 66.7 | 52.0 | 72.8 | 72.6 | 71.3 | 72.6 | 54.9 | 67.7 | 43.4 | 73.7 | 61.7 | 14-Nov-2016 |
| CRF_RNN | 65.4 | 83.1 | 34.7 | 70.3 | 51.6 | 64.0 | 83.4 | 76.9 | 80.1 | 26.2 | 71.5 | 50.4 | 73.9 | 71.4 | 76.4 | 75.2 | 49.0 | 72.1 | 40.2 | 71.8 | 59.0 | 10-Feb-2015 |
| TTI_zoomout | 64.1 | 80.8 | 36.8 | 76.8 | 55.7 | 57.9 | 81.7 | 74.9 | 78.2 | 23.5 | 70.8 | 51.4 | 73.1 | 76.9 | 76.0 | 67.7 | 44.6 | 67.3 | 37.0 | 67.9 | 56.4 | 25-Nov-2014 |
| FCN-8s | 62.7 | 79.1 | 35.1 | 65.4 | 49.2 | 61.9 | 81.5 | 75.7 | 78.0 | 23.4 | 67.3 | 45.6 | 71.2 | 67.3 | 75.7 | 72.1 | 46.1 | 70.6 | 35.1 | 70.9 | 55.6 | 12-Nov-2014 |
| Berkeley_Region_Classify | 39.1 | 48.9 | 20.0 | 32.8 | 28.2 | 41.1 | 53.9 | 48.3 | 48.0 | 6.0 | 34.9 | 27.5 | 35.0 | 47.2 | 47.3 | 48.4 | 20.6 | 52.7 | 25.0 | 36.6 | 35.4 | 13-Oct-2011 |
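For reference, VOC reports per-class segmentation accuracy as intersection-over-union over all test pixels, TP / (TP + FP + FN); note that the mean column also averages in a background class that is not shown as a column, so it is not simply the mean of the 20 visible class scores. A minimal sketch of the per-class measure (the function name is ours, not from the evaluation kit):

```python
import numpy as np

def per_class_iou(pred, gt, n_classes):
    """Per-class intersection-over-union, the measure VOC reports as
    per-class segmentation accuracy: TP / (TP + FP + FN)."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious
```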

Abbreviations

Each entry gives the method name and title, affiliation, contributors, description, and submission date.
Berkeley_Region_Classify — "Classification of low-level regions"
UC Berkeley. Contributors: Pablo Arbelaez, Bharath Hariharan, Saurabh Gupta, Chunhui Gu, Lubomir Bourdev and Jitendra Malik.
We propose a semantic segmentation approach that represents and classifies generic regions obtained from low-level segmentation. We extract object candidates using ultrametric contour maps (Arbelaez et al., TPAMI 2011) at several image resolutions. We represent each region using mid- and high-level features that capture its appearance (color, shape, texture) as well as its compatibility with the activations of a part detector (the poselets of Bourdev et al., ECCV 2010). A category label is assigned to each region using a hierarchy of IKSVM classifiers (Maji et al., CVPR 2008).
Submitted: 2011-10-13 22:22:06
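The IKSVM classifiers referenced here are SVMs with the histogram-intersection kernel. As an illustration only (not the submission's code; the function name is ours), the kernel reduces to an element-wise minimum summed over feature dimensions:

```python
import numpy as np

def intersection_kernel(X, Y):
    """Histogram-intersection kernel K(x, y) = sum_i min(x_i, y_i),
    the kernel behind IKSVM classifiers (Maji et al., CVPR 2008).
    X: (n, d) and Y: (m, d) feature histograms -> (n, m) Gram matrix."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)
```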
CRF_RNN — "CRF as RNN"
University of Oxford. Contributors: Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Philip Torr.
We introduce a new form of convolutional neural network, called CRF-RNN, which expresses a conditional random field (CRF) as a recurrent neural network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. See the paper "Conditional Random Fields as Recurrent Neural Networks".
Submitted: 2015-02-10 10:57:12
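The computation the CRF-RNN unrolls is mean-field inference in a dense CRF, with each iteration acting as one RNN time step. A simplified sketch under our own assumptions (a dense affinity matrix stands in for the paper's Gaussian filtering; function names are ours):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def crf_rnn_step(Q, unary, W, mu):
    """One mean-field iteration, the computation CRF-RNN unrolls as an RNN step.
    Q:     (L, N) current label marginals per pixel
    unary: (L, N) negative unary energies from the CNN
    W:     (N, N) pairwise pixel affinities (Gaussian filtering in the paper;
                  a dense matrix here purely for illustration)
    mu:    (L, L) label-compatibility transform"""
    msg = Q @ W                        # message passing over pixels
    pairwise = mu @ msg                # compatibility transform over labels
    return softmax(unary - pairwise, axis=0)  # local update + normalisation

def mean_field(unary, W, mu, n_iter=5):
    Q = softmax(unary, axis=0)         # initialise from the unary potentials
    for _ in range(n_iter):            # the RNN "time steps"
        Q = crf_rnn_step(Q, unary, W, mu)
    return Q
```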
Deeplab v3 — "Deeplab v3_MS-COCO"
Peking University. Contributors: Wuxiaochun, Renxiaohang.
We used a resnet_v2_101 model pretrained on VOC2012 as the pre-trained model, and used the MS-COCO dataset to train a DeepLab v3 model. We revisit atrous convolution, a powerful tool to explicitly adjust a filter's field of view and to control the resolution of feature responses computed by deep convolutional neural networks. We propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context, further boosting performance. See http://arxiv.org/abs/1706.05587 for further information.
Submitted: 2018-10-22 06:39:58
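Atrous (dilated) convolution enlarges the field of view by spacing the kernel taps apart, without adding parameters. A minimal 1-D sketch for illustration (function name is ours, not from the DeepLab code):

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1-D atrous (dilated) convolution: kernel taps are spaced `rate`
    samples apart, enlarging the field of view at no parameter cost.
    x: 1-D signal, w: 1-D kernel; 'valid' output only."""
    k = len(w)
    span = (k - 1) * rate + 1          # effective field of view
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * rate] for j in range(k))
        for i in range(out_len)
    ])
```

With rate=1 this is an ordinary convolution; larger rates widen the receptive field, which is how ASPP probes features at multiple scales.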
FCN-8s — "Fully convolutional net"
UC Berkeley. Contributors: Jonathan Long, Evan Shelhamer, Trevor Darrell.
We apply fully convolutional nets end-to-end, pixels-to-pixels, for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net.
Submitted: 2014-11-12 08:57:33
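The in-network upsampling layers mentioned above are transposed convolutions, commonly initialised with a bilinear-interpolation kernel. A sketch of that kernel under that common convention (function name is ours):

```python
import numpy as np

def bilinear_kernel(size):
    """2-D bilinear-interpolation kernel of the kind commonly used to
    initialise in-network (transposed-convolution) upsampling layers."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.arange(size)
    k1d = 1 - np.abs(og - center) / factor   # 1-D triangle filter
    return np.outer(k1d, k1d)                # separable 2-D kernel
```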
Oxford_TVG_CRF_RNN_COCO
[1] University of Oxford / [2] Baidu IDL. Contributors: Shuai Zheng [1], Sadeep Jayasumana [1], Bernardino Romera-Paredes [1], Chang Huang [2], Philip Torr [1].
We introduce a new form of convolutional neural network, called CRF-RNN, which expresses dense conditional random fields (Dense CRF) as a recurrent neural network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, Berkeley augmented data, and a subset of COCO 2014 train data. More details are available in the paper http://arxiv.org/abs/1502.03240.
Submitted: 2015-04-22 14:00:29
Oxford_TVG_CRF_RNN_VOC
[1] University of Oxford / [2] Baidu IDL. Contributors: Shuai Zheng [1], Sadeep Jayasumana [1], Bernardino Romera-Paredes [1], Chang Huang [2], Philip Torr [1].
We introduce a new form of convolutional neural network, called CRF-RNN, which expresses dense conditional random fields (Dense CRF) as a recurrent neural network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data and Berkeley augmented data (the COCO dataset was not used). More details are available in the paper http://arxiv.org/abs/1502.03240.
Submitted: 2015-04-22 10:30:12
TTI_zoomout — "Feedforward segmentation with zoom-out features"
TTI-Chicago. Contributors: Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich.
Same as before, except using the VGG 16-layer network instead of the VGG CNN-S network. Fine-tuning on VOC 2012 was not performed. See http://arxiv.org/abs/1412.0774 for details.
Submitted: 2014-11-25 18:43:36
Weak_manifold_CNN — "CNN segmentation based on manifold learning"
University of Central Florida. Contributor: Marzieh Edraki.
Manifold learning is used to train a deep convolutional neural network in a weakly supervised manner; the only required annotation is bounding boxes. The model was trained on all training samples of PASCAL VOC 2011. It is based on the VGG16 architecture, with the fully connected layers replaced by convolution layers as in the FCN model. We use the hierarchical feature-generation property of deep convolutional neural networks to design a new cost function that can be applied on top of most deep CNN semantic segmentation models and needs only bounding boxes for training.
Submitted: 2016-11-14 05:41:33
hrnet_baseline — "high-resolution network baseline"
University of Chinese Academy of Sciences. Contributor: xiaoyang.
This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations throughout the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations repeatedly receives information from the other parallel representations, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise.
Submitted: 2020-01-25 12:42:00
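The repeated multi-scale fusion described above exchanges information between parallel branches by resampling each branch to the other's resolution and summing. A toy two-branch sketch under our own simplifications (HRNet actually uses strided/1x1 convolutions for the resampling; function names are ours):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling (a stand-in for HRNet's
    bilinear upsampling followed by a 1x1 convolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    """2x downsampling by average pooling (a stand-in for HRNet's
    strided 3x3 convolutions)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def fuse(high, low):
    """One multi-scale fusion step: each branch receives the other
    branch's representation resampled to its own resolution, added in."""
    return high + upsample2x(low), low + downsample2x(high)
```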