Segmentation Results: VOC2012 BETA

Competition "comp6" (train on own data)

This leaderboard shows only those submissions that have been marked as public, so the displayed rankings should not be considered definitive. Entries equivalent to a selected submission are determined by bootstrapping the performance measure and assessing whether the differences between the selected submission and the others are statistically insignificant (see sec. 3.5 in the VOC 2014 paper).
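The equivalence test described above can be sketched as a percentile bootstrap over matched per-image scores. This is an illustrative sketch, not the evaluation server's actual code; the function name and parameters are our own:

```python
import random

def bootstrap_diff_ci(scores_a, scores_b, n_boot=10000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the difference in mean
    score between two submissions, given matched per-image scores.
    The same resampled image indices are used for both submissions (paired)."""
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample images with replacement
        da = sum(scores_a[i] for i in idx) / n
        db = sum(scores_b[i] for i in idx) / n
        diffs.append(da - db)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Two submissions would then be deemed "equivalent" when the interval contains zero, i.e. their difference is not statistically significant at the chosen level.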

Average Precision (AP %)

Method | mean | aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tv/monitor | submission date
CASIA_IVA_SDN [?] | 86.0 | 96.6 76.4 95.6 79.3 81.4 97.0 91.7 96.4 45.5 94.1 78.7 92.4 95.4 92.5 90.5 73.7 93.8 64.9 90.2 83.0 | 15-Jun-2017
SYSU_SceneParsing_COCO [?] | 85.7 | 94.6 66.7 93.3 71.8 83.5 95.3 89.6 93.6 52.3 94.9 75.8 93.2 95.5 91.7 89.3 75.5 93.7 78.4 94.0 80.0 | 22-Feb-2017
DeepLabv3 [?] | 85.7 | 96.4 76.6 92.7 77.8 87.6 96.7 90.2 95.4 47.5 93.4 76.3 91.4 97.2 91.0 92.1 71.3 90.9 68.9 90.8 79.3 | 20-Jun-2017
PSPNet [?] | 85.4 | 95.8 72.7 95.0 78.9 84.4 94.7 92.0 95.7 43.1 91.0 80.3 91.3 96.3 92.3 90.1 71.5 94.4 66.9 88.8 82.0 | 06-Dec-2016
** ResNet-38_COCO ** [?] | 84.9 | 96.2 75.2 95.4 74.4 81.7 93.7 89.9 92.5 48.2 92.0 79.9 90.1 95.5 91.8 91.2 73.0 90.5 65.4 88.7 80.6 | 22-Jan-2017
Multipath-RefineNet [?] | 84.2 | 95.0 73.2 93.5 78.1 84.8 95.6 89.8 94.1 43.7 92.0 77.2 90.8 93.4 88.6 88.1 70.1 92.9 64.3 87.7 78.8 | 17-Jan-2017
Large_Kernel_Matters [?] | 83.6 | 95.3 68.7 94.1 72.6 82.4 96.0 89.3 93.0 47.8 89.6 70.8 89.2 93.3 90.1 91.2 72.0 89.8 67.8 88.9 76.9 | 16-Mar-2017
ResNet-38_MS [?] | 83.1 | 95.2 72.5 95.1 70.8 78.5 91.7 90.0 92.4 41.9 90.8 73.9 90.6 93.8 90.5 89.5 72.6 89.8 63.2 87.8 79.1 | 09-Dec-2016
TuSimple [?] | 83.1 | 92.1 64.6 94.7 71.0 81.0 94.6 89.7 94.9 45.6 93.7 74.4 92.0 95.1 90.0 88.7 69.1 90.4 62.7 86.4 78.2 | 09-Nov-2016
Deep Layer Cascade (LC) [?] | 82.7 | 85.5 66.7 94.5 67.2 84.0 96.1 89.8 93.5 47.2 90.4 71.5 88.9 91.7 89.2 89.1 70.4 89.4 70.7 84.2 79.6 | 06-Apr-2017
SegModel [?] | 81.8 | 93.6 60.2 93.6 69.1 76.4 96.3 88.2 95.5 37.9 90.8 73.3 91.1 94.3 88.6 88.6 64.8 90.1 63.7 87.3 78.2 | 23-Aug-2016
HikSeg_COCO [?] | 81.4 | 95.0 64.2 91.5 79.0 78.7 93.4 88.4 94.3 45.8 89.6 65.2 90.6 92.8 88.7 87.5 62.4 88.4 56.4 86.2 75.3 | 02-Oct-2016
DP_ResNet_CRF [?] | 81.0 | 94.0 59.5 91.8 68.1 75.9 95.2 88.9 93.2 37.7 90.8 70.8 89.2 92.7 87.7 87.9 65.5 90.3 62.6 87.2 75.5 | 10-Nov-2016
OBP-HJLCN [?] | 80.4 | 92.7 54.8 91.6 68.0 76.9 95.7 89.3 92.6 35.2 89.0 69.3 89.4 92.7 87.9 87.5 66.8 88.5 62.2 86.1 76.2 | 13-Sep-2016
CentraleSupelec Deep G-CRF [?] | 80.2 | 92.9 61.2 91.0 66.3 77.7 95.3 88.9 92.4 33.8 88.4 69.1 89.8 92.9 87.7 87.5 62.6 89.9 59.2 87.1 74.2 | 12-Aug-2016
CMT-FCN-ResNet-CRF [?] | 80.0 | 92.5 55.3 92.2 66.0 76.9 95.1 88.6 93.9 35.1 87.6 71.6 89.3 92.8 87.9 88.0 62.0 88.0 59.7 86.1 75.7 | 02-Aug-2016
DeepLabv2-CRF [?] | 79.7 | 92.6 60.4 91.6 63.4 76.3 95.0 88.4 92.6 32.7 88.5 67.6 89.6 92.1 87.0 87.4 63.3 88.3 60.0 86.8 74.5 | 06-Jun-2016
LRR_4x_ResNet_COCO [?] | 79.3 | 92.4 45.1 94.6 65.2 75.8 95.1 89.1 92.3 39.0 85.7 70.4 88.6 89.4 88.6 86.6 65.8 86.2 57.4 85.7 77.3 | 18-Jul-2016
CASIA_SegResNet_CRF_COCO [?] | 79.3 | 93.8 42.2 93.1 68.6 75.3 95.3 88.8 92.5 36.5 84.3 64.2 86.8 87.8 87.5 88.5 69.2 89.7 64.1 86.8 74.6 | 03-Jun-2016
Adelaide_VeryDeep_FCN_VOC [?] | 79.1 | 91.9 48.1 93.4 69.3 75.5 94.2 87.5 92.8 36.7 86.9 65.2 89.1 90.2 86.5 87.2 64.6 90.1 59.7 85.5 72.7 | 13-May-2016
LRR_4x_COCO [?] | 78.7 | 93.2 44.2 89.4 65.4 74.9 93.9 87.0 92.0 42.9 83.7 68.9 86.5 88.0 89.0 87.2 67.3 85.6 64.0 84.1 71.5 | 16-Jun-2016
CASIA_IVA_OASeg [?] | 78.3 | 93.8 41.9 89.4 67.5 71.5 94.6 85.3 89.5 38.1 88.4 64.8 87.0 90.5 84.9 83.3 67.5 86.9 68.1 83.4 74.0 | 21-May-2016
Oxford_TVG_HO_CRF [?] | 77.9 | 92.5 59.1 90.3 70.6 74.4 92.4 84.1 88.3 36.8 85.6 67.1 85.1 86.9 88.2 82.6 62.6 85.0 56.3 81.9 72.5 | 16-Mar-2016
Adelaide_Context_CNN_CRF_COCO [?] | 77.8 | 92.9 39.6 84.0 67.9 75.3 92.7 83.8 90.1 44.3 85.5 64.9 87.3 88.8 84.5 85.5 68.1 89.0 62.8 81.2 71.4 | 06-Nov-2015
CUHK_DPN_COCO [?] | 77.5 | 89.0 61.6 87.7 66.8 74.7 91.2 84.3 87.6 36.5 86.3 66.1 84.4 87.8 85.6 85.4 63.6 87.3 61.3 79.4 66.4 | 22-Sep-2015
Adelaide_Context_CNN_CRF_COCO [?] | 77.2 | 92.3 38.8 82.9 66.1 75.1 92.4 83.1 88.6 41.8 85.9 62.8 86.7 88.4 84.0 85.4 67.4 88.8 61.9 81.9 71.7 | 13-Aug-2015
Ladder_DenseNet [?] | 77.0 | 91.4 44.6 86.5 64.6 74.6 92.1 86.2 91.3 37.0 86.0 62.6 85.2 83.5 84.1 85.4 61.5 91.9 59.5 83.6 71.4 | 26-May-2017
DeepLab-CRF-Attention-DT [?] | 76.3 | 93.2 41.7 88.0 61.7 74.9 92.9 84.5 90.4 33.0 82.8 63.2 84.5 85.0 87.2 85.7 60.5 87.7 57.8 84.3 68.2 | 03-Feb-2016
CentraleSuperBoundaries++ [?] | 76.0 | 91.1 38.5 90.9 68.7 74.2 89.9 85.3 89.1 34.4 82.5 65.6 83.1 82.9 85.7 85.4 60.6 84.5 59.9 80.2 69.9 | 13-Jan-2016
LRR_4x_de_pyramid_VOC [?] | 75.9 | 91.8 41.0 83.0 62.3 74.3 93.0 86.8 88.7 36.6 81.8 63.4 84.7 85.9 85.1 83.1 62.0 84.6 55.6 84.9 70.0 | 07-Jun-2016
DeepLab-CRF-Attention [?] | 75.7 | 91.1 40.9 86.9 62.1 74.2 92.3 84.4 90.1 34.0 81.7 66.0 83.5 83.9 86.5 84.6 59.1 87.2 59.6 81.0 66.2 | 03-Feb-2016
Adelaide_Context_CNN_CRF_VOC [?] | 75.3 | 90.6 37.6 80.0 67.8 74.4 92.0 85.2 86.2 39.1 81.2 58.9 83.8 83.9 84.3 84.8 62.1 83.2 58.2 80.8 72.3 | 30-Aug-2015
MSRA_BoxSup [?] | 75.2 | 89.8 38.0 89.2 68.9 68.0 89.6 83.0 87.7 34.4 83.6 67.1 81.5 83.7 85.2 83.5 58.6 84.9 55.8 81.2 70.7 | 18-May-2015
MERL_UMD_Deep_GCRF_COCO [?] | 74.8 | 89.9 42.6 90.0 65.0 69.2 89.9 83.9 88.2 31.3 81.8 66.4 82.9 81.1 85.7 83.4 58.4 88.4 56.7 77.7 64.3 | 15-Jan-2016
POSTECH_DeconvNet_CRF_VOC [?] | 74.8 | 90.0 40.8 84.2 67.3 70.7 90.9 84.8 87.4 34.8 83.0 58.7 82.3 87.1 86.9 82.4 64.5 84.6 54.9 77.5 64.1 | 18-Aug-2015
Oxford_TVG_CRF_RNN_COCO [?] | 74.7 | 90.4 55.3 88.7 68.4 69.8 88.3 82.4 85.1 32.6 78.5 64.4 79.6 81.9 86.4 81.8 58.6 82.4 53.5 77.4 70.1 | 22-Apr-2015
UNIST_GDN_CRF_ENS [?] | 74.0 | 88.6 48.6 88.8 64.7 70.4 87.2 81.8 86.4 32.0 77.1 64.1 80.5 78.0 84.0 83.3 59.2 85.9 56.8 77.9 65.0 | 29-Jul-2016
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint [?] | 73.9 | 89.2 46.7 88.5 63.5 68.4 87.0 81.2 86.3 32.6 80.7 62.4 81.0 81.3 84.3 82.1 56.2 84.6 58.3 76.2 67.2 | 26-Apr-2015
UNIST_GDN_CRF [?] | 73.2 | 87.9 37.8 88.8 64.5 70.7 87.7 81.3 87.1 32.5 76.7 66.6 80.3 76.6 82.2 82.3 57.9 84.5 55.9 78.5 64.2 | 29-Jul-2016
MERL_DEEP_GCRF [?] | 73.2 | 85.2 43.9 83.3 65.2 68.3 89.0 82.7 85.3 31.1 79.5 63.3 80.5 79.3 85.5 81.0 60.5 85.5 52.0 77.3 65.1 | 17-Oct-2015
Bayesian Dilation Network [?] | 73.1 | 88.6 39.0 86.2 63.3 67.1 88.1 81.9 86.8 34.7 81.1 57.1 81.3 86.5 83.4 83.4 53.7 84.0 53.3 80.5 62.5 | 07-Jun-2016
DeepLab-CRF-COCO-LargeFOV [?] | 72.7 | 89.1 38.3 88.1 63.3 69.7 87.1 83.1 85.0 29.3 76.5 56.5 79.8 77.9 85.8 82.4 57.4 84.3 54.9 80.5 64.1 | 18-Mar-2015
POSTECH_EDeconvNet_CRF_VOC [?] | 72.5 | 89.9 39.3 79.7 63.9 68.2 87.4 81.2 86.1 28.5 77.0 62.0 79.0 80.3 83.6 80.2 58.8 83.4 54.3 80.7 65.0 | 22-Apr-2015
Dual-Multi-Reso-MR [?] | 72.4 | 87.6 40.3 80.6 62.9 71.3 88.1 84.4 84.7 29.6 77.8 58.5 80.0 81.0 85.4 82.1 55.0 83.8 48.2 80.3 65.3 | 03-Nov-2016
CCBM [?] | 72.3 | 87.8 46.7 79.0 63.6 70.5 83.7 75.5 86.9 31.0 81.9 61.3 81.5 85.9 81.1 76.5 58.7 77.7 50.4 76.6 69.8 | 29-Nov-2015
Oxford_TVG_CRF_RNN_VOC [?] | 72.0 | 87.5 39.0 79.7 64.2 68.3 87.6 80.8 84.4 30.4 78.2 60.4 80.5 77.8 83.1 80.6 59.5 82.8 47.8 78.3 67.1 | 22-Apr-2015
DeepLab-MSc-CRF-LargeFOV [?] | 71.6 | 84.4 54.5 81.5 63.6 65.9 85.1 79.1 83.4 30.7 74.1 59.8 79.0 76.1 83.2 80.8 59.7 82.2 50.4 73.1 63.7 | 02-Apr-2015
MSRA_BoxSup [?] | 71.0 | 86.4 35.5 79.7 65.2 65.2 84.3 78.5 83.7 30.5 76.2 62.6 79.3 76.1 82.1 81.3 57.0 78.2 55.0 72.5 68.1 | 10-Feb-2015
FCN_CLC_MSP [?] | 70.8 | 86.2 40.1 83.9 57.8 64.7 87.9 81.3 85.9 28.3 80.0 61.9 80.7 82.5 79.7 80.2 54.7 81.3 39.3 78.9 59.2 | 01-Jul-2016
DeepLab-CRF-COCO-Strong [?] | 70.4 | 85.3 36.2 84.8 61.2 67.5 84.6 81.4 81.0 30.8 73.8 53.8 77.5 76.5 82.3 81.6 56.3 78.9 52.3 76.6 63.3 | 11-Feb-2015
DeepLab-CRF-LargeFOV [?] | 70.3 | 83.5 36.6 82.5 62.3 66.5 85.4 78.5 83.7 30.4 72.9 60.4 78.5 75.5 82.1 79.7 58.2 82.0 48.8 73.7 63.3 | 28-Mar-2015
DeepSqueeNet_CRF [?] | 70.1 | 85.7 37.4 83.4 59.7 67.8 85.2 79.8 81.4 27.9 72.3 60.4 76.5 78.2 82.7 78.8 57.3 78.6 49.0 77.6 61.0 | 21-Jul-2016
TTI_zoomout_v2 [?] | 69.6 | 85.6 37.3 83.2 62.5 66.0 85.1 80.7 84.9 27.2 73.2 57.5 78.1 79.2 81.1 77.1 53.6 74.0 49.2 71.7 63.3 | 30-Mar-2015
RRF-4s [?] | 69.4 | 79.5 57.3 78.7 61.8 64.1 83.9 78.1 80.4 30.0 73.0 59.4 74.3 73.9 80.8 77.9 53.9 76.4 46.1 71.7 63.9 | 30-Nov-2016
VGG19_FCN [?] | 68.1 | 81.7 35.9 79.8 57.5 66.9 84.1 79.6 80.8 28.2 72.1 53.3 74.0 72.1 78.5 78.2 55.5 76.7 43.4 73.8 65.1 | 06-Apr-2017
FCN-8s-heavy [?] | 67.2 | 82.4 36.1 75.6 61.5 65.4 83.4 77.2 80.1 27.9 66.8 51.5 73.6 71.9 78.9 77.1 55.3 73.4 44.3 74.0 63.2 | 06-Feb-2016
DeepLab-CRF-MSc [?] | 67.1 | 80.4 36.8 77.4 55.2 66.4 81.5 77.5 78.9 27.1 68.2 52.7 74.3 69.6 79.4 79.0 56.9 78.8 45.2 72.7 59.3 | 30-Dec-2014
DeepLab-CRF [?] | 66.4 | 78.4 33.1 78.2 55.6 65.3 81.3 75.5 78.6 25.3 69.2 52.7 75.2 69.0 79.1 77.6 54.7 78.3 45.1 73.3 56.2 | 23-Dec-2014
DeepSqueeNet [?] | 65.7 | 76.1 34.3 76.4 56.0 62.0 82.7 75.4 78.3 25.6 64.3 58.8 73.3 69.3 79.3 76.7 53.2 72.1 46.2 69.3 59.1 | 20-Jul-2016
Bayesian FCN [?] | 65.4 | 80.8 34.9 75.2 57.0 64.1 80.9 77.2 78.0 26.4 65.6 44.0 72.6 70.8 78.7 76.8 52.4 71.0 40.4 73.8 61.8 | 07-Jun-2016
Weak_manifold_CNN [?] | 65.3 | 80.9 32.9 73.2 57.7 63.0 83.9 73.5 76.6 27.0 65.9 52.6 70.9 69.8 73.0 74.9 53.3 70.1 45.4 72.4 62.7 | 11-Nov-2016
CRF_RNN [?] | 65.2 | 80.9 34.0 72.9 52.6 62.5 79.8 76.3 79.9 23.6 67.7 51.8 74.8 69.9 76.9 76.9 49.0 74.7 42.7 72.1 59.6 | 10-Feb-2015
UNIST_GDN_FCN_FC [?] | 64.4 | 75.6 31.5 69.2 51.6 62.9 78.8 76.7 78.7 24.6 61.7 60.3 74.5 62.6 76.1 74.3 51.5 70.6 47.3 74.0 58.4 | 27-Jul-2016
TTI_zoomout_16 [?] | 64.4 | 81.9 35.1 78.2 57.4 56.5 80.5 74.0 79.8 22.4 69.6 53.7 74.0 76.0 76.6 68.8 44.3 70.2 40.2 68.9 55.3 | 24-Nov-2014
Hypercolumn [?] | 62.6 | 68.7 33.5 69.8 51.3 70.2 81.1 71.9 74.9 23.9 60.6 46.9 72.1 68.3 74.5 72.9 52.6 64.4 45.4 64.9 57.4 | 09-Apr-2015
FCN-8s [?] | 62.2 | 76.8 34.2 68.9 49.4 60.3 75.3 74.7 77.6 21.4 62.5 46.8 71.8 63.9 76.5 73.9 45.2 72.4 37.4 70.9 55.1 | 12-Nov-2014
UNIST_GDN_FCN [?] | 62.2 | 74.5 31.9 66.7 49.7 60.5 76.9 75.9 76.0 22.9 57.6 54.5 73.0 59.4 75.0 73.7 51.0 67.5 43.3 70.0 56.4 | 27-Jul-2016
MSRA_CFM [?] | 61.8 | 75.7 26.7 69.5 48.8 65.6 81.0 69.2 73.3 30.0 68.7 51.5 69.1 68.1 71.7 67.5 50.4 66.5 44.4 58.9 53.5 | 17-Dec-2014
** SegNet ** [?] | 59.9 | 73.6 37.6 62.0 46.8 58.6 79.1 70.1 65.4 23.6 60.4 45.6 61.8 63.5 75.3 74.9 42.6 63.7 42.5 67.8 52.7 | 10-Nov-2015
TTI_zoomout [?] | 58.4 | 70.3 31.9 68.3 46.4 52.1 75.3 68.4 75.3 19.2 58.4 49.9 69.6 63.0 70.1 67.6 41.5 64.0 34.9 64.2 47.3 | 17-Nov-2014
SDS [?] | 51.6 | 63.3 25.7 63.0 39.8 59.2 70.9 61.4 54.9 16.8 45.0 48.2 50.5 51.0 57.7 63.3 31.8 58.7 31.2 55.7 48.5 | 21-Jul-2014
NUS_UDS [?] | 50.0 | 67.0 24.5 47.2 45.0 47.9 65.3 60.6 58.5 15.5 50.8 37.4 45.8 59.9 62.0 52.7 40.8 48.2 36.8 53.1 45.6 | 29-Oct-2014
TTIC-divmbest-rerank [?] | 48.1 | 62.7 25.6 46.9 43.0 54.8 58.4 58.6 55.6 14.6 47.5 31.2 44.7 51.0 60.9 53.5 36.6 50.9 30.1 50.2 46.8 | 15-Nov-2012
BONN_O2PCPMC_FGT_SEGM [?] | 47.8 | 64.0 27.3 54.1 39.2 48.7 56.6 57.7 52.5 14.2 54.8 29.6 42.2 58.0 54.8 50.2 36.6 58.6 31.6 48.4 38.6 | 08-Aug-2013
BONN_O2PCPMC_FGT_SEGM [?] | 47.5 | 63.4 27.3 56.1 37.7 47.2 57.9 59.3 55.0 11.5 50.8 30.5 45.0 58.4 57.4 48.6 34.6 53.3 32.4 47.6 39.2 | 23-Sep-2012
BONNGC_O2P_CPMC_CSI [?] | 46.8 | 63.6 26.8 45.6 41.7 47.1 54.3 58.6 55.1 14.5 49.0 30.9 46.1 52.6 58.2 53.4 32.0 44.5 34.6 45.3 43.1 | 23-Sep-2012
BONN_CMBR_O2P_CPMC_LIN [?] | 46.7 | 63.9 23.8 44.6 40.3 45.5 59.6 58.7 57.1 11.7 45.9 34.9 43.0 54.9 58.0 51.5 34.6 44.1 29.9 50.5 44.5 | 23-Sep-2012
FER_WSSS_REGION_SCORE_POOL [?] | 38.0 | 33.1 21.7 27.7 17.7 38.4 55.8 38.3 57.9 13.6 37.4 29.2 43.9 39.1 52.4 44.4 30.2 48.7 26.4 31.8 36.3 | 14-Jun-2016
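The per-class numbers above are segmentation accuracies; in the VOC segmentation challenge a class score is the intersection-over-union (IoU) between predicted and ground-truth pixels of that class, and the mean column averages over classes. A minimal sketch of how such scores could be computed from flat label lists (the helper name is ours, not the official development kit's):

```python
def per_class_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class intersection-over-union over a dataset, given flat lists
    of predicted and ground-truth pixel labels.
    IoU(c) = TP(c) / (TP(c) + FP(c) + FN(c))."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:  # VOC marks ambiguous border pixels with 255
            continue
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else float("nan"))
    return ious
```

The leaderboard's mean column would then be the average of these per-class IoUs (over the 20 object classes plus background, expressed as percentages).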

Abbreviations

Title | Method | Affiliation | Contributors | Description | Date
Adelaide_Context_CNN_CRF_COCO | Adelaide_Context_CNN_CRF_COCO | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel | Please refer to our technical report: http://arxiv.org/abs/1504.01013. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both CNNs and CRFs. | 2015-11-06 07:46:13
Adelaide_Context_CNN_CRF_COCO | Adelaide_Context_CNN_CRF_COCO | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel | Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at http://arxiv.org/abs/1504.01013); the report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations, and during the course of learning, dependencies between the output variables are taken into account. As in conventional DCNNs, the training of our model is performed end-to-end using back-propagation. Unlike direct likelihood maximization, however, inference may be needed at each gradient descent iteration, which can be computationally very expensive since typically millions of iterations are required. To enable efficient training, we propose approximate, piecewise training of CRFs, avoiding repeated inference. | 2015-08-13 04:13:59
Adelaide_Context_CNN_CRF_VOC | Adelaide_Context_CNN_CRF_VOC | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel | Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at http://arxiv.org/abs/1504.01013); the report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations, and during the course of learning, dependencies between the output variables are taken into account. As in conventional DCNNs, the training of our model is performed end-to-end using back-propagation. Unlike direct likelihood maximization, however, inference may be needed at each gradient descent iteration, which can be computationally very expensive since typically millions of iterations are required. To enable efficient training, we propose approximate, piecewise training of CRFs, avoiding repeated inference. | 2015-08-30 11:49:27
High-performance Very Deep FCN | Adelaide_VeryDeep_FCN_VOC | The University of Adelaide; D2DCRC | Zifeng Wu, Chunhua Shen, Anton van den Hengel | We propose a method for high-performance semantic image segmentation based on very deep fully convolutional networks. A few design factors are carefully examined to achieve the result. Details can be found in the paper "High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks", Zifeng Wu, Chunhua Shen, Anton van den Hengel: http://arxiv.org/abs/1604.04339. Note that the system used for this submission was trained on the augmented VOC 2012 data ONLY. | 2016-05-13 04:57:00
O2P Regressor + Composite Statistical Inference | BONNGC_O2P_CPMC_CSI | (1) University of Bonn, (2) Georgia Institute of Technology, (3) University of Coimbra | Joao Carreira (1,3), Fuxin Li (2), Guy Lebanon (2), Cristian Sminchisescu (1) | We utilize a novel, as yet unpublished probabilistic inference procedure, Composite Statistical Inference (CSI), for semantic segmentation using predictions on overlapping figure-ground hypotheses. Regressor predictions of segment overlaps with the ground-truth object are modelled as generated by the true overlap with the ground-truth segment plus noise. A model of ground-truth overlap is defined by parametrizing on the unknown percentage of each superpixel that belongs to the unknown ground truth. A joint optimization over all superpixels and all categories is then performed to maximize the likelihood of the SVR predictions. The optimization has a tight convex relaxation, so solutions can be expected to be close to the global optimum. A fast and optimal search algorithm is then applied to retrieve each object. CSI takes from the SVRSEGM inference algorithm the intuition that multiple predictions on similar segments can be combined to better consolidate the segment mask, but fully develops the idea by constructing a probabilistic framework and performing composite MLE jointly over all segments and categories. It is therefore able to consolidate object boundaries better and to handle hard cases in which objects interact closely and heavily occlude each other. For each image, we use 150 overlapping figure-ground hypotheses generated by the CPMC algorithm (Carreira and Sminchisescu, PAMI 2012), and linear SVR predictions on them with the novel second-order O2P features (Carreira, Caseiro, Batista, Sminchisescu, ECCV 2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN) as the input to the inference algorithm. | 2012-09-23 23:49:02
Linear SVR with second-order pooling | BONN_CMBR_O2P_CPMC_LIN | (1) University of Bonn, (2) University of Coimbra | Joao Carreira (2,1), Rui Caseiro (2), Jorge Batista (2), Cristian Sminchisescu (1) | We present a novel, effective local feature aggregation method that we use in conjunction with an existing figure-ground segmentation sampling mechanism. This submission is described in detail in [1]. We sample multiple figure-ground segmentation candidates per image using the Constrained Parametric Min-Cuts (CPMC) algorithm. SIFT, masked SIFT and LBP features are extracted on the whole image, then pooled over each object segmentation candidate to generate global region descriptors. We employ a novel second-order pooling procedure, O2P, with two non-linearities: a tangent space mapping and power normalization. The global region descriptors are passed through linear regressors for each category; labeled segments in each image having scores above some threshold are then pasted onto the image in the order of these scores. Learning is performed using an epsilon-insensitive loss function on overlap with ground truth, similar to [2], but within a linear formulation (using LIBLINEAR). comp6: learning uses all images in the segmentation+detection trainval sets, and external ground-truth annotations provided by courtesy of the Berkeley vision group. comp5: one model is trained for each category using the available ground-truth segmentations from the 2012 trainval set. Then, on each image having no associated ground-truth segmentations, the learned models are used together with bounding-box constraints, low-level cues and region competition to generate predicted object segmentations inside all bounding boxes. Afterwards, learning proceeds as in the fully annotated case. [1] "Semantic Segmentation with Second-Order Pooling", Carreira, Caseiro, Batista, Sminchisescu. ECCV 2012. [2] "Object Recognition by Ranking Figure-Ground Hypotheses", Li, Carreira, Sminchisescu. CVPR 2010. | 2012-09-23 19:11:47
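The second-order pooling step described in this entry can be illustrated in a few lines. The sketch below averages outer products of local descriptors and applies signed power normalization; it omits the log-Euclidean tangent-space mapping of the full O2P procedure, and the function name is ours, not the authors':

```python
import math

def second_order_pool(features, p=0.5):
    """Average second-order (outer-product) pooling of local descriptors,
    followed by signed power normalization sign(v)*|v|^p.
    `features` is a list of equal-length descriptor vectors."""
    d = len(features[0])
    pooled = [[0.0] * d for _ in range(d)]
    for f in features:
        for i in range(d):
            for j in range(d):
                pooled[i][j] += f[i] * f[j]  # accumulate outer product
    n = len(features)
    return [[math.copysign(abs(v / n) ** p, v) for v in row] for row in pooled]
```

The resulting d x d matrix (flattened) serves as the global region descriptor fed to the linear SVRs.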
BONN_O2PCPMC_FGT_SEGM | BONN_O2PCPMC_FGT_SEGM | (1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology | Joao Carreira (1,2), Adrian Ion (4), Fuxin Li (3), Cristian Sminchisescu (1) | We present a joint image segmentation and labeling model which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales using CPMC (Carreira and Sminchisescu, PAMI 2012), constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments and their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag (Ion, Carreira, Sminchisescu, ICCV 2011), followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure (Ion, Carreira, Sminchisescu, NIPS 2011). As meta-features we combine outputs from linear SVRs using novel second-order O2P features to predict the overlap between segments and ground-truth objects of each class (Carreira, Caseiro, Batista, Sminchisescu, ECCV 2012; see VOC12 entry BONNCMBR_O2PCPMC_LINEAR), bounding-box object detectors, and kernel SVR outputs trained to predict the overlap between segments and ground-truth objects of each class (Carreira, Li, Sminchisescu, IJCV 2012). comp6: the O2P SVR learning uses all images in the segmentation+detection trainval sets, and external ground-truth annotations provided by courtesy of the Berkeley vision group. | 2012-09-23 21:39:35
BONN_O2PCPMC_FGT_SEGM | BONN_O2PCPMC_FGT_SEGM | (1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology | Joao Carreira (1,2), Adrian Ion (4), Fuxin Li (3), Cristian Sminchisescu (1) | Same as before, except tilings non-maximal. | 2013-08-08 05:54:53
Bayesian Dilation Network | Bayesian Dilation Network | University of Cambridge | Alex Kendall | http://arxiv.org/abs/1511.02680 | 2016-06-07 08:28:00
Bayesian FCN | Bayesian FCN | University of Cambridge | Alex Kendall | http://mi.eng.cam.ac.uk/projects/segnet/ | 2016-06-07 08:36:38
Objectness-aware Semantic Segmentation | CASIA_IVA_OASeg | Institute of Automation, Chinese Academy of Sciences | Yuhang Wang, Jing Liu, Yong Li, Jun Fu, Hang Song, Hanqing Lu | We propose an objectness-aware semantic segmentation framework (OA-Seg) consisting of two deep networks. One is a lightweight deconvolutional neural network (Light-DCNN) which markedly decreases model size and convergence time while maintaining impressive segmentation performance. The other is an object proposal network (OPN) used to roughly locate object regions. MSCOCO is used to extend the training data and a CRF is used as post-processing. | 2016-05-21 01:52:15
CASIA_IVA_SDN | CASIA_IVA_SDN | National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences | Jun Fu, Jing Liu, Yuhang Wang, Zhenwei Shen, Zhiwei Fang, Hanqing Lu | We propose a stacked deconvolutional network (SDN) for semantic segmentation. SDN can effectively encode input images and recover the spatial resolution for accurate boundary localization. Dense connections and hierarchical supervision are used. CRF is not employed! | 2017-06-15 09:45:39
CASIA_SegResNet_CRF_COCO | CASIA_SegResNet_CRF_COCO | Institute of Automation, Chinese Academy of Sciences | Xinze Chen, Guangliang Cheng, Yinghao Cai | We propose a novel semantic segmentation method consisting of three parts: a SAR-based data augmentation method, a deeper residual network incorporating three effective techniques, and online hard-pixel mining. We combine these three parts to train an end-to-end network. | 2016-06-03 09:20:50
CCBM | CCBM | University of Tsinghua | Qiurui Wang, Chun Yuan, Zhihui Lin, Zhicheng Wang, Xin Qiu | We propose a method combining convolutional neural networks and Conditional Boltzmann Machines for object segmentation, called CCBM, which further utilizes a border detection method inspired by human vision. We use CNNs to extract features and segment them with improved Conditional Boltzmann Machines. We also use a Structured Random Forests based method to detect object borders for better effect. Finally, each superpixel is labelled as output. The proposed method for this submission was trained on the VOC 2012 Segmentation training data and a subset of the COCO 2014 training data. | 2015-11-29 07:26:11
CMT-FCN-ResNet-CRF | CMT-FCN-ResNet-CRF | Intel Labs China and Tsinghua University | Libin Wang, Anbang Yao, Jianguo Li, Yurong Chen, Li Zhang | We propose a novel coupled multi-task FCN. Both the VOC 2012 and COCO datasets are used for training, and a CRF is applied as a post-processing step. | 2016-08-02 09:57:05
CRF as RNN | CRF_RNN | University of Oxford | Shuai Zheng; Sadeep Jayasumana; Bernardino Romera-Paredes; Philip Torr | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses a conditional random field (CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. See the paper "Conditional Random Fields as Recurrent Neural Networks". | 2015-02-10 11:03:16
Deep Parsing Network | CUHK_DPN_COCO | The Chinese University of Hong Kong | Ziwei Liu*, Xiaoxiao Li*, Ping Luo, Chen Change Loy, Xiaoou Tang | This work addresses semantic image segmentation by incorporating rich information into a Markov Random Field (MRF), including high-order relations and a mixture of label contexts. Unlike previous works that optimized MRFs with iterative algorithms, we solve the MRF by proposing a Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN architecture to model unary terms, and additional layers are carefully devised to approximate the mean field algorithm (MF) for pairwise terms. It has several appealing properties. First, different from recent works that combined CNN and MRF, where many iterations of MF were required for each training image during back-propagation, DPN is able to achieve high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing works its special cases. Third, DPN makes MF easier to parallelize and speed up on a Graphics Processing Unit (GPU). The system used for this submission was trained on the augmented VOC 2012 and MS-COCO 2014 training sets. Please refer to the paper "Semantic Image Segmentation via Deep Parsing Network" (http://arxiv.org/abs/1509.02634) for further information. | 2015-09-22 16:52:27
Deep G-CRF (QO) combined with Deeplab-v2 | CentraleSupelec Deep G-CRF | CentraleSupelec / INRIA | Siddhartha Chandra & Iasonas Kokkinos | We employ the deep Gaussian CRF Quadratic Optimization formulation to learn pairwise terms for semantic segmentation using the Deeplab-v2-resnet-101 network. Additionally, we use dense-CRF post-processing to refine object boundaries. This work is an accepted paper at ECCV 2016 and will be presented at the conference. Please refer to our arXiv report: http://arxiv.org/abs/1603.08358. We will update the report with more details soon. | 2016-08-12 11:21:28
"Super-Human" boundaries combined with DeeplabCentraleSuperBoundaries++CentraleSupelec / INRIAIasonas KokkinosWe exploit our "super-human" boundary detector with a multi-resolution variant of the Deeplab system (LargeFOV, pre-trained on MSCOCO). The boundary information comes in the form of Normalized Cut eigenvectors used in DenseCRF inference and boundary-dependent pairwise terms, used in Graph-Cut inference. This is an updated version of our earlier submission, using more training rounds and a single-shot training algorithm. Details on the system and our "super human" boundary detector are provided in http://arxiv.org/abs/1511.073862016-01-13 16:00:02
DP_ResNet_CRF | DP_ResNet_CRF | (1) Beijing University of Posts and Telecommunications (BUPT); (2) Beijing Moshanghua Tech (DressPlus) | Lu Yang (1,2), Qing Song (1), Bin Liu (2), Yuhang He (2), Zuoxin Li (2), Xiongwei Xia (2) | Our network is based on ResNet-152; dilated convolution, data augmentation, pre-training on COCO, and multi-scale testing are used for this submission. We also use DenseCRF as post-processing to refine object boundaries. | 2016-11-10 12:05:10
Deep Layer Cascade (LC) | Deep Layer Cascade (LC) | The Chinese University of Hong Kong | Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang | We propose a novel deep layer cascade (LC) method to improve the accuracy and speed of semantic segmentation. Unlike the conventional model cascade (MC), which is composed of multiple independent models, LC treats a single deep model as a cascade of several sub-models. Earlier sub-models are trained to handle easy and confident regions, and they progressively feed-forward harder regions to the next sub-model for processing. Convolutions are only calculated on these regions to reduce computation. The proposed method possesses several advantages. First, LC classifies most of the easy regions in the shallow stage and makes the deeper stages focus on a few hard regions. Such adaptive, 'difficulty-aware' learning improves segmentation performance. Second, LC accelerates both training and testing of the deep network thanks to early decisions in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models. We evaluate our method on the PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and fast speed. Please refer to the paper "Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade" (https://arxiv.org/abs/1704.01344) for further information. | 2017-04-06 14:46:45
DeepLab-CRF | DeepLab-CRF | (1) UCLA (2) Google (3) TTIC (4) ECP / INRIA | Liang-Chieh Chen (1), George Papandreou (2,3), Iasonas Kokkinos (4), Kevin Murphy (2) and Alan L. Yuille (1) | This work brings together methods from Deep Convolutional Neural Networks (DCNNs) and probabilistic graphical models for the task of semantic image segmentation. We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation, due to the very invariance properties that make DCNNs good for high-level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Efficient computation is achieved by (i) careful network re-purposing and (ii) a novel application of the 'hole' algorithm from the wavelet community, allowing dense computation of neural net responses at 8 frames per second on a modern GPU. See http://arxiv.org/abs/1412.7062 for further information. | 2014-12-23 02:29:44
DeepLab-CRF-Attention | DeepLab-CRF-Attention | (1) UCLA (2) Baidu | Liang-Chieh Chen (1), Yi Yang (2), Jiang Wang (2), Wei Xu (2) and Alan L. Yuille (1) | This work extends DeepLab-CRF-COCO-LargeFOV (pretrained on MS-COCO) by further incorporating (1) multi-scale inputs, (2) extra supervision and (3) an attention model. Further information will be provided in an *updated* version of http://arxiv.org/abs/1511.03339. | 2016-02-03 23:10:45
DeepLab-CRF-Attention-DT | DeepLab-CRF-Attention-DT | (1) UCLA (2) Google | Liang-Chieh Chen (1), Jonathan T. Barron (2), George Papandreou (2), Kevin Murphy (2) and Alan L. Yuille (1) | This work extends DeepLab-CRF-Attention by further incorporating a discriminatively trained Domain Transform. Further information will be provided in an *updated* version of http://arxiv.org/abs/1511.03328. | 2016-02-03 23:13:01
DeepLab-CRF-COCO-LargeFOV | DeepLab-CRF-COCO-LargeFOV | (1) Google (2) UCLA | George Papandreou (1), Liang-Chieh Chen (2), Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF-COCO-Strong, but the network has a larger field-of-view on the image. Further information will be provided in an updated version of http://arxiv.org/abs/1502.02734. | 2015-03-18 04:09:39
DeepLab-CRF-COCO-Strong | DeepLab-CRF-COCO-Strong | (1) Google (2) UCLA | George Papandreou (1), Liang-Chieh Chen (2), Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF, but network training also included the pixel-level semantic segmentation annotations of the MS-COCO (v. 2014) dataset. See http://arxiv.org/abs/1502.02734 for further information. | 2015-02-11 01:44:22
DeepLab-CRF-LargeFOV | DeepLab-CRF-LargeFOV | (1) Google (2) UCLA | George Papandreou (1), Liang-Chieh Chen (2), Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on the MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062. | 2015-03-28 17:22:26
DeepLab-CRF-MSc | DeepLab-CRF-MSc | (1) UCLA (2) Google (3) TTIC (4) ECP / INRIA | Liang-Chieh Chen (1), George Papandreou (2,3), Iasonas Kokkinos (4), Kevin Murphy (2) and Alan L. Yuille (1) | Similar to DeepLab-CRF, except that multiscale features (direct connections from intermediate layers to the classifier) are also exploited. Specifically, we attach to the input image and each of the first four max pooling layers a two-layer MLP (first layer: 128 3x3 convolutional filters; second layer: 128 1x1 convolutional filters) whose score map is concatenated to the VGG final layer score map. The final score map fed into the softmax layer thus consists of 4,096 + 5 * 128 = 4,736 channels. | 2014-12-30 02:52:40
DeepLab-MSc-CRF-LargeFOV | DeepLab-MSc-CRF-LargeFOV | (1) Google (2) UCLA | George Papandreou (1), Liang-Chieh Chen (2), Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-MSc-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on the MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062. | 2015-04-02 06:57:21
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint | DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint | (1) Google (2) UCLA | George Papandreou (1), Liang-Chieh Chen (2), Kevin Murphy (1) and Alan L. Yuille (2) | Similar to the DeepLab-CRF model, but with feature extraction at multiple network levels and a large field of view. We jointly train DeepLab on PASCAL VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using pixel-level annotation in both datasets. Further information will be provided in updated versions of http://arxiv.org/abs/1412.7062 and http://arxiv.org/abs/1502.02734. | 2015-04-26 17:48:09
DeepLabv2-CRF | (1) UCLA (2) Google (3) ECP / INRIA | Liang-Chieh Chen (1,2), George Papandreou (2), Iasonas Kokkinos (3), Kevin Murphy (2), Alan L. Yuille (1) | DeepLabv2-CRF is based on three main methods. First, we employ convolution with upsampled filters, or 'atrous convolution', as a powerful tool to repurpose ResNet-101 (trained on the image classification task) for dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within DCNNs. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and fully connected Conditional Random Fields (CRFs). See http://arxiv.org/abs/1606.00915 for further information. | Submitted 2016-06-06 01:59:20
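The atrous convolution idea described in the DeepLabv2 entry (spacing kernel taps apart to enlarge the field of view without adding parameters) can be sketched in 1-D with plain numpy; this is a minimal illustration, not the DeepLab implementation:

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1-D correlation with a dilated (atrous) kernel: the taps of w are
    placed `rate` samples apart, so a 3-tap kernel at rate 2 spans 5
    samples while still having only 3 parameters."""
    k = len(w)
    span = (k - 1) * rate + 1               # effective receptive field
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
dense = atrous_conv1d(x, w, rate=1)   # ordinary convolution: sums x[i..i+2]
atrous = atrous_conv1d(x, w, rate=2)  # sums x[i], x[i+2], x[i+4]
```

The rate-2 output is shorter because the same 3-tap kernel now looks 5 samples wide; in the 2-D network this widening is what lets the filters see more context at no extra parameter cost.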
DeepLabv3 | Google Inc. | Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam | In this work, we revisit atrous convolution, a powerful tool to explicitly adjust a filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks. We propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context, and further boost performance. See http://arxiv.org/abs/1706.05587 for further information. | Submitted 2017-06-20 01:59:26
DeepSqueeNet | Sun Yat-sen University (SYSU) | HongPeng Wu, Long Chen, Kai Huang | We propose a method for semantic image segmentation. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) smaller DNNs require less communication across servers during distributed training; (2) smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car; (3) smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture for semantic image segmentation called DeepSqueeNet, based on SqueezeNet and VGG16. DeepSqueeNet matches the semantic segmentation accuracy of DeepLab (based on VGG16) with 10x fewer parameters. | Submitted 2016-07-20 13:16:16
DeepSqueeNet_CRF | Sun Yat-sen University (SYSU) | HongPeng Wu, Long Chen, Kai Huang | We propose a method for semantic image segmentation. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) smaller DNNs require less communication across servers during distributed training; (2) smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car; (3) smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture for semantic image segmentation called DeepSqueeNet, based on SqueezeNet and VGG16. DeepSqueeNet matches the semantic segmentation accuracy of DeepLab (based on VGG16) with 10x fewer parameters. We additionally add a CRF. | Submitted 2016-07-21 12:47:19
Dual Multi-Scale Manifold Ranking Network (Dual-Multi-Reso-MR) | Wuhan University | Mi Zhang, Ye Lv, Min Luo, Jiasi Yi | We propose a multi-scale network that uses dilated and non-dilated convolutional networks as a dual pair. In both networks, a manifold-ranking optimization method is embedded and optimized jointly in a single stream, i.e. there is no need to train the unary and pairwise networks separately. This feedforward network can be trained in an end-to-end fashion and guarantees a global optimum. | Submitted 2016-11-03 12:27:49
Fully convolutional net (FCN-8s) | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. | Submitted 2014-11-12 09:08:39
Fully convolutional net (FCN-8s-heavy) | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. The network is learned online with high momentum for better optimization. | Submitted 2016-02-06 09:57:31
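The skip-layer fusion the FCN-8s entries describe (coarse stride-32 scores upsampled and summed with finer stride-16 and stride-8 scores) can be sketched in numpy; nearest-neighbor upsampling stands in for the learned in-network upsampling layers, and all shapes are illustrative toy values:

```python
import numpy as np

def upsample2x(score):
    """2x nearest-neighbor upsampling; in FCN the upsampling is a learned
    (bilinearly initialized) deconvolution layer instead."""
    return score.repeat(2, axis=-2).repeat(2, axis=-1)

# One class channel at three strides of a toy 64x64 input.
s32 = np.ones((1, 2, 2))   # coarsest score map (stride 32)
s16 = np.ones((1, 4, 4))   # skip from an intermediate layer (stride 16)
s8  = np.ones((1, 8, 8))   # skip from a shallower layer (stride 8)

# FCN-8s topology: upsample, add the skip scores, repeat.
fused = upsample2x(upsample2x(s32) + s16) + s8   # stride-8 prediction
assert fused.shape == (1, 8, 8)
```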
FCN with Cross-layer Concat and Multi-scale Pred (FCN_CLC_MSP) | National Tsing Hua University, Taiwan | Tun-Huai Shih, Chiou-Ting Hsu | We replace the original fc layers in VGG-16 with several conv and pool layers to extract hierarchical features (pool3-5 and additional pool6-8). We then use pool3-8 to generate multi-scale predictions and aggregate them to derive the dense prediction result. To jointly exploit information from lower- and higher-level layers when making predictions, we adopt cross-layer concatenation to combine pool-x features (lower-level) with the prediction result of the coarser stream (higher-level). This makes the predictions of the finer streams more robust. We do not adopt any pre- or post-processing steps. The number of parameters is about 36M, while the original FCN has 134M. We train all prediction streams at the same time using the VOC additional annotated images (10,582 in total), and it takes less than one day to train our FCN model on a single GTX Titan X GPU. | Submitted 2016-07-01 04:27:14
Weakly sup. segmentation by region scores' pooling (FER_WSSS_REGION_SCORE_POOL) | University of Zagreb | Josip Krapac, Sinisa Segvic | We address the problem of semantic segmentation of objects in a weakly supervised setting, when only image-wide labels are available. We describe an image with a set of pre-trained convolutional features (from layer conv5.4 of the 19-layer VGG-E network) and embed this set into a Fisher vector (64-component GMM, diagonal covariance for components, normalization only with the inverse of the Fisher matrix). We learn a linear classifier (logistic regression), apply the learned classifier to the set of all image regions (efficiently, using integral images), and propagate region scores back to the pixels. Compared to the alternatives, the proposed method is simple and fast in inference, and especially in training. The details are described in the conference paper Krapac, Segvic: "Weakly-supervised semantic segmentation by redistributing region scores back to the pixels", GCPR 2016. | Submitted 2016-06-14 15:02:23
HikSeg_COCO | Hikvision Research Institute | Haiming Sun, Di Xie, Shiliang Pu | We begin with DilatedNet and add a module in which multi-scale features are combined step-wise. The network learns to assign different weights to features of different scales. This submission was first trained on the COCO training and validation sets, then fine-tuned on the PASCAL training set. | Submitted 2016-10-02 09:16:41
Hypercolumn | UC Berkeley | Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik | Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as the feature representation. However, the information in this layer may be too coarse to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation, where we improve the state of the art from 49.7 mean APr to 60.0; keypoint localization, where we get a 3.3 point boost; and part labeling, where we show a 6.6 point gain over a strong baseline. | Submitted 2015-04-09 02:01:36
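The hypercolumn construction, upsampling several layers' activations to a common resolution and stacking them per pixel, can be sketched as follows (nearest-neighbor upsampling and toy shapes for illustration, not the authors' implementation):

```python
import numpy as np

def hypercolumn(feature_maps, out_hw):
    """Stack the activations of several layers above each pixel: every map
    (C, h, w) is upsampled to a common (H, W) and the channels are
    concatenated, giving one long descriptor per pixel."""
    H, W = out_hw
    cols = []
    for f in feature_maps:                       # h and w must divide H and W here
        ry, rx = H // f.shape[1], W // f.shape[2]
        cols.append(f.repeat(ry, axis=1).repeat(rx, axis=2))
    return np.concatenate(cols, axis=0)          # (sum of C, H, W)

# Three toy layers at decreasing resolution, as in a typical CNN.
maps = [np.zeros((64, 8, 8)), np.zeros((128, 4, 4)), np.zeros((256, 2, 2))]
hc = hypercolumn(maps, (8, 8))
assert hc.shape == (64 + 128 + 256, 8, 8)   # a 448-D descriptor per pixel
```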
Laplacian reconstruction and refinement (LRR_4x_COCO) | University of California, Irvine | Golnaz Ghiasi, Charless C. Fowlkes | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher-resolution feature maps to successively refine segment boundaries reconstructed from lower-resolution maps. The model used for this submission is based on VGG-16 and was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264). | Submitted 2016-06-16 06:19:08
Laplacian reconstruction and refinement (LRR_4x_ResNet_COCO) | University of California, Irvine | Golnaz Ghiasi, Charless C. Fowlkes | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher-resolution feature maps to successively refine segment boundaries reconstructed from lower-resolution maps. The model used for this submission is based on ResNet-101 and was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264). | Submitted 2016-07-18 19:07:32
Laplacian reconstruction and refinement (LRR_4x_de_pyramid_VOC) | University of California, Irvine | Charless C. Fowlkes, Golnaz Ghiasi | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher-resolution feature maps to successively refine segment boundaries reconstructed from lower-resolution maps. The model used for this submission was trained on augmented PASCAL VOC. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation. | Submitted 2016-06-07 03:55:11
ICCV submission, paper ID 3205 (Ladder_DenseNet) | anonymous | anonymous | ICCV submission, paper ID 3205: Ladder-style DenseNets for Semantic Segmentation of Large Natural Images. | Submitted 2017-05-26 13:58:49
Large_Kernel_Matters | Tsinghua University | Peng Chao, Yu Gang, Zhang Xiangyu | We use large kernels to generate the feature map and score map; ResNet-101 is applied, trained with the COCO and SBD datasets. No CRF or similar post-processing methods are employed, and no multi-scale testing is used. | Submitted 2017-03-16 01:58:16
Deep Gaussian CRF (MERL_DEEP_GCRF) | Mitsubishi Electric Research Laboratories | Raviteja Vemulapalli, Oncel Tuzel | We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction. | Submitted 2015-10-17 14:55:31
Gaussian CRF on top of DeepLab CNN (MERL_UMD_Deep_GCRF_COCO) | University of Maryland, College Park | Raviteja Vemulapalli (UMD), Oncel Tuzel (MERL), Ming-Yu Liu (MERL), Rama Chellappa (UMD) | We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction. The entire model is trained end-to-end. | Submitted 2016-01-15 05:23:48
Box-Supervision (MSRA_BoxSup) | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | BoxSup makes use of bounding box annotations to supervise convolutional networks for semantic segmentation. From these boxes, we estimate segmentation masks with the help of region proposals. These masks are used to update the convolutional network, which is in turn fed back to mask estimation. This procedure is iterated. This result is achieved by semi-supervised training on the segmentation masks from PASCAL VOC and a large amount of bounding boxes from Microsoft COCO. See http://arxiv.org/abs/1503.01640 for details. | Submitted 2015-02-10 09:35:40
MSRA_BoxSup | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | This is an implementation of "BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation". We train a BoxSup model using the union set of VOC 2007 boxes, COCO boxes, and the augmented VOC 2012 training set. See http://arxiv.org/abs/1503.01640 for details. | Submitted 2015-05-18 09:42:54
Convolutional Feature Masking (MSRA_CFM) | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | The method exploits shape information by "masking" convolutional features. The proposal segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. The proposed method demonstrates competitive accuracy and compelling computational speed. We achieve this result by utilizing segment proposals generated by Multi-scale Combinatorial Grouping (MCG) and initializing network parameters from the VGG 16-layer net. See http://arxiv.org/abs/1412.1283 for details. | Submitted 2014-12-17 02:56:52
Multipath-RefineNet | The University of Adelaide; ACRV | Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid | Please refer to our technical report for details: "RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation" (https://arxiv.org/abs/1611.06612). Our source code is available at https://github.com/guosheng/refinenet. | Submitted 2017-01-17 18:03:57
Unified Object Detection and Semantic Segmentation (NUS_UDS) | NUS | Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille | Motivated by the complementary effect observed from the typical failure cases of object detection and semantic segmentation, we propose a unified framework for joint object detection and semantic segmentation [1]. By enforcing consistency between the final detection and segmentation results, our unified framework can effectively leverage the advantages of leading techniques for these two tasks. Furthermore, both local and global context information are integrated into the framework to better distinguish ambiguous samples. By jointly optimizing the model parameters for all the components, the relative importance of the different components is automatically learned for each category to guarantee the overall performance. [1] Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. ECCV 2014. | Submitted 2014-10-29 16:07:10
A joint network for guiding and masking (OBP-HJLCN) | National Central University | Jia-Ching Wang, Chien-Yao Wang, Jyun-Hong Li | We propose a hierarchical joint guided network with the ability to predict objects more completely and more finely. We also propose a novel way to guide segmentation by object and boundary cues. | Submitted 2016-09-13 15:21:45
Oxford_TVG_CRF_RNN_COCO | (1) University of Oxford (2) Baidu IDL | Shuai Zheng (1), Sadeep Jayasumana (1), Bernardino Romera-Paredes (1), Chang Huang (2), Philip Torr (1) | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, Berkeley augmented data, and a subset of COCO 2014 train data. More details will be available in the paper http://arxiv.org/abs/1502.03240. | Submitted 2015-04-22 11:26:57
Oxford_TVG_CRF_RNN_VOC | (1) University of Oxford (2) Baidu IDL | Shuai Zheng (1), Sadeep Jayasumana (1), Bernardino Romera-Paredes (1), Chang Huang (2), Philip Torr (1) | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data and Berkeley augmented data (the COCO dataset was not used). More details will be available in the paper http://arxiv.org/abs/1502.03240. | Submitted 2015-04-22 10:24:43
Higher Order CRF in CNN (Oxford_TVG_HO_CRF) | University of Oxford | Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip Torr | We integrate a conditional random field with higher-order potentials into a deep neural network. Our higher-order potentials are based on object detector outputs and superpixel oversegmentation, and are formulated such that their corresponding mean-field updates are differentiable. For further details, please refer to http://arxiv.org/abs/1511.08119. | Submitted 2016-03-16 21:12:47
POSTECH_DeconvNet_CRF_VOC | POSTECH (Pohang University of Science and Technology) | Hyeonwoo Noh, Seunghoon Hong, Bohyung Han | We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects at multiple scales more naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset without external training data. See http://arxiv.org/abs/1505.04366 for details. | Submitted 2015-08-18 18:42:18
POSTECH_EDeconvNet_CRF_VOC | POSTECH (Pohang University of Science and Technology) | Hyeonwoo Noh, Seunghoon Hong, Bohyung Han | We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects at multiple scales more naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset without external training data. | Submitted 2015-04-22 21:33:03
PSPNet | CUHK, SenseTime | Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia | Scene parsing is challenging due to its unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module, together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective at producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets: it came first in the ImageNet Scene Parsing Challenge 2016 and leads the PASCAL VOC 2012 and Cityscapes benchmarks. A single PSPNet yields a new record mIoU of 85.4% on PASCAL VOC 2012 and 80.2% on Cityscapes. https://arxiv.org/abs/1612.01105 | Submitted 2016-12-06 02:22:13
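The pyramid pooling module at the heart of PSPNet averages features over grids of several sizes, capturing context from a global prior down to sub-regions. A minimal numpy sketch (the bin sizes follow the paper's 1x1, 2x2, 3x3, 6x6 configuration; everything else is illustrative):

```python
import numpy as np

def pyramid_pool(f, bins=(1, 2, 3, 6)):
    """Average-pool a (C, H, W) feature map into n x n grids of several
    sizes; n == 1 gives the global prior, larger n keeps coarser layout."""
    C, H, W = f.shape
    pooled = []
    for n in bins:
        g = np.zeros((C, n, n))
        ys = np.linspace(0, H, n + 1).astype(int)   # row bin edges
        xs = np.linspace(0, W, n + 1).astype(int)   # column bin edges
        for i in range(n):
            for j in range(n):
                g[:, i, j] = f[:, ys[i]:ys[i+1], xs[j]:xs[j+1]].mean(axis=(1, 2))
        pooled.append(g)
    return pooled

f = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)
out = pyramid_pool(f)
assert [p.shape for p in out] == [(2, 1, 1), (2, 2, 2), (2, 3, 3), (2, 6, 6)]
```

In the full network each pooled map is reduced in channels, upsampled back to (H, W), and concatenated with the original features before the final prediction.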
Residual Forest classifier with FCN features (RRF-4s) | Monash University | Yan Zuo, Tom Drummond | We replace the solver component of FCN with a Random Residual Forest (RRF) classifier and treat FCN as a generic feature extractor to train the RRF classifier. | Submitted 2016-11-30 23:31:43
ResNet-38 with COCO (ResNet-38_COCO) | The University of Adelaide | Zifeng Wu, Chunhua Shen, Anton van den Hengel | Pre-trained with COCO and tested with multiple scales. See our report https://arxiv.org/abs/1611.10080 for details. | Submitted 2017-01-22 04:44:14
ResNet-38 Multi-scale (ResNet-38_MS) | The University of Adelaide | Zifeng Wu, Chunhua Shen, Anton van den Hengel | Single model; multi-scale testing; NO COCO; NO CRF-based post-processing. For more details, refer to our report https://arxiv.org/abs/1611.10080 and code https://github.com/itijyou/ademxapp. | Submitted 2016-12-09 12:19:24
SDS | UC Berkeley | Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik | We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over the state of the art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work. | Submitted 2014-07-21 22:46:22
SYSU_SceneParsing_COCO | Sun Yat-sen University | Guangrun Wang, Liang Lin | Scene parsing with a CNN; the CNN is ResNet-101. | Submitted 2017-02-22 06:34:43
SegModel | Peking University | Falong Shen | Deep fully convolutional networks with a conditional random field. Trained on the MS-COCO trainval set and the PASCAL VOC 2012 train set. | Submitted 2016-08-23 04:04:21
SegNet | University of Cambridge | Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla | SegNet is a memory-efficient, real-time deep convolutional encoder-decoder architecture. For more information, please see our publications and web demo at http://mi.eng.cam.ac.uk/projects/segnet/. | Submitted 2015-11-10 09:48:12
Diverse M-Best with discriminative reranking (TTIC-divmbest-rerank) | (1) Toyota Technological Institute at Chicago, (2) Virginia Tech | Payman Yadollahpour (1), Dhruv Batra (1,2), Greg Shakhnarovich (1) | We generate a set of M=10 full image segmentations using the Diverse M-Best algorithm from [BYGS'12], applied to inference in the O2P model (Carreira et al., 2012). Then we discriminatively train a reranker based on a novel set of features. The learning of the reranker uses relative loss, with the objective of minimizing the gap to the oracle (the hindsight-best of the M segmentations), and relies on a slack-rescaling structural SVM. The details are described in [YBS'13]. References: [BYGS'12] Batra, Yadollahpour, Guzman, Shakhnarovich, ECCV 2012. [YBS'13] Yadollahpour, Batra, Shakhnarovich, CVPR 2013. | Submitted 2012-11-15 04:03:01
Feedforward segmentation with zoom-out features (TTI_zoomout) | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Our method uses a feedforward network to directly label superpixels. For each superpixel we use features extracted from a nested set of "zoom-out" regions, from purely local to image-level. | Submitted 2014-11-17 04:57:49
Feedforward segmentation with zoom-out features (TTI_zoomout_16) | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Same as before, except using the VGG 16-layer network instead of the VGG CNN-S network. Fine-tuning on VOC 2012 was not performed. See http://arxiv.org/abs/1412.0774 for details. | Submitted 2014-11-24 08:54:05
Feedforward semantic segmentation with zoom-out features (TTI_zoomout_v2) | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Similar to TTI_zoomout_16, except for the way we set the number and scope of zoom-out levels. In this version, zoom-out levels correspond to the receptive field sizes of different layers in a convolutional neural network. Our model is trained only on VOC 2012. Details are provided in our CVPR 2015 paper, available at http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf. | Submitted 2015-03-30 18:40:04
TuSimple | UC San Diego, CMU, UIUC, TuSimple | Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell | We use ResNet-152, COCO data, and DenseCRF. | Submitted 2016-11-09 01:10:04
Global Deconvolutional Network with CRF (UNIST_GDN_CRF) | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over the baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | Submitted 2016-07-29 07:23:03
Global Deconvolutional Network with CRF (UNIST_GDN_CRF_ENS) | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over the baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | Submitted 2016-07-29 07:25:56
Global Deconvolutional Network (UNIST_GDN_FCN) | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over the baseline FCN-32s. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | Submitted 2016-07-27 01:39:17
Global Deconvolutional Network (UNIST_GDN_FCN_FC) | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Besides that, we append a fully-connected layer after the down-sampled image to refine the current predictions. Our model shows superior performance over the baseline FCN-32s and even outperforms the more powerful multi-scale variant. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | Submitted 2016-07-27 01:49:02
Fully convolutional neural net using VGG19 (VGG19_FCN) | - | Sharif Amit Kamran, Md. Asif Bin Khaled, Sabit Bin Kabir, Dr. Hasan Muhammad, Moin Mostakim | We use the VGG-19 classification neural net and make it fully convolutional. Moreover, we use skip architectures by concatenating upsampled pool1 to pool4 with the score layer to get finer features. Training was done in two stages: first on the PASCAL VOC training dataset, then on both the SBD training and validation datasets. | Submitted 2017-04-06 23:22:53
CNN segmentation based on manifold learning (Weak_manifold_CNN) | University of Central Florida | Marzieh Edraki | CNN manifold learning for segmentation. | Submitted 2016-11-11 23:34:20