Segmentation Results: VOC2012 BETA

Competition "comp6" (train on own data)

This leaderboard shows only those submissions that have been marked as public, so the displayed rankings should not be considered definitive. Entries equivalent to a selected submission are determined by bootstrapping the performance measure and assessing whether the differences between the selected submission and each other entry are statistically significant; entries whose differences are not significant are treated as equivalent (see Sec. 3.5 of the VOC 2014 paper).
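The bootstrap comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the VOC evaluation code. It assumes that the measure is the class-averaged intersection-over-union TP / (TP + FP + FN), computed from pixel counts summed over all images; that per-image counts are available for both submissions on the same test images; and that significance is judged by whether a 95% percentile interval on the paired difference excludes zero. The names `mean_iou` and `paired_bootstrap` and the count layout are hypothetical. The exact procedure is the one given in Sec. 3.5 of the VOC 2014 paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical layout: for each image, one (TP, FP, FN) pixel-count tuple per class.
ImageStats = List[Tuple[int, int, int]]


def mean_iou(stats: Sequence[ImageStats], image_ids: Sequence[int], num_classes: int) -> float:
    """Mean over classes of TP / (TP + FP + FN), with counts summed over image_ids.

    Counts are accumulated over the whole (re)sampled image set before dividing,
    rather than averaging per-image IoUs. Classes with no pixels at all in the
    sample (TP + FP + FN == 0) are skipped.
    """
    tp = [0] * num_classes
    err = [0] * num_classes  # FP + FN per class
    for i in image_ids:
        for c, (t, fp, fn) in enumerate(stats[i]):
            tp[c] += t
            err[c] += fp + fn
    ious = [tp[c] / (tp[c] + err[c]) for c in range(num_classes) if tp[c] + err[c] > 0]
    return sum(ious) / len(ious)


def paired_bootstrap(stats_a, stats_b, num_classes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for score(A) - score(B) under paired image resampling.

    In each round both submissions are scored on the same resampled image list,
    so the comparison is paired. Returns (low, high, significant), where
    significant is True when the interval excludes zero.
    """
    if len(stats_a) != len(stats_b):
        raise ValueError("both submissions must be scored on the same images")
    rng = random.Random(seed)
    n = len(stats_a)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n images with replacement
        diffs.append(mean_iou(stats_a, sample, num_classes) - mean_iou(stats_b, sample, num_classes))
    diffs.sort()
    k = int(n_boot * alpha / 2)  # number of samples trimmed from each tail
    low, high = diffs[k], diffs[n_boot - 1 - k]
    return low, high, (low > 0 or high < 0)
```

Under these assumptions, an entry whose interval for the difference from the selected submission contains zero is one that would be listed as equivalent to it. Pairing the resamples matters: both submissions see the same images in each round, so per-image difficulty largely cancels out of the difference.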

Average Precision (AP %)

Submission | mean | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | dining table | dog | horse | motorbike | person | potted plant | sheep | sofa | train | tv/monitor | submission date
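** SegNeXt ** | 90.6 | 98.3 | 85.0 | 97.6 | 88.3 | 91.3 | 97.5 | 91.4 | 98.3 | 60.4 | 96.7 | 85.0 | 95.7 | 98.2 | 94.2 | 92.7 | 82.5 | 97.3 | 77.7 | 93.1 | 84.3 | 19-Sep-2022
** EfficientNet-L2 + NAS-FPN + Noisy Student ** | 90.5 | 98.0 | 84.8 | 89.6 | 88.2 | 91.0 | 98.3 | 93.0 | 98.5 | 57.5 | 98.4 | 81.8 | 98.4 | 98.0 | 95.8 | 93.2 | 83.2 | 97.8 | 75.0 | 91.8 | 90.0 | 15-Jun-2020
DeepLabv3+_JFT | 89.0 | 97.5 | 77.9 | 96.2 | 80.4 | 90.8 | 98.3 | 95.5 | 97.6 | 58.8 | 96.1 | 79.2 | 95.0 | 97.3 | 94.1 | 93.8 | 78.5 | 95.5 | 74.4 | 93.8 | 81.6 | 09-Feb-2018
** RecoNet152_coco ** | 89.0 | 97.3 | 80.4 | 96.5 | 83.8 | 89.5 | 97.6 | 95.4 | 97.7 | 50.1 | 96.8 | 82.6 | 95.1 | 97.7 | 95.1 | 92.6 | 80.2 | 95.2 | 71.7 | 92.1 | 83.8 | 26-Oct-2019
SRC-B-MachineLearningLab | 88.5 | 97.2 | 78.6 | 97.1 | 80.6 | 89.7 | 97.4 | 93.7 | 96.7 | 59.1 | 95.4 | 81.1 | 93.2 | 97.5 | 94.2 | 92.9 | 73.5 | 93.3 | 74.2 | 91.0 | 85.0 | 19-Apr-2018
DeepLabv3+_AASPP | 88.5 | 97.4 | 80.3 | 97.1 | 80.1 | 89.3 | 97.4 | 94.1 | 96.9 | 61.9 | 95.1 | 77.2 | 94.2 | 97.5 | 94.4 | 93.0 | 72.4 | 93.8 | 72.6 | 93.3 | 83.3 | 22-May-2018
** SepaNet ** | 88.3 | 97.2 | 80.2 | 96.2 | 80.0 | 89.2 | 97.3 | 94.7 | 97.7 | 48.6 | 95.0 | 81.6 | 95.2 | 97.5 | 95.1 | 92.7 | 79.5 | 95.4 | 68.8 | 90.9 | 83.4 | 25-Oct-2019
** EMANet152 ** | 88.2 | 96.8 | 79.4 | 96.0 | 83.6 | 88.1 | 97.1 | 95.0 | 96.6 | 49.4 | 95.4 | 77.8 | 94.8 | 96.8 | 95.1 | 92.0 | 79.3 | 95.9 | 68.5 | 91.7 | 85.6 | 15-Aug-2019
** SpDConv2 ** | 88.1 | 96.9 | 79.7 | 96.8 | 80.2 | 87.8 | 98.0 | 92.3 | 96.0 | 57.2 | 95.8 | 82.1 | 92.3 | 97.3 | 93.6 | 93.0 | 71.4 | 92.3 | 75.8 | 90.7 | 83.8 | 06-Jan-2021
** KSAC-H ** | 88.1 | 97.2 | 79.9 | 96.3 | 76.5 | 86.5 | 97.5 | 94.5 | 96.9 | 54.8 | 95.3 | 81.4 | 93.7 | 97.2 | 94.0 | 92.8 | 77.3 | 94.4 | 73.5 | 91.1 | 83.4 | 26-Oct-2019
MSCI | 88.0 | 96.8 | 76.8 | 97.0 | 80.6 | 89.3 | 97.4 | 93.8 | 97.1 | 56.7 | 94.3 | 78.3 | 93.5 | 97.1 | 94.0 | 92.8 | 72.3 | 92.6 | 73.6 | 90.8 | 85.4 | 08-Jul-2018
** A new feature fusion method: FillIn ** | 88.0 | 97.1 | 80.8 | 96.7 | 77.6 | 89.2 | 97.4 | 92.2 | 96.9 | 58.3 | 94.3 | 79.4 | 93.1 | 97.3 | 94.4 | 93.2 | 73.6 | 93.0 | 72.6 | 89.7 | 83.4 | 25-May-2020
ExFuse | 87.9 | 96.8 | 80.3 | 97.0 | 82.5 | 87.8 | 96.3 | 92.6 | 96.4 | 53.3 | 94.3 | 78.4 | 94.1 | 94.9 | 91.6 | 92.3 | 81.7 | 94.8 | 70.3 | 90.1 | 83.8 | 22-May-2018
DeepLabv3+ | 87.8 | 97.0 | 77.1 | 97.1 | 79.3 | 89.3 | 97.4 | 93.2 | 96.6 | 56.9 | 95.0 | 79.2 | 93.1 | 97.0 | 94.0 | 92.8 | 71.3 | 92.9 | 72.4 | 91.0 | 84.9 | 09-Feb-2018
** CaCNet ** | 87.5 | 97.1 | 80.3 | 96.1 | 79.7 | 86.7 | 97.2 | 93.8 | 96.4 | 45.5 | 95.0 | 82.1 | 92.7 | 97.0 | 94.6 | 91.8 | 78.2 | 95.4 | 65.7 | 92.3 | 82.2 | 29-May-2020
** CFNet ** | 87.2 | 96.7 | 79.7 | 94.3 | 78.4 | 83.0 | 97.7 | 91.6 | 96.7 | 50.1 | 95.3 | 79.6 | 93.6 | 97.2 | 94.2 | 91.7 | 78.4 | 95.4 | 69.6 | 90.0 | 81.4 | 12-Jun-2019
DeepLabv3-JFT | 86.9 | 96.9 | 73.2 | 95.5 | 78.4 | 86.5 | 96.8 | 90.3 | 97.1 | 51.4 | 95.0 | 73.4 | 94.0 | 96.8 | 94.0 | 92.3 | 81.5 | 95.4 | 67.2 | 90.8 | 81.8 | 05-Aug-2017
DIS | 86.8 | 94.0 | 73.3 | 93.5 | 79.1 | 84.8 | 95.4 | 89.5 | 93.4 | 53.6 | 94.8 | 79.0 | 93.6 | 95.2 | 91.5 | 89.6 | 78.1 | 93.0 | 79.4 | 94.3 | 81.3 | 13-Sep-2017
** Gluon DeepLabV3 152 ** | 86.7 | 96.5 | 74.3 | 96.1 | 80.2 | 85.2 | 97.0 | 93.8 | 96.4 | 49.7 | 93.6 | 77.6 | 95.1 | 95.3 | 93.9 | 89.6 | 75.8 | 94.4 | 70.8 | 89.7 | 78.7 | 03-Oct-2018
CASIA_IVA_SDN | 86.6 | 96.9 | 78.6 | 96.0 | 79.6 | 84.1 | 97.1 | 91.9 | 96.6 | 48.5 | 94.3 | 78.9 | 93.6 | 95.5 | 92.1 | 91.1 | 75.0 | 93.8 | 64.8 | 89.0 | 84.6 | 29-Jul-2017
** APDN ** | 86.4 | 94.5 | 65.4 | 94.2 | 82.7 | 88.1 | 95.7 | 91.7 | 95.7 | 45.5 | 94.3 | 82.8 | 93.8 | 94.8 | 92.4 | 91.7 | 73.7 | 93.4 | 72.8 | 91.9 | 82.4 | 28-May-2019
IDW-CNN | 86.3 | 94.8 | 67.3 | 93.4 | 74.8 | 84.6 | 95.3 | 89.6 | 93.6 | 54.1 | 94.9 | 79.0 | 93.3 | 95.5 | 91.7 | 89.2 | 77.5 | 93.7 | 79.2 | 94.0 | 80.8 | 30-Jun-2017
DFN | 86.2 | 96.4 | 78.6 | 95.5 | 79.1 | 86.4 | 97.1 | 91.4 | 95.0 | 47.7 | 92.9 | 77.2 | 91.0 | 96.7 | 92.2 | 91.7 | 76.5 | 93.1 | 64.4 | 88.3 | 81.2 | 15-Jan-2018
** GluonCV DeepLabV3 ** | 86.2 | 96.3 | 69.7 | 93.5 | 76.2 | 86.5 | 96.5 | 92.2 | 95.8 | 47.8 | 95.0 | 81.6 | 93.0 | 96.0 | 91.2 | 90.7 | 77.1 | 94.7 | 68.9 | 89.3 | 81.7 | 07-Sep-2018
EncNet | 85.9 | 95.3 | 76.9 | 94.2 | 80.2 | 85.3 | 96.5 | 90.8 | 96.3 | 47.9 | 93.9 | 80.0 | 92.4 | 96.6 | 90.5 | 91.5 | 70.9 | 93.6 | 66.5 | 87.7 | 80.8 | 15-Mar-2018
** HamNet_w/o_COCO ** | 85.9 | 96.8 | 74.6 | 96.5 | 75.3 | 79.6 | 97.4 | 93.4 | 97.3 | 42.5 | 94.0 | 76.1 | 95.3 | 96.3 | 91.0 | 91.0 | 78.4 | 93.2 | 68.7 | 90.0 | 80.7 | 25-Jan-2021
HPN | 85.8 | 94.1 | 67.0 | 95.2 | 81.9 | 88.3 | 95.5 | 90.4 | 95.9 | 40.0 | 92.7 | 82.5 | 91.7 | 95.3 | 92.6 | 91.6 | 73.6 | 94.1 | 69.4 | 91.1 | 81.9 | 13-Dec-2017
DeepLabv3 | 85.7 | 96.4 | 76.6 | 92.7 | 77.8 | 87.6 | 96.7 | 90.2 | 95.4 | 47.5 | 93.4 | 76.3 | 91.4 | 97.2 | 91.0 | 92.1 | 71.3 | 90.9 | 68.9 | 90.8 | 79.3 | 20-Jun-2017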
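** XC-FLATTENET ** | 85.7 | 96.5 | 79.2 | 95.5 | 75.3 | 84.3 | 95.9 | 91.3 | 93.9 | 45.1 | 95.9 | 79.2 | 88.8 | 96.7 | 91.6 | 91.1 | 75.7 | 94.0 | 62.8 | 87.7 | 82.6 | 17-Jan-2020
** Auto-DeepLab-L ** | 85.6 | 96.5 | 77.3 | 94.8 | 74.1 | 84.0 | 97.1 | 88.7 | 94.5 | 53.5 | 91.6 | 79.2 | 88.4 | 94.2 | 90.2 | 91.2 | 75.1 | 90.1 | 70.7 | 89.1 | 79.7 | 11-Jan-2019
** DP-CAN_decoder ** | 85.5 | 95.9 | 77.8 | 91.6 | 75.0 | 81.7 | 96.6 | 92.4 | 97.1 | 42.7 | 93.5 | 74.1 | 93.9 | 95.0 | 91.4 | 91.2 | 78.1 | 94.6 | 66.5 | 89.8 | 79.1 | 26-Jan-2019
PSPNet | 85.4 | 95.8 | 72.7 | 95.0 | 78.9 | 84.4 | 94.7 | 92.0 | 95.7 | 43.1 | 91.0 | 80.3 | 91.3 | 96.3 | 92.3 | 90.1 | 71.5 | 94.4 | 66.9 | 88.8 | 82.0 | 06-Dec-2016
** Res2Net ** | 85.3 | 96.1 | 77.6 | 96.1 | 77.3 | 84.5 | 96.7 | 92.5 | 95.0 | 40.5 | 91.9 | 78.3 | 92.2 | 93.7 | 92.7 | 89.6 | 77.6 | 93.7 | 63.5 | 87.3 | 78.6 | 22-Feb-2020
** CTNet ** | 85.3 | 96.1 | 75.9 | 96.8 | 78.0 | 82.4 | 95.3 | 92.3 | 96.7 | 42.0 | 93.8 | 71.2 | 93.8 | 95.0 | 90.5 | 90.6 | 77.9 | 95.2 | 62.9 | 89.5 | 78.4 | 29-Oct-2020
** GluonCV PSP ** | 85.1 | 95.7 | 70.9 | 92.8 | 75.6 | 85.0 | 96.5 | 91.7 | 95.0 | 41.8 | 92.3 | 78.8 | 90.4 | 95.6 | 93.4 | 90.6 | 76.1 | 93.5 | 66.7 | 89.5 | 78.4 | 07-Sep-2018
** ResNet-38_COCO ** | 84.9 | 96.2 | 75.2 | 95.4 | 74.4 | 81.7 | 93.7 | 89.9 | 92.5 | 48.2 | 92.0 | 79.9 | 90.1 | 95.5 | 91.8 | 91.2 | 73.0 | 90.5 | 65.4 | 88.7 | 80.6 | 22-Jan-2017
** DP-CAN ** | 84.6 | 96.5 | 77.7 | 87.6 | 73.9 | 79.9 | 96.8 | 92.9 | 95.7 | 40.8 | 92.9 | 74.0 | 91.7 | 95.0 | 92.5 | 89.7 | 77.2 | 94.6 | 64.6 | 90.2 | 77.1 | 25-Jan-2019
** DCANet ** | 84.4 | 96.0 | 44.8 | 95.1 | 75.1 | 85.8 | 97.2 | 91.0 | 95.0 | 47.5 | 94.5 | 75.8 | 93.9 | 96.0 | 92.2 | 89.7 | 74.5 | 95.4 | 66.3 | 91.1 | 79.8 | 13-Jan-2020
** resnet 101 + fast laddernet ** | 84.2 | 95.4 | 73.9 | 94.9 | 75.7 | 83.2 | 96.3 | 91.2 | 93.9 | 35.3 | 90.0 | 79.4 | 90.2 | 94.2 | 92.8 | 90.1 | 73.2 | 92.3 | 64.5 | 88.0 | 77.5 | 29-Oct-2018
Multipath-RefineNet | 84.2 | 95.0 | 73.2 | 93.5 | 78.1 | 84.8 | 95.6 | 89.8 | 94.1 | 43.7 | 92.0 | 77.2 | 90.8 | 93.4 | 88.6 | 88.1 | 70.1 | 92.9 | 64.3 | 87.7 | 78.8 | 17-Jan-2017
FDNet_16s | 84.0 | 95.4 | 77.9 | 95.9 | 69.1 | 80.6 | 96.4 | 92.6 | 95.5 | 40.5 | 92.6 | 70.6 | 93.8 | 93.1 | 90.4 | 89.9 | 71.2 | 92.7 | 63.1 | 88.5 | 77.7 | 22-Mar-2018
PAN | 84.0 | 95.7 | 75.2 | 94.0 | 73.7 | 79.6 | 96.4 | 93.7 | 94.1 | 40.5 | 93.3 | 72.4 | 89.1 | 94.1 | 91.6 | 89.5 | 73.6 | 93.2 | 62.8 | 87.3 | 78.6 | 04-Jul-2018
** GluonCV FCN ** | 83.6 | 94.8 | 59.5 | 94.6 | 71.5 | 81.9 | 95.6 | 91.2 | 93.9 | 42.1 | 91.3 | 77.0 | 91.5 | 93.2 | 91.0 | 90.0 | 74.0 | 92.5 | 68.1 | 88.6 | 77.2 | 07-Sep-2018
** multi-scale feature fusion network ** | 83.6 | 96.0 | 76.2 | 95.4 | 70.7 | 82.1 | 95.0 | 90.4 | 92.7 | 40.2 | 92.5 | 75.7 | 88.6 | 96.1 | 91.0 | 88.4 | 72.2 | 92.7 | 60.7 | 85.3 | 76.8 | 26-Nov-2018
Large_Kernel_Matters | 83.6 | 95.3 | 68.7 | 94.1 | 72.6 | 82.4 | 96.0 | 89.3 | 93.0 | 47.8 | 89.6 | 70.8 | 89.2 | 93.3 | 90.1 | 91.2 | 72.0 | 89.8 | 67.8 | 88.9 | 76.9 | 16-Mar-2017
** LDN-161 ** | 83.6 | 93.4 | 76.6 | 92.7 | 70.9 | 77.6 | 96.7 | 90.2 | 96.3 | 47.8 | 91.2 | 72.6 | 92.8 | 93.0 | 88.7 | 88.1 | 72.6 | 90.9 | 63.5 | 89.4 | 74.4 | 18-Apr-2019
** Xception65_ConcatASPP_Decoder ** | 83.5 | 94.3 | 44.9 | 92.8 | 77.4 | 85.5 | 96.7 | 91.1 | 94.6 | 51.0 | 91.9 | 71.8 | 91.2 | 95.3 | 92.8 | 90.5 | 69.6 | 91.7 | 66.3 | 88.3 | 80.7 | 26-Jul-2019
** DREN ** | 83.5 | 94.7 | 70.6 | 94.1 | 73.6 | 82.5 | 95.4 | 87.7 | 92.3 | 44.2 | 90.2 | 75.1 | 89.7 | 94.5 | 90.4 | 88.9 | 68.3 | 91.3 | 67.6 | 87.9 | 77.1 | 29-Mar-2019
TKCNet | 83.2 | 94.7 | 46.5 | 94.9 | 77.7 | 83.7 | 92.6 | 92.2 | 94.9 | 45.3 | 91.1 | 72.4 | 90.7 | 95.8 | 91.6 | 90.3 | 69.9 | 93.8 | 62.1 | 88.7 | 82.5 | 20-Apr-2018
ResNet-38_MS | 83.1 | 95.2 | 72.5 | 95.1 | 70.8 | 78.5 | 91.7 | 90.0 | 92.4 | 41.9 | 90.8 | 73.9 | 90.6 | 93.8 | 90.5 | 89.5 | 72.6 | 89.8 | 63.2 | 87.8 | 79.1 | 09-Dec-2016
ResNet_DUC_HDC | 83.1 | 92.1 | 64.6 | 94.7 | 71.0 | 81.0 | 94.6 | 89.7 | 94.9 | 45.6 | 93.7 | 74.4 | 92.0 | 95.1 | 90.0 | 88.7 | 69.1 | 90.4 | 62.7 | 86.4 | 78.2 | 01-Mar-2017
** dsanet ** | 83.0 | 93.5 | 66.0 | 95.3 | 77.4 | 82.4 | 95.4 | 91.8 | 95.4 | 36.1 | 92.0 | 74.2 | 92.0 | 93.3 | 90.3 | 88.4 | 73.8 | 92.3 | 57.5 | 87.0 | 73.5 | 23-Nov-2019
Deep Layer Cascade (LC) | 82.7 | 85.5 | 66.7 | 94.5 | 67.2 | 84.0 | 96.1 | 89.8 | 93.5 | 47.2 | 90.4 | 71.5 | 88.9 | 91.7 | 89.2 | 89.1 | 70.4 | 89.4 | 70.7 | 84.2 | 79.6 | 06-Apr-2017
** AAF_PSPNet ** | 82.2 | 91.3 | 72.9 | 90.7 | 68.2 | 77.7 | 95.5 | 90.7 | 94.7 | 40.9 | 89.5 | 72.6 | 91.6 | 94.1 | 88.3 | 88.8 | 67.3 | 92.9 | 62.6 | 85.2 | 74.0 | 21-Aug-2018
SegModel | 81.8 | 93.6 | 60.2 | 93.6 | 69.1 | 76.4 | 96.3 | 88.2 | 95.5 | 37.9 | 90.8 | 73.3 | 91.1 | 94.3 | 88.6 | 88.6 | 64.8 | 90.1 | 63.7 | 87.3 | 78.2 | 23-Aug-2016
** DeepLab_XI ** | 81.6 | 96.2 | 45.0 | 94.9 | 76.3 | 82.1 | 96.1 | 83.2 | 95.0 | 47.9 | 94.1 | 51.2 | 92.7 | 96.4 | 89.3 | 90.9 | 58.9 | 92.4 | 68.2 | 90.1 | 76.9 | 07-May-2019
** xing ** | 81.5 | 95.5 | 42.1 | 94.4 | 75.3 | 77.9 | 96.0 | 92.4 | 94.6 | 42.4 | 94.8 | 59.1 | 92.3 | 95.1 | 88.8 | 88.9 | 68.8 | 94.7 | 56.5 | 88.9 | 77.0 | 10-Jul-2020
HikSeg_COCO | 81.4 | 95.0 | 64.2 | 91.5 | 79.0 | 78.7 | 93.4 | 88.4 | 94.3 | 45.8 | 89.6 | 65.2 | 90.6 | 92.8 | 88.7 | 87.5 | 62.4 | 88.4 | 56.4 | 86.2 | 75.3 | 02-Oct-2016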
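dscnn | 81.2 | 94.0 | 58.5 | 91.3 | 69.2 | 78.2 | 95.5 | 89.8 | 92.9 | 38.5 | 90.3 | 70.2 | 90.8 | 93.5 | 87.0 | 87.4 | 63.4 | 89.5 | 65.1 | 88.9 | 75.8 | 25-May-2018
MSRSegNet-UW | 81.0 | 93.7 | 64.1 | 92.5 | 68.9 | 79.7 | 91.2 | 86.4 | 90.4 | 41.9 | 88.3 | 72.6 | 89.3 | 90.2 | 86.0 | 86.6 | 67.2 | 89.5 | 66.5 | 83.7 | 76.6 | 23-Nov-2017
** MasksegNet ** | 81.0 | 95.3 | 43.9 | 93.4 | 72.9 | 80.5 | 91.1 | 86.1 | 91.9 | 44.2 | 87.7 | 65.8 | 90.9 | 93.2 | 92.4 | 90.2 | 72.0 | 92.0 | 60.6 | 86.3 | 74.4 | 16-May-2019
Feature_Pyramids | 81.0 | 93.9 | 60.2 | 86.8 | 70.7 | 75.3 | 92.9 | 91.3 | 92.0 | 42.7 | 90.0 | 71.3 | 88.7 | 92.9 | 88.8 | 89.3 | 60.7 | 88.3 | 65.7 | 87.7 | 76.2 | 06-Jun-2018
DP_ResNet_CRF | 81.0 | 94.0 | 59.5 | 91.8 | 68.1 | 75.9 | 95.2 | 88.9 | 93.2 | 37.7 | 90.8 | 70.8 | 89.2 | 92.7 | 87.7 | 87.9 | 65.5 | 90.3 | 62.6 | 87.2 | 75.5 | 10-Nov-2016
ResSegNet | 80.4 | 93.6 | 65.2 | 92.4 | 67.0 | 74.9 | 93.9 | 88.5 | 92.8 | 37.4 | 88.8 | 72.7 | 89.1 | 91.9 | 88.7 | 86.6 | 68.6 | 85.9 | 59.1 | 82.0 | 73.3 | 28-May-2018
OBP-HJLCN | 80.4 | 92.7 | 54.8 | 91.6 | 68.0 | 76.9 | 95.7 | 89.3 | 92.6 | 35.2 | 89.0 | 69.3 | 89.4 | 92.7 | 87.9 | 87.5 | 66.8 | 88.5 | 62.2 | 86.1 | 76.2 | 13-Sep-2016
CentraleSupelec Deep G-CRF | 80.2 | 92.9 | 61.2 | 91.0 | 66.3 | 77.7 | 95.3 | 88.9 | 92.4 | 33.8 | 88.4 | 69.1 | 89.8 | 92.9 | 87.7 | 87.5 | 62.6 | 89.9 | 59.2 | 87.1 | 74.2 | 12-Aug-2016
CMT-FCN-ResNet-CRF | 80.0 | 92.5 | 55.3 | 92.2 | 66.0 | 76.9 | 95.1 | 88.6 | 93.9 | 35.1 | 87.6 | 71.6 | 89.3 | 92.8 | 87.9 | 88.0 | 62.0 | 88.0 | 59.7 | 86.1 | 75.7 | 02-Aug-2016
DeepLabv2-CRF | 79.7 | 92.6 | 60.4 | 91.6 | 63.4 | 76.3 | 95.0 | 88.4 | 92.6 | 32.7 | 88.5 | 67.6 | 89.6 | 92.1 | 87.0 | 87.4 | 63.3 | 88.3 | 60.0 | 86.8 | 74.5 | 06-Jun-2016
** PSP_flow ** | 79.4 | 86.2 | 44.2 | 93.4 | 72.1 | 75.8 | 93.7 | 91.2 | 95.0 | 38.6 | 86.7 | 63.9 | 89.0 | 89.4 | 90.4 | 88.4 | 64.4 | 91.8 | 60.9 | 82.6 | 73.8 | 13-Jul-2021
LRR_4x_ResNet_COCO | 79.3 | 92.4 | 45.1 | 94.6 | 65.2 | 75.8 | 95.1 | 89.1 | 92.3 | 39.0 | 85.7 | 70.4 | 88.6 | 89.4 | 88.6 | 86.6 | 65.8 | 86.2 | 57.4 | 85.7 | 77.3 | 18-Jul-2016
CASIA_SegResNet_CRF_COCO | 79.3 | 93.8 | 42.2 | 93.1 | 68.6 | 75.3 | 95.3 | 88.8 | 92.5 | 36.5 | 84.3 | 64.2 | 86.8 | 87.8 | 87.5 | 88.5 | 69.2 | 89.7 | 64.1 | 86.8 | 74.6 | 03-Jun-2016
** hrnet_baseline ** | 79.3 | 93.8 | 43.5 | 84.8 | 63.9 | 82.4 | 92.8 | 91.0 | 93.8 | 45.6 | 88.0 | 61.4 | 90.0 | 90.2 | 88.0 | 88.1 | 66.8 | 91.1 | 53.3 | 87.1 | 74.4 | 26-Jan-2020
Adelaide_VeryDeep_FCN_VOC | 79.1 | 91.9 | 48.1 | 93.4 | 69.3 | 75.5 | 94.2 | 87.5 | 92.8 | 36.7 | 86.9 | 65.2 | 89.1 | 90.2 | 86.5 | 87.2 | 64.6 | 90.1 | 59.7 | 85.5 | 72.7 | 13-May-2016
** EfficientNet_MSCID_Segmentation ** | 78.9 | 92.1 | 42.1 | 91.6 | 73.8 | 80.7 | 93.8 | 88.1 | 91.6 | 38.7 | 84.3 | 68.5 | 90.3 | 88.7 | 86.3 | 84.8 | 64.7 | 87.3 | 58.6 | 85.3 | 71.4 | 15-Aug-2019
BlitzNet512 | 78.8 | 92.4 | 42.7 | 78.8 | 67.5 | 77.0 | 95.2 | 88.5 | 90.1 | 39.1 | 85.5 | 73.2 | 85.5 | 89.6 | 88.5 | 87.3 | 67.8 | 85.9 | 62.9 | 88.8 | 74.5 | 19-Jul-2017
LRR_4x_COCO | 78.7 | 93.2 | 44.2 | 89.4 | 65.4 | 74.9 | 93.9 | 87.0 | 92.0 | 42.9 | 83.7 | 68.9 | 86.5 | 88.0 | 89.0 | 87.2 | 67.3 | 85.6 | 64.0 | 84.1 | 71.5 | 16-Jun-2016
** weak_semi_seg ** | 78.6 | 92.2 | 62.0 | 90.0 | 64.8 | 77.1 | 93.3 | 84.8 | 91.4 | 31.4 | 89.1 | 73.3 | 88.0 | 87.7 | 86.1 | 84.5 | 65.4 | 85.4 | 56.9 | 85.1 | 67.8 | 03-Jul-2021
Ladder_DenseNet | 78.3 | 90.3 | 68.7 | 89.0 | 60.8 | 71.9 | 91.0 | 85.5 | 91.7 | 34.7 | 81.9 | 68.2 | 86.7 | 86.6 | 87.1 | 85.9 | 66.5 | 89.2 | 59.8 | 78.6 | 74.2 | 25-Jul-2017
CASIA_IVA_OASeg | 78.3 | 93.8 | 41.9 | 89.4 | 67.5 | 71.5 | 94.6 | 85.3 | 89.5 | 38.1 | 88.4 | 64.8 | 87.0 | 90.5 | 84.9 | 83.3 | 67.5 | 86.9 | 68.1 | 83.4 | 74.0 | 21-May-2016
Oxford_TVG_HO_CRF | 77.9 | 92.5 | 59.1 | 90.3 | 70.6 | 74.4 | 92.4 | 84.1 | 88.3 | 36.8 | 85.6 | 67.1 | 85.1 | 86.9 | 88.2 | 82.6 | 62.6 | 85.0 | 56.3 | 81.9 | 72.5 | 16-Mar-2016
Adelaide_Context_CNN_CRF_COCO | 77.8 | 92.9 | 39.6 | 84.0 | 67.9 | 75.3 | 92.7 | 83.8 | 90.1 | 44.3 | 85.5 | 64.9 | 87.3 | 88.8 | 84.5 | 85.5 | 68.1 | 89.0 | 62.8 | 81.2 | 71.4 | 06-Nov-2015
CUHK_DPN_COCO | 77.5 | 89.0 | 61.6 | 87.7 | 66.8 | 74.7 | 91.2 | 84.3 | 87.6 | 36.5 | 86.3 | 66.1 | 84.4 | 87.8 | 85.6 | 85.4 | 63.6 | 87.3 | 61.3 | 79.4 | 66.4 | 22-Sep-2015
Adelaide_Context_CNN_CRF_COCO | 77.2 | 92.3 | 38.8 | 82.9 | 66.1 | 75.1 | 92.4 | 83.1 | 88.6 | 41.8 | 85.9 | 62.8 | 86.7 | 88.4 | 84.0 | 85.4 | 67.4 | 88.8 | 61.9 | 81.9 | 71.7 | 13-Aug-2015
DeepLab-CRF-Attention-DT | 76.3 | 93.2 | 41.7 | 88.0 | 61.7 | 74.9 | 92.9 | 84.5 | 90.4 | 33.0 | 82.8 | 63.2 | 84.5 | 85.0 | 87.2 | 85.7 | 60.5 | 87.7 | 57.8 | 84.3 | 68.2 | 03-Feb-2016
CentraleSuperBoundaries++ | 76.0 | 91.1 | 38.5 | 90.9 | 68.7 | 74.2 | 89.9 | 85.3 | 89.1 | 34.4 | 82.5 | 65.6 | 83.1 | 82.9 | 85.7 | 85.4 | 60.6 | 84.5 | 59.9 | 80.2 | 69.9 | 13-Jan-2016
LRR_4x_de_pyramid_VOC | 75.9 | 91.8 | 41.0 | 83.0 | 62.3 | 74.3 | 93.0 | 86.8 | 88.7 | 36.6 | 81.8 | 63.4 | 84.7 | 85.9 | 85.1 | 83.1 | 62.0 | 84.6 | 55.6 | 84.9 | 70.0 | 07-Jun-2016
DeepLab-CRF-Attention | 75.7 | 91.1 | 40.9 | 86.9 | 62.1 | 74.2 | 92.3 | 84.4 | 90.1 | 34.0 | 81.7 | 66.0 | 83.5 | 83.9 | 86.5 | 84.6 | 59.1 | 87.2 | 59.6 | 81.0 | 66.2 | 03-Feb-2016
Curtin_Qilin | 75.6 | 85.4 | 38.5 | 86.5 | 63.8 | 74.8 | 91.3 | 86.8 | 88.3 | 33.5 | 84.1 | 62.4 | 83.6 | 87.7 | 84.9 | 83.5 | 61.4 | 88.5 | 58.0 | 80.8 | 69.0 | 09-Mar-2018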
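BlitzNet | 75.6 | 90.1 | 38.7 | 87.5 | 68.6 | 70.1 | 93.1 | 86.4 | 89.2 | 32.3 | 81.7 | 67.9 | 82.2 | 82.9 | 84.7 | 81.5 | 63.3 | 85.5 | 55.5 | 83.1 | 70.6 | 17-Mar-2017
BlitzNet300 | 75.5 | 91.5 | 40.4 | 82.6 | 64.5 | 71.7 | 93.3 | 85.2 | 84.9 | 41.8 | 79.1 | 70.6 | 79.3 | 82.7 | 86.6 | 84.2 | 55.3 | 81.0 | 60.1 | 85.6 | 71.6 | 19-Jul-2017
Adelaide_Context_CNN_CRF_VOC | 75.3 | 90.6 | 37.6 | 80.0 | 67.8 | 74.4 | 92.0 | 85.2 | 86.2 | 39.1 | 81.2 | 58.9 | 83.8 | 83.9 | 84.3 | 84.8 | 62.1 | 83.2 | 58.2 | 80.8 | 72.3 | 30-Aug-2015
MSRA_BoxSup | 75.2 | 89.8 | 38.0 | 89.2 | 68.9 | 68.0 | 89.6 | 83.0 | 87.7 | 34.4 | 83.6 | 67.1 | 81.5 | 83.7 | 85.2 | 83.5 | 58.6 | 84.9 | 55.8 | 81.2 | 70.7 | 18-May-2015
FSSI300 | 75.1 | 91.1 | 42.6 | 89.1 | 66.4 | 69.2 | 92.5 | 88.5 | 86.8 | 33.2 | 79.2 | 63.2 | 82.4 | 81.4 | 86.9 | 82.1 | 58.1 | 83.2 | 53.0 | 83.1 | 71.5 | 21-Jun-2018
POSTECH_DeconvNet_CRF_VOC | 74.8 | 90.0 | 40.8 | 84.2 | 67.3 | 70.7 | 90.9 | 84.8 | 87.4 | 34.8 | 83.0 | 58.7 | 82.3 | 87.1 | 86.9 | 82.4 | 64.5 | 84.6 | 54.9 | 77.5 | 64.1 | 18-Aug-2015
MERL_UMD_Deep_GCRF_COCO | 74.8 | 89.9 | 42.6 | 90.0 | 65.0 | 69.2 | 89.9 | 83.9 | 88.2 | 31.3 | 81.8 | 66.4 | 82.9 | 81.1 | 85.7 | 83.4 | 58.4 | 88.4 | 56.7 | 77.7 | 64.3 | 15-Jan-2016
Oxford_TVG_CRF_RNN_COCO | 74.7 | 90.4 | 55.3 | 88.7 | 68.4 | 69.8 | 88.3 | 82.4 | 85.1 | 32.6 | 78.5 | 64.4 | 79.6 | 81.9 | 86.4 | 81.8 | 58.6 | 82.4 | 53.5 | 77.4 | 70.1 | 22-Apr-2015
UNIST_GDN_CRF_ENS | 74.0 | 88.6 | 48.6 | 88.8 | 64.7 | 70.4 | 87.2 | 81.8 | 86.4 | 32.0 | 77.1 | 64.1 | 80.5 | 78.0 | 84.0 | 83.3 | 59.2 | 85.9 | 56.8 | 77.9 | 65.0 | 29-Jul-2016
** fdsf ** | 73.9 | 90.1 | 39.9 | 85.7 | 60.8 | 70.6 | 87.4 | 86.6 | 89.6 | 32.2 | 77.6 | 58.0 | 85.8 | 84.8 | 82.9 | 82.8 | 58.5 | 87.3 | 47.6 | 84.0 | 66.8 | 22-Nov-2018
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint | 73.9 | 89.2 | 46.7 | 88.5 | 63.5 | 68.4 | 87.0 | 81.2 | 86.3 | 32.6 | 80.7 | 62.4 | 81.0 | 81.3 | 84.3 | 82.1 | 56.2 | 84.6 | 58.3 | 76.2 | 67.2 | 26-Apr-2015
BlitzNet | 73.9 | 91.4 | 40.4 | 76.4 | 62.6 | 74.8 | 91.1 | 86.2 | 85.2 | 35.6 | 83.1 | 59.0 | 77.9 | 84.6 | 84.1 | 80.6 | 57.2 | 86.5 | 56.1 | 78.8 | 67.4 | 17-Mar-2017
MERL_DEEP_GCRF | 73.2 | 85.2 | 43.9 | 83.3 | 65.2 | 68.3 | 89.0 | 82.7 | 85.3 | 31.1 | 79.5 | 63.3 | 80.5 | 79.3 | 85.5 | 81.0 | 60.5 | 85.5 | 52.0 | 77.3 | 65.1 | 17-Oct-2015
UNIST_GDN_CRF | 73.2 | 87.9 | 37.8 | 88.8 | 64.5 | 70.7 | 87.7 | 81.3 | 87.1 | 32.5 | 76.7 | 66.6 | 80.3 | 76.6 | 82.2 | 82.3 | 57.9 | 84.5 | 55.9 | 78.5 | 64.2 | 29-Jul-2016
Bayesian Dilation Network | 73.1 | 88.6 | 39.0 | 86.2 | 63.3 | 67.1 | 88.1 | 81.9 | 86.8 | 34.7 | 81.1 | 57.1 | 81.3 | 86.5 | 83.4 | 83.4 | 53.7 | 84.0 | 53.3 | 80.5 | 62.5 | 07-Jun-2016
DeepLab-CRF-COCO-LargeFOV | 72.7 | 89.1 | 38.3 | 88.1 | 63.3 | 69.7 | 87.1 | 83.1 | 85.0 | 29.3 | 76.5 | 56.5 | 79.8 | 77.9 | 85.8 | 82.4 | 57.4 | 84.3 | 54.9 | 80.5 | 64.1 | 18-Mar-2015
POSTECH_EDeconvNet_CRF_VOC | 72.5 | 89.9 | 39.3 | 79.7 | 63.9 | 68.2 | 87.4 | 81.2 | 86.1 | 28.5 | 77.0 | 62.0 | 79.0 | 80.3 | 83.6 | 80.2 | 58.8 | 83.4 | 54.3 | 80.7 | 65.0 | 22-Apr-2015
Dual-Multi-Reso-MR | 72.4 | 87.6 | 40.3 | 80.6 | 62.9 | 71.3 | 88.1 | 84.4 | 84.7 | 29.6 | 77.8 | 58.5 | 80.0 | 81.0 | 85.4 | 82.1 | 55.0 | 83.8 | 48.2 | 80.3 | 65.3 | 03-Nov-2016
CCBM | 72.3 | 87.8 | 46.7 | 79.0 | 63.6 | 70.5 | 83.7 | 75.5 | 86.9 | 31.0 | 81.9 | 61.3 | 81.5 | 85.9 | 81.1 | 76.5 | 58.7 | 77.7 | 50.4 | 76.6 | 69.8 | 29-Nov-2015
Oxford_TVG_CRF_RNN_VOC | 72.0 | 87.5 | 39.0 | 79.7 | 64.2 | 68.3 | 87.6 | 80.8 | 84.4 | 30.4 | 78.2 | 60.4 | 80.5 | 77.8 | 83.1 | 80.6 | 59.5 | 82.8 | 47.8 | 78.3 | 67.1 | 22-Apr-2015
** AGV BANA RES NAL ** | 71.7 | 81.6 | 36.6 | 86.2 | 58.7 | 76.8 | 78.6 | 82.0 | 87.3 | 34.4 | 79.3 | 63.8 | 82.6 | 79.7 | 78.5 | 79.8 | 56.5 | 84.5 | 55.3 | 70.7 | 60.3 | 31-Jan-2022
DeepLab-MSc-CRF-LargeFOV | 71.6 | 84.4 | 54.5 | 81.5 | 63.6 | 65.9 | 85.1 | 79.1 | 83.4 | 30.7 | 74.1 | 59.8 | 79.0 | 76.1 | 83.2 | 80.8 | 59.7 | 82.2 | 50.4 | 73.1 | 63.7 | 02-Apr-2015
** resnet38_deeplab ** | 71.4 | 89.1 | 37.3 | 84.6 | 56.4 | 68.2 | 90.8 | 83.7 | 89.0 | 28.4 | 84.7 | 47.0 | 84.7 | 87.1 | 80.2 | 77.1 | 49.3 | 87.0 | 49.8 | 75.6 | 56.2 | 06-Nov-2021
MSRA_BoxSup | 71.0 | 86.4 | 35.5 | 79.7 | 65.2 | 65.2 | 84.3 | 78.5 | 83.7 | 30.5 | 76.2 | 62.6 | 79.3 | 76.1 | 82.1 | 81.3 | 57.0 | 78.2 | 55.0 | 72.5 | 68.1 | 10-Feb-2015
** FCN16s-Resnet101 ** | 71.0 | 83.9 | 49.3 | 79.1 | 56.6 | 70.4 | 87.5 | 82.7 | 84.9 | 27.0 | 74.1 | 53.6 | 79.9 | 76.7 | 81.9 | 81.7 | 55.3 | 76.9 | 50.8 | 79.0 | 66.6 | 26-Jan-2019
** DFPnet ** | 71.0 | 88.4 | 37.6 | 83.3 | 52.7 | 75.8 | 89.1 | 85.8 | 89.3 | 31.6 | 65.9 | 33.7 | 83.5 | 75.3 | 82.3 | 82.8 | 60.5 | 75.9 | 52.6 | 80.5 | 70.5 | 26-Aug-2018
FCN_CLC_MSP | 70.8 | 86.2 | 40.1 | 83.9 | 57.8 | 64.7 | 87.9 | 81.3 | 85.9 | 28.3 | 80.0 | 61.9 | 80.7 | 82.5 | 79.7 | 80.2 | 54.7 | 81.3 | 39.3 | 78.9 | 59.2 | 01-Jul-2016
DeepLab-CRF-COCO-Strong | 70.4 | 85.3 | 36.2 | 84.8 | 61.2 | 67.5 | 84.6 | 81.4 | 81.0 | 30.8 | 73.8 | 53.8 | 77.5 | 76.5 | 82.3 | 81.6 | 56.3 | 78.9 | 52.3 | 76.6 | 63.3 | 11-Feb-2015
DeepLab-CRF-LargeFOV | 70.3 | 83.5 | 36.6 | 82.5 | 62.3 | 66.5 | 85.4 | 78.5 | 83.7 | 30.4 | 72.9 | 60.4 | 78.5 | 75.5 | 82.1 | 79.7 | 58.2 | 82.0 | 48.8 | 73.7 | 63.3 | 28-Mar-2015
DeepSqueeNet_CRF | 70.1 | 85.7 | 37.4 | 83.4 | 59.7 | 67.8 | 85.2 | 79.8 | 81.4 | 27.9 | 72.3 | 60.4 | 76.5 | 78.2 | 82.7 | 78.8 | 57.3 | 78.6 | 49.0 | 77.6 | 61.0 | 21-Jul-2016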
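TTI_zoomout_v2 | 69.6 | 85.6 | 37.3 | 83.2 | 62.5 | 66.0 | 85.1 | 80.7 | 84.9 | 27.2 | 73.2 | 57.5 | 78.1 | 79.2 | 81.1 | 77.1 | 53.6 | 74.0 | 49.2 | 71.7 | 63.3 | 30-Mar-2015
RRF-4s | 69.4 | 79.5 | 57.3 | 78.7 | 61.8 | 64.1 | 83.9 | 78.1 | 80.4 | 30.0 | 73.0 | 59.4 | 74.3 | 73.9 | 80.8 | 77.9 | 53.9 | 76.4 | 46.1 | 71.7 | 63.9 | 30-Nov-2016
Score Map Pyramid Net | 69.3 | 80.9 | 38.5 | 79.0 | 58.5 | 68.6 | 83.2 | 80.0 | 85.7 | 31.0 | 66.1 | 56.2 | 76.2 | 71.0 | 81.1 | 81.6 | 54.9 | 74.6 | 49.4 | 75.9 | 68.9 | 06-Jul-2018
FCN-2s_Dilated_VGG19 | 69.0 | 81.8 | 37.0 | 79.5 | 57.2 | 67.5 | 83.8 | 79.3 | 83.0 | 28.5 | 74.5 | 57.5 | 76.0 | 75.9 | 79.5 | 78.6 | 57.0 | 77.8 | 45.3 | 73.7 | 63.2 | 11-Jul-2017
VGG19_FCN | 68.1 | 81.7 | 35.9 | 79.8 | 57.5 | 66.9 | 84.1 | 79.6 | 80.8 | 28.2 | 72.1 | 53.3 | 74.0 | 72.1 | 78.5 | 78.2 | 55.5 | 76.7 | 43.4 | 73.8 | 65.1 | 06-Apr-2017
** ESPNetv2 ** | 68.0 | 87.5 | 36.9 | 75.9 | 64.0 | 63.8 | 87.2 | 73.7 | 76.5 | 26.7 | 70.3 | 57.5 | 68.9 | 70.6 | 82.9 | 78.9 | 48.1 | 76.4 | 46.9 | 77.7 | 64.1 | 23-Mar-2019
FCN-2s_Dilated_VGG16 | 67.6 | 81.1 | 35.7 | 78.0 | 58.5 | 63.9 | 82.8 | 79.7 | 81.4 | 27.8 | 71.2 | 53.6 | 75.1 | 74.8 | 79.2 | 77.8 | 55.3 | 74.5 | 45.5 | 72.7 | 60.0 | 20-Jul-2017
FCN-8s-heavy | 67.2 | 82.4 | 36.1 | 75.6 | 61.5 | 65.4 | 83.4 | 77.2 | 80.1 | 27.9 | 66.8 | 51.5 | 73.6 | 71.9 | 78.9 | 77.1 | 55.3 | 73.4 | 44.3 | 74.0 | 63.2 | 06-Feb-2016
DeepLab-CRF-MSc | 67.1 | 80.4 | 36.8 | 77.4 | 55.2 | 66.4 | 81.5 | 77.5 | 78.9 | 27.1 | 68.2 | 52.7 | 74.3 | 69.6 | 79.4 | 79.0 | 56.9 | 78.8 | 45.2 | 72.7 | 59.3 | 30-Dec-2014
DeepLab-CRF | 66.4 | 78.4 | 33.1 | 78.2 | 55.6 | 65.3 | 81.3 | 75.5 | 78.6 | 25.3 | 69.2 | 52.7 | 75.2 | 69.0 | 79.1 | 77.6 | 54.7 | 78.3 | 45.1 | 73.3 | 56.2 | 23-Dec-2014
DeepSqueeNet | 65.7 | 76.1 | 34.3 | 76.4 | 56.0 | 62.0 | 82.7 | 75.4 | 78.3 | 25.6 | 64.3 | 58.8 | 73.3 | 69.3 | 79.3 | 76.7 | 53.2 | 72.1 | 46.2 | 69.3 | 59.1 | 20-Jul-2016
** AGV BANA VGG NAL attempt 5 ** | 65.6 | 77.1 | 31.6 | 72.1 | 54.8 | 63.8 | 82.8 | 76.0 | 82.0 | 26.6 | 65.0 | 58.5 | 75.5 | 64.2 | 75.8 | 70.9 | 54.0 | 76.4 | 44.6 | 74.9 | 60.1 | 30-Jan-2022
Bayesian FCN | 65.4 | 80.8 | 34.9 | 75.2 | 57.0 | 64.1 | 80.9 | 77.2 | 78.0 | 26.4 | 65.6 | 44.0 | 72.6 | 70.8 | 78.7 | 76.8 | 52.4 | 71.0 | 40.4 | 73.8 | 61.8 | 07-Jun-2016
Weak_manifold_CNN | 65.3 | 80.9 | 32.9 | 73.2 | 57.7 | 63.0 | 83.9 | 73.5 | 76.6 | 27.0 | 65.9 | 52.6 | 70.9 | 69.8 | 73.0 | 74.9 | 53.3 | 70.1 | 45.4 | 72.4 | 62.7 | 11-Nov-2016
CRF_RNN | 65.2 | 80.9 | 34.0 | 72.9 | 52.6 | 62.5 | 79.8 | 76.3 | 79.9 | 23.6 | 67.7 | 51.8 | 74.8 | 69.9 | 76.9 | 76.9 | 49.0 | 74.7 | 42.7 | 72.1 | 59.6 | 10-Feb-2015
** deeplabv3+ resnet50 ** | 65.2 | 77.9 | 33.4 | 86.1 | 19.6 | 63.8 | 84.1 | 74.9 | 90.1 | 27.9 | 81.2 | 48.3 | 85.5 | 85.8 | 81.8 | 69.6 | 47.8 | 84.5 | 44.7 | 41.2 | 53.9 | 11-Dec-2018
** deeplabv3+ resnet50 ** | 64.6 | 78.7 | 32.9 | 79.7 | 19.5 | 67.8 | 88.0 | 75.5 | 89.6 | 24.7 | 80.6 | 46.1 | 85.1 | 83.8 | 83.1 | 65.5 | 48.1 | 83.7 | 44.0 | 41.3 | 52.8 | 11-Dec-2018
UNIST_GDN_FCN_FC | 64.4 | 75.6 | 31.5 | 69.2 | 51.6 | 62.9 | 78.8 | 76.7 | 78.7 | 24.6 | 61.7 | 60.3 | 74.5 | 62.6 | 76.1 | 74.3 | 51.5 | 70.6 | 47.3 | 74.0 | 58.4 | 27-Jul-2016
TTI_zoomout_16 | 64.4 | 81.9 | 35.1 | 78.2 | 57.4 | 56.5 | 80.5 | 74.0 | 79.8 | 22.4 | 69.6 | 53.7 | 74.0 | 76.0 | 76.6 | 68.8 | 44.3 | 70.2 | 40.2 | 68.9 | 55.3 | 24-Nov-2014
** deeplabv3+ vgg16 ** | 64.3 | 85.0 | 32.1 | 83.5 | 19.4 | 63.8 | 88.7 | 73.7 | 88.5 | 24.4 | 76.9 | 49.5 | 82.3 | 79.8 | 82.2 | 66.0 | 56.3 | 81.4 | 44.6 | 46.6 | 39.8 | 12-Dec-2018
** deeplabv3+ vgg16 ** | 63.9 | 84.6 | 31.2 | 78.8 | 19.0 | 64.1 | 87.9 | 74.3 | 87.7 | 24.7 | 77.5 | 49.6 | 83.3 | 81.8 | 82.4 | 66.2 | 54.1 | 80.1 | 44.6 | 44.0 | 39.7 | 12-Dec-2018
Hypercolumn | 62.6 | 68.7 | 33.5 | 69.8 | 51.3 | 70.2 | 81.1 | 71.9 | 74.9 | 23.9 | 60.6 | 46.9 | 72.1 | 68.3 | 74.5 | 72.9 | 52.6 | 64.4 | 45.4 | 64.9 | 57.4 | 09-Apr-2015
UNIST_GDN_FCN | 62.2 | 74.5 | 31.9 | 66.7 | 49.7 | 60.5 | 76.9 | 75.9 | 76.0 | 22.9 | 57.6 | 54.5 | 73.0 | 59.4 | 75.0 | 73.7 | 51.0 | 67.5 | 43.3 | 70.0 | 56.4 | 27-Jul-2016
FCN-8s | 62.2 | 76.8 | 34.2 | 68.9 | 49.4 | 60.3 | 75.3 | 74.7 | 77.6 | 21.4 | 62.5 | 46.8 | 71.8 | 63.9 | 76.5 | 73.9 | 45.2 | 72.4 | 37.4 | 70.9 | 55.1 | 12-Nov-2014
MSRA_CFM | 61.8 | 75.7 | 26.7 | 69.5 | 48.8 | 65.6 | 81.0 | 69.2 | 73.3 | 30.0 | 68.7 | 51.5 | 69.1 | 68.1 | 71.7 | 67.5 | 50.4 | 66.5 | 44.4 | 58.9 | 53.5 | 17-Dec-2014
** SegNet ** | 59.9 | 73.6 | 37.6 | 62.0 | 46.8 | 58.6 | 79.1 | 70.1 | 65.4 | 23.6 | 60.4 | 45.6 | 61.8 | 63.5 | 75.3 | 74.9 | 42.6 | 63.7 | 42.5 | 67.8 | 52.7 | 10-Nov-2015
TTI_zoomout | 58.4 | 70.3 | 31.9 | 68.3 | 46.4 | 52.1 | 75.3 | 68.4 | 75.3 | 19.2 | 58.4 | 49.9 | 69.6 | 63.0 | 70.1 | 67.6 | 41.5 | 64.0 | 34.9 | 64.2 | 47.3 | 17-Nov-2014
SDS | 51.6 | 63.3 | 25.7 | 63.0 | 39.8 | 59.2 | 70.9 | 61.4 | 54.9 | 16.8 | 45.0 | 48.2 | 50.5 | 51.0 | 57.7 | 63.3 | 31.8 | 58.7 | 31.2 | 55.7 | 48.5 | 21-Jul-2014
NUS_UDS | 50.0 | 67.0 | 24.5 | 47.2 | 45.0 | 47.9 | 65.3 | 60.6 | 58.5 | 15.5 | 50.8 | 37.4 | 45.8 | 59.9 | 62.0 | 52.7 | 40.8 | 48.2 | 36.8 | 53.1 | 45.6 | 29-Oct-2014
TTIC-divmbest-rerank | 48.1 | 62.7 | 25.6 | 46.9 | 43.0 | 54.8 | 58.4 | 58.6 | 55.6 | 14.6 | 47.5 | 31.2 | 44.7 | 51.0 | 60.9 | 53.5 | 36.6 | 50.9 | 30.1 | 50.2 | 46.8 | 15-Nov-2012
BONN_O2PCPMC_FGT_SEGM | 47.8 | 64.0 | 27.3 | 54.1 | 39.2 | 48.7 | 56.6 | 57.7 | 52.5 | 14.2 | 54.8 | 29.6 | 42.2 | 58.0 | 54.8 | 50.2 | 36.6 | 58.6 | 31.6 | 48.4 | 38.6 | 08-Aug-2013
BONN_O2PCPMC_FGT_SEGM | 47.5 | 63.4 | 27.3 | 56.1 | 37.7 | 47.2 | 57.9 | 59.3 | 55.0 | 11.5 | 50.8 | 30.5 | 45.0 | 58.4 | 57.4 | 48.6 | 34.6 | 53.3 | 32.4 | 47.6 | 39.2 | 23-Sep-2012
BONNGC_O2P_CPMC_CSI | 46.8 | 63.6 | 26.8 | 45.6 | 41.7 | 47.1 | 54.3 | 58.6 | 55.1 | 14.5 | 49.0 | 30.9 | 46.1 | 52.6 | 58.2 | 53.4 | 32.0 | 44.5 | 34.6 | 45.3 | 43.1 | 23-Sep-2012
BONN_CMBR_O2P_CPMC_LIN | 46.7 | 63.9 | 23.8 | 44.6 | 40.3 | 45.5 | 59.6 | 58.7 | 57.1 | 11.7 | 45.9 | 34.9 | 43.0 | 54.9 | 58.0 | 51.5 | 34.6 | 44.1 | 29.9 | 50.5 | 44.5 | 23-Sep-2012
FER_WSSS_REGION_SCORE_POOL | 38.0 | 33.1 | 21.7 | 27.7 | 17.7 | 38.4 | 55.8 | 38.3 | 57.9 | 13.6 | 37.4 | 29.2 | 43.9 | 39.1 | 52.4 | 44.4 | 30.2 | 48.7 | 26.4 | 31.8 | 36.3 | 14-Jun-2016
Metu_Unified_Net | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 87.8 | - | - | - | - | - | 10-Mar-2018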

Abbreviations

Title | Method | Affiliation | Contributors | Description | Date
DeepLabv3+ with Fillin fusion | A new feature fusion method: FillIn | Beijing University of Technology | Tian Liu, Lichun Wang, Shaofan Wang | https://arxiv.org/abs/1912.08059 The paper at this link has not yet been updated to the new version. The feature fusion is actually a privileged operation: it is only used in training. | 2020-05-25 18:35:34
Adaptive Affinity Fields for Semantic Segmentation | AAF_PSPNet | UC Berkeley / ICSI | Tsung-Wei Ke*, Jyh-Jing Hwang*, Ziwei Liu, Stella X. Yu (* equal contribution) | Existing semantic segmentation methods mostly rely on per-pixel supervision and cannot capture the structural regularity present in natural images. Instead of learning to enforce semantic labels on individual pixels, we propose to enforce affinity field patterns in individual pixel neighbourhoods, i.e., the semantic label patterns of whether neighbouring pixels are in the same segment should match between the prediction and the ground truth. The affinity fields characterize geometric relationships within the image, such as "motorcycles have round wheels". We further develop a novel method for learning the optimal neighbourhood size for each semantic category, with an adversarial loss that optimizes over worst-case scenarios. Unlike the common Conditional Random Field (CRF) approaches, our adaptive affinity field (AAF) method has no extra parameters during inference and is less sensitive to appearance changes in the image. | 2018-08-21 16:28:38
AGV BANA RES NAL | AGV BANA RES NAL | AGV BANA RES NAL | AGV BANA RES NAL | AGV BANA RES NAL | 2022-01-31 04:20:30
AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | 2022-01-30 16:24:19
Adaptive Progressive Decision Network | APDN | UESTC | Hengcan Shi, Hongliang Li, Qingbo Wu | Adaptive Progressive Decision Network | 2019-05-28 08:03:53
Adelaide_Context_CNN_CRF_COCO | Adelaide_Context_CNN_CRF_COCO | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel | Please refer to our technical report: http://arxiv.org/abs/1504.01013. We explore contextual information to improve semantic image segmentation by taking advantage of the strengths of both CNNs and CRFs. | 2015-11-06 07:46:13
Adelaide_Context_CNN_CRF_COCO | Adelaide_Context_CNN_CRF_COCO | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel | Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at: http://arxiv.org/abs/1504.01013). This technical report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strengths of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations, and during learning, dependencies between the output variables are taken into account. As in conventional DCNNs, our model is trained end-to-end using back-propagation. However, directly maximizing the likelihood may require inference at each gradient-descent iteration, which can be computationally very expensive because millions of iterations are typically required. To enable efficient training, we propose approximate training, namely piecewise training of CRFs, which avoids repeated inference. | 2015-08-13 04:13:59
Adelaide_Context_CNN_CRF_VOC | Adelaide_Context_CNN_CRF_VOC | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel | Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at: http://arxiv.org/abs/1504.01013). This technical report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strengths of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations, and during learning, dependencies between the output variables are taken into account. As in conventional DCNNs, our model is trained end-to-end using back-propagation. However, directly maximizing the likelihood may require inference at each gradient-descent iteration, which can be computationally very expensive because millions of iterations are typically required. To enable efficient training, we propose approximate training, namely piecewise training of CRFs, which avoids repeated inference. | 2015-08-30 11:49:27
High-performance Very Deep FCN | Adelaide_VeryDeep_FCN_VOC | The University of Adelaide; D2DCRC | Zifeng Wu, Chunhua Shen, Anton van den Hengel | We propose a method for high-performance semantic image segmentation based on very deep fully convolutional networks. Several design factors are carefully examined to achieve the result. Details can be found in the paper "High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks", Zifeng Wu, Chunhua Shen, Anton van den Hengel: http://arxiv.org/abs/1604.04339. Note that the system used for this submission was trained on the augmented VOC 2012 data only. | 2016-05-13 04:57:00
Auto-DeepLab-L | Auto-DeepLab-L | Johns Hopkins University; Google Inc.; Stanford University | Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei | In this work, we study Neural Architecture Search for semantic image segmentation, an important computer vision task that assigns a semantic label to every pixel in an image. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction, which exhibits far more network-level architectural variation. Therefore, we propose to search the network-level structure in addition to the cell-level structure, which forms a hierarchical architecture search space. We present a network-level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Without any ImageNet pretraining, our architecture searched specifically for semantic image segmentation attains state-of-the-art performance. Please refer to https://arxiv.org/abs/1901.02985 for details. | 2019-01-11 19:43:31
O2P Regressor + Composite Statistical Inference | BONNGC_O2P_CPMC_CSI | (1) University of Bonn, (2) Georgia Institute of Technology, (3) University of Coimbra | Joao Carreira (1,3), Fuxin Li (2), Guy Lebanon (2), Cristian Sminchisescu (1) | We apply a novel, not yet published probabilistic inference procedure, Composite Statistical Inference (CSI), to semantic segmentation using predictions on overlapping figure-ground hypotheses. Regressor predictions of each segment's overlap with the ground-truth object are modelled as the true overlap with the ground-truth segment plus noise. A model of the ground-truth overlap is defined by parametrizing on the unknown percentage of each superpixel that belongs to the unknown ground truth. A joint optimization over all superpixels and all categories is then performed to maximize the likelihood of the SVR predictions. The optimization has a tight convex relaxation, so solutions can be expected to be close to the global optimum. A fast and optimal search algorithm is then applied to retrieve each object. CSI builds on the intuition of the SVRSEGM inference algorithm that multiple predictions on similar segments can be combined to better consolidate the segment mask, but it develops the idea fully by constructing a probabilistic framework and performing composite MLE jointly on all segments and categories. It can therefore consolidate object boundaries better and handle hard cases where objects interact closely and heavily occlude each other. For each image, we use 150 overlapping figure-ground hypotheses generated by the CPMC algorithm (Carreira and Sminchisescu, PAMI 2012), and linear SVR predictions on them with the novel second-order O2P features (Carreira, Caseiro, Batista, Sminchisescu, ECCV 2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN) as the input to the inference algorithm. | 2012-09-23 23:49:02
Linear SVR with second-order pooling | BONN_CMBR_O2P_CPMC_LIN | (1) University of Bonn, (2) University of Coimbra | Joao Carreira (2,1), Rui Caseiro (2), Jorge Batista (2), Cristian Sminchisescu (1) | We present a novel, effective local feature aggregation method that we use in conjunction with an existing figure-ground segmentation sampling mechanism. This submission is described in detail in [1]. We sample multiple figure-ground segmentation candidates per image using the Constrained Parametric Min-Cuts (CPMC) algorithm. SIFT, masked SIFT and LBP features are extracted on the whole image, then pooled over each object segmentation candidate to generate global region descriptors. We employ a novel second-order pooling procedure, O2P, with two non-linearities: a tangent space mapping and power normalization. The global region descriptors are passed through a linear regressor for each category; then the labeled segments in each image with scores above a threshold are pasted onto the image in the order of these scores. Learning is performed using an epsilon-insensitive loss function on overlap with ground truth, similar to [2], but within a linear formulation (using LIBLINEAR). comp6: learning uses all images in the segmentation+detection trainval sets, and external ground-truth annotations provided courtesy of the Berkeley vision group. comp5: one model is trained for each category using the available ground-truth segmentations from the 2012 trainval set. Then, on each image with no associated ground-truth segmentations, the learned models are used together with bounding-box constraints, low-level cues and region competition to generate predicted object segmentations inside all bounding boxes. Afterwards, learning proceeds as in the fully annotated case. 1. "Semantic Segmentation with Second-Order Pooling", Carreira, Caseiro, Batista, Sminchisescu. ECCV 2012. 2. "Object Recognition by Ranking Figure-Ground Hypotheses", Li, Carreira, Sminchisescu. CVPR 2010. | 2012-09-23 19:11:47
BONN_O2PCPMC_FGT_SEGM | BONN_O2PCPMC_FGT_SEGM | (1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology | Joao Carreira (1,2), Adrian Ion (4), Fuxin Li (3), Cristian Sminchisescu (1) | We present a joint image segmentation and labeling model which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales using CPMC (Carreira and Sminchisescu, PAMI 2012), constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments and their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag (Ion, Carreira, Sminchisescu, ICCV 2011), followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure (Ion, Carreira, Sminchisescu, NIPS 2011). As meta-features, we combine outputs from linear SVRs using novel second-order O2P features to predict the overlap between segments and ground-truth objects of each class (Carreira, Caseiro, Batista, Sminchisescu, ECCV 2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN), bounding-box object detectors, and kernel SVR outputs trained to predict the overlap between segments and ground-truth objects of each class (Carreira, Li, Sminchisescu, IJCV 2012). comp6: the O2P SVR learning uses all images in the segmentation+detection trainval sets, and external ground-truth annotations provided courtesy of the Berkeley vision group. | 2012-09-23 21:39:35
BONN_O2PCPMC_FGT_SEGM | BONN_O2PCPMC_FGT_SEGM | (1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology | Joao Carreira (1,2), Adrian Ion (4), Fuxin Li (3), Cristian Sminchisescu (1) | Same as the previous BONN_O2PCPMC_FGT_SEGM entry, except that the tilings are non-maximal. | 2013-08-08 05:54:53
Bayesian Dilation Network | Bayesian Dilation Network | University of Cambridge | Alex Kendall | http://arxiv.org/abs/1511.02680 | 2016-06-07 08:28:00
Bayesian FCN | Bayesian FCN | University of Cambridge | Alex Kendall | http://mi.eng.cam.ac.uk/projects/segnet/ | 2016-06-07 08:36:38
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:24:29
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:22:43
FCN | BlitzNet300 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Runs at 24 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:57:45
FCN | BlitzNet512 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Runs at 19 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:38:53
Objectness-aware Semantic Segmentation | CASIA_IVA_OASeg | Institute of Automation, Chinese Academy of Sciences | Yuhang Wang, Jing Liu, Yong Li, Jun Fu, Hang Song, Hanqing Lu | We propose an objectness-aware semantic segmentation framework (OA-Seg) consisting of two deep networks. One is a lightweight deconvolutional neural network (Light-DCNN), which substantially reduces model size and convergence time while retaining strong segmentation performance. The other is an object proposal network (OPN) used to roughly locate object regions. MSCOCO is used to extend the training data, and CRF is used for post-processing. | 2016-05-21 01:52:15
CASIA_IVA_SDN | CASIA_IVA_SDN | National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences | Jun Fu, Jing Liu, Yuhang Wang, Zhenwei Shen, Zhiwei Fang, Hanqing Lu | We propose a Stacked Deconvolutional Network (SDN) for semantic segmentation. We stack multiple SDN units to make the network deeper, and adopt dense connections and hierarchical supervision to ease network optimization. No CRF is employed. | 2017-07-29 06:00:31
CASIA_SegResNet_CRF_COCO | CASIA_SegResNet_CRF_COCO | Institute of Automation, Chinese Academy of Sciences | Xinze Chen, Guangliang Cheng, Yinghao Cai | We propose a novel semantic segmentation method consisting of three parts: a SAR-based data augmentation method, a deeper residual network incorporating three effective techniques, and online hard pixel mining. We combine these three parts to train an end-to-end network. | 2016-06-03 09:20:50
CCBM | CCBM | University of Tsinghua | Qiurui Wang, Chun Yuan, Zhihui Lin, Zhicheng Wang, Xin Qiu | We propose CCBM, an object segmentation method that combines a convolutional neural network with Conditional Boltzmann Machines and additionally uses a border detection method inspired by human vision. We use CNNs to extract features and segment them with improved Conditional Boltzmann Machines. We also use a Structured Random Forests-based method to detect object borders for a better result. Finally, each superpixel is labelled as output. The method for this submission was trained on the VOC 2012 segmentation training data and a subset of the COCO 2014 training data. | 2015-11-29 07:26:11
Co-occurrent Features in Semantic Segmentation | CFNet | Amazon | Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie | Recent work has achieved great success in utilizing global contextual information for semantic segmentation, including increasing the receptive field and aggregating pyramid feature representations. In this paper, we go beyond global context and explore fine-grained representation using co-occurrent features by introducing the Co-occurrent Feature Model, which predicts the distribution of co-occurrent features for a given target. To leverage the semantic context in the co-occurrent features, we build an Aggregated Co-occurrent Feature (ACF) Module by aggregating the probability of the co-occurrent feature within the co-occurrent context. The ACF Module learns a fine-grained, spatially invariant representation to capture co-occurrent context information across the scene. Our approach significantly improves the segmentation results of FCN and achieves 54.0% mIoU on Pascal Context, 87.2% mIoU on Pascal VOC 2012, and 44.89% mIoU on ADE20K with a ResNet-101 base network. | 2019-06-12 03:49:01
CMT-FCN-ResNet-CRF | CMT-FCN-ResNet-CRF | Intel Labs China and Tsinghua University | Libin Wang, Anbang Yao, Jianguo Li, Yurong Chen, Li Zhang | We propose a novel coupled multi-task FCN. Both the VOC 2012 and COCO datasets are used for training, and CRF is applied as a post-processing step. | 2016-08-02 09:57:05
CRF as RNN | CRF_RNN | University of Oxford | Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Philip Torr | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses a conditional random field (CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. See the paper: "Conditional Random Fields as Recurrent Neural Networks". | 2015-02-10 11:03:16
CTNet | CTNet | Nanjing University of Science and Technology | CTNet | CTNet | 2020-10-29 01:38:27
Deep Parsing Network | CUHK_DPN_COCO | The Chinese University of Hong Kong | Ziwei Liu*, Xiaoxiao Li*, Ping Luo, Chen Change Loy, Xiaoou Tang | This work addresses semantic image segmentation by incorporating rich information into a Markov Random Field (MRF), including high-order relations and a mixture of label contexts. Unlike previous works that optimized MRFs using iterative algorithms, we solve the MRF with a Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN architecture to model unary terms, and additional layers are carefully devised to approximate the mean field (MF) algorithm for pairwise terms. It has several appealing properties. First, unlike recent works that combined CNN and MRF, where many iterations of MF were required for each training image during back-propagation, DPN achieves high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing works its special cases. Third, DPN makes MF easier to parallelize and speed up on a Graphical Processing Unit (GPU). The system used for this submission was trained on the augmented VOC 2012 and MS-COCO 2014 training sets. Please refer to the paper "Semantic Image Segmentation via Deep Parsing Network" (http://arxiv.org/abs/1509.02634) for further information. | 2015-09-22 16:52:27
Learning to Predict CaC for semantic segmentation | CaCNet | CUHK | Jianbo Liu, Junjun He, Jimmy S. Ren, Yu Qiao, Hongsheng Li | Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods demonstrate that using global context for re-weighting feature channels can effectively improve the accuracy of semantic segmentation. However, the globally-sharing feature re-weighting vector might not be optimal for regions of different classes in the input image. In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. In CaC-Net, a set of context-adaptive convolution kernels are predicted from the global contextual information in a parameter-efficient manner. When used for convolution with the semantic feature maps, the predicted convolutional kernels can generate the spatially-varying feature weighting factors capturing both global and local contextual information. Comprehensive experimental results show that our CaC-Net achieves superior segmentation performance on three public datasets, PASCAL Context, PASCAL VOC 2012 and ADE20K. | 2020-05-29 05:19:26
Deep G-CRF (QO) combined with Deeplab-v2 | CentraleSupelec Deep G-CRF | CentraleSupelec / INRIA | Siddhartha Chandra & Iasonas Kokkinos | We employ the deep Gaussian CRF Quadratic Optimization formulation to learn pairwise terms for semantic segmentation using the Deeplab-v2-resnet-101 network. Additionally, we use dense-CRF post-processing to refine object boundaries. This work is an accepted paper at ECCV 2016 and will be presented at the conference. Please refer to our arXiv report (http://arxiv.org/abs/1603.08358); we will update it with more details soon. | 2016-08-12 11:21:28
"Super-Human" boundaries combined with Deeplab | CentraleSuperBoundaries++ | CentraleSupelec / INRIA | Iasonas Kokkinos | We exploit our "super-human" boundary detector with a multi-resolution variant of the Deeplab system (LargeFOV, pre-trained on MSCOCO). The boundary information comes in the form of Normalized Cut eigenvectors used in DenseCRF inference and boundary-dependent pairwise terms used in Graph-Cut inference. This is an updated version of our earlier submission, using more training rounds and a single-shot training algorithm. Details on the system and our "super-human" boundary detector are provided in http://arxiv.org/abs/1511.07386 | 2016-01-13 16:00:02
modified deeplab | Curtin_Qilin | Curtin University | Qilin Li | A modified version of deeplab-resnet101. | 2018-03-09 03:59:28
Dense Context-Aware Network for Semantic Segmentat | DCANet | Institution of Information Science and Electrical Engineering, Zhejiang University | Yifu Liu, Chenfeng Xu, Zhihong Chen, Chao Chen | In contrast to previous works utilizing multi-scale context fusion, we propose a novel module, named the Dense Context-Aware (DCA) module, to adaptively integrate local detail information with global dependencies in a more efficient way. Driven by the contextual relationship, the DCA module can effectively aggregate multi-scale information to generate more powerful features. Meanwhile, the proposed DCA module is easy to apply and can be flexibly adjusted inside existing deep networks. To further capture long-range contextual information, we design two extended structures based on the DCA modules. By working progressively across different scales, our networks can use context information to improve feature representations for robust segmentation. Due to privacy concerns, we will make the paper and code publicly available at https://github.com/YifuLiuL/DCANet | 2020-01-13 08:36:04
Discriminative Feature Network | DFN | HUST | Changqian Yu | We design a discriminative feature network for semantic segmentation. | 2018-01-15 04:32:54
DFPnet for real-time semantic segmentation | DFPnet | Dalian Maritime University | Shuhao Ma | Deep Feature Pyramid net (DFPnet) is the first model to apply image pyramid techniques to real-time semantic segmentation. DFPnet is a flexible model that can be applied to image segmentation, object detection, and image classification, and its network structure can be adjusted to suit different data. In short, DFPnet follows an open design. | 2018-08-26 12:09:50
Deep Dual Learning for Semantic Image Segmentation | DIS | Sun Yat-Sen University, The Chinese University of Hong Kong | Ping Luo*, Guangrun Wang*, Liang Lin, Xiaogang Wang | We present a novel learning setting consisting of two complementary learning problems that are solved jointly. One predicts labelmaps and tags from images, and the other reconstructs the images using the predicted labelmaps. Given an image with tags only, its labelmap can be inferred by using the image and tags as constraints. The estimated labelmaps, which capture accurate object classes and boundaries, are used as ground truth in training to boost performance. DIS is also able to clean noisy tags. | 2017-09-13 18:25:17
Dual-path Class-aware Attention Network | DP-CAN | Tianjin University | Hailong Zhu | Our proposed dual-path class-aware attention network exploits a category-level, context-free attention mechanism for semantic segmentation. The model is trained on PASCAL VOC 2012 train_aug and fine-tuned on trainval. Multi-scale inputs and flipping are used at test time. | 2019-01-25 12:36:41
Dual-path Class-aware Attention Network | DP-CAN_decoder | Tianjin University | Hailong Zhu | Dual-path Class-aware Attention Network with a dual-path refinement module as decoder. | 2019-01-26 15:07:22
DP_ResNet_CRF | DP_ResNet_CRF | (1) Beijing University of Posts and Telecommunications (BUPT); (2) Beijing Moshanghua Tech (DressPlus) | Lu Yang (1, 2), Qing Song (1), Bin Liu (2), Yuhang He (2), Zuoxin Li (2), Xiongwei Xia (2) | Our network is based on ResNet-152; dilated convolution, data augmentation, pre-training on COCO, and multi-scale testing are used for this submission. We also use DenseCRF as post-processing to refine object boundaries. | 2016-11-10 12:05:10
Dynamic routing encoding network | DREN | Huazhong University of Science and Technology | Zhaoyang Hu | Building on the FCN network, we add dynamic routing to classify the context, and we feed that context back to aid the network's recognition. | 2019-03-29 02:04:11
Deep Layer Cascade (LC) | Deep Layer Cascade (LC) | The Chinese University of Hong Kong | Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang | We propose a novel deep layer cascade (LC) method to improve the accuracy and speed of semantic segmentation. Unlike the conventional model cascade (MC), which is composed of multiple independent models, LC treats a single deep model as a cascade of several sub-models. Earlier sub-models are trained to handle easy and confident regions, and they progressively feed harder regions forward to the next sub-model for processing. Convolutions are computed only on these regions to reduce computation. The proposed method has several advantages. First, LC classifies most of the easy regions in the shallow stage and lets the deeper stages focus on a few hard regions. Such adaptive, 'difficulty-aware' learning improves segmentation performance. Second, LC accelerates both training and testing of deep networks thanks to early decisions in the shallow stage. Third, in contrast to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models. We evaluate our method on the PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and fast speed. Please refer to the paper "Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade" (https://arxiv.org/abs/1704.01344) for further information. | 2017-04-06 14:46:45
DeepLab-CRF | DeepLab-CRF | (1) UCLA (2) Google (3) TTIC (4) ECP / INRIA | Liang-Chieh Chen (1) and George Papandreou (2,3) and Iasonas Kokkinos (4) and Kevin Murphy (2) and Alan L. Yuille (1) | This work brings together methods from Deep Convolutional Neural Networks (DCNNs) and probabilistic graphical models for the task of semantic image segmentation. We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Efficient computation is achieved by (i) careful network re-purposing and (ii) a novel application of the 'hole' algorithm from the wavelet community, allowing dense computation of neural net responses at 8 frames per second on a modern GPU. See http://arxiv.org/abs/1412.7062 for further information. | 2014-12-23 02:29:44
DeepLab-CRF-Attention | DeepLab-CRF-Attention | (1) UCLA (2) Baidu | Liang-Chieh Chen (1) and Yi Yang (2) and Jiang Wang (2) and Wei Xu (2) and Alan L. Yuille (1) | This work extends DeepLab-CRF-COCO-LargeFOV (pretrained on MS-COCO) by further incorporating (1) multi-scale inputs, (2) extra supervision, and (3) an attention model. Further information will be provided in an *updated* version of http://arxiv.org/abs/1511.03339. | 2016-02-03 23:10:45
DeepLab-CRF-Attention-DT | DeepLab-CRF-Attention-DT | (1) UCLA (2) Google | Liang-Chieh Chen (1) and Jonathan T. Barron (2) and George Papandreou (2) and Kevin Murphy (2) and Alan L. Yuille (1) | This work extends DeepLab-CRF-Attention by further incorporating a discriminatively trained Domain Transform. Further information will be provided in an *updated* version of http://arxiv.org/abs/1511.03328. | 2016-02-03 23:13:01
DeepLab-CRF-COCO-LargeFOV | DeepLab-CRF-COCO-LargeFOV | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF-COCO-Strong, but the network has a larger field-of-view on the image. Further information will be provided in an updated version of http://arxiv.org/abs/1502.02734. | 2015-03-18 04:09:39
DeepLab-CRF-COCO-Strong | DeepLab-CRF-COCO-Strong | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF, but network training also included the pixel-level semantic segmentation annotations of the MS-COCO (v. 2014) dataset. See http://arxiv.org/abs/1502.02734 for further information. | 2015-02-11 01:44:22
DeepLab-CRF-LargeFOV | DeepLab-CRF-LargeFOV | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on the MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062. | 2015-03-28 17:22:26
DeepLab-CRF-MSc | DeepLab-CRF-MSc | (1) UCLA (2) Google (3) TTIC (4) ECP / INRIA | Liang-Chieh Chen (1) and George Papandreou (2,3) and Iasonas Kokkinos (4) and Kevin Murphy (2) and Alan L. Yuille (1) | Similar to DeepLab-CRF, except that multiscale features (direct connections from intermediate layers to the classifier) are also exploited. Specifically, we attach to the input image and each of the first four max pooling layers a two-layer MLP (first layer: 128 3x3 convolutional filters, second layer: 128 1x1 convolutional filters) whose score map is concatenated to the VGG final layer score map. The final score map fed into the softmax layer thus consists of 4,096 + 5 * 128 = 4,736 channels. | 2014-12-30 02:52:40
DeepLab-MSc-CRF-LargeFOV | DeepLab-MSc-CRF-LargeFOV | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-MSc-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on the MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062. | 2015-04-02 06:57:21
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint | DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to the DeepLab-CRF model, but with feature extraction at multiple network levels and a large field of view. We jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using pixel-level annotation in both datasets. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062 and http://arxiv.org/abs/1502.02734. | 2015-04-26 17:48:09
DeepLab_XI | DeepLab_XI | xiaoi research | Bo Zhang, Xiaoke Wang, Guixiong Chen | We extend the DeepLab method. Both the VOC 2012 and COCO datasets are used for training. | 2019-05-07 07:08:00
DeepLabv2-CRF | DeepLabv2-CRF | (1) UCLA (2) Google (3) ECP / INRIA | Liang-Chieh Chen (1,2) and George Papandreou (2) and Iasonas Kokkinos (3) and Kevin Murphy (2) and Alan L. Yuille (1) | DeepLabv2-CRF is based on three main methods. First, we employ convolution with upsampled filters, or 'atrous convolution', as a powerful tool to repurpose ResNet-101 (trained on image classification task) in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within DCNNs. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and fully connected Conditional Random Fields (CRFs). See http://arxiv.org/abs/1606.00915 for further information. | 2016-06-06 01:59:20
DeepLabv3 | DeepLabv3 | Google Inc. | Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam | In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks. We propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. See http://arxiv.org/abs/1706.05587 for further information. | 2017-06-20 01:59:26
DeepLabv3+ | DeepLabv3+ | Google Inc. | Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam | Spatial pyramid pooling modules and encoder-decoder structures are used in deep neural networks for the semantic segmentation task. The former are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages of both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 semantic image segmentation dataset and achieve state-of-the-art performance without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow. For details, please refer to https://arxiv.org/abs/1802.02611. | 2018-02-09 16:12:04
DeepLabv3+_AASPP | DeepLabv3+_AASPP | Tsinghua University | Jiancheng Li | DeepLabv3+ with Attention Atrous Spatial Pyramid Pooling. | 2018-05-22 15:44:09
DeepLabv3+_JFT | DeepLabv3+_JFT | Google Inc. | Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam | DeepLabv3+ by fine-tuning from the model pretrained on JFT-300M dataset. For details, please refer to https://arxiv.org/abs/1802.02611. | 2018-02-09 16:16:47
DeepLabv3-JFT | DeepLabv3-JFT | Google Inc. | Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam | DeepLabv3 by fine-tuning from the model pretrained on JFT-300M dataset. See http://arxiv.org/abs/1706.05587 for further information. | 2017-08-05 01:16:48
DeepSqueeNet | DeepSqueeNet | Sun Yat-sen University, SYSU | HongPeng Wu, Long Chen, Kai Huang | We propose a method for semantic image segmentation. At equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture called DeepSqueeNet for semantic image segmentation. It is based on SqueezeNet and VGG16. DeepSqueeNet achieves the accuracy of DeepLab (VGG16-based) on semantic image segmentation with 10x fewer parameters. | 2016-07-20 13:16:16
DeepSqueeNet_CRF | DeepSqueeNet_CRF | Sun Yat-sen University, SYSU | HongPeng Wu, Long Chen, Kai Huang | We propose a method for semantic image segmentation. At equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture called DeepSqueeNet for semantic image segmentation. It is based on SqueezeNet and VGG16. DeepSqueeNet achieves the accuracy of DeepLab (VGG16-based) on semantic image segmentation with 10x fewer parameters. We add a CRF. | 2016-07-21 12:47:19
Dual Multi-Scale Manifold Ranking Network | Dual-Multi-Reso-MR | Wuhan University | Mi Zhang, Ye Lv, Min Luo, Jiasi Yi | We propose a multi-scale network that uses dilated and non-dilated convolutional networks as a dual pair. In both networks, a manifold ranking optimization method is embedded and optimized jointly in a single stream, i.e., there is no need to train the unary and pairwise networks separately. Such a feedforward network makes it possible to train end-to-end and to guarantee a global optimum. | 2016-11-03 12:27:49
Expectation-Maximization Attention Networks for S | EMANet152 | Peking University | Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu | We formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. Through a weighted summation over these bases, the resulting representation is low-rank and discards noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to input variance and is also friendly in memory and computation. Moreover, we introduce bases maintenance and normalization methods to stabilize its training procedure. | 2019-08-15 16:22:33
ESPNetv2 | ESPNetv2 | University of Washington | Hannaneh Hajishirzi, Mohammad Rastegari, Linda Shapiro | We introduce a lightweight, power-efficient, general-purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. Our network uses group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with fewer FLOPs and parameters. The performance of our network is evaluated on three different tasks: (1) object classification, (2) semantic segmentation, and (3) language modeling. Experiments on these tasks, including image classification on ImageNet and language modeling on the Penn Treebank dataset, demonstrate the superior performance of our method over state-of-the-art methods. Our network has better generalization properties than ShuffleNetv2 when tested on the MSCOCO multi-object classification task and the Cityscapes urban scene semantic segmentation task. Our experiments show that ESPNetv2 is much more power-efficient than existing state-of-the-art efficient methods, including ShuffleNets and MobileNets. Our code is open-source and available at https://github.com/sacmehta/ESPNetv2 | 2019-03-23 22:32:58
EfficientNet-L2 + NAS-FPN + Noisy Student | EfficientNet-L2 + NAS-FPN + Noisy Student | Google Inc. | Golnaz Ghiasi, Barret Zoph, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Cubuk, Quoc V. Le | Single-scale testing and without pre-training on COCO. See https://arxiv.org/abs/2006.06882 for details. | 2020-06-15 19:50:31
Efficient_Segmentation | EfficientNet_MSCID_Segmentation | Tianjin University | Xiu Su, Hongyan Xu | EfficientNet with MSCID module for segmentation. | 2019-08-15 02:00:39
Context Encoding for Semantic Segmentation | EncNet | Rutgers University, Amazon, SenseTime, CUHK | Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal | Recent work has made significant progress in improving spatial resolution for pixelwise labeling with the Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent feature maps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results of 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on the ADE20K test set, which surpasses the winning entry of the COCO-Place Challenge 2017. | 2018-03-15 21:21:01
ExFuse | ExFuse | Fudan University, Megvii Inc. | Zhenli Zhang, Xiangyu Zhang, Chao Peng, Jian Sun | For more details, please refer to https://arxiv.org/abs/1804.03821. | 2018-05-22 09:27:16
Dilated FCN using VGG16 and Skip Architectures | FCN-2s_Dilated_VGG16 | Center for Cognitive Skill Enhancement, Independent University Bangladesh | Sharif Amit Kamran, Ali Shihab Sabbir | The weights were transferred from VGG16, and the fully connected layers were converted to convolutional layers. Dilated convolution was used instead of vanilla convolution in the fc6 layer. Upsampling was done with stride 2, and the upsampled layers were concatenated in steps using four skip architectures. Pascal VOC2012 training data and SBD training and validation data were used for training in two stages. | 2017-07-20 20:23:41
Dilated FCN using VGG19 and Skip Architectures | FCN-2s_Dilated_VGG19 | Center for Cognitive Skill Enhancement, Independent University Bangladesh | Sharif Amit Kamran, Ali Shihab Sabbir | The weights were transferred from VGG19, and the fully connected layers were converted to convolutional layers. Dilated convolution was used instead of vanilla convolution in the fc6 layer. Upsampling was done with stride 2, and the upsampled layers were concatenated in steps using four skip architectures. Pascal VOC2012 training data and SBD training and validation data were used for training in two stages. | 2017-07-11 16:57:52
Fully convolutional net | FCN-8s | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. | 2014-11-12 09:08:39
Fully convolutional net | FCN-8s-heavy | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. The network is learned online with high momentum for better optimization. | 2016-02-06 09:57:31
FCN16s-Resnet101FCN16s-Resnet101peking universitypersonalFCN?output stride 16? based on resnet1012019-01-26 12:50:15
FCN with Cross-layer Concat and Multi-scale PredFCN_CLC_MSPNational Tsing Hua University, TaiwanTun-Huai Shih, Chiou-Ting HsuWe replace the original fc layers in VGG-16 with several conv and pool layers to extract hierarchical features (Pool3-5 and additional pool6-8). We then use pool3-8 to generate multi-scale predictions, and aggregate them to derive the dense prediction result. To jointly exploit the information from lower- and higher-level layers when making prediction, we adopt cross-layer concatenation to combine poolx features (lower-level) with prediction result of coarser stream (high-level). This makes the predictions of finer streams more robust. We do not adopt any pre- or post- processing steps. The number of parameters is about 36M, while the original FCN is 134M. We train all prediction streams at the same time using VOC additional annotated images (10582 in total), and it takes less than one day to train our FCN model on a single GTX Titan X GPU.2016-07-01 04:27:14
FDNet_16sFDNet_16sHongKong University of Science and Technology, altizure.comMingmin Zhen, Jinglu Wang, Siyu Zhu, Runze Zhang, Shiwei Li, Tian Fang, Long QuanA fully dense neural network with encoder-decoder structure is proposed that we abbreviate as FDNet. For each stage in the decoder module, feature maps of all the previous blocks are adaptively aggregated to feedforward as input. 2018-03-22 08:52:44
Weakly sup. segmentation by region scores' pooling | FER_WSSS_REGION_SCORE_POOL | University of Zagreb | Josip Krapac, Sinisa Segvic | We address the problem of semantic segmentation of objects in a weakly supervised setting, where only image-wide labels are available. We describe an image with a set of pre-trained convolutional features (from layer conv5.4 of the 19-layer VGG-E network) and embed this set into a Fisher vector (64-component GMM, diagonal covariance for components, normalization only with the inverse of the Fisher matrix). We learn a linear classifier (logistic regression), apply the learned classifier to the set of all image regions (efficiently, using integral images), and propagate region scores back to the pixels. Compared to the alternatives, the proposed method is simple and fast in inference, and especially in training. The details are described in the conference paper Krapac, Segvic: "Weakly-supervised semantic segmentation by redistributing region scores back to the pixels", GCPR 2016. | 2016-06-14 15:02:23
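The entry scores all image regions "efficiently, using integral images". A minimal sketch of that data structure, assuming for illustration that a region's score is the mean of per-location score contributions (a linear classifier applied to an averaged, linearly normalized encoding has that form); the score grid and function names are hypothetical. After one pass to build the summed-area table, any rectangle's sum costs four lookups.

```python
def integral_image(grid):
    """Summed-area table with a zero border: s[i][j] = sum of grid[:i][:j]."""
    h, w = len(grid), len(grid[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0.0
        for j in range(w):
            row_sum += grid[i][j]                 # running sum of row i up to column j
            s[i + 1][j + 1] = s[i][j + 1] + row_sum
    return s


def box_sum(s, top, left, bottom, right):
    """Sum of grid[top:bottom][left:right] in O(1), whatever the region size."""
    return s[bottom][right] - s[top][right] - s[bottom][left] + s[top][left]


def region_mean_score(s, top, left, bottom, right):
    """Mean per-location score inside a rectangular region."""
    area = (bottom - top) * (right - left)
    return box_sum(s, top, left, bottom, right) / area


# Hypothetical per-location classifier scores on a 3x3 feature grid.
scores = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
s = integral_image(scores)
print(box_sum(s, 0, 0, 3, 3))            # 45.0, the whole grid
print(region_mean_score(s, 1, 1, 3, 3))  # 7.0, the mean of 5, 6, 8 and 9
```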
FSSI300 | FSSI300 | Beihang University | Zuoxin Li | FSSI300 Res50 | 2018-06-21 11:27:57
Learning Feature Pyramids | Feature_Pyramids | Sun Yat-Sen University, The Chinese University of Hong Kong | Guangrun Wang, Wei Yang | This model predicts segmentation by learning feature pyramids (LFP). LFP was originally developed for human pose estimation, as described in the paper "Learning Feature Pyramids for Human Pose Estimation" (https://arxiv.org/abs/1708.01101); we extend it to semantic image segmentation. The code and model are available at https://github.com/wanggrun/Learning-Feature-Pyramids | 2018-06-06 03:55:27
Gluon DeepLabV3 152 | Gluon DeepLabV3 152 | Amazon AI | Hang Zhang et al. | https://gluon-cv.mxnet.io | 2018-10-03 18:18:27
GluonCV DeepLabV3 | GluonCV DeepLabV3 | Amazon | Hang Zhang et al. | See details in GluonCV: https://gluon-cv.mxnet.io/ | 2018-09-07 00:48:31
GluonCV FCN | GluonCV FCN | Amazon | Hang Zhang et al. | Please see details in GluonCV: https://gluon-cv.mxnet.io/ | 2018-09-07 01:11:12
GluonCV PSP | GluonCV PSP | Amazon | Hang Zhang et al. | Please see details in GluonCV: https://gluon-cv.mxnet.io/ | 2018-09-07 00:51:53
Hierarchical Parsing Net | HPN | UESTC | Hengcan Shi | HPN leverages global image semantic information and context among multiple objects to boost semantic segmentation. | 2017-12-13 02:30:24
Hamburger | HamNet_w/o_COCO | Peking University | Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin | Paper: "Is Attention Better Than Matrix Decomposition?", accepted to ICLR 2021. Link: https://openreview.net/pdf?id=1FvkSpWosOl. Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) models developed 20 years ago, in terms of both performance and computational cost, for encoding long-distance dependencies. We model the global context issue as a low-rank completion problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. | 2021-01-25 07:03:38
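The Hamburger idea is to replace attention with a matrix decomposition that factorizes the input and reconstructs a low-rank version of it. A minimal sketch of one such decomposition, assuming non-negative matrix factorization (NMF) solved with the classic Lee-Seung multiplicative updates and a deterministic initialization; the function names, step count, and `eps` are illustrative choices, not the authors' implementation, and the input must be non-negative (e.g. post-ReLU features).

```python
def matmul(a, b):
    """Plain-Python matrix product of two 2-D lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf_reconstruct(x, rank, steps=50, eps=1e-9):
    """Approximate non-negative x (d x n) by D @ C, D: d x rank, C: rank x n.

    Lee-Seung multiplicative updates keep both factors non-negative.
    Returns the low-rank reconstruction D @ C.
    """
    d, n = len(x), len(x[0])
    # Deterministic, strictly positive initialisation (an arbitrary choice).
    dmat = [[1.0 + 0.1 * ((i + k) % 3) for k in range(rank)] for i in range(d)]
    cmat = [[1.0] * n for _ in range(rank)]
    for _ in range(steps):
        dt = transpose(dmat)
        num, den = matmul(dt, x), matmul(matmul(dt, dmat), cmat)
        # C <- C * (D^T X) / (D^T D C)
        cmat = [[c * p / (q + eps) for c, p, q in zip(rc, rp, rq)]
                for rc, rp, rq in zip(cmat, num, den)]
        ct = transpose(cmat)
        num, den = matmul(x, ct), matmul(dmat, matmul(cmat, ct))
        # D <- D * (X C^T) / (D C C^T)
        dmat = [[v * p / (q + eps) for v, p, q in zip(rv, rp, rq)]
                for rv, rp, rq in zip(dmat, num, den)]
    return matmul(dmat, cmat)


# A rank-1 non-negative "feature map": 3 channels x 3 positions.
x = [[1.0, 1.0, 2.0],
     [2.0, 2.0, 4.0],
     [3.0, 3.0, 6.0]]
recon = nmf_reconstruct(x, rank=1)
print(max(abs(a - b) for ra, rb in zip(x, recon) for a, b in zip(ra, rb)))  # close to 0
```

In a network block, `x` would be the flattened feature map (channels x positions) and `rank` a small hyperparameter, so the output keeps only the dominant global structure.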
HikSeg_COCO | HikSeg_COCO | Hikvision Research Institute | Haiming Sun, Di Xie, Shiliang Pu | We begin with DilatedNet and add a module in which multi-scale features are combined step-wise. The network learns to assign different weights to features of different scales. This submission is first trained on the COCO training and validation sets, then fine-tuned on the PASCAL training set. | 2016-10-02 09:16:41
Hypercolumn | Hypercolumn | UC Berkeley | Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik | Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as the feature representation. However, the information in this layer may be too coarse to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation, where we improve the state of the art from 49.7 to 60.0 mean APr; keypoint localization, where we get a 3.3 point boost; and part labeling, where we show a 6.6 point gain over a strong baseline. | 2015-04-09 02:01:36
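The hypercolumn of a pixel stacks the activations of every unit "above" it across layers of different resolution. A minimal sketch, assuming each layer is given with its stride relative to the input and that the unit above a pixel is found by integer division (nearest lookup rather than interpolating between units); `hypercolumn` and the toy maps are hypothetical.

```python
def hypercolumn(feature_maps, y, x):
    """Concatenate, for pixel (y, x), the activations of all units above it.

    feature_maps: list of (stride, fmap) pairs with fmap[channel][row][col].
    """
    vec = []
    for stride, fmap in feature_maps:
        row, col = y // stride, x // stride      # unit whose cell covers the pixel
        vec.extend(channel[row][col] for channel in fmap)
    return vec


fine = [[[10 * r + c for c in range(4)] for r in range(4)],       # stride 1, channel 0
        [[-(10 * r + c) for c in range(4)] for r in range(4)]]    # stride 1, channel 1
mid = [[[1, 2], [3, 4]]]                                          # stride 2, 1 channel
coarse = [[[7]], [[8]], [[9]]]                                    # stride 4, 3 channels
maps = [(1, fine), (2, mid), (4, coarse)]
print(hypercolumn(maps, 3, 2))  # [32, -32, 4, 7, 8, 9]
```

The resulting per-pixel vector mixes precise early-layer responses with coarse semantic ones, which is the trade-off the entry describes.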
Learning Object Interactions and Descriptions for | IDW-CNN | Sun Yat-sen University; The Chinese University of Hong Kong | Guangrun Wang*, Ping Luo*, Liang Lin, Xiaogang Wang | This work increases the segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset. Unlike previous image captioning datasets, where captions were manually and densely annotated, images and their descriptions in IDW are automatically downloaded from the Internet without any manual cleaning or refinement. An IDW-CNN is proposed to train jointly on IDW and existing image segmentation datasets such as Pascal VOC 2012 (VOC). | 2017-06-30 00:11:24
KSAC(X-65) with hard image | KSAC-H | The University of Technology, Sydney | Ye Huang | KSAC (Xception-65) + hard image bootstrap in OS = 16 | 2019-10-26 14:19:05
Ladder DenseNet-161 | LDN-161 | University of Zagreb | Ivan Kreso, Josip Krapac, Sinisa Segvic | Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images (journal submission). Trained on train+val+augmented data. DenseNet-161 backbone. | 2019-04-18 19:03:42
Laplacian reconstruction and refinement | LRR_4x_COCO | University of California Irvine | Golnaz Ghiasi, Charless C. Fowlkes | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission is based on VGG-16 and it was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264). | 2016-06-16 06:19:08
Laplacian reconstruction and refinement | LRR_4x_ResNet_COCO | University of California Irvine | Golnaz Ghiasi, Charless C. Fowlkes | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission is based on ResNet-101 and it was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264). | 2016-07-18 19:07:32
Laplacian reconstruction and refinement | LRR_4x_de_pyramid_VOC | University of California Irvine | Charless C. Fowlkes, Golnaz Ghiasi | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission was trained on augmented PASCAL VOC. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation. | 2016-06-07 03:55:11
CVRSUAD submission, paper ID 21 | Ladder_DenseNet | UNIZG-FER | ivan.kreso@fer.hr | CVRSUAD submission, paper ID 21: Ladder-style DenseNets for Semantic Segmentation of Large Natural Images | 2017-07-25 17:42:21
Large_Kernel_Matters | Large_Kernel_Matters | Tsinghua University | Peng Chao, Yu Gang, Zhang Xiangyu | We use large kernels to generate the feature map and the score map. ResNet-101 is used, together with the COCO and SBD datasets. No CRF or similar post-processing method is employed, and no multi-scale processing is used. | 2017-03-16 01:58:16
Deep Gaussian CRF | MERL_DEEP_GCRF | Mitsubishi Electric Research Laboratories | Raviteja Vemulapalli, Oncel Tuzel | We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction. | 2015-10-17 14:55:31
Gaussian CRF on top of Deeplab CNN | MERL_UMD_Deep_GCRF_COCO | University of Maryland, College Park | Raviteja Vemulapalli (UMD), Oncel Tuzel (MERL), Ming-Yu Liu (MERL), Rama Chellappa (UMD) | We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction. The entire model is trained end-to-end. | 2016-01-15 05:23:48
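A Gaussian CRF has quadratic potentials, so MAP inference reduces to solving a linear system. A minimal sketch, assuming one real-valued score per node, unary potentials (y_i - r_i)^2, and pairwise potentials w_ij (y_i - y_j)^2 with hand-set non-negative weights (in the entries above, both come from deep networks). Setting the gradient to zero gives (1 + sum_j w_ij) y_i - sum_j w_ij y_j = r_i, which the Jacobi iteration below solves. The function name and the choice of Jacobi iteration are illustrative, not the authors' inference procedure.

```python
def gaussian_crf_map(r, weights, iters=100):
    """MAP estimate minimising sum_i (y_i - r_i)^2 + sum_(i,j) w_ij (y_i - y_j)^2.

    r: unary targets, one per node; weights: {(i, j): w_ij >= 0}, each edge once.
    Solves (I + L) y = r, where L is the weighted graph Laplacian, by Jacobi
    iteration (convergent because I + L is strictly diagonally dominant).
    """
    n = len(r)
    nbrs = [[] for _ in range(n)]
    for (i, j), w in weights.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    y = list(r)
    for _ in range(iters):
        y = [(r[i] + sum(w * y[j] for j, w in nbrs[i])) / (1.0 + sum(w for _, w in nbrs[i]))
             for i in range(n)]
    return y


# Two connected nodes: the pairwise term pulls their scores toward each other.
y = gaussian_crf_map([1.0, 0.0], {(0, 1): 1.0})
print([round(v, 6) for v in y])  # [0.666667, 0.333333]
```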
MSCI for Semantic Segmentation | MSCI | Shenzhen University | Di Lin; Yuanfeng Ji | We propose a novel scheme for aggregating features from different scales, which we refer to as Multi-Scale Context Intertwining (MSCI). Please see our paper http://vcc.szu.edu.cn/Di_Lin/papers/MSCI_eccv2018.pdf | 2018-07-08 04:07:31
Box-Supervision | MSRA_BoxSup | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | BoxSup makes use of bounding box annotations to supervise convolutional networks for semantic segmentation. From these boxes, we estimate segmentation masks with the help of region proposals. These masks are used to update the convolutional network, which is in turn fed back to mask estimation. This procedure is iterated. This result is achieved by semi-supervised training on the segmentation masks from PASCAL VOC and a large amount of bounding boxes from Microsoft COCO. See http://arxiv.org/abs/1503.01640 for details. | 2015-02-10 09:35:40
MSRA_BoxSup | MSRA_BoxSup | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | This is an implementation of "BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation". We train a BoxSup model using the union set of VOC 2007 boxes, COCO boxes, and the augmented VOC 2012 training set. See http://arxiv.org/abs/1503.01640 for details. | 2015-05-18 09:42:54
Convolutional Feature Masking | MSRA_CFM | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | The method exploits shape information by "masking" convolutional features. The proposal segments (e.g., superpixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. The proposed method demonstrates competitive accuracy and compelling computational speed. We achieve this result by using segment proposals generated by Multi-scale Combinatorial Grouping (MCG) and initializing network parameters from the VGG 16-layer net. See http://arxiv.org/abs/1412.1283 for details. | 2014-12-17 02:56:52
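Feature masking applies a segment proposal as a binary mask on the convolutional feature maps and then pools what remains. A minimal sketch, assuming the mask has already been projected to the feature-map resolution and using plain average pooling over the segment for brevity (the method's own pooling scheme is not reproduced here); `masked_feature_pool` is a hypothetical name.

```python
def masked_feature_pool(feature_map, mask):
    """Pool each channel over the pixels a segment mask keeps.

    feature_map[c][row][col]: conv features; mask[row][col] in {0, 1}, already
    projected to the feature-map resolution. Returns one value per channel.
    """
    area = sum(sum(row) for row in mask)
    if area == 0:
        raise ValueError("segment mask is empty at this resolution")
    pooled = []
    for channel in feature_map:
        # zero out features outside the segment, then average over the segment
        total = sum(v * m for frow, mrow in zip(channel, mask) for v, m in zip(frow, mrow))
        pooled.append(total / area)
    return pooled


features = [[[1, 2], [3, 4]],        # channel 0
            [[10, 20], [30, 40]]]    # channel 1
segment = [[1, 0],
           [1, 0]]                   # a proposal covering the left column
print(masked_feature_pool(features, segment))  # [2.0, 20.0]
```

Because the convolutional maps are computed once per image, each additional proposal only costs this masking and pooling step, which is where the speed advantage comes from.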
Multi-Scale Residual Network for Segmentation | MSRSegNet-UW | University of Washington | Linda Shapiro, Hannaneh Hajishirzi | Building on prior work, we create a custom network that is both fast and accurate. Our network runs at 21 fps at full resolution and at 60 fps at a resolution of 224 x 224. At low resolution, our network is as accurate as FCN-8s. More details are here: https://arxiv.org/pdf/1711.08040.pdf | 2017-11-23 01:26:37
MasksegNet | MasksegNet | Kyung Hee University | masksegnet | MasksegNet | 2019-05-16 12:20:50
Multi-Task Learning for Human Pose Estimation | Metu_Unified_Net | Middle East Technical University | Salih Karagoz, Muhammed Kocabas, Emre Akbas | Multi-Task Learning for Multi-Person Pose Estimation, Human Semantic Segmentation and Human Detection. The model performs these tasks simultaneously. We trained only on the COCO dataset; no additional data was used. | 2018-03-10 12:39:37
Multipath-RefineNet | Multipath-RefineNet | The University of Adelaide; ACRV | Guosheng Lin; Anton Milan; Chunhua Shen; Ian Reid | Please refer to our technical report for details: "RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation" (https://arxiv.org/abs/1611.06612). Our source code is available at: https://github.com/guosheng/refinenet | 2017-01-17 18:03:57
Unified Object Detection and Semantic Segmentation | NUS_UDS | NUS | Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille | Motivated by the complementary effect observed in the typical failure cases of object detection and semantic segmentation, we propose a unified framework for joint object detection and semantic segmentation [1]. By enforcing consistency between the final detection and segmentation results, our unified framework can effectively leverage the advantages of leading techniques for these two tasks. Furthermore, both local and global context information are integrated into the framework to better distinguish ambiguous samples. By jointly optimizing the model parameters of all components, the relative importance of the different components is automatically learned for each category to guarantee overall performance. [1] Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. ECCV 2014 | 2014-10-29 16:07:10
Joint a network to guided and masking | OBP-HJLCN | National Central University | Jia-Ching Wang, Chien-Yao Wang, Jyun-Hong Li | We propose hierarchical joint guided networks that can predict objects better and in finer detail. We also propose a novel way to guide segmentation using objects and boundaries. | 2016-09-13 15:21:45
Oxford_TVG_CRF_RNN_COCO | Oxford_TVG_CRF_RNN_COCO | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, Berkeley augmented data and a subset of COCO 2014 train data. More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 11:26:57
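CRF-RNN unrolls the mean-field updates of a dense CRF as network layers. A minimal sketch of those updates, assuming a Potts compatibility with a single weight `w`, a precomputed pairwise affinity matrix, and naive O(N^2) message passing (CRF-RNN itself uses fast Gaussian filtering and learned compatibilities); the names are hypothetical.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def mean_field(unary, kernel, w=1.0, iters=5):
    """Mean-field inference for a fully connected CRF with a Potts compatibility.

    unary[i][l]: unary energy (lower = more likely) of label l at node i.
    kernel[i][j]: pairwise affinity between nodes i and j (e.g. a Gaussian).
    Returns approximate marginals q[i][l].
    """
    n, n_labels = len(unary), len(unary[0])
    q = [softmax([-u for u in row]) for row in unary]   # initialise from the unaries
    for _ in range(iters):
        new_q = []
        for i in range(n):
            # message passing: affinity-weighted sum of the other nodes' marginals
            msg = [sum(kernel[i][j] * q[j][l] for j in range(n) if j != i)
                   for l in range(n_labels)]
            total = sum(msg)
            # Potts compatibility: label l is penalised by mass on the other labels
            pairwise = [w * (total - msg[l]) for l in range(n_labels)]
            new_q.append(softmax([-(unary[i][l] + pairwise[l]) for l in range(n_labels)]))
        q = new_q
    return q


# Node 0 is confident in label 0; node 1 is undecided but strongly tied to node 0.
q = mean_field([[0.0, 3.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]], w=2.0)
print(q[1])  # node 1 is pulled toward label 0
```

Each iteration is differentiable, which is what lets the whole pipeline be trained end-to-end with back-propagation.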
Oxford_TVG_CRF_RNN_VOC | Oxford_TVG_CRF_RNN_VOC | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data and Berkeley augmented data (COCO dataset was not used). More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 10:24:43
Higher Order CRF in CNN | Oxford_TVG_HO_CRF | University of Oxford | Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip Torr | We integrate a conditional random field with higher order potentials into a deep neural network. Our higher order potentials are based on object detector outputs and superpixel oversegmentation, and formulated such that their corresponding mean-field updates are differentiable. For further details, please refer to http://arxiv.org/abs/1511.08119 | 2016-03-16 21:12:47
PAN | PAN | BIT, Megvii Inc. | Hanchao Li | Pyramid Attention Network for Semantic Segmentation (without COCO pretraining) | 2018-07-04 13:10:20
POSTECH_DeconvNet_CRF_VOC | POSTECH_DeconvNet_CRF_VOC | POSTECH (Pohang University of Science and Technology) | Hyeonwoo Noh, Seunghoon Hong, Bohyung Han | We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects at multiple scales more naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset without external training data. See http://arxiv.org/abs/1505.04366 for details. | 2015-08-18 18:42:18
POSTECH_EDeconvNet_CRF_VOC | POSTECH_EDeconvNet_CRF_VOC | POSTECH (Pohang University of Science and Technology) | Hyeonwoo Noh, Seunghoon Hong, Bohyung Han | We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects at multiple scales more naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset without external training data. | 2015-04-22 21:33:03
PSPNet | PSPNet | CUHK, SenseTime | Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia | Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective in producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark, and the Cityscapes benchmark. A single PSPNet yields a new record mIoU of 85.4% on PASCAL VOC 2012 and 80.2% on Cityscapes. https://arxiv.org/abs/1612.01105 | 2016-12-06 02:22:13
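The pyramid pooling module pools the feature map into several grid sizes, upsamples each pooled map back to the input size, and stacks the results with the original features. A minimal single-channel sketch, assuming the 1, 2, 3 and 6 bin sizes described in the PSPNet paper, a common adaptive-pooling convention (floor for a bin's start, ceil for its end), nearest-neighbour instead of bilinear upsampling, and no 1x1 channel-reduction convolutions; all function names are hypothetical.

```python
import math


def adaptive_avg_pool(grid, bins):
    """Average-pool a 2-D map into bins x bins cells."""
    h, w = len(grid), len(grid[0])
    out = []
    for by in range(bins):
        r0, r1 = (by * h) // bins, math.ceil((by + 1) * h / bins)
        row = []
        for bx in range(bins):
            c0, c1 = (bx * w) // bins, math.ceil((bx + 1) * w / bins)
            cell = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def upsample_nearest(grid, h, w):
    """Resize a 2-D map to h x w by nearest-neighbour lookup."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[r * gh // h][c * gw // w] for c in range(w)] for r in range(h)]


def pyramid_pooling(grid, levels=(1, 2, 3, 6)):
    """Input map plus one upsampled pooled map per level (stacked as channels)."""
    h, w = len(grid), len(grid[0])
    return [grid] + [upsample_nearest(adaptive_avg_pool(grid, b), h, w) for b in levels]


feat = [[4 * r + c for c in range(4)] for r in range(4)]
print(adaptive_avg_pool(feat, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```

The 1x1 level is a global average that gives every pixel the same image-wide context, while the finer levels keep progressively more sub-region detail.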
Encoder-decoder with FCN | PSP_flow | Northwestern Polytechnical University | Yanhua Zhang | Spatial pyramid structure and feature alignment. | 2021-07-13 14:21:30
Residual Forest classifier with FCN features | RRF-4s | Monash University | Yan Zuo, Tom Drummond | We replace the solver component of FCN with a Random Residual Forest (RRF) classifier and treat FCN as a generic feature extractor to train the RRF classifier. | 2016-11-30 23:31:43
Tensor low-rank Reconstruction | RecoNet152_coco | Tencent | Please contact Wanli Chen, chenwl@mail.sustech.edu.cn | Please contact Wanli Chen, chenwl@mail.sustech.edu.cn | 2019-10-26 04:39:21
Res2Net: Multi-scale Backbone Architecture | Res2Net | Nankai University | Shanghua Gao, Ming-Ming Cheng | Res2Net: A New Multi-scale Backbone Architecture (TPAMI 2020). We propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. Source code: https://github.com/Res2Net | 2020-02-22 05:29:02
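The "hierarchical residual-like connections within one single residual block" split the features into groups and let each group's convolution also see the previous group's output. A minimal sketch of that wiring as we understand it from the Res2Net paper (y_1 = x_1, y_2 = K_2(x_2), y_i = K_i(x_i + y_(i-1)) for i > 2), with plain lists standing in for channel groups and a hypothetical `double` function standing in for each 3x3 convolution.

```python
def res2net_split(x_groups, convs):
    """Hierarchical residual-like connections inside one Res2Net block.

    x_groups: the s channel groups x_1..x_s after the block's first 1x1 conv.
    convs: s - 1 callables standing in for the 3x3 convolutions K_2..K_s.
    y_1 = x_1, y_2 = K_2(x_2), y_i = K_i(x_i + y_(i-1)) for i > 2.
    """
    if len(convs) != len(x_groups) - 1:
        raise ValueError("need one conv per group after the first")
    outputs = [list(x_groups[0])]            # y_1: passed through unchanged
    prev = None
    for x, conv in zip(x_groups[1:], convs):
        inp = x if prev is None else [a + b for a, b in zip(x, prev)]
        prev = conv(inp)                     # each later group sees earlier outputs
        outputs.append(prev)
    return [v for y in outputs for v in y]   # concatenate groups for the final 1x1 conv


def double(v):
    """Hypothetical stand-in for a 3x3 convolution."""
    return [2 * a for a in v]


print(res2net_split([[1], [2], [3], [4]], [double, double, double]))  # [1, 4, 14, 36]
```

Because group i passes through up to i - 1 stacked convolutions, the output groups cover increasingly large receptive fields, which is the multi-scale effect the entry describes.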
ResNet-38 with COCO | ResNet-38_COCO | The University of Adelaide | Zifeng Wu, Chunhua Shen, Anton van den Hengel | Pre-trained with COCO and tested with multiple scales. See our report https://arxiv.org/abs/1611.10080 for details. | 2017-01-22 04:44:14
ResNet-38 Multi-scale | ResNet-38_MS | The University of Adelaide | Zifeng Wu, Chunhua Shen, Anton van den Hengel | Single model; multi-scale testing; NO COCO; NO CRF-based post-processing. For more details, refer to our report https://arxiv.org/abs/1611.10080 and code https://github.com/itijyou/ademxapp. | 2016-12-09 12:19:24
ResNet_DUC_HDC_TuSimple | ResNet_DUC_HDC | UC San Diego, CMU, UIUC, TuSimple | Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell | We improve pixel-wise semantic segmentation by manipulating convolution-related operations: 1) we design dense upsampling convolution (DUC) to generate pixel-level predictions, which captures and decodes more detailed information; 2) we implement hybrid dilated convolution (HDC) to aggregate global information and alleviate what we call the "gridding issue" caused by the standard dilated convolution operation. This submission uses a single model and single-scale testing. Pretrained models: https://goo.gl/DQMeun Paper link: https://arxiv.org/abs/1702.08502 | 2017-03-01 20:22:41
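The "gridding issue" arises when stacked dilated convolutions only ever sample inputs on a sparse lattice, so some nearby pixels never reach an output unit. A minimal 1-D check, assuming odd-sized kernels (3 taps by default) and ignoring weights; in 2-D, the covered set is the product of the per-axis sets. The rate lists are illustrative examples, and the function names are hypothetical.

```python
def covered_offsets(rates, kernel_size=3):
    """1-D input offsets that reach one output unit through stacked dilated convs."""
    half = kernel_size // 2
    offsets = {0}
    for r in rates:
        offsets = {o + k * r for o in offsets for k in range(-half, half + 1)}
    return sorted(offsets)


def has_gridding(rates, kernel_size=3):
    """True if the stacked receptive field contains holes (the 'gridding' issue)."""
    offs = covered_offsets(rates, kernel_size)
    return len(offs) != offs[-1] - offs[0] + 1


print(has_gridding([2, 2, 2]))  # True: only even offsets are ever sampled
print(has_gridding([1, 2, 3]))  # False: every offset from -6 to 6 is covered
print(has_gridding([1, 2, 5]))  # False: every offset from -8 to 8 is covered
```

Varying the rates across consecutive layers, instead of repeating one rate, is what lets HDC enlarge the receptive field without leaving holes.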
ResSegNet | ResSegNet | SCUT-CIVIC | Mengxi Li | - | 2018-05-28 04:39:01
SDS | SDS | UC Berkeley | Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik | We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work. | 2014-07-21 22:46:22
SRC-B-MachineLearningLab | SRC-B-MachineLearningLab | Samsung R&D Institute China - Beijing, Machine Learning Lab | Jianlong Yuan, Shu Wang, Wei Zhao, Hanchao Jia, Zhenbo Luo | The model is pretrained on ImageNet and fine-tuned on COCO, VOC, and SBD. Results are obtained with multi-scale and flip testing. The paper is in preparation. | 2018-04-19 03:08:39
Score Map Pyramid Net | Score Map Pyramid Net | Dalian Maritime University | Shuhao Ma | Our method is fast. | 2018-07-06 13:27:16
SegModel | SegModel | Peking University | Falong Shen, Peking University | Deep fully convolutional networks with a conditional random field. Trained on the MSCOCO trainval set and the Pascal VOC 12 train set. | 2016-08-23 04:04:21
SegNeXt | SegNeXt | Tsinghua University and Nankai University | Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu | SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation (NeurIPS 2022). A simple CNN-based method for semantic segmentation. | 2022-09-19 11:12:10
SegNet | SegNet | University of Cambridge | Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla | SegNet is a memory-efficient, real-time deep convolutional encoder-decoder architecture. For more information, please see our publications and web demo at: http://mi.eng.cam.ac.uk/projects/segnet/ | 2015-11-10 09:48:12
asfcasSepaNetdqwdawasfcaegvsfdvc2019-10-25 16:30:20
SpDConv2 | SpDConv2 | SpDConv2 | SpDConv2 | SpDConv2 | 2021-01-06 03:14:39
Tree-structured Kronecker Convolutional Networks | TKCNet | Institute of Computing Technology, Chinese Academy of Sciences | Tianyi Wu, Sheng Tang, Rui Zhang, Linghui Li, Yongdong Zhang | Most existing semantic segmentation methods employ atrous convolution to enlarge the receptive field of filters but neglect important local contextual information. To tackle this issue, we first propose a novel Kronecker convolution, which adopts the Kronecker product to expand its kernel so as to take into account the feature vectors neglected by atrous convolutions. It can therefore capture local contextual information and enlarge the field of view of filters simultaneously without introducing extra parameters. Second, we propose a Tree-structured Feature Aggregation (TFA) module, which follows a recursive rule to expand and forms a hierarchical structure. Thus, it can naturally learn representations of multi-scale objects and encode hierarchical contextual information in complex scenes. Finally, we design Tree-structured Kronecker Convolutional Networks (TKCN), which employ Kronecker convolution and the TFA module. Extensive experiments on three datasets, PASCAL VOC 2012, PASCAL-Context, and Cityscapes, verify the effectiveness of our proposed approach. | 2018-04-20 13:04:57
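Kronecker convolution expands a small kernel with a Kronecker product instead of inserting plain zeros as atrous convolution does. A minimal sketch following our reading of the TKCN paper: the expanded kernel is K ⊗ F, where F is an r1 x r1 matrix whose top-left r2 x r2 block is ones and whose other entries are zeros. The parameter names `r1` and `r2` are assumptions here. With r2 = 1 this reproduces an atrous (dilated) kernel up to trailing zero rows and columns; with r2 > 1 each weight is shared over an r2 x r2 patch, so no new parameters are added.

```python
def kron(a, b):
    """Kronecker product of two 2-D lists: block (i, j) is a[i][j] * b."""
    return [[a_val * b_val for a_val in a_row for b_val in b_row]
            for a_row in a for b_row in b]


def kronecker_kernel(kernel, r1, r2):
    """Expand kernel as K (x) F, F being r1 x r1 with ones in its top-left r2 x r2 block."""
    if not 1 <= r2 <= r1:
        raise ValueError("need 1 <= r2 <= r1")
    f = [[1 if (p < r2 and q < r2) else 0 for q in range(r1)] for p in range(r1)]
    return kron(kernel, f)


k = [[1, 2],
     [3, 4]]                         # a 2x2 kernel keeps the printout short
for row in kronecker_kernel(k, r1=2, r2=1):
    print(row)                       # atrous-style: weights 2 apart, zeros between
for row in kronecker_kernel(k, r1=2, r2=2):
    print(row)                       # each weight shared over a 2x2 patch
```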
Diverse M-Best with discriminative reranking | TTIC-divmbest-rerank | (1) Toyota Technological Institute at Chicago, (2) Virginia Tech | Payman Yadollahpour (1), Dhruv Batra (1,2), Greg Shakhnarovich (1) | We generate a set of M=10 full image segmentations using the Diverse M-Best algorithm from [BYGS'12], applied to inference in the O2P model (Carreira et al., 2012). Then we discriminatively train a reranker based on a novel set of features. The learning of the reranker uses relative loss, with the objective of minimizing the gap with the oracle (the hindsight-best of the M segmentations), and relies on a slack-rescaling structural SVM. The details are described in [YBS'13]. References: [BYGS'12] Batra, Yadollahpour, Guzman, Shakhnarovich, ECCV 2012. [YBS'13] Yadollahpour, Batra, Shakhnarovich, CVPR 2013. | 2012-11-15 04:03:01
Feedforward segmentation with zoom-out features | TTI_zoomout | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Our method uses a feedforward network to directly label superpixels. For each superpixel we use features extracted from a nested set of "zoom-out" regions, from purely local to image-level. | 2014-11-17 04:57:49
Feedforward segmentation with zoom-out features | TTI_zoomout_16 | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Same as before, except using the VGG 16-layer network instead of the VGG CNN-S network. Fine-tuning on VOC-2012 was not performed. See http://arxiv.org/abs/1412.0774 for details. | 2014-11-24 08:54:05
Feedforward semantic segmentation with zoom-out features | TTI_zoomout_v2 | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Similar to TTI_zoomout_16, except for the way we set the number and scope of zoom-out levels. In this version, zoom-out levels correspond to the receptive field sizes of different layers in a convolutional neural network. Our model is trained only on VOC-2012. Details are provided in our CVPR 2015 paper, available at http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf. | 2015-03-30 18:40:04
Global Deconvolutional Network with CRF | UNIST_GDN_CRF | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over the baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-29 07:23:03
Global Deconvolutional Network with CRF | UNIST_GDN_CRF_ENS | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over the baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-29 07:25:56
Global Deconvolutional Network | UNIST_GDN_FCN | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over the baseline FCN-32s. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-27 01:39:17
Global Deconvolutional Network | UNIST_GDN_FCN_FC | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. In addition, we append a fully connected layer after the down-sampled image to refine the current predictions. Our model shows superior performance over the baseline FCN-32s and even outperforms the more powerful multi-scale variant. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-27 01:49:02
Fully convolutional neural net using VGG19 | VGG19_FCN | - | Sharif Amit Kamran, Md. Asif Bin Khaled, Sabit Bin Kabir, Dr. Hasan Muhammad, Moin Mostakim | We take the VGG-19 classification network and make it fully convolutional. Moreover, we use skip architectures, concatenating upsampled pool1 to pool4 with the score layer to get finer features. Training was done in two stages: first on the Pascal VOC training dataset, then on both the SBD training and validation datasets. | 2017-04-06 23:22:53
CNN segmentation based on manifold learning | Weak_manifold_CNN | University of Central Florida | Marzieh Edraki | CNN manifold learning for segmentation | 2016-11-11 23:34:20
FLATTENET | XC-FLATTENET | Sichuan University, Chengdu, China | Xin Cai | It is well known that the reduced feature resolution due to repeated subsampling operations poses a serious challenge to Fully Convolutional Network (FCN) based models. In contrast to the commonly used strategies, such as dilated convolution and encoder-decoder structures, we introduce a novel Flattening Module to produce high-resolution predictions without either removing any subsampling operations or building a complicated decoder module. https://ieeexplore.ieee.org/document/8932465/metrics#metrics | 2020-01-17 07:46:18
new ConcatASPP | Xception65_ConcatASPP_Decoder | Tianjin University and Nankai University | Xiu Su, Hongyan Xu, Hong Kang | A new ASPP method | 2019-07-26 02:23:38
deeplabv3+ resnet50deeplabv3+ resnet50Northwestern Polytechnical UniversityLiying Gao, Peng Wangdeeplabv3+ resnet502018-12-11 13:36:13
deeplabv3+ resnet50deeplabv3+ resnet50Northwestern Polytechnical UniversityLiying Gao, Peng Wangweakly supervised segmentation, replace FCN by deeplabv3+2018-12-11 13:32:23
deeplabv3+ vgg16deeplabv3+ vgg16Northwestern Polytechnical UniversityLiying Gao, Peng Wangdeeplabv3+ vgg16 63.69 val2018-12-12 08:46:27
deeplabv3+ vgg16deeplabv3+ vgg16Northwestern Polytechnical UniversityLiying Gao, Peng Wangdeeplabv3+ vgg16 63.69 val2018-12-12 07:54:27
dsanet | dsanet | dsanet | dsanet | dsanet | 2019-11-23 03:51:33
dscnn | dscnn | jw | jw | dscnn | 2018-05-25 19:49:13
fdsf | fdsf | fsdf | fsdf | fsdf | 2018-11-22 01:07:09
high resolution network baseline | hrnet_baseline | UCAS | xiaoyang | In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. | 2020-01-26 05:12:51
MFF Network | multi-scale feature fusion network | Shenzhen University | Sijun Dong, Di Lin | We propose a novel network that makes full use of context information for semantic segmentation. | 2018-11-26 13:04:53
fast laddernet | resnet 101 + fast laddernet | Yale University | Juntang Zhuang | resnet 101 + fast laddernet | 2018-10-29 19:53:41
resnet38 | resnet38_deeplab | Tsinghua University | Chen Qian | waiting for submission | 2021-11-06 01:49:46
Semi-supervised seg with weak masks | weak_semi_seg | Xiamen University | Lin Cheng | Semi-supervised segmentation with weak masks. We use 1.4k strong masks and 9k weak masks with class labels. | 2021-07-03 08:34:39
mixupxingchina1231232020-07-10 10:36:10