PASCAL VOC Challenge performance evaluation server

Segmentation Results: VOC2012 ^BETA

Competition "comp6" (train on own data)

This leaderboard shows only those submissions that have been marked as public, and so the displayed rankings should not be considered as definitive.

The highest scoring entry in each column is shown in bold.
Clicking on the blue arrow symbol at the top of a column orders the submissions from high to low by performance on that column.

Average Precision (AP %)

	mean	aero plane	bicycle	bird	boat	bottle	bus	car	cat	chair	cow	dining table	dog	horse	motor bike	person	potted plant	sheep	sofa	train	tv/ monitor	submission date
VPNeXt ^[?]	92.2	98.9	78.5	98.6	92.1	92.3	95.2	96.8	96.1	70.7	98.8	79.9	96.0	98.4	96.9	95.8	89.8	98.2	78.1	96.6	91.3	10-Feb-2025
SegNeXt ^[?]	90.6	98.3	85.0	97.6	88.3	91.3	97.5	91.4	98.3	60.4	96.7	85.0	95.7	98.2	94.2	92.7	82.5	97.3	77.7	93.1	84.3	19-Sep-2022
EfficientNet-L2 + NAS-FPN + Noisy Student ^[?]	90.5	98.0	84.8	89.6	88.2	91.0	98.3	93.0	98.5	57.5	98.4	81.8	98.4	98.0	95.8	93.2	83.2	97.8	75.0	91.8	90.0	15-Jun-2020
DeepLabv3+_JFT ^[?]	89.0	97.5	77.9	96.2	80.4	90.8	98.3	95.5	97.6	58.8	96.1	79.2	95.0	97.3	94.1	93.8	78.5	95.5	74.4	93.8	81.6	09-Feb-2018
RecoNet152_coco ^[?]	89.0	97.3	80.4	96.5	83.8	89.5	97.6	95.4	97.7	50.1	96.8	82.6	95.1	97.7	95.1	92.6	80.2	95.2	71.7	92.1	83.8	26-Oct-2019
DeepLabv3+_AASPP ^[?]	88.5	97.4	80.3	97.1	80.1	89.3	97.4	94.1	96.9	61.9	95.1	77.2	94.2	97.5	94.4	93.0	72.4	93.8	72.6	93.3	83.3	22-May-2018
SRC-B-MachineLearningLab ^[?]	88.5	97.2	78.6	97.1	80.6	89.7	97.4	93.7	96.7	59.1	95.4	81.1	93.2	97.5	94.2	92.9	73.5	93.3	74.2	91.0	85.0	19-Apr-2018
SepaNet ^[?]	88.3	97.2	80.2	96.2	80.0	89.2	97.3	94.7	97.7	48.6	95.0	81.6	95.2	97.5	95.1	92.7	79.5	95.4	68.8	90.9	83.4	25-Oct-2019
EMANet152 ^[?]	88.2	96.8	79.4	96.0	83.6	88.1	97.1	95.0	96.6	49.4	95.4	77.8	94.8	96.8	95.1	92.0	79.3	95.9	68.5	91.7	85.6	15-Aug-2019
KSAC-H ^[?]	88.1	97.2	79.9	96.3	76.5	86.5	97.5	94.5	96.9	54.8	95.3	81.4	93.7	97.2	94.0	92.8	77.3	94.4	73.5	91.1	83.4	26-Oct-2019
SpDConv2 ^[?]	88.1	96.9	79.7	96.8	80.2	87.8	98.0	92.3	96.0	57.2	95.8	82.1	92.3	97.3	93.6	93.0	71.4	92.3	75.8	90.7	83.8	06-Jan-2021
A new feature fusion method: FillIn ^[?]	88.0	97.1	80.8	96.7	77.6	89.2	97.4	92.2	96.9	58.3	94.3	79.4	93.1	97.3	94.4	93.2	73.6	93.0	72.6	89.7	83.4	25-May-2020
MSCI ^[?]	88.0	96.8	76.8	97.0	80.6	89.3	97.4	93.8	97.1	56.7	94.3	78.3	93.5	97.1	94.0	92.8	72.3	92.6	73.6	90.8	85.4	08-Jul-2018
ExFuse ^[?]	87.9	96.8	80.3	97.0	82.5	87.8	96.3	92.6	96.4	53.3	94.3	78.4	94.1	94.9	91.6	92.3	81.7	94.8	70.3	90.1	83.8	22-May-2018
DeepLabv3+ ^[?]	87.8	97.0	77.1	97.1	79.3	89.3	97.4	93.2	96.6	56.9	95.0	79.2	93.1	97.0	94.0	92.8	71.3	92.9	72.4	91.0	84.9	09-Feb-2018
CaCNet ^[?]	87.5	97.1	80.3	96.1	79.7	86.7	97.2	93.8	96.4	45.5	95.0	82.1	92.7	97.0	94.6	91.8	78.2	95.4	65.7	92.3	82.2	29-May-2020
CFNet ^[?]	87.2	96.7	79.7	94.3	78.4	83.0	97.7	91.6	96.7	50.1	95.3	79.6	93.6	97.2	94.2	91.7	78.4	95.4	69.6	90.0	81.4	12-Jun-2019
DeepLabv3-JFT ^[?]	86.9	96.9	73.2	95.5	78.4	86.5	96.8	90.3	97.1	51.4	95.0	73.4	94.0	96.8	94.0	92.3	81.5	95.4	67.2	90.8	81.8	05-Aug-2017
DIS ^[?]	86.8	94.0	73.3	93.5	79.1	84.8	95.4	89.5	93.4	53.6	94.8	79.0	93.6	95.2	91.5	89.6	78.1	93.0	79.4	94.3	81.3	13-Sep-2017
Gluon DeepLabV3 152 ^[?]	86.7	96.5	74.3	96.1	80.2	85.2	97.0	93.8	96.4	49.7	93.6	77.6	95.1	95.3	93.9	89.6	75.8	94.4	70.8	89.7	78.7	03-Oct-2018
CASIA_IVA_SDN ^[?]	86.6	96.9	78.6	96.0	79.6	84.1	97.1	91.9	96.6	48.5	94.3	78.9	93.6	95.5	92.1	91.1	75.0	93.8	64.8	89.0	84.6	29-Jul-2017
APDN ^[?]	86.4	94.5	65.4	94.2	82.7	88.1	95.7	91.7	95.7	45.5	94.3	82.8	93.8	94.8	92.4	91.7	73.7	93.4	72.8	91.9	82.4	28-May-2019
IDW-CNN ^[?]	86.3	94.8	67.3	93.4	74.8	84.6	95.3	89.6	93.6	54.1	94.9	79.0	93.3	95.5	91.7	89.2	77.5	93.7	79.2	94.0	80.8	30-Jun-2017
DFN ^[?]	86.2	96.4	78.6	95.5	79.1	86.4	97.1	91.4	95.0	47.7	92.9	77.2	91.0	96.7	92.2	91.7	76.5	93.1	64.4	88.3	81.2	15-Jan-2018
GluonCV DeepLabV3 ^[?]	86.2	96.3	69.7	93.5	76.2	86.5	96.5	92.2	95.8	47.8	95.0	81.6	93.0	96.0	91.2	90.7	77.1	94.7	68.9	89.3	81.7	07-Sep-2018
EncNet ^[?]	85.9	95.3	76.9	94.2	80.2	85.3	96.5	90.8	96.3	47.9	93.9	80.0	92.4	96.6	90.5	91.5	70.9	93.6	66.5	87.7	80.8	15-Mar-2018
HamNet_w/o_COCO ^[?]	85.9	96.8	74.6	96.5	75.3	79.6	97.4	93.4	97.3	42.5	94.0	76.1	95.3	96.3	91.0	91.0	78.4	93.2	68.7	90.0	80.7	25-Jan-2021
HPN ^[?]	85.8	94.1	67.0	95.2	81.9	88.3	95.5	90.4	95.9	40.0	92.7	82.5	91.7	95.3	92.6	91.6	73.6	94.1	69.4	91.1	81.9	13-Dec-2017
XC-FLATTENET ^[?]	85.7	96.5	79.2	95.5	75.3	84.3	95.9	91.3	93.9	45.1	95.9	79.2	88.8	96.7	91.6	91.1	75.7	94.0	62.8	87.7	82.6	17-Jan-2020
DeepLabv3 ^[?]	85.7	96.4	76.6	92.7	77.8	87.6	96.7	90.2	95.4	47.5	93.4	76.3	91.4	97.2	91.0	92.1	71.3	90.9	68.9	90.8	79.3	20-Jun-2017
Auto-DeepLab-L ^[?]	85.6	96.5	77.3	94.8	74.1	84.0	97.1	88.7	94.5	53.5	91.6	79.2	88.4	94.2	90.2	91.2	75.1	90.1	70.7	89.1	79.7	11-Jan-2019
DP-CAN_decoder ^[?]	85.5	95.9	77.8	91.6	75.0	81.7	96.6	92.4	97.1	42.7	93.5	74.1	93.9	95.0	91.4	91.2	78.1	94.6	66.5	89.8	79.1	26-Jan-2019
PSPNet ^[?]	85.4	95.8	72.7	95.0	78.9	84.4	94.7	92.0	95.7	43.1	91.0	80.3	91.3	96.3	92.3	90.1	71.5	94.4	66.9	88.8	82.0	06-Dec-2016
CTNet ^[?]	85.3	96.1	75.9	96.8	78.0	82.4	95.3	92.3	96.7	42.0	93.8	71.2	93.8	95.0	90.5	90.6	77.9	95.2	62.9	89.5	78.4	29-Oct-2020
Res2Net ^[?]	85.3	96.1	77.6	96.1	77.3	84.5	96.7	92.5	95.0	40.5	91.9	78.3	92.2	93.7	92.7	89.6	77.6	93.7	63.5	87.3	78.6	22-Feb-2020
GluonCV PSP ^[?]	85.1	95.7	70.9	92.8	75.6	85.0	96.5	91.7	95.0	41.8	92.3	78.8	90.4	95.6	93.4	90.6	76.1	93.5	66.7	89.5	78.4	07-Sep-2018
ResNet-38_COCO ^[?]	84.9	96.2	75.2	95.4	74.4	81.7	93.7	89.9	92.5	48.2	92.0	79.9	90.1	95.5	91.8	91.2	73.0	90.5	65.4	88.7	80.6	22-Jan-2017
DP-CAN ^[?]	84.6	96.5	77.7	87.6	73.9	79.9	96.8	92.9	95.7	40.8	92.9	74.0	91.7	95.0	92.5	89.7	77.2	94.6	64.6	90.2	77.1	25-Jan-2019
DCANet ^[?]	84.4	96.0	44.8	95.1	75.1	85.8	97.2	91.0	95.0	47.5	94.5	75.8	93.9	96.0	92.2	89.7	74.5	95.4	66.3	91.1	79.8	13-Jan-2020
Multipath-RefineNet ^[?]	84.2	95.0	73.2	93.5	78.1	84.8	95.6	89.8	94.1	43.7	92.0	77.2	90.8	93.4	88.6	88.1	70.1	92.9	64.3	87.7	78.8	17-Jan-2017
resnet 101 + fast laddernet ^[?]	84.2	95.4	73.9	94.9	75.7	83.2	96.3	91.2	93.9	35.3	90.0	79.4	90.2	94.2	92.8	90.1	73.2	92.3	64.5	88.0	77.5	29-Oct-2018
FDNet_16s ^[?]	84.0	95.4	77.9	95.9	69.1	80.6	96.4	92.6	95.5	40.5	92.6	70.6	93.8	93.1	90.4	89.9	71.2	92.7	63.1	88.5	77.7	22-Mar-2018
PAN ^[?]	84.0	95.7	75.2	94.0	73.7	79.6	96.4	93.7	94.1	40.5	93.3	72.4	89.1	94.1	91.6	89.5	73.6	93.2	62.8	87.3	78.6	04-Jul-2018
Large_Kernel_Matters ^[?]	83.6	95.3	68.7	94.1	72.6	82.4	96.0	89.3	93.0	47.8	89.6	70.8	89.2	93.3	90.1	91.2	72.0	89.8	67.8	88.9	76.9	16-Mar-2017
multi-scale feature fusion network ^[?]	83.6	96.0	76.2	95.4	70.7	82.1	95.0	90.4	92.7	40.2	92.5	75.7	88.6	96.1	91.0	88.4	72.2	92.7	60.7	85.3	76.8	26-Nov-2018
GluonCV FCN ^[?]	83.6	94.8	59.5	94.6	71.5	81.9	95.6	91.2	93.9	42.1	91.3	77.0	91.5	93.2	91.0	90.0	74.0	92.5	68.1	88.6	77.2	07-Sep-2018
LDN-161 ^[?]	83.6	93.4	76.6	92.7	70.9	77.6	96.7	90.2	96.3	47.8	91.2	72.6	92.8	93.0	88.7	88.1	72.6	90.9	63.5	89.4	74.4	18-Apr-2019
DREN ^[?]	83.5	94.7	70.6	94.1	73.6	82.5	95.4	87.7	92.3	44.2	90.2	75.1	89.7	94.5	90.4	88.9	68.3	91.3	67.6	87.9	77.1	29-Mar-2019
Xception65_ConcatASPP_Decoder ^[?]	83.5	94.3	44.9	92.8	77.4	85.5	96.7	91.1	94.6	51.0	91.9	71.8	91.2	95.3	92.8	90.5	69.6	91.7	66.3	88.3	80.7	26-Jul-2019
TKCNet ^[?]	83.2	94.7	46.5	94.9	77.7	83.7	92.6	92.2	94.9	45.3	91.1	72.4	90.7	95.8	91.6	90.3	69.9	93.8	62.1	88.7	82.5	20-Apr-2018
ResNet-38_MS ^[?]	83.1	95.2	72.5	95.1	70.8	78.5	91.7	90.0	92.4	41.9	90.8	73.9	90.6	93.8	90.5	89.5	72.6	89.8	63.2	87.8	79.1	09-Dec-2016
ResNet_DUC_HDC ^[?]	83.1	92.1	64.6	94.7	71.0	81.0	94.6	89.7	94.9	45.6	93.7	74.4	92.0	95.1	90.0	88.7	69.1	90.4	62.7	86.4	78.2	01-Mar-2017
dsanet ^[?]	83.0	93.5	66.0	95.3	77.4	82.4	95.4	91.8	95.4	36.1	92.0	74.2	92.0	93.3	90.3	88.4	73.8	92.3	57.5	87.0	73.5	23-Nov-2019
Deep Layer Cascade (LC) ^[?]	82.7	85.5	66.7	94.5	67.2	84.0	96.1	89.8	93.5	47.2	90.4	71.5	88.9	91.7	89.2	89.1	70.4	89.4	70.7	84.2	79.6	06-Apr-2017
AAF_PSPNet ^[?]	82.2	91.3	72.9	90.7	68.2	77.7	95.5	90.7	94.7	40.9	89.5	72.6	91.6	94.1	88.3	88.8	67.3	92.9	62.6	85.2	74.0	21-Aug-2018
SegModel ^[?]	81.8	93.6	60.2	93.6	69.1	76.4	96.3	88.2	95.5	37.9	90.8	73.3	91.1	94.3	88.6	88.6	64.8	90.1	63.7	87.3	78.2	23-Aug-2016
DeepLab_XI ^[?]	81.6	96.2	45.0	94.9	76.3	82.1	96.1	83.2	95.0	47.9	94.1	51.2	92.7	96.4	89.3	90.9	58.9	92.4	68.2	90.1	76.9	07-May-2019
xing ^[?]	81.5	95.5	42.1	94.4	75.3	77.9	96.0	92.4	94.6	42.4	94.8	59.1	92.3	95.1	88.8	88.9	68.8	94.7	56.5	88.9	77.0	10-Jul-2020
HikSeg_COCO ^[?]	81.4	95.0	64.2	91.5	79.0	78.7	93.4	88.4	94.3	45.8	89.6	65.2	90.6	92.8	88.7	87.5	62.4	88.4	56.4	86.2	75.3	02-Oct-2016
dscnn ^[?]	81.2	94.0	58.5	91.3	69.2	78.2	95.5	89.8	92.9	38.5	90.3	70.2	90.8	93.5	87.0	87.4	63.4	89.5	65.1	88.9	75.8	25-May-2018
MSRSegNet-UW ^[?]	81.0	93.7	64.1	92.5	68.9	79.7	91.2	86.4	90.4	41.9	88.3	72.6	89.3	90.2	86.0	86.6	67.2	89.5	66.5	83.7	76.6	23-Nov-2017
DP_ResNet_CRF ^[?]	81.0	94.0	59.5	91.8	68.1	75.9	95.2	88.9	93.2	37.7	90.8	70.8	89.2	92.7	87.7	87.9	65.5	90.3	62.6	87.2	75.5	10-Nov-2016
Feature_Pyramids ^[?]	81.0	93.9	60.2	86.8	70.7	75.3	92.9	91.3	92.0	42.7	90.0	71.3	88.7	92.9	88.8	89.3	60.7	88.3	65.7	87.7	76.2	06-Jun-2018
MasksegNet ^[?]	81.0	95.3	43.9	93.4	72.9	80.5	91.1	86.1	91.9	44.2	87.7	65.8	90.9	93.2	92.4	90.2	72.0	92.0	60.6	86.3	74.4	16-May-2019
OBP-HJLCN ^[?]	80.4	92.7	54.8	91.6	68.0	76.9	95.7	89.3	92.6	35.2	89.0	69.3	89.4	92.7	87.9	87.5	66.8	88.5	62.2	86.1	76.2	13-Sep-2016
ResSegNet ^[?]	80.4	93.6	65.2	92.4	67.0	74.9	93.9	88.5	92.8	37.4	88.8	72.7	89.1	91.9	88.7	86.6	68.6	85.9	59.1	82.0	73.3	28-May-2018
CentraleSupelec Deep G-CRF ^[?]	80.2	92.9	61.2	91.0	66.3	77.7	95.3	88.9	92.4	33.8	88.4	69.1	89.8	92.9	87.7	87.5	62.6	89.9	59.2	87.1	74.2	12-Aug-2016
CMT-FCN-ResNet-CRF ^[?]	80.0	92.5	55.3	92.2	66.0	76.9	95.1	88.6	93.9	35.1	87.6	71.6	89.3	92.8	87.9	88.0	62.0	88.0	59.7	86.1	75.7	02-Aug-2016
DeepLabv2-CRF ^[?]	79.7	92.6	60.4	91.6	63.4	76.3	95.0	88.4	92.6	32.7	88.5	67.6	89.6	92.1	87.0	87.4	63.3	88.3	60.0	86.8	74.5	06-Jun-2016
PSP_flow ^[?]	79.4	86.2	44.2	93.4	72.1	75.8	93.7	91.2	95.0	38.6	86.7	63.9	89.0	89.4	90.4	88.4	64.4	91.8	60.9	82.6	73.8	13-Jul-2021
LRR_4x_ResNet_COCO ^[?]	79.3	92.4	45.1	94.6	65.2	75.8	95.1	89.1	92.3	39.0	85.7	70.4	88.6	89.4	88.6	86.6	65.8	86.2	57.4	85.7	77.3	18-Jul-2016
CASIA_SegResNet_CRF_COCO ^[?]	79.3	93.8	42.2	93.1	68.6	75.3	95.3	88.8	92.5	36.5	84.3	64.2	86.8	87.8	87.5	88.5	69.2	89.7	64.1	86.8	74.6	03-Jun-2016
hrnet_baseline ^[?]	79.3	93.8	43.5	84.8	63.9	82.4	92.8	91.0	93.8	45.6	88.0	61.4	90.0	90.2	88.0	88.1	66.8	91.1	53.3	87.1	74.4	26-Jan-2020
Adelaide_VeryDeep_FCN_VOC ^[?]	79.1	91.9	48.1	93.4	69.3	75.5	94.2	87.5	92.8	36.7	86.9	65.2	89.1	90.2	86.5	87.2	64.6	90.1	59.7	85.5	72.7	13-May-2016
EfficientNet_MSCID_Segmentation ^[?]	78.9	92.1	42.1	91.6	73.8	80.7	93.8	88.1	91.6	38.7	84.3	68.5	90.3	88.7	86.3	84.8	64.7	87.3	58.6	85.3	71.4	15-Aug-2019
BlitzNet512 ^[?]	78.8	92.4	42.7	78.8	67.5	77.0	95.2	88.5	90.1	39.1	85.5	73.2	85.5	89.6	88.5	87.3	67.8	85.9	62.9	88.8	74.5	19-Jul-2017
LRR_4x_COCO ^[?]	78.7	93.2	44.2	89.4	65.4	74.9	93.9	87.0	92.0	42.9	83.7	68.9	86.5	88.0	89.0	87.2	67.3	85.6	64.0	84.1	71.5	16-Jun-2016
weak_semi_seg ^[?]	78.6	92.2	62.0	90.0	64.8	77.1	93.3	84.8	91.4	31.4	89.1	73.3	88.0	87.7	86.1	84.5	65.4	85.4	56.9	85.1	67.8	03-Jul-2021
Ladder_DenseNet ^[?]	78.3	90.3	68.7	89.0	60.8	71.9	91.0	85.5	91.7	34.7	81.9	68.2	86.7	86.6	87.1	85.9	66.5	89.2	59.8	78.6	74.2	25-Jul-2017
CASIA_IVA_OASeg ^[?]	78.3	93.8	41.9	89.4	67.5	71.5	94.6	85.3	89.5	38.1	88.4	64.8	87.0	90.5	84.9	83.3	67.5	86.9	68.1	83.4	74.0	21-May-2016
Oxford_TVG_HO_CRF ^[?]	77.9	92.5	59.1	90.3	70.6	74.4	92.4	84.1	88.3	36.8	85.6	67.1	85.1	86.9	88.2	82.6	62.6	85.0	56.3	81.9	72.5	16-Mar-2016
Adelaide_Context_CNN_CRF_COCO ^[?]	77.8	92.9	39.6	84.0	67.9	75.3	92.7	83.8	90.1	44.3	85.5	64.9	87.3	88.8	84.5	85.5	68.1	89.0	62.8	81.2	71.4	06-Nov-2015
CUHK_DPN_COCO ^[?]	77.5	89.0	61.6	87.7	66.8	74.7	91.2	84.3	87.6	36.5	86.3	66.1	84.4	87.8	85.6	85.4	63.6	87.3	61.3	79.4	66.4	22-Sep-2015
Adelaide_Context_CNN_CRF_COCO ^[?]	77.2	92.3	38.8	82.9	66.1	75.1	92.4	83.1	88.6	41.8	85.9	62.8	86.7	88.4	84.0	85.4	67.4	88.8	61.9	81.9	71.7	13-Aug-2015
WeakTr_CRF_SAM_M2F_SwinL ^[?]	76.4	85.0	41.9	88.2	67.0	70.0	85.6	77.5	95.2	34.9	93.7	59.7	93.8	93.9	82.0	80.8	70.1	89.5	66.3	77.1	59.3	08-Jun-2025
DeepLab-CRF-Attention-DT ^[?]	76.3	93.2	41.7	88.0	61.7	74.9	92.9	84.5	90.4	33.0	82.8	63.2	84.5	85.0	87.2	85.7	60.5	87.7	57.8	84.3	68.2	03-Feb-2016
CentraleSuperBoundaries++ ^[?]	76.0	91.1	38.5	90.9	68.7	74.2	89.9	85.3	89.1	34.4	82.5	65.6	83.1	82.9	85.7	85.4	60.6	84.5	59.9	80.2	69.9	13-Jan-2016
LRR_4x_de_pyramid_VOC ^[?]	75.9	91.8	41.0	83.0	62.3	74.3	93.0	86.8	88.7	36.6	81.8	63.4	84.7	85.9	85.1	83.1	62.0	84.6	55.6	84.9	70.0	07-Jun-2016
DeepLab-CRF-Attention ^[?]	75.7	91.1	40.9	86.9	62.1	74.2	92.3	84.4	90.1	34.0	81.7	66.0	83.5	83.9	86.5	84.6	59.1	87.2	59.6	81.0	66.2	03-Feb-2016
Curtin_Qilin ^[?]	75.6	85.4	38.5	86.5	63.8	74.8	91.3	86.8	88.3	33.5	84.1	62.4	83.6	87.7	84.9	83.5	61.4	88.5	58.0	80.8	69.0	09-Mar-2018
BlitzNet ^[?]	75.6	90.1	38.7	87.5	68.6	70.1	93.1	86.4	89.2	32.3	81.7	67.9	82.2	82.9	84.7	81.5	63.3	85.5	55.5	83.1	70.6	17-Mar-2017
BlitzNet300 ^[?]	75.5	91.5	40.4	82.6	64.5	71.7	93.3	85.2	84.9	41.8	79.1	70.6	79.3	82.7	86.6	84.2	55.3	81.0	60.1	85.6	71.6	19-Jul-2017
Adelaide_Context_CNN_CRF_VOC ^[?]	75.3	90.6	37.6	80.0	67.8	74.4	92.0	85.2	86.2	39.1	81.2	58.9	83.8	83.9	84.3	84.8	62.1	83.2	58.2	80.8	72.3	30-Aug-2015
MSRA_BoxSup ^[?]	75.2	89.8	38.0	89.2	68.9	68.0	89.6	83.0	87.7	34.4	83.6	67.1	81.5	83.7	85.2	83.5	58.6	84.9	55.8	81.2	70.7	18-May-2015
FSSI300 ^[?]	75.1	91.1	42.6	89.1	66.4	69.2	92.5	88.5	86.8	33.2	79.2	63.2	82.4	81.4	86.9	82.1	58.1	83.2	53.0	83.1	71.5	21-Jun-2018
POSTECH_DeconvNet_CRF_VOC ^[?]	74.8	90.0	40.8	84.2	67.3	70.7	90.9	84.8	87.4	34.8	83.0	58.7	82.3	87.1	86.9	82.4	64.5	84.6	54.9	77.5	64.1	18-Aug-2015
MERL_UMD_Deep_GCRF_COCO ^[?]	74.8	89.9	42.6	90.0	65.0	69.2	89.9	83.9	88.2	31.3	81.8	66.4	82.9	81.1	85.7	83.4	58.4	88.4	56.7	77.7	64.3	15-Jan-2016
Oxford_TVG_CRF_RNN_COCO ^[?]	74.7	90.4	55.3	88.7	68.4	69.8	88.3	82.4	85.1	32.6	78.5	64.4	79.6	81.9	86.4	81.8	58.6	82.4	53.5	77.4	70.1	22-Apr-2015
UNIST_GDN_CRF_ENS ^[?]	74.0	88.6	48.6	88.8	64.7	70.4	87.2	81.8	86.4	32.0	77.1	64.1	80.5	78.0	84.0	83.3	59.2	85.9	56.8	77.9	65.0	29-Jul-2016
fdsf ^[?]	73.9	90.1	39.9	85.7	60.8	70.6	87.4	86.6	89.6	32.2	77.6	58.0	85.8	84.8	82.9	82.8	58.5	87.3	47.6	84.0	66.8	22-Nov-2018
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint ^[?]	73.9	89.2	46.7	88.5	63.5	68.4	87.0	81.2	86.3	32.6	80.7	62.4	81.0	81.3	84.3	82.1	56.2	84.6	58.3	76.2	67.2	26-Apr-2015
BlitzNet ^[?]	73.9	91.4	40.4	76.4	62.6	74.8	91.1	86.2	85.2	35.6	83.1	59.0	77.9	84.6	84.1	80.6	57.2	86.5	56.1	78.8	67.4	17-Mar-2017
UNIST_GDN_CRF ^[?]	73.2	87.9	37.8	88.8	64.5	70.7	87.7	81.3	87.1	32.5	76.7	66.6	80.3	76.6	82.2	82.3	57.9	84.5	55.9	78.5	64.2	29-Jul-2016
MERL_DEEP_GCRF ^[?]	73.2	85.2	43.9	83.3	65.2	68.3	89.0	82.7	85.3	31.1	79.5	63.3	80.5	79.3	85.5	81.0	60.5	85.5	52.0	77.3	65.1	17-Oct-2015
Bayesian Dilation Network ^[?]	73.1	88.6	39.0	86.2	63.3	67.1	88.1	81.9	86.8	34.7	81.1	57.1	81.3	86.5	83.4	83.4	53.7	84.0	53.3	80.5	62.5	07-Jun-2016
DeepLab-CRF-COCO-LargeFOV ^[?]	72.7	89.1	38.3	88.1	63.3	69.7	87.1	83.1	85.0	29.3	76.5	56.5	79.8	77.9	85.8	82.4	57.4	84.3	54.9	80.5	64.1	18-Mar-2015
POSTECH_EDeconvNet_CRF_VOC ^[?]	72.5	89.9	39.3	79.7	63.9	68.2	87.4	81.2	86.1	28.5	77.0	62.0	79.0	80.3	83.6	80.2	58.8	83.4	54.3	80.7	65.0	22-Apr-2015
Dual-Multi-Reso-MR ^[?]	72.4	87.6	40.3	80.6	62.9	71.3	88.1	84.4	84.7	29.6	77.8	58.5	80.0	81.0	85.4	82.1	55.0	83.8	48.2	80.3	65.3	03-Nov-2016
CCBM ^[?]	72.3	87.8	46.7	79.0	63.6	70.5	83.7	75.5	86.9	31.0	81.9	61.3	81.5	85.9	81.1	76.5	58.7	77.7	50.4	76.6	69.8	29-Nov-2015
Oxford_TVG_CRF_RNN_VOC ^[?]	72.0	87.5	39.0	79.7	64.2	68.3	87.6	80.8	84.4	30.4	78.2	60.4	80.5	77.8	83.1	80.6	59.5	82.8	47.8	78.3	67.1	22-Apr-2015
AGV BANA RES NAL ^[?]	71.7	81.6	36.6	86.2	58.7	76.8	78.6	82.0	87.3	34.4	79.3	63.8	82.6	79.7	78.5	79.8	56.5	84.5	55.3	70.7	60.3	31-Jan-2022
DeepLab-MSc-CRF-LargeFOV ^[?]	71.6	84.4	54.5	81.5	63.6	65.9	85.1	79.1	83.4	30.7	74.1	59.8	79.0	76.1	83.2	80.8	59.7	82.2	50.4	73.1	63.7	02-Apr-2015
resnet38_deeplab ^[?]	71.4	89.1	37.3	84.6	56.4	68.2	90.8	83.7	89.0	28.4	84.7	47.0	84.7	87.1	80.2	77.1	49.3	87.0	49.8	75.6	56.2	06-Nov-2021
DFPnet ^[?]	71.0	88.4	37.6	83.3	52.7	75.8	89.1	85.8	89.3	31.6	65.9	33.7	83.5	75.3	82.3	82.8	60.5	75.9	52.6	80.5	70.5	26-Aug-2018
MSRA_BoxSup ^[?]	71.0	86.4	35.5	79.7	65.2	65.2	84.3	78.5	83.7	30.5	76.2	62.6	79.3	76.1	82.1	81.3	57.0	78.2	55.0	72.5	68.1	10-Feb-2015
FCN16s-Resnet101 ^[?]	71.0	83.9	49.3	79.1	56.6	70.4	87.5	82.7	84.9	27.0	74.1	53.6	79.9	76.7	81.9	81.7	55.3	76.9	50.8	79.0	66.6	26-Jan-2019
FCN_CLC_MSP ^[?]	70.8	86.2	40.1	83.9	57.8	64.7	87.9	81.3	85.9	28.3	80.0	61.9	80.7	82.5	79.7	80.2	54.7	81.3	39.3	78.9	59.2	01-Jul-2016
DeepLab-CRF-COCO-Strong ^[?]	70.4	85.3	36.2	84.8	61.2	67.5	84.6	81.4	81.0	30.8	73.8	53.8	77.5	76.5	82.3	81.6	56.3	78.9	52.3	76.6	63.3	11-Feb-2015
DeepLab-CRF-LargeFOV ^[?]	70.3	83.5	36.6	82.5	62.3	66.5	85.4	78.5	83.7	30.4	72.9	60.4	78.5	75.5	82.1	79.7	58.2	82.0	48.8	73.7	63.3	28-Mar-2015
DeepSqueeNet_CRF ^[?]	70.1	85.7	37.4	83.4	59.7	67.8	85.2	79.8	81.4	27.9	72.3	60.4	76.5	78.2	82.7	78.8	57.3	78.6	49.0	77.6	61.0	21-Jul-2016
TTI_zoomout_v2 ^[?]	69.6	85.6	37.3	83.2	62.5	66.0	85.1	80.7	84.9	27.2	73.2	57.5	78.1	79.2	81.1	77.1	53.6	74.0	49.2	71.7	63.3	30-Mar-2015
RRF-4s ^[?]	69.4	79.5	57.3	78.7	61.8	64.1	83.9	78.1	80.4	30.0	73.0	59.4	74.3	73.9	80.8	77.9	53.9	76.4	46.1	71.7	63.9	30-Nov-2016
Score Map Pyramid Net ^[?]	69.3	80.9	38.5	79.0	58.5	68.6	83.2	80.0	85.7	31.0	66.1	56.2	76.2	71.0	81.1	81.6	54.9	74.6	49.4	75.9	68.9	06-Jul-2018
FCN-2s_Dilated_VGG19 ^[?]	69.0	81.8	37.0	79.5	57.2	67.5	83.8	79.3	83.0	28.5	74.5	57.5	76.0	75.9	79.5	78.6	57.0	77.8	45.3	73.7	63.2	11-Jul-2017
VGG19_FCN ^[?]	68.1	81.7	35.9	79.8	57.5	66.9	84.1	79.6	80.8	28.2	72.1	53.3	74.0	72.1	78.5	78.2	55.5	76.7	43.4	73.8	65.1	06-Apr-2017
ESPNetv2 ^[?]	68.0	87.5	36.9	75.9	64.0	63.8	87.2	73.7	76.5	26.7	70.3	57.5	68.9	70.6	82.9	78.9	48.1	76.4	46.9	77.7	64.1	23-Mar-2019
FCN-2s_Dilated_VGG16 ^[?]	67.6	81.1	35.7	78.0	58.5	63.9	82.8	79.7	81.4	27.8	71.2	53.6	75.1	74.8	79.2	77.8	55.3	74.5	45.5	72.7	60.0	20-Jul-2017
FCN-8s-heavy ^[?]	67.2	82.4	36.1	75.6	61.5	65.4	83.4	77.2	80.1	27.9	66.8	51.5	73.6	71.9	78.9	77.1	55.3	73.4	44.3	74.0	63.2	06-Feb-2016
DeepLab-CRF-MSc ^[?]	67.1	80.4	36.8	77.4	55.2	66.4	81.5	77.5	78.9	27.1	68.2	52.7	74.3	69.6	79.4	79.0	56.9	78.8	45.2	72.7	59.3	30-Dec-2014
DeepLab-CRF ^[?]	66.4	78.4	33.1	78.2	55.6	65.3	81.3	75.5	78.6	25.3	69.2	52.7	75.2	69.0	79.1	77.6	54.7	78.3	45.1	73.3	56.2	23-Dec-2014
DeepSqueeNet ^[?]	65.7	76.1	34.3	76.4	56.0	62.0	82.7	75.4	78.3	25.6	64.3	58.8	73.3	69.3	79.3	76.7	53.2	72.1	46.2	69.3	59.1	20-Jul-2016
AGV BANA VGG NAL attempt 5 ^[?]	65.6	77.1	31.6	72.1	54.8	63.8	82.8	76.0	82.0	26.6	65.0	58.5	75.5	64.2	75.8	70.9	54.0	76.4	44.6	74.9	60.1	30-Jan-2022
Bayesian FCN ^[?]	65.4	80.8	34.9	75.2	57.0	64.1	80.9	77.2	78.0	26.4	65.6	44.0	72.6	70.8	78.7	76.8	52.4	71.0	40.4	73.8	61.8	07-Jun-2016
Weak_manifold_CNN ^[?]	65.3	80.9	32.9	73.2	57.7	63.0	83.9	73.5	76.6	27.0	65.9	52.6	70.9	69.8	73.0	74.9	53.3	70.1	45.4	72.4	62.7	11-Nov-2016
deeplabv3+ resnet50 ^[?]	65.2	77.9	33.4	86.1	19.6	63.8	84.1	74.9	90.1	27.9	81.2	48.3	85.5	85.8	81.8	69.6	47.8	84.5	44.7	41.2	53.9	11-Dec-2018
CRF_RNN ^[?]	65.2	80.9	34.0	72.9	52.6	62.5	79.8	76.3	79.9	23.6	67.7	51.8	74.8	69.9	76.9	76.9	49.0	74.7	42.7	72.1	59.6	10-Feb-2015
deeplabv3+ resnet50 ^[?]	64.6	78.7	32.9	79.7	19.5	67.8	88.0	75.5	89.6	24.7	80.6	46.1	85.1	83.8	83.1	65.5	48.1	83.7	44.0	41.3	52.8	11-Dec-2018
UNIST_GDN_FCN_FC ^[?]	64.4	75.6	31.5	69.2	51.6	62.9	78.8	76.7	78.7	24.6	61.7	60.3	74.5	62.6	76.1	74.3	51.5	70.6	47.3	74.0	58.4	27-Jul-2016
TTI_zoomout_16 ^[?]	64.4	81.9	35.1	78.2	57.4	56.5	80.5	74.0	79.8	22.4	69.6	53.7	74.0	76.0	76.6	68.8	44.3	70.2	40.2	68.9	55.3	24-Nov-2014
deeplabv3+ vgg16 ^[?]	64.3	85.0	32.1	83.5	19.4	63.8	88.7	73.7	88.5	24.4	76.9	49.5	82.3	79.8	82.2	66.0	56.3	81.4	44.6	46.6	39.8	12-Dec-2018
deeplabv3+ vgg16 ^[?]	63.9	84.6	31.2	78.8	19.0	64.1	87.9	74.3	87.7	24.7	77.5	49.6	83.3	81.8	82.4	66.2	54.1	80.1	44.6	44.0	39.7	12-Dec-2018
Hypercolumn ^[?]	62.6	68.7	33.5	69.8	51.3	70.2	81.1	71.9	74.9	23.9	60.6	46.9	72.1	68.3	74.5	72.9	52.6	64.4	45.4	64.9	57.4	09-Apr-2015
UNIST_GDN_FCN ^[?]	62.2	74.5	31.9	66.7	49.7	60.5	76.9	75.9	76.0	22.9	57.6	54.5	73.0	59.4	75.0	73.7	51.0	67.5	43.3	70.0	56.4	27-Jul-2016
FCN-8s ^[?]	62.2	76.8	34.2	68.9	49.4	60.3	75.3	74.7	77.6	21.4	62.5	46.8	71.8	63.9	76.5	73.9	45.2	72.4	37.4	70.9	55.1	12-Nov-2014
MSRA_CFM ^[?]	61.8	75.7	26.7	69.5	48.8	65.6	81.0	69.2	73.3	30.0	68.7	51.5	69.1	68.1	71.7	67.5	50.4	66.5	44.4	58.9	53.5	17-Dec-2014
SegNet ^[?]	59.9	73.6	37.6	62.0	46.8	58.6	79.1	70.1	65.4	23.6	60.4	45.6	61.8	63.5	75.3	74.9	42.6	63.7	42.5	67.8	52.7	10-Nov-2015
TTI_zoomout ^[?]	58.4	70.3	31.9	68.3	46.4	52.1	75.3	68.4	75.3	19.2	58.4	49.9	69.6	63.0	70.1	67.6	41.5	64.0	34.9	64.2	47.3	17-Nov-2014
SDS ^[?]	51.6	63.3	25.7	63.0	39.8	59.2	70.9	61.4	54.9	16.8	45.0	48.2	50.5	51.0	57.7	63.3	31.8	58.7	31.2	55.7	48.5	21-Jul-2014
NUS_UDS ^[?]	50.0	67.0	24.5	47.2	45.0	47.9	65.3	60.6	58.5	15.5	50.8	37.4	45.8	59.9	62.0	52.7	40.8	48.2	36.8	53.1	45.6	29-Oct-2014
TTIC-divmbest-rerank ^[?]	48.1	62.7	25.6	46.9	43.0	54.8	58.4	58.6	55.6	14.6	47.5	31.2	44.7	51.0	60.9	53.5	36.6	50.9	30.1	50.2	46.8	15-Nov-2012
BONN_O2PCPMC_FGT_SEGM ^[?]	47.8	64.0	27.3	54.1	39.2	48.7	56.6	57.7	52.5	14.2	54.8	29.6	42.2	58.0	54.8	50.2	36.6	58.6	31.6	48.4	38.6	08-Aug-2013
BONN_O2PCPMC_FGT_SEGM ^[?]	47.5	63.4	27.3	56.1	37.7	47.2	57.9	59.3	55.0	11.5	50.8	30.5	45.0	58.4	57.4	48.6	34.6	53.3	32.4	47.6	39.2	23-Sep-2012
BONNGC_O2P_CPMC_CSI ^[?]	46.8	63.6	26.8	45.6	41.7	47.1	54.3	58.6	55.1	14.5	49.0	30.9	46.1	52.6	58.2	53.4	32.0	44.5	34.6	45.3	43.1	23-Sep-2012
BONN_CMBR_O2P_CPMC_LIN ^[?]	46.7	63.9	23.8	44.6	40.3	45.5	59.6	58.7	57.1	11.7	45.9	34.9	43.0	54.9	58.0	51.5	34.6	44.1	29.9	50.5	44.5	23-Sep-2012
FER_WSSS_REGION_SCORE_POOL ^[?]	38.0	33.1	21.7	27.7	17.7	38.4	55.8	38.3	57.9	13.6	37.4	29.2	43.9	39.1	52.4	44.4	30.2	48.7	26.4	31.8	36.3	14-Jun-2016
Metu_Unified_Net ^[?]	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	87.8	-	-	-	-	-	10-Mar-2018
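
The ordering and bolding rules stated at the top of this page can be reproduced offline on a tab-separated copy of the table. The sketch below is illustrative only and is not the evaluation server's code: the function names (`parse_score`, `load_rows`, `sort_by`, `best_per_column`) are hypothetical, the sample is three rows from the table cut down to the first three score columns, and treating "-" as a missing score that sorts last is an assumption.

```python
import csv
import io

# Three rows copied from the table above, cut down to the first three
# score columns (the "^[?]" link markers are omitted).
SAMPLE = (
    "\tmean\taero plane\tbicycle\n"
    "VPNeXt\t92.2\t98.9\t78.5\n"
    "SegNeXt\t90.6\t98.3\t85.0\n"
    "Metu_Unified_Net\t-\t-\t-\n"
)


def parse_score(cell):
    """Return the score as a float, or None for the '-' placeholder."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)


def load_rows(text):
    """Parse a tab-separated table into (column names, [(method, scores)])."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    columns = header[1:]  # the first header cell is empty (method name column)
    rows = []
    for record in reader:
        if not record:
            continue
        scores = {col: parse_score(cell) for col, cell in zip(columns, record[1:])}
        rows.append((record[0], scores))
    return columns, rows


def sort_by(rows, column):
    """Order rows from high to low on one column; missing scores go last."""
    return sorted(
        rows,
        key=lambda row: (row[1][column] is None, -(row[1][column] or 0.0)),
    )


def best_per_column(columns, rows):
    """Highest score in each column, i.e. the value that would be shown in bold."""
    best = {}
    for col in columns:
        scores = [s[col] for _, s in rows if s[col] is not None]
        if scores:
            best[col] = max(scores)
    return best


columns, rows = load_rows(SAMPLE)
for name, scores in sort_by(rows, "bicycle"):
    print(name, scores["bicycle"])
print(best_per_column(columns, rows))
```

Sorting on a tuple key keeps rows without a score at the bottom regardless of the column chosen, which avoids comparing `None` with floats.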

Abbreviations

Title	Method	Affiliation	Contributors	Description	Date
DeepLabv3+ with Fillin fusion	A new feature fusion method: FillIn	Beijing University of Technology	Tian Liu, Lichun Wang, Shaofan Wang	https://arxiv.org/abs/1912.08059 The new version of our paper has not been updated yet. The feature fusion is a privileged operation: it is used only in training.	2020-05-25 18:35:34
Adaptive Affinity Fields for Semantic Segmentation	AAF_PSPNet	UC Berkeley / ICSI	Tsung-Wei Ke, Jyh-Jing Hwang, Ziwei Liu, Stella X. Yu (* equal contribution)	Existing semantic segmentation methods mostly rely on per-pixel supervision, unable to capture structural regularity present in natural images. Instead of learning to enforce semantic labels on individual pixels, we propose to enforce affinity field patterns in individual pixel neighbourhoods, i.e., the semantic label patterns of whether neighbouring pixels are in the same segment should match between the prediction and the ground-truth. The affinity fields characterize geometric relationships within the image, such as "motorcycles have round wheels". We further develop a novel method for learning the optimal neighbourhood size for each semantic category, with an adversarial loss that optimizes over worst-case scenarios. Unlike the common Conditional Random Field (CRF) approaches, our adaptive affinity field (AAF) method has no extra parameters during inference, and is less sensitive to appearance changes in the image.	2018-08-21 16:28:38
AGV BANA RES NAL	AGV BANA RES NAL	AGV BANA RES NAL	AGV BANA RES NAL	AGV BANA RES NAL	2022-01-31 04:20:30
AGV BANA VGG NAL attempt 5	AGV BANA VGG NAL attempt 5	AGV BANA VGG NAL attempt 5	AGV BANA VGG NAL attempt 5	AGV BANA VGG NAL attempt 5	2022-01-30 16:24:19
Adaptive Progressive Decision Network	APDN	UESTC	Hengcan Shi, Hongliang Li, Qingbo Wu	Adaptive Progressive Decision Network	2019-05-28 08:03:53
Adelaide_Context_CNN_CRF_COCO	Adelaide_Context_CNN_CRF_COCO	The University of Adelaide; ACRV; D2DCRC	Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel;	Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at: http://arxiv.org/abs/1504.01013). This technical report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations; and during the course of learning, dependencies between the output variables are taken into account. As in conventional DCNNs, the training of our model is performed in an end-to-end fashion using back-propagation. Different from directly maximizing likelihood, however, inference may be needed at each gradient descent iteration, which can be computationally very expensive since typically millions of iterations are required. To enable efficient training, we propose to use approximate training, namely, piecewise training of CRFs, avoiding repeated inference.	2015-08-13 04:13:59
Adelaide_Context_CNN_CRF_COCO	Adelaide_Context_CNN_CRF_COCO	The University of Adelaide; ACRV; D2DCRC	Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel;	Please refer to our technical report: http://arxiv.org/abs/1504.01013. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both CNNs and CRFs.	2015-11-06 07:46:13
Adelaide_Context_CNN_CRF_VOC	Adelaide_Context_CNN_CRF_VOC	The University of Adelaide; ACRV; D2DCRC	Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel;	Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at: http://arxiv.org/abs/1504.01013). This technical report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations; and during the course of learning, dependencies between the output variables are taken into account. As in conventional DCNNs, the training of our model is performed in an end-to-end fashion using back-propagation. Different from directly maximizing likelihood, however, inference may be needed at each gradient descent iteration, which can be computationally very expensive since typically millions of iterations are required. To enable efficient training, we propose to use approximate training, namely, piecewise training of CRFs, avoiding repeated inference.	2015-08-30 11:49:27
High-performance Very Deep FCN	Adelaide_VeryDeep_FCN_VOC	The University of Adelaide; D2DCRC	Zifeng Wu, Chunhua Shen, Anton van den Hengel	We propose a method for high-performance semantic image segmentation based on very deep fully convolutional networks. A few design factors are carefully examined to achieve the result. Details can be found in the paper "High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks", Zifeng Wu, Chunhua Shen, Anton van den Hengel: http://arxiv.org/abs/1604.04339. Note that the system used for this submission was trained on the augmented VOC 2012 data ONLY.	2016-05-13 04:57:00
Auto-DeepLab-L	Auto-DeepLab-L	Johns Hopkins University; Google Inc.; Stanford University	Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei	In this work, we study Neural Architecture Search for semantic image segmentation, an important computer vision task that assigns a semantic label to every pixel in an image. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction which exhibits a lot more network level architectural variations. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Without any ImageNet pretraining, our architecture searched specifically for semantic image segmentation attains state-of-the-art performance. Please refer to https://arxiv.org/abs/1901.02985 for details.	2019-01-11 19:43:31
O2P Regressor + Composite Statistical Inference	BONNGC_O2P_CPMC_CSI	(1) University of Bonn, (2) Georgia Institute of Technology, (3) University of Coimbra	Joao Carreira (1,3) Fuxin Li (2) Guy Lebanon (2) Cristian Sminchisescu (1)	We utilize a novel probabilistic inference procedure (not yet published), Composite Statistical Inference (CSI), on semantic segmentation using predictions on overlapping figure-ground hypotheses. Regressor predictions on segment overlaps to the ground truth object are modelled as generated by the true overlap with the ground truth segment plus noise. A model of ground truth overlap is defined by parametrizing on the unknown percentage of each superpixel that belongs to the unknown ground truth. A joint optimization on all the superpixels and all the categories is then performed in order to maximize the likelihood of the SVR predictions. The optimization has a tight convex relaxation so solutions can be expected to be close to the global optimum. A fast and optimal search algorithm is then applied to retrieve each object. CSI takes the intuition from the SVRSEGM inference algorithm that multiple predictions on similar segments can be combined to better consolidate the segment mask. But it fully develops the idea by constructing a probabilistic framework and performing composite MLE jointly on all segments and categories. Therefore it is able to consolidate better object boundaries and handle hard cases when objects interact closely and heavily occlude each other. For each image, we use 150 overlapping figure-ground hypotheses generated by the CPMC algorithm (Carreira and Sminchisescu, PAMI 2012), and linear SVR predictions on them with the novel second order O2P features (Carreira, Caseiro, Batista, Sminchisescu, ECCV2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN) as the input to the inference algorithm.	2012-09-23 23:49:02
Linear SVR with second-order pooling.	BONN_CMBR_O2P_CPMC_LIN	(1) University of Bonn, (2) University of Coimbra	Joao Carreira (2,1) Rui Caseiro (2) Jorge Batista (2) Cristian Sminchisescu (1)	We present a novel effective local feature aggregation method that we use in conjunction with an existing figure-ground segmentation sampling mechanism. This submission is described in detail in [1]. We sample multiple figure-ground segmentation candidates per image using the Constrained Parametric Min-Cuts (CPMC) algorithm. SIFT, masked SIFT and LBP features are extracted on the whole image, then pooled over each object segmentation candidate to generate global region descriptors. We employ a novel second-order pooling procedure, O2P, with two non-linearities: a tangent space mapping and power normalization. The global region descriptors are passed through linear regressors for each category, then labeled segments in each image having scores above some threshold are pasted onto the image in the order of these scores. Learning is performed using an epsilon-insensitive loss function on overlap with ground truth, similar to [2], but within a linear formulation (using LIBLINEAR). comp6: learning uses all images in the segmentation+detection trainval sets, and external ground truth annotations provided by courtesy of the Berkeley vision group. comp5: one model is trained for each category using the available ground truth segmentations from the 2012 trainval set. Then, on each image having no associated ground truth segmentations, the learned models are used together with bounding box constraints, low-level cues and region competition to generate predicted object segmentations inside all bounding boxes. Afterwards, learning proceeds similarly to the fully annotated case. 1. "Semantic Segmentation with Second-Order Pooling", Carreira, Caseiro, Batista, Sminchisescu. ECCV 2012. 2. "Object Recognition by Ranking Figure-Ground Hypotheses", Li, Carreira, Sminchisescu. CVPR 2010.	2012-09-23 19:11:47
BONN_O2PCPMC_FGT_SEGM	BONN_O2PCPMC_FGT_SEGM	(1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology	Joao Carreira(1,2), Adrian Ion(4), Fuxin Li(3), Cristian Sminchisescu(1)	Same as before, except tilings non-maximal	2013-08-08 05:54:53
BONN_O2PCPMC_FGT_SEGM	BONN_O2PCPMC_FGT_SEGM	(1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology	Joao Carreira(1,2), Adrian Ion(4), Fuxin Li(3), Cristian Sminchisescu(1)	We present a joint image segmentation and labeling model which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales using CPMC (Carreira and Sminchisescu, PAMI 2012), constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag (Ion, Carreira, Sminchisescu, ICCV2011), followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure (Ion, Carreira, Sminchisescu, NIPS2011). As meta-features we combine outputs from linear SVRs using novel second order O2P features to predict the overlap between segments and ground-truth objects of each class (Carreira, Caseiro, Batista, Sminchisescu, ECCV2012; see VOC12 entry BONNCMBR_O2PCPMC_LINEAR), bounding box object detectors, and kernel SVR outputs trained to predict the overlap between segments and ground-truth objects of each class (Carreira, Li, Sminchisescu, IJCV 2012). comp6: the O2P SVR learning uses all images in the segmentation+detection trainval sets, and external ground truth annotations provided by courtesy of the Berkeley vision group.	2012-09-23 21:39:35
Bayesian Dilation Network	Bayesian Dilation Network	University of Cambridge	Alex Kendall	http://arxiv.org/abs/1511.02680	2016-06-07 08:28:00
Bayesian FCN	Bayesian FCN	University of Cambridge	Alex Kendall	http://mi.eng.cam.ac.uk/projects/segnet/	2016-06-07 08:36:38
Fully conv net for segmentation and detection	BlitzNet	Inria	Nikita Dvornik Konstantin Shmelkov Julien Mairal Cordelia Schmid	CNN for joint segmentation and detection (based on SSD). Input resolution 300. Trained on VOC07 trainval + VOC12 trainval.	2017-03-17 18:24:29
Fully conv net for segmentation and detection	BlitzNet	Inria	Nikita Dvornik Konstantin Shmelkov Julien Mairal Cordelia Schmid	CNN for joint segmentation and detection (based on SSD). Input resolution 512. Trained on VOC07 trainval + VOC12 trainval.	2017-03-17 18:22:43
FCN	BlitzNet300	INRIA	Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid	CNN for joint segmentation and detection (based on SSD). Input resolution 300. Operates with speed 24 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO.	2017-07-19 13:57:45
FCN	BlitzNet512	INRIA	Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid	CNN for joint segmentation and detection (based on SSD). Input resolution 512. Operates with speed 19 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO.	2017-07-19 13:38:53
Objectness-aware Semantic Segmentation	CASIA_IVA_OASeg	Institute of Automation, Chinese Academy of Sciences	Yuhang Wang, Jing Liu, Yong Li, Jun Fu, Hang Song, Hanqing Lu	We propose an objectness-aware semantic segmentation framework (OA-Seg) consisting of two deep networks. One is a lightweight deconvolutional neural network (Light-DCNN) which obviously decreases model size and convergence time with impressive segmentation performance. The other one is an object proposal network (OPN) used to roughly locate object regions. MSCOCO is used to extend training data and CRF is used as post-processing.	2016-05-21 01:52:15
CASIA_IVA_SDN	CASIA_IVA_SDN	National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences	Jun Fu, Jing Liu, Yuhang Wang, Zhenwei Shen, Zhiwei Fang, Hanqing Lu	We propose a Stacked Deconvolutional Network (SDN) for semantic segmentation. We stack multiple SDN units to make network deeper and meanwhile, dense connections and hierarchical supervision are adopted to promote network optimization. CRF is not employed!	2017-07-29 06:00:31
CASIA_SegResNet_CRF_COCO	CASIA_SegResNet_CRF_COCO	Institute of Automation, Chinese Academy of Sciences	Xinze Chen, Guangliang Cheng, Yinghao Cai	We propose a novel semantic segmentation method, which consists of three parts: a SAR-based data augmentation method, a deeper residual network including three effective techniques and an online hard pixels mining. We combine these three parts to train an end-to-end network.	2016-06-03 09:20:50
CCBM	CCBM	University of Tsinghua	Qiurui Wang, Chun Yuan, Zhihui Lin, Zhicheng Wang, Xin Qiu	We propose a method for object segmentation, called CCBM, that combines a convolutional neural network with Conditional Boltzmann Machines and further utilizes a human visual border detection method. We use CNNs to extract features and segment them with improved Conditional Boltzmann Machines. We also use a Structured Random Forests based method to detect object borders for a better effect. Finally, each superpixel is labelled as output. The proposed method for this submission was trained on VOC 2012 Segmentation training data and a subset of COCO 2014 training data.	2015-11-29 07:26:11
Co-occurrent Features in Semantic Segmentation	CFNet	Amazon	Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie	Recent work has achieved great success in utilizing global contextual information for semantic segmentation, including increasing the receptive field and aggregating pyramid feature representations. In this paper, we go beyond global context and explore the fine-grained representation using co-occurrent features by introducing Co-occurrent Feature Model, which predicts the distribution of co-occurrent features for a given target. To leverage the semantic context in the co-occurrent features, we build an Aggregated Co-occurrent Feature (ACF) Module by aggregating the probability of the co-occurrent feature within the co-occurrent context. ACF Module learns a fine-grained spatial invariant representation to capture co-occurrent context information across the scene. Our approach significantly improves the segmentation results using FCN and achieves superior performance 54.0% mIoU on Pascal Context, 87.2% mIoU on Pascal VOC 2012 and 44.89% mIoU on ADE20K datasets with ResNet-101 base network.	2019-06-12 03:49:01
CMT-FCN-ResNet-CRF	CMT-FCN-ResNet-CRF	Intel Labs China and Tsinghua University	Libin Wang, Anbang Yao, Jianguo Li, Yurong Chen, Li Zhang	We propose a novel coupled multi-task FCN. Both the VOC 2012 and COCO datasets are used for training, and CRF is applied as a post-processing step.	2016-08-02 09:57:05
CRF as RNN	CRF_RNN	University of Oxford	Shuai Zheng; Sadeep Jayasumana; Bernardino Romera-Paredes; Philip Torr	We introduce a new form of convolutional neural network, called CRF-RNN, which expresses a conditional random field (CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. See the paper: "Conditional Random Fields as Recurrent Neural Networks".	2015-02-10 11:03:16
CTNet	CTNet	Nanjing University Of Science And Technology	CTNet	CTNet	2020-10-29 01:38:27
Deep Parsing Network	CUHK_DPN_COCO	The Chinese University of Hong Kong	Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang	This work addresses semantic image segmentation by incorporating rich information into Markov Random Field (MRF), including high-order relations and mixture of label contexts. Unlike previous works that optimized MRFs using iterative algorithm, we solve MRF by proposing a Convolutional Neural Network (CNN), namely Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN architecture to model unary terms and additional layers are carefully devised to approximate the mean field algorithm (MF) for pairwise terms. It has several appealing properties. First, different from the recent works that combined CNN and MRF, where many iterations of MF were required for each training image during back-propagation, DPN is able to achieve high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing works as its special cases. Third, DPN makes MF easier to be parallelized and speeded up in Graphical Processing Unit (GPU). The system used for this submission was trained on augmented VOC 2012 and MS-COCO 2014 training set. Please refer to the paper "Semantic Image Segmentation via Deep Parsing Network" (http://arxiv.org/abs/1509.02634) for further information.	2015-09-22 16:52:27
Learning to Predict CaC for semantic segmentation	CaCNet	CUHK	Jianbo Liu, Junjun He, Jimmy S. Ren, Yu Qiao, Hongsheng Li	Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods demonstrate that using global context for re-weighting feature channels can effectively improve the accuracy of semantic segmentation. However, the globally-sharing feature re-weighting vector might not be optimal for regions of different classes in the input image. In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. In CaC-Net, a set of context-adaptive convolution kernels are predicted from the global contextual information in a parameter-efficient manner. When used for convolution with the semantic feature maps, the predicted convolutional kernels can generate the spatially-varying feature weighting factors capturing both global and local contextual information. Comprehensive experimental results show that our CaC-Net achieves superior segmentation performance on three public datasets, PASCAL Context, PASCAL VOC 2012 and ADE20K.	2020-05-29 05:19:26
Deep G-CRF (QO) combined with Deeplab-v2	CentraleSupelec Deep G-CRF	CentraleSupelec / INRIA	Siddhartha Chandra & Iasonas Kokkinos	We employ the deep Gaussian CRF Quadratic Optimization formulation to learn pairwise terms for semantic segmentation using the Deeplab-v2-resnet-101 network. Additionally, we use the dense-CRF post-processing to refine object boundaries. This work is an accepted paper at ECCV 2016 and will be presented at the conference. Please refer to our arXiv report here: http://arxiv.org/abs/1603.08358 We will update the report with more details soon.	2016-08-12 11:21:28
"Super-Human" boundaries combined with Deeplab	CentraleSuperBoundaries++	CentraleSupelec / INRIA	Iasonas Kokkinos	We exploit our "super-human" boundary detector with a multi-resolution variant of the Deeplab system (LargeFOV, pre-trained on MSCOCO). The boundary information comes in the form of Normalized Cut eigenvectors used in DenseCRF inference and boundary-dependent pairwise terms, used in Graph-Cut inference. This is an updated version of our earlier submission, using more training rounds and a single-shot training algorithm. Details on the system and our "super human" boundary detector are provided in http://arxiv.org/abs/1511.07386	2016-01-13 16:00:02
modified deeplab	Curtin_Qilin	Curtin University	Qilin li	a modified version of deeplab-resnet101	2018-03-09 03:59:28
Dense Context-Aware Network for Semantic Segmentat	DCANet	Institution of Information Science and Electrical Engineering, Zhejiang University	Yifu Liu Chenfeng Xu Zhihong Chen Chao Chen	In contrast to some previous works utilizing multi-scale context fusion, we propose a novel module, named the Dense Context-Aware (DCA) module, to adaptively integrate local detail information with global dependencies in a more efficient way. Driven by the contextual relationship, the DCA module can effectively complete the aggregation of multi-scale information to generate more powerful features. Meanwhile, the proposed DCA module is easy to apply and can be flexibly adjusted inside existing deep networks. To further capture long-range contextual information, we specially design two extended structures based on the DCA modules. By proceeding in a progressive manner under different scales, our networks can make use of context information to improve feature representations for robust segmentation. Due to privacy concerns, we will make the paper and code publicly available at https://github.com/YifuLiuL/DCANet.	2020-01-13 08:36:04
Discriminative Feature Network	DFN	HUST	Changqian Yu	We design a discriminative feature network for semantic segmentation.	2018-01-15 04:32:54
DFPnet for real-time semantic segmentation	DFPnet	Dalian Maritime University	Shuhao Ma	Deep Feature Pyramid net (DFPnet) is the first model that can apply image pyramid technology to real-time semantic segmentation. DFPnet is a flexible model that can be applied to image segmentation, target detection and image classification tasks, and its structure can be adjusted to suit different data. In short, DFPnet adopts an open design.	2018-08-26 12:09:50
Deep Dual Learning for Semantic Image Segmentation	DIS	Sun Yat-Sen University, The Chinese University of Hong Kong	Ping Luo, Guangrun Wang, Liang Lin, Xiaogang Wang	We present a novel learning setting, which consists of two complementary learning problems that are jointly solved. One predicts labelmaps and tags from images, and the other reconstructs the images using the predicted labelmaps. Given an image with tags only, its labelmap can be inferred by leveraging the images and tags as constraints. The estimated labelmaps that capture accurate object classes and boundaries are used as ground truths in training to boost performance. DIS is able to clean tags that have noises.	2017-09-13 18:25:17
Dual-path Class-aware Attention Network	DP-CAN	Tianjin University	Hailong Zhu	Our proposed dual-path class-aware attention network exploits a category-level, context-free attention mechanism for semantic segmentation. This model is trained on PASCAL VOC 2012 train_aug and fine-tuned on trainval. Multi-scale inputs and flipping are used in testing.	2019-01-25 12:36:41
Dual-path Class-aware Attention Network	DP-CAN_decoder	Tianjin University	Hailong Zhu	Dual-path Class-aware Attention Network with dual-path refinement module as decoder.	2019-01-26 15:07:22
DP_ResNet_CRF	DP_ResNet_CRF	(1) Beijing University of Posts and Telecommunications (BUPT); (2) Beijing Moshanghua Tech (DressPlus)	Lu Yang(1, 2), Qing Song(1), Bin Liu(2), Yuhang He(2), Zuoxin Li(2), Xiongwei Xia(2)	Our network is based on ResNet-152. Dilated convolution, data augmentation, pre-training on COCO and multi-scale testing are used for this submission. We also use DenseCRF as post-processing to refine object boundaries.	2016-11-10 12:05:10
Dynamic routing encoding network	DREN	Huazhong University of Science and Technology	ZhaoyangHu	On the basis of FCN network, we add dynamic routing to classify the context and add the context to help the network recognise.	2019-03-29 02:04:11
Deep Layer Cascade (LC)	Deep Layer Cascade (LC)	The Chinese University of Hong Kong	Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang	We propose a novel deep layer cascade (LC) method to improve the accuracy and speed of semantic segmentation. Unlike the conventional model cascade (MC) that is composed of multiple independent models, LC treats a single deep model as a cascade of several sub-models. Earlier sub-models are trained to handle easy and confident regions, and they progressively feed-forward harder regions to the next sub-model for processing. Convolutions are only calculated on these regions to reduce computations. The proposed method possesses several advantages. First, LC classifies most of the easy regions in the shallow stage and makes deeper stage focuses on a few hard regions. Such an adaptive and 'difficulty-aware' learning improves segmentation performance. Second, LC accelerates both training and testing of deep network thanks to early decisions in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models. We evaluate our method on PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and fast speed. Please refer to the paper "Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade" (https://arxiv.org/abs/1704.01344) for further information.	2017-04-06 14:46:45
DeepLab-CRF	DeepLab-CRF	(1) UCLA (2) Google (3) TTIC (4) ECP / INRIA	Liang-Chieh Chen (1) and George Papandreou (2,3) and Iasonas Kokkinos (4) and Kevin Murphy (2) and Alan L. Yuille (1)	This work brings together methods from Deep Convolutional Neural Networks (DCNNs) and probabilistic graphical models for the task of semantic image segmentation. We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Efficient computation is achieved by (i) careful network re-purposing and (ii) a novel application of the "hole" algorithm from the wavelet community, allowing dense computation of neural net responses at 8 frames per second on a modern GPU. See http://arxiv.org/abs/1412.7062 for further information.	2014-12-23 02:29:44
DeepLab-CRF-Attention	DeepLab-CRF-Attention	(1) UCLA (2) Baidu	Liang-Chieh Chen (1) and Yi Yang (2) and Jiang Wang (2) and Wei Xu (2) and Alan L. Yuille (1)	This work is the extension of DeepLab-CRF-COCO-LargeFOV (pretrained on MS-COCO) by further incorporating (1) multi-scale inputs (2) extra supervision and (3) attention model. Further information will be provided in an updated version of http://arxiv.org/abs/1511.03339.	2016-02-03 23:10:45
DeepLab-CRF-Attention-DT	DeepLab-CRF-Attention-DT	(1) UCLA (2) Google	Liang-Chieh Chen (1) and Jonathan T. Barron (2) and George Papandreou (2) and Kevin Murphy (2) and Alan L. Yuille (1)	This work is the extension of DeepLab-CRF-Attention by further incorporating a discriminatively trained Domain Transform. Further information will be provided in an updated version of http://arxiv.org/abs/1511.03328.	2016-02-03 23:13:01
DeepLab-CRF-COCO-LargeFOV	DeepLab-CRF-COCO-LargeFOV	(1) Google (2) UCLA	George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2)	Similar to DeepLab-CRF-COCO-Strong, but the network has a larger field-of-view on the image. Further information will be provided in an updated version of http://arxiv.org/abs/1502.02734.	2015-03-18 04:09:39
DeepLab-CRF-COCO-Strong	DeepLab-CRF-COCO-Strong	(1) Google (2) UCLA	George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2)	Similar to DeepLab-CRF, but network training also included the pixel-level semantic segmentation annotations of the MS-COCO (v. 2014) dataset. See http://arxiv.org/abs/1502.02734 for further information.	2015-02-11 01:44:22
DeepLab-CRF-LargeFOV	DeepLab-CRF-LargeFOV	(1) Google (2) UCLA	George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2)	Similar to DeepLab-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062.	2015-03-28 17:22:26
DeepLab-CRF-MSc	DeepLab-CRF-MSc	(1) UCLA (2) Google (3) TTIC (4) ECP / INRIA	Liang-Chieh Chen (1) and George Papandreou (2,3) and Iasonas Kokkinos (4) and Kevin Murphy (2) and Alan L. Yuille (1)	Similar to DeepLab-CRF, except that multiscale features (direct connections from intermediate layers to the classifier) are also exploited. Specifically, we attach to the input image and each of the first four max pooling layers a two-layer MLP (first layer: 128 3x3 convolutional filters, second layer: 128 1x1 convolutional filters) whose score map is concatenated to the VGG final layer score map. The final score map fed into the softmax layer thus consists of 4,096 + 5 * 128 = 4,736 channels.	2014-12-30 02:52:40
DeepLab-MSc-CRF-LargeFOV	DeepLab-MSc-CRF-LargeFOV	(1) Google (2) UCLA	George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2)	Similar to DeepLab-MSc-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062.	2015-04-02 06:57:21
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint	DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint	(1) Google (2) UCLA	George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2)	Similar to Deeplab-CRF model, but with feature extraction at multiple network levels and large field of view. We jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using pixel-level annotation in both datasets. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062 and http://arxiv.org/abs/1502.02734.	2015-04-26 17:48:09
DeepLab_XI	DeepLab_XI	xiaoi research	Bo Zhang, Xiaoke Wang, Guixiong Chen	We extend the deeplab method. Both VOC 2012 and COCO dataset are used for training.	2019-05-07 07:08:00
DeepLabv2-CRF	DeepLabv2-CRF	(1) UCLA (2) Google (3) ECP / INRIA	Liang-Chieh Chen (1,2) and George Papandreou (2) and Iasonas Kokkinos (3) and Kevin Murphy (2) and Alan L. Yuille (1)	DeepLabv2-CRF is based on three main methods. First, we employ convolution with upsampled filters, or "atrous convolution", as a powerful tool to repurpose ResNet-101 (trained on image classification task) in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within DCNNs. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and fully connected Conditional Random Fields (CRFs). See http://arxiv.org/abs/1606.00915 for further information.	2016-06-06 01:59:20
DeepLabv3	DeepLabv3	Google Inc.	Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam	In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks. We propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. See http://arxiv.org/abs/1706.05587 for further information.	2017-06-20 01:59:26
DeepLabv3+	DeepLabv3+	Google Inc.	Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam	Spatial pyramid pooling modules or encoder-decoder structures are used in deep neural networks for the semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 semantic image segmentation dataset and achieve state-of-the-art performance without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in Tensorflow. For details, please refer to https://arxiv.org/abs/1802.02611.	2018-02-09 16:12:04
DeepLabv3+_AASPP	DeepLabv3+_AASPP	Tsinghua University	Jiancheng Li	DeepLabv3+ with Attention Atrous Spatial Pyramid Pooling.	2018-05-22 15:44:09
DeepLabv3+_JFT	DeepLabv3+_JFT	Google Inc.	Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam	DeepLabv3+ by fine-tuning from the model pretrained on JFT-300M dataset. For details, please refer to https://arxiv.org/abs/1802.02611.	2018-02-09 16:16:47
DeepLabv3-JFT	DeepLabv3-JFT	Google Inc.	Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam	DeepLabv3 by fine-tuning from the model pretrained on JFT-300M dataset. See http://arxiv.org/abs/1706.05587 for further information.	2017-08-05 01:16:48
DeepSqueeNet	DeepSqueeNet	Sun Yat-sen University, SYSU	HongPeng Wu, Long Chen, Kai Huang	We propose a method for semantic image segmentation. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture called DeepSqueeNet for semantic image segmentation. It is based on SqueezeNet and VGG16. DeepSqueeNet achieves Deeplab (based on VGG16) accuracy on semantic image segmentation with 10x fewer parameters.	2016-07-20 13:16:16
DeepSqueeNet_CRF	DeepSqueeNet_CRF	Sun Yat-sen University, SYSU	HongPeng Wu, Long Chen, Kai Huang	We propose a method for semantic image segmentation. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture called DeepSqueeNet for semantic image segmentation. It is based on SqueezeNet and VGG16. DeepSqueeNet achieves Deeplab (based on VGG16) accuracy on semantic image segmentation with 10x fewer parameters. In this submission we also add a CRF.	2016-07-21 12:47:19
Dual Multi-Scale Manifold Ranking Network	Dual-Multi-Reso-MR	Wuhan University	Mi Zhang, Ye Lv, Min Luo, Jiasi Yi	We propose a multi-scale network which utilizes dilated and non-dilated convolutional networks as a dual. In both networks, a manifold ranking optimization method is embedded so that they are optimized jointly in a single stream, i.e. there is no need to train the unary and pairwise networks separately. Such a feedforward network makes it possible to train in an end-to-end fashion and guarantees a global optimum.	2016-11-03 12:27:49
Expectation-Maximization Attention Networks for S	EMANet152	Peking University	Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu	We formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and deprecates noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of input and is also friendly in memory and computation. Moreover, we set up the bases maintenance and normalization methods to stabilize its training procedure.	2019-08-15 16:22:33
ESPNetv2	ESPNetv2	University of Washington	Hannaneh Hajishirzi Mohammad Rastegari Linda Shapiro	We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. Our network uses group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with fewer FLOPs and parameters. The performance of our network is evaluated on three different tasks: (1) object classification, (2) semantic segmentation, and (3) language modeling. Experiments on these tasks, including image classification on the ImageNet dataset and language modeling on the Penn Treebank dataset, demonstrate the superior performance of our method over the state-of-the-art methods. Our network has better generalization properties than ShuffleNetv2 when tested on the MSCOCO multi-object classification task and the Cityscapes urban scene semantic segmentation task. Our experiments show that ESPNetv2 is much more power efficient than existing state-of-the-art efficient methods including ShuffleNets and MobileNets. Our code is open-source and available at https://github.com/sacmehta/ESPNetv2	2019-03-23 22:32:58
EfficientNet-L2 + NAS-FPN + Noisy Student	EfficientNet-L2 + NAS-FPN + Noisy Student	Google Inc.	Golnaz Ghiasi, Barret Zoph, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Cubuk, Quoc V. Le	Single-scale testing and without pre-training on COCO. See https://arxiv.org/abs/2006.06882 for details.	2020-06-15 19:50:31
Efficient_Segmentation	EfficientNet_MSCID_Segmentation	Tianjin University	Xiu Su, Hongyan Xu	EfficientNet with MSCID module for segmentation	2019-08-15 02:00:39
Context Encoding for Semantic Segmentation	EncNet	Rutgers University, Amazon, SenseTime, CUHK	Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal	Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results 51.7% mIoU on PASCAL-Context, 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on ADE20K test set, which surpasses the winning entry of COCO-Place Challenge 2017.	2018-03-15 21:21:01
ExFuse	ExFuse	Fudan University, Megvii Inc.	Zhenli Zhang, Xiangyu Zhang, Chao Peng, Jian Sun	For more details, please refer to https://arxiv.org/abs/1804.03821.	2018-05-22 09:27:16
Dilated FCN using VGG16 and Skip Architectures	FCN-2s_Dilated_VGG16	Center for Cognitive Skill Enhancement, Independent University Bangladesh	Sharif Amit Kamran, Ali Shihab Sabbir	The weights were transferred from VGG16 and then the fully connected layers were converted to convolutional layers. Dilated convolution was used instead of vanilla convolution in the fc6 layer. The upsampling was done with stride 2 and the upsampled layers were concatenated in steps using four skip architectures. Pascal VOC2012 training data and SBD training and validation data were used for training in two stages.	2017-07-20 20:23:41
Dilated FCN using VGG19 and Skip Architectures	FCN-2s_Dilated_VGG19	Center for Cognitive Skill Enhancement, Independent University Bangladesh	Sharif Amit Kamran, Ali Shihab Sabbir	The weights were transferred from VGG19 and then the fully connected layers were converted to convolutional layers. Dilated convolution was used instead of vanilla convolution in the fc6 layer. The upsampling was done with stride 2 and the upsampled layers were concatenated in steps using four skip architectures. Pascal VOC2012 training data and SBD training and validation data were used for training in two stages.	2017-07-11 16:57:52
Fully convolutional net	FCN-8s	UC Berkeley	Jonathan Long, Evan Shelhamer, Trevor Darrell	We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net.	2014-11-12 09:08:39
Fully convolutional net	FCN-8s-heavy	UC Berkeley	Jonathan Long, Evan Shelhamer, Trevor Darrell	We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. The network is learned online with high momentum for better optimization.	2016-02-06 09:57:31
FCN16s-Resnet101	FCN16s-Resnet101	Peking University	personal	FCN (output stride 16) based on ResNet-101	2019-01-26 12:50:15
FCN with Cross-layer Concat and Multi-scale Pred	FCN_CLC_MSP	National Tsing Hua University, Taiwan	Tun-Huai Shih, Chiou-Ting Hsu	We replace the original fc layers in VGG-16 with several conv and pool layers to extract hierarchical features (Pool3-5 and additional pool6-8). We then use pool3-8 to generate multi-scale predictions, and aggregate them to derive the dense prediction result. To jointly exploit the information from lower- and higher-level layers when making prediction, we adopt cross-layer concatenation to combine poolx features (lower-level) with prediction result of coarser stream (high-level). This makes the predictions of finer streams more robust. We do not adopt any pre- or post- processing steps. The number of parameters is about 36M, while the original FCN is 134M. We train all prediction streams at the same time using VOC additional annotated images (10582 in total), and it takes less than one day to train our FCN model on a single GTX Titan X GPU.	2016-07-01 04:27:14
FDNet_16s	FDNet_16s	HongKong University of Science and Technology, altizure.com	Mingmin Zhen, Jinglu Wang, Siyu Zhu, Runze Zhang, Shiwei Li, Tian Fang, Long Quan	A fully dense neural network with encoder-decoder structure is proposed that we abbreviate as FDNet. For each stage in the decoder module, feature maps of all the previous blocks are adaptively aggregated to feedforward as input.	2018-03-22 08:52:44
Weakly sup. segmentation by region scores' pooling	FER_WSSS_REGION_SCORE_POOL	University of Zagreb	Josip Krapac, Sinisa Segvic	We address the problem of semantic segmentation of objects in a weakly supervised setting, when only image-wide labels are available. We describe an image with a set of pre-trained convolutional features (from layer conv5.4 of the 19-layer VGG-E network) and embed this set into a Fisher vector (64-component GMM, diagonal covariance for components, normalization only with the inverse of the Fisher matrix). We learn a linear classifier (logistic regression), apply the learned classifier on the set of all image regions (efficiently, using integral images), and propagate region scores back to the pixels. Compared to the alternatives, the proposed method is simple and fast in inference, and especially in training. The details are described in the conference paper Krapac, Segvic: "Weakly-supervised semantic segmentation by redistributing region scores back to the pixels", GCPR 2016	2016-06-14 15:02:23
FSSI300	FSSI300	Beihang University	Zuoxin Li	FSSI300 Res50	2018-06-21 11:27:57
Learning Feature Pyramids	Feature_Pyramids	Sun Yat-Sen University, The Chinese University of Hong Kong	Guangrun Wang, Wei Yang	This model predicts segmentation via learning feature pyramids (LFP). LFP is originally used for human pose machine, described in the paper "Learning Feature Pyramids for Human Pose Estimation" (https://arxiv.org/abs/1708.01101). We extend it to the semantic image segmentation. The code and model are available at https://github.com/wanggrun/Learning-Feature-Pyramids	2018-06-06 03:55:27
Gluon DeepLabV3 152	Gluon DeepLabV3 152	Amazon AI	Hang Zhang et al.	https://gluon-cv.mxnet.io	2018-10-03 18:18:27
GluonCV DeepLabV3	GluonCV DeepLabV3	Amazon	Hang Zhang et al.	See details in GluonCV https://gluon-cv.mxnet.io/	2018-09-07 00:48:31
GluonCV FCN	GluonCV FCN	Amazon	Hang Zhang et al.	Please see details in GluonCV https://gluon-cv.mxnet.io/	2018-09-07 01:11:12
GluonCV PSP	GluonCV PSP	Amazon	Hang Zhang et al.	Please see details in GluonCV https://gluon-cv.mxnet.io/	2018-09-07 00:51:53
Hierarchical Parsing Net	HPN	UESTC	Hengcan Shi	HPN leverages global image semantic information and context among multiple objects to boost semantic segmentation.	2017-12-13 02:30:24
Hamburger	HamNet_w/o_COCO	Peking University	Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin	Paper: Is Attention Better Than Matrix Decomposition? Accepted to ICLR 2021. Link: https://openreview.net/pdf?id=1FvkSpWosOl Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago regarding the performance and computational cost for encoding the long-distance dependencies. We model the global context issue as a low-rank completion problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding.	2021-01-25 07:03:38
HikSeg_COCO	HikSeg_COCO	Hikvision Research Institute	Haiming Sun, Di Xie, Shiliang Pu	We begin with DilatedNet and add a module in which multi-scale features are combined step-wise. The network is able to learn to assign different weights to features of different scales. This submission is first trained on the COCO training and validation sets, then fine-tuned on the PASCAL training set.	2016-10-02 09:16:41
Hypercolumn	Hypercolumn	UC Berkeley	Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik	Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation. However, the information in this layer may be too coarse to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation, where we improve state-of-the-art from 49.7 mean APr to 60.0, keypoint localization, where we get a 3.3 point boost over and part labeling, where we show a 6.6 point gain over a strong baseline.	2015-04-09 02:01:36
Learning Object Interactions and Descriptions for	IDW-CNN	Sun Yat-sen University; The Chinese University of Hong Kong	Guangrun Wang, Ping Luo, Liang Lin, Xiaogang Wang	This work increases the segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset. Unlike previous image captioning datasets, where captions were manually and densely annotated, images and their descriptions in IDW are automatically downloaded from the Internet without any manual cleaning or refinement. An IDW-CNN is proposed to jointly train on IDW and an existing image segmentation dataset such as Pascal VOC 2012 (VOC).	2017-06-30 00:11:24
KSAC(X-65) with hard image	KSAC-H	The University of Technology, Sydney	Ye Huang	KSAC (Xception-65) + hard image bootstrap in OS = 16	2019-10-26 14:19:05
Ladder DenseNet-161	LDN-161	University of Zagreb	Ivan Kreso, Josip Krapac, Sinisa Segvic	Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images (journal submission). Trained on train+val+augmented data. DenseNet-161 backbone.	2019-04-18 19:03:42
Laplacian reconstruction and refinement	LRR_4x_COCO	University of California Irvine	Golnaz Ghiasi, Charless C. Fowlkes	We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission is based on VGG-16 and it was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264).	2016-06-16 06:19:08
Laplacian reconstruction and refinement	LRR_4x_ResNet_COCO	University of California Irvine	Golnaz Ghiasi Charless C. Fowlkes	We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission is based on ResNet-101 and it was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264).	2016-07-18 19:07:32
Laplacian reconstruction and refinement	LRR_4x_de_pyramid_VOC	University of California Irvine	Charless C. Fowlkes Golnaz Ghiasi	We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission was trained on augmented PASCAL VOC. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation	2016-06-07 03:55:11
CVRSUAD submission, paper ID 21	Ladder_DenseNet	UNIZG-FER	ivan.kreso@fer.hr	CVRSUAD submission paper ID 21: Ladder-style DenseNets for Semantic Segmentation of Large Natural Images	2017-07-25 17:42:21
Large_Kernel_Matters	Large_Kernel_Matters	Tsinghua University	Peng Chao, Yu Gang, Zhang Xiangyu	We use large kernels to generate the feature map and score map; ResNet-101 is used with the COCO and SBD datasets. No CRF or similar post-processing methods are employed. No multi-scale.	2017-03-16 01:58:16
Deep Gaussian CRF	MERL_DEEP_GCRF	Mitsubishi Electric Research Laboratories	Raviteja Vemulapalli, Oncel Tuzel	We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction.	2015-10-17 14:55:31
Gaussian CRF on top of Deeplab CNN	MERL_UMD_Deep_GCRF_COCO	University of Maryland, College Park	Raviteja Vemulapalli (UMD) Oncel Tuzel (MERL) Ming-Yu Liu (MERL) Rama Chellappa (UMD)	We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction. The entire model is trained end-to-end.	2016-01-15 05:23:48
MSCI for Semantic Segmentation	MSCI	Shenzhen University	Di Lin; Yuanfeng Ji	We propose a novel scheme for aggregating features from different scales, which we refer to as Multi-Scale Context Intertwining (MSCI). Please see our paper http://vcc.szu.edu.cn/Di_Lin/papers/MSCI_eccv2018.pdf	2018-07-08 04:07:31
MSRA_BoxSup	MSRA_BoxSup	Microsoft Research Asia	Jifeng Dai, Kaiming He, Jian Sun	This is an implementation of "BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation". We train a BoxSup model using the union set of VOC 2007 boxes, COCO boxes, and the augmented VOC 2012 training set. See http://arxiv.org/abs/1503.01640 for details.	2015-05-18 09:42:54
Box-Supervision	MSRA_BoxSup	Microsoft Research Asia	Jifeng Dai, Kaiming He, Jian Sun	BoxSup makes use of bounding box annotations to supervise convolutional networks for semantic segmentation. From these boxes, we estimate segmentation masks with the help of region proposals. These masks are used to update the convolutional network, which is in turn fed back to mask estimation. This procedure is iterated. This result is achieved by semi-supervised training on the segmentation masks from PASCAL VOC and a large amount of bounding boxes from Microsoft COCO. See http://arxiv.org/abs/1503.01640 for details.	2015-02-10 09:35:40
Convolutional Feature Masking	MSRA_CFM	Microsoft Research Asia	Jifeng Dai, Kaiming He, Jian Sun	The method exploits shape information via "masking" convolutional features. The proposal segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. Competitive accuracy and compelling computational speed are demonstrated by the proposed method. We achieve this result by utilizing segment proposals generated by Multi-scale Combinatorial Grouping (MCG), and initializing network parameters from the VGG 16-layer net. See http://arxiv.org/abs/1412.1283 for details.	2014-12-17 02:56:52
Multi-Scale Residual Network for Segmentation	MSRSegNet-UW	University of Washington	Linda Shapiro, Hannaneh Hajishirzi	Building on prior work, we create a custom network that is fast as well as accurate. Our network runs at 21 fps at full resolution and at 60 fps at a resolution of 224 x 224. At low resolution, our network is as accurate as FCN-8s. More details are here: https://arxiv.org/pdf/1711.08040.pdf	2017-11-23 01:26:37
MasksegNet	MasksegNet	Kyung Hee University	masksegnet	MasksegNet	2019-05-16 12:20:50
Multi-Task Learning for Human Pose Estimation	Metu_Unified_Net	Middle East Technical University	Salih Karagoz, Muhammed Kocabas, Emre Akbas	Multi-task learning for multi-person pose estimation, human semantic segmentation and human detection. The model performs these tasks simultaneously. We trained only on the COCO dataset. No additional data was used.	2018-03-10 12:39:37
Multipath-RefineNet	Multipath-RefineNet	The University of Adelaide; ACRV;	Guosheng Lin; Anton Milan; Chunhua Shen; Ian Reid;	Please refer to our technical report for details: "RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation" (https://arxiv.org/abs/1611.06612). Our source code is available at: https://github.com/guosheng/refinenet	2017-01-17 18:03:57
Unified Object Detection and Semantic Segmentation	NUS_UDS	NUS	Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille	Motivated by the complementary effect observed from the typical failure cases of object detection and semantic segmentation, we propose a unified framework for joint object detection and semantic segmentation [1]. By enforcing the consistency between final detection and segmentation results, our unified framework can effectively leverage the advantages of leading techniques for these two tasks. Furthermore, both local and global context information are integrated into the framework to better distinguish the ambiguous samples. By jointly optimizing the model parameters for all the components, the relative importance of different components is automatically learned for each category to guarantee the overall performance. [1] Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. ECCV 2014	2014-10-29 16:07:10
Joint a network to guided and masking	OBP-HJLCN	National Central University	Jia-Ching Wang, Chien-Yao Wang, Jyun-Hong Li	We propose a hierarchical joint guided network that is able to predict objects at both greater and finer scales. We also propose a novel way to guide segmentation by object and boundary.	2016-09-13 15:21:45
Oxford_TVG_CRF_RNN_COCO	Oxford_TVG_CRF_RNN_COCO	[1] University of Oxford / [2] Baidu IDL	Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1]	We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, Berkeley augmented data and a subset of COCO 2014 train data. More details will be available in the paper http://arxiv.org/abs/1502.03240.	2015-04-22 11:26:57
Oxford_TVG_CRF_RNN_VOC	Oxford_TVG_CRF_RNN_VOC	[1] University of Oxford / [2] Baidu IDL	Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1]	We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, and Berkeley augmented data (COCO dataset was not used). More details will be available in the paper http://arxiv.org/abs/1502.03240.	2015-04-22 10:24:43
Higher Order CRF in CNN	Oxford_TVG_HO_CRF	University of Oxford	Anurag Arnab Sadeep Jayasumana Shuai Zheng Philip Torr	We integrate a conditional random field with higher order potentials into a deep neural network. Our higher order potentials are based on object detector outputs and superpixel oversegmentation, and formulated such that their corresponding mean-field updates are differentiable. For further details, please refer to http://arxiv.org/abs/1511.08119	2016-03-16 21:12:47
PAN	PAN	BIT, Megvii Inc.	Hanchao Li	Pyramid Attention Network for Semantic Segmentation; (without COCO pretrain)	2018-07-04 13:10:20
POSTECH_DeconvNet_CRF_VOC	POSTECH_DeconvNet_CRF_VOC	POSTECH (Pohang University of Science and Technology)	Hyeonwoo Noh, Seunghoon Hong, Bohyung Han.	We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects in multiple scales more naturally. Our network demonstrates outstanding performance in PASCAL VOC 2012 dataset without external training data. See http://arxiv.org/abs/1505.04366 for details.	2015-08-18 18:42:18
POSTECH_EDeconvNet_CRF_VOC	POSTECH_EDeconvNet_CRF_VOC	POSTECH(Pohang University of Science and Technology)	Hyeonwoo Noh, Seunghoon Hong, Bohyung Han	We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects in multiple scales more naturally. Our network demonstrates outstanding performance in PASCAL VOC 2012 dataset without external training data.	2015-04-22 21:33:03
PSPNet	PSPNet	CUHK, SenseTime	Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia	Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new record of mIoU score as 85.4% on PASCAL VOC 2012 and 80.2% on Cityscapes. https://arxiv.org/abs/1612.01105	2016-12-06 02:22:13
Encoder-decoder with FCN	PSP_flow	Northwestern Polytechnical University	Yanhua Zhang	Spatial pyramid structure and a feature alignment.	2021-07-13 14:21:30
Residual Forest classifier with FCN features	RRF-4s	Monash University	Yan Zuo, Tom Drummond	We replace the solver component of FCN with a Random Residual Forest (RRF) Classifier and treat FCN as a generic feature extractor to train the RRF classifier	2016-11-30 23:31:43
Tensor low-rank Reconstruction	RecoNet152_coco	Tencent	Please contact Wanli Chen (chenwl@mail.sustech.edu.cn)	Please contact Wanli Chen (chenwl@mail.sustech.edu.cn)	2019-10-26 04:39:21
Res2Net: Multi-scale Backbone Architecture	Res2Net	Nankai University	Shanghua Gao, Ming-Ming Cheng	Res2Net: A New Multi-scale Backbone Architecture (TPAMI20). We propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. Source code: https://github.com/Res2Net	2020-02-22 05:29:02
ResNet-38 with COCO	ResNet-38_COCO	The University of Adelaide	Zifeng Wu, Chunhua Shen, Anton van den Hengel	Pre-trained with COCO, and tested with multiple scales. See our report https://arxiv.org/abs/1611.10080 for details.	2017-01-22 04:44:14
ResNet-38 Multi-scale	ResNet-38_MS	The University of Adelaide	Zifeng Wu, Chunhua Shen, Anton van den Hengel	Single model; multi-scale testing; NO COCO; NO CRF-based post-processing. For more details, refer to our report https://arxiv.org/abs/1611.10080 and code https://github.com/itijyou/ademxapp.	2016-12-09 12:19:24
ResNet_DUC_HDC_TuSimple	ResNet_DUC_HDC	UC San Diego, CMU, UIUC, TuSimple	Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell	We improve pixel-wise semantic segmentation by manipulating convolution-related operations: 1) we design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information; 2) we implement hybrid dilated convolution (HDC) to aggregate global information and alleviate what we call the "gridding issue" caused by the standard dilated convolution operation. Current submission is single model and single scale testing. Pretrained models: https://goo.gl/DQMeun Paper link: https://arxiv.org/abs/1702.08502	2017-03-01 20:22:41
ResSegNet	ResSegNet	SCUT-CIVIC	Mengxi Li	-	2018-05-28 04:39:01
SDS	SDS	UC Berkeley	Bharath Hariharan Pablo Arbelaez Ross Girshick Jitendra Malik	We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work.	2014-07-21 22:46:22
SRC-B-MachineLearningLab	SRC-B-MachineLearningLab	Samsung R&D Institute China - Beijing, Machine Learning Lab	Jianlong Yuan, Shu Wang, Wei Zhao, Hanchao Jia, Zhenbo Luo	The model is pretrained on ImageNet and fine-tuned on COCO, VOC and SBD. The result is tested with multi-scale and flip. The paper is in preparation.	2018-04-19 03:08:39
Score Map Pyramid Net	Score Map Pyramid Net	Dalian Maritime University	Shuhao Ma	Our method is fast	2018-07-06 13:27:16
SegModel	SegModel	Peking University	Falong Shen, Peking University	Deep fully convolutional networks with conditional random field. Trained on MSCOCO trainval set and Pascal VOC 12 train set.	2016-08-23 04:04:21
SegNeXt	SegNeXt	Tsinghua University and Nankai University	Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.	SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation (NeurIPS 2022). A simple CNN-based method for semantic segmentation.	2022-09-19 11:12:10
SegNet	SegNet	University of Cambridge	Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla	SegNet is a memory efficient real time deep convolutional encoder-decoder architecture. For more information, please see our publications and web demo at: http://mi.eng.cam.ac.uk/projects/segnet/	2015-11-10 09:48:12
asfcas	SepaNet	dqwdaw	asfcae	gvsfdvc	2019-10-25 16:30:20
SpDConv2	SpDConv2	SpDConv2	SpDConv2	SpDConv2	2021-01-06 03:14:39
Tree-structured Kronecker Convolutional Networks	TKCNet	Institute of Computing Technology, Chinese Academy of Sciences	Tianyi Wu, Sheng Tang, Rui Zhang, Linghui Li, Yongdong Zhang	Most existing semantic segmentation methods employ atrous convolution to enlarge the receptive field of filters, but neglect important local contextual information. To tackle this issue, we firstly propose a novel Kronecker convolution which adopts the Kronecker product to expand its kernel for taking into account the feature vectors neglected by atrous convolutions. Therefore, it can capture local contextual information and enlarge the field of view of filters simultaneously without introducing extra parameters. Secondly, we propose a Tree-structured Feature Aggregation (TFA) module which follows a recursive rule to expand and forms a hierarchical structure. Thus, it can naturally learn representations of multi-scale objects and encode hierarchical contextual information in complex scenes. Finally, we design Tree-structured Kronecker Convolutional Networks (TKCN) that employ Kronecker convolution and the TFA module. Extensive experiments on three datasets, PASCAL VOC 2012, PASCAL-Context and Cityscapes, verify the effectiveness of our proposed approach.	2018-04-20 13:04:57
Diverse M-Best with discriminative reranking	TTIC-divmbest-rerank	(1) Toyota Technological Institute at Chicago, (2) Virginia Tech	Payman Yadollahpour (1), Dhruv Batra (1,2), Greg Shakhnarovich (1)	We generate a set of M=10 full image segmentations using Diverse M-Best algorithm from [BYGS'12], applied to inference in the O2P model (Carreira et al., 2012). Then we discriminatively train a reranker based on a novel set of features. The learning of the reranker uses relative loss, with the objective to minimize gap with the oracle (the hindsight-best of the M segmentations), and relies on slack-rescaling structural SVM. The details are described in [YBS'13]. References: [BYGS'12] Batra, Yadollahpour, Guzman, Shakhnarovich, ECCV 2012. [YBS'13] Yadollahpour, Batra, Shakhnarovich, CVPR 2013.	2012-11-15 04:03:01
Feedforward segmentation with zoom-out features	TTI_zoomout	TTI-Chicago	Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich	Our method uses a feedforward network to directly label superpixels. For each superpixel we use features extracted from a nested set of "zoom-out" regions, from purely local to image-level.	2014-11-17 04:57:49
Feedforward segmentation with zoom-out features	TTI_zoomout_16	TTI-Chicago	Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich	Same as before, except using VGG 16-layer network instead of VGG CNN-S network. Fine-tuning on VOC-2012 was not performed. See http://arxiv.org/abs/1412.0774 for details.	2014-11-24 08:54:05
Feedforward semantic segmentation with zoom-out features	TTI_zoomout_v2	TTI-Chicago	Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich	Similar to TTI_zoomout_16, except the way that we set the number and scope of zoom-out levels. In this version, zoom-out levels correspond to receptive field sizes of different layers in a convolutional neural network. Our model is trained only on VOC-2012. Details are provided in our CVPR 2015 paper available at http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf.	2015-03-30 18:40:04
Global Deconvolutional Network with CRF	UNIST_GDN_CRF	Ulsan National Institute of Science and Technology (UNIST)	Vladimir Nekrasov, Janghoon Ju, Jaesik Choi	We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930.	2016-07-29 07:23:03
Global Deconvolutional Network with CRF	UNIST_GDN_CRF_ENS	Ulsan National Institute of Science and Technology (UNIST)	Vladimir Nekrasov, Janghoon Ju, Jaesik Choi	We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930.	2016-07-29 07:25:56
Global Deconvolutional Network	UNIST_GDN_FCN	Ulsan National Institute of Science and Technology (UNIST)	Vladimir Nekrasov, Janghoon Ju, Jaesik Choi	We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over baseline FCN-32s. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930.	2016-07-27 01:39:17
Global Deconvolutional Network	UNIST_GDN_FCN_FC	Ulsan National Institute of Science and Technology (UNIST)	Vladimir Nekrasov, Janghoon Ju, Jaesik Choi	We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. In addition, we append a fully-connected layer after the down-sampled image to refine current predictions. Our model shows superior performance over the baseline FCN-32s and even outperforms the more powerful multi-scale variant. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930.	2016-07-27 01:49:02
Fully convolutional neural net using VGG19	VGG19_FCN	-	Sharif Amit Kamran, Md. Asif Bin Khaled, Sabit Bin Kabir, Dr. Hasan Muhammad, Moin Mostakim	We use the VGG-19 classification neural net and make it fully convolutional. Moreover, we use skip architectures by concatenating upsampled pool 1 to 4 with the score layer to get finer features. Training was done in two stages: first on the Pascal VOC training dataset, then on both the SBD training and validation datasets.	2017-04-06 23:22:53
VPNeXt	VPNeXt	UESTC	Ye Huang	VPNeXt	2025-02-10 11:30:18
Weakly Supervised Semantic Segmentation	WeakTr_CRF_SAM_M2F_SwinL	University of Information Technology	Huynh Yen Nhi, Nguyen Tran Khuong An	https://arxiv.org/abs/2304.01184	2025-06-08 17:49:52
CNN segmentation based on manifold learning	Weak_manifold_CNN	University of Central Florida	Marzieh Edraki	CNN manifold learning for segmentation	2016-11-11 23:34:20
FLATTENET	XC-FLATTENET	Sichuan University, Chengdu, China	Xin Cai	It is well-known that the reduced feature resolution due to repeated subsampling operations poses a serious challenge to Fully Convolutional Network (FCN) based models. In contrast to the commonly-used strategies, such as dilated convolution and encoder-decoder structure, we introduce a novel Flattening Module to produce high-resolution predictions without either removing any subsampling operations or building a complicated decoder module. https://ieeexplore.ieee.org/document/8932465/metrics#metrics	2020-01-17 07:46:18
new ConcatASPP	Xception65_ConcatASPP_Decoder	Tianjin University and Nankai University	Xiu Su, Hongyan Xu, Hong Kang	a new ASPP method	2019-07-26 02:23:38
deeplabv3+ resnet50	deeplabv3+ resnet50	Northwestern Polytechnical University	Liying Gao, Peng Wang	deeplabv3+ resnet50	2018-12-11 13:36:13
deeplabv3+ resnet50	deeplabv3+ resnet50	Northwestern Polytechnical University	Liying Gao, Peng Wang	weakly supervised segmentation, replace FCN by deeplabv3+	2018-12-11 13:32:23
deeplabv3+ vgg16	deeplabv3+ vgg16	Northwestern Polytechnical University	Liying Gao, Peng Wang	deeplabv3+ vgg16 63.69 val	2018-12-12 08:46:27
deeplabv3+ vgg16	deeplabv3+ vgg16	Northwestern Polytechnical University	Liying Gao, Peng Wang	deeplabv3+ vgg16 63.69 val	2018-12-12 07:54:27
dsanet	dsanet	dsanet	dsanet	dsanet	2019-11-23 03:51:33
dscnn	dscnn	jw	jw	dscnn	2018-05-25 19:49:13
fdsf	fdsf	fsdf	fsdf	fsdf	2018-11-22 01:07:09
high resolution network baseline	hrnet_baseline	UCAS	xiaoyang	In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise.	2020-01-26 05:12:51
MFF Network	multi-scale feature fusion network	Shenzhen University	Sijun Dong, Di Lin	We proposed a novel network that makes full use of context information for semantic segmentation.	2018-11-26 13:04:53
fast laddernet	resnet 101 + fast laddernet	Yale University	Juntang Zhuang	resnet 101 + fast laddernet	2018-10-29 19:53:41
resnet38	resnet38_deeplab	Tsinghua University	Chen Qian	waiting for submission	2021-11-06 01:49:46
Semi-supervised seg with weak masks	weak_semi_seg	Xiamen University	Lin Cheng	Semi-supervised segmentation with weak masks. We use 1.4k strong masks and 9k weak masks with class labels.	2021-07-03 08:34:39
mixup	xing	china	123	123	2020-07-10 10:36:10
