PASCAL VOC Challenge performance evaluation and download server
method | mean | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor | submission date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
**SegNeXt** [?] | 90.6 | 98.3 | 85.0 | 97.6 | 88.3 | 91.3 | 97.5 | 91.4 | 98.3 | 60.4 | 96.7 | 85.0 | 95.7 | 98.2 | 94.2 | 92.7 | 82.5 | 97.3 | 77.7 | 93.1 | 84.3 | 19-Sep-2022 |
**AGV BANA RES NAL** [?] | 71.7 | 81.6 | 36.6 | 86.2 | 58.7 | 76.8 | 78.6 | 82.0 | 87.3 | 34.4 | 79.3 | 63.8 | 82.6 | 79.7 | 78.5 | 79.8 | 56.5 | 84.5 | 55.3 | 70.7 | 60.3 | 31-Jan-2022 |
**AGV BANA VGG NAL attempt 5** [?] | 65.6 | 77.1 | 31.6 | 72.1 | 54.8 | 63.8 | 82.8 | 76.0 | 82.0 | 26.6 | 65.0 | 58.5 | 75.5 | 64.2 | 75.8 | 70.9 | 54.0 | 76.4 | 44.6 | 74.9 | 60.1 | 30-Jan-2022 |
**resnet38_deeplab** [?] | 71.4 | 89.1 | 37.3 | 84.6 | 56.4 | 68.2 | 90.8 | 83.7 | 89.0 | 28.4 | 84.7 | 47.0 | 84.7 | 87.1 | 80.2 | 77.1 | 49.3 | 87.0 | 49.8 | 75.6 | 56.2 | 06-Nov-2021 |
**PSP_flow** [?] | 79.4 | 86.2 | 44.2 | 93.4 | 72.1 | 75.8 | 93.7 | 91.2 | 95.0 | 38.6 | 86.7 | 63.9 | 89.0 | 89.4 | 90.4 | 88.4 | 64.4 | 91.8 | 60.9 | 82.6 | 73.8 | 13-Jul-2021 |
**weak_semi_seg** [?] | 78.6 | 92.2 | 62.0 | 90.0 | 64.8 | 77.1 | 93.3 | 84.8 | 91.4 | 31.4 | 89.1 | 73.3 | 88.0 | 87.7 | 86.1 | 84.5 | 65.4 | 85.4 | 56.9 | 85.1 | 67.8 | 03-Jul-2021 |
**HamNet_w/o_COCO** [?] | 85.9 | 96.8 | 74.6 | 96.5 | 75.3 | 79.6 | 97.4 | 93.4 | 97.3 | 42.5 | 94.0 | 76.1 | 95.3 | 96.3 | 91.0 | 91.0 | 78.4 | 93.2 | 68.7 | 90.0 | 80.7 | 25-Jan-2021 |
**SpDConv2** [?] | 88.1 | 96.9 | 79.7 | 96.8 | 80.2 | 87.8 | 98.0 | 92.3 | 96.0 | 57.2 | 95.8 | 82.1 | 92.3 | 97.3 | 93.6 | 93.0 | 71.4 | 92.3 | 75.8 | 90.7 | 83.8 | 06-Jan-2021 |
**CTNet** [?] | 85.3 | 96.1 | 75.9 | 96.8 | 78.0 | 82.4 | 95.3 | 92.3 | 96.7 | 42.0 | 93.8 | 71.2 | 93.8 | 95.0 | 90.5 | 90.6 | 77.9 | 95.2 | 62.9 | 89.5 | 78.4 | 29-Oct-2020 |
**xing** [?] | 81.5 | 95.5 | 42.1 | 94.4 | 75.3 | 77.9 | 96.0 | 92.4 | 94.6 | 42.4 | 94.8 | 59.1 | 92.3 | 95.1 | 88.8 | 88.9 | 68.8 | 94.7 | 56.5 | 88.9 | 77.0 | 10-Jul-2020 |
**EfficientNet-L2 + NAS-FPN + Noisy Student** [?] | 90.5 | 98.0 | 84.8 | 89.6 | 88.2 | 91.0 | 98.3 | 93.0 | 98.5 | 57.5 | 98.4 | 81.8 | 98.4 | 98.0 | 95.8 | 93.2 | 83.2 | 97.8 | 75.0 | 91.8 | 90.0 | 15-Jun-2020 |
**CaCNet** [?] | 87.5 | 97.1 | 80.3 | 96.1 | 79.7 | 86.7 | 97.2 | 93.8 | 96.4 | 45.5 | 95.0 | 82.1 | 92.7 | 97.0 | 94.6 | 91.8 | 78.2 | 95.4 | 65.7 | 92.3 | 82.2 | 29-May-2020 |
**A new feature fusion method: FillIn** [?] | 88.0 | 97.1 | 80.8 | 96.7 | 77.6 | 89.2 | 97.4 | 92.2 | 96.9 | 58.3 | 94.3 | 79.4 | 93.1 | 97.3 | 94.4 | 93.2 | 73.6 | 93.0 | 72.6 | 89.7 | 83.4 | 25-May-2020 |
**Res2Net** [?] | 85.3 | 96.1 | 77.6 | 96.1 | 77.3 | 84.5 | 96.7 | 92.5 | 95.0 | 40.5 | 91.9 | 78.3 | 92.2 | 93.7 | 92.7 | 89.6 | 77.6 | 93.7 | 63.5 | 87.3 | 78.6 | 22-Feb-2020 |
**hrnet_baseline** [?] | 79.3 | 93.8 | 43.5 | 84.8 | 63.9 | 82.4 | 92.8 | 91.0 | 93.8 | 45.6 | 88.0 | 61.4 | 90.0 | 90.2 | 88.0 | 88.1 | 66.8 | 91.1 | 53.3 | 87.1 | 74.4 | 26-Jan-2020 |
**XC-FLATTENET** [?] | 85.7 | 96.5 | 79.2 | 95.5 | 75.3 | 84.3 | 95.9 | 91.3 | 93.9 | 45.1 | 95.9 | 79.2 | 88.8 | 96.7 | 91.6 | 91.1 | 75.7 | 94.0 | 62.8 | 87.7 | 82.6 | 17-Jan-2020 |
**DCANet** [?] | 84.4 | 96.0 | 44.8 | 95.1 | 75.1 | 85.8 | 97.2 | 91.0 | 95.0 | 47.5 | 94.5 | 75.8 | 93.9 | 96.0 | 92.2 | 89.7 | 74.5 | 95.4 | 66.3 | 91.1 | 79.8 | 13-Jan-2020 |
**dsanet** [?] | 83.0 | 93.5 | 66.0 | 95.3 | 77.4 | 82.4 | 95.4 | 91.8 | 95.4 | 36.1 | 92.0 | 74.2 | 92.0 | 93.3 | 90.3 | 88.4 | 73.8 | 92.3 | 57.5 | 87.0 | 73.5 | 23-Nov-2019 |
**KSAC-H** [?] | 88.1 | 97.2 | 79.9 | 96.3 | 76.5 | 86.5 | 97.5 | 94.5 | 96.9 | 54.8 | 95.3 | 81.4 | 93.7 | 97.2 | 94.0 | 92.8 | 77.3 | 94.4 | 73.5 | 91.1 | 83.4 | 26-Oct-2019 |
**RecoNet152_coco** [?] | 89.0 | 97.3 | 80.4 | 96.5 | 83.8 | 89.5 | 97.6 | 95.4 | 97.7 | 50.1 | 96.8 | 82.6 | 95.1 | 97.7 | 95.1 | 92.6 | 80.2 | 95.2 | 71.7 | 92.1 | 83.8 | 26-Oct-2019 |
**SepaNet** [?] | 88.3 | 97.2 | 80.2 | 96.2 | 80.0 | 89.2 | 97.3 | 94.7 | 97.7 | 48.6 | 95.0 | 81.6 | 95.2 | 97.5 | 95.1 | 92.7 | 79.5 | 95.4 | 68.8 | 90.9 | 83.4 | 25-Oct-2019 |
**EfficientNet_MSCID_Segmentation** [?] | 78.9 | 92.1 | 42.1 | 91.6 | 73.8 | 80.7 | 93.8 | 88.1 | 91.6 | 38.7 | 84.3 | 68.5 | 90.3 | 88.7 | 86.3 | 84.8 | 64.7 | 87.3 | 58.6 | 85.3 | 71.4 | 15-Aug-2019 |
**EMANet152** [?] | 88.2 | 96.8 | 79.4 | 96.0 | 83.6 | 88.1 | 97.1 | 95.0 | 96.6 | 49.4 | 95.4 | 77.8 | 94.8 | 96.8 | 95.1 | 92.0 | 79.3 | 95.9 | 68.5 | 91.7 | 85.6 | 15-Aug-2019 |
**Xception65_ConcatASPP_Decoder** [?] | 83.5 | 94.3 | 44.9 | 92.8 | 77.4 | 85.5 | 96.7 | 91.1 | 94.6 | 51.0 | 91.9 | 71.8 | 91.2 | 95.3 | 92.8 | 90.5 | 69.6 | 91.7 | 66.3 | 88.3 | 80.7 | 26-Jul-2019 |
**CFNet** [?] | 87.2 | 96.7 | 79.7 | 94.3 | 78.4 | 83.0 | 97.7 | 91.6 | 96.7 | 50.1 | 95.3 | 79.6 | 93.6 | 97.2 | 94.2 | 91.7 | 78.4 | 95.4 | 69.6 | 90.0 | 81.4 | 12-Jun-2019 |
**APDN** [?] | 86.4 | 94.5 | 65.4 | 94.2 | 82.7 | 88.1 | 95.7 | 91.7 | 95.7 | 45.5 | 94.3 | 82.8 | 93.8 | 94.8 | 92.4 | 91.7 | 73.7 | 93.4 | 72.8 | 91.9 | 82.4 | 28-May-2019 |
**MasksegNet** [?] | 81.0 | 95.3 | 43.9 | 93.4 | 72.9 | 80.5 | 91.1 | 86.1 | 91.9 | 44.2 | 87.7 | 65.8 | 90.9 | 93.2 | 92.4 | 90.2 | 72.0 | 92.0 | 60.6 | 86.3 | 74.4 | 16-May-2019 |
**DeepLab_XI** [?] | 81.6 | 96.2 | 45.0 | 94.9 | 76.3 | 82.1 | 96.1 | 83.2 | 95.0 | 47.9 | 94.1 | 51.2 | 92.7 | 96.4 | 89.3 | 90.9 | 58.9 | 92.4 | 68.2 | 90.1 | 76.9 | 07-May-2019 |
**LDN-161** [?] | 83.6 | 93.4 | 76.6 | 92.7 | 70.9 | 77.6 | 96.7 | 90.2 | 96.3 | 47.8 | 91.2 | 72.6 | 92.8 | 93.0 | 88.7 | 88.1 | 72.6 | 90.9 | 63.5 | 89.4 | 74.4 | 18-Apr-2019 |
**DREN** [?] | 83.5 | 94.7 | 70.6 | 94.1 | 73.6 | 82.5 | 95.4 | 87.7 | 92.3 | 44.2 | 90.2 | 75.1 | 89.7 | 94.5 | 90.4 | 88.9 | 68.3 | 91.3 | 67.6 | 87.9 | 77.1 | 29-Mar-2019 |
**ESPNetv2** [?] | 68.0 | 87.5 | 36.9 | 75.9 | 64.0 | 63.8 | 87.2 | 73.7 | 76.5 | 26.7 | 70.3 | 57.5 | 68.9 | 70.6 | 82.9 | 78.9 | 48.1 | 76.4 | 46.9 | 77.7 | 64.1 | 23-Mar-2019 |
**DP-CAN_decoder** [?] | 85.5 | 95.9 | 77.8 | 91.6 | 75.0 | 81.7 | 96.6 | 92.4 | 97.1 | 42.7 | 93.5 | 74.1 | 93.9 | 95.0 | 91.4 | 91.2 | 78.1 | 94.6 | 66.5 | 89.8 | 79.1 | 26-Jan-2019 |
**FCN16s-Resnet101** [?] | 71.0 | 83.9 | 49.3 | 79.1 | 56.6 | 70.4 | 87.5 | 82.7 | 84.9 | 27.0 | 74.1 | 53.6 | 79.9 | 76.7 | 81.9 | 81.7 | 55.3 | 76.9 | 50.8 | 79.0 | 66.6 | 26-Jan-2019 |
**DP-CAN** [?] | 84.6 | 96.5 | 77.7 | 87.6 | 73.9 | 79.9 | 96.8 | 92.9 | 95.7 | 40.8 | 92.9 | 74.0 | 91.7 | 95.0 | 92.5 | 89.7 | 77.2 | 94.6 | 64.6 | 90.2 | 77.1 | 25-Jan-2019 |
**Auto-DeepLab-L** [?] | 85.6 | 96.5 | 77.3 | 94.8 | 74.1 | 84.0 | 97.1 | 88.7 | 94.5 | 53.5 | 91.6 | 79.2 | 88.4 | 94.2 | 90.2 | 91.2 | 75.1 | 90.1 | 70.7 | 89.1 | 79.7 | 11-Jan-2019 |
**deeplabv3+ vgg16** [?] | 63.9 | 84.6 | 31.2 | 78.8 | 19.0 | 64.1 | 87.9 | 74.3 | 87.7 | 24.7 | 77.5 | 49.6 | 83.3 | 81.8 | 82.4 | 66.2 | 54.1 | 80.1 | 44.6 | 44.0 | 39.7 | 12-Dec-2018 |
**deeplabv3+ vgg16** [?] | 64.3 | 85.0 | 32.1 | 83.5 | 19.4 | 63.8 | 88.7 | 73.7 | 88.5 | 24.4 | 76.9 | 49.5 | 82.3 | 79.8 | 82.2 | 66.0 | 56.3 | 81.4 | 44.6 | 46.6 | 39.8 | 12-Dec-2018 |
**deeplabv3+ resnet50** [?] | 65.2 | 77.9 | 33.4 | 86.1 | 19.6 | 63.8 | 84.1 | 74.9 | 90.1 | 27.9 | 81.2 | 48.3 | 85.5 | 85.8 | 81.8 | 69.6 | 47.8 | 84.5 | 44.7 | 41.2 | 53.9 | 11-Dec-2018 |
**deeplabv3+ resnet50** [?] | 64.6 | 78.7 | 32.9 | 79.7 | 19.5 | 67.8 | 88.0 | 75.5 | 89.6 | 24.7 | 80.6 | 46.1 | 85.1 | 83.8 | 83.1 | 65.5 | 48.1 | 83.7 | 44.0 | 41.3 | 52.8 | 11-Dec-2018 |
**multi-scale feature fusion network** [?] | 83.6 | 96.0 | 76.2 | 95.4 | 70.7 | 82.1 | 95.0 | 90.4 | 92.7 | 40.2 | 92.5 | 75.7 | 88.6 | 96.1 | 91.0 | 88.4 | 72.2 | 92.7 | 60.7 | 85.3 | 76.8 | 26-Nov-2018 |
**fdsf** [?] | 73.9 | 90.1 | 39.9 | 85.7 | 60.8 | 70.6 | 87.4 | 86.6 | 89.6 | 32.2 | 77.6 | 58.0 | 85.8 | 84.8 | 82.9 | 82.8 | 58.5 | 87.3 | 47.6 | 84.0 | 66.8 | 22-Nov-2018 |
**resnet 101 + fast laddernet** [?] | 84.2 | 95.4 | 73.9 | 94.9 | 75.7 | 83.2 | 96.3 | 91.2 | 93.9 | 35.3 | 90.0 | 79.4 | 90.2 | 94.2 | 92.8 | 90.1 | 73.2 | 92.3 | 64.5 | 88.0 | 77.5 | 29-Oct-2018 |
**Gluon DeepLabV3 152** [?] | 86.7 | 96.5 | 74.3 | 96.1 | 80.2 | 85.2 | 97.0 | 93.8 | 96.4 | 49.7 | 93.6 | 77.6 | 95.1 | 95.3 | 93.9 | 89.6 | 75.8 | 94.4 | 70.8 | 89.7 | 78.7 | 03-Oct-2018 |
**GluonCV DeepLabV3** [?] | 86.2 | 96.3 | 69.7 | 93.5 | 76.2 | 86.5 | 96.5 | 92.2 | 95.8 | 47.8 | 95.0 | 81.6 | 93.0 | 96.0 | 91.2 | 90.7 | 77.1 | 94.7 | 68.9 | 89.3 | 81.7 | 07-Sep-2018 |
**GluonCV PSP** [?] | 85.1 | 95.7 | 70.9 | 92.8 | 75.6 | 85.0 | 96.5 | 91.7 | 95.0 | 41.8 | 92.3 | 78.8 | 90.4 | 95.6 | 93.4 | 90.6 | 76.1 | 93.5 | 66.7 | 89.5 | 78.4 | 07-Sep-2018 |
**GluonCV FCN** [?] | 83.6 | 94.8 | 59.5 | 94.6 | 71.5 | 81.9 | 95.6 | 91.2 | 93.9 | 42.1 | 91.3 | 77.0 | 91.5 | 93.2 | 91.0 | 90.0 | 74.0 | 92.5 | 68.1 | 88.6 | 77.2 | 07-Sep-2018 |
**DFPnet** [?] | 71.0 | 88.4 | 37.6 | 83.3 | 52.7 | 75.8 | 89.1 | 85.8 | 89.3 | 31.6 | 65.9 | 33.7 | 83.5 | 75.3 | 82.3 | 82.8 | 60.5 | 75.9 | 52.6 | 80.5 | 70.5 | 26-Aug-2018 |
**AAF_PSPNet** [?] | 82.2 | 91.3 | 72.9 | 90.7 | 68.2 | 77.7 | 95.5 | 90.7 | 94.7 | 40.9 | 89.5 | 72.6 | 91.6 | 94.1 | 88.3 | 88.8 | 67.3 | 92.9 | 62.6 | 85.2 | 74.0 | 21-Aug-2018 |
MSCI [?] | 88.0 | 96.8 | 76.8 | 97.0 | 80.6 | 89.3 | 97.4 | 93.8 | 97.1 | 56.7 | 94.3 | 78.3 | 93.5 | 97.1 | 94.0 | 92.8 | 72.3 | 92.6 | 73.6 | 90.8 | 85.4 | 08-Jul-2018 | |
Score Map Pyramid Net [?] | 69.3 | 80.9 | 38.5 | 79.0 | 58.5 | 68.6 | 83.2 | 80.0 | 85.7 | 31.0 | 66.1 | 56.2 | 76.2 | 71.0 | 81.1 | 81.6 | 54.9 | 74.6 | 49.4 | 75.9 | 68.9 | 06-Jul-2018 | |
PAN [?] | 84.0 | 95.7 | 75.2 | 94.0 | 73.7 | 79.6 | 96.4 | 93.7 | 94.1 | 40.5 | 93.3 | 72.4 | 89.1 | 94.1 | 91.6 | 89.5 | 73.6 | 93.2 | 62.8 | 87.3 | 78.6 | 04-Jul-2018 | |
FSSI300 [?] | 75.1 | 91.1 | 42.6 | 89.1 | 66.4 | 69.2 | 92.5 | 88.5 | 86.8 | 33.2 | 79.2 | 63.2 | 82.4 | 81.4 | 86.9 | 82.1 | 58.1 | 83.2 | 53.0 | 83.1 | 71.5 | 21-Jun-2018 | |
Feature_Pyramids [?] | 81.0 | 93.9 | 60.2 | 86.8 | 70.7 | 75.3 | 92.9 | 91.3 | 92.0 | 42.7 | 90.0 | 71.3 | 88.7 | 92.9 | 88.8 | 89.3 | 60.7 | 88.3 | 65.7 | 87.7 | 76.2 | 06-Jun-2018 | |
ResSegNet [?] | 80.4 | 93.6 | 65.2 | 92.4 | 67.0 | 74.9 | 93.9 | 88.5 | 92.8 | 37.4 | 88.8 | 72.7 | 89.1 | 91.9 | 88.7 | 86.6 | 68.6 | 85.9 | 59.1 | 82.0 | 73.3 | 28-May-2018 | |
dscnn [?] | 81.2 | 94.0 | 58.5 | 91.3 | 69.2 | 78.2 | 95.5 | 89.8 | 92.9 | 38.5 | 90.3 | 70.2 | 90.8 | 93.5 | 87.0 | 87.4 | 63.4 | 89.5 | 65.1 | 88.9 | 75.8 | 25-May-2018 | |
DeepLabv3+_AASPP [?] | 88.5 | 97.4 | 80.3 | 97.1 | 80.1 | 89.3 | 97.4 | 94.1 | 96.9 | 61.9 | 95.1 | 77.2 | 94.2 | 97.5 | 94.4 | 93.0 | 72.4 | 93.8 | 72.6 | 93.3 | 83.3 | 22-May-2018 | |
ExFuse [?] | 87.9 | 96.8 | 80.3 | 97.0 | 82.5 | 87.8 | 96.3 | 92.6 | 96.4 | 53.3 | 94.3 | 78.4 | 94.1 | 94.9 | 91.6 | 92.3 | 81.7 | 94.8 | 70.3 | 90.1 | 83.8 | 22-May-2018 | |
TKCNet [?] | 83.2 | 94.7 | 46.5 | 94.9 | 77.7 | 83.7 | 92.6 | 92.2 | 94.9 | 45.3 | 91.1 | 72.4 | 90.7 | 95.8 | 91.6 | 90.3 | 69.9 | 93.8 | 62.1 | 88.7 | 82.5 | 20-Apr-2018 | |
SRC-B-MachineLearningLab [?] | 88.5 | 97.2 | 78.6 | 97.1 | 80.6 | 89.7 | 97.4 | 93.7 | 96.7 | 59.1 | 95.4 | 81.1 | 93.2 | 97.5 | 94.2 | 92.9 | 73.5 | 93.3 | 74.2 | 91.0 | 85.0 | 19-Apr-2018 | |
FDNet_16s [?] | 84.0 | 95.4 | 77.9 | 95.9 | 69.1 | 80.6 | 96.4 | 92.6 | 95.5 | 40.5 | 92.6 | 70.6 | 93.8 | 93.1 | 90.4 | 89.9 | 71.2 | 92.7 | 63.1 | 88.5 | 77.7 | 22-Mar-2018 | |
EncNet [?] | 85.9 | 95.3 | 76.9 | 94.2 | 80.2 | 85.3 | 96.5 | 90.8 | 96.3 | 47.9 | 93.9 | 80.0 | 92.4 | 96.6 | 90.5 | 91.5 | 70.9 | 93.6 | 66.5 | 87.7 | 80.8 | 15-Mar-2018 | |
Metu_Unified_Net [?] | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 87.8 | - | - | - | - | - | 10-Mar-2018 | |
Curtin_Qilin [?] | 75.6 | 85.4 | 38.5 | 86.5 | 63.8 | 74.8 | 91.3 | 86.8 | 88.3 | 33.5 | 84.1 | 62.4 | 83.6 | 87.7 | 84.9 | 83.5 | 61.4 | 88.5 | 58.0 | 80.8 | 69.0 | 09-Mar-2018 | |
DeepLabv3+ [?] | 87.8 | 97.0 | 77.1 | 97.1 | 79.3 | 89.3 | 97.4 | 93.2 | 96.6 | 56.9 | 95.0 | 79.2 | 93.1 | 97.0 | 94.0 | 92.8 | 71.3 | 92.9 | 72.4 | 91.0 | 84.9 | 09-Feb-2018 | |
DeepLabv3+_JFT [?] | 89.0 | 97.5 | 77.9 | 96.2 | 80.4 | 90.8 | 98.3 | 95.5 | 97.6 | 58.8 | 96.1 | 79.2 | 95.0 | 97.3 | 94.1 | 93.8 | 78.5 | 95.5 | 74.4 | 93.8 | 81.6 | 09-Feb-2018 | |
DFN [?] | 86.2 | 96.4 | 78.6 | 95.5 | 79.1 | 86.4 | 97.1 | 91.4 | 95.0 | 47.7 | 92.9 | 77.2 | 91.0 | 96.7 | 92.2 | 91.7 | 76.5 | 93.1 | 64.4 | 88.3 | 81.2 | 15-Jan-2018 | |
HPN [?] | 85.8 | 94.1 | 67.0 | 95.2 | 81.9 | 88.3 | 95.5 | 90.4 | 95.9 | 40.0 | 92.7 | 82.5 | 91.7 | 95.3 | 92.6 | 91.6 | 73.6 | 94.1 | 69.4 | 91.1 | 81.9 | 13-Dec-2017 | |
MSRSegNet-UW [?] | 81.0 | 93.7 | 64.1 | 92.5 | 68.9 | 79.7 | 91.2 | 86.4 | 90.4 | 41.9 | 88.3 | 72.6 | 89.3 | 90.2 | 86.0 | 86.6 | 67.2 | 89.5 | 66.5 | 83.7 | 76.6 | 23-Nov-2017 | |
DIS [?] | 86.8 | 94.0 | 73.3 | 93.5 | 79.1 | 84.8 | 95.4 | 89.5 | 93.4 | 53.6 | 94.8 | 79.0 | 93.6 | 95.2 | 91.5 | 89.6 | 78.1 | 93.0 | 79.4 | 94.3 | 81.3 | 13-Sep-2017 | |
DeepLabv3-JFT [?] | 86.9 | 96.9 | 73.2 | 95.5 | 78.4 | 86.5 | 96.8 | 90.3 | 97.1 | 51.4 | 95.0 | 73.4 | 94.0 | 96.8 | 94.0 | 92.3 | 81.5 | 95.4 | 67.2 | 90.8 | 81.8 | 05-Aug-2017 | |
CASIA_IVA_SDN [?] | 86.6 | 96.9 | 78.6 | 96.0 | 79.6 | 84.1 | 97.1 | 91.9 | 96.6 | 48.5 | 94.3 | 78.9 | 93.6 | 95.5 | 92.1 | 91.1 | 75.0 | 93.8 | 64.8 | 89.0 | 84.6 | 29-Jul-2017 | |
Ladder_DenseNet [?] | 78.3 | 90.3 | 68.7 | 89.0 | 60.8 | 71.9 | 91.0 | 85.5 | 91.7 | 34.7 | 81.9 | 68.2 | 86.7 | 86.6 | 87.1 | 85.9 | 66.5 | 89.2 | 59.8 | 78.6 | 74.2 | 25-Jul-2017 | |
FCN-2s_Dilated_VGG16 [?] | 67.6 | 81.1 | 35.7 | 78.0 | 58.5 | 63.9 | 82.8 | 79.7 | 81.4 | 27.8 | 71.2 | 53.6 | 75.1 | 74.8 | 79.2 | 77.8 | 55.3 | 74.5 | 45.5 | 72.7 | 60.0 | 20-Jul-2017 | |
BlitzNet300 [?] | 75.5 | 91.5 | 40.4 | 82.6 | 64.5 | 71.7 | 93.3 | 85.2 | 84.9 | 41.8 | 79.1 | 70.6 | 79.3 | 82.7 | 86.6 | 84.2 | 55.3 | 81.0 | 60.1 | 85.6 | 71.6 | 19-Jul-2017 | |
BlitzNet512 [?] | 78.8 | 92.4 | 42.7 | 78.8 | 67.5 | 77.0 | 95.2 | 88.5 | 90.1 | 39.1 | 85.5 | 73.2 | 85.5 | 89.6 | 88.5 | 87.3 | 67.8 | 85.9 | 62.9 | 88.8 | 74.5 | 19-Jul-2017 | |
FCN-2s_Dilated_VGG19 [?] | 69.0 | 81.8 | 37.0 | 79.5 | 57.2 | 67.5 | 83.8 | 79.3 | 83.0 | 28.5 | 74.5 | 57.5 | 76.0 | 75.9 | 79.5 | 78.6 | 57.0 | 77.8 | 45.3 | 73.7 | 63.2 | 11-Jul-2017 | |
IDW-CNN [?] | 86.3 | 94.8 | 67.3 | 93.4 | 74.8 | 84.6 | 95.3 | 89.6 | 93.6 | 54.1 | 94.9 | 79.0 | 93.3 | 95.5 | 91.7 | 89.2 | 77.5 | 93.7 | 79.2 | 94.0 | 80.8 | 30-Jun-2017 | |
DeepLabv3 [?] | 85.7 | 96.4 | 76.6 | 92.7 | 77.8 | 87.6 | 96.7 | 90.2 | 95.4 | 47.5 | 93.4 | 76.3 | 91.4 | 97.2 | 91.0 | 92.1 | 71.3 | 90.9 | 68.9 | 90.8 | 79.3 | 20-Jun-2017 | |
Deep Layer Cascade (LC) [?] | 82.7 | 85.5 | 66.7 | 94.5 | 67.2 | 84.0 | 96.1 | 89.8 | 93.5 | 47.2 | 90.4 | 71.5 | 88.9 | 91.7 | 89.2 | 89.1 | 70.4 | 89.4 | 70.7 | 84.2 | 79.6 | 06-Apr-2017 | |
VGG19_FCN [?] | 68.1 | 81.7 | 35.9 | 79.8 | 57.5 | 66.9 | 84.1 | 79.6 | 80.8 | 28.2 | 72.1 | 53.3 | 74.0 | 72.1 | 78.5 | 78.2 | 55.5 | 76.7 | 43.4 | 73.8 | 65.1 | 06-Apr-2017 | |
BlitzNet [?] | 75.6 | 90.1 | 38.7 | 87.5 | 68.6 | 70.1 | 93.1 | 86.4 | 89.2 | 32.3 | 81.7 | 67.9 | 82.2 | 82.9 | 84.7 | 81.5 | 63.3 | 85.5 | 55.5 | 83.1 | 70.6 | 17-Mar-2017 | |
BlitzNet [?] | 73.9 | 91.4 | 40.4 | 76.4 | 62.6 | 74.8 | 91.1 | 86.2 | 85.2 | 35.6 | 83.1 | 59.0 | 77.9 | 84.6 | 84.1 | 80.6 | 57.2 | 86.5 | 56.1 | 78.8 | 67.4 | 17-Mar-2017 | |
Large_Kernel_Matters [?] | 83.6 | 95.3 | 68.7 | 94.1 | 72.6 | 82.4 | 96.0 | 89.3 | 93.0 | 47.8 | 89.6 | 70.8 | 89.2 | 93.3 | 90.1 | 91.2 | 72.0 | 89.8 | 67.8 | 88.9 | 76.9 | 16-Mar-2017 | |
ResNet_DUC_HDC [?] | 83.1 | 92.1 | 64.6 | 94.7 | 71.0 | 81.0 | 94.6 | 89.7 | 94.9 | 45.6 | 93.7 | 74.4 | 92.0 | 95.1 | 90.0 | 88.7 | 69.1 | 90.4 | 62.7 | 86.4 | 78.2 | 01-Mar-2017 | |
**ResNet-38_COCO** [?] | 84.9 | 96.2 | 75.2 | 95.4 | 74.4 | 81.7 | 93.7 | 89.9 | 92.5 | 48.2 | 92.0 | 79.9 | 90.1 | 95.5 | 91.8 | 91.2 | 73.0 | 90.5 | 65.4 | 88.7 | 80.6 | 22-Jan-2017 |
Multipath-RefineNet [?] | 84.2 | 95.0 | 73.2 | 93.5 | 78.1 | 84.8 | 95.6 | 89.8 | 94.1 | 43.7 | 92.0 | 77.2 | 90.8 | 93.4 | 88.6 | 88.1 | 70.1 | 92.9 | 64.3 | 87.7 | 78.8 | 17-Jan-2017 | |
ResNet-38_MS [?] | 83.1 | 95.2 | 72.5 | 95.1 | 70.8 | 78.5 | 91.7 | 90.0 | 92.4 | 41.9 | 90.8 | 73.9 | 90.6 | 93.8 | 90.5 | 89.5 | 72.6 | 89.8 | 63.2 | 87.8 | 79.1 | 09-Dec-2016 | |
PSPNet [?] | 85.4 | 95.8 | 72.7 | 95.0 | 78.9 | 84.4 | 94.7 | 92.0 | 95.7 | 43.1 | 91.0 | 80.3 | 91.3 | 96.3 | 92.3 | 90.1 | 71.5 | 94.4 | 66.9 | 88.8 | 82.0 | 06-Dec-2016 | |
RRF-4s [?] | 69.4 | 79.5 | 57.3 | 78.7 | 61.8 | 64.1 | 83.9 | 78.1 | 80.4 | 30.0 | 73.0 | 59.4 | 74.3 | 73.9 | 80.8 | 77.9 | 53.9 | 76.4 | 46.1 | 71.7 | 63.9 | 30-Nov-2016 | |
Weak_manifold_CNN [?] | 65.3 | 80.9 | 32.9 | 73.2 | 57.7 | 63.0 | 83.9 | 73.5 | 76.6 | 27.0 | 65.9 | 52.6 | 70.9 | 69.8 | 73.0 | 74.9 | 53.3 | 70.1 | 45.4 | 72.4 | 62.7 | 11-Nov-2016 | |
DP_ResNet_CRF [?] | 81.0 | 94.0 | 59.5 | 91.8 | 68.1 | 75.9 | 95.2 | 88.9 | 93.2 | 37.7 | 90.8 | 70.8 | 89.2 | 92.7 | 87.7 | 87.9 | 65.5 | 90.3 | 62.6 | 87.2 | 75.5 | 10-Nov-2016 | |
Dual-Multi-Reso-MR [?] | 72.4 | 87.6 | 40.3 | 80.6 | 62.9 | 71.3 | 88.1 | 84.4 | 84.7 | 29.6 | 77.8 | 58.5 | 80.0 | 81.0 | 85.4 | 82.1 | 55.0 | 83.8 | 48.2 | 80.3 | 65.3 | 03-Nov-2016 | |
HikSeg_COCO [?] | 81.4 | 95.0 | 64.2 | 91.5 | 79.0 | 78.7 | 93.4 | 88.4 | 94.3 | 45.8 | 89.6 | 65.2 | 90.6 | 92.8 | 88.7 | 87.5 | 62.4 | 88.4 | 56.4 | 86.2 | 75.3 | 02-Oct-2016 | |
OBP-HJLCN [?] | 80.4 | 92.7 | 54.8 | 91.6 | 68.0 | 76.9 | 95.7 | 89.3 | 92.6 | 35.2 | 89.0 | 69.3 | 89.4 | 92.7 | 87.9 | 87.5 | 66.8 | 88.5 | 62.2 | 86.1 | 76.2 | 13-Sep-2016 | |
SegModel [?] | 81.8 | 93.6 | 60.2 | 93.6 | 69.1 | 76.4 | 96.3 | 88.2 | 95.5 | 37.9 | 90.8 | 73.3 | 91.1 | 94.3 | 88.6 | 88.6 | 64.8 | 90.1 | 63.7 | 87.3 | 78.2 | 23-Aug-2016 | |
CentraleSupelec Deep G-CRF [?] | 80.2 | 92.9 | 61.2 | 91.0 | 66.3 | 77.7 | 95.3 | 88.9 | 92.4 | 33.8 | 88.4 | 69.1 | 89.8 | 92.9 | 87.7 | 87.5 | 62.6 | 89.9 | 59.2 | 87.1 | 74.2 | 12-Aug-2016 | |
CMT-FCN-ResNet-CRF [?] | 80.0 | 92.5 | 55.3 | 92.2 | 66.0 | 76.9 | 95.1 | 88.6 | 93.9 | 35.1 | 87.6 | 71.6 | 89.3 | 92.8 | 87.9 | 88.0 | 62.0 | 88.0 | 59.7 | 86.1 | 75.7 | 02-Aug-2016 | |
UNIST_GDN_CRF [?] | 73.2 | 87.9 | 37.8 | 88.8 | 64.5 | 70.7 | 87.7 | 81.3 | 87.1 | 32.5 | 76.7 | 66.6 | 80.3 | 76.6 | 82.2 | 82.3 | 57.9 | 84.5 | 55.9 | 78.5 | 64.2 | 29-Jul-2016 | |
UNIST_GDN_CRF_ENS [?] | 74.0 | 88.6 | 48.6 | 88.8 | 64.7 | 70.4 | 87.2 | 81.8 | 86.4 | 32.0 | 77.1 | 64.1 | 80.5 | 78.0 | 84.0 | 83.3 | 59.2 | 85.9 | 56.8 | 77.9 | 65.0 | 29-Jul-2016 | |
UNIST_GDN_FCN [?] | 62.2 | 74.5 | 31.9 | 66.7 | 49.7 | 60.5 | 76.9 | 75.9 | 76.0 | 22.9 | 57.6 | 54.5 | 73.0 | 59.4 | 75.0 | 73.7 | 51.0 | 67.5 | 43.3 | 70.0 | 56.4 | 27-Jul-2016 | |
UNIST_GDN_FCN_FC [?] | 64.4 | 75.6 | 31.5 | 69.2 | 51.6 | 62.9 | 78.8 | 76.7 | 78.7 | 24.6 | 61.7 | 60.3 | 74.5 | 62.6 | 76.1 | 74.3 | 51.5 | 70.6 | 47.3 | 74.0 | 58.4 | 27-Jul-2016 | |
DeepSqueeNet_CRF [?] | 70.1 | 85.7 | 37.4 | 83.4 | 59.7 | 67.8 | 85.2 | 79.8 | 81.4 | 27.9 | 72.3 | 60.4 | 76.5 | 78.2 | 82.7 | 78.8 | 57.3 | 78.6 | 49.0 | 77.6 | 61.0 | 21-Jul-2016 | |
DeepSqueeNet [?] | 65.7 | 76.1 | 34.3 | 76.4 | 56.0 | 62.0 | 82.7 | 75.4 | 78.3 | 25.6 | 64.3 | 58.8 | 73.3 | 69.3 | 79.3 | 76.7 | 53.2 | 72.1 | 46.2 | 69.3 | 59.1 | 20-Jul-2016 | |
LRR_4x_ResNet_COCO [?] | 79.3 | 92.4 | 45.1 | 94.6 | 65.2 | 75.8 | 95.1 | 89.1 | 92.3 | 39.0 | 85.7 | 70.4 | 88.6 | 89.4 | 88.6 | 86.6 | 65.8 | 86.2 | 57.4 | 85.7 | 77.3 | 18-Jul-2016 | |
FCN_CLC_MSP [?] | 70.8 | 86.2 | 40.1 | 83.9 | 57.8 | 64.7 | 87.9 | 81.3 | 85.9 | 28.3 | 80.0 | 61.9 | 80.7 | 82.5 | 79.7 | 80.2 | 54.7 | 81.3 | 39.3 | 78.9 | 59.2 | 01-Jul-2016 | |
LRR_4x_COCO [?] | 78.7 | 93.2 | 44.2 | 89.4 | 65.4 | 74.9 | 93.9 | 87.0 | 92.0 | 42.9 | 83.7 | 68.9 | 86.5 | 88.0 | 89.0 | 87.2 | 67.3 | 85.6 | 64.0 | 84.1 | 71.5 | 16-Jun-2016 | |
FER_WSSS_REGION_SCORE_POOL [?] | 38.0 | 33.1 | 21.7 | 27.7 | 17.7 | 38.4 | 55.8 | 38.3 | 57.9 | 13.6 | 37.4 | 29.2 | 43.9 | 39.1 | 52.4 | 44.4 | 30.2 | 48.7 | 26.4 | 31.8 | 36.3 | 14-Jun-2016 | |
LRR_4x_de_pyramid_VOC [?] | 75.9 | 91.8 | 41.0 | 83.0 | 62.3 | 74.3 | 93.0 | 86.8 | 88.7 | 36.6 | 81.8 | 63.4 | 84.7 | 85.9 | 85.1 | 83.1 | 62.0 | 84.6 | 55.6 | 84.9 | 70.0 | 07-Jun-2016 | |
Bayesian FCN [?] | 65.4 | 80.8 | 34.9 | 75.2 | 57.0 | 64.1 | 80.9 | 77.2 | 78.0 | 26.4 | 65.6 | 44.0 | 72.6 | 70.8 | 78.7 | 76.8 | 52.4 | 71.0 | 40.4 | 73.8 | 61.8 | 07-Jun-2016 | |
Bayesian Dilation Network [?] | 73.1 | 88.6 | 39.0 | 86.2 | 63.3 | 67.1 | 88.1 | 81.9 | 86.8 | 34.7 | 81.1 | 57.1 | 81.3 | 86.5 | 83.4 | 83.4 | 53.7 | 84.0 | 53.3 | 80.5 | 62.5 | 07-Jun-2016 | |
DeepLabv2-CRF [?] | 79.7 | 92.6 | 60.4 | 91.6 | 63.4 | 76.3 | 95.0 | 88.4 | 92.6 | 32.7 | 88.5 | 67.6 | 89.6 | 92.1 | 87.0 | 87.4 | 63.3 | 88.3 | 60.0 | 86.8 | 74.5 | 06-Jun-2016 | |
CASIA_SegResNet_CRF_COCO [?] | 79.3 | 93.8 | 42.2 | 93.1 | 68.6 | 75.3 | 95.3 | 88.8 | 92.5 | 36.5 | 84.3 | 64.2 | 86.8 | 87.8 | 87.5 | 88.5 | 69.2 | 89.7 | 64.1 | 86.8 | 74.6 | 03-Jun-2016 | |
CASIA_IVA_OASeg [?] | 78.3 | 93.8 | 41.9 | 89.4 | 67.5 | 71.5 | 94.6 | 85.3 | 89.5 | 38.1 | 88.4 | 64.8 | 87.0 | 90.5 | 84.9 | 83.3 | 67.5 | 86.9 | 68.1 | 83.4 | 74.0 | 21-May-2016 | |
Adelaide_VeryDeep_FCN_VOC [?] | 79.1 | 91.9 | 48.1 | 93.4 | 69.3 | 75.5 | 94.2 | 87.5 | 92.8 | 36.7 | 86.9 | 65.2 | 89.1 | 90.2 | 86.5 | 87.2 | 64.6 | 90.1 | 59.7 | 85.5 | 72.7 | 13-May-2016 | |
Oxford_TVG_HO_CRF [?] | 77.9 | 92.5 | 59.1 | 90.3 | 70.6 | 74.4 | 92.4 | 84.1 | 88.3 | 36.8 | 85.6 | 67.1 | 85.1 | 86.9 | 88.2 | 82.6 | 62.6 | 85.0 | 56.3 | 81.9 | 72.5 | 16-Mar-2016 | |
FCN-8s-heavy [?] | 67.2 | 82.4 | 36.1 | 75.6 | 61.5 | 65.4 | 83.4 | 77.2 | 80.1 | 27.9 | 66.8 | 51.5 | 73.6 | 71.9 | 78.9 | 77.1 | 55.3 | 73.4 | 44.3 | 74.0 | 63.2 | 06-Feb-2016 | |
DeepLab-CRF-Attention-DT [?] | 76.3 | 93.2 | 41.7 | 88.0 | 61.7 | 74.9 | 92.9 | 84.5 | 90.4 | 33.0 | 82.8 | 63.2 | 84.5 | 85.0 | 87.2 | 85.7 | 60.5 | 87.7 | 57.8 | 84.3 | 68.2 | 03-Feb-2016 | |
DeepLab-CRF-Attention [?] | 75.7 | 91.1 | 40.9 | 86.9 | 62.1 | 74.2 | 92.3 | 84.4 | 90.1 | 34.0 | 81.7 | 66.0 | 83.5 | 83.9 | 86.5 | 84.6 | 59.1 | 87.2 | 59.6 | 81.0 | 66.2 | 03-Feb-2016 | |
MERL_UMD_Deep_GCRF_COCO [?] | 74.8 | 89.9 | 42.6 | 90.0 | 65.0 | 69.2 | 89.9 | 83.9 | 88.2 | 31.3 | 81.8 | 66.4 | 82.9 | 81.1 | 85.7 | 83.4 | 58.4 | 88.4 | 56.7 | 77.7 | 64.3 | 15-Jan-2016 | |
CentraleSuperBoundaries++ [?] | 76.0 | 91.1 | 38.5 | 90.9 | 68.7 | 74.2 | 89.9 | 85.3 | 89.1 | 34.4 | 82.5 | 65.6 | 83.1 | 82.9 | 85.7 | 85.4 | 60.6 | 84.5 | 59.9 | 80.2 | 69.9 | 13-Jan-2016 | |
CCBM [?] | 72.3 | 87.8 | 46.7 | 79.0 | 63.6 | 70.5 | 83.7 | 75.5 | 86.9 | 31.0 | 81.9 | 61.3 | 81.5 | 85.9 | 81.1 | 76.5 | 58.7 | 77.7 | 50.4 | 76.6 | 69.8 | 29-Nov-2015 | |
**SegNet** [?] | 59.9 | 73.6 | 37.6 | 62.0 | 46.8 | 58.6 | 79.1 | 70.1 | 65.4 | 23.6 | 60.4 | 45.6 | 61.8 | 63.5 | 75.3 | 74.9 | 42.6 | 63.7 | 42.5 | 67.8 | 52.7 | 10-Nov-2015 |
Adelaide_Context_CNN_CRF_COCO [?] | 77.8 | 92.9 | 39.6 | 84.0 | 67.9 | 75.3 | 92.7 | 83.8 | 90.1 | 44.3 | 85.5 | 64.9 | 87.3 | 88.8 | 84.5 | 85.5 | 68.1 | 89.0 | 62.8 | 81.2 | 71.4 | 06-Nov-2015 | |
MERL_DEEP_GCRF [?] | 73.2 | 85.2 | 43.9 | 83.3 | 65.2 | 68.3 | 89.0 | 82.7 | 85.3 | 31.1 | 79.5 | 63.3 | 80.5 | 79.3 | 85.5 | 81.0 | 60.5 | 85.5 | 52.0 | 77.3 | 65.1 | 17-Oct-2015 | |
CUHK_DPN_COCO [?] | 77.5 | 89.0 | 61.6 | 87.7 | 66.8 | 74.7 | 91.2 | 84.3 | 87.6 | 36.5 | 86.3 | 66.1 | 84.4 | 87.8 | 85.6 | 85.4 | 63.6 | 87.3 | 61.3 | 79.4 | 66.4 | 22-Sep-2015 | |
Adelaide_Context_CNN_CRF_VOC [?] | 75.3 | 90.6 | 37.6 | 80.0 | 67.8 | 74.4 | 92.0 | 85.2 | 86.2 | 39.1 | 81.2 | 58.9 | 83.8 | 83.9 | 84.3 | 84.8 | 62.1 | 83.2 | 58.2 | 80.8 | 72.3 | 30-Aug-2015 | |
POSTECH_DeconvNet_CRF_VOC [?] | 74.8 | 90.0 | 40.8 | 84.2 | 67.3 | 70.7 | 90.9 | 84.8 | 87.4 | 34.8 | 83.0 | 58.7 | 82.3 | 87.1 | 86.9 | 82.4 | 64.5 | 84.6 | 54.9 | 77.5 | 64.1 | 18-Aug-2015 | |
Adelaide_Context_CNN_CRF_COCO [?] | 77.2 | 92.3 | 38.8 | 82.9 | 66.1 | 75.1 | 92.4 | 83.1 | 88.6 | 41.8 | 85.9 | 62.8 | 86.7 | 88.4 | 84.0 | 85.4 | 67.4 | 88.8 | 61.9 | 81.9 | 71.7 | 13-Aug-2015 | |
MSRA_BoxSup [?] | 75.2 | 89.8 | 38.0 | 89.2 | 68.9 | 68.0 | 89.6 | 83.0 | 87.7 | 34.4 | 83.6 | 67.1 | 81.5 | 83.7 | 85.2 | 83.5 | 58.6 | 84.9 | 55.8 | 81.2 | 70.7 | 18-May-2015 | |
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint [?] | 73.9 | 89.2 | 46.7 | 88.5 | 63.5 | 68.4 | 87.0 | 81.2 | 86.3 | 32.6 | 80.7 | 62.4 | 81.0 | 81.3 | 84.3 | 82.1 | 56.2 | 84.6 | 58.3 | 76.2 | 67.2 | 26-Apr-2015 | |
Oxford_TVG_CRF_RNN_COCO [?] | 74.7 | 90.4 | 55.3 | 88.7 | 68.4 | 69.8 | 88.3 | 82.4 | 85.1 | 32.6 | 78.5 | 64.4 | 79.6 | 81.9 | 86.4 | 81.8 | 58.6 | 82.4 | 53.5 | 77.4 | 70.1 | 22-Apr-2015 | |
Oxford_TVG_CRF_RNN_VOC [?] | 72.0 | 87.5 | 39.0 | 79.7 | 64.2 | 68.3 | 87.6 | 80.8 | 84.4 | 30.4 | 78.2 | 60.4 | 80.5 | 77.8 | 83.1 | 80.6 | 59.5 | 82.8 | 47.8 | 78.3 | 67.1 | 22-Apr-2015 | |
POSTECH_EDeconvNet_CRF_VOC [?] | 72.5 | 89.9 | 39.3 | 79.7 | 63.9 | 68.2 | 87.4 | 81.2 | 86.1 | 28.5 | 77.0 | 62.0 | 79.0 | 80.3 | 83.6 | 80.2 | 58.8 | 83.4 | 54.3 | 80.7 | 65.0 | 22-Apr-2015 | |
Hypercolumn [?] | 62.6 | 68.7 | 33.5 | 69.8 | 51.3 | 70.2 | 81.1 | 71.9 | 74.9 | 23.9 | 60.6 | 46.9 | 72.1 | 68.3 | 74.5 | 72.9 | 52.6 | 64.4 | 45.4 | 64.9 | 57.4 | 09-Apr-2015 | |
DeepLab-MSc-CRF-LargeFOV [?] | 71.6 | 84.4 | 54.5 | 81.5 | 63.6 | 65.9 | 85.1 | 79.1 | 83.4 | 30.7 | 74.1 | 59.8 | 79.0 | 76.1 | 83.2 | 80.8 | 59.7 | 82.2 | 50.4 | 73.1 | 63.7 | 02-Apr-2015 | |
TTI_zoomout_v2 [?] | 69.6 | 85.6 | 37.3 | 83.2 | 62.5 | 66.0 | 85.1 | 80.7 | 84.9 | 27.2 | 73.2 | 57.5 | 78.1 | 79.2 | 81.1 | 77.1 | 53.6 | 74.0 | 49.2 | 71.7 | 63.3 | 30-Mar-2015 | |
DeepLab-CRF-LargeFOV [?] | 70.3 | 83.5 | 36.6 | 82.5 | 62.3 | 66.5 | 85.4 | 78.5 | 83.7 | 30.4 | 72.9 | 60.4 | 78.5 | 75.5 | 82.1 | 79.7 | 58.2 | 82.0 | 48.8 | 73.7 | 63.3 | 28-Mar-2015 | |
DeepLab-CRF-COCO-LargeFOV [?] | 72.7 | 89.1 | 38.3 | 88.1 | 63.3 | 69.7 | 87.1 | 83.1 | 85.0 | 29.3 | 76.5 | 56.5 | 79.8 | 77.9 | 85.8 | 82.4 | 57.4 | 84.3 | 54.9 | 80.5 | 64.1 | 18-Mar-2015 | |
DeepLab-CRF-COCO-Strong [?] | 70.4 | 85.3 | 36.2 | 84.8 | 61.2 | 67.5 | 84.6 | 81.4 | 81.0 | 30.8 | 73.8 | 53.8 | 77.5 | 76.5 | 82.3 | 81.6 | 56.3 | 78.9 | 52.3 | 76.6 | 63.3 | 11-Feb-2015 | |
CRF_RNN [?] | 65.2 | 80.9 | 34.0 | 72.9 | 52.6 | 62.5 | 79.8 | 76.3 | 79.9 | 23.6 | 67.7 | 51.8 | 74.8 | 69.9 | 76.9 | 76.9 | 49.0 | 74.7 | 42.7 | 72.1 | 59.6 | 10-Feb-2015 | |
MSRA_BoxSup [?] | 71.0 | 86.4 | 35.5 | 79.7 | 65.2 | 65.2 | 84.3 | 78.5 | 83.7 | 30.5 | 76.2 | 62.6 | 79.3 | 76.1 | 82.1 | 81.3 | 57.0 | 78.2 | 55.0 | 72.5 | 68.1 | 10-Feb-2015 | |
DeepLab-CRF-MSc [?] | 67.1 | 80.4 | 36.8 | 77.4 | 55.2 | 66.4 | 81.5 | 77.5 | 78.9 | 27.1 | 68.2 | 52.7 | 74.3 | 69.6 | 79.4 | 79.0 | 56.9 | 78.8 | 45.2 | 72.7 | 59.3 | 30-Dec-2014 | |
DeepLab-CRF [?] | 66.4 | 78.4 | 33.1 | 78.2 | 55.6 | 65.3 | 81.3 | 75.5 | 78.6 | 25.3 | 69.2 | 52.7 | 75.2 | 69.0 | 79.1 | 77.6 | 54.7 | 78.3 | 45.1 | 73.3 | 56.2 | 23-Dec-2014 | |
MSRA_CFM [?] | 61.8 | 75.7 | 26.7 | 69.5 | 48.8 | 65.6 | 81.0 | 69.2 | 73.3 | 30.0 | 68.7 | 51.5 | 69.1 | 68.1 | 71.7 | 67.5 | 50.4 | 66.5 | 44.4 | 58.9 | 53.5 | 17-Dec-2014 | |
TTI_zoomout_16 [?] | 64.4 | 81.9 | 35.1 | 78.2 | 57.4 | 56.5 | 80.5 | 74.0 | 79.8 | 22.4 | 69.6 | 53.7 | 74.0 | 76.0 | 76.6 | 68.8 | 44.3 | 70.2 | 40.2 | 68.9 | 55.3 | 24-Nov-2014 | |
TTI_zoomout [?] | 58.4 | 70.3 | 31.9 | 68.3 | 46.4 | 52.1 | 75.3 | 68.4 | 75.3 | 19.2 | 58.4 | 49.9 | 69.6 | 63.0 | 70.1 | 67.6 | 41.5 | 64.0 | 34.9 | 64.2 | 47.3 | 17-Nov-2014 | |
FCN-8s [?] | 62.2 | 76.8 | 34.2 | 68.9 | 49.4 | 60.3 | 75.3 | 74.7 | 77.6 | 21.4 | 62.5 | 46.8 | 71.8 | 63.9 | 76.5 | 73.9 | 45.2 | 72.4 | 37.4 | 70.9 | 55.1 | 12-Nov-2014 | |
NUS_UDS [?] | 50.0 | 67.0 | 24.5 | 47.2 | 45.0 | 47.9 | 65.3 | 60.6 | 58.5 | 15.5 | 50.8 | 37.4 | 45.8 | 59.9 | 62.0 | 52.7 | 40.8 | 48.2 | 36.8 | 53.1 | 45.6 | 29-Oct-2014 | |
SDS [?] | 51.6 | 63.3 | 25.7 | 63.0 | 39.8 | 59.2 | 70.9 | 61.4 | 54.9 | 16.8 | 45.0 | 48.2 | 50.5 | 51.0 | 57.7 | 63.3 | 31.8 | 58.7 | 31.2 | 55.7 | 48.5 | 21-Jul-2014 | |
BONN_O2PCPMC_FGT_SEGM [?] | 47.8 | 64.0 | 27.3 | 54.1 | 39.2 | 48.7 | 56.6 | 57.7 | 52.5 | 14.2 | 54.8 | 29.6 | 42.2 | 58.0 | 54.8 | 50.2 | 36.6 | 58.6 | 31.6 | 48.4 | 38.6 | 08-Aug-2013 | |
TTIC-divmbest-rerank [?] | 48.1 | 62.7 | 25.6 | 46.9 | 43.0 | 54.8 | 58.4 | 58.6 | 55.6 | 14.6 | 47.5 | 31.2 | 44.7 | 51.0 | 60.9 | 53.5 | 36.6 | 50.9 | 30.1 | 50.2 | 46.8 | 15-Nov-2012 | |
BONN_O2PCPMC_FGT_SEGM [?] | 47.5 | 63.4 | 27.3 | 56.1 | 37.7 | 47.2 | 57.9 | 59.3 | 55.0 | 11.5 | 50.8 | 30.5 | 45.0 | 58.4 | 57.4 | 48.6 | 34.6 | 53.3 | 32.4 | 47.6 | 39.2 | 23-Sep-2012 | |
BONNGC_O2P_CPMC_CSI [?] | 46.8 | 63.6 | 26.8 | 45.6 | 41.7 | 47.1 | 54.3 | 58.6 | 55.1 | 14.5 | 49.0 | 30.9 | 46.1 | 52.6 | 58.2 | 53.4 | 32.0 | 44.5 | 34.6 | 45.3 | 43.1 | 23-Sep-2012 | |
BONN_CMBR_O2P_CPMC_LIN [?] | 46.7 | 63.9 | 23.8 | 44.6 | 40.3 | 45.5 | 59.6 | 58.7 | 57.1 | 11.7 | 45.9 | 34.9 | 43.0 | 54.9 | 58.0 | 51.5 | 34.6 | 44.1 | 29.9 | 50.5 | 44.5 | 23-Sep-2012 |
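For reference, the per-class scores in the table above are intersection-over-union (IoU) percentages, and the reported mean also covers the background class, which is not listed as a column; that is why averaging only the twenty listed class scores gives a slightly different number than the mean column. A minimal sketch of how per-class IoU and the mean are computed from a confusion matrix (a simplified illustration, not the official VOC devkit code; the labels below are hypothetical):

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Accumulate a num_classes x num_classes confusion matrix from flat
    label sequences; pixels labelled ignore_label (VOC's 'void' border
    regions) are excluded from the evaluation."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g != ignore_label:
            conf[g][p] += 1
    return conf

def per_class_iou(conf):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c) for each class c."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]                                  # true positives
        fn = sum(conf[c]) - tp                           # missed pixels of class c
        fp = sum(conf[r][c] for r in range(n)) - tp      # pixels wrongly called c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious

# Tiny worked example with 3 classes (255 marks a void pixel):
gt   = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]
ious = per_class_iou(confusion_matrix(gt, pred, num_classes=3))
mean_iou = 100.0 * sum(ious) / len(ious)   # reported as a percentage
```

The same computation at full scale runs over every pixel of every test image, with the background counted as class 0 alongside the twenty object classes.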
Title | Method | Affiliation | Contributors | Description | Date |
---|---|---|---|---|---|
DeepLabv3+ with Fillin fusion | A new feature fusion method: FillIn | Beijing University of Technology | Tian Liu Lichun Wang Shaofan Wang | https://arxiv.org/abs/1912.08059 The new version of our paper has not been updated yet. The feature fusion is actually a privileged operation: it is used only during training. | 2020-05-25 18:35:34 |
Adaptive Affinity Fields for Semantic Segmentation | AAF_PSPNet | UC Berkeley / ICSI | Tsung-Wei Ke*, Jyh-Jing Hwang*, Ziwei Liu, Stella X. Yu (* equal contribution) | Existing semantic segmentation methods mostly rely on per-pixel supervision, unable to capture structural regularity present in natural images. Instead of learning to enforce semantic labels on individual pixels, we propose to enforce affinity field patterns in individual pixel neighbourhoods, i.e., the semantic label patterns of whether neighbouring pixels are in the same segment should match between the prediction and the ground-truth. The affinity fields characterize geometric relationships within the image, such as "motorcycles have round wheels". We further develop a novel method for learning the optimal neighbourhood size for each semantic category, with an adversarial loss that optimizes over worst-case scenarios. Unlike the common Conditional Random Field (CRF) approaches, our adaptive affinity field (AAF) method has no extra parameters during inference, and is less sensitive to appearance changes in the image. | 2018-08-21 16:28:38 |
AGV BANA RES NAL | AGV BANA RES NAL | AGV BANA RES NAL | AGV BANA RES NAL | AGV BANA RES NAL | 2022-01-31 04:20:30 |
AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | AGV BANA VGG NAL attempt 5 | 2022-01-30 16:24:19 |
Adaptive Progressive Decision Network | APDN | UESTC | Hengcan Shi, Hongliang Li, Qingbo Wu | Adaptive Progressive Decision Network | 2019-05-28 08:03:53 |
Adelaide_Context_CNN_CRF_COCO | Adelaide_Context_CNN_CRF_COCO | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel; | Please refer to our technical report: http://arxiv.org/abs/1504.01013. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both CNNs and CRFs. | 2015-11-06 07:46:13 |
Adelaide_Context_CNN_CRF_COCO | Adelaide_Context_CNN_CRF_COCO | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel; | Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at: http://arxiv.org/abs/1504.01013). This technical report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations; and during the course of learning, dependencies between the output variables are taken into account. As in conventional DCNNs, the training of our model is performed in an end-to-end fashion using back-propagation. Different from directly maximizing likelihood, however, inference may be needed at each gradient descent iteration, which can be computationally very expensive since typically millions of iterations are required. To enable efficient training, we propose to use approximate training, namely, piecewise training of CRFs, avoiding repeated inference. | 2015-08-13 04:13:59 |
Adelaide_Context_CNN_CRF_VOC | Adelaide_Context_CNN_CRF_VOC | The University of Adelaide; ACRV; D2DCRC | Guosheng Lin; Chunhua Shen; Ian Reid; Anton van den Hengel; | Please refer to our technical report: Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation (available at: http://arxiv.org/abs/1504.01013). This technical report will be updated later. We explore contextual information to improve semantic image segmentation by taking advantage of the strength of both DCNNs and CRFs. Specifically, we train CRFs whose potential functions are modelled by fully convolutional neural networks (FCNNs). The resulting deep conditional random fields (DCRFs) are thus able to learn complex feature representations; and during the course of learning, dependencies between the output variables are taken into account. As in conventional DCNNs, the training of our model is performed in an end-to-end fashion using back-propagation. Different from directly maximizing likelihood, however, inference may be needed at each gradient descent iteration, which can be computationally very expensive since typically millions of iterations are required. To enable efficient training, we propose to use approximate training, namely, piecewise training of CRFs, avoiding repeated inference. | 2015-08-30 11:49:27 |
High-performance Very Deep FCN | Adelaide_VeryDeep_FCN_VOC | The University of Adelaide; D2DCRC | Zifeng Wu, Chunhua Shen, Anton van den Hengel | We propose a method for high-performance semantic image segmentation based on very deep fully convolutional networks. A few design factors are carefully examined to achieve the result. Details can be found in the paper "High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks" by Zifeng Wu, Chunhua Shen, Anton van den Hengel: http://arxiv.org/abs/1604.04339. Note that the system used for this submission was trained on the augmented VOC 2012 data ONLY. | 2016-05-13 04:57:00 |
Auto-DeepLab-L | Auto-DeepLab-L | Johns Hopkins University; Google Inc.; Stanford University | Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei | In this work, we study Neural Architecture Search for semantic image segmentation, an important computer vision task that assigns a semantic label to every pixel in an image. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction which exhibits a lot more network level architectural variations. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Without any ImageNet pretraining, our architecture searched specifically for semantic image segmentation attains state-of-the-art performance. Please refer to https://arxiv.org/abs/1901.02985 for details. | 2019-01-11 19:43:31 |
O2P Regressor + Composite Statistical Inference | BONNGC_O2P_CPMC_CSI | (1) University of Bonn, (2) Georgia Institute of Technology, (3) University of Coimbra | Joao Carreira (1,3) Fuxin Li (2) Guy Lebanon (2) Cristian Sminchisescu (1) | We utilize a novel probabilistic inference procedure (as yet unpublished), Composite Statistical Inference (CSI), on semantic segmentation using predictions on overlapping figure-ground hypotheses. Regressor predictions on segment overlaps to the ground truth object are modelled as generated by the true overlap with the ground truth segment plus noise. A model of ground truth overlap is defined by parametrizing on the unknown percentage of each superpixel that belongs to the unknown ground truth. A joint optimization on all the superpixels and all the categories is then performed in order to maximize the likelihood of the SVR predictions. The optimization has a tight convex relaxation so solutions can be expected to be close to the global optimum. A fast and optimal search algorithm is then applied to retrieve each object. CSI takes the intuition from the SVRSEGM inference algorithm that multiple predictions on similar segments can be combined to better consolidate the segment mask. But it fully develops the idea by constructing a probabilistic framework and performing composite MLE jointly on all segments and categories. Therefore it is able to consolidate better object boundaries and handle hard cases when objects interact closely and heavily occlude each other. For each image, we use 150 overlapping figure-ground hypotheses generated by the CPMC algorithm (Carreira and Sminchisescu, PAMI 2012), and linear SVR predictions on them with the novel second order O2P features (Carreira, Caseiro, Batista, Sminchisescu, ECCV2012; see VOC12 entry BONN_CMBR_O2P_CPMC_LIN) as the input to the inference algorithm. | 2012-09-23 23:49:02 |
Linear SVR with second-order pooling. | BONN_CMBR_O2P_CPMC_LIN | (1) University of Bonn, (2) University of Coimbra | Joao Carreira (2,1) Rui Caseiro (2) Jorge Batista (2) Cristian Sminchisescu (1) | We present a novel effective local feature aggregation method that we use in conjunction with an existing figure-ground segmentation sampling mechanism. This submission is described in detail in [1]. We sample multiple figure-ground segmentation candidates per image using the Constrained Parametric Min-Cuts (CPMC) algorithm. SIFT, masked SIFT and LBP features are extracted on the whole image, then pooled over each object segmentation candidate to generate global region descriptors. We employ a novel second-order pooling procedure, O2P, with two non-linearities: a tangent space mapping and power normalization. The global region descriptors are passed through linear regressors for each category, then labeled segments in each image having scores above some threshold are pasted onto the image in the order of these scores. Learning is performed using an epsilon-insensitive loss function on overlap with ground truth, similar to [2], but within a linear formulation (using LIBLINEAR). comp6: learning uses all images in the segmentation+detection trainval sets, and external ground truth annotations provided by courtesy of the Berkeley vision group. comp5: one model is trained for each category using the available ground truth segmentations from the 2012 trainval set. Then, on each image having no associated ground truth segmentations, the learned models are used together with bounding box constraints, low-level cues and region competition to generate predicted object segmentations inside all bounding boxes. Afterwards, learning proceeds similarly to the fully annotated case. 1. “Semantic Segmentation with Second-Order Pooling”, Carreira, Caseiro, Batista, Sminchisescu. ECCV 2012. 2. "Object Recognition by Ranking Figure-Ground Hypotheses", Li, Carreira, Sminchisescu. CVPR 2010. | 2012-09-23 19:11:47 |
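The second-order pooling (O2P) step described in this entry — pooling the outer products of local descriptors over a region, then applying a tangent-space (log-Euclidean) mapping and power normalization — can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation; the function name and the `eps`/`power` values are assumptions.

```python
import numpy as np

def o2p_pool(descriptors, eps=1e-6, power=0.75):
    """Second-order pooling of local descriptors over one region (sketch)."""
    X = np.asarray(descriptors, dtype=float)   # (n, d) local features
    G = X.T @ X / len(X)                       # (d, d) averaged outer products
    G += eps * np.eye(G.shape[0])              # regularize so the matrix log exists
    w, V = np.linalg.eigh(G)                   # symmetric eigendecomposition
    G_log = V @ np.diag(np.log(w)) @ V.T       # tangent-space (log-Euclidean) mapping
    v = G_log[np.triu_indices(G.shape[0])]     # upper triangle -> region descriptor
    return np.sign(v) * np.abs(v) ** power     # sign-preserving power normalization
```

The resulting vector would then be fed to a per-category linear SVR, as the entry describes.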
BONN_O2PCPMC_FGT_SEGM | BONN_O2PCPMC_FGT_SEGM | (1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology | Joao Carreira(1,2), Adrian Ion(4), Fuxin Li(3), Cristian Sminchisescu(1) | We present a joint image segmentation and labeling model which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales using CPMC (Carreira and Sminchisescu, PAMI 2012), constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag (Ion, Carreira, Sminchisescu, ICCV2011), followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure (Ion, Carreira, Sminchisescu, NIPS2011). As meta-features we combine outputs from linear SVRs using novel second order O2P features to predict the overlap between segments and ground-truth objects of each class (Carreira, Caseiro, Batista, Sminchisescu, ECCV2012; see VOC12 entry BONNCMBR_O2PCPMC_LINEAR), bounding box object detectors, and kernel SVR outputs trained to predict the overlap between segments and ground-truth objects of each class (Carreira, Li, Sminchisescu, IJCV 2012). comp6: the O2P SVR learning uses all images in the segmentation+detection trainval sets, and external ground truth annotations provided by courtesy of the Berkeley vision group. | 2012-09-23 21:39:35 |
BONN_O2PCPMC_FGT_SEGM | BONN_O2PCPMC_FGT_SEGM | (1) University of Bonn, (2) University of Coimbra, (3) Georgia Institute of Technology, (4) Vienna University of Technology | Joao Carreira(1,2), Adrian Ion(4), Fuxin Li(3), Cristian Sminchisescu(1) | Same as before, except tilings are non-maximal. | 2013-08-08 05:54:53 |
Bayesian Dilation Network | Bayesian Dilation Network | University of Cambridge | Alex Kendall | http://arxiv.org/abs/1511.02680 | 2016-06-07 08:28:00 |
Bayesian FCN | Bayesian FCN | University of Cambridge | Alex Kendall | http://mi.eng.cam.ac.uk/projects/segnet/ | 2016-06-07 08:36:38 |
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik Konstantin Shmelkov Julien Mairal Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:24:29 |
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik Konstantin Shmelkov Julien Mairal Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:22:43 |
FCN | BlitzNet300 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Operates with speed 24 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:57:45 |
FCN | BlitzNet512 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Operates with speed 19 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:38:53 |
Objectness-aware Semantic Segmentation | CASIA_IVA_OASeg | Institute of Automation, Chinese Academy of Sciences | Yuhang Wang, Jing Liu, Yong Li, Jun Fu, Hang Song, Hanqing Lu | We propose an objectness-aware semantic segmentation framework (OA-Seg) consisting of two deep networks. One is a lightweight deconvolutional neural network (Light-DCNN) which significantly reduces model size and convergence time while delivering impressive segmentation performance. The other is an object proposal network (OPN) used to roughly locate object regions. MSCOCO is used to extend the training data and a CRF is used as post-processing. | 2016-05-21 01:52:15 |
CASIA_IVA_SDN | CASIA_IVA_SDN | National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences | Jun Fu, Jing Liu, Yuhang Wang, Zhenwei Shen, Zhiwei Fang, Hanqing Lu | We propose a Stacked Deconvolutional Network (SDN) for semantic segmentation. We stack multiple SDN units to make network deeper and meanwhile, dense connections and hierarchical supervision are adopted to promote network optimization. CRF is not employed! | 2017-07-29 06:00:31 |
CASIA_SegResNet_CRF_COCO | CASIA_SegResNet_CRF_COCO | Institute of Automation, Chinese Academy of Sciences | Xinze Chen, Guangliang Cheng, Yinghao Cai | We propose a novel semantic segmentation method, which consists of three parts: a SAR-based data augmentation method, a deeper residual network including three effective techniques, and online hard pixel mining. We combine these three parts to train an end-to-end network. | 2016-06-03 09:20:50 |
CCBM | CCBM | Tsinghua University | Qiurui Wang, Chun Yuan, Zhihui Lin, Zhicheng Wang, Xin Qiu | We propose a method combining convolutional neural networks and Conditional Boltzmann Machines for object segmentation, called CCBM, which further utilizes a human visual border detection method. We use CNNs to extract features and segment them with improved Conditional Boltzmann Machines. We also use a Structured Random Forests based method to detect object borders for a better effect. Finally, each superpixel is labelled as output. The proposed method for this submission was trained on VOC 2012 Segmentation training data and a subset of COCO 2014 training data. | 2015-11-29 07:26:11 |
Co-occurrent Features in Semantic Segmentation | CFNet | Amazon | Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie | Recent work has achieved great success in utilizing global contextual information for semantic segmentation, including increasing the receptive field and aggregating pyramid feature representations. In this paper, we go beyond global context and explore the fine-grained representation using co-occurrent features by introducing Co-occurrent Feature Model, which predicts the distribution of co-occurrent features for a given target. To leverage the semantic context in the co-occurrent features, we build an Aggregated Co-occurrent Feature (ACF) Module by aggregating the probability of the co-occurrent feature within the co-occurrent context. ACF Module learns a fine-grained spatial invariant representation to capture co-occurrent context information across the scene. Our approach significantly improves the segmentation results using FCN and achieves superior performance 54.0% mIoU on Pascal Context, 87.2% mIoU on Pascal VOC 2012 and 44.89% mIoU on ADE20K datasets with ResNet-101 base network. | 2019-06-12 03:49:01 |
CMT-FCN-ResNet-CRF | CMT-FCN-ResNet-CRF | Intel Labs China and Tsinghua University | Libin Wang, Anbang Yao, Jianguo Li, Yurong Chen, Li Zhang | We propose a novel coupled multi-task FCN. Both the VOC 2012 and COCO datasets are used for training, and a CRF is applied as a post-processing step. | 2016-08-02 09:57:05 |
CRF as RNN | CRF_RNN | University of Oxford | Shuai Zheng | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses a conditional random field (CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. See the paper: "Conditional Random Fields as Recurrent Neural Networks". | 2015-02-10 11:03:16 |
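The CRF-RNN idea above — unrolling CRF mean-field inference so that each iteration becomes one recurrent step of message passing, a label-compatibility transform, a local (unary) update, and softmax normalization — can be sketched as follows. This is a simplified NumPy sketch, not the authors' code: a plain box filter stands in for the Gaussian/bilateral kernels, and `mu` is a fixed Potts compatibility matrix rather than a learned one.

```python
import numpy as np

def box_filter(img, r=2):
    """Cheap spatial smoothing, standing in for the CRF's Gaussian kernels."""
    k = 2 * r + 1
    p = np.pad(img, r, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def mean_field_step(unary, Q, mu):
    """One unrolled mean-field iteration (one 'RNN step'), simplified sketch.

    unary: (L, H, W) unary energies from the CNN; Q: (L, H, W) current
    marginals; mu: (L, L) label-compatibility matrix.
    """
    msg = np.stack([box_filter(Q[l]) for l in range(Q.shape[0])])  # message passing
    pairwise = np.tensordot(mu, msg, axes=(1, 0))                  # compatibility transform
    logits = -unary - pairwise                                     # local update
    logits -= logits.max(axis=0, keepdims=True)                    # stable softmax
    e = np.exp(logits)
    return e / e.sum(axis=0, keepdims=True)
```

In the actual CRF-RNN, a handful of such steps are stacked and the whole stack is trained end-to-end with back-propagation, as the entry describes.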
CTNet | CTNet | Nanjing University Of Science And Technology | CTNet | CTNet | 2020-10-29 01:38:27 |
Deep Parsing Network | CUHK_DPN_COCO | The Chinese University of Hong Kong | Ziwei Liu*, Xiaoxiao Li*, Ping Luo, Chen Change Loy, Xiaoou Tang | This work addresses semantic image segmentation by incorporating rich information into Markov Random Field (MRF), including high-order relations and mixture of label contexts. Unlike previous works that optimized MRFs using iterative algorithms, we solve MRF by proposing a Convolutional Neural Network (CNN), namely Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN architecture to model unary terms and additional layers are carefully devised to approximate the mean field algorithm (MF) for pairwise terms. It has several appealing properties. First, different from the recent works that combined CNN and MRF, where many iterations of MF were required for each training image during back-propagation, DPN is able to achieve high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing works its special cases. Third, DPN makes MF easier to be parallelized and sped up on the Graphics Processing Unit (GPU). The system used for this submission was trained on augmented VOC 2012 and MS-COCO 2014 training set. Please refer to the paper "Semantic Image Segmentation via Deep Parsing Network" (http://arxiv.org/abs/1509.02634) for further information. | 2015-09-22 16:52:27 |
Learning to Predict CaC for semantic segmentation | CaCNet | CUHK | Jianbo Liu, Junjun He, Jimmy S. Ren, Yu Qiao, Hongsheng Li | Long-range contextual information is essential for achieving high-performance semantic segmentation. Previous feature re-weighting methods demonstrate that using global context for re-weighting feature channels can effectively improve the accuracy of semantic segmentation. However, the globally-sharing feature re-weighting vector might not be optimal for regions of different classes in the input image. In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps. In CaC-Net, a set of context-adaptive convolution kernels are predicted from the global contextual information in a parameter-efficient manner. When used for convolution with the semantic feature maps, the predicted convolutional kernels can generate the spatially-varying feature weighting factors capturing both global and local contextual information. Comprehensive experimental results show that our CaC-Net achieves superior segmentation performance on three public datasets, PASCAL Context, PASCAL VOC 2012 and ADE20K. | 2020-05-29 05:19:26 |
Deep G-CRF (QO) combined with Deeplab-v2 | CentraleSupelec Deep G-CRF | CentraleSupelec / INRIA | Siddhartha Chandra & Iasonas Kokkinos | We employ the deep Gaussian CRF Quadratic Optimization formulation to learn pairwise terms for semantic segmentation using the Deeplab-v2-resnet-101 network. Additionally, we use the dense-CRF post-processing to refine object boundaries. This work is an accepted paper at ECCV 2016 and will be presented at the conference. Please refer to our arXiv report here: http://arxiv.org/abs/1603.08358. We will update the report with more details soon. | 2016-08-12 11:21:28 |
"Super-Human" boundaries combined with Deeplab | CentraleSuperBoundaries++ | CentraleSupelec / INRIA | Iasonas Kokkinos | We exploit our "super-human" boundary detector with a multi-resolution variant of the Deeplab system (LargeFOV, pre-trained on MSCOCO). The boundary information comes in the form of Normalized Cut eigenvectors used in DenseCRF inference and boundary-dependent pairwise terms, used in Graph-Cut inference. This is an updated version of our earlier submission, using more training rounds and a single-shot training algorithm. Details on the system and our "super human" boundary detector are provided in http://arxiv.org/abs/1511.07386 | 2016-01-13 16:00:02 |
modified deeplab | Curtin_Qilin | Curtin University | Qilin Li | A modified version of DeepLab-ResNet101. | 2018-03-09 03:59:28 |
Dense Context-Aware Network for Semantic Segmentat | DCANet | Institution of Information Science and Electrical Engineering, Zhejiang University | Yifu Liu Chenfeng Xu Zhihong Chen Chao Chen | In contrast to some previous works utilizing multi-scale context fusion, we propose a novel module, named Dense Context-Aware (DCA) module, to adaptively integrate local detail information with global dependencies in a more efficient way. Driven by the contextual relationship, the DCA module can effectively complete the aggregation of multi-scale information to generate more powerful features. Meanwhile, the proposed DCA module is easy to apply and can be flexibly adjusted inside existing deep networks. To further capture long-range contextual information, we specially design two extended structures based on the DCA modules. By proceeding in a progressive manner across different scales, our networks can make use of context information to improve feature representations for robust segmentation. Due to privacy concerns, we will make the paper and code publicly available at https://github.com/YifuLiuL/DCANet. | 2020-01-13 08:36:04 |
Discriminative Feature Network | DFN | HUST | Changqian Yu | We design a discriminative feature network for semantic segmentation. | 2018-01-15 04:32:54 |
DFPnet for real-time semantic segmentation | DFPnet | Dalian Maritime University | Shuhao Ma | Deep Feature Pyramid net (DFPnet) is the first model to apply image pyramid techniques to real-time semantic segmentation. DFPnet is a flexible model that can be applied to image segmentation, object detection, and image classification tasks; it can be adjusted for different data, and the network can adopt different structures as needed. In short, DFPnet follows an open design. | 2018-08-26 12:09:50 |
Deep Dual Learning for Semantic Image Segmentation | DIS | Sun Yat-Sen University, The Chinese University of Hong Kong | Ping Luo*, Guangrun Wang*, Liang Lin, Xiaogang Wang | We present a novel learning setting, which consists of two complementary learning problems that are jointly solved. One predicts labelmaps and tags from images, and the other reconstructs the images using the predicted labelmaps. Given an image with tags only, its labelmap can be inferred by leveraging the images and tags as constraints. The estimated labelmaps that capture accurate object classes and boundaries are used as ground truths in training to boost performance. DIS is able to clean noisy tags. | 2017-09-13 18:25:17 |
Dual-path Class-aware Attention Network | DP-CAN | Tianjin University | Hailong Zhu | Our proposed dual-path class-aware attention network exploits a category-level context-free attention mechanism for semantic segmentation. This model is trained with PASCAL VOC 2012 train_aug and fine-tuned on trainval. Multi-scale inputs and flipping are used in testing. | 2019-01-25 12:36:41 |
Dual-path Class-aware Attention Network | DP-CAN_decoder | Tianjin University | Hailong Zhu | Dual-path Class-aware Attention Network with dual-path refinement module as decoder. | 2019-01-26 15:07:22 |
DP_ResNet_CRF | DP_ResNet_CRF | (1) Beijing University of Posts and Telecommunications (BUPT); (2) Beijing Moshanghua Tech (DressPlus) | Lu Yang(1, 2), Qing Song(1), Bin Liu(2), Yuhang He(2), Zuoxin Li(2), Xiongwei Xia(2) | Our network is based on ResNet-152; dilated convolution, data augmentation, pre-training on COCO, and multi-scale testing are used for this submission. We also use DenseCRF as post-processing to refine object boundaries. | 2016-11-10 12:05:10 |
Dynamic routing encoding network | DREN | Huazhong University of Science and Technology | Zhaoyang Hu | On the basis of the FCN network, we add dynamic routing to classify the context and use the context to help the network recognise objects. | 2019-03-29 02:04:11 |
Deep Layer Cascade (LC) | Deep Layer Cascade (LC) | The Chinese University of Hong Kong | Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang | We propose a novel deep layer cascade (LC) method to improve the accuracy and speed of semantic segmentation. Unlike the conventional model cascade (MC) that is composed of multiple independent models, LC treats a single deep model as a cascade of several sub-models. Earlier sub-models are trained to handle easy and confident regions, and they progressively feed-forward harder regions to the next sub-model for processing. Convolutions are only calculated on these regions to reduce computations. The proposed method possesses several advantages. First, LC classifies most of the easy regions in the shallow stage and makes deeper stage focuses on a few hard regions. Such an adaptive and 'difficulty-aware' learning improves segmentation performance. Second, LC accelerates both training and testing of deep network thanks to early decisions in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models. We evaluate our method on PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and fast speed. Please refer to the paper "Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade" (https://arxiv.org/abs/1704.01344) for further information. | 2017-04-06 14:46:45 |
DeepLab-CRF | DeepLab-CRF | (1) UCLA (2) Google (3) TTIC (4) ECP / INRIA | Liang-Chieh Chen (1) and George Papandreou (2,3) and Iasonas Kokkinos (4) and Kevin Murphy (2) and Alan L. Yuille (1) | This work brings together methods from Deep Convolutional Neural Networks (DCNNs) and probabilistic graphical models for the task of semantic image segmentation. We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Efficient computation is achieved by (i) careful network re-purposing and (ii) a novel application of the ’hole’ algorithm from the wavelet community, allowing dense computation of neural net responses at 8 frames per second on a modern GPU. See http://arxiv.org/abs/1412.7062 for further information. | 2014-12-23 02:29:44 |
DeepLab-CRF-Attention | DeepLab-CRF-Attention | (1) UCLA (2) Baidu | Liang-Chieh Chen (1) and Yi Yang (2) and Jiang Wang (2) and Wei Xu (2) and Alan L. Yuille (1) | This work is the extension of DeepLab-CRF-COCO-LargeFOV (pretrained on MS-COCO) by further incorporating (1) multi-scale inputs (2) extra supervision and (3) attention model. Further information will be provided in an *updated* version of http://arxiv.org/abs/1511.03339. | 2016-02-03 23:10:45 |
DeepLab-CRF-Attention-DT | DeepLab-CRF-Attention-DT | (1) UCLA (2) Google | Liang-Chieh Chen (1) and Jonathan T. Barron (2) and George Papandreou (2) and Kevin Murphy (2) and Alan L. Yuille (1) | This work is the extension of DeepLab-CRF-Attention by further incorporating a discriminatively trained Domain Transform. Further information will be provided in an *updated* version of http://arxiv.org/abs/1511.03328. | 2016-02-03 23:13:01 |
DeepLab-CRF-COCO-LargeFOV | DeepLab-CRF-COCO-LargeFOV | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF-COCO-Strong, but the network has a larger field-of-view on the image. Further information will be provided in an updated version of http://arxiv.org/abs/1502.02734. | 2015-03-18 04:09:39 |
DeepLab-CRF-COCO-Strong | DeepLab-CRF-COCO-Strong | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF, but network training also included the pixel-level semantic segmentation annotations of the MS-COCO (v. 2014) dataset. See http://arxiv.org/abs/1502.02734 for further information. | 2015-02-11 01:44:22 |
DeepLab-CRF-LargeFOV | DeepLab-CRF-LargeFOV | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062. | 2015-03-28 17:22:26 |
DeepLab-CRF-MSc | DeepLab-CRF-MSc | (1) UCLA (2) Google (3) TTIC (4) ECP / INRIA | Liang-Chieh Chen (1) and George Papandreou (2,3) and Iasonas Kokkinos (4) and Kevin Murphy (2) and Alan L. Yuille (1) | Similar to DeepLab-CRF, except that multiscale features (direct connections from intermediate layers to the classifier) are also exploited. Specifically, we attach to the input image and each of the first four max pooling layers a two-layer MLP (first layer: 128 3x3 convolutional filters, second layer: 128 1x1 convolutional filters) whose score map is concatenated to the VGG final layer score map. The final score map fed into the softmax layer thus consists of 4,096 + 5 * 128 = 4,736 channels. | 2014-12-30 02:52:40 |
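The channel arithmetic stated in the DeepLab-CRF-MSc entry (4,096 VGG score channels plus five 128-channel MLP branches) can be checked directly:

```python
# Final score-map channels in DeepLab-CRF-MSc: VGG final layer + 5 MLP branches
vgg_channels = 4096
branches = 5          # input image + first four max-pooling layers
mlp_channels = 128    # second MLP layer: 128 1x1 convolutional filters per branch
total = vgg_channels + branches * mlp_channels
assert total == 4736  # matches the 4,096 + 5 * 128 = 4,736 stated above
```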
DeepLab-MSc-CRF-LargeFOV | DeepLab-MSc-CRF-LargeFOV | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to DeepLab-MSc-CRF, but the network has a larger field-of-view on the image. Note that the model has NOT been fine-tuned on MS-COCO dataset. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062. | 2015-04-02 06:57:21 |
DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint | DeepLab-MSc-CRF-LargeFOV-COCO-CrossJoint | (1) Google (2) UCLA | George Papandreou (1) and Liang-Chieh Chen (2) and Kevin Murphy (1) and Alan L. Yuille (2) | Similar to the DeepLab-CRF model, but with feature extraction at multiple network levels and large field of view. We jointly train DeepLab on Pascal VOC 2012 and MS-COCO, sharing the top-level network weights for the common classes, using pixel-level annotation in both datasets. Further information will be provided in an updated version of http://arxiv.org/abs/1412.7062 and http://arxiv.org/abs/1502.02734. | 2015-04-26 17:48:09 |
DeepLab_XI | DeepLab_XI | xiaoi research | Bo Zhang, Xiaoke Wang, Guixiong Chen | We extend the deeplab method. Both VOC 2012 and COCO dataset are used for training. | 2019-05-07 07:08:00 |
DeepLabv2-CRF | DeepLabv2-CRF | (1) UCLA (2) Google (3) ECP / INRIA | Liang-Chieh Chen (1,2) and George Papandreou (2) and Iasonas Kokkinos (3) and Kevin Murphy (2) and Alan L. Yuille (1) | DeepLabv2-CRF is based on three main methods. First, we employ convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool to repurpose ResNet-101 (trained on image classification task) in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within DCNNs. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and fully connected Conditional Random Fields (CRFs). See http://arxiv.org/abs/1606.00915 for further information. | 2016-06-06 01:59:20 |
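The atrous ('hole') convolution that DeepLabv2 builds on — spacing the filter taps `rate` pixels apart to enlarge the field of view without adding parameters — can be sketched for a single channel as follows. This is an illustrative NumPy sketch under 'same' padding and odd kernel sizes, not the DeepLab implementation; the function name is an assumption.

```python
import numpy as np

def atrous_conv2d(x, w, rate):
    """2-D atrous (dilated) cross-correlation of one channel, 'same' output size."""
    kh, kw = w.shape                               # odd kernel dims assumed
    ph, pw = rate * (kh - 1) // 2, rate * (kw - 1) // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))           # zero-pad to keep output size
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):                            # each tap is offset by i*rate,
        for j in range(kw):                        # j*rate instead of i, j
            out += w[i, j] * xp[i * rate:i * rate + x.shape[0],
                                j * rate:j * rate + x.shape[1]]
    return out
```

The ASPP module described above would apply such convolutions at several rates in parallel (the paper uses multiple sampling rates) and fuse the resulting score maps.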
DeepLabv3 | DeepLabv3 | Google Inc. | Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam | In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks. We propose to augment our previously proposed Atrous Spatial Pyramid Pooling module, which probes convolutional features at multiple scales, with image-level features encoding global context and further boost performance. See http://arxiv.org/abs/1706.05587 for further information. | 2017-06-20 01:59:26 |
DeepLabv3+ | DeepLabv3+ | Google Inc. | Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam | Spatial pyramid pooling modules or encoder-decoder structures are used in deep neural networks for the semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 semantic image segmentation dataset and achieve state-of-the-art performance without any post-processing. Our paper is accompanied with a publicly available reference implementation of the proposed models in Tensorflow. For details, please refer to https://arxiv.org/abs/1802.02611. | 2018-02-09 16:12:04 |
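The depthwise separable convolution mentioned in the DeepLabv3+ entry factorizes a standard convolution into a per-channel spatial convolution plus a 1x1 pointwise convolution, which is where the "faster and stronger" claim gets its efficiency. The parameter saving is easy to quantify (a sketch; the helper names are illustrative, and bias terms are ignored):

```python
def conv_params(c_in, c_out, k):
    # standard k x k convolution: one k*k*c_in filter per output channel
    return k * k * c_in * c_out

def sep_conv_params(c_in, c_out, k):
    # depthwise k x k (one filter per input channel) + 1x1 pointwise projection
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 convolution mapping 256 -> 256 channels
print(conv_params(256, 256, 3))      # 589824 parameters
print(sep_conv_params(256, 256, 3))  # 67840 parameters, roughly 8.7x fewer
```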
DeepLabv3+_AASPP | DeepLabv3+_AASPP | Tsinghua University | Jiancheng Li | DeepLabv3+ with Attention Atrous Spatial Pyramid Pooling. | 2018-05-22 15:44:09 |
DeepLabv3+_JFT | DeepLabv3+_JFT | Google Inc. | Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam | DeepLabv3+ by fine-tuning from the model pretrained on JFT-300M dataset. For details, please refer to https://arxiv.org/abs/1802.02611. | 2018-02-09 16:16:47 |
DeepLabv3-JFT | DeepLabv3-JFT | Google Inc. | Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam | DeepLabv3 by fine-tuning from the model pretrained on JFT-300M dataset. See http://arxiv.org/abs/1706.05587 for further information. | 2017-08-05 01:16:48 |
DeepSqueeNet | DeepSqueeNet | Sun Yat-sen University, SYSU | HongPeng Wu, Long Chen, Kai Huang | We propose a method for semantic image segmentation. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture called DeepSqueeNet for semantic image segmentation. It is based on SqueezeNet and VGG16. DeepSqueeNet achieves DeepLab (based on VGG16) accuracy on semantic image segmentation with 10x fewer parameters. | 2016-07-20 13:16:16 |
DeepSqueeNet_CRF | DeepSqueeNet_CRF | Sun Yat-sen University, SYSU | HongPeng Wu, Long Chen, Kai Huang | We propose a method for semantic image segmentation. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a CNN architecture called DeepSqueeNet for semantic image segmentation. It is based on SqueezeNet and VGG16. DeepSqueeNet achieves DeepLab (based on VGG16) accuracy on semantic image segmentation with 10x fewer parameters. We add a CRF. | 2016-07-21 12:47:19 |
Dual Multi-Scale Manifold Ranking Network | Dual-Multi-Reso-MR | Wuhan University | Mi Zhang, Ye Lv, Min Luo, Jiasi Yi | We propose a multi-scale network that utilizes dilated and non-dilated convolutional networks as a dual pair. In both networks, a manifold ranking optimization method is embedded and optimized jointly in a single stream, i.e. there is no need to train the unary and pairwise networks separately. Such a feedforward network makes it possible to train in an end-to-end fashion and guarantees a global optimum. | 2016-11-03 12:27:49 |
Expectation-Maximization Attention Networks for S | EMANet152 | Peking University | Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu | We formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and deprecates noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of input and is also friendly in memory and computation. Moreover, we set up the bases maintenance and normalization methods to stabilize its training procedure. | 2019-08-15 16:22:33 |
ESPNetv2 | ESPNetv2 | University of Washington | Hannaneh Hajishirzi, Mohammad Rastegari, Linda Shapiro | We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. Our network uses group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with fewer FLOPs and parameters. The performance of our network is evaluated on three different tasks: (1) object classification, (2) semantic segmentation, and (3) language modeling. Experiments on these tasks, including image classification on ImageNet and language modeling on the Penn Treebank dataset, demonstrate the superior performance of our method over the state-of-the-art methods. Our network has better generalization properties than ShuffleNetv2 when tested on the MSCOCO multi-object classification task and the Cityscapes urban scene semantic segmentation task. Our experiments show that ESPNetv2 is much more power efficient than existing state-of-the-art efficient methods including ShuffleNets and MobileNets. Our code is open-source and available at https://github.com/sacmehta/ESPNetv2 | 2019-03-23 22:32:58 |
EfficientNet-L2 + NAS-FPN + Noisy Student | EfficientNet-L2 + NAS-FPN + Noisy Student | Google Inc. | Golnaz Ghiasi, Barret Zoph, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Cubuk, Quoc V. Le | Single-scale testing and without pre-training on COCO. See https://arxiv.org/abs/2006.06882 for details. | 2020-06-15 19:50:31 |
Efficient_Segmentation | EfficientNet_MSCID_Segmentation | Tianjin University | Xiu Su, Hongyan Xu | EfficientNet with MSCID module for segmentation | 2019-08-15 02:00:39 |
Context Encoding for Semantic Segmentation | EncNet | Rutgers University, Amazon, SenseTime, CUHK | Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal | Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results 51.7% mIoU on PASCAL-Context, 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on ADE20K test set, which surpasses the winning entry of COCO-Place Challenge 2017. | 2018-03-15 21:21:01 |
ExFuse | ExFuse | Fudan University, Megvii Inc. | Zhenli Zhang, Xiangyu Zhang, Chao Peng, Jian Sun | For more details, please refer to https://arxiv.org/abs/1804.03821. | 2018-05-22 09:27:16 |
Dilated FCN using VGG16 and Skip Architectures | FCN-2s_Dilated_VGG16 | Center for Cognitive Skill Enhancement, Independent University Bangladesh | Sharif Amit Kamran, Ali Shihab Sabbir | The weights were transferred from VGG16 and the fully connected layers were then converted to convolutional layers. Dilated convolution was used instead of vanilla convolution in the fc6 layer. The upsampling was done with stride 2, and the upsampled layers were concatenated in steps using four skip architectures. Pascal VOC2012 training data and SBD training and validation data were used for training in two stages. | 2017-07-20 20:23:41 |
Dilated FCN using VGG19 and Skip Architectures | FCN-2s_Dilated_VGG19 | Center for Cognitive Skill Enhancement, Independent University Bangladesh | Sharif Amit Kamran, Ali Shihab Sabbir | The weights were transferred from VGG19 and the fully connected layers were then converted to convolutional layers. Dilated convolution was used instead of vanilla convolution in the fc6 layer. The upsampling was done with stride 2, and the upsampled layers were concatenated in steps using four skip architectures. Pascal VOC2012 training data and SBD training and validation data were used for training in two stages. | 2017-07-11 16:57:52 |
Fully convolutional net | FCN-8s | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. | 2014-11-12 09:08:39 |
Fully convolutional net | FCN-8s-heavy | UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell | We apply fully convolutional nets end-to-end, pixels-to-pixels for segmentation, rearchitecting nets that have been highly successful in classification. We achieve pixelwise prediction and learning in nets with extensive pooling and subsampling using in-network upsampling layers. Inference and learning are both performed on whole images by dense feedforward computation and backpropagation. With skip layers that combine deep, coarse, semantic information and shallow, fine, appearance information, we produce refined, detailed segmentations. We train our fully convolutional net, FCN-8s, end-to-end for segmentation while taking advantage of recent successes in classification by initializing from parameters adapted from the VGG 16-layer net. The network is learned online with high momentum for better optimization. | 2016-02-06 09:57:31 |
FCN16s-Resnet101 | FCN16s-Resnet101 | Peking University | personal | FCN (output stride 16) based on ResNet-101 | 2019-01-26 12:50:15 |
FCN with Cross-layer Concat and Multi-scale Pred | FCN_CLC_MSP | National Tsing Hua University, Taiwan | Tun-Huai Shih, Chiou-Ting Hsu | We replace the original fc layers in VGG-16 with several conv and pool layers to extract hierarchical features (Pool3-5 and additional pool6-8). We then use pool3-8 to generate multi-scale predictions, and aggregate them to derive the dense prediction result. To jointly exploit the information from lower- and higher-level layers when making prediction, we adopt cross-layer concatenation to combine poolx features (lower-level) with prediction result of coarser stream (high-level). This makes the predictions of finer streams more robust. We do not adopt any pre- or post- processing steps. The number of parameters is about 36M, while the original FCN is 134M. We train all prediction streams at the same time using VOC additional annotated images (10582 in total), and it takes less than one day to train our FCN model on a single GTX Titan X GPU. | 2016-07-01 04:27:14 |
FDNet_16s | FDNet_16s | HongKong University of Science and Technology, altizure.com | Mingmin Zhen, Jinglu Wang, Siyu Zhu, Runze Zhang, Shiwei Li, Tian Fang, Long Quan | A fully dense neural network with encoder-decoder structure is proposed that we abbreviate as FDNet. For each stage in the decoder module, feature maps of all the previous blocks are adaptively aggregated to feedforward as input. | 2018-03-22 08:52:44 |
Weakly sup. segmentation by region scores' pooling | FER_WSSS_REGION_SCORE_POOL | University of Zagreb | Josip Krapac, Sinisa Segvic | We address the problem of semantic segmentation of objects in a weakly supervised setting, when only image-wide labels are available. We describe an image with a set of pre-trained convolutional features (from layer conv5.4 of the 19-layer VGG-E network) and embed this set into a Fisher vector (64-component GMM, diagonal covariance for components, normalization only with the inverse of the Fisher matrix). We learn a linear classifier (logistic regression), apply the learned classifier to the set of all image regions (efficiently, using integral images), and propagate region scores back to the pixels. Compared to the alternatives, the proposed method is simple, fast in inference, and especially fast in training. The details are described in the conference paper Krapac, Segvic: "Weakly-supervised semantic segmentation by redistributing region scores back to the pixels", GCPR 2016 | 2016-06-14 15:02:23 |
FSSI300 | FSSI300 | Beihang University | Zuoxin Li | FSSI300 Res50 | 2018-06-21 11:27:57 |
Learning Feature Pyramids | Feature_Pyramids | Sun Yat-Sen University, The Chinese University of Hong Kong | Guangrun Wang, Wei Yang | This model predicts segmentation via learning feature pyramids (LFP). LFP is originally used for human pose machine, described in the paper "Learning Feature Pyramids for Human Pose Estimation" (https://arxiv.org/abs/1708.01101). We extend it to the semantic image segmentation. The code and model are available at https://github.com/wanggrun/Learning-Feature-Pyramids | 2018-06-06 03:55:27 |
Gluon DeepLabV3 152 | Gluon DeepLabV3 152 | Amazon AI | Hang Zhang et al. | https://gluon-cv.mxnet.io | 2018-10-03 18:18:27 |
GluonCV DeepLabV3 | GluonCV DeepLabV3 | Amazon | Hang Zhang et al. | See details in GluonCV https://gluon-cv.mxnet.io/ | 2018-09-07 00:48:31 |
GluonCV FCN | GluonCV FCN | Amazon | Hang Zhang et al. | Please see details in GluonCV https://gluon-cv.mxnet.io/ | 2018-09-07 01:11:12 |
GluonCV PSP | GluonCV PSP | Amazon | Hang Zhang et al. | Please see details in GluonCV https://gluon-cv.mxnet.io/ | 2018-09-07 00:51:53 |
Hierarchical Parsing Net | HPN | UESTC | Hengcan Shi | HPN leverages global image semantic information and context among multiple objects to boost semantic segmentation. | 2017-12-13 02:30:24 |
Hamburger | HamNet_w/o_COCO | Peking University | Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin | Paper: Is Attention Better Than Matrix Decomposition? Accepted to ICLR 2021. Link: https://openreview.net/pdf?id=1FvkSpWosOl Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago regarding the performance and computational cost for encoding the long-distance dependencies. We model the global context issue as a low-rank completion problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. | 2021-01-25 07:03:38 |
HikSeg_COCO | HikSeg_COCO | Hikvision Research Institute | Haiming Sun, Di Xie, Shiliang Pu | We begin with DilatedNet and add a module in which multi-scale features are combined step-wise. The network is able to learn to assign different weights to features of different scales. This submission was first trained on the COCO training and validation sets, then fine-tuned on the PASCAL training set. | 2016-10-02 09:16:41 |
Hypercolumn | Hypercolumn | UC Berkeley | Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik | Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation. However, the information in this layer may be too coarse to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation, where we improve the state-of-the-art from 49.7 mean APr to 60.0; keypoint localization, where we get a 3.3 point boost; and part labeling, where we show a 6.6 point gain over a strong baseline. | 2015-04-09 02:01:36 |
Learning Object Interactions and Descriptions for | IDW-CNN | Sun Yat-sen University; The Chinese University of Hong Kong | Guangrun Wang*, Ping Luo*, Liang Lin, Xiaogang Wang | This work increases the segmentation accuracy of CNNs by learning from the Image Descriptions in the Wild (IDW) dataset. Unlike previous image captioning datasets, where captions were manually and densely annotated, images and their descriptions in IDW are automatically downloaded from the Internet without any manual cleaning and refinement. An IDW-CNN is proposed to jointly train on IDW and existing image segmentation datasets such as Pascal VOC 2012 (VOC). | 2017-06-30 00:11:24 |
KSAC(X-65) with hard image | KSAC-H | The University of Technology, Sydney | Ye Huang | KSAC (Xception-65) + hard image bootstrap in OS = 16 | 2019-10-26 14:19:05 |
Ladder DenseNet-161 | LDN-161 | University of Zagreb | Ivan Kreso, Josip Krapac, Sinisa Segvic | Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images (journal submission). Trained on train+val+augmented data. DenseNet-161 backbone. | 2019-04-18 19:03:42 |
Laplacian reconstruction and refinement | LRR_4x_COCO | University of California Irvine | Golnaz Ghiasi, Charless C. Fowlkes | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission is based on VGG-16 and it was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264). | 2016-06-16 06:19:08 |
Laplacian reconstruction and refinement | LRR_4x_ResNet_COCO | University of California Irvine | Golnaz Ghiasi, Charless C. Fowlkes | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission is based on ResNet-101 and it was trained on augmented PASCAL VOC and MS-COCO data. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation (http://arxiv.org/abs/1605.02264). | 2016-07-18 19:07:32 |
Laplacian reconstruction and refinement | LRR_4x_de_pyramid_VOC | University of California Irvine | Charless C. Fowlkes, Golnaz Ghiasi | We introduce a CNN architecture that reconstructs high-resolution class label predictions from low-resolution feature maps using class-specific basis functions. Our multi-resolution architecture also uses skip connections from higher resolution feature maps to successively refine segment boundaries reconstructed from lower resolution maps. The model used for this submission was trained on augmented PASCAL VOC. Please refer to our technical report: Laplacian Reconstruction and Refinement for Semantic Segmentation. | 2016-06-07 03:55:11 |
CVRSUAD submission, paper ID 21 | Ladder_DenseNet | UNIZG-FER | ivan.kreso@fer.hr | CVRSUAD submission paper ID 21: Ladder-style DenseNets for Semantic Segmentation of Large Natural Images | 2017-07-25 17:42:21 |
Large_Kernel_Matters | Large_Kernel_Matters | Tsinghua University | Peng Chao, Yu Gang, Zhang Xiangyu | We use large kernels to generate the feature map and score map; ResNet-101 is applied with the COCO and SBD datasets. No CRF or similar post-processing methods are employed, and no multi-scale testing is used. | 2017-03-16 01:58:16 |
Deep Gaussian CRF | MERL_DEEP_GCRF | Mitsubishi Electric Research Laboratories | Raviteja Vemulapalli, Oncel Tuzel | We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction. | 2015-10-17 14:55:31 |
Gaussian CRF on top of Deeplab CNN | MERL_UMD_Deep_GCRF_COCO | University of Maryland, College Park | Raviteja Vemulapalli (UMD), Oncel Tuzel (MERL), Ming-Yu Liu (MERL), Rama Chellappa (UMD) | We use two deep networks, one for generating unary potentials and the other for generating pairwise potentials. Then we use a Gaussian CRF model for structured prediction. The entire model is trained end-to-end. | 2016-01-15 05:23:48 |
MSCI for Semantic Segmentation | MSCI | Shenzhen University | Di Lin; Yuanfeng Ji | We propose a novel scheme for aggregating features from different scales, which we refer to as Multi-Scale Context Intertwining (MSCI). Please see our paper http://vcc.szu.edu.cn/Di_Lin/papers/MSCI_eccv2018.pdf | 2018-07-08 04:07:31 |
Box-Supervision | MSRA_BoxSup | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | BoxSup makes use of bounding box annotations to supervise convolutional networks for semantic segmentation. From these boxes, we estimate segmentation masks with the help of region proposals. These masks are used to update the convolutional network, which is in turn fed back to mask estimation. This procedure is iterated. This result is achieved by semi-supervised training on the segmentation masks from PASCAL VOC and a large amount of bounding boxes from Microsoft COCO. See http://arxiv.org/abs/1503.01640 for details. | 2015-02-10 09:35:40 |
MSRA_BoxSup | MSRA_BoxSup | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | This is an implementation of "BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation". We train a BoxSup model using the union set of VOC 2007 boxes, COCO boxes, and the augmented VOC 2012 training set. See http://arxiv.org/abs/1503.01640 for details. | 2015-05-18 09:42:54 |
Convolutional Feature Masking | MSRA_CFM | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | The method exploits shape information via "masking" convolutional features. The proposal segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. Competitive accuracy and compelling computational speed are demonstrated by the proposed method. We achieve this result by utilizing segment proposals generated by Multi-scale Combinatorial Grouping (MCG) and initializing network parameters from the VGG 16-layer net. See http://arxiv.org/abs/1412.1283 for details. | 2014-12-17 02:56:52 |
Multi-Scale Residual Network for Segmentation | MSRSegNet-UW | University of Washington | Linda Shapiro, Hannaneh Hajishirzi | Building on prior work, we create a custom network that is fast as well as accurate. Our network runs at 21 fps at full resolution and at 60 fps at a resolution of 224x224. At low resolution, our network is as accurate as FCN-8s. More details are here: https://arxiv.org/pdf/1711.08040.pdf | 2017-11-23 01:26:37 |
MasksegNet | MasksegNet | Kyunghee University | masksegnet | MasksegNet | 2019-05-16 12:20:50 |
Multi-Task Learning for Human Pose Estimation | Metu_Unified_Net | Middle East Technical University | Salih Karagoz, Muhammed Kocabas, Emre Akbas | Multi-task learning for multi-person pose estimation, human semantic segmentation, and human detection. The model performs all tasks simultaneously. We trained only on the COCO dataset; no additional data was used. | 2018-03-10 12:39:37 |
Multipath-RefineNet | Multipath-RefineNet | The University of Adelaide; ACRV; | Guosheng Lin; Anton Milan; Chunhua Shen; Ian Reid; | Please refer to our technical report for details: "RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation" (https://arxiv.org/abs/1611.06612). Our source code is available at: https://github.com/guosheng/refinenet | 2017-01-17 18:03:57 |
Unified Object Detection and Semantic Segmentation | NUS_UDS | NUS | Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille | Motivated by the complementary effect observed from the typical failure cases of object detection and semantic segmentation, we propose a unified framework for joint object detection and semantic segmentation [1]. By enforcing the consistency between final detection and segmentation results, our unified framework can effectively leverage the advantages of leading techniques for these two tasks. Furthermore, both local and global context information are integrated into the framework to better distinguish the ambiguous samples. By jointly optimizing the model parameters for all the components, the relative importance of the different components is automatically learned for each category to guarantee the overall performance. [1] Jian Dong, Qiang Chen, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. ECCV 2014 | 2014-10-29 16:07:10 |
Joint a network to guided and masking | OBP-HJLCN | National Central University | Jia-Ching Wang, Chien-Yao Wang, Jyun-Hong Li | We propose a hierarchical joint guided network that is able to predict objects more accurately and at a finer level. We also propose a novel way to guide segmentation by objects and boundaries. | 2016-09-13 15:21:45 |
Oxford_TVG_CRF_RNN_COCO | Oxford_TVG_CRF_RNN_COCO | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF inference with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, Berkeley augmented data and a subset of COCO 2014 train data. More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 11:26:57 |
Oxford_TVG_CRF_RNN_VOC | Oxford_TVG_CRF_RNN_VOC | [1] University of Oxford / [2] Baidu IDL | Shuai Zheng [1]; Sadeep Jayasumana [1]; Bernardino Romera-Paredes [1]; Chang Huang [2]; Philip Torr [1] | We introduce a new form of convolutional neural network, called CRF-RNN, which expresses Dense Conditional Random Fields (Dense CRF) as a Recurrent Neural Network (RNN). We plug this CRF-RNN network into an existing deep CNN to obtain a system that combines the desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF with CNNs, making it possible to train the whole system end-to-end with the usual back-propagation algorithm. The system used for this submission was trained on VOC 2012 Segmentation challenge train data, and Berkeley augmented data (COCO dataset was not used). More details will be available in the paper http://arxiv.org/abs/1502.03240. | 2015-04-22 10:24:43 |
Higher Order CRF in CNN | Oxford_TVG_HO_CRF | University of Oxford | Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip Torr | We integrate a conditional random field with higher order potentials into a deep neural network. Our higher order potentials are based on object detector outputs and superpixel oversegmentation, and formulated such that their corresponding mean-field updates are differentiable. For further details, please refer to http://arxiv.org/abs/1511.08119 | 2016-03-16 21:12:47 |
PAN | PAN | BIT, Megvii Inc. | Hanchao Li | Pyramid Attention Network for Semantic Segmentation; (without COCO pretrain) | 2018-07-04 13:10:20 |
POSTECH_DeconvNet_CRF_VOC | POSTECH_DeconvNet_CRF_VOC | POSTECH (Pohang University of Science and Technology) | Hyeonwoo Noh, Seunghoon Hong, Bohyung Han. | We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects in multiple scales more naturally. Our network demonstrates outstanding performance in PASCAL VOC 2012 dataset without external training data. See http://arxiv.org/abs/1505.04366 for details. | 2015-08-18 18:42:18 |
POSTECH_EDeconvNet_CRF_VOC | POSTECH_EDeconvNet_CRF_VOC | POSTECH(Pohang University of Science and Technology) | Hyeonwoo Noh, Seunghoon Hong, Bohyung Han | We propose a novel semantic segmentation algorithm by learning a deconvolution network. Our deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. The trained network is applied to each proposal in an input image, and the final semantic segmentation map is constructed by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks; our segmentation method typically identifies more detailed structures and handles objects in multiple scales more naturally. Our network demonstrates outstanding performance in PASCAL VOC 2012 dataset without external training data. | 2015-04-22 21:33:03 |
PSPNet | PSPNet | CUHK, SenseTime | Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia | Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new record of mIoU score as 85.4% on PASCAL VOC 2012 and 80.2% on Cityscapes. https://arxiv.org/abs/1612.01105 | 2016-12-06 02:22:13 |
Encoder-decoder with FCN | PSP_flow | Northwestern Polytechnical University | Yanhua Zhang | A spatial pyramid structure with feature alignment. | 2021-07-13 14:21:30 |
Residual Forest classifier with FCN features | RRF-4s | Monash University | Yan Zuo, Tom Drummond | We replace the solver component of FCN with a Random Residual Forest (RRF) Classifier and treat FCN as a generic feature extractor to train the RRF classifier | 2016-11-30 23:31:43 |
Tensor low-rank Reconstruction | RecoNet152_coco | Tencent | Wanli Chen | Please contact Wanli Chen (chenwl@mail.sustech.edu.cn) for details. | 2019-10-26 04:39:21 |
Res2Net:Multi-scale Backbone Architecture | Res2Net | Nankai University | Shanghua Gao, Ming-Ming Cheng | Res2Net: A New Multi-scale Backbone Architecture (TPAMI20) We propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. source code: https://github.com/Res2Net | 2020-02-22 05:29:02 |
ResNet-38 with COCO | ResNet-38_COCO | The University of Adelaide | Zifeng Wu, Chunhua Shen, Anton van den Hengel | Pre-trained with COCO, and tested with multiple scales. See our report https://arxiv.org/abs/1611.10080 for details. | 2017-01-22 04:44:14 |
ResNet-38 Multi-scale | ResNet-38_MS | The University of Adelaide | Zifeng Wu, Chunhua Shen, Anton van den Hengel | Single model; multi-scale testing; NO COCO; NO CRF-based post-processing. For more details, refer to our report https://arxiv.org/abs/1611.10080 and code https://github.com/itijyou/ademxapp. | 2016-12-09 12:19:24 |
ResNet_DUC_HDC_TuSimple | ResNet_DUC_HDC | UC San Diego, CMU, UIUC, TuSimple | Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell | We improve pixel-wise semantic segmentation by manipulating convolution-related operations: 1) we design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information; 2) we implement hybrid dilated convolution (HDC) to aggregate global information and alleviate what we call the "gridding issue" caused by the standard dilated convolution operation. The current submission uses a single model and single-scale testing. Pretrained models: https://goo.gl/DQMeun Paper link: https://arxiv.org/abs/1702.08502 | 2017-03-01 20:22:41 |
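The "gridding issue" is easy to see in 1-D: stacking 3-tap convolutions that all use the same dilation rate only ever samples input positions on a fixed stride, leaving holes inside the receptive field, whereas mixing rates (the HDC idea) covers it densely. A small illustrative check, assuming a 1-D simplification with 3-tap kernels:

```python
def coverage(rates, k=3):
    """Relative input offsets reachable by stacking k-tap dilated convs."""
    taps = {0}
    for r in rates:
        offs = [(i - k // 2) * r for i in range(k)]   # e.g. -r, 0, +r for k=3
        taps = {t + o for t in taps for o in offs}
    return taps

def has_gridding_gaps(rates, k=3):
    """True if some positions inside the receptive field are never sampled."""
    taps = coverage(rates, k)
    lo, hi = min(taps), max(taps)
    return any(p not in taps for p in range(lo, hi + 1))
```

Repeating rate 2 three times touches only even offsets (gaps at every odd position), while a hybrid schedule such as (1, 2, 5) samples every position in its span.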
ResSegNet | ResSegNet | SCUT-CIVIC | Mengxi Li | - | 2018-05-28 04:39:01 |
SDS | SDS | UC Berkeley | Bharath Hariharan Pablo Arbelaez Ross Girshick Jitendra Malik | We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work. | 2014-07-21 22:46:22 |
SRC-B-MachineLearningLab | SRC-B-MachineLearningLab | Samsung R&D Institute China - Beijing, Machine Learning Lab | Jianlong Yuan, Shu Wang, Wei Zhao, Hanchao Jia, Zhenbo Luo | The model is pretrained on ImageNet and fine-tuned on COCO, VOC and SBD. The result is tested with multi-scale and flip augmentation. A paper is in preparation. | 2018-04-19 03:08:39 |
Score Map Pyramid Net | Score Map Pyramid Net | Dalian Maritime University | Shuhao Ma | Our method is fast | 2018-07-06 13:27:16 |
SegModel | SegModel | Peking University | Falong Shen, Peking University | Deep fully convolutional networks with conditional random field. Trained on MSCOCO trainval set and Pascal VOC 12 train set. | 2016-08-23 04:04:21 |
SegNeXt | SegNeXt | Tsinghua University and Nankai University | Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. | SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation (NeurIPS 2022). A simple CNN-based method for semantic segmentation. | 2022-09-19 11:12:10 |
SegNet | SegNet | University of Cambridge | Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla | SegNet is a memory efficient real time deep convolutional encoder-decoder architecture. For more information, please see our publications and web demo at: http://mi.eng.cam.ac.uk/projects/segnet/ | 2015-11-10 09:48:12 |
asfcas | SepaNet | dqwdaw | asfcae | gvsfdvc | 2019-10-25 16:30:20 |
SpDConv2 | SpDConv2 | SpDConv2 | SpDConv2 | SpDConv2 | 2021-01-06 03:14:39 |
Tree-structured Kronecker Convolutional Networks | TKCNet | Institute of Computing Technology, Chinese Academy of Sciences | Tianyi Wu, Sheng Tang, Rui Zhang, Linghui Li, Yongdong Zhang | Most existing semantic segmentation methods employ atrous convolution to enlarge the receptive field of filters, but neglect important local contextual information. To tackle this issue, we first propose a novel Kronecker convolution which adopts the Kronecker product to expand its kernel, taking into account the feature vectors neglected by atrous convolutions. Therefore, it can capture local contextual information and enlarge the field of view of filters simultaneously without introducing extra parameters. Second, we propose a Tree-structured Feature Aggregation (TFA) module which follows a recursive rule to expand, forming a hierarchical structure. Thus, it can naturally learn representations of multi-scale objects and encode hierarchical contextual information in complex scenes. Finally, we design Tree-structured Kronecker Convolutional Networks (TKCN) that employ Kronecker convolution and the TFA module. Extensive experiments on three datasets, PASCAL VOC 2012, PASCAL-Context and Cityscapes, verify the effectiveness of our proposed approach. | 2018-04-20 13:04:57 |
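The kernel expansion above rests on the Kronecker product: each weight of the base kernel is multiplied by an entire second matrix, so the expanded kernel covers positions a plain dilated kernel would skip. A minimal sketch of the product itself on plain nested lists (illustrative only; the paper's specific choice of the expanding factor matrix is not reproduced here):

```python
def kron(a, b):
    """Kronecker product of two 2-D matrices given as lists of lists.

    Each entry a[i][j] is replaced by the block a[i][j] * b, so an
    m x n kernel expanded by a p x q factor becomes mp x nq.
    """
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]
```

For example, expanding by the 2x2 identity interleaves the original weights with structured zeros, which is how the enlarged field of view comes for free, without extra parameters.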
Diverse M-Best with discriminative reranking | TTIC-divmbest-rerank | (1) Toyota Technological Institute at Chicago, (2) Virginia Tech | Payman Yadollahpour (1), Dhruv Batra (1,2), Greg Shakhnarovich (1) | We generate a set of M=10 full image segmentations using Diverse M-Best algorithm from [BYGS'12], applied to inference in the O2P model (Carreira et al., 2012). Then we discriminatively train a reranker based on a novel set of features. The learning of the reranker uses relative loss, with the objective to minimize gap with the oracle (the hindsight-best of the M segmentations), and relies on slack-rescaling structural SVM. The details are described in [YBS'13]. References: [BYGS'12] Batra, Yadollahpour, Guzman, Shakhnarovich, ECCV 2012. [YBS'13] Yadollahpour, Batra, Shakhnarovich, CVPR 2013. | 2012-11-15 04:03:01 |
Feedforward segmentation with zoom-out features | TTI_zoomout | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Our method uses a feedforward network to directly label superpixels. For each superpixel we use features extracted from a nested set of "zoom-out" regions, from purely local to image-level. | 2014-11-17 04:57:49 |
Feedforward segmentation with zoom-out features | TTI_zoomout_16 | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Same as before, except using VGG 16-layer network instead of VGG CNN-S network. Fine-tuning on VOC-2012 was not performed. See http://arxiv.org/abs/1412.0774 for details. | 2014-11-24 08:54:05 |
Feedforward semantic segmentation with zoom-out features | TTI_zoomout_v2 | TTI-Chicago | Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich | Similar to TTI_zoomout_16, except the way that we set the number and scope of zoom-out levels. In this version, zoom-out levels correspond to receptive field sizes of different layers in a convolutional neural network. Our model is trained only on VOC-2012. Details are provided in our CVPR 2015 paper available at http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf. | 2015-03-30 18:40:04 |
Global Deconvolutional Network with CRF | UNIST_GDN_CRF | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-29 07:23:03 |
Global Deconvolutional Network with CRF | UNIST_GDN_CRF_ENS | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over baseline DeepLab-CRF. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-29 07:25:56 |
Global Deconvolutional Network | UNIST_GDN_FCN | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Our model shows superior performance over baseline FCN-32s. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-27 01:39:17 |
Global Deconvolutional Network | UNIST_GDN_FCN_FC | Ulsan National Institute of Science and Technology (UNIST) | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi | We propose a novel architecture to conduct the deconvolution operation and acquire dense predictions, and an additional loss function, which forces the network to boost the recognition accuracy. Besides that, we append a fully-connected layer after the down-sampled image to refine current predictions. Our model shows superior performance over baseline FCN-32s and even outperforms more powerful multi-scale variant. Further details are provided in "Global Deconvolutional Networks", http://arxiv.org/abs/1602.03930. | 2016-07-27 01:49:02 |
Fully convolutional neural net using VGG19 | VGG19_FCN | - | Sharif Amit Kamran, Md. Asif Bin Khaled, Sabit Bin Kabir, Dr. Hasan Muhammad, Moin Mostakim | We use the VGG-19 classification neural net and make it fully convolutional. Moreover, we use skip architectures by concatenating upsampled pools 1 to 4 with the score layer to get finer features. Training was done in two stages: first on the Pascal VOC training set, then on the combined SBD training and validation sets. | 2017-04-06 23:22:53 |
CNN segmentation based on manifold learning | Weak_manifold_CNN | University of Central Florida | Marzieh Edraki | CNN manifold learning for segmentation | 2016-11-11 23:34:20 |
FLATTENET | XC-FLATTENET | Sichuan University, Chengdu, China | Xin Cai | It is well-known that the reduced feature resolution due to repeated subsampling operations poses a serious challenge to Fully Convolutional Network (FCN) based models. In contrast to the commonly-used strategies, such as dilated convolution and encoder-decoder structure, we introduce a novel Flattening Module to produce high-resolution predictions without either removing any subsampling operations or building a complicated decoder module. https://ieeexplore.ieee.org/document/8932465/metrics#metrics | 2020-01-17 07:46:18 |
new ConcatASPP | Xception65_ConcatASPP_Decoder | Tianjin University and Nankai University | Xiu Su, Hongyan Xu, Hong Kang | a new ASPP method | 2019-07-26 02:23:38 |
deeplabv3+ resnet50 | deeplabv3+ resnet50 | Northwestern Polytechnical University | Liying Gao, Peng Wang | deeplabv3+ resnet50 | 2018-12-11 13:36:13 |
deeplabv3+ resnet50 | deeplabv3+ resnet50 | Northwestern Polytechnical University | Liying Gao, Peng Wang | weakly supervised segmentation, replace FCN by deeplabv3+ | 2018-12-11 13:32:23 |
deeplabv3+ vgg16 | deeplabv3+ vgg16 | Northwestern Polytechnical University | Liying Gao, Peng Wang | deeplabv3+ vgg16 63.69 val | 2018-12-12 08:46:27 |
deeplabv3+ vgg16 | deeplabv3+ vgg16 | Northwestern Polytechnical University | Liying Gao, Peng Wang | deeplabv3+ vgg16 63.69 val | 2018-12-12 07:54:27 |
dsanet | dsanet | dsanet | dsanet | dsanet | 2019-11-23 03:51:33 |
dscnn | dscnn | jw | jw | dscnn | 2018-05-25 19:49:13 |
fdsf | fdsf | fsdf | fsdf | fsdf | 2018-11-22 01:07:09 |
high resolution network baseline | hrnet_baseline | UCAS | xiaoyang | In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. | 2020-01-26 05:12:51 |
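The repeated multi-scale fusion described above amounts to an exchange step in which every resolution branch sums resampled features from all parallel branches. A schematic sketch (`resample` is a hypothetical stand-in for HRNet's strided-conv downsampling or upsampling between resolutions, and branch features are reduced to scalars for illustration):

```python
def exchange_unit(branches, resample):
    """One HRNet-style multi-scale fusion step.

    branches: one feature value per resolution branch.
    resample: stand-in callable (value, src_branch, dst_branch) -> value
              for the cross-resolution resampling.
    Each destination branch receives information from every parallel branch.
    """
    n = len(branches)
    return [
        sum(resample(branches[src], src, dst) for src in range(n))
        for dst in range(n)
    ]
```

Applying this unit repeatedly is what keeps the high-resolution branch informed by the coarser, more semantic branches throughout the network.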
MFF Network | multi-scale feature fusion network | Shenzhen University | Sijun Dong, Di Lin | We propose a novel network to make full use of context information for semantic segmentation. | 2018-11-26 13:04:53 |
fast laddernet | resnet 101 + fast laddernet | Yale University | Juntang Zhuang | resnet 101 + fast laddernet | 2018-10-29 19:53:41 |
resnet38 | resnet38_deeplab | Tsinghua University | Chen Qian | waiting for submission | 2021-11-06 01:49:46 |
Semi-supervised seg with weak masks | weak_semi_seg | Xiamen University | Lin Cheng | Semi-supervised segmentation with weak masks. We use 1.4k strong masks and 9k weak masks with class labels. | 2021-07-03 08:34:39 |
mixup | china | 123 | 123 | 2020-07-10 10:36:10 |