PASCAL VOC Challenge performance evaluation and download server

method | mean | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor | submission date
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
ATLDETv2 [?] | 92.9 | 97.4 | 96.3 | 94.2 | 89.0 | 89.0 | 95.5 | 95.7 | 98.0 | 84.7 | 96.4 | 82.1 | 97.4 | 97.6 | 96.6 | 96.1 | 79.4 | 96.2 | 87.0 | 96.2 | 92.5 | 26-Oct-2019 | |
AInnoDetection [?] | 92.3 | 96.6 | 95.3 | 94.4 | 87.3 | 87.5 | 94.4 | 94.1 | 98.4 | 82.6 | 96.5 | 82.9 | 97.9 | 96.9 | 96.2 | 95.0 | 79.8 | 95.7 | 86.9 | 96.5 | 91.0 | 01-Jul-2019 | |
AccurateDET (ensemble) [?] | 92.3 | 97.0 | 95.2 | 92.6 | 88.7 | 88.4 | 92.9 | 95.2 | 96.9 | 85.5 | 94.4 | 83.4 | 96.4 | 96.5 | 96.0 | 96.5 | 82.0 | 95.2 | 86.6 | 95.1 | 91.3 | 18-Jun-2019 | |
AccurateDET [?] | 91.3 | 96.6 | 95.1 | 91.5 | 87.2 | 87.0 | 92.2 | 94.0 | 96.5 | 83.4 | 94.1 | 80.0 | 96.1 | 96.4 | 95.8 | 95.7 | 79.7 | 95.1 | 85.1 | 94.6 | 90.1 | 17-Jun-2019 | |
tencent_retail_ft:DET [?] | 91.2 | 96.1 | 94.9 | 92.7 | 85.8 | 88.4 | 93.5 | 94.9 | 97.1 | 80.0 | 94.8 | 78.8 | 96.7 | 96.4 | 96.0 | 95.9 | 79.0 | 95.9 | 83.1 | 95.0 | 88.5 | 21-Jan-2019 | |
Sogou_MM_GCFE_RCNN(ensemble model) [?] | 91.1 | 95.9 | 94.6 | 93.3 | 86.2 | 87.1 | 93.2 | 95.1 | 97.1 | 81.1 | 94.4 | 77.1 | 96.5 | 96.6 | 95.8 | 95.4 | 77.9 | 95.4 | 84.1 | 95.0 | 89.5 | 25-Sep-2018 | |
Sogou_MM_GCFE_RCNN(single model) [?] | 91.0 | 95.9 | 94.1 | 93.3 | 86.2 | 87.0 | 93.1 | 95.1 | 97.1 | 81.1 | 94.4 | 77.1 | 96.5 | 96.6 | 95.8 | 95.4 | 77.9 | 95.4 | 83.4 | 94.9 | 89.5 | 25-Sep-2018 | |
FXRCNN (single model) [?] | 90.7 | 96.4 | 95.1 | 92.0 | 84.3 | 87.1 | 92.8 | 94.4 | 97.4 | 80.7 | 93.5 | 76.0 | 96.7 | 96.7 | 95.6 | 95.5 | 78.3 | 94.6 | 83.3 | 95.4 | 88.0 | 13-Jul-2018 | |
ATLDET [?] | 90.7 | 96.0 | 94.9 | 91.8 | 85.2 | 87.6 | 93.0 | 94.5 | 97.5 | 80.7 | 93.8 | 75.6 | 96.6 | 96.2 | 95.8 | 95.5 | 78.3 | 95.2 | 82.5 | 94.8 | 89.2 | 13-Aug-2018 | |
PACITYAIDetection [?] | 89.8 | 95.3 | 93.6 | 91.1 | 85.4 | 83.9 | 91.6 | 93.3 | 96.8 | 80.1 | 95.5 | 74.3 | 96.3 | 95.7 | 94.4 | 94.7 | 77.5 | 94.1 | 82.7 | 94.2 | 86.4 | 26-Sep-2019 | |
Ali_DCN_SSD_ENSEMBLE [?] | 89.2 | 95.4 | 93.7 | 91.8 | 82.8 | 81.7 | 92.4 | 93.4 | 97.6 | 75.7 | 94.1 | 74.2 | 96.4 | 95.1 | 94.2 | 93.3 | 72.5 | 94.1 | 82.8 | 94.6 | 87.7 | 28-May-2018 | |
CM-CV&AR: DET [?] | 89.1 | 95.7 | 94.4 | 92.0 | 81.1 | 82.9 | 93.8 | 90.0 | 97.1 | 74.6 | 95.4 | 70.4 | 96.7 | 96.2 | 95.3 | 93.4 | 73.7 | 94.8 | 81.1 | 96.0 | 88.2 | 20-Aug-2019 | |
VIM_SSD(COCO+07++12, single model, one-stage) [?] | 89.0 | 96.0 | 93.0 | 90.3 | 83.4 | 80.6 | 91.9 | 94.4 | 96.2 | 77.5 | 93.3 | 75.1 | 95.2 | 95.1 | 94.2 | 93.6 | 72.0 | 93.6 | 82.7 | 94.5 | 86.6 | 27-Jun-2018 | |
FOCAL_DRFCN(VOC+COCO, single model) [?] | 88.8 | 95.0 | 93.3 | 91.8 | 82.9 | 81.9 | 91.6 | 93.0 | 97.1 | 76.7 | 92.5 | 71.7 | 96.2 | 94.9 | 94.2 | 93.7 | 75.3 | 93.3 | 80.0 | 94.7 | 85.4 | 01-Mar-2018 | |
R4D_faster_rcnn [?] | 88.6 | 94.6 | 92.3 | 91.3 | 82.3 | 79.4 | 91.8 | 91.8 | 97.4 | 76.6 | 93.6 | 75.3 | 97.0 | 94.6 | 93.5 | 92.6 | 75.1 | 92.0 | 80.9 | 94.4 | 86.5 | 20-Nov-2016 | |
R-FCN, ResNet Ensemble(VOC+COCO) [?] | 88.4 | 94.8 | 92.9 | 90.6 | 82.4 | 81.8 | 89.9 | 91.7 | 97.1 | 76.0 | 93.4 | 71.9 | 96.6 | 94.3 | 93.9 | 92.8 | 75.7 | 91.9 | 80.8 | 93.6 | 86.4 | 09-Oct-2016 | |
FF_CSSD(VOC+COCO, one-stage, single model) [?] | 88.4 | 95.4 | 93.5 | 90.8 | 82.8 | 78.4 | 90.4 | 91.8 | 96.9 | 75.1 | 92.7 | 74.2 | 95.7 | 95.1 | 94.2 | 93.0 | 71.6 | 93.9 | 81.9 | 94.1 | 86.7 | 28-May-2018 | |
CU-SuperDet [?] | 88.1 | 94.8 | 94.1 | 91.0 | 80.3 | 81.3 | 92.5 | 88.5 | 96.1 | 73.2 | 94.8 | 69.0 | 95.5 | 95.3 | 95.1 | 92.2 | 72.8 | 94.1 | 80.1 | 94.9 | 87.4 | 16-Jan-2020 | |
HIK_FRCN [?] | 87.9 | 95.0 | 93.2 | 91.3 | 80.3 | 77.7 | 90.6 | 89.9 | 97.8 | 72.8 | 93.7 | 70.7 | 97.2 | 95.4 | 94.0 | 91.8 | 72.7 | 92.8 | 81.1 | 94.1 | 86.2 | 19-Sep-2016 | |
PFPNet512_ECCV [?] | 87.8 | 94.6 | 92.4 | 88.7 | 82.7 | 79.1 | 90.5 | 93.2 | 96.2 | 74.9 | 92.8 | 73.1 | 94.2 | 93.5 | 93.6 | 92.7 | 70.7 | 93.0 | 80.1 | 93.8 | 86.7 | 22-Mar-2018 | |
VIM_SSD [?] | 87.6 | 95.3 | 92.0 | 88.7 | 81.6 | 78.5 | 91.4 | 93.2 | 95.7 | 74.9 | 91.6 | 73.5 | 94.2 | 93.0 | 93.2 | 93.0 | 70.5 | 93.0 | 79.1 | 94.3 | 85.0 | 11-May-2018 | |
Deformable R-FCN, ResNet-101 (VOC+COCO) [?] | 87.1 | 94.0 | 91.7 | 88.5 | 79.4 | 78.0 | 89.7 | 90.8 | 96.9 | 74.2 | 93.1 | 71.3 | 95.9 | 94.8 | 93.2 | 92.5 | 71.7 | 91.8 | 78.3 | 93.2 | 83.3 | 23-Mar-2017 | |
FasterRcnn-ResNeXt101(COCO+07++12, single model) [?] | 86.8 | 93.9 | 93.4 | 88.3 | 80.2 | 72.6 | 89.4 | 89.3 | 96.8 | 73.0 | 91.5 | 72.3 | 95.4 | 94.5 | 93.8 | 91.7 | 70.7 | 90.6 | 81.2 | 92.6 | 83.9 | 04-May-2017 | |
RefineDet (VOC+COCO,single model,VGG16,one-stage) [?] | 86.8 | 94.7 | 91.5 | 88.8 | 80.4 | 77.6 | 90.4 | 92.3 | 95.6 | 72.5 | 91.6 | 69.9 | 93.9 | 93.5 | 92.4 | 92.6 | 68.8 | 92.4 | 78.5 | 93.6 | 85.2 | 16-Mar-2018 | |
AngDet [?] | 86.3 | 94.4 | 92.1 | 88.4 | 78.4 | 71.7 | 89.2 | 90.4 | 95.9 | 74.6 | 91.7 | 72.9 | 94.7 | 94.0 | 93.6 | 91.2 | 66.4 | 91.0 | 81.8 | 93.1 | 80.7 | 21-Oct-2018 | |
AngDet [?] | 85.5 | 93.9 | 91.6 | 88.0 | 76.6 | 70.7 | 88.5 | 89.9 | 95.6 | 72.2 | 92.3 | 72.1 | 94.9 | 93.6 | 93.1 | 90.7 | 65.4 | 90.5 | 78.3 | 91.9 | 80.2 | 04-Oct-2018 | |
PSSNet(VOC+COCO) [?] | 85.5 | 92.4 | 91.4 | 85.9 | 78.6 | 75.8 | 88.0 | 89.8 | 95.2 | 72.4 | 87.8 | 72.2 | 94.0 | 92.7 | 93.2 | 92.3 | 70.7 | 88.8 | 76.1 | 92.1 | 81.2 | 30-Mar-2018 | |
MLANet [?] | 85.3 | 93.8 | 91.2 | 86.8 | 78.3 | 71.6 | 88.8 | 89.9 | 94.6 | 72.2 | 91.2 | 71.6 | 92.3 | 92.4 | 90.5 | 90.9 | 66.0 | 92.2 | 76.9 | 91.3 | 82.4 | 26-Mar-2020 | |
SHS_Faster_RCNN_Upgrade_v2 [?] | 85.3 | 94.2 | 90.2 | 85.4 | 77.1 | 73.4 | 89.3 | 89.3 | 94.3 | 71.7 | 87.1 | 73.1 | 93.0 | 91.4 | 92.7 | 92.3 | 69.5 | 87.7 | 79.8 | 92.4 | 81.7 | 25-Feb-2019 | |
ESNet [?] | 85.2 | 93.9 | 91.9 | 86.1 | 75.8 | 70.4 | 89.5 | 89.1 | 94.2 | 71.7 | 91.6 | 71.0 | 92.9 | 92.1 | 92.9 | 90.8 | 63.6 | 91.1 | 79.4 | 91.3 | 84.5 | 26-Feb-2019 | |
R-FCN, ResNet (VOC+COCO) [?] | 85.0 | 92.3 | 89.9 | 86.7 | 74.7 | 75.2 | 86.7 | 89.0 | 95.8 | 70.2 | 90.4 | 66.5 | 95.0 | 93.2 | 92.1 | 91.1 | 71.0 | 89.7 | 76.0 | 92.0 | 83.4 | 09-Oct-2016 | |
MONet(VOC+COCO) [?] | 84.3 | 92.4 | 90.5 | 84.7 | 75.4 | 71.6 | 87.2 | 88.9 | 94.6 | 70.5 | 86.9 | 71.0 | 92.3 | 91.8 | 90.8 | 91.7 | 69.8 | 89.1 | 75.1 | 91.3 | 79.6 | 01-Apr-2018 | |
FSSD512 [?] | 84.2 | 92.8 | 90.0 | 86.2 | 75.9 | 67.7 | 88.9 | 89.0 | 95.0 | 68.8 | 90.9 | 68.7 | 92.8 | 92.1 | 91.4 | 90.2 | 63.1 | 90.1 | 76.9 | 91.5 | 82.7 | 07-Nov-2017 | |
PVANet+ [?] | 84.2 | 93.5 | 89.8 | 84.1 | 75.6 | 69.7 | 88.2 | 87.9 | 93.4 | 70.0 | 87.7 | 75.3 | 92.9 | 90.5 | 90.9 | 90.2 | 67.3 | 86.4 | 80.3 | 92.0 | 78.8 | 26-Oct-2016 | |
PFPNet512 VGG16 07++12+COCO [?] | 83.8 | 93.0 | 89.9 | 85.1 | 75.8 | 66.4 | 88.4 | 88.3 | 94.0 | 67.9 | 89.5 | 69.7 | 92.0 | 91.8 | 91.6 | 88.7 | 61.1 | 89.1 | 78.4 | 90.5 | 84.3 | 18-Oct-2017 | |
BlitzNet512 [?] | 83.8 | 93.1 | 89.4 | 84.7 | 75.5 | 65.0 | 86.6 | 87.4 | 94.5 | 69.9 | 88.8 | 71.7 | 92.5 | 91.6 | 91.1 | 88.9 | 61.2 | 90.4 | 79.2 | 91.8 | 83.0 | 19-Jul-2017 | |
Faster RCNN, ResNet (VOC+COCO) [?] | 83.8 | 92.1 | 88.4 | 84.8 | 75.9 | 71.4 | 86.3 | 87.8 | 94.2 | 66.8 | 89.4 | 69.2 | 93.9 | 91.9 | 90.9 | 89.6 | 67.9 | 88.2 | 76.8 | 90.3 | 80.0 | 10-Dec-2015 | |
DES512_COCO [?] | 83.7 | 92.6 | 90.0 | 83.7 | 74.5 | 66.3 | 88.5 | 88.6 | 94.5 | 70.2 | 87.4 | 71.5 | 92.2 | 91.2 | 92.3 | 89.0 | 60.2 | 89.5 | 79.6 | 90.1 | 82.6 | 09-Mar-2018 | |
PVANet+ (compressed) [?] | 83.7 | 92.8 | 88.9 | 83.4 | 74.7 | 68.7 | 88.2 | 87.8 | 93.5 | 69.5 | 87.3 | 74.3 | 93.1 | 89.5 | 89.9 | 90.2 | 66.8 | 86.4 | 79.8 | 91.9 | 78.2 | 18-Nov-2016 | |
Cascaded_CrystalNet [?] | 83.6 | 92.6 | 89.5 | 83.5 | 74.7 | 69.7 | 87.5 | 87.6 | 92.9 | 70.0 | 86.9 | 75.0 | 91.6 | 89.5 | 90.6 | 90.2 | 67.2 | 85.2 | 80.0 | 91.4 | 76.9 | 23-Dec-2017 | |
ESNet [?] | 83.5 | 93.2 | 90.0 | 84.4 | 73.8 | 70.2 | 88.1 | 87.7 | 93.9 | 68.2 | 88.8 | 69.5 | 91.5 | 91.3 | 90.8 | 89.5 | 63.6 | 89.2 | 74.3 | 89.9 | 81.8 | 23-Feb-2019 | |
DOH_512 (single VGG16, COCO+VOC07++12) [?] | 83.4 | 93.0 | 89.8 | 84.5 | 74.3 | 63.2 | 89.3 | 88.2 | 94.2 | 68.0 | 88.0 | 69.1 | 92.3 | 91.4 | 90.2 | 89.0 | 62.6 | 89.2 | 76.7 | 90.8 | 83.2 | 07-Nov-2017 | |
innovisgroup Faster R-CNN [?] | 83.2 | 93.1 | 87.0 | 83.3 | 74.1 | 70.1 | 87.9 | 88.5 | 92.3 | 68.1 | 86.3 | 72.5 | 90.4 | 89.3 | 90.9 | 89.9 | 66.7 | 87.4 | 76.5 | 91.2 | 79.2 | 22-May-2018 | |
ICT_360_ISD [?] | 82.6 | 90.7 | 89.4 | 87.0 | 75.8 | 70.1 | 86.0 | 86.5 | 96.2 | 65.3 | 86.8 | 62.1 | 94.6 | 90.6 | 90.5 | 89.7 | 63.5 | 87.3 | 72.7 | 90.7 | 77.1 | 18-Nov-2016 | |
Rank of experts (VOC07++12) [?] | 82.2 | 90.4 | 87.4 | 85.3 | 72.9 | 70.8 | 84.5 | 87.2 | 95.6 | 64.6 | 87.1 | 65.4 | 94.3 | 89.7 | 89.5 | 89.2 | 66.0 | 85.1 | 72.5 | 89.6 | 76.6 | 15-Nov-2017 | |
SSD512 VGG16 07++12+COCO [?] | 82.2 | 91.4 | 88.6 | 82.6 | 71.4 | 63.1 | 87.4 | 88.1 | 93.9 | 66.9 | 86.6 | 66.3 | 92.0 | 91.7 | 90.8 | 88.5 | 60.9 | 87.0 | 75.4 | 90.2 | 80.4 | 10-Oct-2016 | |
R-DAD (VOC07++12) [?] | 82.0 | 90.2 | 88.1 | 85.3 | 73.3 | 71.4 | 84.5 | 87.4 | 94.6 | 65.1 | 86.8 | 64.0 | 94.1 | 89.7 | 89.2 | 89.3 | 64.5 | 83.5 | 72.2 | 89.5 | 77.6 | 06-Mar-2018 | |
FSSD300 [?] | 82.0 | 92.2 | 89.2 | 81.8 | 72.3 | 59.7 | 87.4 | 84.4 | 93.5 | 66.8 | 87.7 | 70.4 | 92.1 | 90.9 | 89.6 | 87.7 | 56.9 | 86.8 | 79.0 | 90.7 | 81.3 | 10-Nov-2017 | |
RUN_3WAY_300, VGG16, 07++12+COCO [?] | 81.7 | 91.5 | 88.6 | 80.3 | 71.2 | 59.6 | 86.4 | 84.2 | 94.1 | 66.6 | 86.5 | 70.4 | 92.1 | 90.5 | 89.6 | 87.5 | 57.7 | 86.7 | 79.6 | 90.4 | 80.2 | 13-Oct-2017 | |
YOLOv2 (VOC + COCO) [?] | 81.5 | 90.0 | 88.6 | 82.2 | 71.7 | 65.5 | 85.5 | 84.2 | 92.9 | 67.2 | 87.6 | 70.0 | 91.2 | 90.5 | 90.0 | 88.6 | 62.5 | 83.8 | 70.7 | 88.8 | 79.4 | 21-Oct-2017 | |
Light R-CNN [?] | 81.1 | 90.4 | 88.7 | 83.1 | 71.7 | 64.1 | 84.5 | 84.2 | 94.9 | 63.8 | 85.0 | 65.8 | 94.0 | 88.0 | 88.9 | 88.3 | 62.7 | 85.0 | 73.1 | 89.4 | 75.8 | 06-Feb-2020 | |
SSD based method [?] | 81.0 | 91.8 | 87.5 | 82.5 | 71.2 | 65.6 | 85.4 | 86.2 | 92.8 | 64.0 | 85.9 | 64.7 | 91.6 | 89.0 | 88.7 | 87.9 | 59.2 | 87.5 | 73.5 | 88.8 | 76.8 | 24-Oct-2018 | |
ESNet [?] | 81.0 | 91.4 | 87.4 | 81.5 | 70.7 | 60.6 | 86.6 | 86.0 | 92.8 | 65.5 | 86.5 | 68.9 | 91.1 | 88.6 | 89.3 | 87.4 | 60.7 | 86.3 | 73.6 | 88.0 | 77.1 | 08-Feb-2019 | |
Light R-CNN [?] | 80.5 | 89.3 | 87.5 | 82.8 | 71.2 | 62.4 | 84.5 | 83.8 | 94.4 | 64.1 | 84.5 | 67.0 | 92.5 | 88.0 | 87.0 | 87.0 | 62.3 | 83.2 | 73.9 | 88.2 | 76.9 | 15-Jan-2020 | |
DenseSSD-512 07++12 [?] | 80.5 | 91.0 | 87.4 | 82.1 | 68.8 | 61.0 | 84.7 | 84.9 | 92.9 | 63.5 | 85.6 | 68.2 | 90.8 | 89.1 | 89.1 | 86.6 | 56.8 | 86.1 | 74.8 | 88.7 | 77.6 | 05-Dec-2017 | |
BlitzNet300 [?] | 80.2 | 91.0 | 86.5 | 80.0 | 70.1 | 54.7 | 84.4 | 84.1 | 92.5 | 65.1 | 83.5 | 69.2 | 91.2 | 88.1 | 88.5 | 85.7 | 55.8 | 85.4 | 79.3 | 89.8 | 78.2 | 19-Jul-2017 | |
OHEM+FRCN, VGG16, VOC+COCO [?] | 80.1 | 90.1 | 87.4 | 79.9 | 65.8 | 66.3 | 86.1 | 85.0 | 92.9 | 62.4 | 83.4 | 69.5 | 90.6 | 88.9 | 88.9 | 83.6 | 59.0 | 82.0 | 74.7 | 88.2 | 77.3 | 18-Apr-2016 | |
Light R-CNN [?] | 80.0 | 89.2 | 87.5 | 80.9 | 71.1 | 63.5 | 82.9 | 83.5 | 92.9 | 63.3 | 82.8 | 67.1 | 92.6 | 87.0 | 87.5 | 86.2 | 62.4 | 82.7 | 73.3 | 87.4 | 75.8 | 07-Jan-2020 | |
DSSD513_ResNet101_07++12 [?] | 80.0 | 92.1 | 86.6 | 80.3 | 68.7 | 58.2 | 84.3 | 85.0 | 94.6 | 63.3 | 85.9 | 65.6 | 93.0 | 88.5 | 87.8 | 86.4 | 57.4 | 85.2 | 73.4 | 87.8 | 76.8 | 15-Feb-2017 | |
RUN_3WAY_512, VGG16, 07++12 [?] | 79.8 | 90.0 | 87.3 | 80.2 | 67.4 | 62.4 | 84.9 | 85.6 | 92.9 | 61.8 | 84.9 | 66.2 | 90.9 | 89.1 | 88.0 | 86.5 | 55.4 | 85.0 | 72.6 | 87.7 | 76.8 | 22-Oct-2017 | |
SSD300 VGG16 07++12+COCO [?] | 79.3 | 91.0 | 86.0 | 78.1 | 65.0 | 55.4 | 84.9 | 84.0 | 93.4 | 62.1 | 83.6 | 67.3 | 91.3 | 88.9 | 88.6 | 85.6 | 54.7 | 83.8 | 77.3 | 88.3 | 76.5 | 03-Oct-2016 | |
DSOD300+ [?] | 79.3 | 90.5 | 87.4 | 77.5 | 67.4 | 57.7 | 84.7 | 83.6 | 92.6 | 64.8 | 81.3 | 66.4 | 90.1 | 87.8 | 88.1 | 87.3 | 57.9 | 80.3 | 75.6 | 88.1 | 76.7 | 16-Mar-2017 | |
BlitzNet [?] | 79.0 | 90.0 | 85.3 | 80.4 | 67.2 | 53.6 | 82.9 | 83.6 | 93.8 | 62.6 | 84.0 | 65.9 | 91.6 | 86.6 | 87.7 | 84.6 | 56.8 | 84.7 | 74.0 | 88.0 | 75.8 | 17-Mar-2017 | |
Res101+hyper+FasterRCNN(COCO+0712trainval) [?] | 78.9 | 88.9 | 85.3 | 79.9 | 68.4 | 63.8 | 84.1 | 83.9 | 91.0 | 62.0 | 83.2 | 64.3 | 88.8 | 87.6 | 85.9 | 87.1 | 60.8 | 80.7 | 70.5 | 88.0 | 73.0 | 10-Feb-2017 | |
EGCI-Net [?] | 78.5 | 89.6 | 86.8 | 75.6 | 64.5 | 53.7 | 85.3 | 82.6 | 92.8 | 63.5 | 83.0 | 67.5 | 90.1 | 87.0 | 87.9 | 85.1 | 56.7 | 79.5 | 75.7 | 87.0 | 75.2 | 26-Feb-2019 | |
SSD512 VGG16 07++12 [?] | 78.5 | 90.0 | 85.3 | 77.7 | 64.3 | 58.5 | 85.1 | 84.3 | 92.6 | 61.3 | 83.4 | 65.1 | 89.9 | 88.5 | 88.2 | 85.5 | 54.4 | 82.4 | 70.7 | 87.1 | 75.6 | 13-Oct-2016 | |
DCFF-Net [?] | 77.6 | 88.9 | 87.0 | 74.1 | 63.3 | 52.4 | 83.6 | 82.2 | 91.4 | 61.3 | 81.8 | 65.5 | 90.8 | 86.5 | 87.4 | 84.9 | 53.9 | 80.4 | 75.2 | 86.4 | 75.2 | 29-Jun-2018 | |
HFM_VGG16 [?] | 77.5 | 88.8 | 85.1 | 76.8 | 64.8 | 61.4 | 85.0 | 84.1 | 90.0 | 59.9 | 82.6 | 61.9 | 88.5 | 85.2 | 85.6 | 86.9 | 56.7 | 79.5 | 67.5 | 85.4 | 73.4 | 21-Mar-2016 | |
Res101+FasterRCNN(COCO+0712trainval) [?] | 77.3 | 86.9 | 83.7 | 76.5 | 65.9 | 59.5 | 81.9 | 82.6 | 90.9 | 60.1 | 81.0 | 64.2 | 88.0 | 84.9 | 86.2 | 85.2 | 58.7 | 79.5 | 72.6 | 86.4 | 71.3 | 05-Feb-2017 | |
FFD_07++12 [?] | 77.2 | 89.0 | 86.5 | 72.4 | 61.7 | 51.9 | 83.9 | 81.3 | 91.7 | 61.0 | 80.6 | 66.3 | 88.8 | 86.8 | 86.6 | 85.1 | 54.1 | 80.0 | 75.8 | 87.2 | 74.4 | 16-Apr-2018 | |
shufflenetv2_yolov3 [?] | 77.2 | 89.9 | 84.4 | 79.3 | 66.0 | 55.6 | 83.8 | 82.2 | 92.1 | 57.0 | 81.1 | 64.1 | 88.9 | 85.5 | 86.3 | 86.5 | 56.4 | 82.4 | 65.5 | 84.2 | 73.0 | 25-Feb-2020 | |
DCFF-Net [?] | 77.2 | 89.4 | 85.6 | 73.1 | 63.2 | 52.1 | 84.3 | 81.1 | 92.1 | 61.1 | 81.5 | 65.4 | 89.7 | 86.7 | 88.4 | 84.6 | 52.7 | 80.0 | 73.2 | 87.0 | 72.8 | 03-Jul-2018 | |
RUN300_3WAY, VGG16, 07++12 [?] | 77.1 | 88.2 | 84.4 | 76.2 | 63.8 | 53.1 | 82.9 | 79.5 | 90.9 | 60.7 | 82.5 | 64.1 | 89.6 | 86.5 | 86.6 | 83.3 | 51.5 | 83.0 | 74.0 | 87.6 | 74.4 | 26-Sep-2017 | |
DenseSSD-300 07++12 [?] | 77.0 | 87.5 | 84.9 | 77.0 | 64.0 | 49.6 | 84.3 | 79.3 | 91.6 | 60.0 | 82.6 | 64.8 | 90.3 | 88.1 | 87.2 | 82.5 | 51.1 | 81.8 | 74.0 | 86.8 | 72.6 | 29-Nov-2017 | |
FasterRCNN [?] | 76.8 | 84.4 | 85.5 | 81.4 | 65.4 | 60.3 | 84.9 | 83.8 | 93.4 | 62.0 | 85.7 | 55.5 | 90.8 | 88.4 | 81.4 | 85.7 | 50.5 | 82.7 | 65.2 | 89.0 | 60.0 | 23-Jul-2017 | |
fasterRCNN+COCO+VOC+MCC [?] | 76.8 | 84.4 | 85.5 | 81.4 | 65.4 | 60.3 | 84.9 | 83.8 | 93.4 | 62.0 | 85.7 | 55.5 | 90.8 | 88.4 | 81.4 | 85.7 | 50.5 | 82.7 | 65.2 | 89.0 | 60.0 | 23-Jul-2017 | |
Fast-rcnn [?] | 76.8 | 84.1 | 86.7 | 79.4 | 64.8 | 59.2 | 85.2 | 81.4 | 94.6 | 63.3 | 86.9 | 54.6 | 92.0 | 90.1 | 81.7 | 85.0 | 51.5 | 83.7 | 63.7 | 90.0 | 58.7 | 24-Oct-2017 | |
IFRN_07+12 [?] | 76.6 | 87.8 | 83.9 | 79.0 | 64.5 | 58.9 | 82.2 | 82.0 | 91.4 | 56.5 | 82.3 | 62.4 | 90.4 | 85.6 | 86.4 | 86.4 | 55.1 | 80.5 | 62.7 | 85.4 | 69.2 | 07-Jun-2016 | |
ION [?] | 76.4 | 87.5 | 84.7 | 76.8 | 63.8 | 58.3 | 82.6 | 79.0 | 90.9 | 57.8 | 82.0 | 64.7 | 88.9 | 86.5 | 84.7 | 82.3 | 51.4 | 78.2 | 69.2 | 85.2 | 73.5 | 23-Nov-2015 | |
DSOD300 [?] | 76.3 | 89.4 | 85.3 | 72.9 | 62.7 | 49.5 | 83.6 | 80.6 | 92.1 | 60.8 | 77.9 | 65.6 | 88.9 | 85.5 | 86.8 | 84.6 | 51.1 | 77.7 | 72.3 | 86.0 | 72.2 | 17-Mar-2017 | |
PLN [?] | 76.0 | 88.3 | 84.7 | 77.4 | 65.9 | 55.8 | 82.0 | 79.4 | 91.9 | 58.2 | 77.3 | 58.8 | 89.5 | 85.3 | 85.3 | 82.9 | 55.8 | 79.6 | 64.6 | 86.5 | 69.9 | 27-Mar-2017 | |
Faster RCNN baseline (VOC+COCO) [?] | 75.9 | 87.4 | 83.6 | 76.8 | 62.9 | 59.6 | 81.9 | 82.0 | 91.3 | 54.9 | 82.6 | 59.0 | 89.0 | 85.5 | 84.7 | 84.1 | 52.2 | 78.9 | 65.5 | 85.4 | 70.2 | 24-Nov-2015 | |
MNC baseline [?] | 75.9 | 86.4 | 81.1 | 76.4 | 64.3 | 57.8 | 81.1 | 80.3 | 92.0 | 55.2 | 82.6 | 61.0 | 89.9 | 86.4 | 84.6 | 85.4 | 53.1 | 79.8 | 66.1 | 84.7 | 69.9 | 15-Dec-2015 | |
SSD300 VGG16 07++12 [?] | 75.8 | 88.1 | 82.9 | 74.4 | 61.9 | 47.6 | 82.7 | 78.8 | 91.5 | 58.1 | 80.0 | 64.1 | 89.4 | 85.7 | 85.5 | 82.6 | 50.2 | 79.8 | 73.6 | 86.6 | 72.1 | 18-Oct-2016 | |
Faster+resnet101+07++12 [?] | 75.8 | 86.9 | 83.2 | 78.3 | 61.9 | 58.1 | 79.4 | 80.4 | 91.7 | 55.9 | 81.0 | 58.7 | 91.1 | 85.2 | 84.8 | 83.2 | 54.7 | 78.6 | 67.6 | 84.9 | 70.6 | 14-Nov-2017 | |
RFCN_DCN [?] | 75.7 | 85.7 | 83.0 | 76.9 | 63.6 | 57.8 | 79.4 | 79.5 | 92.9 | 58.2 | 79.6 | 60.9 | 90.3 | 85.3 | 85.1 | 83.5 | 55.7 | 79.6 | 64.5 | 84.6 | 68.1 | 27-Jun-2017 | |
MCC_FRCN, ResNet101, 07++12 [?] | 75.4 | 86.0 | 83.5 | 78.3 | 62.2 | 59.5 | 80.4 | 79.1 | 91.2 | 55.9 | 80.1 | 56.3 | 90.2 | 86.6 | 84.1 | 82.8 | 53.0 | 78.2 | 65.5 | 85.4 | 69.9 | 21-Nov-2016 | |
YOLOv2 [?] | 75.4 | 86.6 | 85.0 | 76.8 | 61.1 | 55.5 | 81.2 | 78.2 | 91.8 | 56.8 | 79.6 | 61.7 | 89.7 | 86.0 | 85.0 | 84.2 | 51.2 | 79.4 | 62.9 | 84.9 | 71.0 | 23-Feb-2017 | |
BlitzNet [?] | 75.4 | 87.5 | 82.2 | 74.6 | 61.6 | 46.0 | 81.5 | 78.4 | 91.4 | 58.2 | 80.3 | 64.9 | 89.1 | 83.6 | 85.8 | 81.5 | 50.6 | 79.9 | 74.8 | 84.9 | 71.2 | 17-Mar-2017 | |
as [?] | 75.0 | 85.1 | 82.0 | 78.5 | 63.2 | 58.0 | 79.9 | 81.2 | 91.2 | 56.7 | 79.0 | 59.0 | 89.6 | 83.2 | 82.0 | 83.2 | 54.7 | 79.8 | 63.4 | 82.3 | 68.3 | 14-Nov-2019 | |
LocNet [?] | 74.8 | 86.3 | 83.0 | 76.1 | 60.8 | 54.6 | 79.9 | 79.0 | 90.6 | 54.3 | 81.6 | 62.0 | 89.0 | 85.7 | 85.5 | 82.8 | 49.7 | 76.6 | 67.5 | 83.2 | 67.4 | 06-Nov-2015 | |
DC-SPP-YOLO [?] | 74.6 | 86.9 | 82.5 | 75.7 | 61.2 | 52.9 | 82.5 | 78.4 | 91.0 | 52.8 | 80.2 | 60.8 | 89.4 | 83.5 | 85.5 | 82.5 | 49.5 | 79.8 | 63.9 | 83.7 | 68.3 | 08-Oct-2018 | |
DDT augmentation based on web images [?] | 74.4 | 86.5 | 81.9 | 76.2 | 63.4 | 55.4 | 80.8 | 80.1 | 89.7 | 51.6 | 78.6 | 56.2 | 88.8 | 84.8 | 85.5 | 82.6 | 50.6 | 78.1 | 64.1 | 85.6 | 68.1 | 26-Jul-2017 | |
MR_CNN_S_CNN_MORE_DATA [?] | 73.9 | 85.5 | 82.9 | 76.6 | 57.8 | 62.7 | 79.4 | 77.2 | 86.6 | 55.0 | 79.1 | 62.2 | 87.0 | 83.4 | 84.7 | 78.9 | 45.3 | 73.4 | 65.8 | 80.3 | 74.0 | 06-Jun-2015 | |
HyperNet_VGG [?] | 71.4 | 84.2 | 78.5 | 73.6 | 55.6 | 53.7 | 78.7 | 79.8 | 87.7 | 49.6 | 74.9 | 52.1 | 86.0 | 81.7 | 83.3 | 81.8 | 48.6 | 73.5 | 59.4 | 79.9 | 65.7 | 12-Oct-2015 | |
HyperNet_SP [?] | 71.3 | 84.1 | 78.3 | 73.3 | 55.5 | 53.6 | 78.6 | 79.6 | 87.5 | 49.5 | 74.9 | 52.1 | 85.6 | 81.6 | 83.2 | 81.6 | 48.4 | 73.2 | 59.3 | 79.7 | 65.6 | 28-Oct-2015 | |
MR_CNN_S_CNN [?] | 70.7 | 85.0 | 79.6 | 71.5 | 55.3 | 57.7 | 76.0 | 73.9 | 84.6 | 50.5 | 74.3 | 61.7 | 85.5 | 79.9 | 81.7 | 76.4 | 41.0 | 69.0 | 61.2 | 77.7 | 72.1 | 09-May-2015 | |
Fast R-CNN + YOLO [?] | 70.7 | 83.4 | 78.5 | 73.5 | 55.8 | 43.4 | 79.1 | 73.1 | 89.4 | 49.4 | 75.5 | 57.0 | 87.5 | 80.9 | 81.0 | 74.7 | 41.8 | 71.5 | 68.5 | 82.1 | 67.2 | 06-Nov-2015 | |
FasterRCNN [?] | 70.4 | 82.1 | 78.6 | 72.6 | 54.3 | 52.1 | 77.3 | 76.6 | 87.4 | 49.8 | 76.1 | 50.5 | 86.5 | 80.1 | 82.0 | 80.8 | 46.7 | 70.6 | 58.8 | 80.5 | 65.3 | 23-Jul-2017 | |
RPN [?] | 70.4 | 84.9 | 79.8 | 74.3 | 53.9 | 49.8 | 77.5 | 75.9 | 88.5 | 45.6 | 77.1 | 55.3 | 86.9 | 81.7 | 80.9 | 79.6 | 40.1 | 72.6 | 60.9 | 81.2 | 61.5 | 01-Jun-2015 | |
DEEP_ENSEMBLE_COCO [?] | 70.1 | 84.0 | 79.4 | 71.6 | 51.9 | 51.1 | 74.1 | 72.1 | 88.6 | 48.3 | 73.4 | 57.8 | 86.1 | 80.0 | 80.7 | 70.4 | 46.6 | 69.6 | 68.8 | 75.9 | 71.4 | 03-May-2015 | |
OHEM+FRCN, VGG16 [?] | 69.8 | 81.5 | 78.9 | 69.6 | 52.3 | 46.5 | 77.4 | 72.1 | 88.2 | 48.8 | 73.8 | 58.3 | 86.9 | 79.7 | 81.4 | 75.0 | 43.0 | 69.5 | 64.8 | 78.5 | 68.9 | 18-Apr-2016 | |
Networks on Convolutional Feature Maps [?] | 68.8 | 82.8 | 79.0 | 71.6 | 52.3 | 53.7 | 74.1 | 69.0 | 84.9 | 46.9 | 74.3 | 53.1 | 85.0 | 81.3 | 79.5 | 72.2 | 38.9 | 72.4 | 59.5 | 76.7 | 68.1 | 17-Apr-2015 | |
Fast R-CNN VGG16 extra data [?] | 68.4 | 82.3 | 78.4 | 70.8 | 52.3 | 38.7 | 77.8 | 71.6 | 89.3 | 44.2 | 73.0 | 55.0 | 87.5 | 80.5 | 80.8 | 72.0 | 35.1 | 68.3 | 65.7 | 80.4 | 64.2 | 17-Apr-2015 | |
segDeepM [?] | 66.4 | 81.1 | 75.6 | 65.7 | 47.7 | 46.1 | 72.1 | 69.1 | 86.8 | 43.0 | 71.0 | 53.0 | 84.9 | 76.3 | 78.8 | 68.8 | 40.0 | 70.0 | 61.8 | 71.4 | 64.1 | 04-Mar-2016 | |
UMICH_FGS_STRUCT [?] | 66.4 | 82.9 | 76.1 | 64.1 | 44.6 | 49.4 | 70.3 | 71.2 | 84.6 | 42.7 | 68.6 | 55.8 | 82.7 | 77.1 | 79.9 | 68.7 | 41.4 | 69.0 | 60.0 | 72.0 | 66.2 | 20-Jun-2015 | |
YOLOv2-resnet-18-101 [?] | 64.1 | 80.2 | 71.8 | 67.7 | 50.5 | 45.3 | 72.3 | 71.9 | 79.6 | 45.5 | 61.9 | 47.6 | 77.1 | 66.6 | 75.1 | 75.4 | 42.4 | 63.3 | 55.6 | 73.7 | 58.0 | 18-May-2022 | |
NUS_NIN_c2000 [?] | 63.8 | 80.2 | 73.8 | 61.9 | 43.7 | 43.0 | 70.3 | 67.6 | 80.7 | 41.9 | 69.7 | 51.7 | 78.2 | 75.2 | 76.9 | 65.1 | 38.6 | 68.3 | 58.0 | 68.7 | 63.3 | 30-Oct-2014 | |
BabyLearning [?] | 63.2 | 78.0 | 74.2 | 61.3 | 45.7 | 42.7 | 68.2 | 66.8 | 80.2 | 40.6 | 70.0 | 49.8 | 79.0 | 74.5 | 77.9 | 64.0 | 35.3 | 67.9 | 55.7 | 68.7 | 62.6 | 12-Nov-2014 | |
NUS_NIN [?] | 62.4 | 77.9 | 73.1 | 62.6 | 39.5 | 43.3 | 69.1 | 66.4 | 78.9 | 39.1 | 68.1 | 50.0 | 77.2 | 71.3 | 76.1 | 64.7 | 38.4 | 66.9 | 56.2 | 66.9 | 62.7 | 30-Oct-2014 | |
R-CNN (bbox reg) [?] | 62.4 | 79.6 | 72.7 | 61.9 | 41.2 | 41.9 | 65.9 | 66.4 | 84.6 | 38.5 | 67.2 | 46.7 | 82.0 | 74.8 | 76.0 | 65.2 | 35.6 | 65.4 | 54.2 | 67.4 | 60.3 | 26-Oct-2014 | |
YOLOv1 [?] | 61.4 | 75.6 | 69.5 | 63.4 | 42.4 | 27.6 | 72.1 | 59.8 | 85.8 | 39.8 | 66.5 | 48.2 | 81.5 | 75.7 | 73.5 | 67.2 | 31.7 | 60.9 | 55.1 | 75.9 | 55.2 | 16-Sep-2021 | |
R-CNN [?] | 59.2 | 76.8 | 70.9 | 56.6 | 37.5 | 36.9 | 62.9 | 63.6 | 81.1 | 35.7 | 64.3 | 43.9 | 80.4 | 71.6 | 74.0 | 60.0 | 30.8 | 63.4 | 52.0 | 63.5 | 58.7 | 25-Oct-2014 | |
YOLO [?] | 57.9 | 77.0 | 67.2 | 57.7 | 38.3 | 22.7 | 68.3 | 55.9 | 81.4 | 36.2 | 60.8 | 48.5 | 77.2 | 72.3 | 71.3 | 63.5 | 28.9 | 52.2 | 54.8 | 73.9 | 50.8 | 06-Nov-2015 | |
Feature Edit [?] | 56.3 | 74.6 | 69.1 | 54.4 | 39.1 | 33.1 | 65.2 | 62.7 | 69.7 | 30.8 | 56.0 | 44.6 | 70.0 | 64.4 | 71.1 | 60.2 | 33.3 | 61.3 | 46.4 | 61.7 | 57.8 | 06-Sep-2014 | |
CPE [?] | 54.6 | 73.1 | 75.4 | 60.0 | 25.1 | 35.0 | 62.8 | 55.2 | 73.8 | 28.9 | 66.3 | 30.0 | 69.7 | 70.1 | 76.4 | 36.3 | 32.3 | 53.2 | 44.1 | 62.4 | 62.1 | 14-Sep-2021 | |
WithoutFR_CEP [?] | 54.3 | 74.3 | 75.0 | 56.8 | 27.8 | 29.8 | 62.6 | 55.1 | 76.8 | 30.4 | 64.5 | 29.4 | 71.8 | 67.7 | 77.6 | 31.4 | 33.0 | 56.3 | 44.3 | 63.3 | 58.9 | 23-Sep-2021 | |
CEP [?] | 53.3 | 76.3 | 74.2 | 61.4 | 32.4 | 35.5 | 65.4 | 61.4 | 79.0 | 25.1 | 68.5 | 22.7 | 75.6 | 70.0 | 76.7 | 4.0 | 28.2 | 56.2 | 29.8 | 66.7 | 57.0 | 16-Mar-2021 | |
R-CNN (bbox reg) [?] | 53.3 | 71.8 | 65.8 | 52.0 | 34.1 | 32.6 | 59.6 | 60.0 | 69.8 | 27.6 | 52.0 | 41.7 | 69.6 | 61.3 | 68.3 | 57.8 | 29.6 | 57.8 | 40.9 | 59.3 | 54.1 | 13-Mar-2014 | |
ss-pcl [?] | 52.6 | 74.4 | 75.0 | 58.7 | 36.2 | 34.9 | 63.0 | 64.3 | 70.0 | 22.8 | 67.9 | 30.5 | 66.8 | 72.7 | 76.8 | 4.7 | 24.3 | 60.7 | 51.3 | 43.8 | 53.2 | 18-Dec-2021 | |
ss-pcl [?] | 52.3 | 74.1 | 75.6 | 58.7 | 34.5 | 35.5 | 63.8 | 64.6 | 65.7 | 22.8 | 66.5 | 30.5 | 65.9 | 72.2 | 76.8 | 5.0 | 24.3 | 61.4 | 52.3 | 43.9 | 51.7 | 20-Dec-2021 | |
ss-pcl [?] | 52.3 | 74.9 | 75.5 | 59.5 | 32.1 | 35.3 | 63.0 | 64.5 | 68.2 | 21.9 | 66.7 | 30.8 | 67.6 | 72.0 | 76.3 | 5.1 | 24.3 | 60.4 | 54.3 | 41.2 | 51.9 | 20-Dec-2021 | |
ss-pcl [?] | 52.2 | 75.6 | 74.7 | 56.8 | 36.0 | 33.8 | 63.8 | 65.3 | 65.1 | 22.8 | 66.4 | 29.6 | 65.4 | 72.6 | 76.2 | 3.2 | 26.0 | 61.2 | 53.2 | 43.1 | 53.1 | 15-Dec-2021 | |
ss-pcl [?] | 51.6 | 74.6 | 74.4 | 57.4 | 31.4 | 35.3 | 62.5 | 65.2 | 66.4 | 22.4 | 66.3 | 30.0 | 64.8 | 71.8 | 75.2 | 4.0 | 25.7 | 60.5 | 54.2 | 40.3 | 50.2 | 20-Dec-2021 | |
SDS [?] | 50.7 | 69.7 | 58.4 | 48.5 | 28.3 | 28.8 | 61.3 | 57.5 | 70.8 | 24.1 | 50.7 | 35.9 | 64.9 | 59.1 | 65.8 | 57.1 | 26.0 | 58.8 | 38.6 | 58.9 | 50.7 | 21-Jul-2014 | |
R-CNN [?] | 49.6 | 68.1 | 63.8 | 46.1 | 29.4 | 27.9 | 56.6 | 57.0 | 65.9 | 26.5 | 48.7 | 39.5 | 66.2 | 57.3 | 65.4 | 53.2 | 26.2 | 54.5 | 38.1 | 50.6 | 51.6 | 30-Jan-2014 | |
EAC-Net [?] | 49.1 | 68.9 | 73.1 | 51.2 | 34.7 | 33.0 | 61.1 | 58.9 | 44.9 | 27.8 | 66.0 | 25.4 | 60.6 | 62.8 | 77.7 | 3.0 | 29.8 | 54.9 | 33.6 | 53.5 | 61.4 | 16-Nov-2021 | |
FSD [?] | 48.7 | 75.7 | 72.3 | 52.5 | 27.2 | 35.9 | 63.3 | 59.3 | 56.2 | 25.8 | 58.6 | 28.1 | 61.8 | 44.1 | 75.1 | 2.8 | 24.2 | 55.2 | 38.5 | 59.1 | 58.4 | 08-May-2021 | |
SGCM [?] | 47.6 | 60.1 | 68.5 | 51.7 | 26.4 | 27.0 | 60.3 | 57.1 | 66.9 | 23.0 | 57.4 | 25.0 | 52.2 | 58.4 | 71.6 | 15.1 | 27.9 | 54.0 | 35.9 | 55.6 | 59.0 | 09-Mar-2019 | |
YOLOv1-resnet-18-50 [?] | 47.3 | 66.7 | 56.1 | 49.5 | 25.9 | 17.8 | 60.2 | 45.9 | 70.6 | 26.1 | 43.0 | 41.1 | 67.5 | 59.2 | 62.4 | 47.6 | 17.6 | 35.6 | 45.7 | 64.6 | 42.4 | 13-May-2022 | |
WSODE [?] | 46.9 | 75.3 | 70.2 | 51.4 | 29.0 | 31.6 | 60.1 | 57.7 | 21.1 | 22.4 | 59.5 | 28.5 | 34.0 | 64.8 | 74.8 | 6.8 | 27.9 | 53.1 | 45.1 | 62.9 | 62.1 | 17-Dec-2020 | |
Poselets2 [?] | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 58.7 | - | - | - | - | - | 06-Jun-2014 | |
Metu_Unified_Net [?] | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 89.9 | - | - | - | - | - | 10-Mar-2018 | |
Geometric shape [?] | - | - | 3.8 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 19-Jun-2016 |
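
The "mean" column above is the arithmetic mean of the 20 per-class average precisions (AP). As a minimal, illustrative sketch (not the server's actual evaluation code), the VOC-style AP for one class can be computed from a ranked precision/recall curve as follows, using the all-point interpolation adopted from VOC2010 onward:

```python
import numpy as np

def voc_ap(recall, precision):
    """Area under the precision/recall curve, with the monotone
    ("all-point") interpolation used by VOC from 2010 onward."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Make the precision envelope monotonically non-increasing.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangle areas where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# The leaderboard's "mean" column is then simply the average of the
# per-class APs: mAP = np.mean([voc_ap(r_c, p_c) for each class c]).
```
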
Title | Method | Affiliation | Contributors | Description | Date |
---|---|---|---|---|---|
AInnovation Detection | AInnoDetection | AInnovation Co., Ltd. | Faen Zhang, Jiahong Wu, Zhizheng Yang, Haotian Cao, Jianfei Song, Xinyu Fan | All models are pre-trained on MS COCO and then fine-tuned on the VOC2012 dataset. We use ResNeXt152+DCN+FPN+CASCADE. Multi-scale training and testing techniques are used. An ensemble of four models is used. | 2019-07-01 03:52:50
ATLDET | ATLDET | ATL (Alibaba Turing Lab) | Xuan Jin | ATLDET is pre-trained on ImageNet, fine-tuned on MS COCO, and then fine-tuned on Pascal VOC. Instance-segmentation features are concatenated. | 2018-08-13 08:13:19
ATLDETv2 | ATLDETv2 | ATL (Alibaba Turing Lab) | Xuan Jin, Wei Su, Rong Zhang, Yuan He, Hui Xue | ATLDETv2 is pre-trained on ImageNet and then fine-tuned on MS COCO. Beyond fine-tuning, domain-adaptive methods provide better results when we train on Pascal VOC. The backbone is ResNeXt152_32x8d with DCN. A multi-scale strategy and Soft-NMS are also used. The final results come from an ensemble of 2 models. | 2019-10-26 06:49:57
Accurate Detection | AccurateDET | 4Paradigm Data Intelligence Lab | Fengfu Li | I use an adaptive method to generate high-quality proposals for Faster RCNN. The backbone network is ResNeXt101 + DCN + FPN. Multi-scale and random-flip techniques are used during training; in the testing phase, only the flip technique is used. The mAP on the VOC07 test set is about 92.8. | 2019-06-17 10:32:01
Accurate Detection (ensemble) | AccurateDET (ensemble) | 4Paradigm Data Intelligence Lab | Fengfu Li | I use an adaptive method to generate high-quality proposals for Faster RCNN. The backbone network is ResNeXt101 + DCN + FPN. Multi-scale and random-flip techniques are used during training; in the testing phase, only the flip technique is used. By using an ensemble of three methods, the mAP on the VOC07 test set is about 93.8. | 2019-06-18 01:25:20
DCN with SoftNMS and FF-SSD ensemble | Ali_DCN_SSD_ENSEMBLE | Alibaba Group, Machine Intelligence Technology Lab | Hongbin Wang, Zhibin Wang, Hao Li | All models are pre-trained on the ImageNet 1K dataset and then fine-tuned on the COCO detection dataset. Deformable R-FCN is enhanced by Soft-NMS; SSD is enhanced by feature fusion. The ensemble version uses both DCN and SSD (a sketch of Soft-NMS appears after this table). | 2018-05-28 03:02:24
AngDet(ResV1-101,VOC07++12,One-Stage,MS-Test) | AngDet | NJUST | Ang Li | We concentrate on full attention to enhance the SSD network. | 2018-10-04 13:47:12
AngDet | AngDet | NJUST | Ang Li | AngDet | 2018-10-21 03:33:19 |
Computational Baby Learning | BabyLearning | National University of Singapore | Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan | This entry is an implementation of the framework described in "Computational Baby Learning" (http://arxiv.org/abs/1411.2861). We build a computational model to interpret and mimic the baby learning process, based on prior knowledge modelling, exemplar learning, and learning with video contexts. Training data: (1) We used only two positive instances along with ~20,000 unlabelled videos to train the detector for each object category. (2) We used data from ILSVRC 2012 to pre-train the Network in Network [1] and fine-tuned the network with our newly mined instances. [1] Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. In ICLR 2014. | 2014-11-12 03:50:50 |
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik Konstantin Shmelkov Julien Mairal Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:22:43 |
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik Konstantin Shmelkov Julien Mairal Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:24:29 |
FCN | BlitzNet300 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Operates with speed 24 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:57:45 |
FCN | BlitzNet512 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Operates with speed 19 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:38:53 |
detection | CEP | zzu | hu | detection | 2021-03-16 04:03:03 |
WSOD | CPE | zzu | suqihu | WSOD | 2021-09-14 08:00:43 |
CU-SuperDet: HTC+DCN+SNIPER | CU-SuperDet | ChinaUnicom-AI | Zhiang Hao, Shiguo Lian | SuperDet uses MS COCO as the pre-training set and fine-tunes on the VOC dataset. ResNeXt-101 is used as the backbone network. The network adopts a DCN + HTC structure. Multi-scale training and random flipping are used. At test time, multi-scale fusion is adopted, and the final result is fused with the result of a SNIPER model. | 2020-01-16 11:00:44
Cascaded deeply supervised CrystalNet | Cascaded_CrystalNet | DevABeyond | Jian Liang | A cascaded deeply supervised CrystalNet, derived from a tailored Faster R-CNN network and incorporating a transform branch between stages. | 2017-12-23 14:30:19
DC-SPP-YOLO | DC-SPP-YOLO | Beijing University of Chemical Technology | Zhanchao Huang | Dense Connection and Spatial Pyramid Pooling YOLO; base network: darknet19; trained on VOC 2007+2012 trainval, no COCO. | 2018-10-08 12:45:43
dense convolutional and feature fused detector | DCFF-Net | Huazhong University of Science and Technology | Jingjuan Guo, Caihong Yuan, Zhiqiang Zhao, Ping Feng | Our network architecture is motivated by DSOD and does not need pre-training on ImageNet; it is trained from scratch. We simplify the basic framework and introduce a novel feature fusion module that can extract more contextual feature maps. | 2018-07-03 08:16:35
dense convolutional and feature fused detector | DCFF-Net | Huazhong University of Science and Technology | Jingjuan Guo, Caihong Yuan, Zhiqiang Zhao, Ping Feng | Our network architecture is motivated by DSOD and does not need pre-training on ImageNet; it is trained from scratch. We simplify the basic framework and introduce a novel feature fusion module that can extract more contextual feature maps. | 2018-06-29 02:57:22
DDT augmentation | DDT augmentation based on web images | Nanjing University, The University Of Adelaide | Xiu-Shen Wei, Chen-Lin Zhang, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou | This entry is based on Faster RCNN and our web-based object detection dataset (i.e., WebVOC [R1]) as an external dataset. Specifically, for WebVOC, we first collect web images from the Internet by Google using the categories of PASCAL VOC. In total, we collect 12,776 noisy web images, which is similar in scale to the original PASCAL VOC dataset. Then, we employ our Deep Descriptor Transforming (DDT) method [R1] to remove the noisy images and, moreover, automatically annotate object bounding boxes (a sketch of the DDT step appears after this table). 10,081 images with their automatically generated boxes remain as valid images. For training detection models, we first fine-tune VGG-16 on WebVOC. Then, the WebVOC fine-tuned model is used for the VOC task. The training data of VOC is VOC 2007 trainval, test and VOC 2012 trainval. [R1] Xiu-Shen Wei, Chen-Lin Zhang, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou. Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming, arXiv:1707.06397, 2017 | 2017-07-26 10:55:14
An Ensemble of CNNs with COCO Augmentation | DEEP_ENSEMBLE_COCO | Australian National University (ANU) | Jian (Edison) Guo, Stephen Gould | We mainly follow the RCNN pipeline with the following innovations. 1) We trained an ensemble of CNNs for feature extraction. Our ensemble consists of GoogleNet and VGG-16 networks trained on different subsets of PASCAL VOC 2007/2012 and COCO. 2) We trained an ensemble of one-vs-all SVMs and bounding box regressors corresponding to each model of the CNN ensemble. 3) We averaged the SVM scores across the ensemble and sent the averaged SVM scores through the post-processing pipeline to obtain the indices of the selective search boxes retained after post-processing. 4) With the box indices, we ran box regression for each of the boxes for each of the models in the ensemble and then averaged the boxes across the ensemble to obtain the final results. (please see http://arxiv.org/abs/1506.07224) | 2015-05-03 15:40:02
DES512_COCO | DES512_COCO | JHU | Zhishuai Zhang | DES512_COCO | 2018-03-09 23:23:45 |
DOH_512 (single VGG16, COCO+VOC07++12) | DOH_512 (single VGG16, COCO+VOC07++12) | CVIP, Korea Univ., Korea | Younghyun Kim et al. | 'DOH: Decoupled Object Detection Network via Hidden State Top-Down'. DOH consists of a novel Hidden State Top-Down (HSTD) architecture with a Recursive Prediction Module (RPM). In this work, the multi-scale features are regarded as sequential data and are integrated using a hidden state, akin to a recurrent neural network, which is suitable for handling sequential data. Moreover, HSTD decouples the overlapping functions of feature extraction and prediction. To correct the results, RPM derives an attention mask from the result calculated in the previous iteration and repeats the process of refining the feature map for prediction using the attention mask. We train our network on VOC07++12 and COCO (COCO use_difficult_gt: false). | 2017-11-07 02:40:26
Learning DSOD from Scratch | DSOD300 | Intel Labs China | Zhiqiang Shen, Jianguo Li, Zhuang Liu, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue, Thomas Huang | We train DSOD for object detection. The training data is VOC 2007 trainval, test and VOC 2012 trainval without ImageNet pre-trained models. The input image size is 300x300. | 2017-03-17 00:42:36 |
Learning DSOD from Scratch | DSOD300+ | Intel Labs China | Zhiqiang Shen, Jianguo Li, Zhuang Liu, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue, Thomas Huang | We train DSOD for object detection. The training data is VOC 2007 trainval, test, VOC 2012 trainval and MS COCO without ImageNet pre-trained models. The input image size is 300x300. | 2017-03-16 23:06:59 |
DSSD513 ResNet-101 07++12 | DSSD513_ResNet101_07++12 | UNC Chapel Hill, Amazon | Cheng-Yang Fu*, Wei Liu*, Ananth Ranga, Ambrish Tyagi, Alexander C. Berg (* equal contribution) | We first train SSD513 model using ResNet-101 on VOC07 trainval + test and VOC12 trainval for the 20 PASCAL classes. Then we use that SSD513 as the pre-trained model to train the DSSD513 on same training data. We only test a single model on a single scale image (513x513), and don't have any post-processing steps. Details can be found at : https://arxiv.org/abs/1701.06659 | 2017-02-15 18:02:47 |
Deformable R-FCN, ResNet-101 (VOC+COCO) | Deformable R-FCN, ResNet-101 (VOC+COCO) | Microsoft Research Asia | Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei | This entry is based on Deformable Convolutional Networks [a], R-FCN [b] and ResNet-101 [c]. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. OHEM and multi-scale training are applied on our model. Multi-scale testing and horizontal flipping are applied during inference. [a] "Deformable Convolutional Networks", Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei (https://arxiv.org/abs/1703.06211) [b] "R-FCN: Object Detection via Region-based Fully Convolutional Networks", Jifeng Dai, Yi Li, Kaiming He, Jian Sun (http://arxiv.org/abs/1605.06409). [c] "Deep Residual Learning for Image Recognition", Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (https://arxiv.org/abs/1512.03385) | 2017-03-23 03:46:36
DenseSSD-300 07++12 | DenseSSD-300 07++12 | CASIA | pei xu | DenseSSD-300 07++12 | 2017-11-29 03:38:58 |
DenseSSD-512 07++12 | DenseSSD-512 07++12 | CASIA | Pei Xu | DenseSSD-512 07++12 | 2017-12-05 09:54:59 |
EAC-Net | EAC-Net | Jiangnan University | Wenlong Gao | EAC-Net | 2021-11-16 01:03:31 |
Object detector with enriched global context | EGCI-Net | Huazhong University of Science and Technology | Jingjuan Guo, Caihong Yuan, Zhiqiang Zhao, Ping Feng | Our network architecture is motivated by DSOD and does not need pre-training on ImageNet; it is trained from scratch. We simplify the basic framework and introduce a novel pyramid feature pooling module that can extract more contextual feature maps. | 2019-02-26 14:06:46
ESNet | ESNet | PKU | Zhisheng Lu | a new feature pyramid | 2019-02-26 09:16:02 |
ESNet | ESNet | pku | luzhisheng | a new feature pyramid | 2019-02-08 12:37:19 |
ESNet | ESNet | PKU | luis | a new feature pyramid | 2019-02-23 06:29:13 |
FFD300+ | FFD_07++12 | Zhejiang University | zuwei huang | We train FFD for object detection. The training data is VOC 2007 trainval, test, VOC 2012 trainval without ImageNet pre-trained models. The input image size is 300x300. | 2018-04-16 07:50:25 |
FF_CSSD512(07++12+coco), ResNet101 | FF_CSSD(VOC+COCO, one-stage, single model) | Alibaba Group, Machine Intelligence Technology Lab | Zhibin Wang, Hao Li | The FF_CSSD model with ResNet101 as backbone is enhanced by feature fusion and context information. The model is pre-trained on the ImageNet 1K classification training set, fine-tuned on the COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale testing is applied during inference. | 2018-05-28 15:06:49 |
Deformable R-FCN, Focal Loss, ResNet152(VOC+COCO) | FOCAL_DRFCN(VOC+COCO, single model) | PingAn AI Lab | Zhuzhenwen | This entry is based on ResNet-152, Deformable R-FCN and Focal Loss. The model is pre-trained on the ImageNet training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale training is applied on our model. Multi-scale testing and horizontal flipping are applied during inference. | 2018-03-01 03:09:20
FSD for Weakly supervised object detection | FSD | Jiangnan University | Wenlong Gao | FSD for weakly supervised object detection | 2021-05-08 15:17:03
FSSD300 | FSSD300 | Beihang University | Li Zuoxin | Feature fusion SSD which is based on VGG16. It can run at 68FPS on a single 1080Ti. | 2017-11-10 03:05:03 |
FSSD512 | FSSD512 | Beihang University | Li Zuoxin | Feature fusion SSD with 512x512 input image. It can run at 35 fps on a 1080Ti.(VOC07++12+COCO) | 2017-11-07 13:46:58 |
FXRCNN | FXRCNN (single model) | Yi+AI Lab | Hang Zhang, Boyuan Sun, Zhaonan Wang, Hao Zhao, ZiXuan Guan, Wei Miao | 1) Our model is pre-trained on ImageNet and fine-tuned on MS COCO; 2) then fine-tuned on Pascal VOC; 3) ResNeXt with FPN is used as our backbone; 4) Soft-NMS is used in post-processing; 5) we also use a multi-scale training strategy. | 2018-07-13 03:54:02
Fast R-CNN with YOLO Rescoring | Fast R-CNN + YOLO | University of Washington | Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi | We use the YOLO detection method to rescore the bounding boxes from Fast R-CNN. This helps mitigate false background detections and improve overall performance (a sketch of this rescoring idea appears after this table). For more information and example code see: http://pjreddie.com/darknet/yolo/ | 2015-11-06 08:03:59
Fast R-CNN VGG16 extra data | Fast R-CNN VGG16 extra data | Microsoft Research | Ross Girshick | Fast R-CNN is a new algorithm for training R-CNNs. The training process is a single fine-tuning run that jointly trains for softmax classification and bounding-box regression. Training took ~22 hours on a single GPU and testing takes ~330ms / image. A tech report describing the method is forthcoming. Open source code will be released. This entry was trained on VOC 2012 train+val union with VOC 2007 train+val+test. | 2015-04-17 17:32:25
Faster-rcnn Resnet50m soft-nms linear | Fast-rcnn | HIT | ixhorse | pass | 2017-10-24 02:43:14 |
Faster RCNN baseline (VOC+COCO) | Faster RCNN baseline (VOC+COCO) | Microsoft Research | Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun | This entry is a baseline implementation of the system described in "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" (arXiv 2015). We use an ImageNet-pre-trained model (VGG-16) and fine-tune it on the COCO trainval detection task. Then the COCO fine-tuned model is used for the VOC task. The training data of VOC is VOC 2007 trainval, test and VOC 2012 trainval. The entire system takes <200ms per image, including proposal and detection. | 2015-11-24 03:56:56
Faster RCNN, ResNet (VOC+COCO) | Faster RCNN, ResNet (VOC+COCO) | Microsoft Research | Shaoqing Ren, Xiangyu Zhang, Kaiming He, Jian Sun | This entry is based on an improved Faster R-CNN system [a] and an extremely deep Residual Net [b] with a depth of over 100 layers. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. [a] "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. NIPS 2015. [b] "Deep Residual Learning for Image Recognition", Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Tech Report 2015. | 2015-12-10 14:47:49 |
FM+CRPN+global context | Faster+resnet101+07++12 | Harbin Institute of Technology | Chu Mengdie | Our work is based on Faster R-CNN and ResNet101. (1) Use FPN to merge features. (2) The context features are extracted from the entire image's feature maps using an ROI pooling layer, and then merged with the region's feature maps. (3) Use cascade RPN to fine-tune the bounding boxes. (4) The model is pre-trained on the 1000-class ImageNet classification training set, and fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets only. | 2017-11-14 03:07:04
FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | 2017-07-23 13:42:44 |
FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | 2017-07-23 13:38:24 |
FasterRcnn-ResNeXt101(COCO+07++12, single model) | FasterRcnn-ResNeXt101(COCO+07++12, single model) | Beijing University of Posts and Telecommunications (BUPT-PRIV) | Lu Yang; Qing Song; Zhihui Wang; Min Yang | Our network is based on ResNeXt101-32x4d and Faster RCNN; multi-scale training, multi-scale testing and image flipping are applied in this submission. We first train our network on the COCO and VOC0712trainval sets, then finetune on the VOC07trainvaltest and VOC12trainval sets. | 2017-05-04 10:57:08
Diamond Frame Bicycle Recognition | Geometric shape | National Cheng Kung University | Chung-Ping Young, Yen-Bor Lin, Kuan-Yu Chen | A detector for diamond-frame bicycles in side-view images, based on the observation that a bicycle consists of two wheels in the form of ellipse shapes and a frame in the form of two triangles. Through the design of geometric constraints on the relationship between the triangles and ellipses, the computation is fast compared to feature-based classifiers. Besides, no training process is necessary and only a single image is required for our algorithm. Experimental results are also given to show the practicability and the performance of the proposed bicycle model and bicycle detection algorithm. | 2016-06-19 10:06:33
Hierarchical Feature Model | HFM_VGG16 | Inha University | Byungjae Lee, Enkhbayar Erdenee, Sungyul Kim, Phill Kyu Rhee | We are motivated by the observations that many object detectors are degraded in performance due to inter-class ambiguities and intra-class appearance variations, and that deep features extracted from visual objects show a strong hierarchical clustering property. We partition the deep features into unsupervised super-categories at the inter-class level and augmented categories at the object level to discover deep-feature-driven knowledge. We build a Hierarchical Feature Model (HFM) using the Latent Topic Model (LTM) algorithm, ensemble one-versus-all SVMs at each node, and constitute a hierarchical classification ensemble (HCE). In the detection phase, object categorization and localization are processed based on the hypotheses of the HCE with a hierarchical mechanism. | 2016-03-21 10:59:33
Faster R-CNN with cascade RPN and global context | HIK_FRCN | Hikvision Research Institute | Qiaoyong Zhong, Chao Li, Yingying Zhang, Di Xie, Shiliang Pu | Our work on object detection is based on Faster R-CNN. We design and validate the following improvements: * Better network. We find that the identity-mapping variant of ResNet-101 is superior for object detection over the original version. * Better RPN proposals. A novel cascade RPN is proposed to refine proposals' scores and location. A constrained neg/pos anchor ratio further increases proposal recall dramatically. * Pretraining matters. We find that a pretrained global context branch increases mAP by over 3 points. * Training strategies. To attack the imbalance problem, we design a balanced sampling strategy over different classes. Other training strategies, like multi-scale training and online hard example mining are also applied. * Testing strategies. During inference, multi-scale testing, horizontal flipping and weighted box voting are applied. Based on an ImageNet DET pretrained model, we first finetune on COCO+VOC dataset, then finetune on VOC dataset only. | 2016-09-19 05:50:00 |
HyperNet_SP | HyperNet_SP | Intel Labs China | Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun | We train HyperNet for object detection. An ImageNet-pre-trained model (VGG-16) is used for training HyperNet, both for proposal and detection. The training data is VOC 2007 trainval, test and VOC 2012 trainval. The proposal number is 100 for each image. This is a sped-up version of the basic HyperNet: we move the 3×3×4 convolutional layer to the front of the ROI pooling layer. This slight change has two advantages: (a) the channel number of the Hyper Feature maps is significantly reduced (from 126 to 4); (b) the sliding-window classifier is simpler (from Conv-FC to FC). Both characteristics speed up the proposal generation process. The speed is 5 fps using VGG16. | 2015-10-28 07:36:14
HyperNet_VGG16 | HyperNet_VGG | Intel Labs China | Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun | We train HyperNet for object detection. An ImageNet-pre-trained model (VGG-16) is used for training HyperNet, both for proposal and detection. The training data is VOC 2007 trainval, test and VOC 2012 trainval. The proposal number is 100 for each image. | 2015-10-12 02:52:03
Implicit+Sink+Dilation | ICT_360_ISD | Institute of Computing Technology, Chinese Academy of Sciences | Yu Li, Min Lin, Sheng Tang, Shuicheng Yan | We update our previous method. | 2016-11-18 03:34:32
Improved Feature RCNN | IFRN_07+12 | Tsinghua MIG | Haofeng Zou, Guiguang Ding | We add improved global and local features to RCNN and use an iterative detection method. | 2016-06-07 07:47:00
Inside-Outside Net | ION | Cornell University | Sean Bell, Larry Zitnick, Kavita Bala, Ross Girshick | Our "Inside-Outside Net" (ION) detector will be described soon in an arXiv submission. The method is based on Fast R-CNN with VGG16 and was trained on VOC 2012 train+val union VOC 2007 train+val (not VOC 2007 test), as well as the segmentations from SDS (Simultaneous Detection and Segmentation) on the training set images. We use the selective search boxes published with Fast R-CNN. Runtime: ~1.15s/image on a Titan X GPU (excluding proposal generation). | 2015-11-23 04:37:20 |
Light R-CNN | Light R-CNN | IGD | Peng | Light R-CNN | 2020-01-07 16:05:43 |
Light R-CNN | Light R-CNN | IGD | Light R-CNN | Light R-CNN | 2020-02-06 06:06:46 |
Light R-CNN | Light R-CNN | IGD | Peng | Light R-CNN with 400, 500, 600, 700, 800, 1500 proposals | 2020-01-15 15:17:15
Improving Localization Accuracy for Object Detection | LocNet | ENPC | Spyros Gidaris, Nikos Komodakis | We propose a novel object localization methodology with the purpose of boosting the localization accuracy of state-of-the-art object detection systems. Our model, given a search region, aims at returning the bounding box of an object of interest inside this region. To accomplish its goal, it relies on assigning conditional probabilities to each row and column of this region, where these probabilities provide useful information regarding the location of the boundaries of the object inside the search region and allow the accurate inference of the object bounding box under a simple probabilistic framework. For implementing our localization model, we make use of a convolutional neural network architecture that is properly adapted for this task, called LocNet. We show experimentally that LocNet achieves a very significant improvement on the mAP for high IoU thresholds on the PASCAL VOC2007 test set and that it can be very easily coupled with recent state-of-the-art object detection systems, helping them to boost their performance. (A sketch of the in-out box inference appears after this table.) | 2015-11-06 22:59:43
FRCN with multi-level feature and global context | MCC_FRCN, ResNet101, 07++12 | Harbin Institute of Technology Shenzhen Graduate School | Wang Yuan, You Lei | Our work is based on Faster R-CNN and ResNet101. (1) The low-level features are down-sampled using a convolution layer (stride 2), adjusted to the same size as the high-level features, and then merged for proposal and detection. (2) The context features are extracted from the entire image's feature maps using an ROI pooling layer, and then merged with the region's feature maps. (3) Weighted box voting is applied (a sketch appears after this table). The model is pre-trained on the 1000-class ImageNet classification training set, and fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets only. | 2016-11-21 03:34:12
span | MLANet | HFUT | Jeremy | span | 2020-03-26 23:16:44 |
Multi-task Network Cascades | MNC baseline | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | Our Multi-task Network Cascades (MNCs) is described in arxiv paper "Multi-task Network Cascades for Instance-aware Semantic Segmentation" (http://arxiv.org/abs/1512.04412). The entry is based on MNCs and VGG-16 net. The training data is VOC 2007 trainval, test, and VOC 2012 trainval, augmented with the segmentation annotations from SBD ("Semantic contours from inverse detectors"). The overall runtime is 0.36sec/image on a K40 GPU. | 2015-12-15 14:06:18 |
MONet(VOC+COCO) | MONet(VOC+COCO) | USTC | Tao Gong | This entry is based on MONet and ResNet-101. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale training is applied on our model. Multi-scale testing and horizontal flipping are applied during inference. | 2018-04-01 13:02:52 |
Multi-Region & Semantic Segmentation-Aware CNN | MR_CNN_S_CNN | Universite Paris Est, Ecole des Ponts ParisTech | Spyros Gidaris, Nikos Komodakis | This entry is an implementation of the system described in "Object detection via a multi-region & semantic segmentation-aware CNN model" (http://arxiv.org/abs/1505.01749). The training data used for this entry are: 1) ImageNet for pre-training (of the 16-layers VGG-Net), 2) VOC2012 train set for fine-tuning of the deep models, and 3) VOC2012 train+val for training the detection SVMs. Abstract of "Object detection via a multi-region & semantic segmentation-aware CNN model": "We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our recognition module by integrating it on an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model." | 2015-05-09 23:15:56 |
Multi-Region & Semantic Segmentation-Aware CNN | MR_CNN_S_CNN_MORE_DATA | Universite Paris Est, Ecole des Ponts ParisTech | Spyros Gidaris, Nikos Komodakis | This entry is an implementation of the system described in "Object detection via a multi-region & semantic segmentation-aware CNN model" (http://arxiv.org/abs/1505.01749). The training data used for this entry are: 1) ImageNet for pre-training (of the 16-layers VGG-Net), 2) VOC2007 train+val and VOC2012 train+val sets for fine-tuning the deep models and training the detection SVMs. Abstract of "Object detection via a multi-region & semantic segmentation-aware CNN model": "We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our recognition module by integrating it on an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model." | 2015-06-06 15:49:11 |
Multi-Task Learning for Human Pose Estimation | Metu_Unified_Net | Middle East Technical University | Salih Karagoz, Muhammed Kocabas, Emre Akbas | Multi-task learning for multi-person pose estimation, human semantic segmentation and human detection. The model performs all tasks simultaneously. We trained only with the COCO dataset; no additional data was used. | 2018-03-10 12:39:37
The NIN extension of RCNN | NUS_NIN | NUS | Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan | The entry is based on Ross Girshick's RCNN framework. We employ a single Network in Network [1] as the feature extractor to improve the model's discriminative capability. We follow Girshick's RCNN protocol for training: (1) We used data from ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 trainval; (3) we trained object detector SVMs using 2012 trainval. This entry is used as the baseline for the journal version of [2]. [1] Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. In ICLR 2014. [2] Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. | 2014-10-30 15:47:28
The NIN extension of RCNN | NUS_NIN_c2000 | NUS | Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan | The entry is based on Ross Girshick's RCNN framework. We employ a single Network in Network [1] as the feature extractor to improve the model's discriminative capability. We follow Girshick's RCNN protocol for training: (1) We used data from ILSVRC 2012 + 1000 extra categories of ImageNet to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 trainval; (3) we trained object detector SVMs using 2012 trainval. This entry is used as the baseline for the journal version of [2]. [1] Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. In ICLR 2014. [2] Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. | 2014-10-30 15:45:29
NoC | Networks on Convolutional Feature Maps | Microsoft Research | Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun | This entry is an implementation of the system described in “Object Detection Networks on Convolutional Feature Maps” (http://arxiv.org/abs/1504.06066). We train a “Network on Convolutional feature maps” (NoC) for fast and accurate object detection. Training data for this entry include: (i) ImageNet data for pre-training (VGG-16); (ii) VOC 2007 trainval and 2012 trainval for training the NoC on pooled region features. Selective Search and EdgeBoxes are used for proposal. | 2015-04-17 17:21:10 |
Online Hard Example Mining for Fast R-CNN (VGG16) | OHEM+FRCN, VGG16 | Carnegie Mellon University, Facebook AI Research | Abhinav Shrivastava, Abhinav Gupta, Ross Girshick | We propose an online hard example mining (OHEM) algorithm to train region-based ConvNet detectors. This entry uses OHEM to train the Fast R-CNN (FRCN) object detection system. We use an ImageNet pre-trained VGG16 model and fine-tune it on VOC 2012 trainval dataset. For more details, please refer to 'Training Region-based Object Detectors with Online Hard Example Mining', CVPR 2016 (http://arxiv.org/abs/1604.03540). | 2016-04-18 05:16:35 |
Online Hard Example Mining for Fast R-CNN (VGG16) | OHEM+FRCN, VGG16, VOC+COCO | Carnegie Mellon University, Facebook AI Research | Abhinav Shrivastava, Abhinav Gupta, Ross Girshick | We propose an online hard example mining (OHEM) algorithm to train region-based ConvNet detectors. This entry uses OHEM to train the Fast R-CNN (FRCN) object detection system. We use an ImageNet pre-trained VGG16 model, use OHEM to fine-tune on COCO trainval set and further fine-tune on VOC 2012 trainval, VOC 2007 trainval and VOC 2007 test dataset. For more details, please refer to 'Training Region-based Object Detectors with Online Hard Example Mining', CVPR 2016 (http://arxiv.org/abs/1604.03540). | 2016-04-18 05:18:28 |
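The core idea behind the OHEM entries above fits in a few lines: compute an unreduced loss for every RoI, then backpropagate only through the B highest-loss (hardest) ones. A minimal PyTorch sketch of that selection (the paper additionally runs NMS over RoI losses to avoid picking overlapping regions, omitted here):

```python
import torch
import torch.nn.functional as F

def ohem_cross_entropy(cls_logits, labels, batch_size=128):
    """Online hard example mining over RoI classification losses.

    cls_logits: (num_rois, num_classes) scores for all candidate RoIs.
    labels:     (num_rois,) ground-truth class indices.
    batch_size: number of hard RoIs that actually contribute gradient.
    """
    # Per-RoI loss, unreduced, so RoIs can be ranked by how hard they are.
    per_roi = F.cross_entropy(cls_logits, labels, reduction="none")
    keep = min(batch_size, per_roi.numel())
    # Hardest examples = largest current loss; the rest get zero gradient.
    hard_idx = torch.topk(per_roi, keep).indices
    return per_roi[hard_idx].mean()
```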
PACITYAI Detection | PACITYAIDetection | Ping An International Smart City Technology Co., Ltd. | Zhenxing Zhao | Faster R-CNN (backbone: ResNeXt-101 + DCN + FPN) pre-trained on COCO, with multi-scale training and testing. | 2019-09-26 04:05:36 |
PFPNet512 VGG16 07++12+COCO | PFPNet512 VGG16 07++12+COCO | Korea University | Seung-Wook Kim, Hyong-Keun Kook, Young-Hyun Kim, Ji-Young Sun, Sang-Won Lee, and Sung-Jea Ko | Our network model constructs a feature-pyramid along the network width via the spatial pyramid pooling (SPP) network. Different from object detectors using a feature pyramid across the network height, the feature-maps in the proposed feature pyramid are abstracted in parallel, and thus the detection performance on small-sized objects can be improved. The base network of our model is VGG-16 pretrained on the 1,000-class ImageNet classification training set. From this, the model is fine-tuned on the MS COCO trainval35k set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. | 2017-10-18 14:01:43 |
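The spatial pyramid pooling this entry builds on pools a feature map over grids of several fixed sizes and concatenates the results, yielding a fixed-length descriptor regardless of input resolution. A minimal PyTorch sketch of SPP itself (not of the PFPNet architecture):

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Pool a (N, C, H, W) feature map over 1x1, 2x2 and 4x4 grids and
    concatenate, giving a fixed-length vector independent of H and W."""
    n, c = feat.shape[:2]
    pooled = [F.adaptive_max_pool2d(feat, lv).view(n, -1) for lv in levels]
    return torch.cat(pooled, dim=1)  # (N, C * sum(lv * lv for lv in levels))
```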
PFPNet512_ECCV | PFPNet512_ECCV | Korea Univ. | Seung-Wook Kim | Trained on VOC07++12+COCO; multi-scale testing with confidence threshold 0.01. | 2018-03-22 09:35:27 |
PLN | PLN | XXXX | Kaibing Chen, Xinggang Wang, Zilong Huang | Point Linking Network, trained only on the PASCAL VOC 07++12 dataset. | 2017-03-27 07:53:57 |
PSSNet(VOC+COCO) | PSSNet(VOC+COCO) | USTC | Tao Gong | This entry is based on PSSNet and ResNet-101. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale training is applied on our model. Multi-scale testing and horizontal flipping are applied during inference. | 2018-03-30 15:40:25 |
Faster R-CNN with PVANet (VOC+COCO) | PVANet+ | Intel Imaging and Camera Technology | Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park | Based on Faster R-CNN with a network designed from scratch. The network is designed for efficiency and it takes less than 50 ms including proposal generation and detection (tested with 200 proposals on Titan X). The network is pre-trained with the ImageNet classification training set and fine-tuned with VOC2007/2012/MSCOCO trainval sets and VOC2007 test set. Only single-scale images are used while testing. Please refer to “PVANet: Lightweight Deep Neural Networks for Real-time Object Detection” (https://arxiv.org/abs/1611.08588) and https://github.com/sanghoon/pva-faster-rcnn for more details. | 2016-10-26 09:25:07 |
Faster R-CNN with PVANet (VOC+COCO) | PVANet+ (compressed) | Intel Imaging and Camera Technology | Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park | Based on Faster R-CNN with a network designed from scratch. The network is designed for efficiency and it takes only 32 ms (30 fps) including proposal generation and detection (tested with 200 proposals on Titan X). The network is pre-trained with the ImageNet classification training set and fine-tuned with VOC2007/2012/MSCOCO trainval sets and VOC2007 test set. Only single-scale images are used while testing. Please refer to “PVANet: Lightweight Deep Neural Networks for Real-time Object Detection” (https://arxiv.org/abs/1611.08588) and https://github.com/sanghoon/pva-faster-rcnn for more details. | 2016-11-18 07:05:29 |
Region-based CNN | R-CNN | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524 version 5). Code is available at http://www.cs.berkeley.edu/~rbg/. Training data: (1) We used ILSVRC 2012 to pre-train the ConvNet (using caffe) (2) We fine-tuned the resulting ConvNet using 2012 trainval (3) We trained object detector SVMs using 2012 trainval. The same detection SVMs were used for the 2012 and 2010 results. For this submission, we used the 16-layer ConvNet from Simonyan & Zisserman instead of Krizhevsky et al.'s ConvNet. | 2014-10-25 21:09:52 |
Regions with Convolutional Neural Network Features | R-CNN | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524). We made two small changes relative to the arXiv tech report that are responsible for improved performance: (1) we added a small amount of context around each region proposal (16px at the warped size) and (2) we used a higher learning rate while fine-tuning (starting at 0.001). Aside from non-maximum suppression, no additional post-processing (e.g., detector or image classification context) was applied. Code will be made available soon at http://www.cs.berkeley.edu/~rbg/. Training data: (1) We used ILSVRC 2012 to pre-train the ConvNet (using caffe) (2) We fine-tuned the resulting ConvNet using 2012 train (3) We trained object detector SVMs using 2012 train+val. The same detection SVMs were used for the 2012 and 2010 results. | 2014-01-30 01:46:58 |
Regions with Convolutional Neural Network Features | R-CNN (bbox reg) | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524). We made two small changes relative to the arXiv tech report that are responsible for improved performance: (1) we added a small amount of context around each region proposal (16px at the warped size) and (2) we used a higher learning rate while fine-tuning (starting at 0.001). Aside from non-maximum suppression, no additional post-processing (e.g., detector or image classification context) was applied. Code will be made available soon at http://www.cs.berkeley.edu/~rbg/. Training data: (1) We used ILSVRC 2012 to pre-train the ConvNet (using caffe) (2) We fine-tuned the resulting ConvNet using 2012 train (3) We trained object detector SVMs using 2012 train+val. The same detection SVMs were used for the 2012 and 2010 results. This submission includes a simple regression from pool5 features to bounding box coordinates. | 2014-03-13 18:08:18 |
Region-based CNN | R-CNN (bbox reg) | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524 version 5). Code is available at http://www.cs.berkeley.edu/~rbg/. Training data: (1) We used ILSVRC 2012 to pre-train the ConvNet (using caffe) (2) We fine-tuned the resulting ConvNet using 2012 trainval (3) We trained object detector SVMs using 2012 trainval. The same detection SVMs were used for the 2012 and 2010 results. For this submission, we used the 16-layer ConvNet from Simonyan & Zisserman instead of Krizhevsky et al.'s ConvNet. | 2014-10-26 03:29:27 |
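The bounding-box regression used in these R-CNN entries predicts four normalized offsets from a proposal to its ground-truth box; the parameterization below follows the R-CNN paper. A minimal NumPy sketch of the target encoding and its inverse:

```python
import numpy as np

def encode(prop, gt):
    """R-CNN regression targets from proposal to ground truth (x1, y1, x2, y2)."""
    pw, ph = prop[2] - prop[0], prop[3] - prop[1]
    px, py = prop[0] + 0.5 * pw, prop[1] + 0.5 * ph
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)])

def decode(prop, t):
    """Apply predicted offsets t = (tx, ty, tw, th) to a proposal box."""
    pw, ph = prop[2] - prop[0], prop[3] - prop[1]
    px, py = prop[0] + 0.5 * pw, prop[1] + 0.5 * ph
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * np.exp(t[2]), ph * np.exp(t[3])
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])
```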
R-DAD (VOC07++12) | R-DAD (VOC07++12) | Incheon National University (INU) | Seung-Hwan Bae | We only use the VOC dataset for training (without using the COCO dataset). We use our region decomposition and assembly detector (R-DAD) based on ResNet152 for this evaluation. | 2018-03-06 01:15:31 |
R-FCN, ResNet (VOC+COCO) | R-FCN, ResNet (VOC+COCO) | Microsoft Research | Haozhi Qi*, Yi Li*, Jifeng Dai* (* equal contribution) | This entry is based on R-FCN [a] and ResNet-101. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. OHEM and multi-scale training are applied on our model. Multi-scale testing and horizontal flipping are applied during inference. [a] “R-FCN: Object Detection via Region-based Fully Convolutional Networks”, Jifeng Dai, Yi Li, Kaiming He, Jian Sun (http://arxiv.org/abs/1605.06409). | 2016-10-09 08:33:08 |
R-FCN, ResNet Ensemble(VOC+COCO) | R-FCN, ResNet Ensemble(VOC+COCO) | Microsoft Research | Haozhi Qi*, Yi Li*, Jifeng Dai* (* equal contribution) | This entry is based on R-FCN [a] and ResNet models. We utilize an ensemble of R-FCN models pre-trained on 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. OHEM and multi-scale training are applied on our model. Multi-scale testing and horizontal flipping are applied during inference. [a] “R-FCN: Object Detection via Region-based Fully Convolutional Networks”, Jifeng Dai, Yi Li, Kaiming He, Jian Sun (http://arxiv.org/abs/1605.06409). | 2016-10-09 08:45:02 |
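R-FCN's defining component is position-sensitive RoI pooling: the RoI is divided into a k x k grid, and each grid cell pools only from its own dedicated group of score-map channels, so almost no per-RoI computation remains. A simplified NumPy sketch for one RoI and one class:

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k=3):
    """Position-sensitive RoI pooling for a single RoI and class.

    score_maps: (k*k, H, W) maps; channel i is specialized for grid cell i.
    roi:        (x1, y1, x2, y2) in feature-map coordinates.
    Returns the class score: the mean of the k*k pooled cell responses.
    """
    x1, y1, x2, y2 = roi
    cell_w, cell_h = (x2 - x1) / k, (y2 - y1) / k
    votes = []
    for gy in range(k):
        for gx in range(k):
            xa, xb = int(x1 + gx * cell_w), int(np.ceil(x1 + (gx + 1) * cell_w))
            ya, yb = int(y1 + gy * cell_h), int(np.ceil(y1 + (gy + 1) * cell_h))
            cell = score_maps[gy * k + gx, ya:yb, xa:xb]  # only "its" channel
            votes.append(cell.mean() if cell.size else 0.0)
    return float(np.mean(votes))
```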
R4D_faster_rcnn | R4D_faster_rcnn | Tsinghua University | Zeming Li, Gang Yu | R4D_faster_rcnn | 2016-11-20 00:54:51 |
RFCN_DCN | RFCN_DCN | XXX | tester | RFCN_DCN | 2017-06-27 12:55:51 |
Region Proposal Network | RPN | Microsoft Research | Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun | This entry is an implementation of the system described in "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" (arXiv 2015). An ImageNet-pre-trained model (VGG-16) is used for training a Region Proposal Network (RPN) and a Fast R-CNN detector. The training data is VOC 2007 trainval+test and VOC 2012 trainval. The entire system takes <200 ms per image, including proposal and detection. | 2015-06-01 10:29:23 |
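An RPN slides over the convolutional feature map and, at every location, scores and regresses a fixed set of reference anchors of several scales and aspect ratios. A minimal NumPy sketch of the anchor grid (scales and ratios as in the Faster R-CNN paper):

```python
import numpy as np

def anchor_grid(feat_h, feat_w, stride=16, scales=(128, 256, 512),
                ratios=(0.5, 1.0, 2.0)):
    """All anchors (x1, y1, x2, y2) for a feat_h x feat_w feature map.
    Each ratio r is width:height; every anchor keeps area roughly s*s."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # anchor center
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)  # (feat_h * feat_w * 9, 4)
```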
RUN300_3WAY, VGG16, 07++12 | RUN300_3WAY, VGG16, 07++12 | Seoul National University | Kyoungmin Lee, Jaeseok Choi, Jisoo Jeong, Nojun Kwak | We focused on solving a structural contradiction and enhancing the contextual information of the multi-scale feature maps. We propose a network, based on SSD, that uses ResBlocks and deconvolution layers to enrich the representation power of the feature maps. In addition, a unified prediction module is applied to generalize the output results. Inference takes 15.6 ms on a Titan X Pascal GPU, which indicates that it maintains the fast-computation advantage of a single-stage detector. (https://arxiv.org/abs/1707.05031) | 2017-09-26 04:26:07 |
RUN_3WAY_300, VGG16, 07++12+COCO | RUN_3WAY_300, VGG16, 07++12+COCO | Seoul National University | Kyoungmin Lee, Jaeseok Choi, Jisoo Jeong, Nojun Kwak | We fine-tuned RUN 3WAY model trained using VGG16 on MS COCO. (https://arxiv.org/abs/1707.05031) | 2017-10-13 03:17:59 |
RUN_3WAY_512, VGG16, 07++12 | RUN_3WAY_512, VGG16, 07++12 | Seoul National University | Kyoungmin Lee, Jaeseok Choi, Jisoo Jeong, Nojun Kwak | We focused on solving a structural contradiction and enhancing the contextual information of the multi-scale feature maps. We propose a network, based on SSD, that uses ResBlocks and deconvolution layers to enrich the representation power of the feature maps. In addition, a unified prediction module is applied to generalize the output results. Inference takes 15.6 ms on a Titan X Pascal GPU, which indicates that it maintains the fast-computation advantage of a single-stage detector. (https://arxiv.org/abs/1707.05031) | 2017-10-22 04:10:01 |
Rank of experts (VOC07++12) | Rank of experts (VOC07++12) | Incheon National University (INU) and Electronics and Telecommunications Research Institute (ETRI) | Seung-Hwan Bae (INU), Youngjoo Jo (ETRI) and Youngwan Lee (ETRI) | We only use the VOC dataset for training (without using the COCO dataset). We train three types of convolutional detectors for this challenge: (1) Faster R-CNN type 1: we use the pre-trained ResNet-101/152/269 models as CLS-Net, then add region proposal networks to the CLS-Net. (2) Faster R-CNN type 2: we apply a resizing method with bilinear interpolation on the ResNet-152 model instead of RoI pooling; the method is also used to build a new hyper-feature layer. (3) SSD type: we use DSSD with VGGNet and SSD with a WR-Inception network. Network ensemble: to combine the results, we merged the detection results of the models with our Rank of Experts algorithm, after which soft-NMS was performed. | 2017-11-15 14:31:16 |
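The soft-NMS step mentioned above, unlike hard NMS, decays the scores of overlapping boxes instead of discarding them outright. A minimal NumPy sketch of the Gaussian variant (Bodla et al., 2017):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: down-weight, rather than drop, overlapping boxes."""
    boxes, scores = boxes.copy(), scores.copy()
    keep = []
    while scores.size:
        i = scores.argmax()
        keep.append((boxes[i], scores[i]))
        mask = np.arange(scores.size) != i
        boxes, scores = boxes[mask], scores[mask]
        if scores.size:
            # Decay scores of the remaining boxes by their overlap with the
            # box just kept; heavily overlapping boxes fade toward zero.
            scores = scores * np.exp(-iou(keep[-1][0], boxes) ** 2 / sigma)
            alive = scores > score_thresh
            boxes, scores = boxes[alive], scores[alive]
    return keep
```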
Single-Shot Refinement Neural Network | RefineDet (VOC+COCO,single model,VGG16,one-stage) | CASIA | Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li | We propose a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains comparable efficiency of one-stage methods. RefineDet consists of two inter-connected modules, namely, the anchor refinement module and the object detection module. Specifically, the former aims to (1) filter out negative anchors to reduce search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors as the input from the former to further improve the regression and predict multi-class label. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network in an end-to-end way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at https://github.com/sfzhang15/RefineDet. | 2018-03-16 05:52:32 |
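The two-module cascade is the heart of RefineDet: the ARM filters easy negatives and coarsely adjusts each anchor, and the ODM then classifies and regresses from the refined anchor rather than the original one. A minimal sketch of that control flow, assuming hypothetical `arm`, `odm`, and `apply_deltas` callables (none of these names come from the authors' code):

```python
def refinedet_head(anchors, arm, odm, apply_deltas, neg_thresh=0.99):
    """Two-step (ARM -> ODM) detection, as described for RefineDet.

    arm(anchors) -> (objectness, coarse_deltas)   # anchor refinement module
    odm(refined) -> (class_scores, fine_deltas)   # object detection module
    apply_deltas(boxes, deltas) -> refined boxes  # standard box decoding
    All three callables are assumed, not defined here.
    """
    obj, coarse = arm(anchors)
    refined = apply_deltas(anchors, coarse)   # step 1: coarse adjustment
    scores, fine = odm(refined)
    final = apply_deltas(refined, fine)       # step 2: regress from refined anchors
    # Negative anchor filtering: anchors the ARM is almost certain are
    # background (negative confidence > neg_thresh) are discarded.
    keep = [i for i, o in enumerate(obj) if (1.0 - o) <= neg_thresh]
    return [(final[i], scores[i]) for i in keep]
```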
Res101+FasterRCNN | Res101+FasterRCNN(COCO+0712trainval) | Meitu | Kang Yang | We use ResNet-101 + Faster R-CNN, trained on COCO, fine-tuned on voc_2007_trainval + voc_2012_trainval, and tested on voc_2012_test. | 2017-02-05 03:16:39 |
Res101+hyper+FasterRCNN(COCO+0712trainval) | Res101+hyper+FasterRCNN(COCO+0712trainval) | Meitu | Kang Yang | I use Res101+hyper+FasterRCNN(COCO+0712trainval) | 2017-02-10 03:03:50 |
SDS | SDS | UC Berkeley | Bharath Hariharan Pablo Arbelaez Ross Girshick Jitendra Malik | We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work. | 2014-07-21 22:46:22 |
cascaded MIL for WSOD | SGCM | Institute of Computing Technology, Chinese Academy of Sciences | Yan Gao, Boxiao Liu | SGCM is a segmentation-guided cascaded MIL method for weakly supervised object detection, which uses a cascaded MIL architecture to detect more complete objects. | 2019-03-09 09:03:43 |
Modified_FasterRCNN_v2 | SHS_Faster_RCNN_Upgrade_v2 | SHS | zhg peng | A modified Faster R-CNN is used as the backbone, and multi-scale voting is applied. The model is pre-trained on the ImageNet-1K dataset, fine-tuned on the COCO detection dataset, and finally fine-tuned on the VOC 07+12 datasets. | 2019-02-25 23:58:50 |
feature refinement SSD | SSD based method | DLUT | Novak | We propose to use RoI Align to extract proposal features in SSD. | 2018-10-24 02:25:54 |
SSD300 | SSD300 VGG16 07++12 | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We train SSD model using VGG16 on 300 x 300 input image. The training data is VOC07 trainval + test and VOC12 trainval. The inference speed is 59 FPS on Titan X with batch size 8 or 46 FPS with batch size 1. We only test a single model on a single scale image (300x300), and don't have any post-processing steps. Check out our code and more details at: https://github.com/weiliu89/caffe/tree/ssd | 2016-10-18 17:53:04 |
SSD300ft | SSD300 VGG16 07++12+COCO | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We first train SSD300 model using VGG16 on MS COCO trainval35k, then fine-tune it on VOC07 trainval + test and VOC12 trainval for the 20 PASCAL classes. | 2016-10-03 07:08:37 |
SSD512 | SSD512 VGG16 07++12 | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We train SSD model using VGG16 on 512 x 512 input image. The training data is VOC07 trainval + test and VOC12 trainval. The inference speed is 22 FPS on Titan X with batch size 8 or 19 FPS with batch size 1. We only test a single model on a single scale image (512x512), and don't have any post-processing steps. Check out our code and more details at: https://github.com/weiliu89/caffe/tree/ssd | 2016-10-13 17:28:35 |
SSD512ft | SSD512 VGG16 07++12+COCO | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We first train SSD512 model using VGG16 on MS COCO trainval35k, then fine-tune it on VOC07 trainval + test and VOC12 trainval for the 20 PASCAL classes. We only test a single model on a single scale image (512x512), and don't have any post-processing steps. | 2016-10-10 19:35:42 |
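Across these SSD entries, default (prior) boxes are tiled over several feature maps, with the box scale for the k-th map interpolated linearly between a minimum and maximum scale, as in the SSD paper. A minimal NumPy sketch of that scale schedule:

```python
import numpy as np

def ssd_scales(num_maps=6, s_min=0.2, s_max=0.9):
    """Default-box scale for each of the m feature maps (SSD paper):
    s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), for k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (num_maps - 1)
            for k in range(1, num_maps + 1)]

def default_boxes_wh(scale, ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """(w, h) of the default boxes at one scale, one per aspect ratio."""
    return [(scale * np.sqrt(r), scale / np.sqrt(r)) for r in ratios]
```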
GCFE_RCNN | Sogou_MM_GCFE_RCNN(ensemble model) | Sogou Inc | Hongyuan Zhang, Bin Li | We proposed a "Global Concatenating Feature Enhancement" network for instance segmentation: 1) our model is pre-trained on ImageNet and fine-tuned on MS COCO; 2) then fine-tuned on PASCAL VOC; 3) ResNeXt-152 with FPN is used as our backbone; 4) we also use a multi-scale training strategy. | 2018-09-25 03:44:24 |
GCFE_RCNN | Sogou_MM_GCFE_RCNN(single model) | Sogou Inc | Hongyuan Zhang, Bin Li | We proposed a "Global Concatenating Feature Enhancement" network for instance segmentation: 1) our model is pre-trained on ImageNet and fine-tuned on MS COCO; 2) then fine-tuned on PASCAL VOC; 3) ResNeXt-152 with FPN is used as our backbone; 4) we also use a multi-scale training strategy. | 2018-09-25 03:43:13 |
Fine-grained search using R-CNN with StructObj | UMICH_FGS_STRUCT | University of Michigan & Zhejiang University | Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee | We performed the Bayesian optimization based fine-grained search (FGS) using the R-CNN detector trained with structured objective: (1) We used the 16-layer network pretrained by VGG group. (2) We finetuned the network with softmax classifier using VOC2012 detection trainval set. (3) Structured SVMs are trained using VOC2012 trainval as object detector. (4) FGS is applied based on the R-CNN initial solutions. (5) Bounding box regression is adopted. Please refer to this paper for details: Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee, “Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction”, CVPR 2015. | 2015-06-20 21:39:43 |
VIM_SSD(COCO+07++12, single model) | VIM_SSD | VimicroAI | Min Yang, Guo Ai, YunDong Zhang | This entry is based on SSD and VGG16. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale testing and horizontal flipping are applied during inference. | 2018-05-11 10:10:59 |
VIM_SSD | VIM_SSD(COCO+07++12, single model, one-stage) | VimicroAI | Min Yang, Guo Ai, YunDong Zhang | This entry is based on SSD and VGG16. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale testing and horizontal flipping are applied during inference. | 2018-06-27 14:09:40 |
Improve the detection performance of WSOD with edge information | WSODE | Jiangnan University | Wenlong Gao, Ying Chen, Yong Peng | Improve WSOD with edge information. | 2020-12-17 14:26:31 |
detection | WithoutFR_CEP | zzu | suhuqi | detection | 2021-09-23 06:25:21 |
You Only Look Once: Unified, Real-Time Detection | YOLO | University of Washington | Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi | We train a convolutional neural network to perform end-to-end object detection. Our network processes the full image and outputs multiple bounding boxes and class probabilities. At test time we process images in real-time at 45fps. For more information and example code see: http://pjreddie.com/darknet/yolo/ | 2015-11-06 07:36:38 |
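YOLO's output is a single S x S x (B*5 + C) tensor: each grid cell predicts B boxes with a confidence score plus one set of class probabilities, and final class-specific scores are confidence times class probability. A minimal NumPy sketch of decoding that tensor (S=7, B=2, C=20 as in the paper; the box-first channel layout below is an assumption, as implementations vary):

```python
import numpy as np

def decode_yolo(pred, S=7, B=2, C=20, conf_thresh=0.2):
    """Decode a YOLOv1-style output tensor of shape (S, S, B*5 + C).

    Per cell: B boxes of (x, y, w, h, confidence) followed by C class
    probabilities; x, y are offsets within the cell, w, h are relative
    to the whole image. Returns (box, class_id, score) detections.
    """
    dets = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                scores = conf * class_probs        # class-specific confidence
                cls = int(np.argmax(scores))
                if scores[cls] < conf_thresh:
                    continue
                # Convert cell-relative center to image-relative coordinates.
                cx, cy = (col + x) / S, (row + y) / S
                dets.append(((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2),
                             cls, float(scores[cls])))
    return dets
```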
YOLOv1 | YOLOv1 | Jiangxi University of Science and Technology | lijiajun | Official YOLOv1 from pjreddie. | 2021-09-16 10:03:33 |
YOLOv1-resnet-18-50 | YOLOv1-resnet-18-50 | personal | Haoyun Qin | Reimplementation of YOLOv1 with tricks applied; switched the backbone to resnet18-cmp3 and resnet50-cmp4. | 2022-05-13 12:24:19 |
YOLOv2 | YOLOv2 | University of Washington | Joe Redmon, Ali Farhadi | We use a variety of tricks to increase the performance of YOLO including dimension cluster priors and multi-scale training. Details at https://pjreddie.com/yolo/ | 2017-02-23 16:37:58 |
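The "dimension cluster priors" mentioned in these YOLOv2 entries come from running k-means on the training-set box shapes with an IoU-based distance, d(box, centroid) = 1 - IoU, so the anchor shapes match the data. A minimal NumPy sketch:

```python
import numpy as np

def wh_iou(wh, centroids):
    """IoU between (w, h) pairs and centroids when boxes are center-aligned."""
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    union = wh[:, 0] * wh[:, 1]
    union = union[:, None] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def dimension_clusters(wh, k=5, iters=100, seed=0):
    """k-means over box (w, h) with distance d = 1 - IoU (YOLOv2)."""
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(wh_iou(wh, centroids), axis=1)  # min d = max IoU
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = wh[assign == j].mean(axis=0)
    return centroids
```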
YOLOv2 (VOC + COCO) | YOLOv2 (VOC + COCO) | University of Washington | Joseph Redmon, Ali Farhadi | We use a variety of tricks to increase the performance of YOLO including dimension cluster priors and multi-scale training. Details at https://pjreddie.com/yolo/ | 2017-10-21 18:07:57 |
YOLOv2-resnet-18-101 | YOLOv2-resnet-18-101 | personal | Haoyun Qin | Reimplementation of YOLOv2 using PyTorch and ResNet. | 2022-05-18 10:34:21 |
COCO+VOC | fasterRCNN+COCO+VOC+MCC | none | fasterRCNN+COCO+VOC+MCC | fasterRCNN+COCO+VOC+MCC | 2017-07-23 13:54:24 |
innovisgroup | innovisgroup Faster R-CNN | innovisgroup | yanjichen | This network is based on Faster R-CNN. | 2018-05-22 14:56:57 |
CNN with Segmentation and Context Cues | segDeepM | University of Toronto | Yukun Zhu, Ruslan Salakhutdinov, Raquel Urtasun, Sanja Fidler | segDeepM on PASCAL2012, w/ bounding box regression | 2016-03-04 19:28:43 |
shufflenetv2_yolov3 | shufflenetv2_yolov3 | PQLabs | Xiuyang Lei | Optimized YOLOv3 with an adjusted ShuffleNetV2 backbone, trained on 07++12 data with the backbone pre-trained on ImageNet. The whole model requires only 3.0 BFLOPs. | 2020-02-25 06:25:18 |
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 17499 training iterations, with attenuation coefficient 0.9 and compensation coefficient 1.1. | 2021-12-20 08:25:57 |
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 24999 training iterations, with attenuation coefficient 0.9 and compensation coefficient 1.1. | 2021-12-20 02:45:31 |
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 19999 training iterations, with attenuation coefficient 0.9 and compensation coefficient 1.1. | 2021-12-20 02:37:55 |
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 17499 training iterations. | 2021-12-15 07:53:33 |
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 24999 training iterations. | 2021-12-18 04:06:21 |
tencent_retail_ft:DET | tencent_retail_ft:DET | tencent_retail_ft | XingXing Wang | Multi-scale testing and multi-scale training, using the VOC2007 + VOC2012 + MS COCO datasets. We first train the model on MS COCO, then fine-tune on the VOC2007 and VOC2012 datasets, with ResNet-152 as the backbone, feature-map fusion, focal loss, and so on. | 2019-01-21 15:43:45 |
CloudMinds CV&AR Detection | CM-CV&AR: DET | CloudMinds | Xiaoya Zhu, Yibing Nan, Wenqi Wang | CMDET is pre-trained on the ImageNet dataset and fine-tuned on the MS COCO detection dataset. We use ResNeXt-101 as the backbone network and adopt deformable convolution in its last stage. Multi-scale + random-flip techniques are used during training: in each iteration, the scale of the short edge is randomly sampled from [400, 1400] while the scale of the long edge is fixed at 1600. In the testing phase, multi-scale techniques are used, and we use NMS to combine the results from different scales. | 2019-08-20 10:47:35 |
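Merging multi-scale test results as described here is typically done by mapping each scale's detections back to original image coordinates, concatenating them, and running a single NMS pass over the pooled set. A minimal NumPy sketch of the greedy NMS used for that merge:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over (x1, y1, x2, y2) boxes pooled from all test scales."""
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        areas = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (areas(boxes[[i]])[0] + areas(boxes[rest]) - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping the kept one
    return keep
```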
Feature Edit with CNN features | Feature Edit | Fudan University | Zhiqiang Shen, Xiangyang Xue et al. | We edit the 5th-layer CNN features of the network defined by Krizhevsky et al. (2012), then add the new features to the original feature set. Two stages are used to find the variables to inhibit: step one finds the largest variance of a subset within a class, and step two finds the variables with the smallest inter-class variance. This editing operation handles the separation of different properties. A boosted linear SVM classifies the proposal regions, and bounding-box regression is also employed to reduce localization errors. | 2014-09-06 15:58:29 |
Deep poselets | Poselets2 | Fei Yang, Rob Fergus | Poselets trained with a CNN. We ran the original poselets on a large set of images, collected weakly labelled training data, trained a convolutional neural network, and applied it to the test data. This method allows training deep poselets without the need for lots of manual keypoint annotations. | 2014-06-06 14:02:45 |