PASCAL VOC Challenge performance evaluation and download server
Method | mean | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor | submission date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NAS Yolo [?] | 86.5 | 92.9 | 92.7 | 88.4 | 78.0 | 78.1 | 90.8 | 89.7 | 94.5 | 74.3 | 92.8 | 71.9 | 93.2 | 94.5 | 92.9 | 92.3 | 67.0 | 92.1 | 77.7 | 92.4 | 84.9 | 09-May-2020 | |
Conical R-CNN [?] | 85.8 | 92.9 | 91.1 | 85.5 | 79.5 | 75.6 | 87.0 | 88.7 | 95.3 | 71.2 | 89.8 | 72.8 | 94.4 | 93.1 | 92.8 | 92.2 | 71.0 | 90.7 | 78.5 | 92.1 | 82.2 | 29-Oct-2020 | |
RTPnet [?] | 84.4 | 92.0 | 89.6 | 86.8 | 75.3 | 74.0 | 87.1 | 88.5 | 95.6 | 67.3 | 90.4 | 68.1 | 94.4 | 91.8 | 91.8 | 91.6 | 69.3 | 90.5 | 73.7 | 90.7 | 79.9 | 23-Feb-2022 | |
BOE_IOT_AIBD_method_improved [?] | 83.8 | 90.4 | 90.0 | 82.8 | 77.4 | 76.8 | 89.5 | 85.9 | 93.3 | 73.0 | 86.7 | 68.4 | 92.7 | 92.5 | 90.6 | 90.3 | 69.1 | 84.1 | 73.3 | 90.3 | 78.9 | 27-Nov-2019 | |
Improved yolo-v3 [?] | 83.7 | 91.8 | 89.3 | 86.3 | 73.9 | 71.1 | 87.1 | 88.0 | 95.1 | 68.7 | 88.6 | 68.7 | 93.2 | 91.0 | 90.9 | 89.9 | 63.2 | 89.8 | 74.3 | 90.2 | 83.3 | 15-Nov-2019 | |
Model_ori_1 [?] | 83.3 | 92.3 | 89.6 | 85.0 | 76.3 | 78.1 | 86.2 | 89.0 | 91.1 | 68.5 | 86.3 | 66.3 | 91.0 | 91.5 | 90.2 | 91.6 | 67.0 | 86.7 | 71.2 | 88.7 | 78.5 | 28-Oct-2021 | |
Stronger-yolo [?] | 83.3 | 91.9 | 89.1 | 82.5 | 75.2 | 72.9 | 87.3 | 87.8 | 91.0 | 71.3 | 85.1 | 70.0 | 90.0 | 90.8 | 90.3 | 91.4 | 67.5 | 86.4 | 74.6 | 89.9 | 81.5 | 12-Jun-2019 | |
SSOD_07_12_unlabel_07_12 [?] | 82.6 | 91.0 | 88.8 | 84.2 | 71.8 | 71.4 | 87.0 | 88.0 | 94.0 | 65.7 | 86.6 | 66.8 | 93.0 | 90.4 | 90.8 | 90.3 | 63.2 | 88.2 | 72.7 | 90.5 | 78.2 | 22-Apr-2021 | |
FCASA-detection [?] | 82.4 | 90.9 | 87.2 | 83.8 | 72.3 | 72.0 | 86.3 | 87.7 | 90.2 | 69.8 | 85.1 | 71.2 | 89.7 | 90.0 | 89.3 | 90.6 | 61.1 | 85.3 | 75.1 | 89.5 | 80.1 | 05-Aug-2019 | |
DOLO [?] | 81.3 | 91.7 | 87.3 | 83.1 | 69.1 | 71.1 | 85.7 | 86.6 | 93.4 | 64.4 | 85.5 | 65.9 | 92.2 | 88.5 | 89.0 | 88.7 | 61.0 | 86.0 | 71.0 | 87.4 | 77.4 | 21-Sep-2018 | |
COS-DET [?] | 81.3 | 91.7 | 87.2 | 82.1 | 71.6 | 68.6 | 86.9 | 85.3 | 93.1 | 63.8 | 86.8 | 66.0 | 92.0 | 90.4 | 88.4 | 88.8 | 61.2 | 86.8 | 73.8 | 88.1 | 73.7 | 26-Apr-2019 | |
ASSD513 [?] | 81.3 | 92.1 | 89.2 | 82.5 | 71.5 | 60.4 | 85.5 | 84.8 | 93.9 | 63.7 | 88.6 | 67.4 | 92.6 | 90.2 | 89.0 | 86.5 | 60.4 | 88.2 | 73.4 | 88.6 | 77.0 | 18-Aug-2018 | |
FastX-RCNN [?] | 81.1 | 89.7 | 86.4 | 84.1 | 70.9 | 73.1 | 84.6 | 85.5 | 94.3 | 64.7 | 85.3 | 62.2 | 93.4 | 90.2 | 88.8 | 89.9 | 62.1 | 83.8 | 71.2 | 88.7 | 73.9 | 06-Jul-2018 | |
SSOD_25 [?] | 81.0 | 90.5 | 87.8 | 81.9 | 69.7 | 69.4 | 86.6 | 87.1 | 93.1 | 63.6 | 86.0 | 65.9 | 91.7 | 88.2 | 88.5 | 89.4 | 59.8 | 85.3 | 70.8 | 89.0 | 75.8 | 12-Apr-2021 | |
DFL-Net [?] | 80.2 | 91.4 | 88.5 | 80.6 | 67.3 | 58.0 | 86.1 | 84.2 | 94.5 | 64.9 | 85.6 | 62.0 | 92.8 | 89.1 | 89.6 | 86.1 | 59.1 | 85.5 | 75.4 | 87.4 | 76.2 | 22-Jun-2020 | |
RockDetector-1 [?] | 79.9 | 88.6 | 85.6 | 81.7 | 69.3 | 64.0 | 82.0 | 80.9 | 94.2 | 64.1 | 84.3 | 65.9 | 93.7 | 88.8 | 86.6 | 87.2 | 61.6 | 83.5 | 72.9 | 88.1 | 75.4 | 08-Nov-2019 | |
SSOD_100 [?] | 78.6 | 89.4 | 85.3 | 80.3 | 64.4 | 66.8 | 84.2 | 85.9 | 92.8 | 60.3 | 82.8 | 62.2 | 89.3 | 86.3 | 88.7 | 88.2 | 56.9 | 82.2 | 69.8 | 84.2 | 71.6 | 12-Apr-2021 | |
FMFPD [?] | 78.0 | 88.1 | 84.9 | 82.8 | 64.8 | 62.8 | 82.2 | 82.2 | 94.1 | 59.7 | 81.0 | 59.8 | 92.9 | 86.2 | 83.2 | 86.4 | 57.1 | 83.1 | 68.5 | 84.5 | 74.8 | 19-May-2020 | |
SSOD_25_real [?] | 77.8 | 88.3 | 86.5 | 80.6 | 66.3 | 65.4 | 83.7 | 84.5 | 91.8 | 57.5 | 81.2 | 61.6 | 91.2 | 84.4 | 86.5 | 87.3 | 53.5 | 82.3 | 66.8 | 87.1 | 69.0 | 22-Apr-2021 | |
refine_denseSSD [?] | 77.5 | 89.8 | 85.8 | 77.0 | 64.4 | 56.7 | 83.7 | 81.8 | 92.1 | 60.9 | 83.8 | 63.2 | 89.6 | 85.9 | 88.1 | 85.3 | 54.7 | 82.3 | 64.6 | 88.2 | 72.4 | 14-May-2018 | |
FPNSSD [?] | 77.0 | 90.3 | 78.8 | 81.7 | 67.1 | 53.4 | 79.5 | 80.5 | 93.8 | 59.9 | 85.8 | 61.8 | 92.5 | 81.7 | 84.1 | 80.8 | 56.1 | 84.8 | 69.2 | 87.4 | 71.2 | 29-Mar-2018 | |
TCnet [?] | 76.6 | 86.6 | 83.1 | 78.5 | 65.6 | 61.1 | 80.8 | 80.3 | 91.7 | 56.3 | 80.1 | 61.8 | 90.5 | 86.1 | 84.0 | 83.4 | 56.6 | 79.7 | 70.0 | 84.5 | 71.9 | 02-May-2018 | |
TCnet [?] | 76.5 | 86.8 | 82.7 | 78.5 | 65.3 | 60.2 | 79.6 | 80.0 | 91.0 | 56.9 | 80.9 | 61.3 | 90.2 | 86.8 | 84.2 | 83.1 | 55.4 | 80.3 | 70.0 | 84.7 | 71.7 | 29-Mar-2018 | |
ASSD321 [?] | 76.4 | 89.6 | 84.3 | 76.7 | 64.5 | 49.3 | 81.7 | 77.0 | 92.2 | 57.8 | 81.3 | 64.0 | 91.6 | 86.5 | 85.8 | 82.1 | 53.0 | 80.0 | 70.9 | 87.2 | 71.8 | 20-Aug-2018 | |
ATLSSD [?] | 74.8 | 87.6 | 82.7 | 72.0 | 62.2 | 57.5 | 83.1 | 83.8 | 86.9 | 56.2 | 76.3 | 60.6 | 84.4 | 80.4 | 84.9 | 85.9 | 50.1 | 81.1 | 65.5 | 84.9 | 70.1 | 26-Mar-2018 | |
DSD [?] | 74.5 | 87.9 | 82.0 | 74.8 | 61.9 | 51.5 | 82.1 | 81.1 | 89.8 | 55.8 | 78.5 | 58.3 | 86.8 | 82.3 | 82.7 | 83.4 | 49.2 | 79.5 | 69.1 | 85.0 | 69.2 | 19-Jul-2018 | |
Augment_part1 [?] | 74.0 | 88.8 | 81.6 | 74.8 | 61.9 | 68.5 | 82.0 | 84.8 | 87.0 | 53.2 | 77.0 | 51.5 | 82.9 | 79.2 | 82.3 | 85.6 | 54.3 | 77.5 | 55.3 | 83.5 | 69.4 | 21-Oct-2021 | |
dsa_1050 [?] | 73.9 | 87.4 | 82.0 | 72.9 | 60.7 | 51.8 | 80.7 | 76.8 | 90.1 | 54.0 | 78.7 | 60.0 | 89.1 | 83.5 | 83.3 | 81.4 | 49.7 | 75.7 | 64.2 | 85.2 | 70.5 | 18-Nov-2017 | |
DSOD v2 [?] | 72.9 | 86.8 | 82.5 | 69.0 | 57.4 | 47.1 | 81.2 | 77.8 | 88.7 | 54.8 | 75.5 | 60.4 | 85.2 | 82.0 | 85.4 | 82.4 | 45.0 | 75.3 | 68.2 | 84.3 | 69.2 | 24-Jun-2018 | |
MA-SSD [?] | 72.9 | 87.0 | 81.4 | 71.2 | 59.4 | 49.0 | 81.3 | 74.4 | 88.2 | 55.5 | 78.2 | 61.2 | 85.9 | 82.7 | 82.7 | 80.3 | 46.5 | 76.6 | 66.8 | 83.7 | 66.2 | 01-Aug-2018 | |
GRP-DSOD320 [?] | 72.5 | 87.2 | 82.0 | 67.1 | 57.3 | 46.1 | 81.1 | 78.0 | 88.6 | 54.0 | 75.1 | 58.7 | 84.5 | 82.6 | 85.4 | 82.3 | 45.8 | 75.9 | 67.1 | 84.3 | 66.7 | 19-Nov-2017 | |
ssd [?] | 72.2 | 86.9 | 80.1 | 68.9 | 57.2 | 47.4 | 81.0 | 73.2 | 89.1 | 53.8 | 75.5 | 61.5 | 86.4 | 81.9 | 84.2 | 79.1 | 46.1 | 75.7 | 66.6 | 84.1 | 65.1 | 01-Aug-2018 | |
Origin_pretrain_40k [?] | 71.9 | 89.2 | 76.3 | 73.7 | 61.5 | 66.3 | 81.3 | 83.1 | 86.1 | 49.3 | 63.4 | 47.7 | 84.3 | 75.9 | 79.3 | 84.2 | 52.5 | 79.3 | 55.0 | 81.6 | 68.4 | 22-Oct-2021 | |
DSOD (single model) [?] | 70.8 | 86.4 | 80.2 | 65.5 | 55.7 | 42.4 | 80.3 | 75.3 | 86.6 | 51.1 | 72.3 | 60.5 | 83.9 | 80.5 | 83.6 | 80.4 | 42.7 | 72.4 | 67.3 | 83.1 | 66.2 | 21-Jan-2018 | |
Attention-SSD-vgg [?] | 69.0 | 85.1 | 76.7 | 67.7 | 55.2 | 43.8 | 77.3 | 69.2 | 85.9 | 52.2 | 72.9 | 56.5 | 83.0 | 78.3 | 80.6 | 75.8 | 44.2 | 73.0 | 61.3 | 80.9 | 60.8 | 20-May-2018 | |
SSD [?] | 64.0 | 78.9 | 72.3 | 61.8 | 42.8 | 27.9 | 73.1 | 69.4 | 84.9 | 42.5 | 68.4 | 52.2 | 80.9 | 76.5 | 77.2 | 68.2 | 31.6 | 67.0 | 66.6 | 77.3 | 60.9 | 10-Jun-2017 | |
DCONV_SSD_FCN [?] | 62.8 | 77.9 | 70.6 | 62.9 | 46.5 | 28.6 | 69.7 | 63.1 | 83.6 | 42.1 | 66.6 | 52.3 | 79.6 | 72.8 | 77.2 | 67.7 | 33.0 | 66.0 | 60.2 | 78.1 | 57.9 | 17-Mar-2018 | |
sd [?] | 62.7 | 80.2 | 86.6 | 74.3 | 46.8 | 17.7 | 82.3 | 72.0 | 83.7 | 30.2 | 75.8 | 54.3 | 83.2 | 87.2 | 84.8 | 53.8 | 22.4 | 78.0 | 43.5 | 84.8 | 12.4 | 10-Apr-2024 | |
THU_ML_class [?] | 62.4 | 78.0 | 71.0 | 64.5 | 47.4 | 45.3 | 70.1 | 70.6 | 82.0 | 37.9 | 65.4 | 44.2 | 77.4 | 69.6 | 74.4 | 75.5 | 37.9 | 62.0 | 45.5 | 73.8 | 56.3 | 03-Jun-2017 | |
yolo [?] | 62.1 | 79.8 | 72.1 | 55.3 | 44.9 | 43.1 | 71.5 | 72.3 | 75.1 | 42.1 | 61.3 | 45.8 | 73.4 | 70.9 | 76.2 | 79.3 | 35.2 | 67.4 | 49.1 | 71.5 | 56.1 | 28-Sep-2019 | |
yolo [?] | 59.4 | 76.0 | 68.1 | 51.3 | 40.0 | 39.1 | 69.8 | 66.7 | 74.0 | 39.8 | 56.2 | 47.8 | 70.5 | 70.4 | 75.1 | 75.7 | 31.9 | 61.6 | 52.4 | 68.0 | 54.4 | 28-Sep-2019 | |
YOLOv2-resnet-18-101 [?] | 56.1 | 74.3 | 66.4 | 59.4 | 37.0 | 34.4 | 65.1 | 63.3 | 74.4 | 38.5 | 53.3 | 40.9 | 68.4 | 61.7 | 68.0 | 68.9 | 30.2 | 51.7 | 47.7 | 66.7 | 52.0 | 18-May-2022 | |
YOLOv2 [?] | 48.8 | 69.5 | 61.6 | 37.6 | 28.2 | 18.8 | 63.2 | 53.2 | 65.6 | 27.5 | 44.4 | 35.9 | 61.4 | 57.9 | 66.9 | 63.8 | 16.8 | 52.8 | 39.5 | 65.4 | 46.2 | 01-Dec-2016 | |
DENSE_BOX [?] | 45.9 | 64.7 | 64.1 | 28.8 | 26.7 | 30.7 | 60.6 | 54.9 | 47.4 | 29.3 | 41.8 | 34.6 | 42.6 | 59.3 | 64.2 | 62.5 | 24.3 | 53.7 | 27.1 | 50.9 | 50.7 | 07-Jul-2015 | |
PITT_WSOD_INC2 [?] | 45.1 | 74.2 | 49.8 | 56.0 | 32.5 | 22.0 | 55.1 | 49.8 | 73.4 | 20.4 | 47.8 | 32.0 | 39.7 | 48.0 | 62.6 | 8.6 | 23.7 | 52.1 | 52.5 | 42.9 | 59.1 | 14-Mar-2019 | |
YOLOv1-resnet-18-50 [?] | 44.5 | 64.3 | 54.2 | 47.4 | 26.8 | 16.6 | 55.4 | 44.3 | 66.5 | 23.1 | 38.1 | 38.5 | 62.9 | 57.6 | 60.8 | 45.0 | 15.2 | 33.3 | 43.9 | 60.0 | 37.2 | 13-May-2022 | |
NoC [?] | 42.2 | 62.8 | 60.4 | 26.7 | 22.3 | 25.7 | 56.9 | 55.2 | 52.1 | 21.5 | 38.3 | 34.2 | 43.9 | 51.2 | 58.8 | 40.7 | 20.4 | 42.0 | 37.4 | 52.6 | 41.6 | 26-Apr-2015 | |
HybridCodingApe [?] | 40.9 | 61.8 | 52.0 | 24.6 | 24.8 | 20.2 | 57.1 | 44.5 | 53.6 | 17.4 | 33.0 | 38.3 | 42.8 | 48.8 | 59.4 | 35.7 | 22.8 | 40.3 | 39.5 | 51.1 | 49.5 | 23-Sep-2012 | |
Data Decomposition and Distinctive Context [?] | 40.9 | 55.0 | 58.1 | 22.5 | 18.8 | 33.9 | 57.6 | 54.5 | 42.6 | 20.2 | 40.3 | 29.3 | 37.1 | 54.6 | 58.3 | 51.6 | 14.7 | 44.8 | 32.1 | 51.7 | 41.0 | 13-Oct-2011 | |
segDPM [?] | 40.7 | 59.1 | 54.3 | 28.2 | 24.4 | 34.5 | 53.4 | 48.1 | 51.3 | 18.1 | 37.8 | 29.9 | 40.4 | 48.9 | 52.9 | 46.4 | 16.1 | 39.5 | 35.4 | 50.8 | 44.9 | 24-Feb-2014 | |
NYU-UCLA_Hierarchy [?] | 40.6 | 56.3 | 55.9 | 23.4 | 20.3 | 27.2 | 56.6 | 48.1 | 53.8 | 23.3 | 32.9 | 33.4 | 39.2 | 53.0 | 56.9 | 43.6 | 14.3 | 37.9 | 39.4 | 52.6 | 43.7 | 13-Oct-2011 | |
Fisher with FLAIR [?] | 40.6 | 61.7 | 52.0 | 27.9 | 24.0 | 18.9 | 56.5 | 45.3 | 53.4 | 15.5 | 34.6 | 36.3 | 42.3 | 48.4 | 57.9 | 36.6 | 24.3 | 40.6 | 38.0 | 49.8 | 49.0 | 17-Jun-2014 | |
DenseYolo [?] | 39.4 | 60.2 | 48.7 | 26.1 | 18.0 | 18.1 | 54.3 | 47.6 | 50.0 | 23.1 | 37.2 | 28.9 | 43.1 | 47.3 | 56.3 | 56.0 | 11.9 | 41.8 | 28.5 | 50.1 | 41.1 | 15-May-2017 | |
DPM-MKL [?] | 39.1 | 59.6 | 54.5 | 21.9 | 21.6 | 32.1 | 52.5 | 49.3 | 40.8 | 19.1 | 35.2 | 28.9 | 37.2 | 50.9 | 49.9 | 46.1 | 15.6 | 39.3 | 35.6 | 48.9 | 42.8 | 23-Sep-2012 | |
DPM-MK [?] | 38.3 | 56.0 | 53.3 | 19.2 | 17.3 | 25.8 | 53.1 | 45.4 | 44.5 | 20.1 | 32.1 | 28.1 | 37.2 | 52.3 | 56.6 | 43.3 | 12.1 | 34.3 | 37.6 | 51.8 | 45.2 | 13-Oct-2011 | |
NEC_STANFORD_OCP [?] | 36.7 | 65.1 | 46.8 | 25.0 | 24.6 | 16.0 | 51.0 | 44.9 | 51.5 | 13.0 | 26.6 | 31.0 | 40.2 | 39.7 | 51.5 | 32.8 | 12.6 | 35.7 | 33.5 | 48.0 | 44.8 | 23-Sep-2012 | |
Detector-Merging [?] | 36.5 | 47.2 | 50.2 | 18.3 | 21.4 | 25.2 | 53.3 | 46.3 | 46.3 | 17.5 | 27.8 | 30.3 | 35.0 | 41.6 | 52.1 | 43.2 | 18.0 | 35.2 | 31.1 | 45.4 | 44.4 | 23-Sep-2012 | |
MISSOURI_HOGLBP_MDPM_CONTEXT [?] | 36.4 | 51.4 | 53.7 | 18.3 | 15.6 | 31.6 | 56.5 | 47.1 | 38.6 | 19.5 | 32.0 | 22.1 | 25.0 | 50.3 | 51.9 | 44.9 | 11.9 | 37.7 | 30.6 | 50.9 | 39.3 | 23-Sep-2012 | |
NUS_Context_SVM [?] | 36.2 | 51.4 | 52.9 | 20.1 | 15.8 | 26.9 | 53.0 | 45.6 | 37.6 | 15.3 | 36.0 | 25.1 | 32.6 | 50.4 | 55.8 | 36.8 | 12.3 | 37.6 | 30.5 | 48.1 | 41.0 | 05-Oct-2011 | |
SelectiveSearchMonkey [?] | 35.5 | 56.9 | 43.4 | 16.6 | 15.8 | 18.0 | 52.3 | 38.3 | 49.0 | 12.2 | 29.7 | 32.8 | 36.7 | 45.7 | 54.4 | 30.4 | 16.2 | 37.2 | 34.7 | 45.9 | 44.2 | 13-Oct-2011 | |
CVC_DET [?] | 34.1 | 45.4 | 49.8 | 15.7 | 16.0 | 26.3 | 54.6 | 44.8 | 35.1 | 16.8 | 31.3 | 23.6 | 26.0 | 45.6 | 49.6 | 42.2 | 14.5 | 30.5 | 28.5 | 45.7 | 40.0 | 23-Sep-2012 | |
UOCTTI_LSVM_MDPM [?] | 33.6 | 53.2 | 53.9 | 13.1 | 13.5 | 30.5 | 55.5 | 51.2 | 31.7 | 14.5 | 29.0 | 16.0 | 22.1 | 43.1 | 50.3 | 46.4 | 8.8 | 33.0 | 22.9 | 45.8 | 38.2 | 12-Oct-2011 | |
TREE--MAX-POOLING [?] | 32.9 | 43.8 | 51.7 | 13.7 | 12.7 | 27.3 | 51.5 | 43.7 | 32.9 | 18.3 | 27.3 | 18.5 | 23.1 | 45.2 | 48.6 | 42.9 | 11.6 | 32.4 | 27.5 | 47.0 | 39.3 | 13-Oct-2011 | |
LCC-TREE-CODING [?] | 32.4 | 41.1 | 51.7 | 13.7 | 11.9 | 27.3 | 52.1 | 41.7 | 32.9 | 17.6 | 27.3 | 18.5 | 23.1 | 45.2 | 48.6 | 41.9 | 11.6 | 32.4 | 27.5 | 44.2 | 38.3 | 13-Oct-2011 | |
SVM-HOG [?] | 31.5 | 47.5 | 51.7 | 14.2 | 12.6 | 27.3 | 51.8 | 44.2 | 25.3 | 17.8 | 30.2 | 18.1 | 16.9 | 46.9 | 50.9 | 43.0 | 9.5 | 31.2 | 23.6 | 44.3 | 22.1 | 22-Sep-2012 | |
Configurable And-Or Tree Model [?] | 29.5 | 50.2 | 47.0 | 7.9 | 3.8 | 24.8 | 47.2 | 42.8 | 31.2 | 17.5 | 24.2 | 10.0 | 21.3 | 43.5 | 46.4 | 37.5 | 7.9 | 26.4 | 21.5 | 43.1 | 36.7 | 23-Sep-2012 | |
lSVM-Viewpoint [?] | 20.9 | 42.5 | 43.7 | 5.4 | 4.8 | 18.1 | 28.6 | 36.6 | 24.2 | 12.6 | 20.6 | 4.5 | 17.5 | 15.2 | 38.2 | 7.9 | 1.7 | 23.2 | 7.1 | 41.0 | 25.7 | 13-Oct-2011 | |
Geometric shape [?] | - | - | 3.8 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 19-Jun-2016 | |
UOCTTI_WL-SSVM_GRAMMAR [?] | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 49.2 | - | - | - | - | - | 12-Oct-2011 | |
CMIC-GS-DPM [?] | - | - | - | - | 13.3 | 26.4 | - | 41.5 | - | - | - | 12.2 | - | - | 41.6 | - | 8.3 | 31.4 | - | - | - | 13-Oct-2011 | |
CMIC-Synthetic-DPM [?] | - | 40.4 | 47.8 | - | 11.4 | 23.7 | 48.9 | 40.9 | 23.5 | 11.9 | 25.5 | - | 10.9 | 42.0 | 38.7 | 40.7 | 7.5 | 30.4 | - | 38.4 | 34.8 | 13-Oct-2011 | |
Struct_Det_CRF [?] | - | 37.1 | 42.6 | 2.0 | - | 16.0 | 43.8 | 38.6 | 17.0 | 10.3 | 7.7 | 2.4 | 1.5 | 34.3 | 41.1 | 38.4 | 1.5 | 14.7 | 5.3 | 35.4 | 27.1 | 13-Oct-2011 |
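For readers reproducing these numbers: each per-class entry is an average precision (AP) in percent, and the leading "mean" column is the unweighted average of the 20 class APs (mAP). A minimal, illustrative Python sketch follows, with an AP routine in the all-point-interpolation style used since VOC2010; the official development kit remains the authoritative implementation, and the toy precision/recall values are hypothetical:

```python
def voc_ap(recall, precision):
    """Illustrative VOC-style AP: area under the precision-recall curve,
    with precision made monotonically non-increasing first
    (all-point interpolation, as used since VOC2010)."""
    r = [0.0] + list(recall) + [1.0]
    p = [0.0] + list(precision) + [0.0]
    # Precision at recall r becomes the max precision at any recall >= r.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangular areas where recall increases.
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))

def mean_ap(per_class_aps):
    """The 'mean' column: unweighted average over the 20 class APs."""
    return sum(per_class_aps) / len(per_class_aps)

# Per-class APs from the Conical R-CNN row above:
conical_rcnn = [92.9, 91.1, 85.5, 79.5, 75.6, 87.0, 88.7, 95.3, 71.2, 89.8,
                72.8, 94.4, 93.1, 92.8, 92.2, 71.0, 90.7, 78.5, 92.1, 82.2]
print(round(mean_ap(conical_rcnn), 1))  # 85.8, matching the table

voc_ap([0.1, 0.2, 0.4], [1.0, 0.5, 0.4])  # AP on a toy PR curve
```

Because the displayed per-class APs are themselves rounded to one decimal, a recomputed mean can occasionally differ from the listed value by 0.1.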
Title | Method | Affiliation | Contributors | Description | Date |
---|---|---|---|---|---|
ASSD321 | ASSD321 | rutgers | jingru yi, pengxiang wu | input resolution: 321x321 | 2018-08-20 02:34:00 |
ASSD513 | ASSD513 | Rutgers | Jingru Yi, pengxiang wu | input resolution: 513x513 | 2018-08-18 12:26:28 |
ATLSSD | ATLSSD | ATL(Alibaba Turing Labs) | Xuan Jin | SSD-based method trained on VOC2012 | 2018-03-26 07:48:08 |
Softmax with attention on VGG for detection | Attention-SSD-vgg | CSUST | Jia | We select boxes with scores above 0.5 and add an attention mechanism to the SSD model. | 2018-05-20 11:12:40 |
Augment_part1 | Augment_part1 | University of Information Technology VNU-HCM | Phan Tung Huynh Thi My Duyen | Augment_part1 | 2021-10-21 18:57:30 |
BOE_IOT_AIBD_method_improved | BOE_IOT_AIBD_method_improved | BOE_IOT_AIBD | Xu Jingtao | BOE_IOT_AIBD_method_improved | 2019-11-27 03:29:33 |
Single-stage detector trained by step-SGDR | COS-DET | ZUIYOU Inc | Tabsun, Ma Baoyuan, Li Yong, Li Xiaosong | I designed a new step-SGDR schedule, the key innovation here, which boosts mAP by almost 0.6 compared with a step-decay strategy; an important point is judging where overfitting sets in. The backbone is darknet-53, with common augmentations (distortion, random crop, random flip, mix-up). Multi-scale testing and horizontal-flip testing also help noticeably, while methods such as soft-NMS did not help in my experiments. On a single 1080Ti the model runs at almost 15 fps. | 2019-04-26 12:04:21 |
Color_HOG based detector with BOW classifier | CVC_DET | Computer Vision Center Barcelona | Fahad Khan, Camp Davesa, Joost van de Weijer, Rao Muhammad Anwer, Albert Gordo, Pep Gonfaus, Ramon Baldrich, Antonio Lopez | We use our Color-HOG based part detector [1]. The detection results are combined with our CVC_CLS submission. References: 1. Fahad shahbaz khan, Rao Muhammad Anwer, Joost van de Weijer, Andrew D. Bagdanov, Maria Vanrell, Antonio M. Lopez. Color Attributes for Object Detection. In CVPR 2012. | 2012-09-23 18:53:20 |
Dynamic And-Or Tree Learning For Object Detection | Configurable And-Or Tree Model | Sun Yat-Sen University | Xiaolong Wang, Liang Lin, Lichao Huang, Xinhui Zhang, Zechao Yang | We propose a novel hierarchical model for object detection, an "And-Or tree" made configurable by introducing "switch" variables (i.e., the or-nodes) that account for intra-class variance. The model comprises three layers: a batch of leaf-nodes at the bottom for localizing object parts; or-nodes that activate several leaf-nodes to specify a composition of parts; and a root-node that verifies holistic object distortion. For model training, a novel discriminative learning algorithm is proposed to explicitly determine the structural configuration (e.g., which leaf-nodes associate with which or-nodes) along with the optimization of the multi-layer parameters. The model response integrates bottom-up tests via the leaf-nodes and or-nodes with global verification via the root-node. In the implementation, we use histograms of oriented gradients (HOG) as the image feature. Detection is performed by scanning sub-windows over different scales and locations of the image, and the final decisions are rescored by a context model encoding inter-object spatial interactions. | 2012-09-23 16:02:13 |
A Conical R-CNN for object detection | Conical R-CNN | Xidian University | Yang Li, Licheng Jiao, Xu Liu, Fang Liu, Fanhua Shang, GouLiang Ma | Conical R-CNN employs conical features for detection, allowing spatial information to be exploited effectively. The model is fine-tuned from a COCO detection model, and we use multi-scale training. | 2020-10-29 07:22:04 |
DSSD-style architecture | DCONV_SSD_FCN | shanghai university | li junhao (jxlijunhao@163.com) | Combines object detection and semantic segmentation in one forward pass. | 2018-03-17 02:58:20 |
DenseBoxCNN | DENSE_BOX | Baidu IDL | Lichao Huang | I train a VGG16-like convolutional neural network to perform end-to-end object detection. The network processes the full image and outputs multiple bounding boxes and class confidence scores simultaneously. The training data used in this entry is VOC2012 trainval only. | 2015-07-07 05:39:05 |
A Distinguishable Features Learning Network for On | DFL-Net | USTC | geroci@mail.ustc.edu.cn {wansh, jpq}@ustc.edu.cn | DFL-Net: One-Stage Anchor-Based Object Detection via Distinguishable Feature Learning | 2020-06-22 08:29:46 |
YOLO V3 with dynamic constraint for objectness | DOLO | Tencent MIG YYB & USTC BDAA LAB | Chen Joya, Bin Luo, XueZheng Peng, Tong Xu | We present DOLO, based on the state-of-the-art object detector YOLO V3, which we improve with our dynamic constraint strategy. We also use a simple SNIP (Scale Normalization for Image Pyramids) strategy in training. At inference, our square-weaken method is adopted for multi-scale and flip testing. | 2018-09-21 10:34:36 |
The DPM-MKL baseline | DPM-MKL | Oxford | Ross Girshick, Andrea Vedaldi, Karen Simonyan | This method is similar to last year's DPM-MKL entry. We updated several aspects of the implementation (e.g. the type of features). | 2012-09-23 23:05:18 |
DSD | DSD | Cainiao | Duliang Haiwa | DSD | 2018-07-19 14:43:34 |
DSOD | DSOD (single model) | Intel | Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue | The training data is the VOC 2012 trainval set, without ImageNet pre-trained models or any other additional dataset. The input image size is 300x300. More details can be found in our paper: "DSOD: Learning Deeply Supervised Object Detectors from Scratch". | 2018-01-21 06:13:56 |
DSOD v2 | DSOD v2 | UIUC | Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen and Xiangyang Xue | Training from scratch without pre-trained models. The input size is 300x300. | 2018-06-24 05:56:41 |
Yolo with dense grid and high level features | DenseYolo | University Politehnica Bucharest | Paul Urziceanu | N/A | 2017-05-15 10:54:16 |
Detector_Weighting | Detector-Merging | University of Amsterdam | Sezer Karaoglu, Fahad Shahbaz Khan, Koen van de Sande, Jan van Gemert, Rao Muhammad Anwer, Jasper Uijlings, Camp Davesa, Joost van de Weijer, Theo Gevers, Cees Snoek | We use a bounding-box merging scheme that exploits the results from different independent detectors. Each detector produces a ranked list of bounding boxes that is not directly comparable across detectors, so we merge the detectors with a weighting scheme based on hold-out performance. As input, we use the standard Felzenszwalb gray-HOG detector [1]; the color-HOG detector of CVC [2], which introduces color information within the part-based detection framework; and a slightly improved version of the SelectiveSearch detector [3] submitted by the UvA to VOC 2011. [1] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. In TPAMI, Vol. 32, No. 9, Sep. 2010. [2] Fahad Shahbaz Khan, Rao Muhammad Anwer, Joost van de Weijer, Andrew D. Bagdanov, Maria Vanrell, Antonio M. Lopez. Color Attributes for Object Detection. In CVPR 2012. [3] Koen E. A. van de Sande, Jasper R. R. Uijlings, Theo Gevers, Arnold W. M. Smeulders. Segmentation As Selective Search for Object Recognition. In ICCV 2011. | 2012-09-23 22:51:19 |
Full convolution attention selective | FCASA-detection | DHAI | xiangming.zhou, kai.fu, guodong.wu | We propose a novel object-detection architecture that uses fully convolutional networks as multi-step RPNs. Each step proposes RoIs based on the previous step, which avoids the imbalance between positive and negative samples. It also improves detection recall, because the RoIs that survive filtering by the multi-step RPNs are more reliable. We additionally use soft-NMS for scoring objects and a GIoU loss for localization. The architecture can be applied to any single-stage detector: training YOLOv3 and SSD with the same backbones, with and without our architecture, shows that it boosts mAP by almost 5% on the PASCAL VOC dataset. | 2019-08-05 10:43:51 |
Detection Network Based on Function Maintenance | FMFPD | University of Chinese Academy of Sciences | Chengqi Xu | This module maintains high-level strong semantic information more effectively, so that lower-level feature maps also carry strong semantic features and the representation of small objects is greatly enhanced. At the same time, detection accuracy is improved by using the two-stage features of the network to describe the objects. | 2020-05-19 14:33:57 |
FPNSSD | FPNSSD | sogou.com | Kuang Liu | FPNSSD trained on VOC12 | 2018-03-29 10:38:04 |
Faster RCNN with ResNext | FastX-RCNN | Yi+AI Lab | Hang Zhang, Boyuan Sun, Zhaonan Wang, Hao Zhao, ZiXuan Guan, Wei Miao | Faster RCNN + RoIAlign + ResNeXt152 + SoftNMS + Multi-Scale Training + Multi-Scale Testing; | 2018-07-06 04:04:00 |
Fisher with FLAIR | Fisher with FLAIR | University of Amsterdam | Koen van de Sande, Cees Snoek, Arnold Smeulders | Run for our CVPR2014 paper "Fisher and VLAD with FLAIR", see http://koen.me/research/flair | 2014-06-17 11:47:29 |
Gated Recurrent Feature Pyramids | GRP-DSOD320 | UIUC | Zhiqiang Shen, Honghui Shi, Rogerio Feris, Liangliang Cao, Shuicheng Yan, Ding Liu, Xinchao Wang, Xiangyang Xue, Thomas S. Huang | We train GRP-DSOD for object detection. The training data is the VOC 2012 trainval set, without ImageNet pre-trained models or any other additional dataset. The input image size is 320x320. More details can be found in our paper: "Learning Object Detection from Scratch with Gated Recurrent Feature Pyramids". | 2017-11-19 22:13:59 |
Diamond Frame Bicycle Recognition | Geometric shape | National Cheng Kung University | Chung-Ping Young, Yen-Bor Lin, Kuan-Yu Chen | A detector for diamond-frame bicycles in side-view images, based on the observation that a bicycle consists of two wheels in the form of ellipses and a frame in the form of two triangles. Through geometric constraints on the relationship between the triangles and ellipses, computation is fast compared to feature-based classifiers. Moreover, no training process is necessary: only a single image is required by our algorithm. Experimental results show the practicability and performance of the proposed bicycle model and detection algorithm. | 2016-06-19 10:06:33 |
Hybrid Coding for Selective Search | HybridCodingApe | ksande@uva.nl | Koen E. A. van de Sande, Jasper R. R. Uijlings, Cees G. M. Snoek, Arnold W. M. Smeulders | We have improved significantly over last year's method [1] with a hybrid bag-of-words using average and difference coding, a first in object detection. Briefly, instead of the exhaustive search that dominated the PASCAL VOC 2010 and 2011 detection challenges, the method of [1] uses segmentation as a sampling strategy for selective search (cf. the ICCV paper). We use a small set of data-driven, class-independent, high-quality object locations (covering 96-99% of all objects in the VOC2007 test set). Because we have only a limited number of locations to evaluate, we can use more computationally expensive features, such as bag-of-words with average and difference coding strategies. While difference coding is an order of magnitude more expensive than average coding, we are still able to train a detection system for it efficiently, thanks to several optimizations in the descriptor coding and the kernel-classification runtime. As low-level features, we use new complementary color descriptors. Finally, the detection system is fused with classification scores found using most-telling-example selection from [2]. [1] "Segmentation as Selective Search for Object Recognition"; Koen E. A. van de Sande, Jasper R. R. Uijlings, Theo Gevers, Arnold W. M. Smeulders; 13th International Conference on Computer Vision, 2011. [2] "The Most Telling Window for Image Classification"; Jasper R. R. Uijlings, Koen E. A. van de Sande, Arnold W. M. Smeulders, Theo Gevers, Nicu Sebe, Cees G. M. Snoek; PASCAL VOC Challenge Workshop 2011 at ICCV, 2011. | 2012-09-23 21:01:35 |
Improved yolo-v3 | Improved yolo-v3 | horizon | xianfeng tan | Improved yolo-v3 | 2019-11-15 10:30:19 |
MA-SSD | MA-SSD | MA-SSD | MA-SSD | MA-SSD | 2018-08-01 09:02:09 |
HOGLBP with Mixture DPM and Context | MISSOURI_HOGLBP_MDPM_CONTEXT | The University of Missouri-Columbia | Guang Chen, Miao Sun, Xutao Lv, Yan Li, Tony X. Han | HOG-LBP features [1] are incorporated into the deformable part model [2]. The deformable model is further improved by learning multiple anchor positions, so that the possible locations of each part are modeled as a mixture of Gaussians. For part and root filters, PCA is adopted to denoise and to accelerate detection. We propose a permutation-matrix method to add model-symmetry constraints during feature selection, which effectively exploits the symmetry present in most object categories and avoids overfitting. Contextual information, including image class-label estimation, segmentation estimation, color histograms of RoIs, object location priors, and correlations between the object detectors, is used to improve the final detection results substantially. For example, trains and buses bear some visual similarity, but two such large objects cannot coexist in the same location; detection scores are therefore correlated, and we use inference on Bayesian networks to further improve the detection results. [1] Xiaoyu Wang, Tony X. Han and Shuicheng Yan, "An HOG-LBP Human Detector with Partial Occlusion Handling," IEEE International Conference on Computer Vision (ICCV 2009), Kyoto, 2009. [2] Girshick, R. B. and Felzenszwalb, P. F. and McAllester, D.: Discriminatively Trained Deformable Part Models, Release 5. | 2012-09-23 21:27:16 |
Resnet-101-FPN | Model_ori_1 | UIT | Phan Tung Huynh Thi My Duyen | Resnet-101-FPN | 2021-10-28 10:07:36 |
Using NAS to Enhance Yolo | NAS Yolo | PA-Occam-Platform | Jian Yang, Zhenhou Hong, Xiaoyang Qu, Jianzong Wang, Jing Xiao | NAS-YoLo is an object detection model that introduces automatic data augmentation and neural architecture search (NAS) into a state-of-the-art YoLo model. The automatic data augmentation uses a reinforcement-learning-based controller to find the best augmentation policies for the target dataset. The neural architecture search algorithm is developed from a one-shot NAS method with a parallel divide-and-conquer evolutionary algorithm. In addition, an SMBO-based auto-tuning algorithm is used to yield better hyper-parameter combinations for the NAS-YoLo. | 2020-05-09 08:00:13 |
Object-centric pooling | NEC_STANFORD_OCP | NEC Laboratories America and Stanford University | Olga Russakovsky Xiaoyu Wang Shenghuo Zhu Li Fei-Fei Yuanqing Lin | Object-centric pooling (OCP) is a method which represents a bounding box by pooling the coded low-level descriptors on the foreground and background separately and then concatenating them (Russakovsky et al. ECCV 2012). This method exploits powerful classification features that have been developed in the past years. In this system, we used DHOG and LBP as low-level descriptors. We developed a discriminative LCC coding scheme in addition to traditional LCC coding. We make use of candidate bounding boxes (van de Sande et al. ICCV 2011). | 2012-09-23 22:47:43 |
Networks on Convolutional Feature Maps | NoC | Microsoft Research | Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun | This entry is an implementation of the system described in “Object Detection Networks on Convolutional Feature Maps” (http://arxiv.org/abs/1504.06066). This model is trained on HoG feature only. Training data for this entry is voc 2012 trainval set. Selective Search is used for proposal. | 2015-04-26 09:47:29 |
Origin_pretrain_40k | Origin_pretrain_40k | University of Information Technology VNU-HCM | Phan Tung Huynh Thi My Duyen | Origin_pretrain_40k | 2021-10-22 09:14:20 |
Weakly supervised detection using inception-v2 | PITT_WSOD_INC2 | University of Pittsburgh | Keren Ye, Mingda Zhang, Wei Li, Danfeng Qin, Adriana Kovashka, Jesse Berent | Weakly supervised detection using inception-v2 | 2019-03-14 05:19:35 |
Region transformer pyramid network | RTPnet | Xidian University | Li Yang | RTPNet contains positional embedding units (PEU), self region transformers (Self RT), and down region transformers (Down RT). We adopt a multi-scale training strategy: we first randomly sample a scale from 600 to 1000 with step 100, then resize the shorter edge of the input image to the sampled scale, constraining the longer edge of the resized image to at most 1666. | 2022-02-23 08:06:29 |
RockDetector-1 | RockDetector-1 | RocKontrol | Chen Li, Hui Wan | RockDetector-1-based method trained on VOC2012 | 2019-11-08 14:56:51 |
SSD | SSD | THU | SSD | SSD | 2017-06-10 04:47:11 |
SSOD_07_12_unlabel_07_12 | SSOD_07_12_unlabel_07_12 | HW Ascend | xuqiang | SSOD_07_12_unlabel_07_12 | 2021-04-22 13:12:55 |
SSOD_100 | SSOD_100 | HW Ascend | xuqiang | 07_100_12_100 | 2021-04-12 12:18:51 |
SSOD_25 | SSOD_25 | HW Ascend | xuqiang | 07_25_12_25 | 2021-04-12 12:21:26 |
SSOD_25_real | SSOD_25_real | HW Ascend | xuqiang | voc_07_25_12_25_real | 2021-04-22 13:11:55 |
SVM classifier using HOG (V2) | SVM-HOG | Orange Labs Beijing, France Telecom | Zhao Feng | Our object detection system is based on the Discriminatively Trained Deformable Part Models, Release 5. It is our first attempt at the VOC challenge. We do not make many modifications to the baseline system provided at http://people.cs.uchicago.edu/~rbg/latent/. The submitted results are obtained by applying post-processing with both bounding-box prediction and contextual rescoring. | 2012-09-22 20:06:39 |
Stronger-yolo | Stronger-yolo | central south university | Zhihong Xiao | Improves yolov3 with focal loss, KL loss, mix-up, anchor-free, and so on. | 2019-06-12 07:08:06 |
resnet101+softmax | TCnet | Tsinghua University | Yulin Liu | This is a model based on Mask R-CNN. | 2018-03-29 12:02:15 |
TCnet | TCnet | Tsinghua University | Liu Yulin | TCnet | 2018-05-02 08:02:45 |
faster rcnn | THU_ML_class | Tsinghua University | training | faster rcnn | 2017-06-03 10:55:37 |
YOLOv1-resnet-18-50 | YOLOv1-resnet-18-50 | personal | Haoyun Qin | reimplementation of yolo v1 with tricks applied. switched backbone to resnet18-cmp3 and resnet50-cmp4. | 2022-05-13 12:24:19 |
YOLOv2 | YOLOv2 | University of Washington | Joe Redmon, Ali Farhadi | YOLOv2 runs a single detection network once on an image to detect objects. It predicts bounding boxes and objectness as well as class probabilities across a convolutional feature map. For more information see: http://pjreddie.com/darknet/yolo/ | 2016-12-01 21:15:21 |
YOLOv2-resnet-18-101 | YOLOv2-resnet-18-101 | personal | Haoyun Qin | Reimplementation of YOLOv2 using PyTorch and ResNet backbones. | 2022-05-18 10:34:21 |
dsa_tes | dsa_1050 | Nanjing University | AD | add cs | 2017-11-18 11:34:21 |
refine_denseSSD | refine_denseSSD | BUPT | Yongqiang Yao | refine_denseSSD | 2018-05-14 02:23:40 |
sd410 | sd | NorthWest University | sd | sd | 2024-04-10 07:45:49 |
ssd | ssd | ssd | ssd | ssd | 2018-08-01 09:31:10 |
yolo-all | yolo | shou | hfq | yolo3-608 | 2019-09-28 05:08:50 |
yolo-all | yolo | shou | hfq0219 | yolo3 | 2019-09-28 04:14:52 |
Synthetic Training for deformable parts model | CMIC-GS-DPM | Cairo Microsoft Innovation Center | Dr. Motaz El-Saban, Osama Khalil, Mostafa Izz, Mohamed Fathi | We introduce dataset augmentation using synthetic examples as a method for introducing novel variations not present in the original set. We make use of the deformable parts-based model (Felzenszwalb et al. 2010). We augment the training set with examples obtained by applying global scaling to the dataset examples. Global scaling includes no scaling, up-scaling, and down-scaling, with varying performance across different object classes. Technique selection is based on performance on the validation set. The augmented dataset is then used to train parts-based detectors using HOG features (Dalal & Triggs 2006) and latent SVM. The resulting class models are applied to test images in a “sliding-window” fashion. | 2011-10-13 22:01:23 |
Synthetic Training for deformable parts model | CMIC-Synthetic-DPM | Cairo Microsoft Innovation Center | Dr. Motaz El-Saban, Osama Khalil, Mostafa Izz, Mohamed Fathi | We introduce dataset augmentation using synthetic examples as a method for introducing novel variations not present in the original set. We make use of the deformable parts-based model (Felzenszwalb et al. 2010). We augment the training set with examples obtained by relocating objects (having segmentation masks) to new backgrounds. New backgrounds used for relocation are selected using a set of techniques (no relocation, same image, “different” image, or image with co-occurring objects). Performance of these techniques varies across classes according to the object class properties. For every class, we select the technique that achieves the highest AP on the validation set. The augmented dataset is then used to train parts-based detectors using HOG features (Dalal & Triggs 2006) and latent SVM. The resulting class models are applied to test images in a “sliding-window” fashion. | 2011-10-13 21:54:09 |
DPM with basic rescoring | DPM-MK | Oxford VGG | Andrea Vedaldi and Andrew Zisserman | This method uses a Deformable Part Model (our own implementation) to generate an initial (and very good) list of 100 candidate bounding boxes per image. These are then rescored by a multiple-features model combining DPM scores with dense SP-BOW, geometry, and context. The SP-BOW model uses dense SIFT features (vl_phow in VLFeat) quantized into 1200 visual words, a 6x6 spatial layout, and cell-by-cell l2 normalization after raising the entries to the 1/4 power (1/4-homogeneous Hellinger's kernel). The geometric model is a second-order polynomial kernel on the bounding box coordinates. The context model is a second-order polynomial kernel mixing the candidate DPM score with twenty scores obtained as the maximum response of the DPMs for the 20 classes in that image (like Felzenszwalb). A second context model is also added, using 20 scores from a state-of-the-art Fisher kernel image classifier (also on dense SIFT features), as described in Chatfield et al. 2010. The SVM scores are passed through a sigmoid for standardization in the 0-1 interval; the sigmoid model is fitted to the training data. The model is trained by means of a large-scale linear SVM using the one-slack bundle formulation (aka SVM^perf). The solver hence uses retraining implicitly, and we make sure it reaches full convergence. | 2011-10-13 10:20:29 |
NLPR-Detection | Data Decomposition and Distinctive Context | Institute of Automation, Chinese Academy of Sciences | Junge Zhang, Yinan Yu, Yongzhen Huang, Chong Wang, Weiqiang Ren, Jinchen Wu, Kaiqi Huang and Tieniu Tan | The part-based model has achieved great success in recent years. In our view, the original deformable part-based model has several limitations: 1) its computational complexity is very large, especially when it is extended to enhanced models via multiple features, more mixtures, or flexible part models; 2) the original part-based model is not “deformable” enough. To tackle these problems, 1) we propose a data-decomposition-based feature representation scheme for the part-based model in an unsupervised manner. The submitted method takes about 1~2 seconds per image from the PASCAL VOC datasets on average while keeping high performance. We learn the basis from samples without any label information. The label-independent rule followed in the submitted method can be adapted to other variants of the part-based model, such as hierarchical models or flexible mixture models. 2) We found that each part corresponds to multiple possible locations, which is not reflected in the original part-based model. Accordingly, we propose that the locations of parts should obey a mixture of Gaussian distributions. Thus, for each part we learn its optimal locations by clustering, which are used to update the original anchors of the part-based model. This more effectively describes the deformation (pose and location variety) of objects’ parts. 3) We rescored the initial results with our distinctive context model, including global, local, and intra-class context information. Besides, segmentation provides a strong indication of an object’s presence; therefore, the proposed segmentation-aware semantic attribute is applied in the final reasoning, which indeed shows promising performance. | 2011-10-13 16:20:59 |
SVM classifier with LCC and tree coding | LCC-TREE-CODING | University of Missouri | Xiaoyu Wang, Miao Sun, Xutao Lv, Shuai Tang, Guang Chen, Yan Li, Tony X. Han | A two-layer cascade structure for object detection. The first layer employs a deformable model to select possible candidates for the second layer. The latter layer takes location and global context, augmented with LBP features, to improve accuracy. A bag-of-words model enhanced with a spatial pyramid and local coordinate coding is used to model the global context information. A hierarchical tree-structure coding is used to handle the intra-class variation for each detection window. A linear SVM is used for classification. | 2011-10-13 17:13:43 |
Context-SVM based submission for 3 tasks | NUS_Context_SVM | National University of Singapore | Zheng Song, Qiang Chen, Shuicheng Yan | Classification uses the BoW framework. Dense-SIFT, HOG^2, LBP, and color moment features are extracted. We then use VQ and Fisher vectors for feature coding, and SPM and Generalized Pyramid Matching (GPM) to generate image representations. Context-aware features are also extracted based on [1]. The classification models are learnt via kernel SVM. The final classification scores are then refined with kernel mapping [2]. Detection and segmentation results use the baseline of [3] with HOG and LBP features. Then, based on [1], we further learn a context model and refine the detection results. The final segmentation result substitutes the rectangular detection boxes with average masks for each detection component, learnt using the segmentation training set. [1] Zheng Song*, Qiang Chen*, Zhongyang Huang, Yang Hua, and Shuicheng Yan. Contextualizing Object Detection and Classification. [2] http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/workshop/nuspsl.pdf [3] http://people.cs.uchicago.edu/~pff/latent/ | 2011-10-05 09:01:23 |
Latent Hierarchical Learning | NYU-UCLA_Hierarchy | NYU and UCLA | Yuanhao Chen, Li Wan, Long Zhu, Rob Fergus, Alan Yuille | Based on two recent publications: "Latent Hierarchical Structural Learning for Object Detection". Long Zhu, Yuanhao Chen, Alan Yuille, William Freeman. CVPR 2010. "Active Mask Hierarchies for Object Detection". Yuanhao Chen, Long Zhu, Alan Yuille. ECCV 2010. We present a latent hierarchical structural learning method for object detection. An object is represented by a mixture of hierarchical tree models where the nodes represent object parts. The nodes can move spatially to allow both local and global shape deformations. The image features are histograms of words (HOWs) and histograms of oriented gradients (HOGs), which enable rich appearance representation of both structured (e.g., cat face) and textured (e.g., cat body) image regions. Learning the hierarchical model is a latent SVM problem which can be solved by the incremental concave-convex procedure (iCCCP). Object detection is performed by scanning sub-windows using dynamic programming. The detections are rescored by a context model which encodes the correlations of 20 object classes by using both object detection and image classification. | 2011-10-13 22:21:11 |
Selective Search Detection System | SelectiveSearchMonkey | University of Amsterdam and University of Trento | Jasper R. R. Uijlings, Koen E. A. van de Sande, Arnold W. M. Smeulders, Theo Gevers, Nicu Sebe, Cees Snoek | Based on "Segmentation as Selective Search for Object Recognition"; Koen E. A. van de Sande, Jasper R. R. Uijlings, Theo Gevers, Arnold W. M. Smeulders; 13th International Conference on Computer Vision, 2011. Instead of exhaustive search, which was dominant in the Pascal VOC 2010 detection challenge, we use segmentation as a sampling strategy for selective search (cf. our ICCV paper). Like segmentation, we use the image structure to guide our sampling process. However, unlike segmentation, we propose to generate many approximate locations over few and precise object delineations, as the goal is to cover all object locations. Our sampling is diversified to deal with as many image conditions as possible. Specifically, we use a variety of hierarchical region grouping strategies by varying colour spaces and grouping criteria. This results in a small set of data-driven, class-independent, high-quality object locations (coverage of 96-99% of all objects in the VOC2007 test set). Because we have only a limited number of locations to evaluate, this enables the use of the more computationally expensive bag-of-words framework for classification. Our bag-of-words implementation uses densely sampled SIFT and ColorSIFT descriptors. | 2011-10-13 20:45:25 |
Structured Detection and Segmentation CRF | Struct_Det_CRF | Oxford Brookes University | Jonathan Warrell, Vibhav Vineet, Paul Sturgess, Philip Torr | We form a hierarchical CRF which jointly models a pool of candidate detections and the multiclass pixel segmentation of an image. Attractive and repulsive pairwise terms are allowed between detection nodes (cf. Desai et al., ICCV 2009), which are integrated into a Pn-Potts based hierarchical segmentation energy (cf. Ladicky et al., ECCV 2010). A cutting-plane algorithm is used to train the model, using approximate MAP inference. We form a joint loss which combines segmentation and detection components (i.e. paying a penalty both for each pixel incorrectly labelled, and for each false detection node which is active in a solution), and use different weightings of this loss to train the model to perform detection and segmentation. The segmentation results thus make use of the bounding box annotations. The candidate detections are generated using the Felzenszwalb et al. CVPR 2008/2010 detector, and as features for segmentation we use textons, SIFT, LBPs, and the detection response surfaces themselves. | 2011-10-13 03:27:02 |
SVM classifier with tree max-pooling | TREE--MAX-POOLING | University of Missouri | Xiaoyu Wang, Miao Sun, Xutao Lv, Shuai Tang, Guang Chen, Yan Li, Tony X. Han | A two-layer cascade structure for object detection. The first layer employs a deformable model to select possible candidates for the second layer. The latter layer takes location and global context, augmented with LBP features, to improve accuracy. A bag-of-words model enhanced with a spatial pyramid and local coordinate coding is used to model the global context information. A hierarchical tree-structure coding is used to handle the intra-class variation for each detection window. Max-pooling is used for tree node assignment. A linear SVM is used for classification. | 2011-10-13 20:50:30 |
LSVM trained mixtures of deformable part models | UOCTTI_LSVM_MDPM | University of Chicago | Ross Girshick (University of Chicago), Pedro Felzenszwalb (Brown), David McAllester (TTI-Chicago) | Based on [1] http://people.cs.uchicago.edu/~pff/latent-release4 and [2] "Object Detection with Discriminatively Trained Part Based Models"; Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan; IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, September 2010. This entry is a minor modification of our publicly available "voc-release4" object detection system [1]. The system uses latent SVM to train mixtures of deformable part models using HOG features [2]. Final detections are refined using a context rescoring mechanism [2]. We extended [1] to detect smaller objects by adding an extra high-resolution octave to the HOG feature pyramid. The HOG features in this extra octave are computed using 2x2 pixel cells. Additional bias parameters are learned to help calibrate scores from detections in the extra octave with the scores of detections above this octave. This entry is the same as UOCTTI_LSVM_MDPM from the 2010 competition. Detection results are reported for all 20 object classes to provide a baseline for the 2011 competition. | 2011-10-12 16:09:55 |
Person grammar model trained with WL-SSVM | UOCTTI_WL-SSVM_GRAMMAR | University of Chicago | Ross Girshick (University of Chicago), Pedro Felzenszwalb (Brown), David McAllester (TTI-Chicago) | This entry is described in [1] "Object Detection with Grammar Models"; Ross B. Girshick, Pedro F. Felzenszwalb, David McAllester. Neural Information Processing Systems 2011 (to appear). We define a grammar model for detecting people and train the model’s parameters from bounding box annotations using a formalism that we call weak-label structural SVM (WL-SSVM). The person grammar uses a set of productions that represent varying degrees of visibility/occlusion. Object parts, such as the head and shoulder, are shared across all interpretations of object visibility. Each part is represented by a deformable mixture model that includes deformable subparts. An "occluder" part (itself a deformable mixture of parts) is used to capture the nontrivial appearance of the stuff that typically occludes people from below. We further refine detections using the context rescoring mechanism from the UOCTTI_LSVM_MDPM entry, using the results of that entry for the 19 non-person classes. | 2011-10-12 16:13:33 |
Using viewpoint cues to improve object recognition | lSVM-Viewpoint | Cornell | Joshua Schwartz, Noah Snavely, Daniel Huttenlocher | Our system is based on the Latent SVM framework of [1], including their context rescoring method. We train 6 component models with 8 parts. However, unlike [1], components are trained using a clustering based on an unsupervised estimation of 3D object viewpoint. In this sense, our approach is similar to the unsupervised approach in [2], which also seeks to estimate viewpoint, but our clustering is based on explicit reasoning about 3D geometry. Additionally, we add features based on estimated 3D scene geometry for context rescoring. Of note is the fact that a detection with our method gives rise to an explicit estimation of object viewpoint within a scene, rather than just a bounding box. [1] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. PAMI 2010 [2] C. Gu and X. Ren. Discriminative Mixture-of-Templates for Viewpoint Classification. ECCV 2010 | 2011-10-13 02:33:13 |
DPM that uses region segmentation features | segDPM | UofT, TTI-C, UCLA | Sanja Fidler, Roozbeh Mottaghi, Allan Yuille, Raquel Urtasun | DPM-style model that exploits bottom-up segmentation. We use CPMC to extract regions and CPMC-o2p to classify them. The output of the CPMC-o2p is then used as segmentation in our model. We propose a new model that blends between DPM (HOG appearance model) and segmentation. The model encourages each detection to fit tightly around a region. If there is no region, the detector will just go with the typical HOG score. In addition, we use context re-scoring based on object presence classifiers provided by NUS. Project page: http://www.cs.toronto.edu/~fidler/projects/segDPM.html | 2014-02-24 20:22:19 |
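Several entries above describe a multi-scale training strategy in which a scale is sampled, the shorter image edge is resized to it, and the longer edge is capped (e.g. the RTPnet entry: scales 600-1000 in steps of 100, longer edge at most 1666). A minimal sketch of that resizing rule follows; the function name, rounding behaviour, and RNG handling are assumptions for illustration, not the authors' code:

```python
import random

def multiscale_resize_dims(h, w, scales=range(600, 1001, 100),
                           max_long=1666, rng=None):
    """Sample a training scale and return the resized (h, w).

    The shorter edge is resized to the sampled scale; the resize
    factor is reduced if necessary so the longer edge stays within
    max_long (sketch of the rule described in the RTPnet entry).
    """
    rng = rng or random
    scale = rng.choice(list(scales))        # e.g. one of 600, 700, ..., 1000
    short, long_ = min(h, w), max(h, w)
    # Cap the factor so the longer edge does not exceed max_long.
    factor = min(scale / short, max_long / long_)
    return round(h * factor), round(w * factor)
```

For a 480x640 image and a sampled scale of 600, the shorter edge becomes 600 and the longer 800; for a very elongated image such as 500x2000, the cap on the longer edge dominates and the shorter edge ends up below the sampled scale.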