PASCAL VOC Challenge performance evaluation server

Detection Results: VOC2012 ^BETA

Competition "comp3" (train on VOC2012 data)

This leaderboard shows only those submissions that have been marked as public, and so the displayed rankings should not be considered as definitive.

The highest scoring entry in each column is shown in bold.
Clicking on the blue arrow symbol at the top of a column will order the submissions from high to low with respect to performance on that column.
Clicking on the orange arrow symbol at the left-hand side of a table will identify entries with performance equivalent to the selected submission. For each object class, all entries equivalent to the selected submission are marked by orange highlighting.
Gray highlighting with double stars (**) indicates that equivalent entries are not yet available for this submission.

Entries equivalent to a selected submission are determined by bootstrapping the performance measure and assessing whether the differences between the selected submission and the others are statistically significant; entries whose differences are not significant are treated as equivalent (see Sec. 3.5 of the VOC 2014 paper).
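
The equivalence test described above can be illustrated with a minimal sketch. It is not the evaluation server's code: it assumes a simplified setting in which each test image contributes a single (score, is_positive) pair per method, uses a plain non-interpolated AP, and declares two methods equivalent when a percentile interval of the bootstrapped AP difference contains zero. The function names, the number of rounds, and the 5% level are all illustrative assumptions.

```python
import random


def average_precision(scored):
    """AP of (score, is_positive) pairs ranked by descending score (no interpolation)."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, is_pos in ranked if is_pos)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            total += hits / rank  # precision at each positive
    return total / n_pos


def bootstrap_equivalent(a, b, n_rounds=1000, alpha=0.05, seed=0):
    """Compare two methods on the same images.

    a, b: lists of (score, is_positive), aligned by image index (an assumption
    of this sketch). Returns (equivalent, (low, high)), where (low, high) is a
    percentile interval of the bootstrapped AP difference a - b.
    """
    assert len(a) == len(b)
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_rounds):
        # Resample images with replacement, using the same sample for both methods.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(average_precision([a[i] for i in idx])
                     - average_precision([b[i] for i in idx]))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_rounds)]
    high = diffs[int((1 - alpha / 2) * n_rounds) - 1]
    return low <= 0.0 <= high, (low, high)


# Toy data: 40 images, alternating positive and negative.
labels = [i % 2 == 0 for i in range(40)]
good = [(1.0 if y else 0.0, y) for y in labels]   # ranks all positives first
bad = [(0.0 if y else 1.0, y) for y in labels]    # ranks all positives last
same, _ = bootstrap_equivalent(good, good)
different, interval = bootstrap_equivalent(good, bad)
```

Resampling the same image indices for both methods keeps the comparison paired, so image-level difficulty affects both AP values in the same way.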

Average Precision (AP %)

	mean	aeroplane	bicycle	bird	boat	bottle	bus	car	cat	chair	cow	dining table	dog	horse	motorbike	person	potted plant	sheep	sofa	train	tv/monitor	submission date
NAS Yolo ^[?]	86.5	92.9	92.7	88.4	78.0	78.1	90.8	89.7	94.5	74.3	92.8	71.9	93.2	94.5	92.9	92.3	67.0	92.1	77.7	92.4	84.9	09-May-2020
Conical R-CNN ^[?]	85.8	92.9	91.1	85.5	79.5	75.6	87.0	88.7	95.3	71.2	89.8	72.8	94.4	93.1	92.8	92.2	71.0	90.7	78.5	92.1	82.2	29-Oct-2020
RTPnet ^[?]	84.4	92.0	89.6	86.8	75.3	74.0	87.1	88.5	95.6	67.3	90.4	68.1	94.4	91.8	91.8	91.6	69.3	90.5	73.7	90.7	79.9	23-Feb-2022
BOE_IOT_AIBD_method_improved ^[?]	83.8	90.4	90.0	82.8	77.4	76.8	89.5	85.9	93.3	73.0	86.7	68.4	92.7	92.5	90.6	90.3	69.1	84.1	73.3	90.3	78.9	27-Nov-2019
Improved yolo-v3 ^[?]	83.7	91.8	89.3	86.3	73.9	71.1	87.1	88.0	95.1	68.7	88.6	68.7	93.2	91.0	90.9	89.9	63.2	89.8	74.3	90.2	83.3	15-Nov-2019
Model_ori_1 ^[?]	83.3	92.3	89.6	85.0	76.3	78.1	86.2	89.0	91.1	68.5	86.3	66.3	91.0	91.5	90.2	91.6	67.0	86.7	71.2	88.7	78.5	28-Oct-2021
Stronger-yolo ^[?]	83.3	91.9	89.1	82.5	75.2	72.9	87.3	87.8	91.0	71.3	85.1	70.0	90.0	90.8	90.3	91.4	67.5	86.4	74.6	89.9	81.5	12-Jun-2019
SSOD_07_12_unlabel_07_12 ^[?]	82.6	91.0	88.8	84.2	71.8	71.4	87.0	88.0	94.0	65.7	86.6	66.8	93.0	90.4	90.8	90.3	63.2	88.2	72.7	90.5	78.2	22-Apr-2021
FCASA-detection ^[?]	82.4	90.9	87.2	83.8	72.3	72.0	86.3	87.7	90.2	69.8	85.1	71.2	89.7	90.0	89.3	90.6	61.1	85.3	75.1	89.5	80.1	05-Aug-2019
DOLO ^[?]	81.3	91.7	87.3	83.1	69.1	71.1	85.7	86.6	93.4	64.4	85.5	65.9	92.2	88.5	89.0	88.7	61.0	86.0	71.0	87.4	77.4	21-Sep-2018
ASSD513 ^[?]	81.3	92.1	89.2	82.5	71.5	60.4	85.5	84.8	93.9	63.7	88.6	67.4	92.6	90.2	89.0	86.5	60.4	88.2	73.4	88.6	77.0	18-Aug-2018
COS-DET ^[?]	81.3	91.7	87.2	82.1	71.6	68.6	86.9	85.3	93.1	63.8	86.8	66.0	92.0	90.4	88.4	88.8	61.2	86.8	73.8	88.1	73.7	26-Apr-2019
FastX-RCNN ^[?]	81.1	89.7	86.4	84.1	70.9	73.1	84.6	85.5	94.3	64.7	85.3	62.2	93.4	90.2	88.8	89.9	62.1	83.8	71.2	88.7	73.9	06-Jul-2018
SSOD_25 ^[?]	81.0	90.5	87.8	81.9	69.7	69.4	86.6	87.1	93.1	63.6	86.0	65.9	91.7	88.2	88.5	89.4	59.8	85.3	70.8	89.0	75.8	12-Apr-2021
DFL-Net ^[?]	80.2	91.4	88.5	80.6	67.3	58.0	86.1	84.2	94.5	64.9	85.6	62.0	92.8	89.1	89.6	86.1	59.1	85.5	75.4	87.4	76.2	22-Jun-2020
RockDetector-1 ^[?]	79.9	88.6	85.6	81.7	69.3	64.0	82.0	80.9	94.2	64.1	84.3	65.9	93.7	88.8	86.6	87.2	61.6	83.5	72.9	88.1	75.4	08-Nov-2019
SSOD_100 ^[?]	78.6	89.4	85.3	80.3	64.4	66.8	84.2	85.9	92.8	60.3	82.8	62.2	89.3	86.3	88.7	88.2	56.9	82.2	69.8	84.2	71.6	12-Apr-2021
FMFPD ^[?]	78.0	88.1	84.9	82.8	64.8	62.8	82.2	82.2	94.1	59.7	81.0	59.8	92.9	86.2	83.2	86.4	57.1	83.1	68.5	84.5	74.8	19-May-2020
SSOD_25_real ^[?]	77.8	88.3	86.5	80.6	66.3	65.4	83.7	84.5	91.8	57.5	81.2	61.6	91.2	84.4	86.5	87.3	53.5	82.3	66.8	87.1	69.0	22-Apr-2021
refine_denseSSD ^[?]	77.5	89.8	85.8	77.0	64.4	56.7	83.7	81.8	92.1	60.9	83.8	63.2	89.6	85.9	88.1	85.3	54.7	82.3	64.6	88.2	72.4	14-May-2018
FPNSSD ^[?]	77.0	90.3	78.8	81.7	67.1	53.4	79.5	80.5	93.8	59.9	85.8	61.8	92.5	81.7	84.1	80.8	56.1	84.8	69.2	87.4	71.2	29-Mar-2018
TCnet ^[?]	76.6	86.6	83.1	78.5	65.6	61.1	80.8	80.3	91.7	56.3	80.1	61.8	90.5	86.1	84.0	83.4	56.6	79.7	70.0	84.5	71.9	02-May-2018
TCnet ^[?]	76.5	86.8	82.7	78.5	65.3	60.2	79.6	80.0	91.0	56.9	80.9	61.3	90.2	86.8	84.2	83.1	55.4	80.3	70.0	84.7	71.7	29-Mar-2018
ASSD321 ^[?]	76.4	89.6	84.3	76.7	64.5	49.3	81.7	77.0	92.2	57.8	81.3	64.0	91.6	86.5	85.8	82.1	53.0	80.0	70.9	87.2	71.8	20-Aug-2018
ATLSSD ^[?]	74.8	87.6	82.7	72.0	62.2	57.5	83.1	83.8	86.9	56.2	76.3	60.6	84.4	80.4	84.9	85.9	50.1	81.1	65.5	84.9	70.1	26-Mar-2018
DSD ^[?]	74.5	87.9	82.0	74.8	61.9	51.5	82.1	81.1	89.8	55.8	78.5	58.3	86.8	82.3	82.7	83.4	49.2	79.5	69.1	85.0	69.2	19-Jul-2018
Augment_part1 ^[?]	74.0	88.8	81.6	74.8	61.9	68.5	82.0	84.8	87.0	53.2	77.0	51.5	82.9	79.2	82.3	85.6	54.3	77.5	55.3	83.5	69.4	21-Oct-2021
dsa_1050 ^[?]	73.9	87.4	82.0	72.9	60.7	51.8	80.7	76.8	90.1	54.0	78.7	60.0	89.1	83.5	83.3	81.4	49.7	75.7	64.2	85.2	70.5	18-Nov-2017
MA-SSD ^[?]	72.9	87.0	81.4	71.2	59.4	49.0	81.3	74.4	88.2	55.5	78.2	61.2	85.9	82.7	82.7	80.3	46.5	76.6	66.8	83.7	66.2	01-Aug-2018
DSOD v2 ^[?]	72.9	86.8	82.5	69.0	57.4	47.1	81.2	77.8	88.7	54.8	75.5	60.4	85.2	82.0	85.4	82.4	45.0	75.3	68.2	84.3	69.2	24-Jun-2018
GRP-DSOD320 ^[?]	72.5	87.2	82.0	67.1	57.3	46.1	81.1	78.0	88.6	54.0	75.1	58.7	84.5	82.6	85.4	82.3	45.8	75.9	67.1	84.3	66.7	19-Nov-2017
ssd ^[?]	72.2	86.9	80.1	68.9	57.2	47.4	81.0	73.2	89.1	53.8	75.5	61.5	86.4	81.9	84.2	79.1	46.1	75.7	66.6	84.1	65.1	01-Aug-2018
Origin_pretrain_40k ^[?]	71.9	89.2	76.3	73.7	61.5	66.3	81.3	83.1	86.1	49.3	63.4	47.7	84.3	75.9	79.3	84.2	52.5	79.3	55.0	81.6	68.4	22-Oct-2021
DSOD (single model) ^[?]	70.8	86.4	80.2	65.5	55.7	42.4	80.3	75.3	86.6	51.1	72.3	60.5	83.9	80.5	83.6	80.4	42.7	72.4	67.3	83.1	66.2	21-Jan-2018
Attention-SSD-vgg ^[?]	69.0	85.1	76.7	67.7	55.2	43.8	77.3	69.2	85.9	52.2	72.9	56.5	83.0	78.3	80.6	75.8	44.2	73.0	61.3	80.9	60.8	20-May-2018
SSD ^[?]	64.0	78.9	72.3	61.8	42.8	27.9	73.1	69.4	84.9	42.5	68.4	52.2	80.9	76.5	77.2	68.2	31.6	67.0	66.6	77.3	60.9	10-Jun-2017
DCONV_SSD_FCN ^[?]	62.8	77.9	70.6	62.9	46.5	28.6	69.7	63.1	83.6	42.1	66.6	52.3	79.6	72.8	77.2	67.7	33.0	66.0	60.2	78.1	57.9	17-Mar-2018
sd ^[?]	62.7	80.2	86.6	74.3	46.8	17.7	82.3	72.0	83.7	30.2	75.8	54.3	83.2	87.2	84.8	53.8	22.4	78.0	43.5	84.8	12.4	10-Apr-2024
THU_ML_class ^[?]	62.4	78.0	71.0	64.5	47.4	45.3	70.1	70.6	82.0	37.9	65.4	44.2	77.4	69.6	74.4	75.5	37.9	62.0	45.5	73.8	56.3	03-Jun-2017
yolo ^[?]	62.1	79.8	72.1	55.3	44.9	43.1	71.5	72.3	75.1	42.1	61.3	45.8	73.4	70.9	76.2	79.3	35.2	67.4	49.1	71.5	56.1	28-Sep-2019
yolo ^[?]	59.4	76.0	68.1	51.3	40.0	39.1	69.8	66.7	74.0	39.8	56.2	47.8	70.5	70.4	75.1	75.7	31.9	61.6	52.4	68.0	54.4	28-Sep-2019
YOLOv2-resnet-18-101 ^[?]	56.1	74.3	66.4	59.4	37.0	34.4	65.1	63.3	74.4	38.5	53.3	40.9	68.4	61.7	68.0	68.9	30.2	51.7	47.7	66.7	52.0	18-May-2022
YOLOv2 ^[?]	48.8	69.5	61.6	37.6	28.2	18.8	63.2	53.2	65.6	27.5	44.4	35.9	61.4	57.9	66.9	63.8	16.8	52.8	39.5	65.4	46.2	01-Dec-2016
DENSE_BOX ^[?]	45.9	64.7	64.1	28.8	26.7	30.7	60.6	54.9	47.4	29.3	41.8	34.6	42.6	59.3	64.2	62.5	24.3	53.7	27.1	50.9	50.7	07-Jul-2015
PITT_WSOD_INC2 ^[?]	45.1	74.2	49.8	56.0	32.5	22.0	55.1	49.8	73.4	20.4	47.8	32.0	39.7	48.0	62.6	8.6	23.7	52.1	52.5	42.9	59.1	14-Mar-2019
YOLOv1-resnet-18-50 ^[?]	44.5	64.3	54.2	47.4	26.8	16.6	55.4	44.3	66.5	23.1	38.1	38.5	62.9	57.6	60.8	45.0	15.2	33.3	43.9	60.0	37.2	13-May-2022
NoC ^[?]	42.2	62.8	60.4	26.7	22.3	25.7	56.9	55.2	52.1	21.5	38.3	34.2	43.9	51.2	58.8	40.7	20.4	42.0	37.4	52.6	41.6	26-Apr-2015
Data Decomposition and Distinctive Context ^[?]	40.9	55.0	58.1	22.5	18.8	33.9	57.6	54.5	42.6	20.2	40.3	29.3	37.1	54.6	58.3	51.6	14.7	44.8	32.1	51.7	41.0	13-Oct-2011
HybridCodingApe ^[?]	40.9	61.8	52.0	24.6	24.8	20.2	57.1	44.5	53.6	17.4	33.0	38.3	42.8	48.8	59.4	35.7	22.8	40.3	39.5	51.1	49.5	23-Sep-2012
segDPM ^[?]	40.7	59.1	54.3	28.2	24.4	34.5	53.4	48.1	51.3	18.1	37.8	29.9	40.4	48.9	52.9	46.4	16.1	39.5	35.4	50.8	44.9	24-Feb-2014
Fisher with FLAIR ^[?]	40.6	61.7	52.0	27.9	24.0	18.9	56.5	45.3	53.4	15.5	34.6	36.3	42.3	48.4	57.9	36.6	24.3	40.6	38.0	49.8	49.0	17-Jun-2014
NYU-UCLA_Hierarchy ^[?]	40.6	56.3	55.9	23.4	20.3	27.2	56.6	48.1	53.8	23.3	32.9	33.4	39.2	53.0	56.9	43.6	14.3	37.9	39.4	52.6	43.7	13-Oct-2011
DenseYolo ^[?]	39.4	60.2	48.7	26.1	18.0	18.1	54.3	47.6	50.0	23.1	37.2	28.9	43.1	47.3	56.3	56.0	11.9	41.8	28.5	50.1	41.1	15-May-2017
DPM-MKL ^[?]	39.1	59.6	54.5	21.9	21.6	32.1	52.5	49.3	40.8	19.1	35.2	28.9	37.2	50.9	49.9	46.1	15.6	39.3	35.6	48.9	42.8	23-Sep-2012
DPM-MK ^[?]	38.3	56.0	53.3	19.2	17.3	25.8	53.1	45.4	44.5	20.1	32.1	28.1	37.2	52.3	56.6	43.3	12.1	34.3	37.6	51.8	45.2	13-Oct-2011
NEC_STANFORD_OCP ^[?]	36.7	65.1	46.8	25.0	24.6	16.0	51.0	44.9	51.5	13.0	26.6	31.0	40.2	39.7	51.5	32.8	12.6	35.7	33.5	48.0	44.8	23-Sep-2012
Detector-Merging ^[?]	36.5	47.2	50.2	18.3	21.4	25.2	53.3	46.3	46.3	17.5	27.8	30.3	35.0	41.6	52.1	43.2	18.0	35.2	31.1	45.4	44.4	23-Sep-2012
MISSOURI_HOGLBP_MDPM_CONTEXT ^[?]	36.4	51.4	53.7	18.3	15.6	31.6	56.5	47.1	38.6	19.5	32.0	22.1	25.0	50.3	51.9	44.9	11.9	37.7	30.6	50.9	39.3	23-Sep-2012
NUS_Context_SVM ^[?]	36.2	51.4	52.9	20.1	15.8	26.9	53.0	45.6	37.6	15.3	36.0	25.1	32.6	50.4	55.8	36.8	12.3	37.6	30.5	48.1	41.0	05-Oct-2011
SelectiveSearchMonkey ^[?]	35.5	56.9	43.4	16.6	15.8	18.0	52.3	38.3	49.0	12.2	29.7	32.8	36.7	45.7	54.4	30.4	16.2	37.2	34.7	45.9	44.2	13-Oct-2011
CVC_DET ^[?]	34.1	45.4	49.8	15.7	16.0	26.3	54.6	44.8	35.1	16.8	31.3	23.6	26.0	45.6	49.6	42.2	14.5	30.5	28.5	45.7	40.0	23-Sep-2012
UOCTTI_LSVM_MDPM ^[?]	33.6	53.2	53.9	13.1	13.5	30.5	55.5	51.2	31.7	14.5	29.0	16.0	22.1	43.1	50.3	46.4	8.8	33.0	22.9	45.8	38.2	12-Oct-2011
TREE--MAX-POOLING ^[?]	32.9	43.8	51.7	13.7	12.7	27.3	51.5	43.7	32.9	18.3	27.3	18.5	23.1	45.2	48.6	42.9	11.6	32.4	27.5	47.0	39.3	13-Oct-2011
LCC-TREE-CODING ^[?]	32.4	41.1	51.7	13.7	11.9	27.3	52.1	41.7	32.9	17.6	27.3	18.5	23.1	45.2	48.6	41.9	11.6	32.4	27.5	44.2	38.3	13-Oct-2011
SVM-HOG ^[?]	31.5	47.5	51.7	14.2	12.6	27.3	51.8	44.2	25.3	17.8	30.2	18.1	16.9	46.9	50.9	43.0	9.5	31.2	23.6	44.3	22.1	22-Sep-2012
Configurable And-Or Tree Model ^[?]	29.5	50.2	47.0	7.9	3.8	24.8	47.2	42.8	31.2	17.5	24.2	10.0	21.3	43.5	46.4	37.5	7.9	26.4	21.5	43.1	36.7	23-Sep-2012
lSVM-Viewpoint ^[?]	20.9	42.5	43.7	5.4	4.8	18.1	28.6	36.6	24.2	12.6	20.6	4.5	17.5	15.2	38.2	7.9	1.7	23.2	7.1	41.0	25.7	13-Oct-2011
UOCTTI_WL-SSVM_GRAMMAR ^[?]	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	49.2	-	-	-	-	-	12-Oct-2011
CMIC-GS-DPM ^[?]	-	-	-	-	13.3	26.4	-	41.5	-	-	-	12.2	-	-	41.6	-	8.3	31.4	-	-	-	13-Oct-2011
Geometric shape ^[?]	-	-	3.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	19-Jun-2016
DETR ^[?]	-	-	-	-	17.7	-	-	-	-	-	-	-	-	-	-	-	-	-	7.1	-	-	28-Nov-2024
CMIC-Synthetic-DPM ^[?]	-	40.4	47.8	-	11.4	23.7	48.9	40.9	23.5	11.9	25.5	-	10.9	42.0	38.7	40.7	7.5	30.4	-	38.4	34.8	13-Oct-2011
Struct_Det_CRF ^[?]	-	37.1	42.6	2.0	-	16.0	43.8	38.6	17.0	10.3	7.7	2.4	1.5	34.3	41.1	38.4	1.5	14.7	5.3	35.4	27.1	13-Oct-2011
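
The "mean" column in the table above appears to be the unweighted average of the twenty per-class AP values, and rows with any missing class ("-") show no mean. A minimal sketch for recomputing it from one tab-separated row follows; the function names are hypothetical, and because the table shows values rounded to one decimal, the recomputed mean can differ from the listed one in the last digit.

```python
def parse_leaderboard_row(line):
    """Split one tab-separated row into (name, reported_mean, per_class, date)."""
    cells = line.rstrip("\n").split("\t")
    name = cells[0].replace("^[?]", "").strip()
    reported_mean = None if cells[1] == "-" else float(cells[1])
    # Everything between the mean and the date is a per-class AP (or "-").
    per_class = [None if c == "-" else float(c) for c in cells[2:-1]]
    return name, reported_mean, per_class, cells[-1]


def mean_ap(per_class):
    """Unweighted mean over classes; None if any class is missing."""
    if any(v is None for v in per_class):
        return None
    return sum(per_class) / len(per_class)


# The first row of the table, rebuilt with explicit tabs.
full_row = "\t".join([
    "NAS Yolo ^[?]", "86.5",
    "92.9", "92.7", "88.4", "78.0", "78.1", "90.8", "89.7", "94.5", "74.3", "92.8",
    "71.9", "93.2", "94.5", "92.9", "92.3", "67.0", "92.1", "77.7", "92.4", "84.9",
    "09-May-2020",
])
# A row that reports a single class only.
partial_row = "\t".join(["Geometric shape ^[?]", "-", "-", "3.8"] + ["-"] * 18 + ["19-Jun-2016"])

name, reported, per_class, date = parse_leaderboard_row(full_row)
recomputed = mean_ap(per_class)
partial_mean = mean_ap(parse_leaderboard_row(partial_row)[2])
```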

Abbreviations

Title	Method	Affiliation	Contributors	Description	Date
ASSD321	ASSD321	Rutgers	Jingru Yi, Pengxiang Wu	input resolution: 321x321	2018-08-20 02:34:00
ASSD513	ASSD513	Rutgers	Jingru Yi, Pengxiang Wu	input resolution: 513x513	2018-08-18 12:26:28
ATLSSD	ATLSSD	ATL(Alibaba Turing Labs)	Xuan Jin	SSD-based method trained on VOC2012	2018-03-26 07:48:08
softmax with Attention on vgg for detection	Attention-SSD-vgg	CSUST	Jia	We select the boxes whose score is > 0.5. We added attention to the SSD model.	2018-05-20 11:12:40
Augment_part1	Augment_part1	University of Information Technology VNU-HCM	Phan Tung Huynh Thi My Duyen	Augment_part1	2021-10-21 18:57:30
BOE_IOT_AIBD_method_improved	BOE_IOT_AIBD_method_improved	BOE_IOT_AIBD	Xu Jingtao	BOE_IOT_AIBD_method_improved	2019-11-27 03:29:33
Single-stage detector trained by step-SGDR.	COS-DET	ZUIYOU Inc	Tabsun, Ma Baoyuan, Li Yong, Li Xiaosong	I designed a new step-SGDR method which is the most important innovation and it boosts the mAP almost 0.6 compared with step-decay strategy. An important point is how to judge the overfit point. As for the backbone I used the darknet-53 while some common methods like distort/random crop/random flip/mix-up for the data augmentation. Also multi-scale testing and horizontal flip test really help. Some common methods like softNMS do not make sense in my experiments. On a single 1080Ti the model runs at almost 15fps.	2019-04-26 12:04:21
Color_HOG based detector with BOW classifier	CVC_DET	Computer Vision Center Barcelona	Fahad Khan, Camp Davesa, Joost van de Weijer, Rao Muhammad Anwer, Albert Gordo, Pep Gonfaus, Ramon Baldrich, Antonio Lopez	We use our Color-HOG based part detector [1]. The detection results are combined with our CVC_CLS submission. References: 1. Fahad Shahbaz Khan, Rao Muhammad Anwer, Joost van de Weijer, Andrew D. Bagdanov, Maria Vanrell, Antonio M. Lopez. Color Attributes for Object Detection. In CVPR 2012.	2012-09-23 18:53:20
Dynamic And-Or Tree Learning For Object Detection	Configurable And-Or Tree Model	Sun Yat-Sen University	Xiaolong Wang, Liang Lin, Lichao Huang, Xinhui Zhang, Zechao Yang	We propose a novel hierarchical model for object detection, namely the "And-Or tree", which is configurable by introducing "switch" variables (i.e. the or-nodes) accounting for intra-class object variance. This model comprises three layers: a batch of leaf-nodes at the bottom for localizing object parts; the or-nodes for activating several leaf-nodes to specify a composition of parts; and a root-node verifying object holistic distortion. For model training, a novel discriminative learning algorithm is proposed to explicitly determine the structural configuration (e.g., the production of leaf-nodes associated with the or-nodes) along with the optimization of multi-layer parameters. The response of the model integrates the bottom-up testing via the leaf-nodes and or-nodes with the global verification via the root-node. In the implementation, we apply histograms of gradients (HOG) as the image feature. Object detection is achieved by scanning the sub-windows over different scales and locations of the image. The final decisions are further rescored by a context model encoding the inter-object spatial interactions.	2012-09-23 16:02:13
A Conical R-CNN for object detections	Conical R-CNN	Xidian University	Yang Li, Licheng Jiao, Xu Liu, Fang Liu, Fanhua Shang, GouLiang Ma	Conical R-CNN employs conical features for detection. The spatial information can be exploited effectively. This model is fine-tuned on the COCO detection model. We use multi-scale training.	2020-10-29 07:22:04
dssd style arch	DCONV_SSD_FCN	shanghai university	li junhao(jxlijunhao@163.com)	combine object detection and semantic segmentation in one forward pass	2018-03-17 02:58:20
DenseBoxCNN	DENSE_BOX	Baidu IDL	Lichao Huang	I train a VGG16-like convolutional neural network to perform end-to-end object detection. This network can process the full image and output multiple bounding boxes and class confidence scores simultaneously. The training data used in this entry is VOC2012 trainval only.	2015-07-07 05:39:05
DETR original	DETR	Korea University	Jiye Jihyeong Hyuncheol	detr	2024-11-28 08:04:44
A Distinguishable Features Learning Network for On	DFL-Net	USTC	geroci@mail.ustc.edu.cn {wansh, jpq}@ustc.edu.cn	DFL-Net: One-Stage Anchor-Based Object Detection via Distinguishable Feature Learning	2020-06-22 08:29:46
YOLO V3 with dynamic constraint for objectness	DOLO	Tencent MIG YYB & USTC BDAA LAB	Chen Joya, Bin Luo, XueZheng Peng, Tong Xu	We present DOLO, which is based on a state-of-the-art object detection method, YOLO V3. We have improved it with our dynamic constraint strategy. Furthermore, we use a simple SNIP (Scale Normalization for Image Pyramids) strategy in our training. During inference, our square weaken method is adopted for multi-scale and flip testing.	2018-09-21 10:34:36
The DPM-MKL baseline	DPM-MKL	Oxford	Ross Girshick, Andrea Vedaldi, Karen Simonyan	This method is similar to last year's DPM-MKL entry. We updated several aspects of the implementation (e.g. the type of features).	2012-09-23 23:05:18
DSD	DSD	Cainiao	Duliang Haiwa	DSD	2018-07-19 14:43:34
DSOD	DSOD (single model)	Intel	Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue	The training data is VOC 2012 trainval set without ImageNet pre-trained models or any other additional dataset. The input image size is 300x300. More details can be referred to our paper: "DSOD: Learning Deeply Supervised Object Detectors from Scratch".	2018-01-21 06:13:56
DSOD v2	DSOD v2	UIUC	Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen and Xiangyang Xue	Training from scratch without pre-trained models. The input size is 300x300.	2018-06-24 05:56:41
Yolo with dense grid and high level features	DenseYolo	University Politehnica Bucharest	Paul Urziceanu	N/A	2017-05-15 10:54:16
Detector_Weighting	Detector-Merging	University of Amsterdam	Sezer Karaoglu, Fahad Shahbaz Khan, Koen van de Sande, Jan van Gemert, Rao Muhammad Anwer, Jasper Uijlings, Camp Davesa, Joost van de Weijer, Theo Gevers, Cees Snoek	We use a bounding box merging scheme that exploits the results from different independent detectors. Each detector results in a ranked list of bounding boxes, which is not directly comparable with other detectors. We merge the detectors with a weighting scheme based on hold-out performance. For input, we use the standard Felzenszwalb gray HOG detector [1]; the color-HOG detector of CVC [2], which introduces color information within the part based detection framework; and a slightly improved version of the SelectiveSearch detector [3] by the UvA submitted to VOC 2011. [1] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. In TPAMI, Vol. 32, No. 9, Sep. 2010. [2] Fahad Shahbaz Khan, Rao Muhammad Anwer, Joost van de Weijer, Andrew D. Bagdanov, Maria Vanrell, Antonio M. Lopez. Color Attributes for Object Detection. In CVPR 2012. [3] Koen E. A. van de Sande, Jasper R. R. Uijlings, Theo Gevers, Arnold W. M. Smeulders. Segmentation As Selective Search for Object Recognition. In ICCV, 2011.	2012-09-23 22:51:19
Full convolution attention selectiv	FCASA-detection	DHAI	xiangming.zhou kai.fu guodong.wu	We propose a novel architecture for object detection. We use fully convolutional networks as multistep RPN networks. This kind of architecture proposes RoIs based on the previous step, so it avoids the imbalance between positive and negative samples. Meanwhile, this kind of architecture can improve the recall of detection: because the RoIs are filtered by multistep RPN networks, the remaining RoIs are more reliable. We also use soft-NMS for scoring our objects and GIoU loss for the location loss. Our architecture can be applied to any single-stage detector. Using the same backbone networks, we trained YOLOv3 and SSD with and without our architecture; the results show that our architecture boosts mAP by almost 5% on the PASCAL VOC dataset.	2019-08-05 10:43:51
Detection Network Based on Function Maintenance	FMFPD	University of Chinese Academy of Sciences	Chengqi Xu	This module maintains the high-level strong semantic information more effectively, so that the lower level feature maps also have strong semantic features and the presentation ability of small object is also greatly enhanced. At the same time, the accuracy of detection is improved by using the two-stage features of the network to describe the objects.	2020-05-19 14:33:57
FPNSSD	FPNSSD	sogou.com	Kuang Liu	FPNSSD trained on VOC12	2018-03-29 10:38:04
Faster RCNN with ResNext	FastX-RCNN	Yi+AI Lab	Hang Zhang, Boyuan Sun, Zhaonan Wang, Hao Zhao, ZiXuan Guan, Wei Miao	Faster RCNN + RoIAlign + ResNeXt152 + SoftNMS + Multi-Scale Training + Multi-Scale Testing;	2018-07-06 04:04:00
Fisher with FLAIR	Fisher with FLAIR	University of Amsterdam	Koen van de Sande, Cees Snoek, Arnold Smeulders	Run for our CVPR2014 paper "Fisher and VLAD with FLAIR", see http://koen.me/research/flair	2014-06-17 11:47:29
Gated Recurrent Feature Pyramids	GRP-DSOD320	UIUC	Zhiqiang Shen, Honghui Shi, Rogerio Feris, Liangliang Cao, Shuicheng Yan, Ding Liu, Xinchao Wang, Xiangyang Xue, Thomas S. Huang	We train GRP-DSOD for object detection. The training data is VOC 2012 trainval set without ImageNet pre-trained models or any other additional dataset. The input image size is 320x320. More details can be referred to our paper: "Learning Object Detection from Scratch with Gated Recurrent Feature Pyramids".	2017-11-19 22:13:59
Diamond Frame Bicycle Recognition	Geometric shape	National Cheng Kung University	Chung-Ping Young, Yen-Bor Lin, Kuan-Yu Chen	A diamond-frame bicycle detector for side-view images is proposed, based on the observation that a bicycle consists of two wheels in the form of ellipse shapes and a frame in the form of two triangles. Through the design of geometric constraints on the relationship between the triangles and ellipses, the computation is fast compared to feature-based classifiers. Besides, no training process is necessary and only a single image is required for our algorithm. The experimental results are also given in this paper to show the practicability and the performance of the proposed bicycle model and bicycle detection algorithm.	2016-06-19 10:06:33
Hybrid Coding for Selective Search	HybridCodingApe	ksande@uva.nl	Koen E. A. van de Sande Jasper R. R. Uijlings Cees G. M. Snoek Arnold W. M. Smeulders	We have improved significantly over last year's method from [1] with a hybrid bag-of-words using average and difference coding, a first in object detection. Briefly, the method of [1], instead of exhaustive search, which was dominant in the Pascal VOC 2010 and 2011 detection challenge, uses segmentation as a sampling strategy for selective search (cf. the ICCV paper). We use a small set of data-driven, class-independent, high quality object locations (coverage of 96-99% of all objects in the VOC2007 test set). Because we have only a limited number of locations to evaluate, this enables the use of more computationally expensive features, such as bag-of-words using average and difference coding strategies. While difference coding is an order of magnitude more expensive than average, we are still able to efficiently train a detection system for it due to several optimizations in the descriptor coding and the kernel classification runtime. As low-level features, we use new complementary color descriptors. Finally, the detection system is fused with classification scores found using most telling example selection from [2]. [1] "Segmentation as Selective Search for Object Recognition"; Koen E. A. van de Sande, Jasper R. R. Uijlings, Theo Gevers, Arnold W. M. Smeulders; 13th International Conference on Computer Vision, 2011. [2] "The Most Telling Window for Image Classification"; Jasper R. R. Uijlings, Koen E. A. van de Sande, Arnold W. M. Smeulders, Theo Gevers, Nicu Sebe, Cees G. M. Snoek; PASCAL VOC Challenge Workshop 2011 at ICCV, 2011.	2012-09-23 21:01:35
Improved yolo-v3	Improved yolo-v3	horizon	xianfeng tan	Improved yolo-v3	2019-11-15 10:30:19
MA-SSD	MA-SSD	MA-SSD	MA-SSD	MA-SSD	2018-08-01 09:02:09
HOGLBP with Mixture DPM and Context	MISSOURI_HOGLBP_MDPM_CONTEXT	The University of Missouri-Columbia	Guang Chen, Miao Sun, Xutao Lv, Yan Li, Tony X. Han	HOG-LBP features [1] are incorporated in the deformable part model [2]. Deformable model is further improved by using the learned multiple anchor positions so that the possible locations for each part are modeled as a mixture of Gaussian distribution. For part and root filters, PCA is adopted to denoise and accelerate the detection speed. We proposed a permutation matrix method to add the model symmetry constraints during the feature selection, which effectively takes advantage of the symmetry property existing in most of the object categories and avoids the overfitting. Contextual information including image class label estimation, segmentation estimation, color histogram of ROI, and objects location priors, and correlations between the object detectors are used to leverage the final detection results to a very large extent: there are lots of contextual information and correlational information among objects that can be used to boost the detection performance. For example, trains and buses are objects bearing some visual similarities. But none of the large objects can coexist in the same location. So detection scores are correlated and we use the inference on Bayesian networks to further improve the detection results. [1] Xiaoyu Wang, Tony X. Han and Shuicheng Yan, "An HOG-LBP Human Detector with Partial Occlusion Handling," IEEE International Conference on Computer Vision (ICCV 2009), Kyoto, 2009. [2] Girshick, R. B. and Felzenszwalb, P. F. and McAllester, D.: Discriminatively Trained Deformable Part Models, Release 5	2012-09-23 21:27:16
Resnet-101-FPN	Model_ori_1	UIT	Phan Tung Huynh Thi My Duyen	Resnet-101-FPN	2021-10-28 10:07:36
Using NAS Enhance Yolo	NAS Yolo	PA-Occam-Platform	Jian Yang, Zhenhou Hong, Xiaoyang Qu, Jianzong Wang, Jing Xiao	NAS-YoLo is an object detection model that introduces automatic data augmentation and neural architecture search (NAS) into a state-of-the-art YoLo model. The automatic data augmentation uses a reinforcement learning-based controller to find the best augmentation policies for the target dataset. The neural architecture search algorithm is developed from a one-shot NAS method with a parallel divide-and-conquer based evolutionary algorithm. Besides, an SMBO-based auto-tuning algorithm is used to yield better hyper-parameter combinations for the NAS-YoLo.	2020-05-09 08:00:13
Object-centric pooling	NEC_STANFORD_OCP	NEC Laboratories America and Stanford University	Olga Russakovsky Xiaoyu Wang Shenghuo Zhu Li Fei-Fei Yuanqing Lin	Object-centric pooling (OCP) is a method which represents a bounding box by pooling the coded low-level descriptors on the foreground and background separately and then concatenating them (Russakovsky et al. ECCV 2012). This method exploits powerful classification features that have been developed in the past years. In this system, we used DHOG and LBP as low-level descriptors. We developed a discriminative LCC coding scheme in addition to traditional LCC coding. We make use of candidate bounding boxes (van de Sande et al. ICCV 2011).	2012-09-23 22:47:43
Networks on Convolutional Feature Maps	NoC	Microsoft Research	Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun	This entry is an implementation of the system described in "Object Detection Networks on Convolutional Feature Maps" (http://arxiv.org/abs/1504.06066). This model is trained on HoG feature only. Training data for this entry is voc 2012 trainval set. Selective Search is used for proposal.	2015-04-26 09:47:29
Origin_pretrain_40k	Origin_pretrain_40k	University of Information Technology VNU-HCM	Phan Tung Huynh Thi My Duyen	Origin_pretrain_40k	2021-10-22 09:14:20
Weakly supervised detection using inception-v2	PITT_WSOD_INC2	University of Pittsburgh	Keren Ye, Mingda Zhang, Wei Li, Danfeng Qin, Adriana Kovashka, Jesse Berent	Weakly supervised detection using inception-v2	2019-03-14 05:19:35
Region transformer pyramid network	RTPnet	Xidian University	Li Yang	RTPNet contains positional embedding units (PEU), self region transformers (Self RT) and down region transformers (Down RT). We adopt a multi-scale training strategy. Specifically, we first randomly sample a scale from 600 to 1000 with step 100, and then the shorter edge of an input image is resized to the sampled scale. We constrain the longer edge of the resized image within 1666.	2022-02-23 08:06:29
RockDetector-1	RockDetector-1	RocKontrol	Chen Li, Hui Wan	RockDetector-1-based method trained on VOC2012	2019-11-08 14:56:51
SSD	SSD	THU	SSD	SSD	2017-06-10 04:47:11
SSOD_07_12_unlabel_07_12	SSOD_07_12_unlabel_07_12	HW Ascend	xuqiang	SSOD_07_12_unlabel_07_12	2021-04-22 13:12:55
SSOD_100	SSOD_100	HW Ascend	xuqiang	07_100_12_100	2021-04-12 12:18:51
SSOD_25	SSOD_25	HW Ascend	xuqiang	07_25_12_25	2021-04-12 12:21:26
SSOD_25_real	SSOD_25_real	HW Ascend	xuqiang	voc_07_25_12_25_real	2021-04-22 13:11:55
SVM classifier using HOG (V2)	SVM-HOG	Orange Labs Beijing, France Telecom	Zhao Feng	Our object detection system is based on the Discriminatively Trained Deformable Part Models, Release 5. It is our first attempt at the VOC challenge. We do not make many modifications to the baseline system provided in http://people.cs.uchicago.edu/~rbg/latent/. The submitted results are obtained by applying post-processing of both bounding box prediction and contextual rescoring.	2012-09-22 20:06:39
Stronger-yolo	Stronger-yolo	central south university	Zhihong Xiao	Improve yolov3 with focal loss, KL loss, mix up, anchor-free and so on.	2019-06-12 07:08:06
resnet101+softmax	TCnet	Tsinghua University	Yulin Liu	This is a model based on mask rcnn	2018-03-29 12:02:15
TCnet	TCnet	Tsinghua University	Liu Yulin	TCnet	2018-05-02 08:02:45
faster rcnn	THU_ML_class	Tsinghua University	training	faster rcnn	2017-06-03 10:55:37
YOLOv1-resnet-18-50	YOLOv1-resnet-18-50	personal	Haoyun Qin	reimplementation of yolo v1 with tricks applied. switched backbone to resnet18-cmp3 and resnet50-cmp4.	2022-05-13 12:24:19
YOLOv2	YOLOv2	University of Washington	Joe Redmon, Ali Farhadi	YOLOv2 runs a single detection network once on an image to detect objects. It predicts bounding boxes and objectness as well as class probabilities across a convolutional feature map. For more information see: http://pjreddie.com/darknet/yolo/	2016-12-01 21:15:21
YOLOv2-resnet-18-101	YOLOv2-resnet-18-101	personal	Haoyun Qin	reimplementation of yolo v2 using pytorch and resnet	2022-05-18 10:34:21
dsa_tes	dsa_1050	Nanjing University	AD	add cs	2017-11-18 11:34:21
refine_denseSSD	refine_denseSSD	BUPT	Yongqiang Yao	refine_denseSSD	2018-05-14 02:23:40
sd410	sd	NorthWest University	sd	sd	2024-04-10 07:45:49
ssd	ssd	ssd	ssd	ssd	2018-08-01 09:31:10
yolo-all	yolo	shou	hfq0219	yolo3	2019-09-28 04:14:52
yolo-all	yolo	shou	hfq	yolo3-608	2019-09-28 05:08:50
Synthetic Training for deformable parts model	CMIC-GS-DPM	Cairo Microsoft Innovation Center	Dr. Motaz El-Saban, Osama Khalil, Mostafa Izz, Mohamed Fathi	We introduce dataset augmentation using synthetic examples as a method for introducing novel variations not present in the original set. We make use of deformable parts-based model (Felzenszwalb et al 2010). We augment the training set with examples obtained by applying global scaling of the dataset examples. Global scaling includes no, up and down scaling with varying performance across different object classes. Technique selection is based upon performance on the validation set. The augmented dataset is then used to train parts-based detectors using HOG features (Dalal & Triggs 2006) and latent SVM. The resulting class models are applied on test images in a "sliding-window" fashion.	2011-10-13 22:01:23
Synthetic Training for deformable parts model	CMIC-Synthetic-DPM	Cairo Microsoft Innovation Center	Dr. Motaz El-Saban, Osama Khalil, Mostafa Izz, Mohamed Fathi	We introduce dataset augmentation using synthetic examples as a method for introducing novel variations not present in the original set. We make use of deformable parts-based model (Felzenszwalb et al 2010). We augment the training set with examples obtained by relocating objects (having segmentation masks) to new backgrounds. New backgrounds used for relocation are selected using a set of techniques (no relocation, same image, "different" image or image with co-occurring objects). Performance of those techniques varies across classes according to the object class properties. For every class, we select the technique that achieves the highest AP on the validation set. The augmented dataset is then used to train parts-based detectors using HOG features (Dalal & Triggs 2006) and latent SVM. The resulting class models are applied on test images in a "sliding-window" fashion.	2011-10-13 21:54:09
DPM with basic rescoring	DPM-MK	Oxford VGG	Andrea Vedaldi and Andrew Zisserman	This method uses a Deformable Part Model (our own implementation) to generate an initial (and very good) list of 100 candidate bounding boxes per image. These are then rescored by a multiple features model combining DPM scores with dense SP-BOW, geometry, and context. The SP-BOW model uses dense SIFT features (vl_phow in VLFeat) quantized into 1200 visual words, 6x6 spatial layout, cell-by-cell l2 normalization after raising the entries to the 1/4 power (1/4-homogeneous Hellinger's kernel). The geometric model is a second order polynomial kernel on the bounding box coordinates. The context model is a second order polynomial kernel mixing the candidate DPM score with twenty scores obtained as the maximum response of the DPMs for the 20 classes in that image (like Felzenszwalb). A second context model is also added, using 20 scores from a state-of-the-art Fisher kernel image classifier (also on dense SIFT features), as described in Chatfield et al. 2010. The SVM scores are passed through a sigmoid for standardization in the 0-1 interval; the sigmoid model is fitted to the training data. The model is trained by means of a large scale linear SVM using the one-slack bundle formulation (aka SVM^perf). The solver hence uses retraining implicitly, and we make sure it reaches full convergence.	2011-10-13 10:20:29
NLPR-Detection	Data Decomposition and Distinctive Context	Institute of Automation, Chinese Academy of Sciences	Junge Zhang, Yinan Yu, Yongzhen Huang, Chong Wang, Weiqiang Ren, Jinchen Wu, Kaiqi Huang and Tieniu Tan	Part based models have achieved great success in recent years. To our understanding, the original deformable part based model has several limits: 1) the computational complexity is very large, especially when it is extended to enhanced models via multiple features, more mixtures or flexible part models. 2) The original part based model is not "deformable" enough. To tackle these problems, 1) we propose a data decomposition based feature representation scheme for the part based model in an unsupervised manner. The submitted method takes about 1-2 seconds per image from the PASCAL VOC datasets on average while keeping high performance. We learn the basis from samples without any label information. The specific label-independent rule followed in the submitted methods can be adapted to other variants of the part based model such as hierarchical models or flexible mixture models. 2) We found that each part corresponds to multiple possible locations, which is not reflected in the original part-based model. Accordingly, we propose that the locations of parts should obey a multiple Gaussian distribution. Thus, for each part we learn its optimal locations by clustering, and these are used to update the original anchors of the part-based model. The proposed method can more effectively describe the deformation (pose and location variety) of objects' parts. 3) We rescored the initial results with our distinctive context model, which includes global, local and intra-class context information. In addition, segmentation provides a strong indication of an object's presence; therefore, the proposed segmentation-aware semantic attribute is applied in the final reasoning, which shows promising performance.	2011-10-13 16:20:59
SVM classifier with LCC and tree coding	LCC-TREE-CODING	University of Missouri	Xiaoyu Wang, Miao Sun, Xutao Lv, Shuai Tang, Guang Chen, Yan Li, Tony X. Han	A two-layer cascade structure for object detection. The first layer employs a deformable model to select possible candidates for the second layer. The latter layer takes location and global context augmented with LBP features to improve the accuracy. A bag of words model enhanced with spatial pyramid and local coordinate coding is used to model the global context information. A hierarchical tree structure coding is used to take care of the intra-class variation for each detection window. Linear SVM is used for classification.	2011-10-13 17:13:43
Context-SVM based submission for 3 tasks	NUS_Context_SVM	National University of Singapore	Zheng Song, Qiang Chen, Shuicheng Yan	Classification uses the BoW framework. Dense-SIFT, HOG^2, LBP and color moment features are extracted. We then use VQ and Fisher vectors for feature coding, and SPM and Generalized Pyramid Matching (GPM) to generate image representations. Context-aware features are also extracted based on [1]. The classification models are learnt via kernel SVM. The final classification scores are then refined with kernel mapping [2]. Detection and segmentation results use the baseline of [3] with HOG and LBP features. Based on [1], we then learn a context model and refine the detection results. The final segmentation result substitutes the rectangular detection boxes with average masks learnt for each detection component from the segmentation training set. [1] Zheng Song, Qiang Chen, Zhongyang Huang, Yang Hua, and Shuicheng Yan. Contextualizing Object Detection and Classification. [2] http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/workshop/nuspsl.pdf [3] http://people.cs.uchicago.edu/~pff/latent/	2011-10-05 09:01:23
Latent Hierarchical Learning	NYU-UCLA_Hierarchy	NYU and UCLA	Yuanhao Chen, Li Wan, Long Zhu, Rob Fergus, Alan Yuille	Based on two recent publications: "Latent Hierarchical Structural Learning for Object Detection". Long Zhu, Yuanhao Chen, Alan Yuille, William Freeman. CVPR 2010. "Active Mask Hierarchies for Object Detection". Yuanhao Chen, Long Zhu, Alan Yuille. ECCV 2010. We present a latent hierarchical structural learning method for object detection. An object is represented by a mixture of hierarchical tree models where the nodes represent object parts. The nodes can move spatially to allow both local and global shape deformations. The image features are histograms of words (HOWs) and oriented gradients (HOGs), which enable rich appearance representation of both structured (e.g., cat face) and textured (e.g., cat body) image regions. Learning the hierarchical model is a latent SVM problem which can be solved by the incremental concave-convex procedure (iCCCP). Object detection is performed by scanning sub-windows using dynamic programming. The detections are rescored by a context model which encodes the correlations of 20 object classes by using both object detection and image classification.	2011-10-13 22:21:11
Selective Search Detection System	SelectiveSearchMonkey	University of Amsterdam and University of Trento	Jasper R. R. Uijlings, Koen E. A. van de Sande, Arnold W. M. Smeulders, Theo Gevers, Nicu Sebe, Cees Snoek	Based on "Segmentation as Selective Search for Object Recognition"; Koen E. A. van de Sande, Jasper R. R. Uijlings, Theo Gevers, Arnold W. M. Smeulders; 13th International Conference on Computer Vision, 2011. Instead of exhaustive search, which was dominant in the Pascal VOC 2010 detection challenge, we use segmentation as a sampling strategy for selective search (cf. our ICCV paper). Like segmentation, we use the image structure to guide our sampling process. However, unlike segmentation, we propose to generate many approximate locations over few and precise object delineations, as the goal is to cover all object locations. Our sampling is diversified to deal with as many image conditions as possible. Specifically, we use a variety of hierarchical region grouping strategies by varying colour spaces and grouping criteria. This results in a small set of data-driven, class-independent, high quality object locations (coverage of 96-99% of all objects in the VOC2007 test set). Because we have only a limited number of locations to evaluate, this enables the use of the more computationally expensive bag-of-words framework for classification. Our bag-of-words implementation uses densely sampled SIFT and ColorSIFT descriptors.	2011-10-13 20:45:25
Structured Detection and Segmentation CRF	Struct_Det_CRF	Oxford Brookes University	Jonathan Warrell, Vibhav Vineet, Paul Sturgess, Philip Torr	We form a hierarchical CRF which jointly models a pool of candidate detections and the multiclass pixel segmentation of an image. Attractive and repulsive pairwise terms are allowed between detection nodes (cf Desai et al, ICCV 2009), which are integrated into a Pn-Potts based hierarchical segmentation energy (cf Ladicky et al, ECCV 2010). A cutting-plane algorithm is used to train the model, using approximate MAP inference. We form a joint loss which combines segmentation and detection components (i.e. paying a penalty both for each pixel incorrectly labelled, and each false detection node which is active in a solution), and use different weightings of this loss to train the model to perform detection and segmentation. The segmentation results thus make use of the bounding box annotations. The candidate detections are generated using the Felzenszwalb et al. CVPR 2008/2010 detector, and as features for segmentation we use textons, SIFT, LBPs and the detection response surfaces themselves.	2011-10-13 03:27:02
SVM classifier with tree max-pooling	TREE--MAX-POOLING	University of Missouri	Xiaoyu Wang, Miao Sun, Xutao Lv, Shuai Tang, Guang Chen, Yan Li, Tony X. Han	A two-layer cascade structure for object detection. The first layer employs a deformable model to select possible candidates for the second layer. The latter layer takes location and global context augmented with LBP features to improve the accuracy. A bag of words model enhanced with spatial pyramid and local coordinate coding is used to model the global context information. A hierarchical tree structure coding is used to take care of the intra-class variation for each detection window. Max-pooling is used for tree node assignment. Linear SVM is used for classification.	2011-10-13 20:50:30
LSVM trained mixtures of deformable part models	UOCTTI_LSVM_MDPM	University of Chicago	Ross Girshick (University of Chicago), Pedro Felzenszwalb (Brown), David McAllester (TTI-Chicago)	Based on [1] http://people.cs.uchicago.edu/~pff/latent-release4 and [2] "Object Detection with Discriminatively Trained Part Based Models"; Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan; IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, September 2010. This entry is a minor modification of our publicly available "voc-release4" object detection system [1]. The system uses latent SVM to train mixtures of deformable part models using HOG features [2]. Final detections are refined using a context rescoring mechanism [2]. We extended [1] to detect smaller objects by adding an extra high-resolution octave to the HOG feature pyramid. The HOG features in this extra octave are computed using 2x2 pixel cells. Additional bias parameters are learned to help calibrate scores from detections in the extra octave with the scores of detections above this octave. This entry is the same as UOCTTI_LSVM_MDPM from the 2010 competition. Detection results are reported for all 20 object classes to provide a baseline for the 2011 competition.	2011-10-12 16:09:55
Person grammar model trained with WL-SSVM	UOCTTI_WL-SSVM_GRAMMAR	University of Chicago	Ross Girshick (University of Chicago), Pedro Felzenszwalb (Brown), David McAllester (TTI-Chicago)	This entry is described in [1] "Object Detection with Grammar Models"; Ross B. Girshick, Pedro F. Felzenszwalb, David McAllester. Neural Information Processing Systems 2011 (to appear). We define a grammar model for detecting people and train the model's parameters from bounding box annotations using a formalism that we call weak-label structural SVM (WL-SSVM). The person grammar uses a set of productions that represent varying degrees of visibility/occlusion. Object parts, such as the head and shoulder, are shared across all interpretations of object visibility. Each part is represented by a deformable mixture model that includes deformable subparts. An "occluder" part (itself a deformable mixture of parts) is used to capture the nontrivial appearance of the stuff that typically occludes people from below. We further refine detections using the context rescoring mechanism from the UOCTTI_LSVM_MDPM entry, using the results of that entry for the 19 non-person classes.	2011-10-12 16:13:33
Using viewpoint cues to improve object recognition	lSVM-Viewpoint	Cornell	Joshua Schwartz, Noah Snavely, Daniel Huttenlocher	Our system is based on the Latent SVM framework of [1], including their context rescoring method. We train 6 component models with 8 parts. However, unlike [1], components are trained using a clustering based on an unsupervised estimation of 3D object viewpoint. In this sense, our approach is similar to the unsupervised approach in [2], which also seeks to estimate viewpoint, but our clustering is based on explicit reasoning about 3D geometry. Additionally, we add features based on estimated 3D scene geometry for context rescoring. Of note is the fact that a detection with our method gives rise to an explicit estimation of object viewpoint within a scene, rather than just a bounding box. [1] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. PAMI 2010. [2] C. Gu and X. Ren. Discriminative Mixture-of-Templates for Viewpoint Classification. ECCV 2010.	2011-10-13 02:33:13
DPM that uses region segmentation features	segDPM	UofT, TTI-C, UCLA	Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun	DPM-style model that exploits bottom-up segmentation. We use CPMC to extract regions and CPMC-o2p to classify them. The output of CPMC-o2p is then used as the segmentation in our model. We propose a new model that blends between DPM (HOG appearance model) and segmentation. The model encourages each detection to fit tightly around a region. If there is no region, the detector falls back to the typical HOG score. In addition, we use context re-scoring based on object presence classifiers provided by NUS. Project page: http://www.cs.toronto.edu/~fidler/projects/segDPM.html	2014-02-24 20:22:19
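
Several entries above (for example DPM-MK and UOCTTI_LSVM_MDPM) mention context rescoring: a candidate detection's score is combined with the maximum detector response for each of the 20 classes in the same image, after the raw scores are squashed into the 0-1 interval by a sigmoid. The following is only a minimal sketch of how such a feature vector might be assembled, not any team's actual implementation. The function names, the sigmoid parameters `a` and `b` (assumed to have been fitted on training data beforehand), and the `floor` score used for classes with no detections are all hypothetical.

```python
import math


def sigmoid_standardize(score, a=1.0, b=0.0):
    """Map a raw detector score into (0, 1) with a two-parameter sigmoid.

    a and b are placeholders; in practice they would be fitted on training data.
    """
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def context_features(candidate_score, scores_by_class, class_names,
                     a=1.0, b=0.0, floor=-10.0):
    """Build a rescoring feature vector for one candidate detection.

    scores_by_class maps a class name to the raw scores of all detections of
    that class in the image. Classes with no detections get the (assumed)
    floor score. The result is the standardized candidate score followed by
    one standardized maximum response per class, in class_names order.
    """
    features = [sigmoid_standardize(candidate_score, a, b)]
    for name in class_names:
        class_scores = scores_by_class.get(name) or [floor]
        features.append(sigmoid_standardize(max(class_scores), a, b))
    return features


if __name__ == "__main__":
    classes = ["cat", "dog", "car"]  # the VOC task uses 20 classes
    image_scores = {"cat": [1.0, 2.0], "dog": []}
    print(context_features(0.0, image_scores, classes))
```

A vector of this kind would then be fed to a second-stage classifier (the entries above mention polynomial-kernel and linear SVMs), which produces the final rescored detection confidence.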
