Detection Results: VOC2012 BETA

Competition "comp4" (train on own data)

This leaderboard shows only those submissions that have been marked as public, so the displayed rankings should not be considered definitive.

Average Precision (AP %)

Method | mean | per-class AP, in the order: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor | submission date (a dash marks classes with no reported result)
ATLDETv2 [?] | 92.9 | 97.4 96.3 94.2 89.0 89.0 95.5 95.7 98.0 84.7 96.4 82.1 97.4 97.6 96.6 96.1 79.4 96.2 87.0 96.2 92.5 | 26-Oct-2019
AInnoDetection [?] | 92.3 | 96.6 95.3 94.4 87.3 87.5 94.4 94.1 98.4 82.6 96.5 82.9 97.9 96.9 96.2 95.0 79.8 95.7 86.9 96.5 91.0 | 01-Jul-2019
AccurateDET (ensemble) [?] | 92.3 | 97.0 95.2 92.6 88.7 88.4 92.9 95.2 96.9 85.5 94.4 83.4 96.4 96.5 96.0 96.5 82.0 95.2 86.6 95.1 91.3 | 18-Jun-2019
AccurateDET [?] | 91.3 | 96.6 95.1 91.5 87.2 87.0 92.2 94.0 96.5 83.4 94.1 80.0 96.1 96.4 95.8 95.7 79.7 95.1 85.1 94.6 90.1 | 17-Jun-2019
tencent_retail_ft:DET [?] | 91.2 | 96.1 94.9 92.7 85.8 88.4 93.5 94.9 97.1 80.0 94.8 78.8 96.7 96.4 96.0 95.9 79.0 95.9 83.1 95.0 88.5 | 21-Jan-2019
Sogou_MM_GCFE_RCNN(ensemble model) [?] | 91.1 | 95.9 94.6 93.3 86.2 87.1 93.2 95.1 97.1 81.1 94.4 77.1 96.5 96.6 95.8 95.4 77.9 95.4 84.1 95.0 89.5 | 25-Sep-2018
Sogou_MM_GCFE_RCNN(single model) [?] | 91.0 | 95.9 94.1 93.3 86.2 87.0 93.1 95.1 97.1 81.1 94.4 77.1 96.5 96.6 95.8 95.4 77.9 95.4 83.4 94.9 89.5 | 25-Sep-2018
FXRCNN (single model) [?] | 90.7 | 96.4 95.1 92.0 84.3 87.1 92.8 94.4 97.4 80.7 93.5 76.0 96.7 96.7 95.6 95.5 78.3 94.6 83.3 95.4 88.0 | 13-Jul-2018
ATLDET [?] | 90.7 | 96.0 94.9 91.8 85.2 87.6 93.0 94.5 97.5 80.7 93.8 75.6 96.6 96.2 95.8 95.5 78.3 95.2 82.5 94.8 89.2 | 13-Aug-2018
PACITYAIDetection [?] | 89.8 | 95.3 93.6 91.1 85.4 83.9 91.6 93.3 96.8 80.1 95.5 74.3 96.3 95.7 94.4 94.7 77.5 94.1 82.7 94.2 86.4 | 26-Sep-2019
Ali_DCN_SSD_ENSEMBLE [?] | 89.2 | 95.4 93.7 91.8 82.8 81.7 92.4 93.4 97.6 75.7 94.1 74.2 96.4 95.1 94.2 93.3 72.5 94.1 82.8 94.6 87.7 | 28-May-2018
CM-CV&AR: DET [?] | 89.1 | 95.7 94.4 92.0 81.1 82.9 93.8 90.0 97.1 74.6 95.4 70.4 96.7 96.2 95.3 93.4 73.7 94.8 81.1 96.0 88.2 | 20-Aug-2019
VIM_SSD(COCO+07++12, single model, one-stage) [?] | 89.0 | 96.0 93.0 90.3 83.4 80.6 91.9 94.4 96.2 77.5 93.3 75.1 95.2 95.1 94.2 93.6 72.0 93.6 82.7 94.5 86.6 | 27-Jun-2018
FOCAL_DRFCN(VOC+COCO, single model) [?] | 88.8 | 95.0 93.3 91.8 82.9 81.9 91.6 93.0 97.1 76.7 92.5 71.7 96.2 94.9 94.2 93.7 75.3 93.3 80.0 94.7 85.4 | 01-Mar-2018
R4D_faster_rcnn [?] | 88.6 | 94.6 92.3 91.3 82.3 79.4 91.8 91.8 97.4 76.6 93.6 75.3 97.0 94.6 93.5 92.6 75.1 92.0 80.9 94.4 86.5 | 20-Nov-2016
R-FCN, ResNet Ensemble(VOC+COCO) [?] | 88.4 | 94.8 92.9 90.6 82.4 81.8 89.9 91.7 97.1 76.0 93.4 71.9 96.6 94.3 93.9 92.8 75.7 91.9 80.8 93.6 86.4 | 09-Oct-2016
FF_CSSD(VOC+COCO, one-stage, single model) [?] | 88.4 | 95.4 93.5 90.8 82.8 78.4 90.4 91.8 96.9 75.1 92.7 74.2 95.7 95.1 94.2 93.0 71.6 93.9 81.9 94.1 86.7 | 28-May-2018
CU-SuperDet [?] | 88.1 | 94.8 94.1 91.0 80.3 81.3 92.5 88.5 96.1 73.2 94.8 69.0 95.5 95.3 95.1 92.2 72.8 94.1 80.1 94.9 87.4 | 16-Jan-2020
HIK_FRCN [?] | 87.9 | 95.0 93.2 91.3 80.3 77.7 90.6 89.9 97.8 72.8 93.7 70.7 97.2 95.4 94.0 91.8 72.7 92.8 81.1 94.1 86.2 | 19-Sep-2016
PFPNet512_ECCV [?] | 87.8 | 94.6 92.4 88.7 82.7 79.1 90.5 93.2 96.2 74.9 92.8 73.1 94.2 93.5 93.6 92.7 70.7 93.0 80.1 93.8 86.7 | 22-Mar-2018
VIM_SSD [?] | 87.6 | 95.3 92.0 88.7 81.6 78.5 91.4 93.2 95.7 74.9 91.6 73.5 94.2 93.0 93.2 93.0 70.5 93.0 79.1 94.3 85.0 | 11-May-2018
Deformable R-FCN, ResNet-101 (VOC+COCO) [?] | 87.1 | 94.0 91.7 88.5 79.4 78.0 89.7 90.8 96.9 74.2 93.1 71.3 95.9 94.8 93.2 92.5 71.7 91.8 78.3 93.2 83.3 | 23-Mar-2017
FasterRcnn-ResNeXt101(COCO+07++12, single model) [?] | 86.8 | 93.9 93.4 88.3 80.2 72.6 89.4 89.3 96.8 73.0 91.5 72.3 95.4 94.5 93.8 91.7 70.7 90.6 81.2 92.6 83.9 | 04-May-2017
RefineDet (VOC+COCO,single model,VGG16,one-stage) [?] | 86.8 | 94.7 91.5 88.8 80.4 77.6 90.4 92.3 95.6 72.5 91.6 69.9 93.9 93.5 92.4 92.6 68.8 92.4 78.5 93.6 85.2 | 16-Mar-2018
AngDet [?] | 86.3 | 94.4 92.1 88.4 78.4 71.7 89.2 90.4 95.9 74.6 91.7 72.9 94.7 94.0 93.6 91.2 66.4 91.0 81.8 93.1 80.7 | 21-Oct-2018
AngDet [?] | 85.5 | 93.9 91.6 88.0 76.6 70.7 88.5 89.9 95.6 72.2 92.3 72.1 94.9 93.6 93.1 90.7 65.4 90.5 78.3 91.9 80.2 | 04-Oct-2018
PSSNet(VOC+COCO) [?] | 85.5 | 92.4 91.4 85.9 78.6 75.8 88.0 89.8 95.2 72.4 87.8 72.2 94.0 92.7 93.2 92.3 70.7 88.8 76.1 92.1 81.2 | 30-Mar-2018
MLANet [?] | 85.3 | 93.8 91.2 86.8 78.3 71.6 88.8 89.9 94.6 72.2 91.2 71.6 92.3 92.4 90.5 90.9 66.0 92.2 76.9 91.3 82.4 | 26-Mar-2020
SHS_Faster_RCNN_Upgrade_v2 [?] | 85.3 | 94.2 90.2 85.4 77.1 73.4 89.3 89.3 94.3 71.7 87.1 73.1 93.0 91.4 92.7 92.3 69.5 87.7 79.8 92.4 81.7 | 25-Feb-2019
ESNet [?] | 85.2 | 93.9 91.9 86.1 75.8 70.4 89.5 89.1 94.2 71.7 91.6 71.0 92.9 92.1 92.9 90.8 63.6 91.1 79.4 91.3 84.5 | 26-Feb-2019
R-FCN, ResNet (VOC+COCO) [?] | 85.0 | 92.3 89.9 86.7 74.7 75.2 86.7 89.0 95.8 70.2 90.4 66.5 95.0 93.2 92.1 91.1 71.0 89.7 76.0 92.0 83.4 | 09-Oct-2016
MONet(VOC+COCO) [?] | 84.3 | 92.4 90.5 84.7 75.4 71.6 87.2 88.9 94.6 70.5 86.9 71.0 92.3 91.8 90.8 91.7 69.8 89.1 75.1 91.3 79.6 | 01-Apr-2018
FSSD512 [?] | 84.2 | 92.8 90.0 86.2 75.9 67.7 88.9 89.0 95.0 68.8 90.9 68.7 92.8 92.1 91.4 90.2 63.1 90.1 76.9 91.5 82.7 | 07-Nov-2017
PVANet+ [?] | 84.2 | 93.5 89.8 84.1 75.6 69.7 88.2 87.9 93.4 70.0 87.7 75.3 92.9 90.5 90.9 90.2 67.3 86.4 80.3 92.0 78.8 | 26-Oct-2016
PFPNet512 VGG16 07++12+COCO [?] | 83.8 | 93.0 89.9 85.1 75.8 66.4 88.4 88.3 94.0 67.9 89.5 69.7 92.0 91.8 91.6 88.7 61.1 89.1 78.4 90.5 84.3 | 18-Oct-2017
BlitzNet512 [?] | 83.8 | 93.1 89.4 84.7 75.5 65.0 86.6 87.4 94.5 69.9 88.8 71.7 92.5 91.6 91.1 88.9 61.2 90.4 79.2 91.8 83.0 | 19-Jul-2017
Faster RCNN, ResNet (VOC+COCO) [?] | 83.8 | 92.1 88.4 84.8 75.9 71.4 86.3 87.8 94.2 66.8 89.4 69.2 93.9 91.9 90.9 89.6 67.9 88.2 76.8 90.3 80.0 | 10-Dec-2015
DES512_COCO [?] | 83.7 | 92.6 90.0 83.7 74.5 66.3 88.5 88.6 94.5 70.2 87.4 71.5 92.2 91.2 92.3 89.0 60.2 89.5 79.6 90.1 82.6 | 09-Mar-2018
PVANet+ (compressed) [?] | 83.7 | 92.8 88.9 83.4 74.7 68.7 88.2 87.8 93.5 69.5 87.3 74.3 93.1 89.5 89.9 90.2 66.8 86.4 79.8 91.9 78.2 | 18-Nov-2016
Cascaded_CrystalNet [?] | 83.6 | 92.6 89.5 83.5 74.7 69.7 87.5 87.6 92.9 70.0 86.9 75.0 91.6 89.5 90.6 90.2 67.2 85.2 80.0 91.4 76.9 | 23-Dec-2017
ESNet [?] | 83.5 | 93.2 90.0 84.4 73.8 70.2 88.1 87.7 93.9 68.2 88.8 69.5 91.5 91.3 90.8 89.5 63.6 89.2 74.3 89.9 81.8 | 23-Feb-2019
DOH_512 (single VGG16, COCO+VOC07++12) [?] | 83.4 | 93.0 89.8 84.5 74.3 63.2 89.3 88.2 94.2 68.0 88.0 69.1 92.3 91.4 90.2 89.0 62.6 89.2 76.7 90.8 83.2 | 07-Nov-2017
innovisgroup Faster R-CNN [?] | 83.2 | 93.1 87.0 83.3 74.1 70.1 87.9 88.5 92.3 68.1 86.3 72.5 90.4 89.3 90.9 89.9 66.7 87.4 76.5 91.2 79.2 | 22-May-2018
ICT_360_ISD [?] | 82.6 | 90.7 89.4 87.0 75.8 70.1 86.0 86.5 96.2 65.3 86.8 62.1 94.6 90.6 90.5 89.7 63.5 87.3 72.7 90.7 77.1 | 18-Nov-2016
Rank of experts (VOC07++12) [?] | 82.2 | 90.4 87.4 85.3 72.9 70.8 84.5 87.2 95.6 64.6 87.1 65.4 94.3 89.7 89.5 89.2 66.0 85.1 72.5 89.6 76.6 | 15-Nov-2017
SSD512 VGG16 07++12+COCO [?] | 82.2 | 91.4 88.6 82.6 71.4 63.1 87.4 88.1 93.9 66.9 86.6 66.3 92.0 91.7 90.8 88.5 60.9 87.0 75.4 90.2 80.4 | 10-Oct-2016
R-DAD (VOC07++12) [?] | 82.0 | 90.2 88.1 85.3 73.3 71.4 84.5 87.4 94.6 65.1 86.8 64.0 94.1 89.7 89.2 89.3 64.5 83.5 72.2 89.5 77.6 | 06-Mar-2018
FSSD300 [?] | 82.0 | 92.2 89.2 81.8 72.3 59.7 87.4 84.4 93.5 66.8 87.7 70.4 92.1 90.9 89.6 87.7 56.9 86.8 79.0 90.7 81.3 | 10-Nov-2017
RUN_3WAY_300, VGG16, 07++12+COCO [?] | 81.7 | 91.5 88.6 80.3 71.2 59.6 86.4 84.2 94.1 66.6 86.5 70.4 92.1 90.5 89.6 87.5 57.7 86.7 79.6 90.4 80.2 | 13-Oct-2017
YOLOv2 (VOC + COCO) [?] | 81.5 | 90.0 88.6 82.2 71.7 65.5 85.5 84.2 92.9 67.2 87.6 70.0 91.2 90.5 90.0 88.6 62.5 83.8 70.7 88.8 79.4 | 21-Oct-2017
Light R-CNN [?] | 81.1 | 90.4 88.7 83.1 71.7 64.1 84.5 84.2 94.9 63.8 85.0 65.8 94.0 88.0 88.9 88.3 62.7 85.0 73.1 89.4 75.8 | 06-Feb-2020
SSD based method [?] | 81.0 | 91.8 87.5 82.5 71.2 65.6 85.4 86.2 92.8 64.0 85.9 64.7 91.6 89.0 88.7 87.9 59.2 87.5 73.5 88.8 76.8 | 24-Oct-2018
ESNet [?] | 81.0 | 91.4 87.4 81.5 70.7 60.6 86.6 86.0 92.8 65.5 86.5 68.9 91.1 88.6 89.3 87.4 60.7 86.3 73.6 88.0 77.1 | 08-Feb-2019
Light R-CNN [?] | 80.5 | 89.3 87.5 82.8 71.2 62.4 84.5 83.8 94.4 64.1 84.5 67.0 92.5 88.0 87.0 87.0 62.3 83.2 73.9 88.2 76.9 | 15-Jan-2020
DenseSSD-512 07++12 [?] | 80.5 | 91.0 87.4 82.1 68.8 61.0 84.7 84.9 92.9 63.5 85.6 68.2 90.8 89.1 89.1 86.6 56.8 86.1 74.8 88.7 77.6 | 05-Dec-2017
BlitzNet300 [?] | 80.2 | 91.0 86.5 80.0 70.1 54.7 84.4 84.1 92.5 65.1 83.5 69.2 91.2 88.1 88.5 85.7 55.8 85.4 79.3 89.8 78.2 | 19-Jul-2017
OHEM+FRCN, VGG16, VOC+COCO [?] | 80.1 | 90.1 87.4 79.9 65.8 66.3 86.1 85.0 92.9 62.4 83.4 69.5 90.6 88.9 88.9 83.6 59.0 82.0 74.7 88.2 77.3 | 18-Apr-2016
Light R-CNN [?] | 80.0 | 89.2 87.5 80.9 71.1 63.5 82.9 83.5 92.9 63.3 82.8 67.1 92.6 87.0 87.5 86.2 62.4 82.7 73.3 87.4 75.8 | 07-Jan-2020
DSSD513_ResNet101_07++12 [?] | 80.0 | 92.1 86.6 80.3 68.7 58.2 84.3 85.0 94.6 63.3 85.9 65.6 93.0 88.5 87.8 86.4 57.4 85.2 73.4 87.8 76.8 | 15-Feb-2017
RUN_3WAY_512, VGG16, 07++12 [?] | 79.8 | 90.0 87.3 80.2 67.4 62.4 84.9 85.6 92.9 61.8 84.9 66.2 90.9 89.1 88.0 86.5 55.4 85.0 72.6 87.7 76.8 | 22-Oct-2017
SSD300 VGG16 07++12+COCO [?] | 79.3 | 91.0 86.0 78.1 65.0 55.4 84.9 84.0 93.4 62.1 83.6 67.3 91.3 88.9 88.6 85.6 54.7 83.8 77.3 88.3 76.5 | 03-Oct-2016
DSOD300+ [?] | 79.3 | 90.5 87.4 77.5 67.4 57.7 84.7 83.6 92.6 64.8 81.3 66.4 90.1 87.8 88.1 87.3 57.9 80.3 75.6 88.1 76.7 | 16-Mar-2017
BlitzNet [?] | 79.0 | 90.0 85.3 80.4 67.2 53.6 82.9 83.6 93.8 62.6 84.0 65.9 91.6 86.6 87.7 84.6 56.8 84.7 74.0 88.0 75.8 | 17-Mar-2017
Res101+hyper+FasterRCNN(COCO+0712trainval) [?] | 78.9 | 88.9 85.3 79.9 68.4 63.8 84.1 83.9 91.0 62.0 83.2 64.3 88.8 87.6 85.9 87.1 60.8 80.7 70.5 88.0 73.0 | 10-Feb-2017
EGCI-Net [?] | 78.5 | 89.6 86.8 75.6 64.5 53.7 85.3 82.6 92.8 63.5 83.0 67.5 90.1 87.0 87.9 85.1 56.7 79.5 75.7 87.0 75.2 | 26-Feb-2019
SSD512 VGG16 07++12 [?] | 78.5 | 90.0 85.3 77.7 64.3 58.5 85.1 84.3 92.6 61.3 83.4 65.1 89.9 88.5 88.2 85.5 54.4 82.4 70.7 87.1 75.6 | 13-Oct-2016
DCFF-Net [?] | 77.6 | 88.9 87.0 74.1 63.3 52.4 83.6 82.2 91.4 61.3 81.8 65.5 90.8 86.5 87.4 84.9 53.9 80.4 75.2 86.4 75.2 | 29-Jun-2018
HFM_VGG16 [?] | 77.5 | 88.8 85.1 76.8 64.8 61.4 85.0 84.1 90.0 59.9 82.6 61.9 88.5 85.2 85.6 86.9 56.7 79.5 67.5 85.4 73.4 | 21-Mar-2016
Res101+FasterRCNN(COCO+0712trainval) [?] | 77.3 | 86.9 83.7 76.5 65.9 59.5 81.9 82.6 90.9 60.1 81.0 64.2 88.0 84.9 86.2 85.2 58.7 79.5 72.6 86.4 71.3 | 05-Feb-2017
FFD_07++12 [?] | 77.2 | 89.0 86.5 72.4 61.7 51.9 83.9 81.3 91.7 61.0 80.6 66.3 88.8 86.8 86.6 85.1 54.1 80.0 75.8 87.2 74.4 | 16-Apr-2018
shufflenetv2_yolov3 [?] | 77.2 | 89.9 84.4 79.3 66.0 55.6 83.8 82.2 92.1 57.0 81.1 64.1 88.9 85.5 86.3 86.5 56.4 82.4 65.5 84.2 73.0 | 25-Feb-2020
DCFF-Net [?] | 77.2 | 89.4 85.6 73.1 63.2 52.1 84.3 81.1 92.1 61.1 81.5 65.4 89.7 86.7 88.4 84.6 52.7 80.0 73.2 87.0 72.8 | 03-Jul-2018
RUN300_3WAY, VGG16, 07++12 [?] | 77.1 | 88.2 84.4 76.2 63.8 53.1 82.9 79.5 90.9 60.7 82.5 64.1 89.6 86.5 86.6 83.3 51.5 83.0 74.0 87.6 74.4 | 26-Sep-2017
DenseSSD-300 07++12 [?] | 77.0 | 87.5 84.9 77.0 64.0 49.6 84.3 79.3 91.6 60.0 82.6 64.8 90.3 88.1 87.2 82.5 51.1 81.8 74.0 86.8 72.6 | 29-Nov-2017
FasterRCNN [?] | 76.8 | 84.4 85.5 81.4 65.4 60.3 84.9 83.8 93.4 62.0 85.7 55.5 90.8 88.4 81.4 85.7 50.5 82.7 65.2 89.0 60.0 | 23-Jul-2017
fasterRCNN+COCO+VOC+MCC [?] | 76.8 | 84.4 85.5 81.4 65.4 60.3 84.9 83.8 93.4 62.0 85.7 55.5 90.8 88.4 81.4 85.7 50.5 82.7 65.2 89.0 60.0 | 23-Jul-2017
Fast-rcnn [?] | 76.8 | 84.1 86.7 79.4 64.8 59.2 85.2 81.4 94.6 63.3 86.9 54.6 92.0 90.1 81.7 85.0 51.5 83.7 63.7 90.0 58.7 | 24-Oct-2017
IFRN_07+12 [?] | 76.6 | 87.8 83.9 79.0 64.5 58.9 82.2 82.0 91.4 56.5 82.3 62.4 90.4 85.6 86.4 86.4 55.1 80.5 62.7 85.4 69.2 | 07-Jun-2016
ION [?] | 76.4 | 87.5 84.7 76.8 63.8 58.3 82.6 79.0 90.9 57.8 82.0 64.7 88.9 86.5 84.7 82.3 51.4 78.2 69.2 85.2 73.5 | 23-Nov-2015
DSOD300 [?] | 76.3 | 89.4 85.3 72.9 62.7 49.5 83.6 80.6 92.1 60.8 77.9 65.6 88.9 85.5 86.8 84.6 51.1 77.7 72.3 86.0 72.2 | 17-Mar-2017
PLN [?] | 76.0 | 88.3 84.7 77.4 65.9 55.8 82.0 79.4 91.9 58.2 77.3 58.8 89.5 85.3 85.3 82.9 55.8 79.6 64.6 86.5 69.9 | 27-Mar-2017
Faster RCNN baseline (VOC+COCO) [?] | 75.9 | 87.4 83.6 76.8 62.9 59.6 81.9 82.0 91.3 54.9 82.6 59.0 89.0 85.5 84.7 84.1 52.2 78.9 65.5 85.4 70.2 | 24-Nov-2015
MNC baseline [?] | 75.9 | 86.4 81.1 76.4 64.3 57.8 81.1 80.3 92.0 55.2 82.6 61.0 89.9 86.4 84.6 85.4 53.1 79.8 66.1 84.7 69.9 | 15-Dec-2015
SSD300 VGG16 07++12 [?] | 75.8 | 88.1 82.9 74.4 61.9 47.6 82.7 78.8 91.5 58.1 80.0 64.1 89.4 85.7 85.5 82.6 50.2 79.8 73.6 86.6 72.1 | 18-Oct-2016
Faster+resnet101+07++12 [?] | 75.8 | 86.9 83.2 78.3 61.9 58.1 79.4 80.4 91.7 55.9 81.0 58.7 91.1 85.2 84.8 83.2 54.7 78.6 67.6 84.9 70.6 | 14-Nov-2017
RFCN_DCN [?] | 75.7 | 85.7 83.0 76.9 63.6 57.8 79.4 79.5 92.9 58.2 79.6 60.9 90.3 85.3 85.1 83.5 55.7 79.6 64.5 84.6 68.1 | 27-Jun-2017
MCC_FRCN, ResNet101, 07++12 [?] | 75.4 | 86.0 83.5 78.3 62.2 59.5 80.4 79.1 91.2 55.9 80.1 56.3 90.2 86.6 84.1 82.8 53.0 78.2 65.5 85.4 69.9 | 21-Nov-2016
YOLOv2 [?] | 75.4 | 86.6 85.0 76.8 61.1 55.5 81.2 78.2 91.8 56.8 79.6 61.7 89.7 86.0 85.0 84.2 51.2 79.4 62.9 84.9 71.0 | 23-Feb-2017
BlitzNet [?] | 75.4 | 87.5 82.2 74.6 61.6 46.0 81.5 78.4 91.4 58.2 80.3 64.9 89.1 83.6 85.8 81.5 50.6 79.9 74.8 84.9 71.2 | 17-Mar-2017
as [?] | 75.0 | 85.1 82.0 78.5 63.2 58.0 79.9 81.2 91.2 56.7 79.0 59.0 89.6 83.2 82.0 83.2 54.7 79.8 63.4 82.3 68.3 | 14-Nov-2019
LocNet [?] | 74.8 | 86.3 83.0 76.1 60.8 54.6 79.9 79.0 90.6 54.3 81.6 62.0 89.0 85.7 85.5 82.8 49.7 76.6 67.5 83.2 67.4 | 06-Nov-2015
DC-SPP-YOLO [?] | 74.6 | 86.9 82.5 75.7 61.2 52.9 82.5 78.4 91.0 52.8 80.2 60.8 89.4 83.5 85.5 82.5 49.5 79.8 63.9 83.7 68.3 | 08-Oct-2018
DDT augmentation based on web images [?] | 74.4 | 86.5 81.9 76.2 63.4 55.4 80.8 80.1 89.7 51.6 78.6 56.2 88.8 84.8 85.5 82.6 50.6 78.1 64.1 85.6 68.1 | 26-Jul-2017
MR_CNN_S_CNN_MORE_DATA [?] | 73.9 | 85.5 82.9 76.6 57.8 62.7 79.4 77.2 86.6 55.0 79.1 62.2 87.0 83.4 84.7 78.9 45.3 73.4 65.8 80.3 74.0 | 06-Jun-2015
HyperNet_VGG [?] | 71.4 | 84.2 78.5 73.6 55.6 53.7 78.7 79.8 87.7 49.6 74.9 52.1 86.0 81.7 83.3 81.8 48.6 73.5 59.4 79.9 65.7 | 12-Oct-2015
HyperNet_SP [?] | 71.3 | 84.1 78.3 73.3 55.5 53.6 78.6 79.6 87.5 49.5 74.9 52.1 85.6 81.6 83.2 81.6 48.4 73.2 59.3 79.7 65.6 | 28-Oct-2015
MR_CNN_S_CNN [?] | 70.7 | 85.0 79.6 71.5 55.3 57.7 76.0 73.9 84.6 50.5 74.3 61.7 85.5 79.9 81.7 76.4 41.0 69.0 61.2 77.7 72.1 | 09-May-2015
Fast R-CNN + YOLO [?] | 70.7 | 83.4 78.5 73.5 55.8 43.4 79.1 73.1 89.4 49.4 75.5 57.0 87.5 80.9 81.0 74.7 41.8 71.5 68.5 82.1 67.2 | 06-Nov-2015
FasterRCNN [?] | 70.4 | 82.1 78.6 72.6 54.3 52.1 77.3 76.6 87.4 49.8 76.1 50.5 86.5 80.1 82.0 80.8 46.7 70.6 58.8 80.5 65.3 | 23-Jul-2017
RPN [?] | 70.4 | 84.9 79.8 74.3 53.9 49.8 77.5 75.9 88.5 45.6 77.1 55.3 86.9 81.7 80.9 79.6 40.1 72.6 60.9 81.2 61.5 | 01-Jun-2015
DEEP_ENSEMBLE_COCO [?] | 70.1 | 84.0 79.4 71.6 51.9 51.1 74.1 72.1 88.6 48.3 73.4 57.8 86.1 80.0 80.7 70.4 46.6 69.6 68.8 75.9 71.4 | 03-May-2015
OHEM+FRCN, VGG16 [?] | 69.8 | 81.5 78.9 69.6 52.3 46.5 77.4 72.1 88.2 48.8 73.8 58.3 86.9 79.7 81.4 75.0 43.0 69.5 64.8 78.5 68.9 | 18-Apr-2016
Networks on Convolutional Feature Maps [?] | 68.8 | 82.8 79.0 71.6 52.3 53.7 74.1 69.0 84.9 46.9 74.3 53.1 85.0 81.3 79.5 72.2 38.9 72.4 59.5 76.7 68.1 | 17-Apr-2015
Fast R-CNN VGG16 extra data [?] | 68.4 | 82.3 78.4 70.8 52.3 38.7 77.8 71.6 89.3 44.2 73.0 55.0 87.5 80.5 80.8 72.0 35.1 68.3 65.7 80.4 64.2 | 17-Apr-2015
segDeepM [?] | 66.4 | 81.1 75.6 65.7 47.7 46.1 72.1 69.1 86.8 43.0 71.0 53.0 84.9 76.3 78.8 68.8 40.0 70.0 61.8 71.4 64.1 | 04-Mar-2016
UMICH_FGS_STRUCT [?] | 66.4 | 82.9 76.1 64.1 44.6 49.4 70.3 71.2 84.6 42.7 68.6 55.8 82.7 77.1 79.9 68.7 41.4 69.0 60.0 72.0 66.2 | 20-Jun-2015
YOLOv2-resnet-18-101 [?] | 64.1 | 80.2 71.8 67.7 50.5 45.3 72.3 71.9 79.6 45.5 61.9 47.6 77.1 66.6 75.1 75.4 42.4 63.3 55.6 73.7 58.0 | 18-May-2022
NUS_NIN_c2000 [?] | 63.8 | 80.2 73.8 61.9 43.7 43.0 70.3 67.6 80.7 41.9 69.7 51.7 78.2 75.2 76.9 65.1 38.6 68.3 58.0 68.7 63.3 | 30-Oct-2014
BabyLearning [?] | 63.2 | 78.0 74.2 61.3 45.7 42.7 68.2 66.8 80.2 40.6 70.0 49.8 79.0 74.5 77.9 64.0 35.3 67.9 55.7 68.7 62.6 | 12-Nov-2014
NUS_NIN [?] | 62.4 | 77.9 73.1 62.6 39.5 43.3 69.1 66.4 78.9 39.1 68.1 50.0 77.2 71.3 76.1 64.7 38.4 66.9 56.2 66.9 62.7 | 30-Oct-2014
R-CNN (bbox reg) [?] | 62.4 | 79.6 72.7 61.9 41.2 41.9 65.9 66.4 84.6 38.5 67.2 46.7 82.0 74.8 76.0 65.2 35.6 65.4 54.2 67.4 60.3 | 26-Oct-2014
YOLOv1 [?] | 61.4 | 75.6 69.5 63.4 42.4 27.6 72.1 59.8 85.8 39.8 66.5 48.2 81.5 75.7 73.5 67.2 31.7 60.9 55.1 75.9 55.2 | 16-Sep-2021
R-CNN [?] | 59.2 | 76.8 70.9 56.6 37.5 36.9 62.9 63.6 81.1 35.7 64.3 43.9 80.4 71.6 74.0 60.0 30.8 63.4 52.0 63.5 58.7 | 25-Oct-2014
YOLO [?] | 57.9 | 77.0 67.2 57.7 38.3 22.7 68.3 55.9 81.4 36.2 60.8 48.5 77.2 72.3 71.3 63.5 28.9 52.2 54.8 73.9 50.8 | 06-Nov-2015
Feature Edit [?] | 56.3 | 74.6 69.1 54.4 39.1 33.1 65.2 62.7 69.7 30.8 56.0 44.6 70.0 64.4 71.1 60.2 33.3 61.3 46.4 61.7 57.8 | 06-Sep-2014
CPE [?] | 54.6 | 73.1 75.4 60.0 25.1 35.0 62.8 55.2 73.8 28.9 66.3 30.0 69.7 70.1 76.4 36.3 32.3 53.2 44.1 62.4 62.1 | 14-Sep-2021
WithoutFR_CEP [?] | 54.3 | 74.3 75.0 56.8 27.8 29.8 62.6 55.1 76.8 30.4 64.5 29.4 71.8 67.7 77.6 31.4 33.0 56.3 44.3 63.3 58.9 | 23-Sep-2021
CEP [?] | 53.3 | 76.3 74.2 61.4 32.4 35.5 65.4 61.4 79.0 25.1 68.5 22.7 75.6 70.0 76.7 4.0 28.2 56.2 29.8 66.7 57.0 | 16-Mar-2021
R-CNN (bbox reg) [?] | 53.3 | 71.8 65.8 52.0 34.1 32.6 59.6 60.0 69.8 27.6 52.0 41.7 69.6 61.3 68.3 57.8 29.6 57.8 40.9 59.3 54.1 | 13-Mar-2014
ss-pcl [?] | 52.6 | 74.4 75.0 58.7 36.2 34.9 63.0 64.3 70.0 22.8 67.9 30.5 66.8 72.7 76.8 4.7 24.3 60.7 51.3 43.8 53.2 | 18-Dec-2021
ss-pcl [?] | 52.3 | 74.1 75.6 58.7 34.5 35.5 63.8 64.6 65.7 22.8 66.5 30.5 65.9 72.2 76.8 5.0 24.3 61.4 52.3 43.9 51.7 | 20-Dec-2021
ss-pcl [?] | 52.3 | 74.9 75.5 59.5 32.1 35.3 63.0 64.5 68.2 21.9 66.7 30.8 67.6 72.0 76.3 5.1 24.3 60.4 54.3 41.2 51.9 | 20-Dec-2021
ss-pcl [?] | 52.2 | 75.6 74.7 56.8 36.0 33.8 63.8 65.3 65.1 22.8 66.4 29.6 65.4 72.6 76.2 3.2 26.0 61.2 53.2 43.1 53.1 | 15-Dec-2021
ss-pcl [?] | 51.6 | 74.6 74.4 57.4 31.4 35.3 62.5 65.2 66.4 22.4 66.3 30.0 64.8 71.8 75.2 4.0 25.7 60.5 54.2 40.3 50.2 | 20-Dec-2021
SDS [?] | 50.7 | 69.7 58.4 48.5 28.3 28.8 61.3 57.5 70.8 24.1 50.7 35.9 64.9 59.1 65.8 57.1 26.0 58.8 38.6 58.9 50.7 | 21-Jul-2014
R-CNN [?] | 49.6 | 68.1 63.8 46.1 29.4 27.9 56.6 57.0 65.9 26.5 48.7 39.5 66.2 57.3 65.4 53.2 26.2 54.5 38.1 50.6 51.6 | 30-Jan-2014
EAC-Net [?] | 49.1 | 68.9 73.1 51.2 34.7 33.0 61.1 58.9 44.9 27.8 66.0 25.4 60.6 62.8 77.7 3.0 29.8 54.9 33.6 53.5 61.4 | 16-Nov-2021
FSD [?] | 48.7 | 75.7 72.3 52.5 27.2 35.9 63.3 59.3 56.2 25.8 58.6 28.1 61.8 44.1 75.1 2.8 24.2 55.2 38.5 59.1 58.4 | 08-May-2021
SGCM [?] | 47.6 | 60.1 68.5 51.7 26.4 27.0 60.3 57.1 66.9 23.0 57.4 25.0 52.2 58.4 71.6 15.1 27.9 54.0 35.9 55.6 59.0 | 09-Mar-2019
YOLOv1-resnet-18-50 [?] | 47.3 | 66.7 56.1 49.5 25.9 17.8 60.2 45.9 70.6 26.1 43.0 41.1 67.5 59.2 62.4 47.6 17.6 35.6 45.7 64.6 42.4 | 13-May-2022
WSODE [?] | 46.9 | 75.3 70.2 51.4 29.0 31.6 60.1 57.7 21.1 22.4 59.5 28.5 34.0 64.8 74.8 6.8 27.9 53.1 45.1 62.9 62.1 | 17-Dec-2020
Poselets2 [?] | - | - - - - - - - - - - - - - - 58.7 - - - - - | 06-Jun-2014
Metu_Unified_Net [?] | - | - - - - - - - - - - - - - - 89.9 - - - - - | 10-Mar-2018
Geometric shape [?] | - | - 3.8 - - - - - - - - - - - - - - - - - - | 19-Jun-2016
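Every number above is a per-class average precision: detections are ranked by confidence, matched to ground truth at IoU >= 0.5, and the area under the resulting precision/recall curve is taken. A minimal sketch of the interpolated-area computation used by later VOC devkits (the input arrays are illustrative; this is not the official evaluation code):

```python
import numpy as np

def voc_ap(scores, is_true_positive, num_gt_boxes):
    """AP as the interpolated area under the precision/recall curve.

    scores:           confidence of each detection for one class
    is_true_positive: 1 if the detection matched an unclaimed ground-truth
                      box with IoU >= 0.5, else 0
    num_gt_boxes:     total ground-truth boxes for the class
    """
    order = np.argsort(-np.asarray(scores))            # rank by confidence
    tp = np.asarray(is_true_positive, float)[order]
    recall = np.cumsum(tp) / num_gt_boxes
    precision = np.cumsum(tp) / (np.arange(len(tp)) + 1)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 1, 0, -1):              # precision envelope
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    changed = np.where(mrec[1:] != mrec[:-1])[0]       # where recall steps
    return np.sum((mrec[changed + 1] - mrec[changed]) * mpre[changed + 1])
```

The mean column is simply the mean of the 20 per-class APs.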

Abbreviations

Title | Method | Affiliation | Contributors | Description | Date
AInnovation Detection | AInnoDetection | AInnovation Co., Ltd. | Faen Zhang, Jiahong Wu, Zhizheng Yang, Haotian Cao, Jianfei Song, Xinyu Fan | All models are pre-trained on MS COCO and then fine-tuned on the VOC2012 dataset. We use ResNeXt152 + DCN + FPN + cascade. Multi-scale train and test techniques are used during training and inference. An ensemble of four models is used. | 2019-07-01 03:52:50
ATLDET | ATLDET | ATL (Alibaba Turing Lab) | Xuan Jin | ATLDET is pre-trained on ImageNet, fine-tuned on MS COCO, and then fine-tuned on Pascal VOC. Instance-segmentation features are concatenated. | 2018-08-13 08:13:19
ATLDETv2 | ATLDETv2 | ATL (Alibaba Turing Lab) | Xuan Jin, Wei Su, Rong Zhang, Yuan He, Hui Xue | ATLDETv2 is pre-trained on ImageNet and then fine-tuned on MS COCO. Beyond fine-tuning, domain-adaptive methods provide better results when we train on Pascal VOC. The backbone is ResNeXt152_32x8d with DCN. A multi-scale strategy and soft-NMS are also used. The final results come from an ensemble of 2 models. | 2019-10-26 06:49:57
Accurate Detection | AccurateDET | 4Paradigm Data Intelligence Lab | Fengfu Li | I use an adaptive method to generate high-quality proposals for Faster RCNN. The backbone network is ResNeXt101 + DCN + FPN. Multi-scale and random-flip techniques are used during training, while in the testing phase only the flip technique is used. The mAP on the VOC07 test set is about 92.8. | 2019-06-17 10:32:01
Accurate Detection (ensemble) | AccurateDET (ensemble) | 4Paradigm Data Intelligence Lab | Fengfu Li | I use an adaptive method to generate high-quality proposals for Faster RCNN. The backbone network is ResNeXt101 + DCN + FPN. Multi-scale and random-flip techniques are used during training, while in the testing phase only the flip technique is used. With an ensemble of three methods, the mAP on the VOC07 test set is about 93.8. | 2019-06-18 01:25:20
DCN with SoftNMS and FF-SSD ensemble | Ali_DCN_SSD_ENSEMBLE | Alibaba Group, Machine Intelligence Technology Lab | Hongbin Wang, Zhibin Wang, Hao Li | All models are pre-trained on the ImageNet 1K dataset and then fine-tuned on the COCO detection dataset. Deformable R-FCN is enhanced by Soft-NMS; SSD is enhanced by feature fusion. The ensemble version uses both DCN and SSD. | 2018-05-28 03:02:24
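Soft-NMS, cited by this entry and several others (ATLDETv2, FXRCNN, Rank of experts), replaces the hard suppression step of NMS with a score decay. A minimal sketch of the Gaussian variant from Bodla et al., "Soft-NMS" (2017); the sigma and threshold values are the paper's defaults, not values reported by these submissions:

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    lt = np.maximum(box[:2], boxes[:, :2])
    rb = np.minimum(box[2:], boxes[:, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[:, 0] * wh[:, 1]
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Decay the scores of boxes overlapping the current top detection
    instead of discarding them outright. Inputs are numpy arrays."""
    boxes = boxes.astype(float).copy()
    scores = scores.astype(float).copy()
    kept_boxes, kept_scores = [], []
    while scores.size and scores.max() > score_thresh:
        i = int(scores.argmax())
        kept_boxes.append(boxes[i]); kept_scores.append(scores[i])
        boxes = np.delete(boxes, i, axis=0)
        scores = np.delete(scores, i)
        if scores.size:
            scores *= np.exp(-iou(kept_boxes[-1], boxes) ** 2 / sigma)
    return np.array(kept_boxes), np.array(kept_scores)
```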
AngDet (ResV1-101, VOC07++12, One-Stage, MS-Test) | AngDet | NJUST | Ang Li | We concentrate on full attention to enhance the SSD network. | 2018-10-04 13:47:12
AngDet | AngDet | NJUST | Ang Li | AngDet | 2018-10-21 03:33:19
Computational Baby Learning | BabyLearning | National University of Singapore | Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan | This entry is an implementation of the framework described in "Computational Baby Learning" (http://arxiv.org/abs/1411.2861). We build a computational model to interpret and mimic the baby learning process, based on prior knowledge modelling, exemplar learning, and learning with video contexts. Training data: (1) we used only two positive instances along with ~20,000 unlabelled videos to train the detector for each object category; (2) we used data from ILSVRC 2012 to pre-train the Network in Network [1] and fine-tuned the network with our newly mined instances. [1] Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. In ICLR 2014. | 2014-11-12 03:50:50
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:22:43
Fully conv net for segmentation and detection | BlitzNet | Inria | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Trained on VOC07 trainval + VOC12 trainval. | 2017-03-17 18:24:29
FCN | BlitzNet300 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 300. Operates at 24 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:57:45
FCN | BlitzNet512 | INRIA | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid | CNN for joint segmentation and detection (based on SSD). Input resolution 512. Operates at 19 FPS. Trained on VOC07 trainval + VOC12 trainval, pretrained on COCO. | 2017-07-19 13:38:53
detection | CEP | zzu | hu | detection | 2021-03-16 04:03:03
WSOD | CPE | zzu | suqihu | WSOD | 2021-09-14 08:00:43
CU-SuperDet: HTC+DCN+SNIPER | CU-SuperDet | ChinaUnicom-AI | Zhiang Hao, Shiguo Lian | SuperDet uses MS COCO as the pre-training set and fine-tunes on the VOC dataset. ResNeXt-101 is used as the backbone network. The network adopts a DCN + HTC structure. Multi-scale training and random flipping are used during training. Multi-scale fusion is applied on the test set, and the final result is fused with the result of a SNIPER model. | 2020-01-16 11:00:44
Cascaded deeply supervised CrystalNet | Cascaded_CrystalNet | DevABeyond | Jian Liang | A cascaded, deeply supervised CrystalNet derived from a tailored Faster R-CNN network, incorporating a transform branch between stages. | 2017-12-23 14:30:19
DC-SPP-YOLO | DC-SPP-YOLO | Beijing University of Chemistry and Technology | Zhanchao Huang | Dense-connection and spatial-pyramid-pooling YOLO. Base network: darknet19; trained on VOC 2007+2012 trainval; no COCO. | 2018-10-08 12:45:43
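The spatial-pyramid-pooling part of the entry above is commonly implemented as a block that max-pools the same feature map at several kernel sizes (stride 1) and concatenates the results along the channel axis. A hedged PyTorch sketch; the kernel sizes (5, 9, 13) are the usual YOLO choice and an assumption here, since the submission does not state its configuration:

```python
import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    """Max-pool one feature map at several kernel sizes (stride 1, 'same'
    padding) and concatenate the results with the input along channels."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes)

    def forward(self, x):                              # x: (N, C, H, W)
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

# same spatial size, 4x the channels:
out = SPPBlock()(torch.randn(1, 512, 13, 13))          # -> (1, 2048, 13, 13)
```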
Dense convolutional and feature-fused detector | DCFF-Net | Huazhong University of Science and Technology | Jingjuan Guo, Caihong Yuan, Zhiqiang Zhao, Ping Feng | Our network architecture is motivated by DSOD and does not need pre-training on ImageNet; it is trained from scratch. We simplify the basic framework and introduce a novel feature-fusion module that can extract more contextual feature maps. | 2018-07-03 08:16:35
Dense convolutional and feature-fused detector | DCFF-Net | Huazhong University of Science and Technology | Jingjuan Guo, Caihong Yuan, Zhiqiang Zhao, Ping Feng | Our network architecture is motivated by DSOD and does not need pre-training on ImageNet; it is trained from scratch. We simplify the basic framework and introduce a novel feature-fusion module that can extract more contextual feature maps. | 2018-06-29 02:57:22
DDT augmentation | DDT augmentation based on web images | Nanjing University, The University of Adelaide | Xiu-Shen Wei, Chen-Lin Zhang, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou | This entry is based on Faster RCNN and our web-based object detection dataset (i.e., WebVOC [R1]) as an external dataset. Specifically, for WebVOC we first collect web images from the Internet via Google using the categories of PASCAL VOC. In total, we collect 12,776 noisy web images, a scale similar to the original PASCAL VOC dataset. We then employ our Deep Descriptor Transforming (DDT) method [R1] to remove the noisy images and, moreover, automatically annotate object bounding boxes; 10,081 images with their automatically generated boxes remain as valid images. For training detection models, we first fine-tune VGG-16 on WebVOC; the WebVOC fine-tuned model is then used for the VOC task. The training data of VOC is VOC 2007 trainval + test and VOC 2012 trainval. [R1] Xiu-Shen Wei, Chen-Lin Zhang, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou. Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming, arXiv:1707.06397, 2017. | 2017-07-26 10:55:14
An Ensemble of CNNs with COCO Augmentation | DEEP_ENSEMBLE_COCO | Australian National University (ANU) | Jian (Edison) Guo, Stephen Gould | We mainly follow the R-CNN pipeline with the following innovations. 1) We trained an ensemble of CNNs for feature extraction; the ensemble consists of GoogleNet and VGG-16 networks trained on different subsets of PASCAL VOC 2007/2012 and COCO. 2) We trained an ensemble of one-vs-all SVMs and bounding-box regressors corresponding to each model of the CNN ensemble. 3) We averaged the SVM scores across the ensemble and sent the averaged scores through the post-processing pipeline to obtain the indices of the selective-search boxes retained after post-processing. 4) With the box indices, we ran box regression for each box for each model in the ensemble and then averaged the boxes across the ensemble to obtain the final results. (Please see http://arxiv.org/abs/1506.07224.) | 2015-05-03 15:40:02
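Steps 3 and 4 of the entry above amount to simple averaging across the ensemble. A minimal sketch under the assumption that every model scores and regresses the same set of selective-search boxes (array names and shapes are illustrative):

```python
import numpy as np

def ensemble_average(scores_per_model, boxes_per_model):
    """scores_per_model: list of (num_boxes, num_classes) SVM score arrays,
    one per ensemble member; boxes_per_model: list of (num_boxes, 4) arrays,
    each model's regressed version of the same proposals."""
    avg_scores = np.mean(scores_per_model, axis=0)   # step 3: average scores
    avg_boxes = np.mean(boxes_per_model, axis=0)     # step 4: average boxes
    return avg_scores, avg_boxes                     # then post-process / NMS
```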
DES512_COCO | DES512_COCO | JHU | Zhishuai Zhang | DES512_COCO | 2018-03-09 23:23:45
DOH_512 (single VGG16, COCO+VOC07++12) | DOH_512 (single VGG16, COCO+VOC07++12) | CVIP, Korea Univ., Korea | Younghyun Kim et al. | "DOH: Decoupled Object Detection Network via Hidden State Top-Down". DOH consists of a novel Hidden State Top-Down (HSTD) architecture with a Recursive Prediction Module (RPM). In this work, the multi-scale features are regarded as sequential data and are integrated using a hidden state, akin to a recurrent neural network, which is suitable for handling sequential data. Moreover, HSTD decouples the overlapping functions of feature extraction and prediction. To correct the results, RPM derives an attention mask from the result calculated in the previous iteration and repeats the process of refining the feature map for prediction using the attention mask. We train our network on VOC07++12 and COCO (COCO use_difficult_gt: false). | 2017-11-07 02:40:26
Learning DSOD from Scratch | DSOD300 | Intel Labs China | Zhiqiang Shen, Jianguo Li, Zhuang Liu, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue, Thomas Huang | We train DSOD for object detection. The training data is VOC 2007 trainval + test and VOC 2012 trainval, without ImageNet pre-trained models. The input image size is 300x300. | 2017-03-17 00:42:36
Learning DSOD from Scratch | DSOD300+ | Intel Labs China | Zhiqiang Shen, Jianguo Li, Zhuang Liu, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue, Thomas Huang | We train DSOD for object detection. The training data is VOC 2007 trainval + test, VOC 2012 trainval and MS COCO, without ImageNet pre-trained models. The input image size is 300x300. | 2017-03-16 23:06:59
DSSD513 ResNet-101 07++12 | DSSD513_ResNet101_07++12 | UNC Chapel Hill, Amazon | Cheng-Yang Fu*, Wei Liu*, Ananth Ranga, Ambrish Tyagi, Alexander C. Berg (* equal contribution) | We first train an SSD513 model using ResNet-101 on VOC07 trainval + test and VOC12 trainval for the 20 PASCAL classes. We then use that SSD513 as the pre-trained model to train DSSD513 on the same training data. We only test a single model on a single-scale image (513x513) and do not have any post-processing steps. Details can be found at https://arxiv.org/abs/1701.06659. | 2017-02-15 18:02:47
Deformable R-FCN, ResNet-101 (VOC+COCO) | Deformable R-FCN, ResNet-101 (VOC+COCO) | Microsoft Research Asia | Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei | This entry is based on Deformable Convolutional Networks [a], R-FCN [b] and ResNet-101 [c]. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. OHEM and multi-scale training are applied to our model. Multi-scale testing and horizontal flipping are applied during inference. [a] "Deformable Convolutional Networks", Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei (https://arxiv.org/abs/1703.06211). [b] "R-FCN: Object Detection via Region-based Fully Convolutional Networks", Jifeng Dai, Yi Li, Kaiming He, Jian Sun (http://arxiv.org/abs/1605.06409). [c] "Deep Residual Learning for Image Recognition", Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (https://arxiv.org/abs/1512.03385). | 2017-03-23 03:46:36
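Many of the stronger entries (this one, R-FCN, FOCAL_DRFCN, MONet) apply horizontal flipping at test time. A minimal sketch of the flip-and-merge step for an H x W x C image; `model`, returning ([x1, y1, x2, y2] boxes, scores), is an assumed interface, and a final (Soft-)NMS pass would normally follow:

```python
import numpy as np

def detect_with_flip(model, image):
    """Run the detector on the image and its horizontal mirror, map the
    mirrored detections back to original coordinates, and pool them."""
    boxes, scores = model(image)
    flipped_boxes, flipped_scores = model(image[:, ::-1])   # flip width axis
    w = image.shape[1]
    unflipped = flipped_boxes.copy()
    unflipped[:, 0] = w - flipped_boxes[:, 2]               # x1 <- w - x2
    unflipped[:, 2] = w - flipped_boxes[:, 0]               # x2 <- w - x1
    return (np.concatenate([boxes, unflipped]),
            np.concatenate([scores, flipped_scores]))
```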
DenseSSD-300 07++12 | DenseSSD-300 07++12 | CASIA | Pei Xu | DenseSSD-300 07++12 | 2017-11-29 03:38:58
DenseSSD-512 07++12 | DenseSSD-512 07++12 | CASIA | Pei Xu | DenseSSD-512 07++12 | 2017-12-05 09:54:59
EAC-Net | EAC-Net | Jiangnan University | Wenlong Gao | EAC-Net | 2021-11-16 01:03:31
Object detector with enriched global context | EGCI-Net | Huazhong University of Science and Technology | Jingjuan Guo, Caihong Yuan, Zhiqiang Zhao, Ping Feng | Our network architecture is motivated by DSOD and does not need pre-training on ImageNet; it is trained from scratch. We simplify the basic framework and introduce a novel pyramid feature-pooling module that can extract more contextual feature maps. | 2019-02-26 14:06:46
ESNet | ESNet | PKU | Zhisheng Lu | A new feature pyramid. | 2019-02-26 09:16:02
ESNet | ESNet | PKU | luzhisheng | A new feature pyramid. | 2019-02-08 12:37:19
ESNet | ESNet | PKU | luis | A new feature pyramid. | 2019-02-23 06:29:13
FFD300+ | FFD_07++12 | Zhejiang University | Zuwei Huang | We train FFD for object detection. The training data is VOC 2007 trainval + test and VOC 2012 trainval, without ImageNet pre-trained models. The input image size is 300x300. | 2018-04-16 07:50:25
FF_CSSD512(07++12+coco), ResNet101 | FF_CSSD(VOC+COCO, one-stage, single model) | Alibaba Group, Machine Intelligence Technology Lab | Zhibin Wang, Hao Li | The FF_CSSD model with ResNet101 as the backbone is enhanced by feature fusion and context information. The model is pre-trained on the ImageNet 1K classification training set, fine-tuned on the COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale testing is applied during inference. | 2018-05-28 15:06:49
Deformable R-FCN, Focal Loss, ResNet152 (VOC+COCO) | FOCAL_DRFCN(VOC+COCO, single model) | PingAn AI Lab | Zhuzhenwen | This entry is based on ResNet-152, Deformable R-FCN and Focal Loss. The model is pre-trained on the ImageNet training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale training is applied to our model. Multi-scale testing and horizontal flipping are applied during inference. | 2018-03-01 03:09:20
FSD for weakly supervised object detection | FSD | Jiangnan University | Wenlong Gao | FSD for weakly supervised object detection. | 2021-05-08 15:17:03
FSSD300 | FSSD300 | Beihang University | Li Zuoxin | Feature-fusion SSD based on VGG16. It can run at 68 FPS on a single 1080Ti. | 2017-11-10 03:05:03
FSSD512 | FSSD512 | Beihang University | Li Zuoxin | Feature-fusion SSD with 512x512 input images. It can run at 35 FPS on a 1080Ti. (VOC07++12+COCO) | 2017-11-07 13:46:58
FXRCNN | FXRCNN (single model) | Yi+AI Lab | Hang Zhang, Boyuan Sun, Zhaonan Wang, Hao Zhao, ZiXuan Guan, Wei Miao | 1) Our model is pre-trained on ImageNet and fine-tuned on MS COCO; 2) then fine-tuned on Pascal VOC; 3) ResNeXt with FPN is used as our backbone; 4) Soft-NMS is used in post-processing; 5) we also use a multi-scale training strategy. | 2018-07-13 03:54:02
Fast R-CNN with YOLO Rescoring | Fast R-CNN + YOLO | University of Washington | Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi | We use the YOLO detection method to rescore the bounding boxes from Fast R-CNN. This helps mitigate false background detections and improves overall performance. For more information and example code see http://pjreddie.com/darknet/yolo/. | 2015-11-06 08:03:59
Fast R-CNN VGG16 extra data | Fast R-CNN VGG16 extra data | Microsoft Research | Ross Girshick | Fast R-CNN is a new algorithm for training R-CNNs. The training process is a single fine-tuning run that jointly trains for softmax classification and bounding-box regression. Training took ~22 hours on a single GPU and testing takes ~330 ms/image. A tech report describing the method is forthcoming; open-source code will be released. This entry was trained on VOC 2012 train+val union with VOC 2007 train+val+test. | 2015-04-17 17:32:25
Faster-rcnn Resnet50m soft-nms linear | Fast-rcnn | HIT | ixhorse | pass | 2017-10-24 02:43:14
Faster RCNN baseline (VOC+COCO) | Faster RCNN baseline (VOC+COCO) | Microsoft Research | Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun | This entry is a baseline implementation of the system described in "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" (arXiv 2015). We use an ImageNet-pre-trained model (VGG-16) and fine-tune it on the COCO trainval detection task. The COCO fine-tuned model is then used for the VOC task. The training data of VOC is VOC 2007 trainval + test and VOC 2012 trainval. The entire system takes <200 ms per image, including proposal and detection. | 2015-11-24 03:56:56
Faster RCNN, ResNet (VOC+COCO) | Faster RCNN, ResNet (VOC+COCO) | Microsoft Research | Shaoqing Ren, Xiangyu Zhang, Kaiming He, Jian Sun | This entry is based on an improved Faster R-CNN system [a] and an extremely deep Residual Net [b] with a depth of over 100 layers. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. [a] "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. NIPS 2015. [b] "Deep Residual Learning for Image Recognition", Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Tech report 2015. | 2015-12-10 14:47:49
FM+CRPN+global context | Faster+resnet101+07++12 | Harbin Institute of Technology | Chu Mengdie | Our work is based on Faster R-CNN and ResNet101. (1) FPN is used to merge features. (2) Context features are extracted from the entire image's feature maps using an ROI pooling layer and then merged with the region's feature maps. (3) A cascade RPN is used to fine-tune the bounding boxes. (4) The model is pre-trained on the 1000-class ImageNet classification training set and fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets only. | 2017-11-14 03:07:04
FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | 2017-07-23 13:42:44
FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | FasterRCNN | 2017-07-23 13:38:24
FasterRcnn-ResNeXt101(COCO+07++12, single model) | FasterRcnn-ResNeXt101(COCO+07++12, single model) | Beijing University of Posts and Telecommunications (BUPT-PRIV) | Lu Yang, Qing Song, Zhihui Wang, Min Yang | Our network is based on ResNeXt101-32x4d and Faster R-CNN; multi-scale training, multi-scale testing and image flipping are applied in this submission. We first train our network on the COCO and VOC0712 trainval sets, then fine-tune on the VOC07 trainval+test and VOC12 trainval sets. | 2017-05-04 10:57:08
Diamond Frame Bicycle Recognition | Geometric shape | National Cheng Kung University | Chung-Ping Young, Yen-Bor Lin, Kuan-Yu Chen | A detector for diamond-frame bicycles in side-view images is proposed, based on the observation that a bicycle consists of two wheels in the form of ellipses and a frame in the form of two triangles. Through the design of geometric constraints on the relationship between the triangles and ellipses, the computation is fast compared to feature-based classifiers. Moreover, no training process is necessary and only a single image is required for our algorithm. Experimental results show the practicability and performance of the proposed bicycle model and detection algorithm. | 2016-06-19 10:06:33
Hierarchical Feature Model | HFM_VGG16 | Inha University | Byungjae Lee, Enkhbayar Erdenee, Sungyul Kim, Phill Kyu Rhee | We are motivated by the observations that many object detectors are degraded in performance due to ambiguities in inter-class and variations in intra-class appearances, and that deep features extracted from visual objects show a strong hierarchical clustering property. We partition the deep features into unsupervised super-categories at the inter-class level and augmented categories at the object level to discover deep-feature-driven knowledge. We build a Hierarchical Feature Model (HFM) using the Latent Topic Model (LTM) algorithm, ensemble one-versus-all SVMs at each node, and constitute a hierarchical classification ensemble (HCE). In the detection phase, object categorization and localization are processed based on the hypotheses of the HCE with a hierarchical mechanism. | 2016-03-21 10:59:33
Faster R-CNN with cascade RPN and global context | HIK_FRCN | Hikvision Research Institute | Qiaoyong Zhong, Chao Li, Yingying Zhang, Di Xie, Shiliang Pu | Our work on object detection is based on Faster R-CNN. We design and validate the following improvements. * Better network: we find that the identity-mapping variant of ResNet-101 is superior for object detection over the original version. * Better RPN proposals: a novel cascade RPN is proposed to refine proposals' scores and locations; a constrained neg/pos anchor ratio further increases proposal recall dramatically. * Pretraining matters: we find that a pretrained global-context branch increases mAP by over 3 points. * Training strategies: to attack the imbalance problem, we design a balanced sampling strategy over different classes; other training strategies, like multi-scale training and online hard example mining, are also applied. * Testing strategies: during inference, multi-scale testing, horizontal flipping and weighted box voting are applied. Based on an ImageNet DET pretrained model, we first fine-tune on the COCO+VOC dataset, then fine-tune on the VOC dataset only. | 2016-09-19 05:50:00
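The "weighted box voting" named in this entry (and in MCC_FRCN below) is usually the scheme of Gidaris & Komodakis: each box kept by NMS is replaced by the score-weighted average of all pre-NMS detections that overlap it. A minimal sketch, reusing iou() from the Soft-NMS sketch above; the 0.5 threshold is illustrative, and the exact variant used by HIK_FRCN is not specified:

```python
import numpy as np

def box_voting(kept_boxes, all_boxes, all_scores, iou_thresh=0.5):
    """Refine NMS survivors with a score-weighted average of overlapping
    detections. Each kept box overlaps itself (IoU 1.0), so the weight sum
    is positive whenever kept_boxes is a subset of all_boxes."""
    voted = []
    for box in kept_boxes:
        w = all_scores * (iou(box, all_boxes) >= iou_thresh)
        voted.append((w[:, None] * all_boxes).sum(axis=0) / w.sum())
    return np.array(voted)
```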
HyperNet_SP | HyperNet_SP | Intel Labs China | Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun | We train HyperNet for object detection. An ImageNet-pre-trained model (VGG-16) is used for training HyperNet, both for proposal and detection. The training data is VOC 2007 trainval + test and VOC 2012 trainval. The proposal number is 100 for each image. This is a sped-up version of the basic HyperNet: we move the 3×3×4 convolutional layer in front of the ROI pooling layer. This slight change has two advantages: (a) the channel number of the Hyper Feature maps is significantly reduced (from 126 to 4); (b) the sliding-window classifier is simpler (from Conv-FC to FC). Both characteristics speed up the proposal-generation process. The speed is 5 fps using VGG16. | 2015-10-28 07:36:14
HyperNet_VGG16 | HyperNet_VGG | Intel Labs China | Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun | We train HyperNet for object detection. An ImageNet-pre-trained model (VGG-16) is used for training HyperNet, both for proposal and detection. The training data is VOC 2007 trainval + test and VOC 2012 trainval. The proposal number is 100 for each image. | 2015-10-12 02:52:03
Implicit+Sink+Dilation | ICT_360_ISD | Institute of Computing Technology, Chinese Academy of Sciences | Yu Li, Min Lin, Sheng Tang, Shuicheng Yan | We update the previous method. | 2016-11-18 03:34:32
Improved Feature RCNN | IFRN_07+12 | Tsinghua MIG | Haofeng Zou, Guiguang Ding | We add improved global and local features to RCNN and use an iterative detection method. | 2016-06-07 07:47:00
Inside-Outside Net | ION | Cornell University | Sean Bell, Larry Zitnick, Kavita Bala, Ross Girshick | Our "Inside-Outside Net" (ION) detector will be described soon in an arXiv submission. The method is based on Fast R-CNN with VGG16 and was trained on VOC 2012 train+val union VOC 2007 train+val (not VOC 2007 test), as well as the segmentations from SDS (Simultaneous Detection and Segmentation) on the training-set images. We use the selective-search boxes published with Fast R-CNN. Runtime: ~1.15 s/image on a Titan X GPU (excluding proposal generation). | 2015-11-23 04:37:20
Light R-CNN | Light R-CNN | IGD | Peng | Light R-CNN | 2020-01-07 16:05:43
Light R-CNN | Light R-CNN | IGD | Light R-CNN | Light R-CNN | 2020-02-06 06:06:46
Light R-CNN | Light R-CNN | IGD | Peng | Light R-CNN with 400/500/600/700/800/1500 proposals. | 2020-01-15 15:17:15
Improving Localization Accuracy for Object Detection | LocNet | ENPC | Spyros Gidaris, Nikos Komodakis | We propose a novel object localization methodology aimed at boosting the localization accuracy of state-of-the-art object detection systems. Our model, given a search region, aims at returning the bounding box of an object of interest inside this region. To accomplish its goal, it relies on assigning conditional probabilities to each row and column of this region, where these probabilities provide useful information regarding the location of the boundaries of the object inside the search region and allow the accurate inference of the object bounding box under a simple probabilistic framework. For implementing our localization model, we make use of a convolutional neural network architecture that is properly adapted for this task, called LocNet. We show experimentally that LocNet achieves a very significant improvement on mAP for high IoU thresholds on the PASCAL VOC2007 test set, and that it can very easily be coupled with recent state-of-the-art object detection systems, helping them boost their performance. | 2015-11-06 22:59:43
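The row/column probabilities LocNet assigns can be turned into a box with a simple maximum-likelihood search over intervals. A hedged sketch of the "in-out" inference idea, applied once to the columns and once to the rows; this is a paraphrase of the paper's probabilistic framework, not the authors' code:

```python
import numpy as np

def ml_interval(p):
    """Given per-column (or per-row) probabilities of being inside the box,
    return the interval [l, r] maximizing the in/out log-likelihood.
    Simple O(n^2) search for illustration."""
    p = np.asarray(p, float)
    eps = 1e-6
    log_in, log_out = np.log(p + eps), np.log(1 - p + eps)
    base = log_out.sum()                 # likelihood if everything is outside
    gain = log_in - log_out              # benefit of flipping a column to 'in'
    best, best_lr = -np.inf, (0, 0)
    for l in range(len(p)):
        cum = base + np.cumsum(gain[l:]) # intervals [l, l], [l, l+1], ...
        r = int(cum.argmax())
        if cum[r] > best:
            best, best_lr = cum[r], (l, l + r)
    return best_lr

# (x1, x2) = ml_interval(p_columns); (y1, y2) = ml_interval(p_rows)
```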
FRCN with multi-level feature and global context | MCC_FRCN, ResNet101, 07++12 | Harbin Institute of Technology Shenzhen Graduate School | Wang Yuan, You Lei | Our work is based on Faster R-CNN and ResNet101. (1) The low-level features are down-sampled using a convolution layer (stride 2), adjusted to the same size as the high-level features, and then merged for proposal and detection. (2) Context features are extracted from the entire image's feature maps using an ROI pooling layer and then merged with the region's feature maps. (3) Weighted box voting is applied. The model is pre-trained on the 1000-class ImageNet classification training set and fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets only. | 2016-11-21 03:34:12
span | MLANet | HFUT | Jeremy | span | 2020-03-26 23:16:44
Multi-task Network Cascades | MNC baseline | Microsoft Research Asia | Jifeng Dai, Kaiming He, Jian Sun | Our Multi-task Network Cascades (MNCs) are described in the arXiv paper "Multi-task Network Cascades for Instance-aware Semantic Segmentation" (http://arxiv.org/abs/1512.04412). The entry is based on MNCs and the VGG-16 net. The training data is VOC 2007 trainval + test and VOC 2012 trainval, augmented with the segmentation annotations from SBD ("Semantic contours from inverse detectors"). The overall runtime is 0.36 s/image on a K40 GPU. | 2015-12-15 14:06:18
MONet(VOC+COCO) | MONet(VOC+COCO) | USTC | Tao Gong | This entry is based on MONet and ResNet-101. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale training is applied to our model. Multi-scale testing and horizontal flipping are applied during inference. | 2018-04-01 13:02:52
Multi-Region & Semantic Segmentation-Aware CNN | MR_CNN_S_CNN | Universite Paris Est, Ecole des Ponts ParisTech | Spyros Gidaris, Nikos Komodakis | This entry is an implementation of the system described in "Object detection via a multi-region & semantic segmentation-aware CNN model" (http://arxiv.org/abs/1505.01749). The training data used for this entry are: 1) ImageNet for pre-training (of the 16-layer VGG-Net), 2) the VOC2012 train set for fine-tuning the deep models, and 3) VOC2012 train+val for training the detection SVMs. Abstract: "We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our recognition module by integrating it on an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model." | 2015-05-09 23:15:56
Multi-Region & Semantic Segmentation-Aware CNN | MR_CNN_S_CNN_MORE_DATA | Universite Paris Est, Ecole des Ponts ParisTech | Spyros Gidaris, Nikos Komodakis | This entry is an implementation of the system described in "Object detection via a multi-region & semantic segmentation-aware CNN model" (http://arxiv.org/abs/1505.01749). The training data used for this entry are: 1) ImageNet for pre-training (of the 16-layer VGG-Net), 2) the VOC2007 train+val and VOC2012 train+val sets for fine-tuning the deep models and training the detection SVMs. Abstract: "We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our recognition module by integrating it on an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model." | 2015-06-06 15:49:11
Multi-Task Learning for Human Pose Estimation | Metu_Unified_Net | Middle East Technical University | Salih Karagoz, Muhammed Kocabas, Emre Akbas | Multi-task learning for multi-person pose estimation, human semantic segmentation and human detection; the model performs all tasks simultaneously. We trained only with the COCO dataset; no additional data was used. | 2018-03-10 12:39:37
The NIN extension of RCNN | NUS_NIN | NUS | Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan | The entry is based on Ross Girshick's RCNN framework. We employ a single Network in Network [1] as the feature extractor to improve the model's discriminative capability. We follow Girshick's RCNN protocol for training: (1) we used data from ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 trainval; (3) we trained object-detector SVMs using 2012 trainval. This entry is used as the baseline for the journal version of [2]. [1] Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. In ICLR 2014. [2] Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. | 2014-10-30 15:47:28
The NIN extension of RCNN | NUS_NIN_c2000 | NUS | Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan | The entry is based on Ross Girshick's RCNN framework. We employ a single Network in Network [1] as the feature extractor to improve the model's discriminative capability. We follow Girshick's RCNN protocol for training: (1) we used data from ILSVRC 2012 + 1000 extra categories of ImageNet to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 trainval; (3) we trained object-detector SVMs using 2012 trainval. This entry is used as the baseline for the journal version of [2]. [1] Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. In ICLR 2014. [2] Jian Dong, Qiang Chen, Min Lin, Shuicheng Yan, Alan Yuille: Towards Unified Object Detection and Semantic Segmentation. | 2014-10-30 15:45:29
NoC | Networks on Convolutional Feature Maps | Microsoft Research | Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun | This entry is an implementation of the system described in "Object Detection Networks on Convolutional Feature Maps" (http://arxiv.org/abs/1504.06066). We train a "Network on Convolutional feature maps" (NoC) for fast and accurate object detection. Training data for this entry include: (i) ImageNet data for pre-training (VGG-16); (ii) VOC 2007 trainval and 2012 trainval for training the NoC on pooled region features. Selective Search and EdgeBoxes are used for proposals. | 2015-04-17 17:21:10
Online Hard Example Mining for Fast R-CNN (VGG16) | OHEM+FRCN, VGG16 | Carnegie Mellon University, Facebook AI Research | Abhinav Shrivastava, Abhinav Gupta, Ross Girshick | We propose an online hard example mining (OHEM) algorithm to train region-based ConvNet detectors. This entry uses OHEM to train the Fast R-CNN (FRCN) object detection system. We use an ImageNet pre-trained VGG16 model and fine-tune it on the VOC 2012 trainval dataset. For more details, please refer to "Training Region-based Object Detectors with Online Hard Example Mining", CVPR 2016 (http://arxiv.org/abs/1604.03540). | 2016-04-18 05:16:35
Online Hard Example Mining for Fast R-CNN (VGG16) | OHEM+FRCN, VGG16, VOC+COCO | Carnegie Mellon University, Facebook AI Research | Abhinav Shrivastava, Abhinav Gupta, Ross Girshick | We propose an online hard example mining (OHEM) algorithm to train region-based ConvNet detectors. This entry uses OHEM to train the Fast R-CNN (FRCN) object detection system. We use an ImageNet pre-trained VGG16 model, use OHEM to fine-tune on the COCO trainval set, and further fine-tune on the VOC 2012 trainval, VOC 2007 trainval and VOC 2007 test datasets. For more details, please refer to "Training Region-based Object Detectors with Online Hard Example Mining", CVPR 2016 (http://arxiv.org/abs/1604.03540). | 2016-04-18 05:18:28
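The core of OHEM, as referenced in both entries above, fits in a few lines: forward all RoIs of an image, then backpropagate only through the highest-loss ones. A minimal sketch (the paper additionally applies NMS over the RoIs so near-duplicate regions do not dominate the selection; the batch size of 128 is illustrative):

```python
import numpy as np

def select_hard_examples(roi_losses, batch_size=128):
    """Online hard example mining: return indices of the highest-loss RoIs;
    only these contribute to the backward pass."""
    return np.argsort(-np.asarray(roi_losses))[:batch_size]
```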
PACITYAI Detection | PACITYAIDetection | Ping An International Smart City Technology Co., Ltd. | Zhenxing Zhao | Faster RCNN (backbone: ResNeXt101 + DCN + FPN) pretrained on COCO, with multi-scale training and testing. | 2019-09-26 04:05:36
PFPNet512 VGG16 07++12+COCO | PFPNet512 VGG16 07++12+COCO | Korea University | Seung-Wook Kim, Hyong-Keun Kook, Young-Hyun Kim, Ji-Young Sun, Sang-Won Lee, and Sung-Jea Ko | Our network model constructs a feature pyramid along the network width via a spatial pyramid pooling (SPP) network. Unlike object detectors using a feature pyramid across the network height, the feature maps in the proposed feature pyramid are abstracted in parallel, so detection performance on small objects can be improved. The base network of our model is VGG-16 pretrained on the 1000-class ImageNet classification training set. From this, the model is fine-tuned on the MS COCO trainval35k set and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. | 2017-10-18 14:01:43
PFPNet512_ECCV | PFPNet512_ECCV | Korea Univ. | Seung-Wook Kim | VOC07++12+COCO; multi-testing; th: 0.01. | 2018-03-22 09:35:27
PLN | PLN | XXXX | Kaibing Chen, Xinggang Wang, Zilong Huang | Point Linking Network, trained only on the PASCAL VOC 07++12 dataset. | 2017-03-27 07:53:57
PSSNet(VOC+COCO) | PSSNet(VOC+COCO) | USTC | Tao Gong | This entry is based on PSSNet and ResNet-101. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale training is applied to our model. Multi-scale testing and horizontal flipping are applied during inference. | 2018-03-30 15:40:25
Faster R-CNN with PVANet (VOC+COCO) | PVANet+ | Intel Imaging and Camera Technology | Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park | Based on Faster R-CNN with a network designed from scratch. The network is designed for efficiency and takes less than 50 ms including proposal generation and detection (tested with 200 proposals on a Titan X). The network is pre-trained on the ImageNet classification training set and fine-tuned with the VOC2007/2012/MSCOCO trainval sets and the VOC2007 test set. Only single-scale images are used during testing. Please refer to "PVANet: Lightweight Deep Neural Networks for Real-time Object Detection" (https://arxiv.org/abs/1611.08588) and https://github.com/sanghoon/pva-faster-rcnn for more details. | 2016-10-26 09:25:07
Faster R-CNN with PVANet (VOC+COCO) | PVANet+ (compressed) | Intel Imaging and Camera Technology | Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park | Based on Faster R-CNN with a network designed from scratch. The network is designed for efficiency and takes only 32 ms (30 fps) including proposal generation and detection (tested with 200 proposals on a Titan X). The network is pre-trained on the ImageNet classification training set and fine-tuned with the VOC2007/2012/MSCOCO trainval sets and the VOC2007 test set. Only single-scale images are used during testing. Please refer to "PVANet: Lightweight Deep Neural Networks for Real-time Object Detection" (https://arxiv.org/abs/1611.08588) and https://github.com/sanghoon/pva-faster-rcnn for more details. | 2016-11-18 07:05:29
Region-based CNN | R-CNN | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524, version 5). Code is available at http://www.cs.berkeley.edu/~rbg/. Training data: (1) we used ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 trainval; (3) we trained object-detector SVMs using 2012 trainval. The same detection SVMs were used for the 2012 and 2010 results. For this submission, we used the 16-layer ConvNet from Simonyan & Zisserman instead of Krizhevsky et al.'s ConvNet. | 2014-10-25 21:09:52
Regions with Convolutional Neural Network Features | R-CNN | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524). We made two small changes relative to the arXiv tech report that are responsible for improved performance: (1) we added a small amount of context around each region proposal (16 px at the warped size) and (2) we used a higher learning rate while fine-tuning (starting at 0.001). Aside from non-maximum suppression, no additional post-processing (e.g., detector or image-classification context) was applied. Code will be made available soon at http://www.cs.berkeley.edu/~rbg/. Training data: (1) we used ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 train; (3) we trained object-detector SVMs using 2012 train+val. The same detection SVMs were used for the 2012 and 2010 results. | 2014-01-30 01:46:58
Regions with Convolutional Neural Network Features | R-CNN (bbox reg) | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524). We made two small changes relative to the arXiv tech report that are responsible for improved performance: (1) we added a small amount of context around each region proposal (16 px at the warped size) and (2) we used a higher learning rate while fine-tuning (starting at 0.001). Aside from non-maximum suppression, no additional post-processing (e.g., detector or image-classification context) was applied. Code will be made available soon at http://www.cs.berkeley.edu/~rbg/. Training data: (1) we used ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 train; (3) we trained object-detector SVMs using 2012 train+val. The same detection SVMs were used for the 2012 and 2010 results. This submission includes a simple regression from pool5 features to bounding-box coordinates. | 2014-03-13 18:08:18
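The "simple regression from pool5 features to bounding box coordinates" mentioned above learns the standard scale-invariant targets of the R-CNN paper: center shifts normalized by the proposal size, plus log-space size ratios. A minimal sketch of how those targets are computed from a proposal and its matched ground-truth box:

```python
import numpy as np

def bbox_regression_targets(proposal, ground_truth):
    """R-CNN-style regression targets (tx, ty, tw, th) for one proposal.
    Boxes are [x1, y1, x2, y2]."""
    px, py = (proposal[0] + proposal[2]) / 2, (proposal[1] + proposal[3]) / 2
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    gx, gy = (ground_truth[0] + ground_truth[2]) / 2, (ground_truth[1] + ground_truth[3]) / 2
    gw, gh = ground_truth[2] - ground_truth[0], ground_truth[3] - ground_truth[1]
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)])
```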
Region-based CNN | R-CNN (bbox reg) | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524, version 5). Code is available at http://www.cs.berkeley.edu/~rbg/. Training data: (1) we used ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 trainval; (3) we trained object-detector SVMs using 2012 trainval. The same detection SVMs were used for the 2012 and 2010 results. For this submission, we used the 16-layer ConvNet from Simonyan & Zisserman instead of Krizhevsky et al.'s ConvNet. | 2014-10-26 03:29:27
R-DAD (VOC07++12) | R-DAD (VOC07++12) | Incheon National University (INU) | Seung-Hwan Bae | We only use the VOC dataset for training (without the COCO dataset). We use our region decomposition and assembly detector (R-DAD), based on ResNet152, for this evaluation. | 2018-03-06 01:15:31
R-FCN, ResNet (VOC+COCO) | R-FCN, ResNet (VOC+COCO) | Microsoft Research | Haozhi Qi*, Yi Li*, Jifeng Dai* (* equal contribution) | This entry is based on R-FCN [a] and ResNet-101. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. OHEM and multi-scale training are applied to our model. Multi-scale testing and horizontal flipping are applied during inference. [a] "R-FCN: Object Detection via Region-based Fully Convolutional Networks", Jifeng Dai, Yi Li, Kaiming He, Jian Sun (http://arxiv.org/abs/1605.06409). | 2016-10-09 08:33:08
R-FCN, ResNet Ensemble(VOC+COCO) | R-FCN, ResNet Ensemble(VOC+COCO) | Microsoft Research | Haozhi Qi*, Yi Li*, Jifeng Dai* (* equal contribution) | This entry is based on R-FCN [a] and ResNet models. We utilize an ensemble of R-FCN models pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. OHEM and multi-scale training are applied to our model. Multi-scale testing and horizontal flipping are applied during inference. [a] "R-FCN: Object Detection via Region-based Fully Convolutional Networks", Jifeng Dai, Yi Li, Kaiming He, Jian Sun (http://arxiv.org/abs/1605.06409). | 2016-10-09 08:45:02
R4D_faster_rcnn | R4D_faster_rcnn | Tsinghua University | Zeming Li, Gang Yu | R4D_faster_rcnn | 2016-11-20 00:54:51
RFCN_DCN | RFCN_DCN | XXX | tester | RFCN_DCN | 2017-06-27 12:55:51
Region Proposal Network | RPN | Microsoft Research | Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun | This entry is an implementation of the system described in "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" (arXiv 2015). An ImageNet-pre-trained model (VGG-16) is used for training a Region Proposal Network (RPN) and a Fast R-CNN detector. The training data is VOC 2007 trainval + test and VOC 2012 trainval. The entire system takes <200 ms per image, including proposal and detection. | 2015-06-01 10:29:23
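An RPN scores and regresses a fixed set of reference anchors tiled over the convolutional feature map. A minimal sketch of anchor generation with the paper's default 3 scales and 3 aspect ratios (ratios here are interpreted as width/height; stride 16 corresponds to VGG-16's conv5 feature map):

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchor boxes [x1, y1, x2, y2]:
    k = len(scales) * len(ratios) boxes centred on every feature-map cell."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)  # area stays ~s^2
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)
```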
RUN300_3WAY, VGG16, 07++12 | RUN300_3WAY, VGG16, 07++12 | Seoul National University | Kyoungmin Lee, Jaeseok Choi, Jisoo Jeong, Nojun Kwak | We focused on resolving a structural contradiction and enhancing the contextual information of the multi-scale feature maps. We propose a network, based on SSD, that uses ResBlocks and deconvolution layers to enrich the representational power of the feature maps. In addition, a unified prediction module is applied to generalize the output results. Inference takes 15.6 ms per image on a Titan X Pascal GPU, which shows that it maintains the fast computation of a single-stage detector. (https://arxiv.org/abs/1707.05031) | 2017-09-26 04:26:07
RUN_3WAY_300, VGG16, 07++12+COCO | RUN_3WAY_300, VGG16, 07++12+COCO | Seoul National University | Kyoungmin Lee, Jaeseok Choi, Jisoo Jeong, Nojun Kwak | We fine-tuned the RUN 3WAY model, trained using VGG16, on MS COCO. (https://arxiv.org/abs/1707.05031) | 2017-10-13 03:17:59
RUN_3WAY_512, VGG16, 07++12 | RUN_3WAY_512, VGG16, 07++12 | Seoul National University | Kyoungmin Lee, Jaeseok Choi, Jisoo Jeong, Nojun Kwak | We focused on resolving a structural contradiction and enhancing the contextual information of the multi-scale feature maps. We propose a network, based on SSD, that uses ResBlocks and deconvolution layers to enrich the representational power of the feature maps. In addition, a unified prediction module is applied to generalize the output results. Inference takes 15.6 ms per image on a Titan X Pascal GPU, which shows that it maintains the fast computation of a single-stage detector. (https://arxiv.org/abs/1707.05031) | 2017-10-22 04:10:01
Rank of experts (VOC07++12) | Rank of experts (VOC07++12) | Incheon National University (INU) and Electronics and Telecommunications Research Institute (ETRI) | Seung-Hwan Bae (INU), Youngjoo Jo (ETRI), and Youngwan Lee (ETRI) | We use only the VOC dataset for training (without the COCO dataset). We train three types of convolutional detectors for this challenge. (1) Faster R-CNN type 1: we use the pre-trained ResNet-101/152/269 models as the CLS-Net, then add region proposal networks to the CLS-Net. (2) Faster R-CNN type 2: we apply a resizing method with bilinear interpolation to the ResNet-152 model instead of ROI pooling; the method is also used to build a new hyper-feature layer. (3) SSD type: we use DSSD with VGGNet and SSD with a WR-Inception network. Network ensemble: to ensemble the results, we combine the detection results of the models with our Rank of Experts algorithm, after which soft-NMS is performed. | 2017-11-15 14:31:16
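The entry above finishes with soft-NMS. A self-contained sketch of the Gaussian variant (Bodla et al., 2017), which decays overlapping scores instead of discarding boxes; the sigma and threshold values are illustrative:

    import numpy as np

    def box_iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
        # Gaussian soft-NMS: decay each remaining score by exp(-iou^2 / sigma)
        # against the current highest-scoring box.
        scores = scores.astype(float).copy()
        idxs, keep = list(range(len(scores))), []
        while idxs:
            m = max(idxs, key=lambda i: scores[i])
            keep.append(m); idxs.remove(m)
            for i in idxs:
                scores[i] *= np.exp(-box_iou(boxes[m], boxes[i]) ** 2 / sigma)
            idxs = [i for i in idxs if scores[i] >= score_thr]
        return keep, scores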
Single-Shot Refinement Neural Network | RefineDet (VOC+COCO,single model,VGG16,one-stage) | CASIA | Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li | We propose a novel single-shot detector, called RefineDet, that achieves better accuracy than two-stage methods while maintaining the efficiency of one-stage methods. RefineDet consists of two inter-connected modules: the anchor refinement module and the object detection module. The former aims to (1) filter out negative anchors to reduce the search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter takes the refined anchors from the former as input to further improve the regression and predict multi-class labels. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module for predicting the locations, sizes, and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network end-to-end. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at https://github.com/sfzhang15/RefineDet. | 2018-03-16 05:52:32
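A compact sketch of the two-step decoding RefineDet describes: the anchor refinement module (ARM) adjusts the anchors and filters easy negatives, and the object detection module (ODM) regresses again from the refined anchors. The (dx, dy, dw, dh) parameterization and the 0.99 negative-confidence threshold follow the paper; everything else is schematic:

    import numpy as np

    def apply_deltas(boxes, deltas):
        # Standard center/size box decoding shared by both modules.
        w = boxes[:, 2] - boxes[:, 0]; h = boxes[:, 3] - boxes[:, 1]
        cx = boxes[:, 0] + 0.5 * w;    cy = boxes[:, 1] + 0.5 * h
        cx = cx + deltas[:, 0] * w;    cy = cy + deltas[:, 1] * h
        w = w * np.exp(deltas[:, 2]);  h = h * np.exp(deltas[:, 3])
        return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], 1)

    def refinedet_decode(anchors, arm_deltas, arm_neg_conf, odm_deltas,
                         neg_thr=0.99):
        # Step 1: ARM refines anchors and discards confident negatives.
        refined = apply_deltas(anchors, arm_deltas)
        keep = arm_neg_conf <= neg_thr
        # Step 2: ODM regresses relative to the surviving refined anchors.
        return apply_deltas(refined[keep], odm_deltas[keep]), keep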
Res101+FasterRCNN | Res101+FasterRCNN(COCO+0712trainval) | Meitu | Kang Yang | I use ResNet-101 + Faster R-CNN trained on COCO, fine-tuned on voc_2007_trainval + voc_2012_trainval, and tested on voc_2012_test. | 2017-02-05 03:16:39
Res101+hyper+FasterRCNN(COCO+0712trainval) | Res101+hyper+FasterRCNN(COCO+0712trainval) | Meitu | Kang Yang | I use Res101 + hyper + Faster R-CNN (COCO + 0712 trainval). | 2017-02-10 03:03:50
SDS | SDS | UC Berkeley | Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik | We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work. | 2014-07-21 22:46:22
Cascaded MIL for WSOD | SGCM | Institute of Computing, Chinese Academy of Sciences | Yan Gao, Boxiao Liu | SGCM is a segmentation-guided cascaded MIL method for weakly supervised object detection, which uses a cascaded MIL architecture to detect more complete objects. | 2019-03-09 09:03:43
Modified_FasterRCNN_v2 | SHS_Faster_RCNN_Upgrade_v2 | SHS | Zhg Peng | A modified Faster R-CNN is used as the backbone, and multi-scale voting is applied. The model is pre-trained on the ImageNet 1K dataset, fine-tuned on the COCO detection dataset, and finally fine-tuned on the VOC 0712 datasets. | 2019-02-25 23:58:50
feature refinement SSD | SSD based method | DLUT | Novak | We propose to use RoI Align to extract proposal features in SSD. | 2018-10-24 02:25:54
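A minimal example of the RoI Align step this entry proposes, using torchvision.ops.roi_align on a hypothetical SSD source feature map; the shapes and stride below are illustrative, not taken from the submission:

    import torch
    from torchvision.ops import roi_align

    # A 38x38 feature map from a 300x300 input (illustrative numbers).
    feats = torch.randn(1, 256, 38, 38)
    # One Kx4 tensor of proposal boxes per image, in input coordinates.
    boxes = [torch.tensor([[30.0, 40.0, 120.0, 200.0]])]
    pooled = roi_align(feats, boxes, output_size=(7, 7),
                       spatial_scale=38 / 300.0)
    print(pooled.shape)  # torch.Size([1, 256, 7, 7])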
SSD300 | SSD300 VGG16 07++12 | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We train an SSD model using VGG16 on 300 x 300 input images. The training data is VOC07 trainval + test and VOC12 trainval. The inference speed is 59 FPS on a Titan X with batch size 8, or 46 FPS with batch size 1. We only test a single model on a single-scale image (300x300) and don't have any post-processing steps. Check out our code and more details at https://github.com/weiliu89/caffe/tree/ssd | 2016-10-18 17:53:04
SSD300ft | SSD300 VGG16 07++12+COCO | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We first train the SSD300 model using VGG16 on MS COCO trainval35k, then fine-tune it on VOC07 trainval + test and VOC12 trainval for the 20 PASCAL classes. | 2016-10-03 07:08:37
SSD512 | SSD512 VGG16 07++12 | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We train an SSD model using VGG16 on 512 x 512 input images. The training data is VOC07 trainval + test and VOC12 trainval. The inference speed is 22 FPS on a Titan X with batch size 8, or 19 FPS with batch size 1. We only test a single model on a single-scale image (512x512) and don't have any post-processing steps. Check out our code and more details at https://github.com/weiliu89/caffe/tree/ssd | 2016-10-13 17:28:35
SSD512ft | SSD512 VGG16 07++12+COCO | Google, UNC Chapel Hill, Zoox | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg | We first train the SSD512 model using VGG16 on MS COCO trainval35k, then fine-tune it on VOC07 trainval + test and VOC12 trainval for the 20 PASCAL classes. We only test a single model on a single-scale image (512x512) and don't have any post-processing steps. | 2016-10-10 19:35:42
GCFE_RCNN | Sogou_MM_GCFE_RCNN(ensemble model) | Sogou Inc | Hongyuan Zhang, Bin Li | We propose a "global concatenating feature enhancement network for instance segmentation": (1) our model is pre-trained on ImageNet and fine-tuned on MS COCO; (2) it is then fine-tuned on PASCAL VOC; (3) ResNeXt-152 with FPN is used as our backbone; (4) we also use a multi-scale training strategy. | 2018-09-25 03:44:24
GCFE_RCNN | Sogou_MM_GCFE_RCNN(single model) | Sogou Inc | Hongyuan Zhang, Bin Li | We propose a "global concatenating feature enhancement network for instance segmentation": (1) our model is pre-trained on ImageNet and fine-tuned on MS COCO; (2) it is then fine-tuned on PASCAL VOC; (3) ResNeXt-152 with FPN is used as our backbone; (4) we also use a multi-scale training strategy. | 2018-09-25 03:43:13
Fine-grained search using R-CNN with StructObj | UMICH_FGS_STRUCT | University of Michigan & Zhejiang University | Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee | We performed Bayesian-optimization-based fine-grained search (FGS) using an R-CNN detector trained with a structured objective: (1) we used the 16-layer network pre-trained by the VGG group; (2) we fine-tuned the network with a softmax classifier using the VOC2012 detection trainval set; (3) structured SVMs were trained on VOC2012 trainval as the object detectors; (4) FGS was applied to the R-CNN initial solutions; (5) bounding-box regression was adopted. Please refer to this paper for details: Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee, "Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction", CVPR 2015. | 2015-06-20 21:39:43
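The paper's fine-grained search is a Gaussian-process-based Bayesian optimization; the toy hill climb below only illustrates the surrounding idea of searching for better-scoring boxes around an R-CNN initial solution, with score_fn standing in for a hypothetical detector scoring function:

    import numpy as np

    def local_box_search(score_fn, box, steps=20, step_frac=0.05, seed=0):
        # Randomly perturb an initial (x1, y1, x2, y2) detection and keep
        # any box the scorer prefers. Not the paper's GP-based procedure.
        rng = np.random.default_rng(seed)
        best = np.asarray(box, float); best_s = score_fn(best)
        for _ in range(steps):
            w, h = best[2] - best[0], best[3] - best[1]
            cand = best + rng.normal(0, step_frac, 4) * np.array([w, h, w, h])
            s = score_fn(cand)
            if s > best_s:
                best, best_s = cand, s
        return best, best_s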
VIM_SSD(COCO+07++12, single model) | VIM_SSD | VimicroAI | Min Yang, Guo Ai, YunDong Zhang | This entry is based on SSD and VGG16. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale testing and horizontal flipping are applied during inference. | 2018-05-11 10:10:59
VIM_SSD | VIM_SSD(COCO+07++12, single model, one-stage) | VimicroAI | Min Yang, Guo Ai, YunDong Zhang | This entry is based on SSD and VGG16. The model is pre-trained on the 1000-class ImageNet classification training set, fine-tuned on the MS COCO trainval set, and then fine-tuned on the VOC 2007 trainval+test and VOC 2012 trainval sets. Multi-scale testing and horizontal flipping are applied during inference. | 2018-06-27 14:09:40
Improve the detection performance of WSOD with edge information | WSODE | Jiangnan University | Wenlong Gao, Ying Chen, Yong Peng | Improve WSOD with edge information. | 2020-12-17 14:26:31
detection | WithoutFR_CEP | zzu | suhuqi | detection | 2021-09-23 06:25:21
You Only Look Once: Unified, Real-Time Object Detection | YOLO | University of Washington | Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi | We train a convolutional neural network to perform end-to-end object detection. Our network processes the full image and outputs multiple bounding boxes and class probabilities. At test time we process images in real time at 45 fps. For more information and example code see http://pjreddie.com/darknet/yolo/ | 2015-11-06 07:36:38
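A sketch of decoding the S x S grid output YOLO describes, using the v1 defaults (S=7, B=2, C=20, 448x448 input) and the paper's square-root width/height parameterization; the exact tensor layout varies between implementations:

    import numpy as np

    def decode_yolo(output, S=7, B=2, C=20, img_size=448, conf_thr=0.2):
        # `output` is an S x S x (B*5 + C) array: per cell, B boxes of
        # (x, y, sqrt_w, sqrt_h, objectness) plus C class probabilities.
        dets = []
        for row in range(S):
            for col in range(S):
                cell = output[row, col]
                cls = cell[B * 5:]
                for b in range(B):
                    x, y, sw, sh, obj = cell[b * 5: b * 5 + 5]
                    score = obj * cls.max()
                    if score < conf_thr:
                        continue
                    cx = (col + x) / S * img_size
                    cy = (row + y) / S * img_size
                    bw, bh = sw ** 2 * img_size, sh ** 2 * img_size
                    dets.append([cx - bw / 2, cy - bh / 2,
                                 cx + bw / 2, cy + bh / 2,
                                 score, int(cls.argmax())])
        return np.array(dets)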
YOLOv1 | YOLOv1 | Jiangxi University of Science and Technology | lijiajun | Official YOLOv1 from pjreddie. | 2021-09-16 10:03:33
YOLOv1-resnet-18-50 | YOLOv1-resnet-18-50 | personal | Haoyun Qin | Reimplementation of YOLOv1 with tricks applied; switched the backbone to resnet18-cmp3 and resnet50-cmp4. | 2022-05-13 12:24:19
YOLOv2 | YOLOv2 | University of Washington | Joe Redmon, Ali Farhadi | We use a variety of tricks to increase the performance of YOLO, including dimension-cluster priors and multi-scale training. Details at https://pjreddie.com/yolo/ | 2017-02-23 16:37:58
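The dimension-cluster priors mentioned above come from running k-means on training-box width/height pairs with a 1 - IoU distance (boxes aligned at the origin). A hedged numpy sketch, assuming wh is an (N, 2) array of box sizes:

    import numpy as np

    def kmeans_anchors(wh, k=5, iters=100, seed=0):
        # k-means with d(box, centroid) = 1 - IoU, as in the YOLOv2 paper.
        rng = np.random.default_rng(seed)
        centroids = wh[rng.choice(len(wh), k, replace=False)]
        for _ in range(iters):
            inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0]) *
                     np.minimum(wh[:, None, 1], centroids[None, :, 1]))
            union = wh[:, None].prod(-1) + centroids[None].prod(-1) - inter
            assign = (1 - inter / union).argmin(1)
            new = np.array([wh[assign == j].mean(0) if (assign == j).any()
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        return centroids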
YOLOv2 (VOC + COCO) | YOLOv2 (VOC + COCO) | University of Washington | Joseph Redmon, Ali Farhadi | We use a variety of tricks to increase the performance of YOLO, including dimension-cluster priors and multi-scale training. Details at https://pjreddie.com/yolo/ | 2017-10-21 18:07:57
YOLOv2-resnet-18-101 | YOLOv2-resnet-18-101 | personal | Haoyun Qin | Reimplementation of YOLOv2 using PyTorch and ResNet backbones. | 2022-05-18 10:34:21
asasasasas | 2019-11-14 07:27:50
COCO+VOC | fasterRCNN+COCO+VOC+MCC | none | fasterRCNN+COCO+VOC+MCC | fasterRCNN+COCO+VOC+MCC | 2017-07-23 13:54:24
innovisgroup | innovisgroup Faster R-CNN | innovisgroup | yanjichen | This network is based on Faster R-CNN. | 2018-05-22 14:56:57
CNN with Segmentation and Context Cues | segDeepM | University of Toronto | Yukun Zhu, Ruslan Salakhutdinov, Raquel Urtasun, Sanja Fidler | segDeepM on PASCAL 2012, with bounding-box regression. | 2016-03-04 19:28:43
shufflenetv2_yolov3 | shufflenetv2_yolov3 | PQLabs | Xiuyang Lei | An optimized YOLOv3 with an adjusted ShuffleNetV2 backbone, trained on 07++12 data with the backbone pre-trained on ImageNet. The whole model requires only 3.0 BFLOPs. | 2020-02-25 06:25:18
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 17,499 training iterations, with attenuation coefficient 0.9 and compensation coefficient 1.1. | 2021-12-20 08:25:57
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 24,999 training iterations, with attenuation coefficient 0.9 and compensation coefficient 1.1. | 2021-12-20 02:45:31
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 19,999 training iterations, with attenuation coefficient 0.9 and compensation coefficient 1.1. | 2021-12-20 02:37:55
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Yusen Wan | Semi-supervised PCL detection results after 17,499 training iterations. | 2021-12-15 07:53:33
semi-supervised PCL | ss-pcl | Huazhong University of Science and Technology | Wan Yusen | Semi-supervised PCL detection results after 24,999 training iterations. | 2021-12-18 04:06:21
tencent_retail_ft:DET | tencent_retail_ft:DET | tencent_retail_ft | XingXing Wang | Multi-scale testing and multi-scale training, using the VOC2007 + VOC2012 + MS COCO datasets. I first train the model on MS COCO, then fine-tune on the VOC2007 and VOC2012 datasets, with ResNet-152 as the backbone, using feature-map fusion, focal loss, and so on. | 2019-01-21 15:43:45
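The entry above mentions focal loss; here is the standard binary focal loss (Lin et al., 2017) in PyTorch with the usual alpha=0.25, gamma=2 defaults, as a generic sketch rather than this submission's code:

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        # Down-weights easy examples by (1 - p_t)**gamma so training
        # concentrates on hard ones; `targets` is 0/1, same shape as logits.
        p = torch.sigmoid(logits)
        ce = F.binary_cross_entropy_with_logits(logits, targets,
                                                reduction="none")
        p_t = p * targets + (1 - p) * (1 - targets)
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * ce).mean()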
CloudMinds CV&AR Detection | CM-CV&AR: DET | CloudMinds | Xiaoya Zhu, Yibing Nan, Wenqi Wang | CMDET is pre-trained on the ImageNet dataset and fine-tuned on the MS COCO detection dataset. We use ResNeXt-101 as the backbone network and adopt deformable convolution in its last stage. Multi-scale + random-flip techniques are used during training: in each iteration, the short-edge scale is randomly sampled from [400, 1400], while the long-edge scale is fixed at 1600. In the testing phase, multi-scale techniques are used, and we use NMS to combine the results from different scales. | 2019-08-20 10:47:35
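A small sketch of the multi-scale training resize described above: sample the target short edge uniformly from [400, 1400] while keeping the long edge within 1600 and preserving aspect ratio. Treating 1600 as a cap on the long edge is an assumption about the entry's wording:

    import random
    from PIL import Image

    def random_scale(img, short_range=(400, 1400), long_max=1600):
        # Pick a short-edge target, then shrink further if the long edge
        # would exceed long_max at that scale.
        w, h = img.size
        target = random.randint(*short_range)
        scale = min(target / min(w, h), long_max / max(w, h))
        return img.resize((round(w * scale), round(h * scale)),
                          Image.BILINEAR)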
Feature Edit with CNN features | Feature Edit | Fudan University | Zhiqiang Shen, Xiangyang Xue, et al. | We edit the fifth-layer CNN features with the network defined by Krizhevsky et al. (2012), then add the new features to the original feature set. Two stages are used to find the variables to inhibit: step one finds those with the largest within-class variance, and step two finds those with the smallest inter-class variance. This editing operation handles the separation of different properties. A linear SVM is boosted to classify the proposal regions, and bounding-box regression is also employed to reduce localization errors. | 2014-09-06 15:58:29
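A toy numpy rendition of the variance-based feature editing described above: inhibit (zero) the dimensions that combine high within-class variance with low inter-class variance, then concatenate edited and original features. The ratio score is one plausible reading of the two-stage selection, not the paper's exact rule:

    import numpy as np

    def feature_edit(X, y, frac=0.1):
        # X: (N, D) features, y: (N,) class labels.
        classes = np.unique(y)
        means = np.stack([X[y == c].mean(0) for c in classes])
        within = np.stack([X[y == c].var(0) for c in classes]).mean(0)
        between = means.var(0)
        score = within / (between + 1e-8)  # high = uninformative dimension
        k = int(frac * X.shape[1])
        edited = X.copy()
        edited[:, np.argsort(score)[-k:]] = 0.0  # inhibit the worst k dims
        return np.hstack([X, edited])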
Deep poselets | Poselets2 | Facebook | Fei Yang, Rob Fergus | Poselets trained with a CNN. We ran the original poselets on a large set of images, collected weakly labelled training data, trained a convolutional neural net, and applied it to the test data. This method allows deep poselets to be trained without the need for lots of manual keypoint annotations. | 2014-06-06 14:02:45