PASCAL VOC Challenge performance evaluation and download server |
Detection results: average precision (%) per class on the VOC test set.

method | mean AP | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | dining table | dog | horse | motorbike | person | potted plant | sheep | sofa | train | tv/monitor | submission date
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Fast R-CNN + YOLO | 70.8 | 82.7 | 77.7 | 74.3 | 59.1 | 47.1 | 78.0 | 73.1 | 89.2 | 49.6 | 74.3 | 55.9 | 87.4 | 79.8 | 82.2 | 75.3 | 43.1 | 71.4 | 67.8 | 81.9 | 65.6 | 05-Jun-2015
Fast R-CNN VGG16 extra data | 68.8 | 82.0 | 77.8 | 71.6 | 55.3 | 42.4 | 77.3 | 71.7 | 89.3 | 44.5 | 72.1 | 53.7 | 87.7 | 80.0 | 82.5 | 72.7 | 36.6 | 68.7 | 65.4 | 81.1 | 62.7 | 18-Apr-2015
segDeepM | 67.2 | 82.3 | 75.2 | 67.1 | 50.7 | 49.8 | 71.1 | 69.6 | 88.2 | 42.5 | 71.2 | 50.0 | 85.7 | 76.6 | 81.8 | 69.3 | 41.5 | 71.9 | 62.2 | 73.2 | 64.6 | 29-Jan-2015
BabyLearning | 63.8 | 77.7 | 73.8 | 62.3 | 48.8 | 45.4 | 67.3 | 67.0 | 80.3 | 41.3 | 70.8 | 49.7 | 79.5 | 74.7 | 78.6 | 64.5 | 36.0 | 69.9 | 55.7 | 70.4 | 61.7 | 12-Nov-2014
R-CNN (bbox reg) | 62.9 | 79.3 | 72.4 | 63.1 | 44.0 | 44.4 | 64.6 | 66.3 | 84.9 | 38.8 | 67.3 | 48.4 | 82.3 | 75.0 | 76.7 | 65.7 | 35.8 | 66.2 | 54.8 | 69.1 | 58.8 | 27-Oct-2014
R-CNN | 59.8 | 76.5 | 70.4 | 58.0 | 40.2 | 39.6 | 61.8 | 63.7 | 81.0 | 36.2 | 64.5 | 45.7 | 80.5 | 71.9 | 74.3 | 60.6 | 31.5 | 64.7 | 52.5 | 64.6 | 57.2 | 27-Oct-2014
YOLO | 58.8 | 78.0 | 67.3 | 59.4 | 42.0 | 25.7 | 68.6 | 56.7 | 81.7 | 37.4 | 62.8 | 48.0 | 77.8 | 72.9 | 72.2 | 63.9 | 29.9 | 53.4 | 53.4 | 74.8 | 50.8 | 06-Nov-2015
Feature Edit | 56.4 | 74.8 | 69.2 | 55.7 | 41.9 | 36.1 | 64.7 | 62.3 | 69.5 | 31.3 | 53.3 | 43.7 | 69.9 | 64.0 | 71.8 | 60.5 | 32.7 | 63.0 | 44.1 | 63.6 | 56.6 | 04-Sep-2014
R-CNN (bbox reg) | 53.7 | 71.8 | 65.8 | 53.0 | 36.8 | 35.9 | 59.7 | 60.0 | 69.9 | 27.9 | 50.6 | 41.4 | 70.0 | 62.0 | 69.0 | 58.1 | 29.5 | 59.4 | 39.3 | 61.2 | 52.4 | 13-Mar-2014
R-CNN | 50.2 | 67.1 | 64.1 | 46.7 | 32.0 | 30.5 | 56.4 | 57.2 | 65.9 | 27.0 | 47.3 | 40.9 | 66.6 | 57.8 | 65.9 | 53.6 | 26.7 | 56.5 | 38.1 | 52.8 | 50.2 | 30-Jan-2014
poselets | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 59.3 | - | - | - | - | - | 06-Jun-2014
Head-Detect-Segment | - | - | - | - | - | - | - | - | 41.7 | - | - | - | - | - | - | - | - | - | - | - | - | 30-Aug-2010
BERKELEY POSELETS | - | 33.2 | 51.9 | 8.5 | 8.2 | 34.8 | 39.0 | 48.8 | 22.2 | - | 20.6 | - | 18.5 | 48.2 | 44.1 | 48.5 | 9.1 | 28.0 | 13.0 | 22.5 | 33.0 | 29-Aug-2010
UCI_LSVM-MDPM-10X | - | - | 48.1 | - | - | - | 54.7 | - | - | - | 25.1 | 6.0 | - | 46.7 | 41.1 | - | - | 31.2 | 17.7 | - | 32.3 | 30-Aug-2010
Title | Method | Affiliation | Contributors | Description | Date
---|---|---|---|---|---
Multiclass poselets | BERKELEY POSELETS | UC Berkeley / Adobe | Subhransu Maji, Thomas Brox, Jitendra Malik | Poselets based on Bourdev et al ECCV 2010, extended for multiple categories. | 2010-08-29 21:20:30 |
Computational Baby Learning | BabyLearning | National University of Singapore | Xiaodan Liang, Si Liu, Yunchao Wei, Luoqi Liu, Liang Lin, Shuicheng Yan | This entry is an implementation of the framework described in "Computational Baby Learning" (http://arxiv.org/abs/1411.2861). We build a computational model to interpret and mimic the baby learning process, based on prior knowledge modelling, exemplar learning, and learning with video contexts. Training data: (1) We used only two positive instances along with ~20,000 unlabelled videos to train the detector for each object category. (2) We used data from ILSVRC 2012 to pre-train the Network in Network [1] and fine-tuned the network with our newly mined instances. [1] Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. In ICLR 2014. | 2014-11-12 03:35:57 |
Fast R-CNN with YOLO Rescoring | Fast R-CNN + YOLO | University of Washington | Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi | We use the YOLO detection method to rescore the bounding boxes from Fast R-CNN. This helps mitigate false background detections and improve overall performance. For more information and example code see: http://pjreddie.com/darknet/yolo/ | 2015-06-05 04:05:10 |
Fast R-CNN VGG16 extra data | Fast R-CNN VGG16 extra data | Microsoft Research | Ross Girshick | Fast R-CNN is a new algorithm for training R-CNNs. The training process is a single fine-tuning run that jointly trains for softmax classification and bounding-box regression. Training took ~22 hours on a single GPU and testing takes ~330 ms per image. A tech report describing the method is forthcoming. Open-source code will be released. This entry was trained on the union of VOC 2012 train+val and VOC 2007 train+val+test. | 2015-04-18 19:42:04 |
Feature Edit with CNN features | Feature Edit | Fudan University | Zhiqiang Shen, Xiangyang Xue et al. | We edit the fifth-layer CNN features from the network defined by Krizhevsky et al. (2012), then add the new features to the original feature set. Two stages are used to find the variables to inhibit: the first finds the subset with the largest within-class variance, and the second finds those with the smallest inter-class variance. | 2014-09-04 03:46:10 |
Cat-Cut | Head-Detect-Segment | CVIT, IIIT Hyderabad; VGG, University of Oxford | Omkar M Parkhi, Andrea Vedaldi, C. V. Jawahar, Andrew Zisserman | A detector is trained to detect cat heads. The returned detections are used to initialize seeds for GrabCut, which segments the cat. The bounding box is then inferred from these segmentations. | 2010-08-30 21:56:06 |
Region-based CNN | R-CNN | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524 version 5). Code is available at http://www.cs.berkeley.edu/~rbg/. Training data: (1) We used ILSVRC 2012 to pre-train the ConvNet (using caffe) (2) We fine-tuned the resulting ConvNet using 2012 trainval (3) We trained object detector SVMs using 2012 trainval. The same detection SVMs were used for the 2012 and 2010 results. For this submission, we used the 16-layer ConvNet from Simonyan & Zisserman instead of Krizhevsky et al.'s ConvNet. | 2014-10-27 15:56:23 |
Regions with Convolutional Neural Network Features | R-CNN | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524). We made two small changes relative to the arXiv tech report that are responsible for improved performance: (1) we added a small amount of context around each region proposal (16px at the warped size) and (2) we used a higher learning rate while fine-tuning (starting at 0.001). Aside from non-maximum suppression, no additional post-processing (e.g., detector or image classification context) was applied. Code will be made available soon at http://www.cs.berkeley.edu/~rbg/. Training data: (1) we used ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 train; (3) we trained object detector SVMs using 2012 train+val. The same detection SVMs were used for the 2012 and 2010 results. | 2014-01-30 02:13:56 |
Regions with Convolutional Neural Network Features | R-CNN (bbox reg) | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524). We made two small changes relative to the arXiv tech report that are responsible for improved performance: (1) we added a small amount of context around each region proposal (16px at the warped size) and (2) we used a higher learning rate while fine-tuning (starting at 0.001). Aside from non-maximum suppression, no additional post-processing (e.g., detector or image classification context) was applied. Code will be made available soon at http://www.cs.berkeley.edu/~rbg/. Training data: (1) we used ILSVRC 2012 to pre-train the ConvNet (using caffe); (2) we fine-tuned the resulting ConvNet using 2012 train; (3) we trained object detector SVMs using 2012 train+val. The same detection SVMs were used for the 2012 and 2010 results. This submission includes a simple regression from pool5 features to bounding box coordinates. | 2014-03-13 18:50:34 |
Region-based CNN | R-CNN (bbox reg) | UC Berkeley | Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | This entry is an implementation of the system described in "Rich feature hierarchies for accurate object detection and semantic segmentation" (http://arxiv.org/abs/1311.2524 version 5). Code is available at http://www.cs.berkeley.edu/~rbg/. Training data: (1) We used ILSVRC 2012 to pre-train the ConvNet (using caffe) (2) We fine-tuned the resulting ConvNet using 2012 trainval (3) We trained object detector SVMs using 2012 trainval. The same detection SVMs were used for the 2012 and 2010 results. For this submission, we used the 16-layer ConvNet from Simonyan & Zisserman instead of Krizhevsky et al.'s ConvNet. | 2014-10-27 15:48:35 |
10x train set for LSVM, mixtures, deformable parts | UCI_LSVM-MDPM-10X | University of California, Irvine | Xiangxin Zhu, Carl Vondrick, Deva Ramanan, Charless Fowlkes | We downloaded additional images from Flickr that match the distribution of the test set. We used Amazon's Mechanical Turk to annotate training sets that are 10 times larger than the standard trainval set. We used this larger training set to train models with the detector of Felzenszwalb et al. | 2010-08-30 04:33:45 |
You Only Look Once: Unified, Real-Time Detection | YOLO | University of Washington | Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi | We train a convolutional neural network to perform end-to-end object detection. Our network processes the full image and outputs multiple bounding boxes and class probabilities. At test time we process images in real time at 45 fps. For more information and example code see: http://pjreddie.com/darknet/yolo/ | 2015-11-06 07:24:21 |
Deep poselets | poselets | - | Fei Yang, Rob Fergus | Poselets trained with a CNN. We ran the original poselets on a large set of images, collected weakly labelled training data, trained a convolutional neural network, and applied it to the test data. This approach allows training deep poselets without requiring large numbers of manual keypoint annotations. | 2014-06-06 16:42:35 |
CNN with Segmentation and Context Cues | segDeepM | University of Toronto | Yukun Zhu, Ruslan Salakhutdinov, Raquel Urtasun, Sanja Fidler | Our method exploits object segmentation in order to improve the accuracy of object detection. We frame the object detection problem as inference in a Markov Random Field, in which each detection hypothesis scores object appearance as well as contextual information using Convolutional Neural Networks, and allows the hypothesis to choose and score a segment out of a large pool of accurate object segmentation proposals. This implementation adopts a 16-layer CNN instead of Krizhevsky et al.'s network for the appearance model, and CPMC for generating segmentation proposals. | 2015-01-29 02:09:13 |
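The Fast R-CNN + YOLO entry rescores Fast R-CNN boxes using YOLO's detections of the same image. The exact combination rule is not given in the description, so the sketch below is a minimal illustration under an assumed rule: a Fast R-CNN detection gets a score boost when YOLO predicts a sufficiently overlapping box. The `rescore` function, its additive boost, and the 0.5 IoU threshold are all illustrative assumptions, not the authors' method.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def rescore(frcnn_dets, yolo_dets, iou_thresh=0.5):
    """Hypothetical rescoring rule: add the best-matching YOLO score to a
    Fast R-CNN detection's score when the boxes overlap enough.
    Detections are (box, score) pairs."""
    out = []
    for box, score in frcnn_dets:
        match = max((ys for yb, ys in yolo_dets if iou(box, yb) >= iou_thresh),
                    default=0.0)
        out.append((box, score + match))  # agreement with YOLO raises the score
    return out
```

In the real system this would be applied per class; the point is only that detections corroborated by a second detector rise in the ranking, which is how the combined entry suppresses false background detections.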
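The R-CNN entries note that non-maximum suppression is the only post-processing applied to the detector output. A standard greedy NMS can be sketched as follows (simplified illustration; production implementations run per class over arrays, not Python lists):

```python
def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def nms(dets, iou_thresh=0.3):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop all remaining boxes that overlap it above the threshold.
    dets: list of ((x1, y1, x2, y2), score) pairs."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    keep = []
    while dets:
        best = dets.pop(0)
        keep.append(best)
        dets = [d for d in dets if box_iou(best[0], d[0]) < iou_thresh]
    return keep
```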
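The R-CNN (bbox reg) entries add a regression from pool5 features to bounding-box coordinates. Applying the predicted offsets typically uses the parameterization from the R-CNN paper (center shifts scaled by the proposal size, log-space width/height scaling); the sketch below assumes that standard form rather than the entry's exact regressor.

```python
import math

def apply_bbox_deltas(box, deltas):
    """Apply R-CNN-style regression deltas (tx, ty, tw, th) to a proposal
    box (x1, y1, x2, y2). Standard parameterization: tx, ty shift the
    center in units of the proposal size; tw, th scale width and height
    in log space."""
    x1, y1, x2, y2 = box
    tx, ty, tw, th = deltas
    pw, ph = x2 - x1, y2 - y1            # proposal width/height
    px, py = x1 + 0.5 * pw, y1 + 0.5 * ph  # proposal center
    gx, gy = pw * tx + px, ph * ty + py    # regressed center
    gw, gh = pw * math.exp(tw), ph * math.exp(th)  # regressed size
    return (gx - 0.5 * gw, gy - 0.5 * gh, gx + 0.5 * gw, gy + 0.5 * gh)
```

With all-zero deltas the proposal is returned unchanged, which is why the regressor is trained to predict small corrections on top of already-reasonable proposals.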
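The Feature Edit entry describes a two-stage search for feature dimensions to inhibit: ones with large within-class variance, then ones with small inter-class variance. A simplified single-score version of that idea can be sketched as below; `dims_to_inhibit` and its scoring rule are illustrative assumptions, not the authors' exact procedure.

```python
from statistics import mean, pvariance

def dims_to_inhibit(features_by_class, k):
    """Rank feature dimensions by (mean within-class variance) minus
    (variance of per-class means) and return the k highest: dimensions
    that are noisy inside each class and uninformative across classes.
    features_by_class: {class_label: [feature_vector, ...]}."""
    n_dims = len(next(iter(features_by_class.values()))[0])
    scores = []
    for d in range(n_dims):
        within = mean(pvariance(v[d] for v in vecs)
                      for vecs in features_by_class.values())
        class_means = [mean(v[d] for v in vecs)
                       for vecs in features_by_class.values()]
        between = pvariance(class_means)
        scores.append(within - between)
    ranked = sorted(range(n_dims), key=lambda d: scores[d], reverse=True)
    return ranked[:k]
```

The selected dimensions would then be zeroed ("inhibited") to produce the edited feature vectors that the entry appends to the original feature set.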