Mark Everingham
The goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). There are ten object classes:
For each of the ten object classes predict the presence/absence of at least one object of that class in a test image. The output from your system should be a real-valued confidence of the object's presence so that an ROC curve can be drawn.
For each of the ten classes predict the bounding boxes of each object of that class in a test image (if any). Each bounding box should be output with an associated real-valued confidence of the detection so that a precision/recall curve can be drawn.
There are four sets of images provided, for use in both the classification and detection tasks.
train | val | trainval | test | |||||
img | obj | img | obj | img | obj | img | obj | |
Bicycle | 127 | 161 | 143 | 162 | 270 | 323 | 268 | 326 |
Bus | 93 | 118 | 81 | 117 | 174 | 235 | 180 | 233 |
---|---|---|---|---|---|---|---|---|
Car | 271 | 427 | 282 | 427 | 553 | 854 | 544 | 854 |
Cat | 192 | 214 | 194 | 215 | 386 | 429 | 388 | 429 |
Cow | 102 | 156 | 104 | 157 | 206 | 313 | 197 | 315 |
Dog | 189 | 211 | 176 | 211 | 365 | 422 | 370 | 423 |
Horse | 129 | 164 | 118 | 162 | 247 | 326 | 254 | 324 |
Motorbike | 118 | 138 | 117 | 137 | 235 | 275 | 234 | 274 |
Person | 319 | 577 | 347 | 579 | 666 | 1156 | 675 | 1153 |
Sheep | 119 | 211 | 132 | 210 | 251 | 421 | 238 | 422 |
Total | 1277 | 2377 | 1341 | 2377 | 2618 | 4754 | 2686 | 4753 |
Table 1 summarizes the number of objects and images (containing at least one object of a given class) for each class and image set. The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. In total there are 5,304 images, containing 9,507 annotated objects.
Objects of the ten classes listed above are annotated in the ground truth. For each object, the following annotation is present:
Four competitions are defined according to the task and the choice of training data: (i) taken from the VOC trainval data provided, or (ii) from any source excluding the VOC test data provided:
No. | Task | Training data | Test data |
1 | Classification | trainval | test |
2 | Classification | any but VOC test | test |
3 | Detection | trainval | test |
4 | Detection | any but VOC test | test |
Any annotation provided in the VOC train and val sets may be used for training, for example bounding boxes or particular views e.g. 'frontal' or 'side'. Participants are free to perform manual annotation on the training data if they wish. Manual annotation of the test data to optimize algorithm performance is not permitted.
In competitions 2 and 4, any source of training data may be used except the provided test images. Researchers who have pre-built systems trained on other data are particularly encouraged to participate. The test data includes images from the Microsoft Research Cambridge object recognition database, and ``flickr'' (www.flickr.com); these sources of images may not be used for training. Participants who have acquired images from flickr for training must submit them to the organizers to check for overlap with the test set.
For each competition, participants may choose to tackle all, or any subset of object classes, for example ``cars only'' or ``motorbikes and cars''.
In contrast to the 2005 VOC challenge, ground truth for the test images will not be released. Participants are therefore required to submit the raw output of their classifier and detector implementations, rather than ROC and precision/recall curves as in the 2005 challenge.
Results for each competition are to be submitted as text files. Details on the submission procedure will be given at a later date. The file formats required for the classification and detection tasks are described here:
For the classification task, a separate text file of results should be generated for each competition (1 or 2) and each class e.g. `car'. Each line should contain a single identifier and the confidence output by the classifier, separated by a space, for example:
comp1_cls_val_car.txt: 000004 0.702732 000006 0.870849 000008 0.532489 000018 0.477167 000019 0.112426Greater confidence values signify greater confidence that the image contains an object of the class of interest. The example classifier implementation (section 4.1) includes code for generating a results file in the required format.
For the detection task, a separate text file of results should be generated for each competition (3 or 4) and each class e.g. `car'. Each line should be a detection output by the detector in the following format:
<image identifier> <confidence> <left> <top> <right> <bottom>where (left,top)-(right,bottom) defines the bounding box of the detected object. The top-left pixel in the image has coordinates . Greater confidence values signify greater confidence that the detection is correct. An example file excerpt is shown below. Note that for the image 000006, multiple objects are detected:
comp3_det_val_car.txt: 000004 0.702732 89 112 516 466 000006 0.870849 373 168 488 229 000006 0.852346 407 157 500 213 000006 0.914587 2 161 55 221 000008 0.532489 175 184 232 201The example detector implementation (section 4.2) includes code for generating a results file in the required format.
Participants are expected to submit a single set of results per method employed. Participants who have investigated several algorithms may submit one result per method. Changes in algorithm parameters do not constitute a different method - all parameter tuning must be conducted using the training and validation data alone.
The classification task will be judged by the Receiver Operating Characteristic (ROC) curve. The principal quantitative measure used will be the area under curve (AUC). Example code for computing the ROC and AUC measure is provided in the development kit.
Images which contain only objects marked as `difficult' (section 1.3.1) are currently ignored by the evaluation. The final evaluation may include separate results including such ``difficult'' images, depending on the submitted results.
The detection task will be judged by the precision/recall curve. The principal quantitative measure used will be the average precision (AP). Example code for computing the precision/recall and AP measure is provided in the development kit.
Detections are considered true or false positives based on the
area of overlap with ground truth bounding boxes. To be considered
a correct detection, the area of overlap between the
predicted bounding box and ground truth bounding box
must exceed by the formula:
(1) |
Objects marked as `difficult' (section 1.3.1) are currently ignored by the evaluation. The final evaluation may include separate results including such ``difficult'' images, depending on the submitted results.
The development kit is packaged in a single gzipped tar file containing code and (this) documentation. The images, annotation, and lists specifying training/validation sets for the challenge are provided in a separate archive which can be obtained via the VOC web pages.
The simplest installation is achieved by placing the development kit and challenge databases in a single location. After untarring the development kit, download the challenge image database and untar into the same directory, resulting in the following directory structure:
VOCdevkit/ % development kit VOCdevkit/VOCcode/ % PASCAL/VOC utility code VOCdevkit/results/ % directory for your results VOCdevkit/local/ % directory for example code VOCdevkit/VOC2006/ImageSets % image sets VOCdevkit/VOC2006/Annotations % annotation files VOCdevkit/VOC2006/PNGImages % images
If you set the current directory in MATLAB to the VOCdevkit directory you should be able to run the example functions example_classifier and example_detector.
If desired, you can store the code, images/annotation, and results in separate directories, for example you might want to store the image data in a common group location. To specify the locations of the image/annotation, results, and working directories, edit the VOCinit.m file, e.g.
% change this path to point to your copy of the PASCAL VOC data VOCopts.datadir='/homes/group/VOCdata/'; % change this path to a writable directory for your results VOCopts.resdir='/homes/me/VOCresults/'; % change this path to a writable local directory for the example code VOCopts.localdir='/tmp/';
Note that in developing your own code you need to include the VOCdevkit/VOCcode directory in your MATLAB path, e.g.
>> addpath /homes/me/code/VOCdevkit/VOCcode
Example implementations for both the classification and detection tasks are provided. The aim of these implementations is solely to demonstrate use of the code in the development kit.
The file example_classifier.m contains a complete implementation of the classification task. For each VOC object class a simple classifier is trained on the train set; the classifier is then applied to the val set and the output saved to a results file in the format required by the challenge; a Receiver Operating Characteristic (ROC) curve is plotted and the `area under curve' (AUC) measure displayed.
The file example_detector.m contains a complete implementation of the detection task. For each VOC object class a simple (and not very successful!) detector is trained on the train set; the detector is then applied to the val set and the output saved to a results file in the format required by the challenge; a precision/recall curve is plotted and the `average precision' (AP) measure displayed.
The development kit provides functions for loading annotation data. Example code for computing ROC and precision/recall curves and viewing annotation is also provided.
The VOC2006/ImageLists/ directory contains text files specifying lists of images for the train and val sets. Corresponding files for the test set will be provided at a later date.
The files train.txt, val.txt, and trainval.txt list the image identifiers for the corresponding image sets (training, validation, and training+validation). Each line of the file contains a single image identifier. The following MATLAB code reads the image list into a cell array of strings:
ids=textread(sprintf(VOCopts.imgsetpath,'imgset'),'%s');For a given image identifier ids{i}, the corresponding image and annotation file paths can be produced thus:
imgpath=sprintf(VOCopts.imgpath,ids{i}); annopath=sprintf(VOCopts.annopath,ids{i});Note that the image sets used are the same for all classes. For each competition, participants are expected to provide output for all images in the test set.
To simplify matters for participants tackling only the classification competition, class-specific image sets with per-image ground truth are also provided. The file <class>_<imgset>.txt contains image identifiers and ground truth for a particular class and image set, for example the file car_train.txt applies to the `car' class and train image set.
Each line of the file contains a single image identifier and ground truth label, separated by a space, for example:
000601 -1 000604 0 000610 1
There are three ground truth labels:
The ``difficult'' label indicates that all objects of the class of interest have been annotated as ``difficult'', for example an object which is clearly visible but difficult to recognize without substantial use of context. Currently the evaluation ignores such images, contributing nothing to the ROC curve or AUC measure. The final evaluation may include separate results including such ``difficult'' images, depending on the submitted results. Participants are free to include these images in training as either positive or negative examples.
The VOCinit script initializes a single structure VOCopts which contains options for the PASCAL functions including directories containing the VOC data and options for the evaluation functions (not to be modified).
The field classes lists the object classes for the challenge in a cell array:
VOCopts.classes={'bicycle','bus','car','cat','cow','dog',... 'horse','motorbike','person','sheep'};
The field testset specifies the image set used by the example evaluation functions for testing:
VOCopts.testset='val'; % use validation data for development
Other fields provide, for convenience, paths for the image and annotation data and results files. The use of these paths is illustrated in the example classifier and detector implementations.
The PASreadrecord function reads the annotation data for a particular image from the annotation file specified by filename, for example:
>> rec=PASreadrecord(sprintf(VOCopts.annopath,'000008')) rec = imgname: 'VOC2006/PNGImages/000008.png' imgsize: [500 375 3] database: 'The VOC2006 Database' objects: [1x2 struct]The imgname field specifies the path (relative to the main VOC data path) of the corresponding image. The imgsize field specifies the image dimensions as (width,height,depth). Objects annotated in the image are stored in the struct array objects, for example:
>> rec.objects(1) ans = label: 'PAShorseTrunc' orglabel: 'PAShorseTrunc' bbox: [267 72 498 326] polygon: [] mask: '' class: 'horse' view: '' truncated: 1 difficult: 0The label field specifies the PASCAL label for the object. For convenience, the class field contains the corresponding VOC challenge class. The view field contains the view: Frontal, Rear, Left (side view, facing left of image), Right (side view, facing right of image), or an empty string indicating another, or un-annotated view.
The bbox field specifies the bounding box of the object in the image, as [left,top,right,bottom]. The top-left pixel in the image has coordinates .
The truncated field being set to 1 indicates that the object is ``truncated'' in the image. The definition of truncated is that the bounding box of the object specified does not correspond to the full extent of the object e.g. an image of a person from the waist up, or a view of a car extending outside the image. Participants are free to use or ignore this field as they see fit.
The difficult field being set to 1 indicates that the object has been annotated as ``difficult'', for example an object which is clearly visible but difficult to recognize without substantial use of context. Currently the evaluation ignores such objects, contributing nothing to the precision/recall curve. The final evaluation may include separate results including such ``difficult'' objects, depending on the submitted results. Participants may include or exclude these objects from training as they see fit.
The VOCroc function computes an ROC curve for the classification task. The arguments id and cls specify the results file to be loaded, for example:
>> [fp,tp,auc]=VOCroc(VOCopts,'comp1','car',true);See example_classifier for further examples. If the argument draw is true, the ROC curve is drawn in a figure window. The function returns vectors of false and true positive rates in fp and tp, and the area under curve (AUC) measure in auc.
The VOCpr function computes a precision/recall curve for the detection task. The arguments id and cls specify the results file to be loaded, for example:
>> [rec,prec,ap]=VOCpr(VOCopts,'comp3','car',true);See example_detector for further examples. If the argument draw is true, the precision/recall curve is drawn in a figure window. The function returns vectors of recall and precision rates in rec and prec, and the average precision measure in ap.
The viewanno function displays the annotation for images in the image set specified by imgset. The image set may be one of 'train', 'trainval' or 'val', or include a class, for example 'car_train':
>> viewanno('train'); >> viewanno('car_train');
The viewdet function displays the detections stored in a results file for the detection task. The arguments id and cls specify the results file to be loaded, for example:
>> viewdet('comp3','car',true)If the onlytp argument is true, only the detections considered true positives by the VOC evaluation measure are displayed.
This appendix lists the guidelines on annotation which were given to annotators.
This document was generated using the LaTeX2HTML translator Version 2002 (1.62)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 -no_navigation -show_section_numbers voc.tex
The translation was initiated by me on 2006-02-14