PASCAL Visual Object Classes Challenge 2006 (VOC2006) Annotation Guidelines
This document reproduces the guidelines used for annotating images in the VOC2006 data set.
Guidelines on what and how to label
What to label
All objects of the defined categories, unless:
- you are unsure what the object is.
- the object is very small (at your discretion).
- less than 10-20% of the object is visible.
If this is not possible because of too many objects, mark the
image as bad.
Record the viewpoint of the 'bulk' of the object e.g. the body
rather than the head. Allow viewpoints within 10-20 degrees. If
ambiguous, leave as 'Unspecified'.
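The viewpoint rule can be illustrated with a small sketch. It assumes an annotator (or a tool) has a rough azimuth estimate for the bulk of the object. The view names and their angles in CANONICAL_VIEWS are hypothetical and chosen only for illustration; the guidelines themselves name only 'Unspecified'. The 20 degree default is the upper end of the stated 10-20 degree range.

```python
# Hypothetical canonical views and azimuths in degrees (an assumption,
# not defined by the guidelines).
CANONICAL_VIEWS = {"Frontal": 0.0, "Left": 90.0, "Rear": 180.0, "Right": 270.0}


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def viewpoint_label(azimuth_deg, tolerance_deg=20.0):
    """Snap an estimated azimuth to a canonical view, else 'Unspecified'.

    azimuth_deg is None when the annotator cannot judge the viewpoint.
    """
    if azimuth_deg is None:
        return "Unspecified"
    # Find the canonical view closest to the estimate.
    name, angle = min(
        CANONICAL_VIEWS.items(),
        key=lambda kv: angular_distance(azimuth_deg, kv[1]),
    )
    # Accept it only if the estimate is within the allowed tolerance.
    if angular_distance(azimuth_deg, angle) <= tolerance_deg:
        return name
    return "Unspecified"
```

With these assumed angles, an estimate of 350 degrees falls within tolerance of the frontal view, while 45 degrees is ambiguous and stays 'Unspecified'.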
Mark the bounding box of the visible area of the object (not the
estimated total extent of the object). The bounding box should
contain all visible pixels, except where the bounding box would
have to be made excessively large to include a few additional
pixels (<5%) e.g. a car aerial.
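As a rough illustration of the bounding box rule, the sketch below assumes the visible pixels of an object are available as a binary mask (a list of rows of 0/1 values). That is an assumption: the annotators drew boxes by eye. The function names tight_bbox and excluded_fraction are hypothetical helpers, not part of any VOC tool. The second function reports what share of the visible pixels a candidate box leaves out, which can be compared against the 5% allowance.

```python
def tight_bbox(mask):
    """Smallest inclusive (xmin, ymin, xmax, ymax) containing every visible pixel.

    Returns None if the mask has no visible pixels.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def excluded_fraction(mask, box):
    """Fraction of visible pixels that fall outside the given inclusive box."""
    xmin, ymin, xmax, ymax = box
    total = 0
    outside = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                total += 1
                if not (xmin <= x <= xmax and ymin <= y <= ymax):
                    outside += 1
    return outside / total if total else 0.0


# Toy mask: a 2-pixel "aerial" (rightmost column, top two rows)
# above a 6-row by 8-column "body".
mask = [[0] * 7 + [1] for _ in range(2)] + [[1] * 8 for _ in range(6)]

print(tight_bbox(mask))                       # (0, 0, 7, 7)
print(excluded_fraction(mask, (0, 2, 7, 7)))  # 0.04
```

In this toy case, dropping the aerial shrinks the box by two rows while excluding 4% of the visible pixels, which is under the 5% allowance. Whether the larger box would count as "excessively large" remains the annotator's judgement.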
If more than 15-20% of the object is occluded and lies outside
the bounding box, mark as 'Truncated'. Do not mark as truncated if
the occluded area lies within the bounding box.
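The truncation rule can be written down as a one-line check. This is a minimal sketch under stated assumptions: the two fractions are the annotator's own estimates of how much of the whole object is hidden, the function name is_truncated is hypothetical, and the 0.15 default simply takes the lower end of the 15-20% range.

```python
def is_truncated(hidden_outside_box, hidden_inside_box=0.0, threshold=0.15):
    """Return True if the object should be marked 'Truncated'.

    hidden_outside_box: estimated fraction of the whole object that is
        occluded and lies outside the bounding box.
    hidden_inside_box: estimated fraction occluded but inside the box.
        It is accepted only to make explicit that it does not count.
    threshold: assumed cut-off taken from the 15-20% range.
    """
    return hidden_outside_box > threshold


# A car with roughly 30% hidden beyond the box edge is truncated.
print(is_truncated(0.30))        # True
# Heavy occlusion that lies inside the box does not count.
print(is_truncated(0.05, 0.50))  # False
```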
Images which are poor quality (e.g. excessive motion blur) should
be marked bad. However, poor illumination (e.g. objects in
silhouette) should not count as poor quality unless objects cannot
be recognized.
If an object is 'occluded' by a close-fitting occluder e.g.
clothing, mud, snow etc., then the occluder should be treated as
part of the object.
Do label objects visible through glass, but treat reflections on
the glass as occlusion.
Do label objects in mirrors.
Label objects in pictures/posters/signs only if they are
photorealistic; do not label cartoons, symbols etc.
Guidelines on categorization
Car: includes cars, vans, people carriers etc. Do not label where only
the vehicle interior is shown.
Objects were marked as 'difficult' by a single annotator. Only
the image area corresponding to the bounding box of each object
was displayed and a subjective judgement of the difficulty of
recognizing the object was made. Reasons for marking an object as
difficult included small image area, blur, clutter, high level of
occlusion, occlusion of a very characteristic part of the object,