PASCAL Visual Object Classes Challenge 2006 (VOC2006) Annotation Guidelines

This document reproduces the guidelines used for annotating images in the VOC2006 data set.

Guidelines on what and how to label

What to label

All objects of the defined categories, unless:

If it is not possible to label every object because there are too many, mark the image as bad.


Viewpoint

Record the viewpoint of the 'bulk' of the object, e.g. the body rather than the head. Allow a tolerance of 10-20 degrees around each viewpoint. If the viewpoint is ambiguous, leave it as 'Unspecified'.

Bounding box

Mark the bounding box of the visible area of the object (not the estimated total extent of the object). The bounding box should contain all visible pixels, except where it would have to be made excessively large to include a few additional pixels (<5%), e.g. a car aerial.
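
As an illustration only, the rule above can be sketched in Python for the case where a binary mask of the object's visible pixels is available. The guidelines do not describe any such tool: the function names, the list-of-lists mask representation, and the reading of "<5%" as a fraction of the object's visible pixels are all assumptions made for this sketch.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), inclusive pixel indices


def visible_bounding_box(mask: List[List[bool]]) -> Optional[Box]:
    """Return the tightest box enclosing every visible (True) pixel, or None if there are none."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, visible in enumerate(row) if visible]
    return min(cols), rows[0], max(cols), rows[-1]


def excluded_fraction(mask: List[List[bool]], box: Box) -> float:
    """Return the fraction of visible pixels that fall outside a candidate box."""
    xmin, ymin, xmax, ymax = box
    total = 0
    outside = 0
    for y, row in enumerate(mask):
        for x, visible in enumerate(row):
            if visible:
                total += 1
                if not (xmin <= x <= xmax and ymin <= y <= ymax):
                    outside += 1
    return outside / total if total else 0.0
```

Under these assumptions, an annotator-side check could compare excluded_fraction for a tighter candidate box against 0.05 and accept the tighter box only when the excluded fraction is below that value.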


Truncation

If more than 15-20% of the object is occluded and lies outside the bounding box, mark the object as 'Truncated'. Do not mark it as truncated if the occluded area lies within the bounding box.
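
The truncation rule can be written as a simple threshold test. The following is a minimal sketch, not part of the original annotation procedure: the function name is hypothetical, the input is assumed to be the annotator's own estimate of the fraction of the object lying outside the bounding box, and the default threshold of 0.15 is just the lower end of the 15-20% range given above.

```python
def is_truncated(fraction_outside_box: float, threshold: float = 0.15) -> bool:
    """Decide the 'Truncated' flag from an estimated fraction of the object outside the box.

    fraction_outside_box: estimated share (0.0 to 1.0) of the whole object that is
        occluded AND lies outside the bounding box. Occlusion inside the box does
        not count towards this value.
    threshold: assumed cut-off; the guidelines give a range of 15-20%, not one value.
    """
    if not 0.0 <= fraction_outside_box <= 1.0:
        raise ValueError("fraction_outside_box must be between 0.0 and 1.0")
    return fraction_outside_box > threshold
```

Because the guidelines state a range rather than a single value, cases that fall between 15% and 20% remain a matter of annotator judgement; the threshold parameter only makes that choice explicit.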

Image quality/illumination

Images of poor quality (e.g. excessive motion blur) should be marked bad. However, poor illumination (e.g. objects in silhouette) should not count as poor quality unless the objects cannot be recognized.

Clothing/mud/snow etc.

If an object is 'occluded' by a close-fitting occluder (e.g. clothing, mud or snow), then the occluder should be treated as part of the object.


Transparency

Do label objects visible through glass, but treat reflections on the glass as occlusion.


Mirrors

Do label objects in mirrors.


Pictures

Label objects in pictures/posters/signs only if they are photorealistic, not if they are cartoons, symbols etc.

Guidelines on categorization


Car

Includes cars, vans, people carriers etc. Do not label where only the vehicle interior is shown.

'Difficult' flag

Objects were marked as 'difficult' by a single annotator. Only the image area corresponding to the bounding box of each object was displayed and a subjective judgement of the difficulty of recognizing the object was made. Reasons for marking an object as difficult included small image area, blur, clutter, high level of occlusion, occlusion of a very characteristic part of the object, etc.
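
To summarize the per-object information these guidelines ask for, the record below is one possible in-memory representation. It is a sketch under stated assumptions: the class and field names are hypothetical and do not reproduce the VOC2006 annotation file format, and only the 'Unspecified' viewpoint value is taken from the text above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ObjectAnnotation:
    """One labelled object; all field names are illustrative assumptions."""

    category: str                      # e.g. "car"
    bbox: Tuple[int, int, int, int]    # (xmin, ymin, xmax, ymax) of the visible area
    viewpoint: str = "Unspecified"     # left as 'Unspecified' when ambiguous
    truncated: bool = False            # set per the truncation rule
    difficult: bool = False            # subjective flag set by a single annotator

    def __post_init__(self) -> None:
        # Reject boxes whose corners are given in the wrong order.
        xmin, ymin, xmax, ymax = self.bbox
        if xmin > xmax or ymin > ymax:
            raise ValueError("bounding box corners are out of order")
```

For example, ObjectAnnotation("car", (10, 20, 110, 80), truncated=True) would describe a truncated car whose viewpoint was left unspecified and which was not flagged as difficult.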