Dataset Open Access

CAD 120 affordance dataset

Sawatzky, Johann; Srikantha, Abhilash; Gall, Juergen

JSON-LD Export

  "description": "<p>% ==============================================================================<br>\n% CAD 120 Affordance Dataset<br>\n% Version 1.0<br>\n% ------------------------------------------------------------------------------<br>\n% If you use the dataset please cite:<br>\n%<br>\n% Johann Sawatzky, Abhilash Srikantha, Juergen Gall.<br>\n% Weakly Supervised Affordance Detection.<br>\n% IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17)<br>\n%<br>\n% and<br>\n%<br>\n% H. S. Koppula and A. Saxena.<br>\n% Physically grounded spatio-temporal object affordances.<br>\n% European Conference on Computer Vision (ECCV'14)<br>\n%<br>\n% Any bugs or questions, please email sawatzky AT iai DOT uni-bonn DOT de.<br>\n% ==============================================================================</p>\n\n<p>This is the CAD 120 Affordance Segmentation Dataset based on the Cornell Activity<br>\nDataset CAD 120 (see</p>\n\n<p>Content</p>\n\n<p>frames/*.png:<br>\nRGB frames selected from the Cornell Activity Dataset. To find out the location of a frame<br>\nin the original videos, see video_info.txt.</p>\n\n<p>object_crop_images/*.png<br>\nImage crops taken from the selected frames and resized to 321*321. Each crop is a padded<br>\nbounding box of an object the human interacts with in the video. Due to the padding,<br>\nthe crops may contain background and other objects.<br>\nIn each selected frame, each bounding box was processed. The bounding boxes are already<br>\ngiven in the Cornell Activity Dataset.<br>\nThe 5-digit number gives the frame number, the second number gives the bounding box number<br>\nwithin the frame.</p>\n\n<p>segmentation_mat/*.mat<br>\n321*321*6 segmentation masks for the image crops. Each channel corresponds to an<br>\naffordance (openable, cuttable, pourable, containable, supportable, holdable, in this order).<br>\nAll pixels belonging to a particular affordance are labeled 1 in the respective channel,<br>\notherwise 0. 
\u00a0</p>\n\n<p>segmentation_png/*.png<br>\n321*321 PNG images, each containing the binary mask for one of the affordances.</p>\n\n<p>lists/*.txt<br>\nLists containing the train and test sets for two splits. The actor split ensures that<br>\ntrain and test images stem from different videos with different actors while the object split ensures<br>\nthat train and test data have no (central) object classes in common.<br>\nThe train sets are additionally subdivided into 3 subsets A, B and C. For the actor split,<br>\nthe subsets stem from different videos. For the object split, each subset contains<br>\nevery third crop of the train set.</p>\n\n<p>crop_coordinate_info.txt<br>\nMaps image crops to their coordinates in the frames.</p>\n\n<p>hpose_info.txt<br>\nMaps frames to 2D human pose coordinates. Hand-annotated by us.</p>\n\n<p>object_info.txt<br>\nMaps each image crop to the (central) object it contains.</p>\n\n<p>visible_affordance_info.txt<br>\nMaps image crops to the affordances visible in each crop.</p>\n\n<p>\u00a0</p>\n\n<p>%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<br>\nThe crops contain the following object classes:<br>\n1. table<br>\n2. kettle<br>\n3. plate<br>\n4. bottle<br>\n5. thermal cup<br>\n6. knife<br>\n7. medicine box<br>\n8. can<br>\n9. microwave<br>\n10. paper box<br>\n11. bowl<br>\n12. mug</p>\n\n<p>Affordances in our set:<br>\n1. openable<br>\n2. cuttable<br>\n3. pourable<br>\n4. containable<br>\n5. supportable<br>\n6. holdable</p>\n\n<p>Note that our object affordance labeling differs from the Cornell Activity Dataset:<br>\nE.g. the lid of a pizza box is considered to be supportable.</p>\n\n<p>\u00a0</p>", 
  "license": "", 
  "creator": [
    {
      "affiliation": "University of Bonn", 
      "@type": "Person", 
      "name": "Sawatzky, Johann"
    }, 
    {
      "affiliation": "Carl Zeiss AG", 
      "@type": "Person", 
      "name": "Srikantha, Abhilash"
    }, 
    {
      "affiliation": "University of Bonn", 
      "@type": "Person", 
      "name": "Gall, Juergen"
    }
  ], 
  "url": "", 
  "datePublished": "2017-04-07", 
  "@type": "Dataset", 
  "keywords": [
    "computer vision", 
    "semantic image segmentation", 
    "weakly supervised learning", 
    "convolutional neural network", 
    "anticipating human behavior", 
    "mapping on demand"
  ], 
  "@context": "", 
  "distribution": [
    {
      "contentUrl": "", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "", 
  "@id": "", 
  "workFeatured": {
    "alternateName": "CVPR", 
    "@type": "Event", 
    "name": "Computer Vision and Pattern Recognition"
  }, 
  "name": "CAD 120 affordance dataset"
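
The description states that each segmentation mask has six binary channels in a fixed affordance order. The sketch below illustrates how that layout could be decoded. It is a minimal example, not the authors' tooling: the helper names `visible_affordances` and `channel_mask` and the tiny 2x2 `toy` mask are hypothetical, and a plain nested list stands in for a real 321*321*6 array.

```python
# Channel order as given in the dataset description.
AFFORDANCES = ("openable", "cuttable", "pourable",
               "containable", "supportable", "holdable")


def visible_affordances(mask):
    """Return the names of affordances that have at least one pixel set to 1.

    `mask` is a height x width x 6 nested sequence of 0/1 values
    (hypothetical helper, not part of the dataset).
    """
    found = set()
    for row in mask:
        for pixel in row:
            for channel, value in enumerate(pixel):
                if value:
                    found.add(channel)
    return [AFFORDANCES[c] for c in sorted(found)]


def channel_mask(mask, affordance):
    """Extract the 2D binary mask of a single affordance, selected by name."""
    c = AFFORDANCES.index(affordance)
    return [[pixel[c] for pixel in row] for row in mask]


# Tiny 2x2 stand-in for a real 321x321x6 mask (made-up values).
toy = [
    [[0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 1]],
    [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]],
]

print(visible_affordances(toy))    # ['containable', 'holdable']
print(channel_mask(toy, "holdable"))  # [[0, 1], [0, 1]]
```

For the actual files, a loader such as `scipy.io.loadmat` would typically be used to read a `.mat` file. The name of the variable stored inside each file is not given in the description, so it would need to be checked by inspecting a loaded file.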
                   All versions   This version
Views              3,789          3,791
Downloads          2,554          2,554
Data volume        6.6 TB         6.6 TB
Unique views       3,251          3,253
Unique downloads   861            861

