Dataset Open Access

CAD 120 affordance dataset

Sawatzky, Johann; Srikantha, Abhilash; Gall, Juergen

Citation Style Language JSON Export

{
  "publisher": "Zenodo",
  "DOI": "10.5281/zenodo.495570",
  "title": "CAD 120 affordance dataset",
  "issued": {
    "date-parts": []
  },
  "author": [
    {"family": "Sawatzky", "given": "Johann"},
    {"family": "Srikantha", "given": "Abhilash"},
    {"family": "Gall", "given": "Juergen"}
  ],
  "id": "495570",
  "note": "Acknowledgments. The work has been financially supported by the DFG projects GA 1927/5-1 (DFG Research Unit FOR 2535 Anticipating Human Behavior) and GA 1927/2-2 (DFG Research Unit FOR 1505 Mapping on Demand).",
  "type": "dataset",
  "event": "Computer Vision and Pattern Recognition (CVPR)"
}

Abstract

==============================================================================
CAD 120 Affordance Dataset
Version 1.0
------------------------------------------------------------------------------
If you use the dataset, please cite:

Johann Sawatzky, Abhilash Srikantha, Juergen Gall.
Weakly Supervised Affordance Detection.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17)

and

H. S. Koppula and A. Saxena.
Physically grounded spatio-temporal object affordances.
European Conference on Computer Vision (ECCV'14)

For any bugs or questions, please email sawatzky AT iai DOT uni-bonn DOT de.
==============================================================================

This is the CAD 120 Affordance Segmentation Dataset, based on the Cornell
Activity Dataset CAD 120.

Contents

frames/*.png
RGB frames selected from the Cornell Activity Dataset. To find the location
of a frame in the original videos, see video_info.txt.

object_crop_images/*.png
Image crops taken from the selected frames and resized to 321x321 pixels.
Each crop is a padded bounding box of an object the human interacts with in
the video; due to the padding, a crop may also contain background and other
objects. Every bounding box in each selected frame was processed; the
bounding boxes are already given in the Cornell Activity Dataset. In each
file name, the 5-digit number gives the frame number and the second number
gives the bounding box number within the frame.

segmentation_mat/*.mat
321x321x6 segmentation masks for the image crops. Each channel corresponds
to an affordance (openable, cuttable, pourable, containable, supportable,
holdable, in this order). All pixels belonging to a particular affordance
are labeled 1 in the respective channel, 0 otherwise.

segmentation_png/*.png
321x321 PNG images, each containing the binary mask for one of the
affordances.

lists/*.txt
Lists containing the train and test sets for two splits. The actor split
ensures that train and test images stem from different videos with
different actors, while the object split ensures that train and test data
have no (central) object classes in common. The train sets are additionally
subdivided into three subsets A, B, and C. For the actor split, the subsets
stem from different videos; for the object split, each subset contains
every third crop of the train set.

crop_coordinate_info.txt
Maps image crops to their coordinates in the frames.

hpose_info.txt
Maps frames to 2D human pose coordinates, hand-annotated by us.

object_info.txt
Maps image crops to the (central) object each contains.

visible_affordance_info.txt
Maps image crops to the affordances visible in each crop.

The crops contain the following object classes:
1. table
2. kettle
3. plate
4. bottle
5. thermal cup
6. knife
7. medicine box
8. can
9. microwave
10. paper box
11. bowl
12. mug

Affordances in our set:
1. openable
2. cuttable
3. pourable
4. containable
5. supportable
6. holdable

Note that our object affordance labeling differs from the Cornell Activity
Dataset: e.g., the cap of a pizza box is considered supportable.
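As a minimal sketch of how the per-crop segmentation masks described above might be consumed: the README documents a 321x321x6 array with one channel per affordance in a fixed order, but it does not document the variable name stored inside each .mat file, so the loader below guesses it as the first non-metadata key. The file name and key are assumptions, not part of the dataset description.

```python
import numpy as np
from scipy.io import loadmat

# Channel order as documented in the dataset README above.
AFFORDANCES = ["openable", "cuttable", "pourable",
               "containable", "supportable", "holdable"]

def load_affordance_masks(mat_path):
    """Load a 321x321x6 segmentation mask from segmentation_mat/ and
    return a dict mapping each affordance name to its binary 321x321 mask.

    The variable name inside the .mat file is not documented, so we take
    the first key that is not MATLAB metadata (keys like '__header__')."""
    data = loadmat(mat_path)
    key = next(k for k in data if not k.startswith("__"))
    masks = np.asarray(data[key])
    assert masks.shape == (321, 321, 6), masks.shape
    return {name: masks[:, :, i].astype(bool)
            for i, name in enumerate(AFFORDANCES)}
```

With such a dict one can, for instance, count the pixels labeled holdable in a crop via `load_affordance_masks(path)["holdable"].sum()`.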
                  All versions   This version
Views                    3,788          3,790
Downloads                2,554          2,554
Data volume             6.6 TB         6.6 TB
Unique views             3,250          3,252
Unique downloads           861            861

